<a href="https://colab.research.google.com/github/senanayake/colab-notebooks/blob/main/Graph_Based_Upgrade_Paths.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Problem Overview: Org-Wide Java Dependency Upgrade Planner

Large organizations often maintain **hundreds of Java applications** across many Bitbucket repositories. Each application declares its dependencies in one or more `pom.xml` files, and over time:

- Different teams upgrade at different speeds.
- Multiple incompatible versions of the same library coexist.
- Security vulnerabilities and breaking changes accumulate.

We want to build a **dependency intelligence and upgrade planning system** that:

1. **Discovers** all Java dependencies across the org by parsing POM files.
2. Constructs two key graph structures:
   - A **Version DAG** for each library (captures upgrade paths between versions).
   - A **Bipartite Projection Graph** connecting **applications ↔ specific library versions**.
3. **Analyzes** fragmentation, risk, and upgrade opportunities.
4. **Recommends coordinated “upgrade waves”** so multiple teams can move together.
5. **Feeds execution tools** (OpenRewrite recipes, IDE plugins, Copilot/bmad helpers) to automate migrations.

This notebook section defines the **data model** and **example behavior** using a small toy dataset of 4 applications and several Java libraries. Later sections will implement code that builds and analyzes these graphs.

---

# 2. Core Concepts

## 2.1 Version DAG (per library)

For each library (identified by `groupId:artifactId`; the examples here abbreviate this to the artifact name), we define a **Version DAG**:

- **Nodes** = specific versions  
  Example: `spring-boot:2.3.4`, `spring-boot:2.5.0`, `spring-boot:2.7.18`.
- **Directed edges** = allowed upgrade steps  
  Example: `2.3.4 → 2.5.0`, `2.5.0 → 2.7.18`.
- **Optional edge weights** = estimated migration difficulty of each step.

Interpretation:

> The Version DAG models “how you can move through the version space” for a given library.

We will define DAGs for Spring Boot, Jackson, Hibernate, and Log4j in our example.
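
As a minimal sketch of the definition above, a Version DAG can be held in a nested dictionary and checked for acyclicity with the standard library. The `{from_version: {to_version: weight}}` layout, the function name `upgrade_order`, and the weight values are assumptions made for illustration; the notebook's later implementation may use a different structure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical layout: {from_version: {to_version: difficulty_weight}}.
# The weights are invented for illustration only.
spring_boot_dag = {
    "2.3.4": {"2.5.0": 2},
    "2.5.0": {"2.7.18": 1},
    "2.7.18": {},
}


def upgrade_order(dag):
    """Return versions in a valid upgrade order; raises CycleError if not a DAG."""
    # TopologicalSorter expects {node: predecessors}, so invert the edges.
    predecessors = {version: set() for version in dag}
    for source, targets in dag.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(source)
    return list(TopologicalSorter(predecessors).static_order())


order = upgrade_order(spring_boot_dag)
print(order)  # ['2.3.4', '2.5.0', '2.7.18']
```

Running the topological sort up front is one simple way to reject a hand-edited version graph that accidentally contains a downgrade loop.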

---

## 2.2 Bipartite Projection Graph (Applications ↔ Library Versions)

We also define a **bipartite graph** with two types of nodes:

- **Applications**  
  Example: `OrdersAPI`, `PaymentsService`, `InventoryService`, `ReportingJob`.

- **Library-version nodes**  
  Example: `spring-boot:2.3.4`, `jackson-databind:2.9.9`, `hibernate-core:5.4.0`.

Edges represent usage:

> `App A → Library L@Version V` means “App A depends on L version V”.

This graph helps us understand:

- Version fragmentation in the org.
- Which apps should be grouped into upgrade waves.
- Which upgrades will have the highest impact.
- Where OpenRewrite recipes will have high reuse value.
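
One possible in-memory form of this graph, sketched under the assumption that nodes are plain strings and edges are `(app, library_version)` pairs. The helper names `is_bipartite_usage_graph` and `dependencies_of` are hypothetical.

```python
# Hypothetical in-memory form of the bipartite graph: two disjoint node sets
# plus edges that always run from an application to a library-version node.
apps = {"OrdersAPI", "InventoryService"}
library_versions = {
    "spring-boot:2.3.4",
    "spring-boot:2.5.0",
    "jackson-databind:2.9.9",
}
edges = {
    ("OrdersAPI", "spring-boot:2.3.4"),
    ("OrdersAPI", "jackson-databind:2.9.9"),
    ("InventoryService", "spring-boot:2.5.0"),
}


def is_bipartite_usage_graph(apps, library_versions, edges):
    """True when the node sets are disjoint and every edge is app -> library version."""
    if apps & library_versions:
        return False
    return all(a in apps and lv in library_versions for a, lv in edges)


def dependencies_of(app, edges):
    """Library-version nodes adjacent to one application, sorted for stable output."""
    return sorted(lv for a, lv in edges if a == app)


print(is_bipartite_usage_graph(apps, library_versions, edges))  # True
print(dependencies_of("OrdersAPI", edges))
```

Keeping the two node sets explicit makes it cheap to assert that no edge ever links two applications or two library versions directly.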

---

# 3. Example Dataset: 4 Applications

We define 4 example applications using various versions of 4 important Java libraries.

## 3.1 Current App Dependencies (simulated from POMs)

| Application        | Spring Boot        | Jackson       | Hibernate        | Logging        |
|--------------------|--------------------|---------------|------------------|----------------|
| OrdersAPI          | 2.3.4.RELEASE      | 2.9.9         | 5.3.0.Final      | log4j 1.2.17   |
| PaymentsService    | 2.3.4.RELEASE      | 2.10.1        | 5.4.0.Final      | log4j 1.2.17   |
| InventoryService   | 2.5.0              | 2.10.1        | (none)           | SLF4J only     |
| ReportingJob       | 1.5.22.RELEASE     | 2.8.11        | 5.2.0.Final      | log4j 1.2.17   |

Observations:

- Spring Boot has **three versions** in use.
- Jackson has **three versions** in use.
- Hibernate has **three versions** in use.
- Log4j v1 is used by **three apps** and is considered insecure: the 1.x line reached end of life in 2015 and no longer receives security fixes.

These represent realistic fragmentation issues in large enterprise Java estates.
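
The counts in the observations can be reproduced with a short sketch. The table is transcribed by hand here; the dictionary keys (`spring-boot`, `jackson`, `hibernate`, `log4j`) and the use of `None` for "no dependency" are assumptions, not the output of a real POM parser.

```python
# Table 3.1 transcribed by hand; None marks "no dependency on this library".
# InventoryService's "SLF4J only" logging entry is recorded as None for log4j.
app_dependencies = {
    "OrdersAPI": {"spring-boot": "2.3.4.RELEASE", "jackson": "2.9.9",
                  "hibernate": "5.3.0.Final", "log4j": "1.2.17"},
    "PaymentsService": {"spring-boot": "2.3.4.RELEASE", "jackson": "2.10.1",
                        "hibernate": "5.4.0.Final", "log4j": "1.2.17"},
    "InventoryService": {"spring-boot": "2.5.0", "jackson": "2.10.1",
                         "hibernate": None, "log4j": None},
    "ReportingJob": {"spring-boot": "1.5.22.RELEASE", "jackson": "2.8.11",
                     "hibernate": "5.2.0.Final", "log4j": "1.2.17"},
}


def versions_in_use(app_dependencies):
    """Map each library to the set of distinct versions found across all apps."""
    by_library = {}
    for deps in app_dependencies.values():
        for library, version in deps.items():
            if version is not None:
                by_library.setdefault(library, set()).add(version)
    return by_library


fragmentation = {lib: len(vs) for lib, vs in versions_in_use(app_dependencies).items()}
log4j_v1_users = sorted(
    app for app, deps in app_dependencies.items() if deps["log4j"] == "1.2.17"
)

print(fragmentation)   # {'spring-boot': 3, 'jackson': 3, 'hibernate': 3, 'log4j': 1}
print(log4j_v1_users)  # ['OrdersAPI', 'PaymentsService', 'ReportingJob']
```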

---

## 3.2 Target “Org Standard” Versions

Assume platform architects have defined these target versions:

- Spring Boot → **2.7.18**  
- Jackson → **2.13.5**  
- Hibernate → **5.6.15.Final**  
- Log4j → **2.17.2**  

The upgrade planner will attempt to standardize apps onto these versions.
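
A rough sketch of how the targets could be used to flag non-compliant dependencies. The `targets` keys and the `off_target` helper are hypothetical, and the comparison is plain string equality, so a version newer than the standard would also be reported; a real planner would likely compare parsed versions instead.

```python
# Org-standard versions from the list above; the dictionary keys are assumed names.
targets = {
    "spring-boot": "2.7.18",
    "jackson": "2.13.5",
    "hibernate": "5.6.15.Final",
    "log4j": "2.17.2",
}


def off_target(app_deps, targets):
    """Libraries an app uses whose version differs from the org standard."""
    return sorted(
        lib
        for lib, version in app_deps.items()
        if lib in targets and version != targets[lib]
    )


inventory_service = {"spring-boot": "2.5.0", "jackson": "2.10.1"}
print(off_target(inventory_service, targets))  # ['jackson', 'spring-boot']
```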

---

# 4. Version DAGs for Each Library

Below are the simplified version DAGs for each library.

## 4.1 Spring Boot Version DAG

Nodes:
- `1.5.22.RELEASE`
- `2.3.4.RELEASE`
- `2.5.0`
- `2.7.18` (target)
- `3.2.0` (future)

Edges (the `.RELEASE` qualifier is omitted for brevity):
- `1.5.22 → 2.3.4`
- `2.3.4 → 2.5.0`
- `2.5.0 → 2.7.18`
- `2.7.18 → 3.2.0`

Visual:

`1.5.22 ─▶ 2.3.4 ─▶ 2.5.0 ─▶ 2.7.18 ─▶ 3.2.0`

---

## 4.2 Jackson Version DAG

Nodes:
- `2.8.11`
- `2.9.9`
- `2.10.1`
- `2.13.5` (target)

Edges:
- `2.8.11 → 2.9.9`
- `2.9.9 → 2.10.1`
- `2.10.1 → 2.13.5`

---

## 4.3 Hibernate Version DAG

Nodes:
- `5.2.0.Final`
- `5.3.0.Final`
- `5.4.0.Final`
- `5.6.15.Final` (target)

Edges (the `.Final` qualifier is omitted for brevity):
- `5.2.0 → 5.3.0`
- `5.3.0 → 5.4.0`
- `5.4.0 → 5.6.15`

---

## 4.4 Log4j Upgrade DAG (v1 to v2)

Nodes:
- `log4j:1.2.17`
- `log4j-core:2.17.2` (target)

Edge:
- `log4j:1.2.17 → log4j-core:2.17.2`

This represents a **cross-artifact migration** suited to OpenRewrite automation.
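
Because this edge joins two different artifacts, a node key that carries only a version string is not enough. One way to model it, assuming a hypothetical `LibraryVersion` tuple as the node type:

```python
from typing import NamedTuple


class LibraryVersion(NamedTuple):
    """Hypothetical node type: the artifact name is part of the node identity."""
    artifact: str
    version: str


log4j_v1 = LibraryVersion("log4j", "1.2.17")
log4j_v2 = LibraryVersion("log4j-core", "2.17.2")

# Adjacency list keyed by full node identity, so an edge may cross artifacts.
migration_edges = {log4j_v1: [log4j_v2], log4j_v2: []}


def is_cross_artifact(source, target):
    """True when an upgrade step also changes the artifact, not just the version."""
    return source.artifact != target.artifact


print(is_cross_artifact(log4j_v1, log4j_v2))  # True
```

Flagging such edges separately could help the planner route them to heavier automation than a simple version bump.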

---

# 5. Bipartite Projection Graph (Applications ↔ Library Versions)

We now connect each application to the specific versions it uses.

Example edges:

- `OrdersAPI → spring-boot:2.3.4`
- `OrdersAPI → jackson-databind:2.9.9`
- `OrdersAPI → hibernate-core:5.3.0`
- `OrdersAPI → log4j:1.2.17`
- `PaymentsService → spring-boot:2.3.4`
- `PaymentsService → jackson-databind:2.10.1`
- `PaymentsService → hibernate-core:5.4.0`
- `PaymentsService → log4j:1.2.17`
- `InventoryService → spring-boot:2.5.0`
- `InventoryService → jackson-databind:2.10.1`
- `ReportingJob → spring-boot:1.5.22`
- `ReportingJob → jackson-databind:2.8.11`
- `ReportingJob → hibernate-core:5.2.0`
- `ReportingJob → log4j:1.2.17`

This structure enables:

- Counting how many apps use each version.
- Detecting fragmentation.
- Grouping apps into **upgrade waves**.
- Identifying where OpenRewrite automation yields the most value.
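
The first of these capabilities amounts to inverting the edge list. A minimal sketch, assuming the edges are stored as `(app, library_version)` tuples and using a hypothetical helper named `apps_by_library_version`:

```python
from collections import defaultdict

# The 14 usage edges from this section, as (application, library-version) pairs.
usage_edges = [
    ("OrdersAPI", "spring-boot:2.3.4"),
    ("OrdersAPI", "jackson-databind:2.9.9"),
    ("OrdersAPI", "hibernate-core:5.3.0"),
    ("OrdersAPI", "log4j:1.2.17"),
    ("PaymentsService", "spring-boot:2.3.4"),
    ("PaymentsService", "jackson-databind:2.10.1"),
    ("PaymentsService", "hibernate-core:5.4.0"),
    ("PaymentsService", "log4j:1.2.17"),
    ("InventoryService", "spring-boot:2.5.0"),
    ("InventoryService", "jackson-databind:2.10.1"),
    ("ReportingJob", "spring-boot:1.5.22"),
    ("ReportingJob", "jackson-databind:2.8.11"),
    ("ReportingJob", "hibernate-core:5.2.0"),
    ("ReportingJob", "log4j:1.2.17"),
]


def apps_by_library_version(edges):
    """Invert the bipartite edges: library-version node -> set of apps using it."""
    index = defaultdict(set)
    for app, library_version in edges:
        index[library_version].add(app)
    return dict(index)


index = apps_by_library_version(usage_edges)
counts = {lv: len(users) for lv, users in index.items()}

print(counts["log4j:1.2.17"])       # 3
print(counts["spring-boot:2.3.4"])  # 2
```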

---

# 6. Expected System Behavior on This Example

## 6.1 Detect Fragmentation & Risks

The system should detect:

- **Log4j v1** is used by 3 apps → urgent security upgrade wave.
- **Spring Boot** has 3 versions → consolidation needed.
- **Jackson** and **Hibernate** are also fragmented.

---

## 6.2 Recommend Upgrade Waves

### Wave 0: Security
- Migrate all apps using `log4j:1.2.17` → `log4j-core:2.17.2`.
- Affects: `OrdersAPI`, `PaymentsService`, `ReportingJob`.

### Wave 1: Spring Boot Consolidation
Target: `2.7.18`.

Example upgrade paths:
- `ReportingJob: 1.5.22 → 2.3.4 → 2.5.0 → 2.7.18`
- `OrdersAPI: 2.3.4 → 2.5.0 → 2.7.18`
- `PaymentsService: 2.3.4 → 2.5.0 → 2.7.18`
- `InventoryService: 2.5.0 → 2.7.18`
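
The paths above can be derived from the Spring Boot DAG with a breadth-first search. This is a sketch that treats every edge as equal cost and returns the path with the fewest steps; the function name `upgrade_path` and the adjacency-list layout are assumptions, and a weighted search would be needed once migration difficulty is attached to edges.

```python
from collections import deque

# Spring Boot Version DAG from section 4.1 as an adjacency list.
spring_boot_edges = {
    "1.5.22": ["2.3.4"],
    "2.3.4": ["2.5.0"],
    "2.5.0": ["2.7.18"],
    "2.7.18": ["3.2.0"],
    "3.2.0": [],
}


def upgrade_path(dag, start, target):
    """Fewest-hop path from start to target, or None if the target is unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_version in dag.get(path[-1], []):
            if next_version not in seen:
                seen.add(next_version)
                queue.append(path + [next_version])
    return None


print(upgrade_path(spring_boot_edges, "1.5.22", "2.7.18"))
# ['1.5.22', '2.3.4', '2.5.0', '2.7.18']
print(upgrade_path(spring_boot_edges, "3.2.0", "2.7.18"))  # None (no downgrade edges)
```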

### Wave 2: Jackson & Hibernate Alignment
Target Jackson: `2.13.5`  
Target Hibernate: `5.6.15.Final`

Apps will be grouped based on shared starting versions.

---

## 6.3 Identify Where OpenRewrite Recipes Are Worth Building

For each candidate upgrade, the system should:

- Count how many apps share the same source version.
- Estimate potential efficiency gain from a shared OpenRewrite recipe.
- Recommend recipe generation only when enough apps will benefit.

Examples:
- A single recipe covering `spring-boot:2.3.4 → 2.7.18` has high reuse value (2 apps start from `2.3.4`).
- A recipe covering `spring-boot:1.5.22 → 2.7.18` may be less valuable (only 1 app starts from `1.5.22`).
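
A simple version of this heuristic counts apps per starting version and keeps only the versions shared by enough apps. The `min_apps` threshold and the function name `recipe_candidates` are assumptions for illustration; a fuller ROI estimate would also weigh migration difficulty.

```python
from collections import Counter


def recipe_candidates(current_versions, target, min_apps=2):
    """Return {source_version: app_count} for sources shared by at least min_apps apps."""
    counts = Counter(v for v in current_versions.values() if v != target)
    return {version: n for version, n in counts.items() if n >= min_apps}


# Current Spring Boot versions per app, taken from the example dataset.
spring_boot_versions = {
    "OrdersAPI": "2.3.4",
    "PaymentsService": "2.3.4",
    "InventoryService": "2.5.0",
    "ReportingJob": "1.5.22",
}

print(recipe_candidates(spring_boot_versions, "2.7.18"))  # {'2.3.4': 2}
```

With the default threshold only the `2.3.4` starting point qualifies; lowering `min_apps` to 1 would also surface the single-app sources.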

---

# 7. Next Steps in This Notebook

Next, we will:

1. Define the example data structures for DAGs and dependencies.
2. Build the Version DAGs in Python.
3. Build the bipartite Apps ↔ LibraryVersions graph.
4. Compute:
   - App counts per version
   - Fragmentation metrics
   - Simple upgrade wave suggestions
5. Prepare for later:
   - Shortest-path upgrade calculations
   - Wave clustering
   - OpenRewrite ROI heuristics


