## [03] Schema Diff Tests

In [0]:
from src.schema_diff import load_schema, compute_schema_diff
from pprint import pprint

#### Flat Schema Drift

This test checks the schema drift detection engine on flat (non-nested) JSON schemas.

We compare:
- `flat_base_old.json`: original schema
- `flat_base_new.json`: new schema with:
  - `name` column removed
  - `age` column added
  - `id` column changed from `IntegerType` to `LongType`

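The flat comparison can be pictured as plain set operations on name-to-type maps. The sketch below is a hypothetical `flat_schema_diff` and is not the project's `compute_schema_diff`. It assumes a schema layout of `{"fields": [{"name": ..., "type": ...}]}`, matching how the rename test reads top-level fields. The inline schemas are illustrative and are not the contents of the JSON files.

```python
# Hypothetical sketch; assumed layout: {"fields": [{"name": ..., "type": ...}]}.

def flat_schema_diff(old: dict, new: dict) -> dict:
    """Diff two flat schemas into added / removed / type_changed."""
    old_types = {f["name"]: f["type"] for f in old["fields"]}
    new_types = {f["name"]: f["type"] for f in new["fields"]}

    # Columns present in only one schema are additions or removals.
    added = sorted(new_types.keys() - old_types.keys())
    removed = sorted(old_types.keys() - new_types.keys())

    # Columns present in both schemas whose type string differs.
    type_changed = [
        {"col": col, "from": old_types[col], "to": new_types[col]}
        for col in sorted(old_types.keys() & new_types.keys())
        if old_types[col] != new_types[col]
    ]
    return {"added": added, "removed": removed, "type_changed": type_changed}


# Illustrative inline schemas (assumed, not read from disk).
old = {"fields": [{"name": "id", "type": "IntegerType"},
                  {"name": "name", "type": "StringType"}]}
new = {"fields": [{"name": "id", "type": "LongType"},
                  {"name": "age", "type": "IntegerType"}]}
print(flat_schema_diff(old, new))
```

Sorting the outputs keeps list order deterministic, which matters because the assertions below compare lists with `==`.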
In [0]:
# --- Load Schemas ---
old_flat = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/flat_base_old.json")
new_flat = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/flat_base_new.json")

# --- Compute Drift ---
flat_diff = compute_schema_diff(old_flat, new_flat)

# --- Display Drift ---
print("Flat Schema Drift Detected:")
pprint(flat_diff)
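Every test cell repeats the same absolute workspace directory. A small path helper could keep that directory in one place. `SCHEMA_DIR` and `schema_path` are hypothetical names and are not part of `src.schema_diff`.

```python
from pathlib import PurePosixPath

# Hypothetical constant: the directory already used by every load_schema call.
SCHEMA_DIR = PurePosixPath(
    "/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas"
)


def schema_path(filename: str) -> str:
    """Return the absolute path of a schema file inside SCHEMA_DIR."""
    return str(SCHEMA_DIR / filename)


print(schema_path("flat_base_old.json"))
```

A call such as `load_schema(schema_path("flat_base_old.json"))` would then replace the full literal path.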

In [0]:
# --- Assertions for Testing ---
assert flat_diff["added"] == ["age"]
assert flat_diff["removed"] == ["name"]
assert flat_diff["type_changed"] == [{"col": "id", "from": "IntegerType", "to": "LongType"}]

print("All flat schema drift tests passed.")
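The assertions above compare lists with `==`, so they depend on the order in which `compute_schema_diff` emits entries. With a single change per category this does not matter. When several columns change at once, an order-insensitive check may be more robust. `same_items` is a hypothetical helper sketched for that purpose.

```python
def same_items(actual: list, expected: list) -> bool:
    """True if both lists hold the same elements (with counts), ignoring order."""
    # Dicts cannot be sorted directly, so compare a canonical string per item:
    # dicts become the repr of their key-sorted items, other values their repr.
    def key(item):
        return repr(sorted(item.items())) if isinstance(item, dict) else repr(item)

    return sorted(map(key, actual)) == sorted(map(key, expected))


print(same_items(["age", "email"], ["email", "age"]))
```

Under this sketch, `assert same_items(flat_diff["added"], ["age"])` would replace the strict list comparison.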

#### Nested Schema Drift

This test validates that the drift detection engine correctly identifies changes inside nested `struct` fields and reports them as dotted paths such as `user.id`.

We compare:
- `nested_test_old.json`: original schema with:
  - `user.id` as `IntegerType`
  - `user.email` present
- `nested_test_new.json`: updated schema with:
  - `user.id` changed to `LongType`
  - `user.email` removed
  - `user.age` added

Expected drift:
- Added field: `user.age`
- Removed field: `user.email`
- Type change: `user.id` from `IntegerType` → `LongType`
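Reporting changes as dotted paths can be done by flattening each schema into a path-to-type map before diffing. The sketch below is a hypothetical illustration and is not taken from `src.schema_diff`. It assumes a struct field's `"type"` is a dict with its own `"fields"` list (as in Spark's `{"type": "struct", "fields": [...]}` JSON) and that leaf types are strings such as `"IntegerType"`.

```python
def flatten_fields(fields: list, prefix: str = "") -> dict:
    """Map dotted field paths (e.g. 'user.id') to leaf type strings."""
    flat = {}
    for f in fields:
        path = prefix + f["name"]
        ftype = f["type"]
        # Assumed struct encoding: a dict carrying its own "fields" list.
        if isinstance(ftype, dict) and "fields" in ftype:
            flat.update(flatten_fields(ftype["fields"], prefix=path + "."))
        else:
            flat[path] = ftype
    return flat


def nested_schema_diff(old: dict, new: dict) -> dict:
    """Diff two schemas on their flattened, dotted-path views."""
    old_types = flatten_fields(old["fields"])
    new_types = flatten_fields(new["fields"])
    return {
        "added": sorted(new_types.keys() - old_types.keys()),
        "removed": sorted(old_types.keys() - new_types.keys()),
        "type_changed": [
            {"col": c, "from": old_types[c], "to": new_types[c]}
            for c in sorted(old_types.keys() & new_types.keys())
            if old_types[c] != new_types[c]
        ],
    }


# Illustrative inline schemas (assumed, not the contents of the JSON files).
old = {"fields": [{"name": "user", "type": {"type": "struct", "fields": [
    {"name": "id", "type": "IntegerType"},
    {"name": "email", "type": "StringType"}]}}]}
new = {"fields": [{"name": "user", "type": {"type": "struct", "fields": [
    {"name": "id", "type": "LongType"},
    {"name": "age", "type": "IntegerType"}]}}]}
print(nested_schema_diff(old, new))
```

Only leaf fields appear in the flattened map, so the `user` struct itself is not reported as changed.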

In [0]:
# --- Load Nested Schemas ---
old_nested = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/nested_test_old.json")
new_nested = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/nested_test_new.json")

# --- Compute Nested Drift ---
nested_diff = compute_schema_diff(old_nested, new_nested)

# --- Display Drift ---
print("Nested Schema Drift Detected:")
pprint(nested_diff)

In [0]:
# --- Assertions for Testing ---
assert nested_diff["added"] == ["user.age"]
assert nested_diff["removed"] == ["user.email"]
assert nested_diff["type_changed"] == [{"col": "user.id", "from": "IntegerType", "to": "LongType"}]

print("All nested schema drift tests passed.")

#### Rename Schema Drift

This test verifies that the engine identifies renamed columns when given rename hints. It uses:
- `rename_test_old.json`: schema with `fullname`
- `rename_test_new.json`: schema with `name`
- `rename_hints.json`: mapping `"fullname" → "name"`

Expected output:
- `renamed`: `[{"from": "fullname", "to": "name"}]`
- `added` and `removed` are both empty

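Hint-based rename detection can be understood as moving a hinted (removed, added) pair out of those lists and into `renamed`. The sketch below is a hypothetical `apply_rename_hints` and not the project's `detect_renames`. It assumes hints are a flat `{"old_name": "new_name"}` dict and, as an extra assumption, only accepts a rename when the column type is unchanged.

```python
def apply_rename_hints(diff: dict, old_types: dict, new_types: dict,
                       hints: dict) -> dict:
    """Move hinted (removed, added) pairs into diff["renamed"], in place."""
    diff.setdefault("renamed", [])
    for old_name, new_name in hints.items():
        is_pair = old_name in diff["removed"] and new_name in diff["added"]
        # Assumption: a rename must keep the column type unchanged.
        if is_pair and old_types.get(old_name) == new_types.get(new_name):
            diff["removed"].remove(old_name)
            diff["added"].remove(new_name)
            diff["renamed"].append({"from": old_name, "to": new_name})
    return diff


# Illustrative inputs (assumed types; hints assumed to be a flat mapping).
diff = {"added": ["name"], "removed": ["fullname"], "type_changed": []}
apply_rename_hints(diff, {"fullname": "StringType"}, {"name": "StringType"},
                   {"fullname": "name"})
print(diff)
```

Mutating `diff` in place mirrors how the cell below inspects `rename_diff` after calling `detect_renames` without using a return value.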
In [0]:
from src.schema_diff import detect_renames

# --- Load Rename Test Schemas ---
old_rename = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/rename_test_old.json")
new_rename = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/rename_test_new.json")

# --- Compute Base Drift ---
rename_diff = compute_schema_diff(old_rename, new_rename)

# --- Load Rename Hints ---
rename_hints = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/rename_hints.json")

# --- Create field name → type maps (top-level fields only) ---
old_fields_map = {f["name"]: f["type"] for f in old_rename["fields"]}
new_fields_map = {f["name"]: f["type"] for f in new_rename["fields"]}

# --- Run Rename Detection (updates rename_diff in place) ---
detect_renames(rename_diff, old_fields_map, new_fields_map, rename_hints)

# --- Display Result ---
print("Rename Schema Drift Detected:")
pprint(rename_diff)

In [0]:
# --- Assertions ---
assert rename_diff["renamed"] == [{"from": "fullname", "to": "name"}]
assert rename_diff["added"] == []
assert rename_diff["removed"] == []

print("Rename schema drift test passed.")