## [03] Schema Diff Tests

In [0]:
from src.schema_diff import load_schema, compute_schema_diff
from pprint import pprint

#### Flat Schema Drift

This test runs the schema drift detection engine on flat (non-nested) JSON schemas and checks that it reports added, removed, and type-changed columns.

We compare:
- `flat_base_old.json`: original schema
- `flat_base_new.json`: new schema with:
  - `name` column removed
  - `age` column added
  - `id` column changed from `IntegerType` to `LongType`
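
To make the expected result concrete, here is a minimal sketch of a flat diff. It is not the implementation in `src.schema_diff`. The helper `sketch_flat_diff`, the `{column: type}` dictionary representation, and the types given for `name` and `age` are assumptions for illustration; only the output keys (`added`, `removed`, `type_changed`) and the `col`/`from`/`to` entry shape come from the assertions in this notebook.

```python
def sketch_flat_diff(old_cols, new_cols):
    """Compare two {column_name: type_name} mappings (hypothetical representation)."""
    old_names, new_names = set(old_cols), set(new_cols)

    # Columns present only in the new schema, and only in the old one.
    added = sorted(new_names - old_names)
    removed = sorted(old_names - new_names)

    # Columns present in both schemas whose type name differs.
    type_changed = [
        {"col": col, "from": old_cols[col], "to": new_cols[col]}
        for col in sorted(old_names & new_names)
        if old_cols[col] != new_cols[col]
    ]
    return {"added": added, "removed": removed, "type_changed": type_changed}


# Assumed column types for `name` and `age`; the notebook does not state them.
old_cols = {"id": "IntegerType", "name": "StringType"}
new_cols = {"id": "LongType", "age": "IntegerType"}

result = sketch_flat_diff(old_cols, new_cols)
print(result)
```

Sorting the names keeps the output deterministic, which matters when a test compares the lists with `==` as the assertions here do.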

In [0]:
# --- Load Schemas ---
old_flat = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/flat_base_old.json")
new_flat = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/flat_base_new.json")

# --- Compute Drift ---
flat_diff = compute_schema_diff(old_flat, new_flat)

# --- Display Drift ---
print("Flat Schema Drift Detected:")
pprint(flat_diff)

In [0]:
# --- Assertions for Testing ---
assert flat_diff["added"] == ["age"]
assert flat_diff["removed"] == ["name"]
assert flat_diff["type_changed"] == [{"col": "id", "from": "IntegerType", "to": "LongType"}]

print("All flat schema drift tests passed.")

#### Nested Schema Drift

This test validates that the drift detection engine correctly identifies changes inside nested `struct` fields.

We compare:
- `nested_test_old.json`: original schema with:
  - `user.id` as `IntegerType`
  - `user.email` present
- `nested_test_new.json`: updated schema with:
  - `user.id` changed to `LongType`
  - `user.email` removed
  - `user.age` added

Expected drift:
- Added field: `user.age`
- Removed field: `user.email`
- Type change: `user.id` from `IntegerType` to `LongType`
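
The dotted names above (`user.id`, `user.email`, `user.age`) suggest that nested fields are compared by their full path. One way this could work is to flatten each schema into path-to-type pairs before diffing, sketched below. The helper `flatten_fields`, the nested-dictionary representation, and the types given for `user.email` and `user.age` are assumptions for illustration; the actual file format and the logic in `compute_schema_diff` may differ.

```python
def flatten_fields(schema, prefix=""):
    """Flatten a nested {name: type_or_dict} mapping into {dotted.path: type_name}.

    Hypothetical representation: a dict value stands for a struct,
    any other value is treated as a leaf type name.
    """
    flat = {}
    for name, dtype in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(dtype, dict):
            # Recurse into the struct, carrying the parent path along.
            flat.update(flatten_fields(dtype, path))
        else:
            flat[path] = dtype
    return flat


# Assumed types for `email` and `age`; the notebook does not state them.
old_example = {"user": {"id": "IntegerType", "email": "StringType"}}
new_example = {"user": {"id": "LongType", "age": "IntegerType"}}

old_paths = flatten_fields(old_example)
new_paths = flatten_fields(new_example)

# Once flattened, the nested case reduces to the flat comparison.
added = sorted(set(new_paths) - set(old_paths))
removed = sorted(set(old_paths) - set(new_paths))
changed = sorted(
    p for p in set(old_paths) & set(new_paths) if old_paths[p] != new_paths[p]
)
print(added, removed, changed)
```

With this approach a struct is never reported as a changed column itself; only its leaf fields appear in the diff.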

In [0]:
# --- Load Nested Schemas ---
old_nested = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/nested_test_old.json")
new_nested = load_schema("/Workspace/Users/lakshya.jain@tmdc.io/schema_drift_agent/schemas/nested_test_new.json")

# --- Compute Nested Drift ---
nested_diff = compute_schema_diff(old_nested, new_nested)

# --- Display Drift ---
print("Nested Schema Drift Detected:")
pprint(nested_diff)

In [0]:
# --- Assertions for Testing ---
assert nested_diff["added"] == ["user.age"]
assert nested_diff["removed"] == ["user.email"]
assert nested_diff["type_changed"] == [{"col": "user.id", "from": "IntegerType", "to": "LongType"}]

print("All nested schema drift tests passed.")