# Day 80: Data Provenance Tracker

In scientific research, how a result was reached matters as much as the result itself. Reproducibility requires a clear audit trail of every modification made to the raw data.

In this lab, we implement a **Provenance Tracker** to:
1. **Register Raw Artifacts**: Capturing initial state with cryptographic checksums.
2. **Log Transformations**: Recording every operation, its parameters, and the resulting change.
3. **Reconstruct Lineage**: Building a causal graph that traces the path from raw data to the final figure or table.

In [None]:
import sys
import os

# Add the project root (two levels up from this notebook) to sys.path so `src` can be imported
sys.path.append(os.path.abspath('../../'))

from src.assurance.provenance import ProvenanceTracker

## 1. Initializing the Study Data

We start with a raw dataset of experimental readings.

In [None]:
tracker = ProvenanceTracker()

raw_data = [1.2, 1.5, 0.9, 1.3, 2.1, 0.8]
raw_id = tracker.register_raw_data("sensor_beta_readings", raw_data)

print(f"Raw Data Registered: {raw_id}")
print(f"Checksum: {tracker.artifacts[raw_id].checksum[:16]}...")
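
The tracker's internal hashing is not shown in this lab. As a minimal sketch, assuming the content is serialized to deterministic JSON and hashed with SHA-256, a checksum could be computed and used to detect modification as follows. The helper name `checksum_of` is hypothetical and is not part of `ProvenanceTracker`.

```python
import hashlib
import json

def checksum_of(content):
    """Hypothetical helper: SHA-256 hex digest of JSON-serializable content."""
    # sort_keys and fixed separators keep the serialization deterministic
    payload = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

readings = [1.2, 1.5, 0.9, 1.3, 2.1, 0.8]
original = checksum_of(readings)

# Changing a single reading produces a different digest
tampered = checksum_of([1.2, 1.5, 0.9, 1.3, 2.1, 0.81])

print(f"Original: {original[:16]}...")
print(f"Tampered: {tampered[:16]}...")
print(f"Digests match: {original == tampered}")
```

Re-hashing an artifact later and comparing against the stored digest is one way to confirm the raw data has not been altered since registration.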

## 2. Applying Transformations

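We define two functions: one to scale the data by a constant multiplier and another to filter outliers by keeping only values strictly below a threshold. Each is applied through the tracker so that the operation name, inputs, and parameters are logged.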

In [None]:
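def scale_data(data, multiplier):
    """Multiply every reading by a constant factor."""
    return [x * multiplier for x in data]

def filter_outliers(data, threshold):
    """Keep only readings strictly below the threshold."""
    return [x for x in data if x < threshold]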

# Apply scaling (recorded in the log under the op name "normalization")
scaled_id = tracker.apply_transform(
    op_name="normalization",
    input_ids=[raw_id],
    transform_fn=scale_data,
    params={"multiplier": 10}
)

# Apply filtering: values at or above the threshold are dropped
final_id = tracker.apply_transform(
    op_name="outlier_removal",
    input_ids=[scaled_id],
    transform_fn=filter_outliers,
    params={"threshold": 15}
)

print(f"Final Dataset ID: {final_id}")
print(f"Final Results: {tracker.artifacts[final_id].content}")
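
To check the logged result by hand, the same two functions can be run directly on the raw readings, outside the tracker. This is only a sketch of the arithmetic. It assumes the tracker forwards `params` to the transform as keyword arguments, which this lab does not show.

```python
def scale_data(data, multiplier):
    return [x * multiplier for x in data]

def filter_outliers(data, threshold):
    return [x for x in data if x < threshold]

raw = [1.2, 1.5, 0.9, 1.3, 2.1, 0.8]

# Step 1: scale every reading by 10
scaled = scale_data(raw, multiplier=10)

# Step 2: keep values strictly below 15
kept = filter_outliers(scaled, threshold=15)

# Values removed by the filter, computed independently for comparison
dropped = [x for x in scaled if not x < 15]

print(f"Scaled:  {scaled}")
print(f"Kept:    {kept}")
print(f"Dropped: {dropped}")
```

Because the comparison is a strict `<`, the reading 1.5 (scaled to 15.0) is removed along with 2.1 (scaled to 21.0), leaving four values in the final dataset.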

## 3. Auditing the Lineage

Now we reconstruct the history of the final dataset: the raw artifact it came from and every transformation applied along the way.

In [None]:
graph = tracker.get_lineage_graph(final_id)

print("Lineage Trail (Chronological):")
for i, step in enumerate(graph):
    if step['type'] == 'raw':
        print(f"{i}. [RAW DATA] ID: {step['id']}")
    else:
        print(f"{i}. [TRANSFORM] Op: {step['op']} with {step['params']}")
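
How `get_lineage_graph` builds its result is not shown here. One plausible approach, sketched below under assumed data structures, is to walk backward from the final artifact through its recorded inputs and emit each ancestor before its descendants. The `records` dictionary, its field names, the artifact IDs, and the `lineage` function are all hypothetical illustrations, not the tracker's actual storage format.

```python
def lineage(records, artifact_id, seen=None):
    """Return the ancestors of artifact_id, oldest first, then the artifact itself."""
    if seen is None:
        seen = set()
    if artifact_id in seen:
        return []  # already emitted via another path
    seen.add(artifact_id)

    trail = []
    # Visit parents first so the trail reads from raw data to final result
    for parent_id in records[artifact_id]["inputs"]:
        trail.extend(lineage(records, parent_id, seen))
    trail.append({"id": artifact_id, **records[artifact_id]})
    return trail

# Hypothetical records mirroring the two transforms in this lab
records = {
    "raw-1": {"type": "raw", "op": None, "params": {}, "inputs": []},
    "art-2": {"type": "transform", "op": "normalization",
              "params": {"multiplier": 10}, "inputs": ["raw-1"]},
    "art-3": {"type": "transform", "op": "outlier_removal",
              "params": {"threshold": 15}, "inputs": ["art-2"]},
}

trail = lineage(records, "art-3")
for step in trail:
    label = "RAW DATA" if step["type"] == "raw" else f"TRANSFORM {step['op']}"
    print(f"{step['id']}: {label}")
```

The `seen` set keeps an artifact from appearing twice when a transform has several inputs that share a common ancestor.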

--- 
## 🔬 Scientific Block Complete!

You have successfully navigated **Block 1 of Phase 4**. You now have tools to verify claims, audit protocols, validate designs, summarize literature safely, and track data lineage.

Next, we move to **Healthcare AI Safety (Days 81-85)**.