# Generate Autonomous Curation Report

## Objective
This notebook demonstrates an end-to-end workflow for generating a human-readable curation report for a scientific dataset. It chains multiple AI agents to turn a poorly documented file into a fully described asset that follows the FAIR principles (Findable, Accessible, Interoperable, Reusable).

## The Workflow
1.  **Create a realistic test case**: A NetCDF file with minimal metadata and scattered documentation.
2.  **Initialize the Multi-Agent System**: Load the `QualityAssessmentAgent`, `DiscoveryAgent`, and `MetadataEnrichmentAgent`.
3.  **Execute the Autonomous Pipeline**:
    a.  The **Quality Agent** validates the file's integrity.
    b.  The **Discovery Agent** finds and analyzes companion documents (READMEs, scripts, citations).
    c.  The **Enrichment Agent** decodes variables, infers the scientific domain, and adds context.
4.  **Generate the Curation Report**: Collate all the information gathered by the agents into a single, detailed markdown report.

In [1]:
# Setup: Install dependencies and add library to path
%pip install -q netCDF4 h5py requests
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent / 'lib'))

In [2]:
# Import all necessary components
from ollama_client import OllamaClient
from quality_agent import QualityAssessmentAgent
from discovery_agent import DiscoveryAgent
from enrichment_agent import MetadataEnrichmentAgent
from create_demo_dataset import create_mystery_climate_dataset
from llm_enricher import DataInspector
from companion_extractor import CompanionDocExtractor

## Step 1: Create the Test Dataset
We start by creating the `mystery_climate_data.nc` dataset, which is designed to mimic real-world HPC output with poor metadata: cryptic variable names and documentation scattered across companion files.

In [3]:
mystery_file = create_mystery_climate_dataset()

Creating mystery dataset: mystery_climate_data.nc
  (Intentionally minimal metadata for demo)
  ✓ Created NetCDF file: 64173.1 KB
  ✓ Variables: t2m, sst, pr, wspd (cryptic names!)
  ✓ Dimensions: time=365, lat=90, lon=180

  Creating companion documentation...
    ✓ README_climate_2023.md
    ✓ process_cmip6_ensemble.py
    ✓ CITATION.bib
    ✓ METADATA.txt

  ✓ Companion documentation created
    (README, script, citation, metadata)
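
The companion files listed above sit alongside the data file. How `DiscoveryAgent` locates them is not shown in this notebook; as a rough sketch, assuming a plain same-directory scan and a hypothetical extension list, candidate discovery could look something like this:

```python
from pathlib import Path

# Hypothetical extension list, chosen to match the companion files created above.
COMPANION_SUFFIXES = {".md", ".txt", ".py", ".bib"}


def find_candidate_companions(data_file):
    """Return sibling files of data_file that look like documentation, sorted by path."""
    data_path = Path(data_file)
    return sorted(
        p
        for p in data_path.parent.iterdir()
        if p.is_file() and p != data_path and p.suffix.lower() in COMPANION_SUFFIXES
    )
```

Calling `find_candidate_companions(mystery_file)` would only produce candidates; deciding which of them are actually relevant to the dataset is the part the agent delegates to the LLM.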


## Step 2: Initialize the Multi-Agent System

In [4]:
try:
    ollama = OllamaClient()
    quality_agent = QualityAssessmentAgent(ollama)
    discovery_agent = DiscoveryAgent(ollama)
    enrichment_agent = MetadataEnrichmentAgent(ollama)
    print("✓ AI Agents initialized and ready.")
except Exception as e:
    print(f"✗ Failed to initialize agents: {e}")
    print("  Please ensure Ollama is running ('ollama serve')")
    raise  # Stop here so later cells do not fail with a confusing NameError

✓ Connected to Ollama at http://localhost:11434
  Available models: llama3.2:3b
  [QualityAgent] Registered tool: check_signature
  [QualityAgent] Registered tool: get_file_info
  [QualityAgent] Registered tool: inspect_content
  [EnrichmentAgent] Registered tool: get_structure
  [EnrichmentAgent] Registered tool: analyze_variable
  [EnrichmentAgent] Registered tool: domain_knowledge_lookup
✓ AI Agents initialized and ready.
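
The log shows a `check_signature` tool registered on the quality agent, but its implementation is not part of this notebook. As an illustration only, a signature check for NetCDF files can be sketched by comparing the first bytes of the file against the known magic numbers. The function names below are hypothetical:

```python
# Leading bytes ("magic numbers") for the two on-disk families of NetCDF files.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # NetCDF-4 files are stored as HDF5 containers
CLASSIC_MAGICS = (b"CDF\x01", b"CDF\x02", b"CDF\x05")  # classic, 64-bit offset, CDF-5


def detect_netcdf_flavor(header):
    """Classify a file from its first bytes; return None if it is not NetCDF-like.

    Simplification: only offset 0 is checked, and any HDF5 file is reported as
    NetCDF-4 even though not every HDF5 file is a NetCDF-4 file.
    """
    if header.startswith(HDF5_MAGIC):
        return "netcdf4/hdf5"
    if header.startswith(CLASSIC_MAGICS):
        return "netcdf-classic"
    return None


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return detect_netcdf_flavor(fh.read(8))
```

A check like this catches files whose extension does not match their content (for example a renamed text file), which is the kind of mismatch the `expected_type` and `detected_type` fields in the tool output appear to report.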


## Step 3: Run the Autonomous Curation Pipeline

In [None]:
print("--- Running Curation Pipeline ---\n")

print("1. Quality Assessment...")
quality_result = quality_agent.assess_file(str(mystery_file))
print(f"✓ Quality Assessment Passed (Confidence: {quality_result.confidence:.2f})\n")

print("2. Companion Discovery...")
discovery_result = discovery_agent.discover_companions(str(mystery_file))
print(f"✓ Companion Discovery Complete (Found {len(discovery_result['relevant_companions'])} relevant documents)\n")

print("3. Metadata Enrichment...")
enrichment_result = enrichment_agent.enrich_file(str(mystery_file))
print(f"✓ Metadata Enrichment Complete (Confidence: {enrichment_result['confidence']:.2f})\n")

print("--- Pipeline Complete ---")

--- Running Curation Pipeline ---

1. Quality Assessment...

[QualityAgent] Starting analysis...

[QualityAgent] Step 1: Thinking...
[QualityAgent] Using tool: get_file_info
  Parameters: {'filepath': 'generated/sample_data/mystery_climate_data.nc'}
  Result: {'filename': 'mystery_climate_data.nc', 'extension': '.nc', 'size_bytes': 65713293, 'size_mb': 62.67}

[QualityAgent] Step 2: Thinking...
[QualityAgent] Using tool: check_signature
  Parameters: {'filepath': 'generated/sample_data/mystery_climate_data.nc'}
  Result: {'expected_type': 'netcdf', 'detected_type': 'netcdf', 'is_valid': True, 'issues': [], 'size': '62.67 MB'}

[QualityAgent] Step 3: Thinking...

[QualityAgent] Decision reached!
  Decision: MANUAL_REVIEW
  Confidence: 0.80
✓ Quality Assessment Passed (Confidence: 0.80)

2. Companion Discovery...

[SimpleDiscoveryAgent] Analyzing: mystery_climate_data.nc

Step 1: Finding candidate documents...
Found 7 candidate documents:
  - CITATION.bib
  - analyze_temperature.py
  - p
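
Note that the quality agent's log reports `Decision: MANUAL_REVIEW`, while the pipeline cell prints "Quality Assessment Passed" regardless of the outcome. A safer summary would echo the decision itself. The sketch below assumes the result object exposes a `decision` attribute (inferred from the log line, not confirmed by the source) and uses a hypothetical stand-in class so it can run on its own:

```python
from dataclasses import dataclass


@dataclass
class QualityResult:
    """Hypothetical stand-in for the object returned by assess_file()."""

    decision: str  # assumed attribute name, inferred from the "Decision:" log line
    confidence: float


def summarize_quality(result):
    """Report the agent's decision verbatim instead of assuming a pass."""
    decision = getattr(result, "decision", "UNKNOWN")
    return f"Quality assessment: {decision} (confidence {result.confidence:.2f})"


def needs_manual_review(result):
    """True when the agent explicitly asked for a human to look at the file."""
    return getattr(result, "decision", None) == "MANUAL_REVIEW"
```

With the values from the log above, `summarize_quality(QualityResult("MANUAL_REVIEW", 0.80))` yields `Quality assessment: MANUAL_REVIEW (confidence 0.80)`, which makes it visible in the output that the file was flagged rather than approved.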

## Step 4: Generate the Curation Report

Finally, `LLMReportGenerator` collates the quality, discovery, and enrichment results into a single markdown report, which is rendered inline.

In [None]:
from IPython.display import display, Markdown
from report_generator import LLMReportGenerator

# Initialize the report generator with the ollama client
llm_report_generator = LLMReportGenerator(ollama)

# Generate the report
report_md = llm_report_generator.generate_report(
    mystery_file,
    quality_result,
    discovery_result,
    enrichment_result,
)

# Display the report
display(Markdown(report_md))
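
Since `report_md` is passed to `Markdown(...)`, it is presumably a plain string, so it can also be written to disk for sharing outside the notebook. A minimal sketch, with a hypothetical helper name and output naming scheme:

```python
from pathlib import Path


def save_report(report_md, data_file, out_dir="."):
    """Write the markdown report to <out_dir>/<data file stem>_curation_report.md."""
    out_path = Path(out_dir) / f"{Path(data_file).stem}_curation_report.md"
    out_path.write_text(report_md, encoding="utf-8")
    return out_path
```

In this notebook that would be called as `save_report(report_md, mystery_file)`, producing `mystery_climate_data_curation_report.md` in the current directory.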