# Graph Augmentation Agent - Interactive Exploration

This notebook walks through the Graph Augmentation Agent step by step,
running each analysis type in a separate cell so you can see exactly
what happens at each stage.

## What This Agent Does

The agent analyzes unstructured documents and suggests graph augmentations for Neo4j:

| Analysis | Description | Output |
|----------|-------------|--------|
| **Investment Themes** | Identifies emerging investment trends | Themes with market data |
| **New Entities** | Suggests new node types for the graph | Node definitions |
| **Missing Attributes** | Finds attributes not captured in schema | Property suggestions |
| **Implied Relationships** | Discovers hidden connections | Relationship types |

## Key Features

- **Native Structured Output** - Uses `ChatDatabricks.with_structured_output()` for validated Pydantic models
- **LangGraph Workflow** - StateGraph orchestration with memory persistence
- **Modular Architecture** - Clean separation of concerns in `core/` module

---

## 1. Environment Setup

First, we configure the environment and check that the Databricks credentials are present.

In [None]:
# Setup environment and configure Databricks authentication
from lab_6_augmentation_agent.core import (
    setup_environment,
    get_model_info,
    AnalysisType,
    ANALYSIS_CONFIGS,
)

# Load .env and configure auth (clears conflicting auth methods)
env_vars = setup_environment()

print("Environment configured:")
# Use .get() so a missing variable falls through to "Not set" instead of raising KeyError
host = env_vars.get('DATABRICKS_HOST')
print(f"  DATABRICKS_HOST: {host[:30]}..." if host else "  DATABRICKS_HOST: Not set")
print(f"  DATABRICKS_TOKEN: {'*' * 10}..." if env_vars.get('DATABRICKS_TOKEN') else "  DATABRICKS_TOKEN: Not set")

In [None]:
# Display model configuration
model_info = get_model_info()

print("Model Configuration:")
print(f"  Model:  {model_info['model']}")
print(f"  Method: {model_info['method']}")
print(f"  Docs:   {model_info['docs']}")

---

## 2. Available Analysis Types

Let's examine the four analysis types and their configurations.
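
For orientation, a registry of this kind could be laid out roughly as follows. This is only a sketch, not the actual contents of `core/`: the `Demo*` names, the enum string values, and the query strings are assumptions made for illustration. Only the attribute names (`display_name`, `schema`, `query`) are taken from the cell in this section.

```python
from dataclasses import dataclass
from enum import Enum


class DemoAnalysisType(Enum):
    # Member names match the notebook; the string values are assumed.
    INVESTMENT_THEMES = "investment_themes"
    NEW_ENTITIES = "new_entities"
    MISSING_ATTRIBUTES = "missing_attributes"
    IMPLIED_RELATIONSHIPS = "implied_relationships"


class DemoSchema:
    """Placeholder for the Pydantic model an analysis would return."""


@dataclass(frozen=True)
class DemoAnalysisConfig:
    display_name: str
    schema: type
    query: str


# Hypothetical registry: exactly one config per analysis type.
DEMO_CONFIGS = {
    t: DemoAnalysisConfig(
        display_name=t.name.replace("_", " ").title(),
        schema=DemoSchema,
        query=f"Analyze the documents for {t.value.replace('_', ' ')}.",
    )
    for t in DemoAnalysisType
}

for t in DemoAnalysisType:
    cfg = DEMO_CONFIGS[t]
    print(f"{cfg.display_name}: schema={cfg.schema.__name__}")
```

Keying the registry by enum member means a missing configuration surfaces as a `KeyError` at lookup time rather than as a silently skipped analysis.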

In [None]:
# Show all available analysis types
print("Available Analysis Types:")
print("=" * 60)

for analysis_type in AnalysisType:
    config = ANALYSIS_CONFIGS[analysis_type]
    print(f"\n{config.display_name}")
    print(f"  Type: {analysis_type.value}")
    print(f"  Schema: {config.schema.__name__}")
    print(f"  Query: {config.query[:60]}...")

---

## 3. Run Individual Analyses

Now we'll run each analysis type in a separate cell. This allows you to:
- See the timing for each analysis
- Examine results individually
- Re-run specific analyses if needed

We'll use the `run_single_analysis()` utility function which provides
a simplified interface for running one analysis at a time.
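
To make the timing and status fields concrete, here is a minimal sketch of what a wrapper of this kind might do. It is not the implementation in `core/`: `DemoResult`, `timed_analysis`, `fake_analysis`, and the `error` field are hypothetical. The `success`, `duration_seconds`, and `structured_data` names mirror the attributes read later in this notebook.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DemoResult:
    # Field names mirror the attributes read later in this notebook.
    success: bool
    duration_seconds: float
    structured_data: Optional[Any] = None
    error: Optional[str] = None  # assumed field, not confirmed by the source


def timed_analysis(analyze: Callable[[], Any]) -> DemoResult:
    """Run one analysis callable, recording wall-clock time and failures."""
    start = time.perf_counter()
    try:
        data = analyze()
        return DemoResult(True, time.perf_counter() - start, data)
    except Exception as exc:  # keep the notebook usable if one analysis fails
        return DemoResult(False, time.perf_counter() - start, error=str(exc))


def fake_analysis() -> dict:
    # Stand-in for a model call; the payload is made up.
    return {"suggestions": [{"name": "ExampleTheme"}]}


demo = timed_analysis(fake_analysis)
print(demo.success, f"{demo.duration_seconds:.3f}s")
```

Returning a result object instead of raising lets a failed analysis still appear in a summary with its duration and a `FAILED` status.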

In [None]:
# Import the analysis utilities
from lab_6_augmentation_agent.core import (
    run_single_analysis,
    format_analysis_result,
    display_suggestions,
    get_high_confidence_items,
)

# Store results for later comparison
results = {}

### 3.1 Investment Themes Analysis

Identifies emerging investment trends from market research documents.

In [None]:
# Run Investment Themes analysis
print("=" * 60)
results['themes'] = run_single_analysis(AnalysisType.INVESTMENT_THEMES)
print("=" * 60)

In [None]:
# Display Investment Themes results in detail
display_suggestions(results['themes'], show_evidence=True)

### 3.2 New Entities Analysis

Suggests new node types that should be added to the Neo4j graph.

In [None]:
# Run New Entities analysis
print("=" * 60)
results['entities'] = run_single_analysis(AnalysisType.NEW_ENTITIES)
print("=" * 60)

In [None]:
# Display New Entities results with examples
display_suggestions(results['entities'], show_evidence=True, show_examples=True)

### 3.3 Missing Attributes Analysis

Finds attributes mentioned in profiles but missing from the database schema.

In [None]:
# Run Missing Attributes analysis
print("=" * 60)
results['attributes'] = run_single_analysis(AnalysisType.MISSING_ATTRIBUTES)
print("=" * 60)

In [None]:
# Display Missing Attributes results
display_suggestions(results['attributes'], show_evidence=True, show_examples=True)

### 3.4 Implied Relationships Analysis

Discovers relationships that are implied but not explicitly captured in the graph.

In [None]:
# Run Implied Relationships analysis
print("=" * 60)
results['relationships'] = run_single_analysis(AnalysisType.IMPLIED_RELATIONSHIPS)
print("=" * 60)

In [None]:
# Display Implied Relationships results
display_suggestions(results['relationships'], show_evidence=True, show_examples=True)

---

## 4. Results Summary

Let's summarize all results and identify high-confidence suggestions.
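
The notebook does not show how `get_high_confidence_items()` decides what counts as high confidence. As a rough illustration only, the sketch below assumes each suggestion is a dict carrying a `confidence` label of `"low"`, `"medium"`, or `"high"`. The function name, the label set, and the sample data are all hypothetical.

```python
from typing import Any, Dict, List

# Assumed ranking of confidence labels; the real schema may differ.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def filter_by_confidence(
    suggestions: List[Dict[str, Any]], minimum: str = "high"
) -> List[Dict[str, Any]]:
    """Keep suggestions whose confidence label is at least `minimum`."""
    threshold = CONFIDENCE_RANK[minimum]
    return [
        s
        for s in suggestions
        # Unknown or missing labels rank below "low" and are dropped.
        if CONFIDENCE_RANK.get(str(s.get("confidence", "")).lower(), -1) >= threshold
    ]


sample = [
    {"name": "ThemeA", "confidence": "high"},
    {"name": "ThemeB", "confidence": "medium"},
    {"name": "ThemeC"},  # no confidence recorded
]
print([s["name"] for s in filter_by_confidence(sample)])
print([s["name"] for s in filter_by_confidence(sample, minimum="medium")])
```

Dropping suggestions with a missing label is a conservative choice: an item is only counted as high confidence when the model said so explicitly.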

In [None]:
# Summary statistics
print("Results Summary")
print("=" * 60)

total_duration = 0
total_high_conf = 0

for name, result in results.items():
    high_conf = get_high_confidence_items(result)
    total_high_conf += len(high_conf)
    total_duration += result.duration_seconds
    
    status = "SUCCESS" if result.success else "FAILED"
    print(f"\n{name.upper()}:")
    print(f"  Status: {status}")
    print(f"  Duration: {result.duration_seconds:.1f}s")
    print(f"  High confidence items: {len(high_conf)}")

print(f"\n{'=' * 60}")
print(f"Total duration: {total_duration:.1f}s")
print(f"Total high-confidence items: {total_high_conf}")

In [None]:
# Show all high-confidence suggestions
print("High-Confidence Suggestions")
print("=" * 60)

for name, result in results.items():
    high_conf = get_high_confidence_items(result)
    if high_conf:
        print(f"\n{name.upper()}:")
        for item in high_conf:
            # Fall back to 'Unknown' when no naming key has a non-empty value
            item_name = (
                item.get('name')
                or item.get('label')
                or item.get('property_name')
                or item.get('relationship_type')
                or 'Unknown'
            )
            print(f"  - {item_name}")

---

## 5. Using the Full Agent API

For production use, the `GraphAugmentationAgent` class provides
LangGraph workflow orchestration with memory persistence.

In [None]:
from lab_6_augmentation_agent.core import GraphAugmentationAgent

# Create agent with memory persistence
agent = GraphAugmentationAgent()
print("Agent created with LangGraph workflow")
print("Memory persistence enabled via MemorySaver")

In [None]:
# Run a single analysis through the agent
# This uses the full LangGraph workflow
result = agent.run_single_analysis(
    AnalysisType.NEW_ENTITIES,
    thread_id="notebook-demo"
)

print("Analysis complete")
print(f"Completed analyses: {result.get('completed_analyses', [])}")

In [None]:
# Access structured response
response = agent.get_structured_response("notebook-demo")
if response:
    print(f"Total suggestions: {response.total_suggestions}")
    print(f"High confidence: {response.high_confidence_count}")

# Get specific suggestion types
nodes = agent.get_suggested_nodes("notebook-demo")
print(f"\nSuggested nodes: {len(nodes)}")
for node in nodes:
    print(f"  - :{node.label}")

---

## 6. Export Results

Export results to JSON for further processing or Neo4j import.
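
The export cell in this section passes `default=str` to `json.dumps`. The standalone sketch below shows what that argument does, using a hypothetical `Kind` enum and a made-up payload as stand-ins for values that JSON cannot represent natively.

```python
import json
from datetime import datetime, timezone
from enum import Enum


class Kind(Enum):  # hypothetical enum, standing in for non-JSON-native values
    NODE = "node"


payload = {
    "kind": Kind.NODE,
    "created": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "count": 3,
}

# Without `default`, json.dumps raises TypeError on the Enum and datetime.
try:
    json.dumps(payload)
except TypeError as exc:
    print(f"Plain dumps failed: {exc}")

# With default=str, unsupported values are converted via str().
text = json.dumps(payload, default=str)
print(text)

# The conversion is one-way: loading gives back plain strings.
restored = json.loads(text)
print(type(restored["created"]).__name__)
```

Because the conversion is lossy, anything that must be read back as a typed value (for example a timestamp) would need explicit serialization rather than the `str()` fallback.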

In [None]:
import json

# Export individual analysis results
export_data = {
    'model': model_info,
    'analyses': {}
}

for name, result in results.items():
    export_data['analyses'][name] = {
        'success': result.success,
        'duration_seconds': result.duration_seconds,
        'data': result.structured_data,
        'high_confidence_count': len(get_high_confidence_items(result))
    }

# Preview the export
preview = json.dumps(export_data, indent=2, default=str)
print(preview[:2000])
if len(preview) > 2000:
    print("\n... (truncated)")

In [None]:
# Save to file (uncomment to save)
# with open('notebook_results.json', 'w') as f:
#     json.dump(export_data, f, indent=2, default=str)
# print("Results saved to notebook_results.json")

---

## Next Steps

After identifying augmentation opportunities:

1. **Review suggestions** - Examine the high-confidence items
2. **Update Neo4j schema** - Add new node labels and relationship types
3. **Extract new entities** - Parse documents to create new nodes
4. **Write back to Neo4j** - Use the structured output for graph updates
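
As one possible shape for step 4, the sketch below turns a reviewed suggestion into a parameterized Cypher `MERGE` statement. It is an illustration under stated assumptions, not part of the agent: `build_merge`, the choice of a single key property, and the sample values are hypothetical. Cypher does not allow labels or property keys to be supplied as parameters, so the sketch validates them before interpolating.

```python
import re
from typing import Any, Dict, Tuple

# Conservative pattern for labels and property keys that are safe to interpolate.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_merge(label: str, key: str, props: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return a parameterized Cypher MERGE statement and its parameters."""
    for ident in (label, key):
        if not _IDENTIFIER.match(ident):
            raise ValueError(f"Unsafe identifier: {ident!r}")
    if key not in props:
        raise ValueError(f"Key property {key!r} missing from props")
    # Property values travel as parameters; only validated names are interpolated.
    query = f"MERGE (n:{label} {{{key}: $key_value}}) SET n += $props"
    return query, {"key_value": props[key], "props": props}


query, params = build_merge(
    "Theme", "name", {"name": "ExampleTheme", "source": "doc-1"}
)
print(query)
print(params)
```

The resulting query and parameter dict could then be handed to a Neo4j driver session. Using `MERGE` on a key property keeps repeated runs from creating duplicate nodes.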

### Documentation

- [ChatDatabricks API](https://api-docs.databricks.com/python/databricks-ai-bridge/latest/databricks_langchain.html)
- [Databricks Structured Outputs](https://docs.databricks.com/aws/en/machine-learning/model-serving/structured-outputs)
- [LangGraph StateGraph](https://langchain-ai.github.io/langgraph/concepts/low_level/)
- [LangGraph Checkpointing](https://langchain-ai.github.io/langgraph/concepts/persistence/)