# AI Scholarship Synthesis

**Tertiary Layer: Multi-Platform Research Synthesis**

This notebook integrates three layers:
- **Primary Layer**: Text atomization outputs (n-grams, entropy, glyphs)
- **Secondary Layer**: AI-generated scholarship (Perplexity, Claude, GPT)
- **Tertiary Layer**: Jupyter synthesis and visualization

## Recursive Research Architecture

```
Literary Text → Atomization → JSON/Markdown
                    ↓
               AI Research (Perplexity → Claude → GPT)
                    ↓
               Jupyter Synthesis ← Pattern Analysis
                    ↓
               Iterative Refinement
```

In [None]:
import sys
from pathlib import Path
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Add src to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))

from atomization import TextAtomizer
from scholarship import ScholarshipIngester, ScholarshipSynthesizer, MultiPlatformOrchestrator

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 1. Load Atomization Data (Primary Layer)

In [None]:
# Load consolidated atomization archive
atomization_archive = project_root / 'data' / 'processed' / 'atomization' / 'all_outputs.json'

if atomization_archive.exists():
    with open(atomization_archive, 'r', encoding='utf-8') as f:
        atomization_data = json.load(f)
    
    print(f"Loaded {len(atomization_data)} atomization runs")
    
    # Convert to DataFrame for analysis
    df_atomization = pd.DataFrame([
        {
            'date': item['date'],
            'title': item['metadata']['title'],
            'word_count': item['metadata']['word_count'],
            'shannon_entropy': item['entropy']['shannon_entropy'],
            'lexical_diversity': item['entropy']['lexical_diversity'],
            'compression_ratio': item['entropy']['compression_ratio']
        }
        for item in atomization_data
    ])
    
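    # Parse dates and sort chronologically so the time-series plots read left to right
    # (assumes the 'date' field is in a format pd.to_datetime can parse)
    df_atomization['date'] = pd.to_datetime(df_atomization['date'])
    df_atomization = df_atomization.sort_values('date').reset_index(drop=True)

    display(df_atomization.head())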
else:
    print("No atomization data found. Run 01_text_atomization_workflow.ipynb first.")
    df_atomization = None
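
The three entropy fields loaded above can be approximated with the standard library alone. The following is a minimal sketch under stated assumptions, not the `TextAtomizer` implementation: it assumes whitespace tokenization, entropy measured in bits per token, lexical diversity as a type-token ratio, and compression ratio as zlib-compressed bytes over raw UTF-8 bytes. The function names and the sample sentence are illustrative.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (bits per token) of the token frequency distribution."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_diversity(tokens):
    """Type-token ratio: unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def compression_ratio(text):
    """Compressed size divided by raw size (UTF-8 bytes, zlib defaults)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw) if raw else 0.0


# Illustrative sample; whitespace tokenization is an assumption
sample = "sing to me of the man muse the man of twists and turns"
tokens = sample.lower().split()

metrics = {
    "word_count": len(tokens),
    "shannon_entropy": shannon_entropy(tokens),
    "lexical_diversity": lexical_diversity(tokens),
    "compression_ratio": compression_ratio(sample),
}
print(metrics)
```

Very short texts can produce a compression ratio above 1.0 because of zlib's fixed overhead, so the ratio is only informative on longer excerpts.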

## 2. Load AI Scholarship (Secondary Layer)

In [None]:
scholarship_dir = project_root / 'output' / 'scholarship'

# Initialize synthesizer
synthesizer = ScholarshipSynthesizer(scholarship_dir)

print(f"Loaded {len(synthesizer.research_items)} research items")

if synthesizer.research_items:
    # Convert to DataFrame
    df_scholarship = pd.DataFrame([
        {
            'source': item.get('source'),
            'title': item.get('title'),
            'timestamp': (item.get('timestamp') or '')[:10],  # tolerate missing or None timestamps
            'tags': ', '.join(item.get('tags', [])),
            'citation_count': len(item.get('citations', []))
        }
        for item in synthesizer.research_items
    ])
    
    display(df_scholarship.head())
else:
    print("\nNo AI scholarship found yet.")
    print("\nTo add scholarship, run the example in Section 7 below.")
    df_scholarship = None

## 3. Cross-Layer Pattern Analysis

In [None]:
if df_atomization is not None and len(df_atomization) > 0:
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Entropy trends
    axes[0, 0].plot(df_atomization['date'], df_atomization['shannon_entropy'], marker='o', color='steelblue')
    axes[0, 0].set_title('Shannon Entropy Over Time')
    axes[0, 0].set_xlabel('Date')
    axes[0, 0].set_ylabel('Entropy')
    axes[0, 0].tick_params(axis='x', rotation=45)
    axes[0, 0].grid(alpha=0.3)
    
    # Lexical diversity
    axes[0, 1].plot(df_atomization['date'], df_atomization['lexical_diversity'], marker='s', color='coral')
    axes[0, 1].set_title('Lexical Diversity Over Time')
    axes[0, 1].set_xlabel('Date')
    axes[0, 1].set_ylabel('Diversity Ratio')
    axes[0, 1].tick_params(axis='x', rotation=45)
    axes[0, 1].grid(alpha=0.3)
    
    # Compression ratio
    axes[1, 0].plot(df_atomization['date'], df_atomization['compression_ratio'], marker='^', color='seagreen')
    axes[1, 0].set_title('Compression Ratio Over Time')
    axes[1, 0].set_xlabel('Date')
    axes[1, 0].set_ylabel('Ratio')
    axes[1, 0].tick_params(axis='x', rotation=45)
    axes[1, 0].grid(alpha=0.3)
    
    # Word count distribution
    axes[1, 1].hist(df_atomization['word_count'], bins=10, color='mediumpurple', edgecolor='black')
    axes[1, 1].set_title('Word Count Distribution')
    axes[1, 1].set_xlabel('Word Count')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("Insufficient atomization data for visualization")

## 4. Multi-Platform Research Comparison

In [None]:
if df_scholarship is not None and len(df_scholarship) > 0:
    # Research by platform
    source_counts = df_scholarship['source'].value_counts()
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Source distribution
    source_counts.plot(kind='bar', ax=axes[0], color='skyblue')
    axes[0].set_title('Research Items by AI Platform')
    axes[0].set_xlabel('Platform')
    axes[0].set_ylabel('Count')
    axes[0].grid(axis='y', alpha=0.3)
    
    # Citation distribution
    axes[1].hist(df_scholarship['citation_count'], bins=10, color='lightcoral', edgecolor='black')
    axes[1].set_title('Citations per Research Item')
    axes[1].set_xlabel('Number of Citations')
    axes[1].set_ylabel('Frequency')
    axes[1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Frequent citations
    print("\n" + "=" * 60)
    print("FREQUENTLY CITED SOURCES")
    print("=" * 60)
    frequent_cites = synthesizer.extract_citations(min_frequency=2)
    for cite in frequent_cites[:5]:
        print(f"  [{cite.get('frequency')}x] {cite.get('title', 'Untitled')}")
else:
    print("No scholarship data to visualize")
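
For readers without access to the `scholarship` module, the frequency filter used above can be sketched as follows. This is an assumption about what `extract_citations(min_frequency=2)` might do, not its actual implementation: it matches citations by case-insensitive title and counts each title at most once per research item. `frequent_citations` and the sample items are hypothetical.

```python
from collections import Counter


def frequent_citations(research_items, min_frequency=2):
    """Return citations whose title appears in at least min_frequency items."""
    counts = Counter()
    first_seen = {}
    for item in research_items:
        seen_in_item = set()
        for cite in item.get("citations", []):
            key = (cite.get("title") or "").strip().lower()
            if not key or key in seen_in_item:
                continue  # skip untitled citations and repeats within one item
            seen_in_item.add(key)
            counts[key] += 1
            first_seen.setdefault(key, cite)  # keep the first spelling encountered
    return [
        {**first_seen[key], "frequency": n}
        for key, n in counts.most_common()
        if n >= min_frequency
    ]


# Hypothetical research items for illustration
items = [
    {"title": "A", "citations": [
        {"title": "The Making of Homeric Verse", "author": "Milman Parry"},
    ]},
    {"title": "B", "citations": [
        {"title": "the making of homeric verse "},
        {"title": "Digital Humanities and Classical Studies"},
    ]},
]

frequent = frequent_citations(items, min_frequency=2)
for cite in frequent:
    print(f"  [{cite['frequency']}x] {cite['title']}")
```

Matching on normalized titles is a deliberate simplification; matching on URL or DOI would be more robust where those fields are present.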

## 5. Thematic Synthesis

In [None]:
if df_scholarship is not None and len(df_scholarship) > 0:
    # Analyze by theme
    theme_analysis = synthesizer.analyze_by_theme()
    
    print("RESEARCH THEMES")
    print("=" * 60)
    for theme, items in sorted(theme_analysis.items(), key=lambda x: len(x[1]), reverse=True):
        print(f"\n{theme.upper()} ({len(items)} items)")
        for item in items[:3]:  # Show top 3 per theme
            print(f"  - {item['title']} ({item['source'].title()})")
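
The theme grouping above can be pictured as an inverted index from tag to research items. The sketch below assumes that themes are simply the normalized `tags` of each item, which may differ from how `analyze_by_theme` derives them; `group_by_tag` and the sample items are hypothetical.

```python
from collections import defaultdict


def group_by_tag(research_items):
    """Map each lowercased tag to the list of items that carry it."""
    themes = defaultdict(list)
    for item in research_items:
        # A set avoids listing an item twice under the same normalized tag
        for tag in {t.strip().lower() for t in item.get("tags", [])}:
            themes[tag].append(item)
    return dict(themes)


# Hypothetical research items for illustration
sample_items = [
    {"title": "Formulaic Expression in Homer", "source": "perplexity",
     "tags": ["homer", "oral tradition"]},
    {"title": "Epithets as Mnemonic Devices", "source": "claude",
     "tags": ["Homer", "n-grams"]},
]

themes = group_by_tag(sample_items)
for theme, grouped in sorted(themes.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(f"{theme.upper()} ({len(grouped)} items)")
```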

## 6. Generate Synthesis Report

In [None]:
if df_scholarship is not None:
    report = synthesizer.generate_synthesis_report()
    print(report)
    
    # Save to output
    report_path = project_root / 'output' / 'synthesis' / 'synthesis_report.md'
    report_path.parent.mkdir(parents=True, exist_ok=True)
    with open(report_path, 'w', encoding='utf-8') as f:
        f.write(report)
    
    print(f"\nReport saved to: {report_path}")

## 7. Example: Add AI Scholarship (Secondary Layer)

This cell demonstrates how to ingest an AI research output, using a Perplexity response as the example.

In [None]:
from scholarship import AISource

# Initialize ingester
ingester = ScholarshipIngester(scholarship_dir)

# Example: Ingest Perplexity research
perplexity_research = """
## Homeric Formulae and Oral Tradition

Recent scholarship emphasizes the role of formulaic expressions in Homeric epic composition.
Milman Parry's foundational work established that epithets like "wine-dark sea" and 
"rosy-fingered dawn" functioned as mnemonic devices for oral poets.

Contemporary computational analysis confirms high repetition rates in the Odyssey,
with n-gram studies revealing recursive patterns at multiple scales.
"""

# Ingest
path = ingester.ingest_research(
    content=perplexity_research,
    source=AISource.PERPLEXITY,
    query="Homeric formulae and computational text analysis",
    title="Formulaic Expression in Homer",
    tags=['homer', 'oral tradition', 'computational analysis', 'n-grams'],
    citations=[
        {'title': 'The Making of Homeric Verse', 'author': 'Milman Parry', 'url': 'https://example.com/parry'},
        {'title': 'Digital Humanities and Classical Studies', 'url': 'https://example.com/dh-classics'}
    ]
)

print(f"Ingested research saved to: {path}")
print("\nRun this cell to add your own AI research outputs!")
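
Section 2 reads the fields `source`, `title`, `timestamp`, `tags`, and `citations` from each research item, so an ingested record plausibly looks like the dictionary built below. This is a sketch of the record shape only: `build_record`, the `query` and `content` keys, the lowercase string form of the source, and the ISO 8601 timestamp are all assumptions rather than the `ScholarshipIngester` format.

```python
import json
from datetime import datetime, timezone


def build_record(content, source, query, title, tags=None, citations=None):
    """Assemble one research record as a JSON-serializable dictionary."""
    return {
        "source": source,  # assumed to be stored as a plain string
        "query": query,
        "title": title,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tags": list(tags or []),
        "citations": list(citations or []),
        "content": content.strip(),
    }


record = build_record(
    content="Recent scholarship emphasizes formulaic expressions in Homeric epic.",
    source="perplexity",
    query="Homeric formulae and computational text analysis",
    title="Formulaic Expression in Homer",
    tags=["homer", "oral tradition"],
    citations=[{"title": "The Making of Homeric Verse", "author": "Milman Parry"}],
)

print(json.dumps(record, indent=2))
```

An ISO 8601 timestamp keeps the `[:10]` slice used in Section 2 meaningful, since its first ten characters are the `YYYY-MM-DD` date.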

## 8. Generate Multi-Platform Workflow Template

In [None]:
# Initialize orchestrator
orchestrator = MultiPlatformOrchestrator()

# Generate workflow for literary analysis
workflow_path = project_root / 'output' / 'scholarship' / 'workflow_literary_analysis.md'
workflow_path.parent.mkdir(parents=True, exist_ok=True)  # ensure the output directory exists

orchestrator.generate_workflow_file(
    workflow_name='literary_analysis',
    output_path=workflow_path,
    params={
        'topic': 'homecoming motifs',
        'work': 'the Odyssey',
        'comparison_works': 'Aeneid, Divine Comedy, Ulysses'
    }
)

print(f"Workflow template generated: {workflow_path}")
print("\nUse this template to structure multi-platform AI research workflows.")
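
The idea of a parameterized workflow template can be illustrated with `string.Template` from the standard library. The sketch below is not the `MultiPlatformOrchestrator` template: the template text, the role assigned to each platform, and `render_workflow` are assumptions made for illustration, and only the parameter names come from the cell above.

```python
from string import Template

# Hypothetical template; the division of labor between platforms is an assumption
WORKFLOW_TEMPLATE = Template("""\
# Workflow: $workflow_name

1. Perplexity: survey recent scholarship on $topic in $work.
2. Claude: compare $work with $comparison_works on $topic.
3. GPT: draft a synthesis of the findings from steps 1 and 2.
""")


def render_workflow(workflow_name, params):
    """Fill the template; raises KeyError if a placeholder has no value."""
    return WORKFLOW_TEMPLATE.substitute(workflow_name=workflow_name, **params)


params = {
    "topic": "homecoming motifs",
    "work": "the Odyssey",
    "comparison_works": "Aeneid, Divine Comedy, Ulysses",
}

workflow_text = render_workflow("literary_analysis", params)
print(workflow_text)
```

Using `substitute` rather than `safe_substitute` makes a missing parameter fail loudly instead of leaving a stray placeholder in the generated file.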

## Next Steps: Recursive Iteration

1. **Daily Atomization**: Continue processing literary excerpts → Primary Layer JSON/MD
2. **AI Research Cycles**: Use workflows to orchestrate Perplexity → Claude → GPT
3. **Pattern Mapping**: Track how atomization metrics correlate with scholarly themes
4. **Recursive Refinement**: Feed synthesis insights back into atomization parameters
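
Step 3 could start from a plain Pearson correlation between a daily atomization metric and the number of research items tagged with a given theme on the same day. The sketch below is one possible approach, not part of the project code: `pearson` and both dictionaries are hypothetical, and the numbers are placeholders rather than results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (spread_x * spread_y)


# Placeholder values keyed by date; not measured data
entropy_by_date = {"2024-01-01": 7.1, "2024-01-02": 7.4, "2024-01-03": 7.9}
theme_items_by_date = {"2024-01-01": 1, "2024-01-02": 2, "2024-01-03": 3, "2024-01-04": 5}

# Only dates present in both layers can be compared
shared_dates = sorted(entropy_by_date.keys() & theme_items_by_date.keys())
r = pearson(
    [entropy_by_date[d] for d in shared_dates],
    [theme_items_by_date[d] for d in shared_dates],
)
print(f"Pearson r over {len(shared_dates)} shared dates: {r:.3f}")
```

With only a handful of shared dates the coefficient is very noisy, so it is better read as a prompt for closer inspection than as evidence of a relationship.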

### Breadth-and-Depth Method
- **Breadth**: 61 days of identical *Odyssey* runs (baseline)
- **Depth**: Test pits with *Metamorphoses*, *Canterbury Tales*, *Finnegans Wake*
- **Remapping**: Synthesize cross-work patterns, adjust glyph fusion logic