# KATO Hierarchical Training - Single-Pass Learning

This notebook demonstrates single-pass hierarchical training with KATO nodes.

**Key Concepts:**
- node0 learns sentences (tokenized)
- node1 learns paragraphs (sequences of sentence pattern names)
- node2 learns chapters (sequences of paragraph pattern names)
- node3 learns books (sequences of chapter pattern names)

**Flow:** Text → Segment → node0 learns → pattern_name → node1 STM → node1 learns → pattern_name → node2 STM → etc.
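The flow above can be illustrated with a toy model. This sketch is not KATO's API: `ToyNode`, its hash-based pattern naming, and the rule "learn whenever `chunk_size` symbols have accumulated" are assumptions made only to show how a pattern name learned at one level becomes an input symbol for the next level.

```python
import hashlib


class ToyNode:
    """Toy stand-in for a node: buffers symbols and names each chunk by its hash."""

    def __init__(self, name, chunk_size):
        self.name = name
        self.chunk_size = chunk_size
        self.stm = []        # short-term memory (symbols seen since the last learn)
        self.patterns = {}   # pattern_name -> frequency

    def observe(self, symbol):
        """Buffer a symbol; return a pattern name once a chunk is full, else None."""
        self.stm.append(symbol)
        if len(self.stm) < self.chunk_size:
            return None
        digest = hashlib.sha1("|".join(self.stm).encode("utf-8")).hexdigest()
        pattern_name = f"PTRN|{digest}"  # hypothetical naming scheme
        self.patterns[pattern_name] = self.patterns.get(pattern_name, 0) + 1
        self.stm = []
        return pattern_name


def train_single_pass(tokens, nodes):
    """Feed tokens to node0; every learned pattern name flows up one level."""
    for token in tokens:
        symbol = token
        for node in nodes:
            symbol = node.observe(symbol)
            if symbol is None:  # this level has not completed a chunk yet
                break


nodes = [ToyNode(f"node{i}", chunk_size=2) for i in range(3)]
train_single_pass("a b c d a b c d".split(), nodes)
for node in nodes:
    print(node.name, sorted(node.patterns.values()))
```

Each level sees the data exactly once, and repeated chunks raise the frequency of an existing pattern rather than creating a new one.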

## 1. Setup and Imports

In [None]:
# Install required packages
!pip install -q datasets transformers requests numpy matplotlib tqdm pymongo

In [None]:
# Import the hierarchical training module from tools
from tools import (
    HierarchicalConceptLearner,
    HierarchicalNode,  # NEW: Per-node configuration
    CorpusSegmenter,
    MongoDBAnalyzer,
    StandaloneMongoDBAnalyzer,  # NEW: Session-independent analysis
    TrainingManifest,  # NEW: Training metadata management
    load_latest_manifest,  # NEW: Load saved training manifests
    list_all_training_runs,  # NEW: Discover all training runs
    create_training_run_nodes,  # NEW: Create unique run IDs
    delete_training_run,  # NEW: Cleanup old experiments
    HardwareAnalyzer,
    StreamingDatasetLoader,
    recommend_dataset_configuration,
    train_hierarchical_single_pass,
    train_from_streaming_dataset,
    train_from_streaming_dataset_parallel,  # NEW: Parallel training (Phase 3)
    cleanup_all_nodes,
    analyze_all_nodes,
    transfer_threshold,
    transfer_top_n,
    transfer_weighted,
    transfer_predictions,
)

import matplotlib.pyplot as plt
%matplotlib inline

print("✓ Imports complete")
print("✓ Session-independent analysis tools loaded")
print("✓ Training run comparison tools loaded")

## 2. Parallel Training

**NEW! Phase 3 Optimization**: Train with parallel workers for an additional 2-3x speedup on top of batching.

**Combined Speedup (Phases 1+2+3)**: 15-28x faster than baseline!

**How it works**:
- Multiple workers process samples concurrently
- Each worker gets its own isolated KATO session
- No lock contention or race conditions
- MongoDB handles concurrent writes safely

**Recommended**: Use 4-8 workers depending on your CPU cores.
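As a rough illustration of the worker model described above, the sketch below splits samples into disjoint shards and gives each worker its own state object. It is not the implementation of `train_from_streaming_dataset_parallel`: `process_shard`, `run_parallel`, and the dictionary used as a "session" are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def process_shard(worker_id, shard):
    """Process one shard using state that no other worker touches."""
    session = {"worker_id": worker_id, "samples_processed": 0}  # hypothetical per-worker session
    for _sample in shard:
        session["samples_processed"] += 1  # stand-in for "tokenize, observe, learn"
    return session


def run_parallel(samples, num_workers=4):
    """Split samples round-robin and process the shards concurrently."""
    shards = [samples[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(process_shard, range(num_workers), shards))


sessions = run_parallel([f"sample {i}" for i in range(10)], num_workers=4)
total = sum(s["samples_processed"] for s in sessions)
print(f"{total} samples processed by {len(sessions)} workers")
```

Because the shards are disjoint and each worker owns its session, the workers in this sketch share no mutable state and need no locks.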

In [None]:
# Combines batching + parallel workers for maximum speed

# Configure dataset
DATASET_KEY = 'wikitext'  # Options: 'c4', 'refinedweb', 'wikitext', 'openwebtext'
MAX_SAMPLES = 100  # Start with small number to test

# Create learner with optimized configuration (Phase 1 batching enabled)
# RECOMMENDED: Use chunk_size=8 for exponential semantic scaling
nodes = [
    HierarchicalNode('node0', chunk_size=8, mode='chunking', base_url='http://kato:8000'),   # 8 tokens (phrases)
    HierarchicalNode('node1', chunk_size=8, mode='chunking', base_url='http://kato:8000'),   # 64 tokens (sentences)
    HierarchicalNode('node2', chunk_size=8, mode='chunking', base_url='http://kato:8000'),   # 512 tokens (paragraphs)
    HierarchicalNode('node3', chunk_size=8, mode='chunking', base_url='http://kato:8000')    # 4,096 tokens (articles)
]

learner = HierarchicalConceptLearner(
    nodes=nodes,
    tokenizer_name='gpt2',
    node0_batch_size=50  # Phase 1: Batching for 4-7x speedup
)

print(f"✓ Created hierarchical learner with {learner.num_nodes} nodes")
print(f"  Chunk size: {learner.node_configs[0].chunk_size} (optimized for WikiText)")
print(f"  Node0 batch size: {learner.node0_batch_size} (batching ENABLED)")
print(f"\n  Semantic coverage:")
coverage = learner.node_configs[0].chunk_size
for i in range(learner.num_nodes):
    print(f"    node{i}: {coverage} tokens")
    coverage *= learner.node_configs[0].chunk_size
print(f"  Training with parallel workers...\n")

# Phase 3: Train with parallel workers for additional 2-3x speedup
stats = train_from_streaming_dataset_parallel(
    dataset_key=DATASET_KEY,
    max_samples=MAX_SAMPLES,
    learner=learner,
    num_levels=4,  # Match number of nodes
    num_workers=4,  # Recommended: 4-8 workers (adjust based on CPU cores)
    segment_method='simple',
    verbose=True
)

print("\n✓ Parallel training complete!")
print(f"\nPerformance Statistics:")
print(f"  Samples processed: {stats['samples_processed']:,}")
print(f"  Total time: {stats['total_time_seconds']:.2f}s")
print(f"  Rate: {stats['rate_samples_per_sec']:.2f} samples/sec")
print(f"  Workers: {stats.get('num_workers', 'N/A')}")

print(f"\nPattern Statistics:")
for key, value in stats.items():
    if 'patterns' in key.lower():
        print(f"  {key}: {value:,}")


## 2a. Session-Independent Analysis & Training Run Comparison 💾

**NEW! Your training data persists in MongoDB!**

After training completes, a **training manifest** is automatically saved to `manifests/` containing metadata about your training run. This enables:

### ✅ Session-Independent Analysis
Analyze your trained model **without active sessions** - even after kernel restarts!

```python
from tools import load_latest_manifest

# Load the most recent training
manifest = load_latest_manifest()
analyzers = manifest.get_analyzers(mongo_uri="mongodb://localhost:27017/")

# Analyze patterns (no active learner needed!)
for node_name, analyzer in analyzers.items():
    stats = analyzer.get_stats()
    print(f"{node_name}: {stats['total_patterns']:,} patterns")
```

**See [`analysis_only_template.ipynb`](analysis_only_template.ipynb) for complete session-independent analysis workflow!**
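To make the manifest idea concrete, here is a minimal sketch of saving one JSON file per run and loading the most recent one. The schema (`run_id`, `created_at`, `node_ids`) and the helper names `save_manifest` and `load_latest` are assumptions for illustration; the real `TrainingManifest` format may differ.

```python
import json
import tempfile
from pathlib import Path


def save_manifest(manifest_dir, run_id, created_at, node_ids):
    """Write one JSON manifest per training run (hypothetical schema)."""
    manifest_dir = Path(manifest_dir)
    manifest_dir.mkdir(parents=True, exist_ok=True)
    path = manifest_dir / f"{run_id}.json"
    record = {"run_id": run_id, "created_at": created_at, "node_ids": node_ids}
    path.write_text(json.dumps(record))
    return path


def load_latest(manifest_dir):
    """Return the manifest with the largest created_at, or None if none exist."""
    manifests = [json.loads(p.read_text()) for p in Path(manifest_dir).glob("*.json")]
    return max(manifests, key=lambda m: m["created_at"], default=None)


with tempfile.TemporaryDirectory() as tmp:
    save_manifest(tmp, "run_a", 1, ["node0_run_a", "node1_run_a"])
    save_manifest(tmp, "run_b", 2, ["node0_run_b", "node1_run_b"])
    latest = load_latest(tmp)

print(latest["run_id"], latest["node_ids"])
```

Because the manifest records only identifiers, a later session can reconnect to the stored patterns without the original learner object.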

### 🔬 Comparing Multiple Training Runs
By default, training with the same node IDs **overwrites** previous data. To preserve and compare multiple experiments:

```python
from tools import create_training_run_nodes, HierarchicalConceptLearner

# Experiment 1: Small dataset
nodes_100 = create_training_run_nodes(run_id='wikitext_100samples')
learner_100 = HierarchicalConceptLearner(nodes=nodes_100, tokenizer_name='gpt2')
# Train... creates: node0_wikitext_100samples_kato, node1_wikitext_100samples_kato, etc.

# Experiment 2: Larger dataset (separate databases!)
nodes_500 = create_training_run_nodes(run_id='wikitext_500samples')
learner_500 = HierarchicalConceptLearner(nodes=nodes_500, tokenizer_name='gpt2')
# Train... creates: node0_wikitext_500samples_kato, node1_wikitext_500samples_kato, etc.

# Compare both runs later
from tools import list_all_training_runs
runs = list_all_training_runs(mongo_uri="mongodb://localhost:27017/")
# Returns: {'wikitext_100samples': [...], 'wikitext_500samples': [...]}
```

**See [`TRAINING_RUN_COMPARISON.md`](TRAINING_RUN_COMPARISON.md) for complete guide!**
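The naming convention shown in the comments above (`node0_wikitext_100samples_kato`, and so on) is enough to group databases by run. The sketch below assumes that convention holds for every node; `database_names` and `group_by_run` are hypothetical helpers, not part of `tools`, and `list_all_training_runs` may work differently.

```python
import re
from collections import defaultdict

# Assumed convention: node<index>_<run_id>_kato
NAME_PATTERN = re.compile(r"^node(\d+)_(.+)_kato$")


def database_names(run_id, num_nodes=4):
    """Build the per-node database names for one training run."""
    return [f"node{i}_{run_id}_kato" for i in range(num_nodes)]


def group_by_run(names):
    """Group database names by run ID, ignoring names that do not match."""
    runs = defaultdict(list)
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:
            runs[match.group(2)].append(name)
    return dict(runs)


all_names = (
    database_names("wikitext_100samples")
    + database_names("wikitext_500samples")
    + ["admin"]  # unrelated database, skipped by the pattern
)
runs = group_by_run(all_names)
for run_id, names in runs.items():
    print(run_id, len(names))
```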

### 📊 Benefits
- ✅ Analyze after kernel restarts
- ✅ Debug analysis code without retraining
- ✅ Compare multiple experiments side-by-side
- ✅ Work with historical training data
- ✅ No need to keep training sessions active

## 3. Visualize Frequency Distribution

Plot histograms of the pattern frequencies stored in each node's knowledge base.

In [None]:
# Visualize node0 (sentence patterns)
print("Frequency distribution for node0 (sentence patterns):")
analyzer0 = MongoDBAnalyzer(learner.nodes['node0'])
analyzer0.visualize_frequency_distribution(max_freq=10)
analyzer0.close()

In [None]:
# Visualize node1 (paragraph patterns)
print("Frequency distribution for node1 (paragraph patterns):")
analyzer1 = MongoDBAnalyzer(learner.nodes['node1'])
analyzer1.visualize_frequency_distribution(max_freq=10)
analyzer1.close()

## 4. Get Detailed Frequency Histograms

View exact frequency counts for each node.
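A frequency histogram maps each frequency value to the number of patterns that occurred that many times. The sketch below shows the idea on a plain list of frequencies; `frequency_histogram` is a hypothetical helper, and `get_frequency_histogram()` is only assumed to return a mapping of this shape.

```python
from collections import Counter


def frequency_histogram(pattern_frequencies):
    """Map each frequency value to the number of patterns with that frequency."""
    return dict(sorted(Counter(pattern_frequencies).items()))


# Six patterns: three seen once, two seen twice, one seen five times
histogram = frequency_histogram([1, 1, 1, 2, 2, 5])
for freq, count in histogram.items():
    print(f"Frequency {freq}: {count} patterns")
```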

In [None]:
# Get histograms for all nodes
print(f"\n{'='*60}")
print("FREQUENCY HISTOGRAMS")
print(f"{'='*60}\n")

for node_name in ['node0', 'node1', 'node2', 'node3']:
    analyzer = MongoDBAnalyzer(learner.nodes[node_name])
    histogram = analyzer.get_frequency_histogram()
    analyzer.close()
    
    if histogram:
        print(f"{node_name}:")
        for freq in sorted(histogram.keys())[:10]:  # Show first 10 frequencies
            print(f"  Frequency {freq}: {histogram[freq]} patterns")
        if len(histogram) > 10:
            print(f"  ... ({len(histogram) - 10} more frequency levels)")
    else:
        print(f"{node_name}: No patterns")
    print()

## 5. (OPTIONAL) Cleanup Low-Frequency Patterns

Remove patterns whose frequency falls below a threshold (e.g., frequency < 2).
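The effect of a threshold can be previewed on an in-memory mapping before touching the database. This is a sketch only: `cleanup_low_frequency` is a hypothetical helper, and it assumes the threshold is inclusive at the lower bound (patterns with frequency equal to the threshold are kept).

```python
def cleanup_low_frequency(patterns, threshold=2):
    """Keep patterns with frequency >= threshold; return (kept, deleted_count)."""
    kept = {name: freq for name, freq in patterns.items() if freq >= threshold}
    return kept, len(patterns) - len(kept)


patterns = {"PTRN|a": 1, "PTRN|b": 2, "PTRN|c": 7, "PTRN|d": 1}
kept, deleted = cleanup_low_frequency(patterns, threshold=2)
print(f"kept {len(kept)}, deleted {deleted}")
```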

In [None]:
# # Cleanup patterns with frequency < 2
# print("Cleaning up patterns with frequency < 2...\n")

# deleted = cleanup_all_nodes(learner, threshold=2, verbose=True)

# print(f"\nTotal patterns deleted across all nodes: {sum(deleted.values())}")

## 6. Hierarchical Text Generation

Unravel from node1 (paragraph) → node0 (sentences) → text
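Unraveling is a recursive lookup: a pattern at level N is a sequence of pattern names from level N-1, down to level 0 where the symbols are tokens. The sketch below uses plain dictionaries in place of the MongoDB collections; `unravel` and the toy knowledge bases are hypothetical, and `pattern_data` is modeled as a list of events where each event is a list of symbols.

```python
def unravel(pattern_name, knowledge_bases, level):
    """Expand a pattern at the given level into the tokens it ultimately covers."""
    events = knowledge_bases[level][pattern_name]
    symbols = [symbol for event in events for symbol in event]
    if level == 0:
        return symbols  # level-0 symbols are tokens
    tokens = []
    for child_name in symbols:  # higher-level symbols are names of lower-level patterns
        tokens.extend(unravel(child_name, knowledge_bases, level - 1))
    return tokens


knowledge_bases = [
    {"s1": [["the"], ["cat"]], "s2": [["sat"], ["down"]]},  # level 0: token patterns
    {"p1": [["s1"], ["s2"]]},                               # level 1: sequences of level-0 names
]
print(" ".join(unravel("p1", knowledge_bases, level=1)))
```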

In [None]:
# Hierarchical text generation: Unravel from node1 (paragraph) → node0 (sentences) → text
# Run the helper cell that defines sample_pattern_by_frequency before this cell.
print(f"{'='*60}")
print("HIERARCHICAL GENERATION FROM NODE1 (PARAGRAPH LEVEL)")
print(f"{'='*60}\n")

# Sample a paragraph pattern from node1
paragraph_patterns = sample_pattern_by_frequency(learner.nodes['node1'], num_samples=1)

if paragraph_patterns:
    para_pattern = paragraph_patterns[0]
    print(f"Sampled paragraph pattern: {para_pattern['name'][:50]}...")
    print(f"Contains {len(para_pattern['pattern_data'])} events (sentence pattern names)\n")
    
    # Extract sentence pattern names from the paragraph
    sentence_pattern_names = []
    for event in para_pattern['pattern_data']:
        sentence_pattern_names.extend(event)
    
    print(f"Unraveling {len(sentence_pattern_names)} sentence patterns:\n")
    
    # For each sentence pattern name, retrieve it from node0 and decode
    analyzer0 = MongoDBAnalyzer(learner.nodes['node0'])
    
    generated_paragraph = []
    for sent_pattern_name in sentence_pattern_names:
        # Retrieve the sentence pattern from node0's knowledge base
        sent_pattern = analyzer0.patterns_collection.find_one(
            {'name': sent_pattern_name},
            {'pattern_data': 1, '_id': 0}
        )
        
        if sent_pattern:
            # Extract tokens
            tokens = []
            for event in sent_pattern['pattern_data']:
                tokens.extend(event)
            tokens = [t for t in tokens if t != '<EOS>']
            
            # Decode tokens to text (`tokenizer` is created in the node0 sentence-generation cell)
            try:
                token_ids = tokenizer.convert_tokens_to_ids(tokens)
                text = tokenizer.decode(token_ids, skip_special_tokens=True)
            except Exception:
                # Fallback: join the raw tokens if decoding fails
                text = ' '.join(tokens)
            generated_paragraph.append(text)
            print(f"  • {text}")
    
    analyzer0.close()
    
    print(f"\n{'='*60}")
    print("GENERATED PARAGRAPH (combined):")
    print(f"{'='*60}")
    print(' '.join(generated_paragraph))
    print(f"{'='*60}")
    
else:
    print("No paragraph patterns available. Train with more data to enable hierarchical generation.")

In [None]:
# Example: Get predictions from node0 (requires predictions to be enabled)
# Uncomment to test if predictions are enabled:

# # Observe some tokens to populate STM
# test_tokens = learner.token_processor.tokenize_segment("Machine learning is powerful")
# for token in test_tokens:
#     learner.nodes['node0'].observe(strings=[token])
# 
# # Get predictions
# predictions = learner.nodes['node0'].get_predictions()
# 
# if predictions:
#     print(f"✓ Retrieved {len(predictions)} predictions")
#     print("\nTop prediction:")
#     print(f"  Pattern: {predictions[0]['name'][:40]}...")
#     print(f"  Potential: {predictions[0].get('potential', 0):.3f}")
#     print(f"  Confidence: {predictions[0].get('confidence', 0):.3f}")
# else:
#     print("No predictions available. Predictions may not be enabled on this node.")

print("Prediction testing code ready (commented out by default)")

In [None]:
# Example custom modeling function
def my_custom_filter(predictions, field='name'):
    """
    Custom modeling function: Filter predictions by potential and return top 3.
    
    Args:
        predictions: List of prediction dicts
        field: Field to extract (default: 'name')
    
    Returns:
        List of filtered pattern names
    """
    # Filter by minimum potential threshold
    filtered = [p for p in predictions if p.get('potential', 0) >= 0.3]
    
    # Sort by confidence
    filtered.sort(key=lambda p: p.get('confidence', 0), reverse=True)
    
    # Return top 3
    return [p[field] for p in filtered[:3]]

print("✓ Custom modeling function defined")
print("\nExample usage with transfer_predictions:")
print("transfer_predictions(")
print("    node_source=learner.nodes['node0'],")
print("    node_target=learner.nodes['node1'],")
print("    field='name',")
print("    modeling_function=my_custom_filter")
print(")")

In [None]:
# Demonstrate built-in modeling functions
print("Built-in Modeling Functions:\n")

print("1. transfer_threshold:")
print("   - Filter predictions by metric threshold")
print("   - Example: transfer_threshold(predictions, metric='potential', threshold=0.4)")

print("\n2. transfer_top_n:")
print("   - Return top N predictions sorted by metric")
print("   - Example: transfer_top_n(predictions, n=5, sort_by='confidence')")

print("\n3. transfer_weighted:")
print("   - Weight patterns by metric with repetition")
print("   - Example: transfer_weighted(predictions, weight_by='confidence', max_repeats=5)")

print("\n4. transfer_all_names:")
print("   - Pass through all predictions without filtering")
print("   - Example: transfer_all_names(predictions)")

print("\n✓ All functions are available in the imported tools module")
print("\nTo use with transfer_predictions:")
print("result = transfer_predictions(")
print("    node_source=learner.nodes['node0'],")
print("    node_target=learner.nodes['node1'],")
print("    field='name',")
print("    modeling_function=transfer_top_n,  # or any other function")
print("    num_predictions=10")
print(")")

In [None]:
# Summary of what we accomplished
print(f"{'='*80}")
print("HIERARCHICAL TRAINING SESSION SUMMARY")
print(f"{'='*80}\n")

print("✓ What We Did:")
print("  1. Created 4-level hierarchical learner (node0 → node1 → node2 → node3)")
print("  2. Segmented text into hierarchical structure (books → chapters → paragraphs → sentences)")
print("  3. Trained single-pass with pattern names flowing up the hierarchy")
print("  4. Analyzed frequency distributions at each level")
print("  5. Optionally cleaned up low-frequency patterns (noise removal)")
print("  6. Generated new text by sampling and unraveling learned patterns ⭐")

print("\n✓ Key Insights:")
print("  • Pattern frequencies follow Zipfian distribution (natural language)")
print("  • Higher levels have fewer but more abstract patterns")
print("  • Cleanup improves pattern quality by removing rare/noisy patterns")
print("  • Text generation works by sampling + unraveling + decoding")

print("\n✓ Pattern Statistics:")
all_stats_final = analyze_all_nodes(learner)
for node_name in ['node0', 'node1', 'node2', 'node3']:
    stats = all_stats_final[node_name]
    print(f"  {node_name}: {stats['total_patterns']:,} patterns "
          f"(avg freq: {stats['average_frequency']:.2f})")

print(f"\n{'='*80}")
print("Session Complete! You now have a trained hierarchical learning system.")
print(f"{'='*80}")

In [None]:
# Generate text by sampling from node0 (sentence patterns)
print(f"{'='*60}")
print("GENERATING SENTENCES FROM NODE0")
print(f"{'='*60}\n")

# Sample 5 sentence patterns from node0
sampled_sentences = sample_pattern_by_frequency(learner.nodes['node0'], num_samples=5)

# Decode each pattern back to text
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(learner.token_processor.tokenizer_name)

print("Generated Sentences:\n")
for i, pattern in enumerate(sampled_sentences, 1):
    # Extract tokens from pattern_data
    # Pattern data is a list of events, each event is a list of symbols
    tokens = []
    for event in pattern['pattern_data']:
        tokens.extend(event)
    
    # Remove special tokens like <EOS>
    tokens = [t for t in tokens if t != '<EOS>']
    
    # Convert token strings to IDs (if needed) and decode
    try:
        # Try to decode directly if tokens are proper tokenizer tokens
        token_ids = tokenizer.convert_tokens_to_ids(tokens)
        text = tokenizer.decode(token_ids, skip_special_tokens=True)
        print(f"{i}. {text}")
    except Exception:
        # Fallback: just print the tokens
        print(f"{i}. {' '.join(tokens)}")

print(f"\n✓ Generated {len(sampled_sentences)} sentences from learned patterns")

In [None]:
# Helper function to sample patterns from a node based on frequency
def sample_pattern_by_frequency(node, num_samples=1):
    """
    Sample pattern(s) from a node's knowledge base using frequency as probability.
    Higher frequency patterns are more likely to be sampled.
    
    Args:
        node: KATO node to sample from
        num_samples: Number of patterns to sample
    
    Returns:
        List of sampled pattern dictionaries with 'name' and 'pattern_data'
    """
    analyzer = MongoDBAnalyzer(node)
    
    # Get all patterns with their frequencies
    patterns = list(analyzer.patterns_collection.find(
        {},
        {'name': 1, 'pattern_data': 1, 'frequency': 1, '_id': 0}
    ))
    analyzer.close()
    
    if not patterns:
        print(f"No patterns found in {node.node_id}")
        return []
    
    # Use frequency as sampling weight
    import numpy as np
    frequencies = np.array([p['frequency'] for p in patterns])
    probabilities = frequencies / frequencies.sum()
    
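    # Sample indices without replacement; cap the sample size at the number of patterns
    sample_size = min(num_samples, len(patterns))
    sampled_indices = np.random.choice(len(patterns), size=sample_size, p=probabilities, replace=False)
    
    return [patterns[i] for i in sampled_indices]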

print("✓ Helper functions loaded")

## 10. Text Generation from Learned Patterns ⭐

**This is the PRIMARY use case of hierarchical learning!**

Now that we've learned patterns at multiple levels, we can generate new text by:
1. **Sampling** from learned patterns at any level using frequency statistics
2. **Unraveling** patterns hierarchically (top-down: book → chapter → paragraph → sentence)
3. **Decoding** tokens back to human-readable text

This section demonstrates:
- Sampling from node0 (sentence level)
- Unraveling patterns from higher nodes
- Generating coherent text at multiple scales
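The sampling step can be sketched without a database. The version below draws patterns without replacement, weighting each draw by frequency, and uses only the standard library. `sample_by_frequency` is a hypothetical helper; the pattern dictionaries mimic the `name` and `frequency` fields used elsewhere in this notebook.

```python
import random


def sample_by_frequency(patterns, num_samples, rng):
    """Draw up to num_samples distinct patterns, weighting each draw by frequency."""
    pool = [p for p in patterns if p["frequency"] > 0]  # zero-weight patterns can never be drawn
    chosen = []
    for _ in range(min(num_samples, len(pool))):
        weights = [p["frequency"] for p in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(index))  # remove so the same pattern is not drawn twice
    return chosen


patterns = [
    {"name": "a", "frequency": 8},
    {"name": "b", "frequency": 1},
    {"name": "c", "frequency": 1},
    {"name": "d", "frequency": 0},
]
picked = sample_by_frequency(patterns, num_samples=5, rng=random.Random(0))
print([p["name"] for p in picked])
```

Asking for more samples than there are eligible patterns returns every eligible pattern once instead of raising an error.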

## 14. Next Steps

### 🆕 NEW! Session-Independent Analysis & Comparison
**Training data persists in MongoDB!** You can now:

1. **Analyze after kernel restarts** - See [`analysis_only_template.ipynb`](analysis_only_template.ipynb)
2. **Compare multiple training runs** - See [`TRAINING_RUN_COMPARISON.md`](TRAINING_RUN_COMPARISON.md)
3. **Load saved training manifests**:
   ```python
   from tools import load_latest_manifest
   manifest = load_latest_manifest()
   analyzers = manifest.get_analyzers()
   ```

### Experimentation Ideas:
1. **Train on your own text data**
2. **Try different tokenizers** (BERT, RoBERTa, T5, LLaMA)
3. **Experiment with different cleanup thresholds**
4. **Compare experiments** using `create_training_run_nodes(run_id='...')`
5. **Use deeper hierarchies** (5, 10+ nodes)
6. **Generate longer text sequences** (paragraphs, chapters, books)
7. **Test prediction-based transfers** with `transfer_predictions()`
8. **Analyze pattern content** using `StandaloneMongoDBAnalyzer`

### Large-Scale Training (Memory-Safe):
For datasets >10K samples, use the streaming training approach to avoid memory crashes:
```python
stats = train_from_streaming_dataset(
    dataset_key='wikitext',
    max_samples=1500000,
    learner=learner,
    num_levels=4,
    checkpoint_interval=10000,
    resume_from_checkpoint=False  # Set True to resume after crash
)
```

**Benefits of Streaming Training:**
- ✅ Constant memory usage (~1-10 MB per sample)
- ✅ Automatic checkpointing every N samples
- ✅ Resume from checkpoint after interruption
- ✅ Can handle unlimited dataset sizes (millions/billions)
- ✅ No OOM crashes!
- ✅ **Auto-saves training manifests** for session-independent analysis
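The checkpoint-and-resume behavior can be sketched with a lazy iterator and a small JSON file. This is an illustration under assumptions, not the implementation of `train_from_streaming_dataset`: `stream_train`, the checkpoint schema, and the `handle` callback are hypothetical.

```python
import itertools
import json
import os
import tempfile


def stream_train(samples, checkpoint_path, handle, checkpoint_interval=2, resume=False):
    """Consume samples lazily, writing a checkpoint every checkpoint_interval samples."""
    start = 0
    if resume and os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["samples_processed"]
    processed = start
    # islice skips already-checkpointed samples and never materializes the dataset
    for sample in itertools.islice(samples, start, None):
        handle(sample)
        processed += 1
        if processed % checkpoint_interval == 0:
            with open(checkpoint_path, "w") as f:
                json.dump({"samples_processed": processed}, f)
    return processed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    seen = []
    first = stream_train(iter("abc"), path, seen.append)              # run ends after 3 samples
    resumed = stream_train(iter("abcde"), path, seen.append, resume=True)

print(first, resumed, seen)
```

In this sketch a resumed run restarts from the last checkpoint, so any sample processed after that checkpoint is handled again. Whether the real trainer behaves the same way is not stated here.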

### Training Run Comparison:
Create comparable experiments with unique run IDs:
```python
# Experiment A: Baseline
nodes_baseline = create_training_run_nodes(run_id='baseline_8chunks')
# Creates: node0_baseline_8chunks_kato, node1_baseline_8chunks_kato, etc.

# Experiment B: Different chunk size
nodes_experiment = create_training_run_nodes(run_id='experiment_15chunks', chunk_size=15)
# Creates: node0_experiment_15chunks_kato, node1_experiment_15chunks_kato, etc.

# Later, compare both runs
runs = list_all_training_runs()
# Returns: {'baseline_8chunks': [...], 'experiment_15chunks': [...]}
```

### Training on Different Text Types:
- Use `segment_method='simple'` for generic text (default)
- Use `segment_method='article'` for article-like text with sections
- Use `segment_method='book'` for book-like text with chapters

### Supported Tokenizers:
- GPT-2: `"gpt2"` (default, BPE with Ġ space markers)
- BERT: `"bert-base-uncased"`, `"bert-base-cased"` (WordPiece with ## continuation)
- RoBERTa: `"roberta-base"` (byte-level BPE)
- T5: `"t5-small"`, `"t5-base"`, `"t5-large"` (SentencePiece)
- Others: ALBERT, DistilBERT, XLNet, ELECTRA, DeBERTa, BART, Phi-2, LLaMA-2
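The two marker conventions mentioned above behave differently when tokens are joined back into text: GPT-2 marks a leading space with `Ġ`, while WordPiece marks a word continuation with `##`. The helpers below are a simplified illustration with hypothetical names. They cover only these two markers, so real decoding should go through the tokenizer's own `decode` method.

```python
def join_gpt2_tokens(tokens):
    """Join GPT-2 style tokens, where a leading 'Ġ' stands for a preceding space."""
    return "".join(tokens).replace("Ġ", " ")


def join_wordpiece_tokens(tokens):
    """Join WordPiece tokens, where a leading '##' continues the previous word."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


print(join_gpt2_tokens(["Hello", "Ġworld"]))
print(join_wordpiece_tokens(["play", "##ing", "games"]))
```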

### Advanced Usage:
```python
# Use transfer_predictions for prediction-based transfers
# (After training and enabling predictions on source node)
result = transfer_predictions(
    node_source=learner.nodes['node0'],
    node_target=learner.nodes['node1'],
    field='name',
    modeling_function=my_custom_filter,  # the custom function defined in this notebook
    num_predictions=10
)
```

### Text Generation at Different Scales:
- Generate from node0 → sentences (token sequences)
- Generate from node1 → paragraphs (unravel to sentences)
- Generate from node2 → chapters (unravel to paragraphs → sentences)
- Generate from node3 → books (full hierarchical unraveling)

### Granularity Emerges from Hierarchy:
- node0: TOKEN-level patterns (from tokenized sentences)
- node1: SENTENCE-level patterns (from token pattern sequences)
- node2: PARAGRAPH-level patterns (from sentence pattern sequences)
- node3: CHAPTER-level patterns (from paragraph pattern sequences)
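With chunk-based nodes, the number of tokens covered by one pattern at each level is the running product of the chunk sizes below it. The sketch below reproduces the figures from the training cell (`chunk_size=8` at all four levels); `coverage_per_level` is a hypothetical helper.

```python
from itertools import accumulate
from operator import mul


def coverage_per_level(chunk_sizes):
    """Tokens covered by one pattern at each level (running product of chunk sizes)."""
    return list(accumulate(chunk_sizes, mul))


for level, tokens in enumerate(coverage_per_level([8, 8, 8, 8])):
    print(f"node{level}: {tokens} tokens")
```

Mixed chunk sizes work the same way: `[8, 15]` gives 8 tokens at node0 and 120 at node1.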

### 📚 Related Documentation:
- **Session-Independent Analysis**: [`analysis_only_template.ipynb`](analysis_only_template.ipynb)
- **Training Run Comparison**: [`TRAINING_RUN_COMPARISON.md`](TRAINING_RUN_COMPARISON.md)
- **Project Overview**: [`PROJECT_OVERVIEW.md`](PROJECT_OVERVIEW.md)
- **Main README**: [`README.md`](README.md)

## 11. Prediction Testing (Optional/Advanced)

**Note:** Prediction testing requires enabling predictions on nodes. This is optional and primarily for advanced research use cases involving prediction-based transfer learning.

## 12. Custom Modeling Functions (Optional/Advanced)

Create custom functions to model prediction ensembles for transfer between nodes.
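A modeling function takes a list of prediction dictionaries and returns the values to pass to the target node. The sketch below combines a threshold with a top-N cut. `top_n_by_metric` is a hypothetical example, and it assumes predictions carry `name` and `potential` keys as in the cells of this notebook.

```python
def top_n_by_metric(predictions, n=3, metric="potential", minimum=0.0, field="name"):
    """Keep predictions with metric >= minimum, then return the top n values of field."""
    eligible = [p for p in predictions if p.get(metric, 0) >= minimum]
    eligible.sort(key=lambda p: p.get(metric, 0), reverse=True)
    return [p[field] for p in eligible[:n]]


predictions = [
    {"name": "PTRN|a", "potential": 0.9},
    {"name": "PTRN|b", "potential": 0.2},
    {"name": "PTRN|c", "potential": 0.6},
]
print(top_n_by_metric(predictions, n=2, minimum=0.3))
```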

## 13. Summary

Review what we've accomplished in this notebook.
