# 🚀 Dynamic ChunkingHNet - Improved Implementation Demo

This notebook demonstrates the improved, production-ready Dynamic ChunkingHNet library with:
- ✅ Modular architecture with proper package structure
- ✅ Error handling and validation
- ✅ Advanced caching system
- ✅ Comprehensive testing
- ✅ Configuration management
- ✅ Logging and monitoring

## Key Features
1. **Dynamic Boundary Detection**: Uses embedding similarity to detect semantic breaks
2. **Adaptive Chunking Pipeline**: Adjusts chunk size based on compression ratio and content
3. **Quality Evaluation**: Computes metrics like boundary precision and semantic coherence
4. **Interactive Visualizations**: Comprehensive analysis dashboards
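
To make the first feature concrete, here is a minimal sketch of boundary detection from embedding similarity. It is not the library's implementation: the function names `boundary_probabilities` and `split_into_chunks`, the toy 2-D embeddings, and the 0.3 threshold are assumptions made for illustration. The mapping `0.5 * (1 - cos)` is one simple way to turn the cosine similarity of adjacent tokens into a score in [0, 1].

```python
import numpy as np


def boundary_probabilities(embeddings: np.ndarray) -> np.ndarray:
    """Return one boundary score per token from adjacent cosine similarity."""
    # Normalise each row so that dot products become cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # Cosine similarity between token t and token t-1.
    cos = np.sum(unit[1:] * unit[:-1], axis=1)
    # Low similarity means a likely semantic break; clip guards against rounding.
    probs = np.clip(0.5 * (1.0 - cos), 0.0, 1.0)
    # The first token always opens a chunk.
    return np.concatenate(([1.0], probs))


def split_into_chunks(tokens, probs, threshold=0.5):
    """Start a new chunk wherever the boundary score reaches the threshold."""
    chunks = []
    for token, p in zip(tokens, probs):
        if p >= threshold or not chunks:
            chunks.append([token])
        else:
            chunks[-1].append(token)
    return chunks


# Toy data (hypothetical): two topics that point in different directions.
tokens = ["neural", "networks", "learn", "stock", "prices", "fell"]
embeddings = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
])
probs = boundary_probabilities(embeddings)
chunks = split_into_chunks(tokens, probs, threshold=0.3)
print(chunks)
```

With these toy vectors the only large drop in similarity falls between "learn" and "stock", so the tokens split into two chunks of three.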

In [1]:
!pip install numpy pandas matplotlib seaborn

## 1. Import Improved Modules

Let's import the improved Dynamic ChunkingHNet modules and set up the environment.

In [None]:

# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Dict, Optional
import warnings
warnings.filterwarnings('ignore')

# Import improved Dynamic ChunkingHNet modules
import sys
sys.path.append('.')

try:
    from dynamic_chunking_hnet.core.pipeline import DynamicChunkingPipeline
    from dynamic_chunking_hnet.core.boundary_detector import SimilarityBasedBoundaryDetector
    from dynamic_chunking_hnet.core.routing_module import RoutingModule
    from dynamic_chunking_hnet.core.smoothing_module import SmoothingModule
    from dynamic_chunking_hnet.evaluation.metrics import ChunkingQualityMetrics
    from dynamic_chunking_hnet.utils.config import load_config
    from dynamic_chunking_hnet.utils.monitoring import get_logger, performance_monitor
    from dynamic_chunking_hnet.utils.cache import CacheManager
    print("✅ Successfully imported improved Dynamic ChunkingHNet modules!")
    MODULES_AVAILABLE = True
except ImportError as e:
    print(f"❌ Could not import improved modules: {e}")
    print("Please ensure the package is properly installed.")
    MODULES_AVAILABLE = False

# Setup visualization
plt.style.use('default')
sns.set_palette("husl")

print(f"\nModules available: {MODULES_AVAILABLE}")
if MODULES_AVAILABLE:
    print("🎉 Ready to demonstrate improved Dynamic ChunkingHNet!")

## 2. Load Configuration and Setup Monitoring

This section loads the configuration from `config/default.yaml` and sets up logging. If the file cannot be loaded, it falls back to a default compression ratio of 6.0.

In [None]:
if MODULES_AVAILABLE:
    # Load configuration
    try:
        config = load_config('config/default.yaml')
        print("✅ Configuration loaded successfully!")
        print(f"Default compression ratio: {config.get('compression_ratio', 'Not set')}")
        print(f"Cache enabled: {config.get('cache', {}).get('enabled', 'Not set')}")
    except Exception as e:
        print(f"⚠️ Using default configuration: {e}")
        config = {'compression_ratio': 6.0}
    
    # Setup logging
    logger = get_logger('demo')
    logger.info("Starting Dynamic ChunkingHNet demonstration")
    
    # performance_monitor is applied as a decorator in the next section,
    # so there is nothing further to set up for it here.
    print("✅ Monitoring and logging initialized")
else:
    print("❌ Cannot proceed without modules. Please install the package first.")
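
The fallback dictionary in the cell above only sets `compression_ratio`, so a later lookup such as `config.get('cache', {})` finds nothing. One possible way to keep defaults complete is to merge whatever was loaded onto a full set of defaults. The sketch below is an illustration only: `merge_config`, `DEFAULTS`, and the default value `cache.enabled = False` are hypothetical and are not part of the library.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical defaults; the notebook itself only shows compression_ratio=6.0
# and reads a cache.enabled key without stating its default.
DEFAULTS: Dict[str, Any] = {
    "compression_ratio": 6.0,
    "cache": {"enabled": False},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `overrides` on a copy of `defaults`."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A partial configuration that only touches the cache section.
partial = {"cache": {"enabled": True}}
config_example = merge_config(DEFAULTS, partial)
print(config_example)
```

Keys missing from the partial configuration keep their default values, and neither input dictionary is modified.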

## 3. Basic Usage with Improved Pipeline

This section runs the improved Dynamic ChunkingHNet pipeline on a short sample text and prints the resulting chunks together with the compression ratio achieved.
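
The pipeline takes a target `compression_ratio`. Assuming this means the average number of tokens per chunk, the sketch below shows one way a target ratio could be turned into a boundary threshold: keep only as many of the highest-scoring positions as the ratio allows. The function `threshold_for_ratio` and the score values are hypothetical, and the library may choose boundaries differently.

```python
import numpy as np


def threshold_for_ratio(probs: np.ndarray, target_ratio: float) -> float:
    """Pick a threshold that keeps about len(probs) / target_ratio boundaries."""
    n_chunks = max(1, int(round(len(probs) / target_ratio)))
    # The n_chunks-th largest score becomes the cut-off.
    return float(np.sort(probs)[-n_chunks])


# Hypothetical boundary scores for 12 tokens; the first token always scores 1.0.
probs = np.array([1.0, 0.1, 0.2, 0.7, 0.05, 0.3, 0.9, 0.15, 0.1, 0.4, 0.2, 0.6])
threshold = threshold_for_ratio(probs, target_ratio=4.0)
boundaries = probs >= threshold
achieved = len(probs) / boundaries.sum()
print(threshold, int(boundaries.sum()), achieved)
```

If several positions tie at the cut-off score, more boundaries than requested are kept, so the achieved ratio can come out below the target.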

In [None]:
if MODULES_AVAILABLE:
    # Sample text for demonstration
    sample_text = """
    Machine learning is transforming how we process information. 
    Neural networks can learn complex patterns from data. 
    Natural language processing enables computers to understand text. 
    The H-Net architecture introduces dynamic chunking mechanisms. 
    This approach outperforms traditional fixed-size tokenization methods.
    """.strip()
    
    print("📝 Sample Text:")
    print(f"{sample_text}\n")
    print(f"Text length: {len(sample_text.split())} whitespace-delimited tokens")
    
    # Wrap pipeline construction and processing in a monitored function
    @performance_monitor('pipeline_processing')
    def process_with_monitoring(text: str, compression_ratio: float = 6.0):
        pipeline = DynamicChunkingPipeline(compression_ratio=compression_ratio)
        return pipeline.process_text(text, return_metrics=True)
    
    # Process the text
    print("🔄 Processing text with improved pipeline...")
    result = process_with_monitoring(sample_text)
    
    # Display results
    print("\n📊 Results:")
    print(f"Original tokens: {result['num_tokens']}")
    print(f"Chunks created: {result['num_chunks']}")
    print(f"Compression ratio achieved: {result['compression_ratio_achieved']:.2f}")
    
    print("\n📝 Generated Chunks:")
    for i, chunk in enumerate(result['chunks'], 1):
        chunk_text = ' '.join(chunk)
        print(f"  {i}: {chunk_text}")
    
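    # The keyword arguments assume get_logger returns a structured logger;
    # a standard logging.Logger would raise TypeError on them.
    logger.info("Basic processing completed", 
                chunks=result['num_chunks'], 
                compression=result['compression_ratio_achieved'])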
else:
    print("❌ Modules not available for demonstration.")
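
The cell above wraps processing in `@performance_monitor('pipeline_processing')`. The library's decorator is not shown in this notebook, so the following is only a sketch of how a named timing decorator of this kind can work. `timing_monitor`, `TIMINGS`, and `count_words` are hypothetical names used for illustration.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store of elapsed times, keyed by operation name.
TIMINGS: Dict[str, List[float]] = {}


def timing_monitor(name: str) -> Callable:
    """Return a decorator that records how long each call takes under `name`."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # Preserve the wrapped function's name and docstring.
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raises.
                TIMINGS.setdefault(name, []).append(time.perf_counter() - start)
        return wrapper
    return decorator


@timing_monitor("word_count")
def count_words(text: str) -> int:
    """Count whitespace-delimited words."""
    return len(text.split())


n_words = count_words("dynamic chunking adapts to content")
print(n_words, len(TIMINGS["word_count"]))
```

Recording in a `finally` block means failed calls are timed as well, which is usually what a monitoring hook wants.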