# Comprehensive Keyword Analysis Module Demo

This notebook demonstrates the full capabilities of the Keyword Analysis Module for the "Agentic AI in SCM" Systematic Literature Review.

## Features Demonstrated:
- API-based keyword extraction
- NLP-based keyword extraction (TF-IDF, RAKE, YAKE)
- Semantic analysis with BGE-M3 embeddings
- Temporal trend analysis
- Keyword lifecycle analysis
- Comparative time period analysis
- Comprehensive visualizations
- Interactive dashboards

## 1. Setup and Configuration

In [1]:
# Environment Diagnostics - Check before imports
import sys
import os
from pathlib import Path

print("🔍 Environment Diagnostics:")
print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version}")
print(f"Current working directory: {os.getcwd()}")
print(f"Project root detected: {Path.cwd()}")

# Check if we're in a devcontainer
if os.path.exists('/.dockerenv'):
    print("✅ Running in Docker/devcontainer")
else:
    print("⚠️ Not in devcontainer")

# Check numpy specifically
try:
    import numpy as np
    print(f"✅ Numpy import successful: {np.__version__} from {np.__file__}")
except ImportError as e:
    print(f"❌ Numpy import failed: {e}")
    # Try to diagnose the issue
    import subprocess
    result = subprocess.run([sys.executable, '-c', 'import numpy; print(numpy.__file__)'], 
                          capture_output=True, text=True)
    if result.returncode == 0:
        print(f"Numpy path from subprocess: {result.stdout.strip()}")
    else:
        print(f"Subprocess error: {result.stderr}")

print("\n📦 Checking key dependencies:")
for package in ['pandas', 'yaml', 'requests']:
    try:
        __import__(package)
        print(f"✅ {package} available")
    except ImportError:
        print(f"❌ {package} not available")

🔍 Environment Diagnostics:
Python executable: /Users/max/miniconda3/envs/tsi/bin/python
Python version: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 16.0.6 ]
Current working directory: /Users/max/Documents/code/tsi-sota-ai/notebooks
Project root detected: /Users/max/Documents/code/tsi-sota-ai/notebooks
⚠️ Not in devcontainer
✅ Numpy import successful: 1.26.4 from /Users/max/miniconda3/envs/tsi/lib/python3.11/site-packages/numpy/__init__.py

📦 Checking key dependencies:
✅ pandas available
✅ yaml available
✅ requests available


In [2]:
import os
import yaml

# Load configuration
# config_path = '/workspaces/tsi-sota-ai/config/slr_config.yaml'
config_path = '/Users/max/Documents/code/tsi-sota-ai/config/slr_config.yaml'

# Check if config file exists
if not os.path.exists(config_path):
    print(f"⚠️ Config file not found at: {config_path}")
    print("Looking for alternative config locations...")
    
    # Try alternative paths (the notebook is expected to live in <project_root>/notebooks)
    project_root = os.path.dirname(os.getcwd())
    alt_paths = [
        os.path.join(project_root, 'config', 'slr_config.yaml'),
        os.path.join(os.getcwd(), 'config', 'slr_config.yaml')
    ]
    
    for alt_path in alt_paths:
        if os.path.exists(alt_path):
            config_path = alt_path
            print(f"✅ Found config at: {config_path}")
            break
    else:
        print("❌ No config file found. Creating minimal config...")
        config = {
            'keyword_analysis': {},
            'semantic_analysis': {},
            'temporal_analysis': {},
            'visualization': {},
            'test_search_queries': ['agent AI supply chain', 'autonomous agents logistics']
        }
        print("Using default configuration.")

if 'config' not in locals():
    try:
        with open(config_path, 'r') as f:
            config = yaml.safe_load(f)
        print(f"✅ Configuration loaded from: {config_path}")
    except Exception as e:
        print(f"❌ Error loading config: {e}")
        config = {
            'keyword_analysis': {},
            'semantic_analysis': {},
            'temporal_analysis': {},
            'visualization': {},
            'test_search_queries': ['agent AI supply chain', 'autonomous agents logistics']
        }
        print("Using fallback configuration.")

print("Configuration summary:")
print(f"- Keyword Analysis: {len(config.get('keyword_analysis', {}))} settings")
print(f"- Semantic Analysis: {len(config.get('semantic_analysis', {}))} settings")
print(f"- Temporal Analysis: {len(config.get('temporal_analysis', {}))} settings")
print(f"- Visualization: {len(config.get('visualization', {}))} settings")
print(f"- Test queries: {len(config.get('test_search_queries', []))} queries")

✅ Configuration loaded from: /Users/max/Documents/code/tsi-sota-ai/config/slr_config.yaml
Configuration summary:
- Keyword Analysis: 2 settings
- Semantic Analysis: 3 settings
- Temporal Analysis: 0 settings
- Visualization: 6 settings
- Test queries: 3 queries
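
The hardcoded absolute path above ties the notebook to one machine. As a minimal sketch, assuming the config lives at `config/slr_config.yaml` relative to the project root (as the paths in this cell suggest), the file could instead be located by walking up from the working directory. `find_config` is a hypothetical helper for illustration, not part of `slr_core`.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, relative: str = "config/slr_config.yaml") -> Optional[Path]:
    """Walk upward from `start` and return the first existing config path.

    Hypothetical helper: the relative location is an assumption based on
    the paths used in this notebook.
    """
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / relative
        if candidate.is_file():
            return candidate
    return None


config_file = find_config(Path.cwd())
print(config_file if config_file else "No config found; fall back to defaults")
```

This works the same way from the `notebooks` directory on a local machine and from a devcontainer checkout, because it never depends on where the repository is mounted.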


## 2. Data Acquisition

First, let's acquire some sample data using our test search queries.

In [None]:
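# Import the SLR core modules
import sys
import os
from pathlib import Path

# Add the project root (the directory that contains slr_core) to the Python path.
# Assumes the notebook runs from <project_root>/notebooks; otherwise falls back
# to the devcontainer location.
project_root = Path.cwd().parent
if not (project_root / 'slr_core').is_dir():
    project_root = Path('/workspaces/tsi-sota-ai')
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Import the data acquisition and keyword analysis components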
try:
    from slr_core.data_acquirer import DataAcquirer
    from slr_core.keyword_analysis import KeywordExtractor, SemanticAnalyzer, TemporalAnalyzer, Visualizer
    from slr_core.config_manager import ConfigManager
    print("✅ Imports successful")
except ImportError as e:
    print(f"❌ Import failed: {e}")
    
    # If above fails, try individual imports to debug:
    try:
        from slr_core.data_acquirer import DataAcquirer
        print("✅ DataAcquirer imported")
    except ImportError as e1:
        print(f"❌ DataAcquirer failed: {e1}")
    
    try:
        from slr_core.keyword_analysis import KeywordExtractor
        print("✅ KeywordExtractor imported")
    except ImportError as e2:
        print(f"❌ KeywordExtractor failed: {e2}")

🔍 Checking slr_core directory structure:
❌ Directory /workspaces/tsi-sota-ai/slr_core does not exist
❌ Direct imports failed: No module named 'data_acquisition'
❌ Package-style imports failed: No module named 'slr_core'
❌ All import attempts failed
Last error: No module named 'memory_bank'

Please check the actual structure of your slr_core directory


In [None]:
# Initialize with ConfigManager
config_manager = ConfigManager()
data_acquirer = DataAcquirer(config_manager=config_manager)

# Collect publications for the first two test queries
sample_publications = []
test_queries = config.get('test_search_queries', ['agent AI supply chain'])

for query in test_queries[:2]:
    try:
        print(f"\nSearching for: {query}")
        # Fetch results from all sources for this query
        results = data_acquirer.fetch_all_sources(
            query=query,
            start_year=2020,
            end_year=2024,
            max_results_per_source=25
        )
        
        # Results is a dict with source names as keys
        for source, publications in results.items():
            if publications:
                sample_publications.extend(publications)
                print(f"Found {len(publications)} publications from {source}")
                
    except Exception as e:
        print(f"Error searching for '{query}': {str(e)}")
        continue

print(f"\n📊 Total publications acquired: {len(sample_publications)}")

NameError: name 'DataAcquirer' is not defined
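
Because the loop above extends `sample_publications` across several queries and several sources, the same paper can appear more than once. A minimal sketch of one way to deduplicate before analysis follows. It assumes each record is a dict with a `title` key (as used later in this notebook) and, optionally, a `doi` key; the `doi` field and the `dedupe_publications` helper are assumptions for illustration, not part of `DataAcquirer`.

```python
import re


def dedupe_publications(publications):
    """Drop repeated records, keeping the first occurrence of each.

    Records are matched on DOI when present, otherwise on a normalized
    title. The 'doi' field is an assumption about the record schema.
    """
    seen = set()
    unique = []
    for pub in publications:
        doi = (pub.get("doi") or "").strip().lower()
        title = re.sub(r"[^a-z0-9]+", " ", (pub.get("title") or "").lower()).strip()
        if not doi and not title:
            unique.append(pub)  # nothing to match on, so keep the record
            continue
        key = ("doi", doi) if doi else ("title", title)
        if key in seen:
            continue
        seen.add(key)
        unique.append(pub)
    return unique


# Illustrative records only, not real search results.
records = [
    {"title": "Agentic AI in Supply Chains", "doi": "10.1000/xyz1"},
    {"title": "Agentic AI in supply chains.", "doi": "10.1000/XYZ1"},  # same DOI
    {"title": "Autonomous Agents for Logistics"},
    {"title": "Autonomous agents for logistics"},  # same title, no DOI
]
deduped = dedupe_publications(records)
print(len(deduped))
```

A record that carries a DOI in one source and none in another would not be matched by this sketch; handling that case would need a second pass on titles.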

## 3. Keyword Extraction

Now let's extract keywords using both API-based and NLP-based methods.

In [None]:
# Initialize keyword extractor
keyword_extractor = KeywordExtractor(config)

print("🔍 Starting keyword extraction...")

# Extract keywords using API data
print("\n1. API-based keyword extraction:")
api_keywords = keyword_extractor.extract_from_api_data(sample_publications)
print(f"   - Extracted {len(api_keywords.get('all_keywords', []))} unique keywords")
api_frequencies = api_keywords.get('keyword_frequencies', {})
print(f"   - Top 10 by frequency: {sorted(api_frequencies, key=api_frequencies.get, reverse=True)[:10]}")

# Extract keywords using NLP methods
print("\n2. NLP-based keyword extraction:")
text_corpus = [pub.get('title', '') + ' ' + pub.get('abstract', '') for pub in sample_publications]
text_corpus = [text for text in text_corpus if text.strip()]  # Remove empty texts

if text_corpus:
    nlp_keywords = keyword_extractor.extract_from_text(text_corpus)
    
    for method in ['tfidf', 'rake', 'yake']:
        if method in nlp_keywords:
            method_keywords = nlp_keywords[method]
            print(f"   - {method.upper()}: {len(method_keywords)} keywords")
            print(f"     Top 5: {list(method_keywords.keys())[:5]}")
else:
    print("   - No text content available for NLP extraction")
    nlp_keywords = {}

# Combine and analyze frequency distribution
print("\n3. Frequency analysis:")
all_keywords_combined = {}
all_keywords_combined.update(api_keywords.get('keyword_frequencies', {}))

for method_keywords in nlp_keywords.values():
    for kw, freq in method_keywords.items():
        all_keywords_combined[kw] = all_keywords_combined.get(kw, 0) + freq

freq_stats = keyword_extractor.analyze_frequency_distribution(all_keywords_combined)
print(f"   - Total unique keywords: {freq_stats['total_keywords']}")
print(f"   - Mean frequency: {freq_stats['mean_frequency']:.2f}")
print(f"   - Frequency std: {freq_stats['frequency_std']:.2f}")
print(f"   - High-frequency keywords (>mean): {len(freq_stats['high_frequency_keywords'])}")
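
The combining step above adds raw values from different methods (API counts, TF-IDF weights, RAKE and YAKE scores), which are on different scales. In the reference YAKE implementation, a lower score means a more relevant keyword, so summing raw YAKE scores would reward the wrong terms unless `extract_from_text` already rescales them, which is not shown here. The following is a sketch of one possible remedy, min-max normalization per method before summing; `min_max_normalize` and `combine_scores` are hypothetical helpers.

```python
def min_max_normalize(scores, lower_is_better=False):
    """Rescale a {keyword: score} mapping to [0, 1], where 1 is most relevant."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {kw: 1.0 for kw in scores}
    normalized = {kw: (value - low) / (high - low) for kw, value in scores.items()}
    if lower_is_better:
        normalized = {kw: 1.0 - value for kw, value in normalized.items()}
    return normalized


def combine_scores(per_method, lower_is_better=("yake",)):
    """Sum the normalized scores of every method for each keyword."""
    combined = {}
    for method, scores in per_method.items():
        rescaled = min_max_normalize(scores, lower_is_better=method in lower_is_better)
        for kw, value in rescaled.items():
            combined[kw] = combined.get(kw, 0.0) + value
    return combined


# Illustrative values only, not output from the module.
example = {
    "tfidf": {"supply chain": 0.8, "agent": 0.4, "logistics": 0.2},
    "yake": {"supply chain": 0.02, "agent": 0.10, "logistics": 0.06},
}
combined = combine_scores(example)
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)
```

Min-max scaling is only one option; rank-based fusion would be less sensitive to outliers but discards the score magnitudes.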

## 4. Semantic Analysis

Let's perform semantic analysis using BGE-M3 embeddings and clustering.

In [None]:
# Initialize semantic analyzer
semantic_analyzer = SemanticAnalyzer(config)

print("🧠 Starting semantic analysis...")

# Get top keywords by combined score for semantic analysis
top_keywords = sorted(all_keywords_combined, key=all_keywords_combined.get, reverse=True)[:50]  # Limit for demo
print(f"Analyzing top {len(top_keywords)} keywords")

# Generate embeddings
print("\n1. Generating BGE-M3 embeddings...")
embeddings = semantic_analyzer.generate_embeddings(top_keywords)
print(f"   - Generated embeddings: {embeddings.shape}")

# Perform clustering
print("\n2. Performing clustering analysis...")
clustering_results = semantic_analyzer.perform_clustering(
    keywords=top_keywords,
    embeddings=embeddings,
    algorithm='kmeans',
    n_clusters=8
)

print(f"   - Number of clusters: {clustering_results['cluster_stats']['n_clusters']}")
print(f"   - Silhouette score: {clustering_results['cluster_stats']['silhouette_score']:.3f}")
print(f"   - Largest cluster size: {max(clustering_results['cluster_stats']['cluster_sizes'])}")

# Show cluster examples
print("\n3. Cluster examples:")
for cluster_id, keywords_in_cluster in clustering_results['clusters'].items():
    if len(keywords_in_cluster) > 1:  # Show clusters with multiple keywords
        print(f"   Cluster {cluster_id}: {', '.join(keywords_in_cluster[:5])}")
        if len(keywords_in_cluster) > 5:
            print(f"      ... and {len(keywords_in_cluster) - 5} more")

# Dimensionality reduction for visualization
print("\n4. Dimensionality reduction...")
reduced_embeddings = semantic_analyzer.reduce_dimensions(
    embeddings, 
    method='umap', 
    n_components=2
)
print(f"   - Reduced to 2D: {reduced_embeddings.shape}")

# Store results for visualization
semantic_results = {
    'keywords': top_keywords,
    'embeddings': embeddings,
    'embeddings_2d': reduced_embeddings,
    'cluster_labels': clustering_results['cluster_labels'],
    'clusters': clustering_results['clusters'],
    'cluster_stats': clustering_results['cluster_stats']
}
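
The silhouette score printed above summarizes how well separated the clusters are: values near 1 indicate tight, distinct clusters, and values near or below 0 indicate overlap. How `SemanticAnalyzer` computes it internally is not shown in this notebook, so the following is only a dependency-free sketch of the standard definition using Euclidean distance on toy 2D points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean_distance(i, own)
        b = min(mean_distance(i, members)
                for other, members in clusters.items() if other != label)
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])   # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])    # labels cut across the groups
print(f"well separated: {good:.3f}, mixed: {bad:.3f}")
```

With only 50 keywords and a fixed `n_clusters=8`, a low score may simply mean that eight is not a natural number of groups for this sample.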

## 5. Temporal Analysis

Let's analyze temporal patterns and trends in keyword usage.

In [None]:
# Initialize temporal analyzer
temporal_analyzer = TemporalAnalyzer(config)

print("📈 Starting temporal analysis...")

# Prepare keyword data for temporal analysis
combined_keywords = {
    'all_keywords': list(all_keywords_combined.keys()),
    'keyword_frequencies': all_keywords_combined
}

# 1. Analyze publication trends
print("\n1. Publication volume trends:")
pub_trends = temporal_analyzer.analyze_publication_trends(sample_publications)
if 'volume_trends' in pub_trends:
    volume_trends = pub_trends['volume_trends']
    print(f"   - Date range: {pub_trends['date_range']['start']} to {pub_trends['date_range']['end']}")
    print(f"   - Total publications: {pub_trends['total_publications']}")
    print(f"   - Peak year: {volume_trends.get('peak_year', 'N/A')} ({volume_trends.get('peak_count', 0)} publications)")
    print(f"   - Average yearly growth: {volume_trends.get('average_yearly_growth', 0):.2%}")

# 2. Analyze keyword trends
print("\n2. Keyword temporal trends:")
keyword_trends = temporal_analyzer.analyze_keyword_trends(sample_publications, combined_keywords)
if 'individual_trends' in keyword_trends:
    trends = keyword_trends['individual_trends']
    print(f"   - Keywords analyzed: {len(trends)}")
    
    # Show trending keywords
    if 'top_growing_keywords' in keyword_trends:
        growing = keyword_trends['top_growing_keywords']
        print(f"   - Growing keywords: {len(growing)}")
        for kw in growing[:3]:
            print(f"     • {kw['keyword']}: slope={kw['slope']:.3f}, R²={kw['r_squared']:.3f}")

# 3. Detect temporal patterns
print("\n3. Pattern detection:")
patterns = temporal_analyzer.detect_temporal_patterns(sample_publications, combined_keywords)
if 'pattern_summary' in patterns:
    summary = patterns['pattern_summary']
    print(f"   - Keywords with seasonal patterns: {summary.get('seasonal_keywords', 0)}")
    print(f"   - Keywords with cyclical patterns: {summary.get('cyclical_keywords', 0)}")
    print(f"   - Keywords with trend changes: {summary.get('keywords_with_trend_changes', 0)}")

# 4. Lifecycle analysis
print("\n4. Keyword lifecycle analysis:")
lifecycle = temporal_analyzer.analyze_keyword_lifecycle(sample_publications, combined_keywords)
if 'lifecycle_categories' in lifecycle:
    categories = lifecycle['lifecycle_categories']
    print(f"   - Emerging keywords: {len(categories.get('emerging', []))}")
    print(f"   - Growing keywords: {len(categories.get('growing', []))}")
    print(f"   - Mature keywords: {len(categories.get('mature', []))}")
    print(f"   - Declining keywords: {len(categories.get('declining', []))}")

# 5. Compare time periods
print("\n5. Time period comparison:")
comparison = temporal_analyzer.compare_time_periods(sample_publications, combined_keywords)
if 'period_data' in comparison:
    period_data = comparison['period_data']
    for period, keywords in period_data.items():
        print(f"   - {period}: {len(keywords)} unique keywords, {sum(keywords.values())} total occurrences")

# Store temporal results
temporal_results = {
    'publication_trends': pub_trends,
    'keyword_trends': keyword_trends,
    'temporal_patterns': patterns,
    'lifecycle_analysis': lifecycle,
    'comparative_analysis': comparison
}
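
The growing-keyword output above reports a slope and an R² value per keyword. As a sketch of what such figures typically mean, assuming they come from an ordinary least-squares fit of yearly counts against the year (the notebook does not show how `TemporalAnalyzer` derives them), the calculation can be written without dependencies. `linear_trend` is a hypothetical helper.

```python
def linear_trend(years, counts):
    """Ordinary least-squares fit of counts against years.

    Returns (slope, r_squared). A flat series returns r_squared = 0.0.
    """
    n = len(years)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two (year, count) pairs")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    ss_xx = sum((x - mean_x) ** 2 for x in years)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    ss_yy = sum((y - mean_y) ** 2 for y in counts)
    if ss_xx == 0:
        raise ValueError("years must not all be equal")
    slope = ss_xy / ss_xx
    r_squared = (ss_xy ** 2) / (ss_xx * ss_yy) if ss_yy else 0.0
    return slope, r_squared


# Illustrative yearly counts for one keyword.
years = [2020, 2021, 2022, 2023, 2024]
slope, r2 = linear_trend(years, [1, 3, 5, 7, 9])
print(f"slope={slope:.3f}, R²={r2:.3f}")
```

The slope is the average change in mentions per year, and R² indicates how much of the year-to-year variation a straight line explains. With only five yearly points (2020 to 2024), both values should be read as rough indicators.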

## 6. Visualization

Now let's create comprehensive visualizations of our analysis results.

In [None]:
# Initialize visualizer
visualizer = Visualizer(config)

print("📊 Creating visualizations...")

# Create output directory for visualizations
# Assumes the notebook runs from <project_root>/notebooks
viz_dir = os.path.join(os.path.dirname(os.getcwd()), 'outputs', 'agent_research_analysis')
os.makedirs(viz_dir, exist_ok=True)

visualization_files = []

# 1. Word cloud
print("\n1. Creating word cloud...")
try:
    wordcloud_path = os.path.join(viz_dir, 'keyword_wordcloud.png')
    visualizer.create_word_cloud(
        keywords=all_keywords_combined,
        title="Agent Research Dynamics - Keyword Analysis",
        output_path=wordcloud_path
    )
    visualization_files.append(wordcloud_path)
    print(f"   ✅ Word cloud saved: {wordcloud_path}")
except Exception as e:
    print(f"   ❌ Error creating word cloud: {str(e)}")

# 2. Frequency plot
print("\n2. Creating frequency plot...")
try:
    freq_path = os.path.join(viz_dir, 'keyword_frequencies.png')
    visualizer.plot_keyword_frequencies(
        keywords=all_keywords_combined,
        top_n=20,
        title="Top 20 Keywords by Frequency - Agent Research",
        output_path=freq_path
    )
    visualization_files.append(freq_path)
    print(f"   ✅ Frequency plot saved: {freq_path}")
except Exception as e:
    print(f"   ❌ Error creating frequency plot: {str(e)}")

# 3. Semantic clusters
print("\n3. Creating semantic cluster plot...")
try:
    cluster_path = os.path.join(viz_dir, 'semantic_clusters.png')
    visualizer.plot_semantic_clusters(
        cluster_data=semantic_results,
        title="Semantic Keyword Clusters (BGE-M3 + UMAP) - Agent Research",
        output_path=cluster_path
    )
    visualization_files.append(cluster_path)
    print(f"   ✅ Cluster plot saved: {cluster_path}")
except Exception as e:
    print(f"   ❌ Error creating cluster plot: {str(e)}")

# 4. Temporal trends
print("\n4. Creating temporal trends plot...")
try:
    if 'keyword_trends' in temporal_results and temporal_results['keyword_trends']:
        trends_path = os.path.join(viz_dir, 'temporal_trends.png')
        visualizer.plot_temporal_trends(
            trend_data=temporal_results['keyword_trends'],
            top_keywords=10,
            title="Agent Research Keyword Temporal Trends",
            output_path=trends_path
        )
        visualization_files.append(trends_path)
        print(f"   ✅ Temporal trends plot saved: {trends_path}")
    else:
        print(f"   ⚠️ No temporal trends data available")
except Exception as e:
    print(f"   ❌ Error creating temporal trends plot: {str(e)}")

# 5. Lifecycle analysis
print("\n5. Creating lifecycle analysis plot...")
try:
    if 'lifecycle_analysis' in temporal_results and temporal_results['lifecycle_analysis']:
        lifecycle_path = os.path.join(viz_dir, 'keyword_lifecycle.png')
        visualizer.plot_lifecycle_analysis(
            lifecycle_data=temporal_results['lifecycle_analysis'],
            title="Agent Research Keyword Lifecycle Analysis",
            output_path=lifecycle_path
        )
        visualization_files.append(lifecycle_path)
        print(f"   ✅ Lifecycle plot saved: {lifecycle_path}")
    else:
        print(f"   ⚠️ No lifecycle analysis data available")
except Exception as e:
    print(f"   ❌ Error creating lifecycle plot: {str(e)}")

# 6. Time period comparison
print("\n6. Creating time period comparison plot...")
try:
    if 'comparative_analysis' in temporal_results and temporal_results['comparative_analysis']:
        comparison_path = os.path.join(viz_dir, 'time_period_comparison.png')
        visualizer.plot_comparative_analysis(
            comparative_data=temporal_results['comparative_analysis'],
            title="Agent Research Time Period Comparison",
            output_path=comparison_path
        )
        visualization_files.append(comparison_path)
        print(f"   ✅ Comparison plot saved: {comparison_path}")
    else:
        print(f"   ⚠️ No comparative analysis data available")
except Exception as e:
    print(f"   ❌ Error creating comparison plot: {str(e)}")

print(f"\n📁 Total visualizations created: {len(visualization_files)}")
for path in visualization_files:
    print(f"   - {os.path.basename(path)}")
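
The six plotting steps above share the same shape: build a path, call a plotting method, record the file on success, and report the error otherwise. A sketch of how that pattern could be factored into one helper follows. `save_figure` is hypothetical, and the stand-in plotting functions exist only so the example runs without the module.

```python
import os


def save_figure(label, plot_fn, output_path, saved_files, **kwargs):
    """Call `plot_fn(output_path=..., **kwargs)` and record the path on success.

    Returns True when the plot was created and False when `plot_fn` raised.
    """
    try:
        plot_fn(output_path=output_path, **kwargs)
    except Exception as exc:  # keep the notebook running, as the cell above does
        print(f"   ❌ Error creating {label}: {exc}")
        return False
    saved_files.append(output_path)
    print(f"   ✅ {label} saved: {output_path}")
    return True


# Stand-in plotting functions (assumptions for the demo, not module APIs).
def fake_plot(output_path, title):
    pass


def failing_plot(output_path, title):
    raise RuntimeError("no data")


saved = []
save_figure("word cloud", fake_plot, os.path.join("out", "wordcloud.png"), saved, title="Demo")
save_figure("cluster plot", failing_plot, os.path.join("out", "clusters.png"), saved, title="Demo")
print(len(saved))
```

With the objects from this notebook, a call might look like `save_figure("word cloud", visualizer.create_word_cloud, wordcloud_path, visualization_files, keywords=all_keywords_combined, title="...")`.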

## 7. Interactive Dashboard

Let's create an interactive dashboard combining all our analysis results.

In [None]:
from datetime import datetime

print("🎛️ Creating interactive dashboard...")

# Compile all analysis results
complete_results = {
    'keyword_frequencies': all_keywords_combined,
    'semantic_analysis': semantic_results,
    'temporal_analysis': temporal_results,
    'publication_count': len(sample_publications),
    'analysis_timestamp': datetime.now().isoformat()
}

# Create interactive dashboard
try:
    dashboard_path = os.path.join(viz_dir, 'interactive_dashboard.html')
    visualizer.create_dashboard(
        analysis_results=complete_results,
        output_path=dashboard_path
    )
    print(f"✅ Interactive dashboard created: {dashboard_path}")
    print(f"🌐 Open in browser: file://{dashboard_path}")
    
except Exception as e:
    print(f"❌ Error creating dashboard: {str(e)}")
    import traceback
    traceback.print_exc()

## 8. Export Results

Let's export all our analysis results in various formats.

In [None]:
from datetime import datetime

print("💾 Exporting analysis results...")

# Export keyword extraction results
print("\n1. Exporting keyword extraction results:")
try:
    keywords_export_path = os.path.join(viz_dir, 'keyword_extraction_results.json')
    keyword_extractor.export_keywords(
        keywords={'combined_keywords': all_keywords_combined},
        output_path=keywords_export_path,
        format='json'
    )
    print(f"   ✅ Keywords exported: {keywords_export_path}")
except Exception as e:
    print(f"   ❌ Error exporting keywords: {str(e)}")

# Export semantic analysis results
print("\n2. Exporting semantic analysis results:")
try:
    semantic_export_path = os.path.join(viz_dir, 'semantic_analysis_results.json')
    semantic_analyzer.export_analysis_results(
        results=semantic_results,
        output_path=semantic_export_path,
        format='json'
    )
    print(f"   ✅ Semantic analysis exported: {semantic_export_path}")
except Exception as e:
    print(f"   ❌ Error exporting semantic analysis: {str(e)}")

# Export temporal analysis results
print("\n3. Exporting temporal analysis results:")
try:
    temporal_export_path = os.path.join(viz_dir, 'temporal_analysis_results.json')
    temporal_analyzer.export_temporal_analysis(
        output_path=temporal_export_path,
        format='json'
    )
    print(f"   ✅ Temporal analysis exported: {temporal_export_path}")
except Exception as e:
    print(f"   ❌ Error exporting temporal analysis: {str(e)}")

# Export all visualizations
print("\n4. Exporting all visualizations:")
try:
    all_viz_files = visualizer.export_all_visualizations(
        analysis_results=complete_results,
        output_dir=viz_dir
    )
    print(f"   ✅ Exported {len(all_viz_files)} visualization files")
except Exception as e:
    print(f"   ❌ Error exporting visualizations: {str(e)}")

# Create summary report
print("\n5. Creating summary report:")
try:
    summary_report = {
        'analysis_summary': {
            'total_publications': len(sample_publications),
            'total_keywords': len(all_keywords_combined),
            'semantic_clusters': semantic_results.get('cluster_stats', {}).get('n_clusters', 0),
            'temporal_patterns': len(temporal_results.get('temporal_patterns', {}).get('keyword_patterns', {})),
            'analysis_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        },
        'top_keywords': dict(sorted(all_keywords_combined.items(), key=lambda item: item[1], reverse=True)[:20]),
        'configuration_used': config.get('keyword_analysis', {}),
        'files_generated': {
            'visualizations': len(visualization_files),
            'exports': 3,  # JSON exports
            'dashboard': 1
        }
    }
    
    summary_path = os.path.join(viz_dir, 'analysis_summary.json')
    import json
    with open(summary_path, 'w') as f:
        json.dump(summary_report, f, indent=2, default=str)
    
    print(f"   ✅ Summary report created: {summary_path}")
    
except Exception as e:
    print(f"   ❌ Error creating summary: {str(e)}")

print(f"\n🎉 Analysis complete! All results saved to: {viz_dir}")
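
The summary report above is written with `default=str`, which turns any value `json` cannot encode into its string representation. For array-like values such as embeddings, a list is usually more useful to downstream tools. The sketch below shows a fallback function for that purpose. It assumes array objects expose a `tolist()` method, as NumPy arrays do; the standard-library `array.array` stands in here to keep the example dependency-free, and `to_serializable` is a hypothetical helper.

```python
import json
from array import array
from datetime import datetime


def to_serializable(value):
    """Fallback for json.dumps: convert common non-JSON types."""
    if hasattr(value, "tolist"):            # NumPy arrays and array.array
        return value.tolist()
    if isinstance(value, (set, frozenset)):
        return sorted(value)
    if isinstance(value, datetime):
        return value.isoformat()
    return str(value)


# Illustrative payload only.
payload = {
    "embedding": array("d", [0.25, 0.5]),
    "keywords": {"agent", "logistics"},
    "created": datetime(2024, 1, 1, 12, 0, 0),
}
encoded = json.dumps(payload, default=to_serializable)
print(encoded)
```

Passing `default=to_serializable` in place of `default=str` keeps numeric arrays as JSON arrays while still falling back to strings for anything unrecognized.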

## 9. Analysis Summary

Let's display a comprehensive summary of our keyword analysis.

In [None]:
print("📋 KEYWORD ANALYSIS SUMMARY")
print("=" * 50)

print(f"\n📊 DATA OVERVIEW:")
print(f"   • Publications analyzed: {len(sample_publications)}")
print(f"   • Total unique keywords: {len(all_keywords_combined)}")
print(f"   • Search queries used: {len(test_queries[:2])}")

print(f"\n🔍 KEYWORD EXTRACTION:")
print(f"   • API-based keywords: {len(api_keywords.get('all_keywords', []))}")
print(f"   • NLP-based methods: {len(nlp_keywords)} (TF-IDF, RAKE, YAKE)")
print(f"   • Combined keyword pool: {len(all_keywords_combined)}")

print(f"\n🧠 SEMANTIC ANALYSIS:")
print(f"   • BGE-M3 embeddings generated: {len(top_keywords)}")
print(f"   • Semantic clusters found: {semantic_results.get('cluster_stats', {}).get('n_clusters', 0)}")
print(f"   • Clustering quality (silhouette): {semantic_results.get('cluster_stats', {}).get('silhouette_score', 0):.3f}")
print(f"   • Dimensionality reduction: UMAP to 2D")

print(f"\n📈 TEMPORAL ANALYSIS:")
if temporal_results.get('publication_trends'):
    pub_trends = temporal_results['publication_trends']
    print(f"   • Publication date range: {pub_trends.get('date_range', {}).get('start', 'N/A')} - {pub_trends.get('date_range', {}).get('end', 'N/A')}")
    if 'volume_trends' in pub_trends:
        volume = pub_trends['volume_trends']
        print(f"   • Peak publication year: {volume.get('peak_year', 'N/A')} ({volume.get('peak_count', 0)} papers)")
        print(f"   • Average yearly growth: {volume.get('average_yearly_growth', 0):.2%}")

if temporal_results.get('keyword_trends'):
    kw_trends = temporal_results['keyword_trends']
    print(f"   • Keywords with temporal trends: {len(kw_trends.get('individual_trends', {}))}")
    print(f"   • Growing keywords: {len(kw_trends.get('top_growing_keywords', []))}")
    print(f"   • Declining keywords: {len(kw_trends.get('declining_keywords', []))}")

if temporal_results.get('lifecycle_analysis'):
    lifecycle = temporal_results['lifecycle_analysis']
    if 'lifecycle_categories' in lifecycle:
        cats = lifecycle['lifecycle_categories']
        print(f"   • Lifecycle stages:")
        print(f"     - Emerging: {len(cats.get('emerging', []))} keywords")
        print(f"     - Growing: {len(cats.get('growing', []))} keywords")
        print(f"     - Mature: {len(cats.get('mature', []))} keywords")
        print(f"     - Declining: {len(cats.get('declining', []))} keywords")

print(f"\n📊 VISUALIZATIONS CREATED:")
print(f"   • Static plots: {len(visualization_files)}")
print(f"   • Interactive dashboard: 1")
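expected_plots = {
    'Word cloud': 'keyword_wordcloud.png',
    'Frequency plots': 'keyword_frequencies.png',
    'Semantic clusters': 'semantic_clusters.png',
    'Temporal trends': 'temporal_trends.png',
    'Lifecycle analysis': 'keyword_lifecycle.png'
}
created_plots = {os.path.basename(path) for path in visualization_files}
for label, filename in expected_plots.items():
    print(f"   • {label}: {'✅' if filename in created_plots else '❌'}")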

print(f"\n💾 EXPORTS GENERATED:")
print(f"   • JSON analysis results: 3 files")
print(f"   • Visualization images: {len(visualization_files)} files")
print(f"   • Interactive HTML dashboard: 1 file")
print(f"   • Summary report: 1 file")

print(f"\n🎯 TOP INSIGHTS:")
if all_keywords_combined:
    top_5_keywords = sorted(all_keywords_combined, key=all_keywords_combined.get, reverse=True)[:5]
    print(f"   • Most frequent keywords: {', '.join(top_5_keywords)}")

if semantic_results.get('clusters'):
    largest_cluster = max(semantic_results['clusters'].items(), key=lambda x: len(x[1]))
    print(f"   • Largest semantic cluster: {len(largest_cluster[1])} keywords")
    print(f"     Example terms: {', '.join(largest_cluster[1][:3])}")

print(f"\n📁 All results saved to: {viz_dir}")
print(f"🌐 Open dashboard: file://{os.path.join(viz_dir, 'interactive_dashboard.html')}")

print("\n" + "=" * 50)
print("✅ KEYWORD ANALYSIS MODULE DEMONSTRATION COMPLETE!")
print("=" * 50)

## 10. Next Steps and Integration

This demonstration shows the complete capabilities of our Keyword Analysis Module. Here are suggested next steps:

### Integration with Existing Workflows:
1. **Data Pipeline Integration**: Connect with `DataAcquirer` for real-time analysis
2. **Batch Processing**: Set up automated keyword analysis for large datasets
3. **API Integration**: Expose keyword analysis through REST APIs

### Advanced Analysis:
1. **Cross-Database Analysis**: Compare keywords across multiple academic databases
2. **Citation-Weighted Keywords**: Weight keywords by publication citation counts
3. **Co-occurrence Networks**: Analyze keyword co-occurrence patterns
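
As a starting point for the co-occurrence idea, here is a minimal sketch that counts how often two keywords appear in the same publication. It assumes each publication has already been reduced to a list of keywords; `cooccurrence_counts` is a hypothetical helper and the input data is illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how often each unordered keyword pair shares a publication."""
    pairs = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))          # ignore repeats within one publication
        pairs.update(combinations(unique, 2))   # sorted input gives a canonical pair order
    return pairs


# Illustrative keyword lists, one per publication.
docs = [
    ["agent", "supply chain", "logistics"],
    ["agent", "supply chain"],
    ["logistics", "optimization"],
]
counts = cooccurrence_counts(docs)
print(counts.most_common(1))
```

The resulting pair counts can serve as weighted edges for a graph library when building the network view.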

### Performance Optimization:
1. **Caching**: Implement embedding and analysis result caching
2. **Parallel Processing**: Utilize multiprocessing for large datasets
3. **Memory Optimization**: Optimize for large-scale keyword analysis
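
For the caching item, one possible shape is a small on-disk cache keyed by a hash of the model name and the input text, so that repeated runs skip embedding generation. This is a sketch under stated assumptions: `EmbeddingCache` and the `embed_fn` callable are hypothetical, and a stand-in function replaces the real embedding model.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class EmbeddingCache:
    """Store one JSON file per text, keyed by a hash of model name and text."""

    def __init__(self, cache_dir, model_name):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.model_name = model_name

    def _path(self, text):
        digest = hashlib.sha256(f"{self.model_name}\n{text}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get_or_compute(self, text, embed_fn):
        """Return the cached vector, or call `embed_fn(text)` and store the result."""
        path = self._path(text)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        vector = [float(x) for x in embed_fn(text)]
        path.write_text(json.dumps(vector), encoding="utf-8")
        return vector


calls = []


def fake_embed(text):
    """Stand-in for a real embedding model; records each call."""
    calls.append(text)
    return [float(len(text)), 1.0]


with tempfile.TemporaryDirectory() as tmp:
    cache = EmbeddingCache(tmp, model_name="bge-m3")
    first = cache.get_or_compute("supply chain", fake_embed)
    second = cache.get_or_compute("supply chain", fake_embed)

print(first == second, len(calls))
```

Including the model name in the key means switching models never returns stale vectors; one file per keyword is simple but may need replacing with a single store for very large vocabularies.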

### Enhanced Visualizations:
1. **Interactive Networks**: Create interactive keyword co-occurrence networks
2. **Time-series Animation**: Animate temporal keyword evolution
3. **Comparative Dashboards**: Side-by-side comparison of different datasets

Once the import errors shown in Section 2 are resolved and the pipeline has been run end to end, the module can be integrated into the larger TSI-SOTA-AI research analytics platform.