# Model Analysis and Deployment Walkthrough

This notebook walks through the analysis system built around our trained BiLSTM model, which classifies real-world Myanmar news articles and explains its predictions.

## Environment Setup
**Conda Environment:** nlp  
**Purpose:** Deploy trained model for analyzing Myanmar news articles in production

## Analysis Pipeline Overview
```
Raw Article Text → Preprocessing → Tokenization → Model Prediction → Analysis Report
```

**Output:** Detailed classification with confidence scores, reasoning, and visual reports.

## 1. Model Loading and Initialization

### Production Model Setup
Load the trained BiLSTM model and all required artifacts for inference.

In [None]:
import tensorflow as tf
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import json
import os
import sys
import re
import unicodedata

class MyanmarNewsAnalyzer:
    """
    Production deployment class for Myanmar news sentiment analysis.
    
    Capabilities:
    - Load trained BiLSTM model and artifacts
    - Process raw Myanmar text articles
    - Generate detailed classification reports
    - Provide confidence scores and reasoning
    - Create visual analysis outputs
    """
    
    def __init__(self, model_dir):
        """
        Initialize analyzer with trained model artifacts.
        
        Args:
            model_dir (str): Directory containing model files:
                - bilstm_model.h5: Trained Keras model
                - tokenizer.pickle: Fitted tokenizer
                - model_params.pickle: Model configuration
        """
        self.model_dir = model_dir
        self.model = None
        self.tokenizer = None
        self.model_params = None
        self.myword_tokenizer = None
        
        # Label mappings
        self.label_mapping = {
            0: 'neutral',    # DVB - opposition/neutral
            1: 'red',        # Khitthit - critical
            2: 'green'       # Myawady - government-friendly
        }
        
        self.label_descriptions = {
            'neutral': 'Opposition/Neutral perspective (characteristic of DVB News)',
            'red': 'Critical/Independent perspective (characteristic of Khitthit News)',
            'green': 'Government-friendly perspective (characteristic of Myawady News)'
        }
        
        # Initialize components
        self._load_model_artifacts()
        self._initialize_myanmar_tokenizer()
    
    def _load_model_artifacts(self):
        """
        Load all model artifacts from disk.
        
        Critical for production deployment:
        - Model weights and architecture
        - Vocabulary mapping (tokenizer)
        - Model hyperparameters
        """
        try:
            # Load trained model
            model_path = os.path.join(self.model_dir, 'bilstm_model.h5')
            if os.path.exists(model_path):
                self.model = tf.keras.models.load_model(model_path)
                print(f"✅ Model loaded: {model_path}")
            else:
                raise FileNotFoundError(f"Model file not found: {model_path}")
            
            # Load tokenizer
            tokenizer_path = os.path.join(self.model_dir, 'tokenizer.pickle')
            if os.path.exists(tokenizer_path):
                with open(tokenizer_path, 'rb') as f:
                    self.tokenizer = pickle.load(f)
                print(f"✅ Tokenizer loaded: vocabulary size {len(self.tokenizer.word_index)}")
            else:
                raise FileNotFoundError(f"Tokenizer file not found: {tokenizer_path}")
            
            # Load model parameters
            params_path = os.path.join(self.model_dir, 'model_params.pickle')
            if os.path.exists(params_path):
                with open(params_path, 'rb') as f:
                    self.model_params = pickle.load(f)
                print(f"✅ Model parameters loaded: {self.model_params}")
            else:
                print(f"⚠️  Model parameters file not found, using defaults")
                self.model_params = {
                    'max_length': 500,
                    'vocab_size': 10000
                }
            
            print(f"\n🚀 Myanmar News Analyzer initialized successfully!")
            
        except Exception as e:
            print(f"❌ Error loading model artifacts: {e}")
            raise
    
    def _initialize_myanmar_tokenizer(self):
        """
        Initialize MyWord tokenizer for processing raw text.
        
        This is needed to convert raw articles into the same token format
        that was used during training.
        """
        try:
            # Import MyWord tokenizer
            from myword import MyWord
            self.myword_tokenizer = MyWord()
            print(f"✅ MyWord tokenizer initialized")
        except ImportError:
            print(f"⚠️  MyWord tokenizer not available, using fallback tokenization")
            self.myword_tokenizer = None
    
    def preprocess_text(self, text):
        """
        Preprocess raw Myanmar text for model input.
        
        Applies the same preprocessing pipeline used during training:
        1. Unicode normalization
        2. Character cleaning
        3. Myanmar-specific text formatting
        
        Args:
            text (str): Raw Myanmar article text
        
        Returns:
            str: Cleaned text ready for tokenization
        """
        if not text or not isinstance(text, str):
            return ""
        
        # Unicode normalization (critical for Myanmar)
        text = unicodedata.normalize('NFC', text)
        
        # Clean unwanted characters while preserving Myanmar script
        allowed_pattern = r'[^\u1000-\u109F\u0020-\u007E\u00A0-\u00FF\uAA60-\uAA7F\uA9E0-\uA9FF\u2000-\u206F\u2070-\u209F\u20A0-\u20CF]'
        text = re.sub(allowed_pattern, ' ', text)
        
        # Whitespace standardization
        text = re.sub(r'\s+', ' ', text)
        text = text.strip()
        
        # Collapse runs of punctuation to a single period. This also turns
        # ellipses ("...") into ".", so a separate ellipsis rule would never match.
        text = re.sub(r'[,.!?;:]{2,}', '.', text)
        
        return text
    
    def tokenize_text(self, text):
        """
        Tokenize Myanmar text using the same method as training.
        
        Args:
            text (str): Preprocessed Myanmar text
        
        Returns:
            list: Myanmar word tokens
        """
        if self.myword_tokenizer:
            try:
                tokens = self.myword_tokenizer.segment(text)
                return [token.strip() for token in tokens if token.strip()]
            except Exception as e:
                print(f"⚠️  MyWord tokenization failed: {e}, using fallback")
        
        # Fallback tokenization
        tokens = text.split()
        refined_tokens = []
        for token in tokens:
            subtokens = re.findall(r'[\u1000-\u109F]+|[a-zA-Z0-9]+|[.,!?]', token)
            refined_tokens.extend(subtokens)
        
        return [token for token in refined_tokens if token.strip()]

print("✅ Myanmar News Analyzer class implementation complete")
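
To see what the cleaning steps in `preprocess_text` do in isolation, the standalone sketch below reproduces the normalization, character filtering, and whitespace collapsing on a short string. `clean_text`, `DISALLOWED`, and the sample string are illustrative names, not part of the class. The character class is a shortened subset of the pattern above (Myanmar block plus printable ASCII), an assumption made for brevity.

```python
import re
import unicodedata

# Hypothetical helper mirroring the first steps of preprocess_text.
# Assumption: only the Myanmar block and printable ASCII are kept here.
DISALLOWED = re.compile(r'[^\u1000-\u109F\u0020-\u007E]')

def clean_text(text):
    if not text or not isinstance(text, str):
        return ""
    text = unicodedata.normalize('NFC', text)   # canonical composition
    text = DISALLOWED.sub(' ', text)            # everything else becomes a space
    text = re.sub(r'\s+', ' ', text).strip()    # collapse runs of whitespace
    return text

# Tabs, newlines, and the star symbol fall outside the allowed ranges
sample = "မြန်မာ\tသတင်း\n\n★ news  2024"
cleaned = clean_text(sample)
print(cleaned)
```

The tab, the newlines, and the star are all replaced by spaces and then collapsed, so the result is four space-separated tokens.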

## 2. Article Analysis Pipeline

### End-to-End Classification Process
Complete pipeline from raw article text to detailed sentiment classification.

In [None]:
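    # Note: the methods in this cell belong to MyanmarNewsAnalyzer; add them
    # to the class body above before running.
    def analyze_article(self, article_text, article_title=""):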
        """
        Analyze a single Myanmar news article.
        
        Complete analysis pipeline:
        1. Text preprocessing
        2. Myanmar tokenization
        3. Sequence preparation
        4. Model prediction
        5. Results interpretation
        
        Args:
            article_text (str): Main article content
            article_title (str): Article title (optional)
        
        Returns:
            dict: Comprehensive analysis results
        """
        if not article_text:
            return {'error': 'No article text provided'}
        
        # Step 1: Combine title and content
        full_text = f"{article_title} {article_text}".strip()
        
        # Step 2: Preprocess text
        preprocessed_text = self.preprocess_text(full_text)
        
        if len(preprocessed_text) < 20:
            return {'error': 'Article too short for meaningful analysis'}
        
        # Step 3: Tokenize
        tokens = self.tokenize_text(preprocessed_text)
        
        if len(tokens) < 5:
            return {'error': 'Insufficient tokens for classification'}
        
        # Step 4: Convert to model input format
        token_text = ' '.join(tokens)  # Space-separated tokens
        sequence = self.tokenizer.texts_to_sequences([token_text])
        
        # Step 5: Pad sequence
        max_length = self.model_params.get('max_length', 500)
        padded_sequence = tf.keras.preprocessing.sequence.pad_sequences(
            sequence, maxlen=max_length, padding='post'
        )
        
        # Step 6: Get model prediction
        prediction_proba = self.model.predict(padded_sequence, verbose=0)
        predicted_class = np.argmax(prediction_proba[0])
        confidence_scores = prediction_proba[0]
        
        # Step 7: Interpret results
        predicted_label = self.label_mapping[predicted_class]
        confidence = float(confidence_scores[predicted_class])
        
        # Calculate relative confidence (how much more confident than second choice)
        sorted_scores = np.sort(confidence_scores)[::-1]
        relative_confidence = float(sorted_scores[0] - sorted_scores[1])
        
        # Determine confidence level
        if confidence >= 0.8:
            confidence_level = 'Very High'
        elif confidence >= 0.6:
            confidence_level = 'High'
        elif confidence >= 0.4:
            confidence_level = 'Medium'
        else:
            confidence_level = 'Low'
        
        # Create analysis results
        analysis_result = {
            'input_info': {
                'title': article_title,
                'content_length': len(article_text),
                'preprocessed_length': len(preprocessed_text),
                'token_count': len(tokens),
                'analysis_timestamp': datetime.now().isoformat()
            },
            'classification': {
                'predicted_class': int(predicted_class),
                'predicted_label': predicted_label,
                'confidence': confidence,
                'confidence_level': confidence_level,
                'relative_confidence': relative_confidence
            },
            'detailed_scores': {
                'neutral': float(confidence_scores[0]),
                'red': float(confidence_scores[1]),
                'green': float(confidence_scores[2])
            },
            'interpretation': {
                'description': self.label_descriptions[predicted_label],
                'reasoning': self._generate_reasoning(predicted_label, confidence, tokens),
                'alternative_possibilities': self._get_alternative_analysis(confidence_scores)
            },
            'text_analysis': {
                'sample_tokens': tokens[:20],  # First 20 tokens for inspection
                'token_statistics': self._analyze_token_composition(tokens)
            }
        }
        
        return analysis_result
    
    def _generate_reasoning(self, predicted_label, confidence, tokens):
        """
        Generate human-readable reasoning for the classification.
        
        Provides contextual explanation of why the model made this prediction
        based on confidence levels and label characteristics.
        """
        reasoning_templates = {
            'neutral': {
                'high': 'The article exhibits characteristics typical of opposition or neutral reporting, similar to the style of DVB News. Language patterns suggest a balanced or critical perspective toward current events.',
                'medium': 'The article shows moderate alignment with neutral/opposition reporting style, though with some mixed signals that prevent higher confidence.',
                'low': 'The article has some characteristics of neutral reporting, but the classification is uncertain due to conflicting linguistic patterns.'
            },
            'red': {
                'high': 'The article strongly exhibits critical/independent journalism characteristics, similar to Khitthit News. Language suggests an investigative or questioning approach to reporting.',
                'medium': 'The article shows moderate alignment with the critical journalism style, with some elements suggesting an independent reporting perspective.',
                'low': 'The article has some critical journalism elements, but the classification confidence is limited due to mixed stylistic signals.'
            },
            'green': {
                'high': 'The article clearly demonstrates government-friendly reporting characteristics, similar to Myawady News. Language patterns suggest a supportive or official perspective.',
                'medium': 'The article shows moderate alignment with government-friendly reporting, though with some elements that introduce uncertainty.',
                'low': 'The article has some government-friendly characteristics, but mixed signals prevent confident classification.'
            }
        }
        
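        # Determine confidence category.
        # Note: these cutoffs (0.7 / 0.5) select the reasoning template only and
        # differ from the confidence_level cutoffs (0.8 / 0.6 / 0.4) in analyze_article.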
        if confidence >= 0.7:
            conf_category = 'high'
        elif confidence >= 0.5:
            conf_category = 'medium'
        else:
            conf_category = 'low'
        
        base_reasoning = reasoning_templates[predicted_label][conf_category]
        
        # Add token-based insights
        token_insight = f" The analysis was based on {len(tokens)} Myanmar language tokens extracted from the article."
        
        return base_reasoning + token_insight
    
    def _get_alternative_analysis(self, confidence_scores):
        """
        Analyze alternative classifications and their probabilities.
        
        Helps understand model uncertainty and alternative interpretations.
        """
        # Get sorted predictions
        sorted_indices = np.argsort(confidence_scores)[::-1]
        
        alternatives = []
        for i, idx in enumerate(sorted_indices):
            if i == 0:  # Skip the primary prediction
                continue
            
            label = self.label_mapping[idx]
            score = float(confidence_scores[idx])
            
            if score > 0.1:  # Only include meaningful alternatives
                alternatives.append({
                    'label': label,
                    'probability': score,
                    'description': self.label_descriptions[label]
                })
        
        return alternatives
    
    def _analyze_token_composition(self, tokens):
        """
        Analyze the composition of tokens for insight into text characteristics.
        """
        myanmar_tokens = [t for t in tokens if re.search(r'[\u1000-\u109F]', t)]
        english_tokens = [t for t in tokens if re.search(r'[a-zA-Z]', t)]
        number_tokens = [t for t in tokens if re.search(r'[0-9]', t)]
        
        return {
            'total_tokens': len(tokens),
            'myanmar_tokens': len(myanmar_tokens),
            'english_tokens': len(english_tokens),
            'number_tokens': len(number_tokens),
            'myanmar_ratio': len(myanmar_tokens) / len(tokens) if tokens else 0,
            'avg_token_length': np.mean([len(t) for t in tokens]) if tokens else 0
        }

print("✅ Article analysis pipeline implementation complete")
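
The interpretation step in `analyze_article` reduces to a few operations on the model's probability vector. Here is a minimal sketch that uses plain Python lists instead of NumPy so it runs without the model. `interpret_scores` and the example probabilities are hypothetical, while the label order and the cutoffs are copied from the code above.

```python
LABELS = {0: 'neutral', 1: 'red', 2: 'green'}  # same order as label_mapping

def interpret_scores(scores):
    """Summarize a 3-class probability vector (hypothetical helper)."""
    predicted_class = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[predicted_class]
    ranked = sorted(scores, reverse=True)
    margin = ranked[0] - ranked[1]  # "relative confidence" in analyze_article
    if confidence >= 0.8:
        level = 'Very High'
    elif confidence >= 0.6:
        level = 'High'
    elif confidence >= 0.4:
        level = 'Medium'
    else:
        level = 'Low'
    return {
        'predicted_label': LABELS[predicted_class],
        'confidence': confidence,
        'confidence_level': level,
        'relative_confidence': margin,
    }

result = interpret_scores([0.15, 0.65, 0.20])  # made-up probabilities
print(result['predicted_label'], result['confidence_level'],
      round(result['relative_confidence'], 2))
```

The margin adds information that the top score alone does not carry. For example, `[0.45, 0.44, 0.11]` is rated Medium by its top score, yet its margin of 0.01 shows the model barely prefers one label over the next.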

## 3. Batch Analysis and Reporting

### Multi-Article Analysis System
Process multiple articles and generate comprehensive analysis reports.

In [None]:
    def analyze_multiple_articles(self, articles_dir, output_dir=None):
        """
        Analyze multiple articles from a directory.
        
        Process:
        1. Load all .txt files from directory
        2. Analyze each article individually
        3. Generate aggregate statistics
        4. Create visual analysis reports
        5. Save detailed results
        
        Args:
            articles_dir (str): Directory containing article .txt files
            output_dir (str): Directory to save analysis results
        
        Returns:
            dict: Batch analysis results
        """
        if output_dir is None:
            output_dir = f"analysis_output_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        
        os.makedirs(output_dir, exist_ok=True)
        
        # Find all text files
        # Sort so that the processing order (and file_index) is reproducible
        article_files = sorted(f for f in os.listdir(articles_dir) if f.endswith('.txt'))
        
        if not article_files:
            return {'error': f'No .txt files found in {articles_dir}'}
        
        print(f"\n📂 Analyzing {len(article_files)} articles from: {articles_dir}")
        print(f"📁 Output directory: {output_dir}")
        
        # Analyze each article
        analysis_results = []
        for i, filename in enumerate(article_files):
            file_path = os.path.join(articles_dir, filename)
            
            try:
                # Read article
                with open(file_path, 'r', encoding='utf-8') as f:
                    article_text = f.read().strip()
                
                if not article_text:
                    print(f"   ⚠️  Skipping empty file: {filename}")
                    continue
                
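                # Analyze article. Note: the filename is passed as the title, so
                # analyze_article prepends it to the text that gets classified.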
                result = self.analyze_article(article_text, article_title=filename)
                
                if 'error' in result:
                    print(f"   ❌ Error analyzing {filename}: {result['error']}")
                    continue
                
                # Add file information
                result['file_info'] = {
                    'filename': filename,
                    'file_path': file_path,
                    'file_index': i
                }
                
                analysis_results.append(result)
                
                # Progress update
                predicted_label = result['classification']['predicted_label']
                confidence = result['classification']['confidence']
                print(f"   ✅ {filename}: {predicted_label} ({confidence:.3f})")
                
            except Exception as e:
                print(f"   ❌ Error processing {filename}: {e}")
        
        if not analysis_results:
            return {'error': 'No articles were successfully analyzed'}
        
        # Generate batch statistics
        batch_stats = self._generate_batch_statistics(analysis_results)
        
        # Create visualizations
        self._create_analysis_visualizations(analysis_results, batch_stats, output_dir)
        
        # Save detailed results
        results_file = os.path.join(output_dir, 'detailed_analysis.json')
        with open(results_file, 'w', encoding='utf-8') as f:
            json.dump({
                'analysis_info': {
                    'timestamp': datetime.now().isoformat(),
                    'articles_processed': len(analysis_results),
                    'source_directory': articles_dir,
                    'output_directory': output_dir
                },
                'batch_statistics': batch_stats,
                'individual_results': analysis_results
            }, f, indent=2, ensure_ascii=False)
        
        print(f"\n✅ Batch analysis complete:")
        print(f"   Articles processed: {len(analysis_results)}")
        print(f"   Results saved: {results_file}")
        print(f"   Visualizations saved: {output_dir}/")
        
        return {
            'batch_statistics': batch_stats,
            'individual_results': analysis_results,
            'output_directory': output_dir
        }
    
    def _generate_batch_statistics(self, analysis_results):
        """
        Generate aggregate statistics from batch analysis.
        
        Statistics include:
        - Label distribution
        - Confidence score analysis
        - Text characteristics
        - Quality metrics
        """
        # Extract key metrics
        labels = [r['classification']['predicted_label'] for r in analysis_results]
        confidences = [r['classification']['confidence'] for r in analysis_results]
        token_counts = [r['input_info']['token_count'] for r in analysis_results]
        
        # Label distribution
        from collections import Counter
        label_counts = Counter(labels)
        
        # Confidence analysis by label
        confidence_by_label = {}
        for label in ['neutral', 'red', 'green']:
            label_confidences = [r['classification']['confidence'] 
                               for r in analysis_results 
                               if r['classification']['predicted_label'] == label]
            if label_confidences:
                confidence_by_label[label] = {
                    'count': len(label_confidences),
                    'mean_confidence': np.mean(label_confidences),
                    'std_confidence': np.std(label_confidences),
                    'min_confidence': np.min(label_confidences),
                    'max_confidence': np.max(label_confidences)
                }
        
        return {
            'total_articles': len(analysis_results),
            'label_distribution': dict(label_counts),
            'label_percentages': {label: count/len(analysis_results)*100 
                                for label, count in label_counts.items()},
            'confidence_statistics': {
                'overall_mean': np.mean(confidences),
                'overall_std': np.std(confidences),
                'by_label': confidence_by_label
            },
            'text_statistics': {
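                # Cast to built-in types: np.min/np.max on a list of ints return
                # NumPy integers, which json.dump cannot serialize.
                'mean_token_count': float(np.mean(token_counts)),
                'std_token_count': float(np.std(token_counts)),
                'min_token_count': int(np.min(token_counts)),
                'max_token_count': int(np.max(token_counts))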
            },
            'quality_metrics': {
                'high_confidence_count': sum(1 for c in confidences if c >= 0.8),
                'medium_confidence_count': sum(1 for c in confidences if 0.5 <= c < 0.8),
                'low_confidence_count': sum(1 for c in confidences if c < 0.5),
                'avg_relative_confidence': np.mean([r['classification']['relative_confidence'] 
                                                  for r in analysis_results])
            }
        }
    
    def _create_analysis_visualizations(self, analysis_results, batch_stats, output_dir):
        """
        Create comprehensive visualization reports.
        
        Generates:
        1. Label distribution pie chart
        2. Confidence score distributions
        3. Confidence by label box plots
        4. Text length analysis
        """
        # Set up the plotting style
        plt.style.use('default')
        sns.set_palette("Set2")
        
        # Create a comprehensive figure
        fig = plt.figure(figsize=(16, 12))
        
        # 1. Label Distribution Pie Chart
        ax1 = plt.subplot(2, 3, 1)
        labels = list(batch_stats['label_distribution'].keys())
        counts = list(batch_stats['label_distribution'].values())
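        # Look colors up by label so each wedge keeps its color regardless of
        # the order in which labels appear in label_distribution
        pie_colors = {'neutral': '#FFB6C1', 'red': '#FF6B6B', 'green': '#90EE90'}
        colors = [pie_colors[label] for label in labels]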
        
        wedges, texts, autotexts = ax1.pie(counts, labels=labels, autopct='%1.1f%%', 
                                          colors=colors, startangle=90)
        ax1.set_title('Article Classification Distribution', fontsize=12, fontweight='bold')
        
        # 2. Confidence Score Histogram
        ax2 = plt.subplot(2, 3, 2)
        confidences = [r['classification']['confidence'] for r in analysis_results]
        ax2.hist(confidences, bins=20, alpha=0.7, color='skyblue', edgecolor='black')
        ax2.set_xlabel('Confidence Score')
        ax2.set_ylabel('Number of Articles')
        ax2.set_title('Confidence Score Distribution', fontsize=12, fontweight='bold')
        ax2.axvline(np.mean(confidences), color='red', linestyle='--', 
                    label=f'Mean: {np.mean(confidences):.3f}')
        ax2.legend()
        
        # 3. Confidence by Label Box Plot
        ax3 = plt.subplot(2, 3, 3)
        label_conf_data = []
        label_names = []
        for label in ['neutral', 'red', 'green']:
            label_confidences = [r['classification']['confidence'] 
                               for r in analysis_results 
                               if r['classification']['predicted_label'] == label]
            if label_confidences:
                label_conf_data.append(label_confidences)
                label_names.append(label)
        
        if label_conf_data:
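            ax3.boxplot(label_conf_data)
            # Set tick labels separately: the `labels` keyword of boxplot is
            # deprecated in recent Matplotlib releases
            ax3.set_xticklabels(label_names)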
            ax3.set_ylabel('Confidence Score')
            ax3.set_title('Confidence by Label', fontsize=12, fontweight='bold')
        
        # 4. Token Count Distribution
        ax4 = plt.subplot(2, 3, 4)
        token_counts = [r['input_info']['token_count'] for r in analysis_results]
        ax4.hist(token_counts, bins=15, alpha=0.7, color='lightgreen', edgecolor='black')
        ax4.set_xlabel('Token Count')
        ax4.set_ylabel('Number of Articles')
        ax4.set_title('Article Length Distribution', fontsize=12, fontweight='bold')
        
        # 5. Confidence vs Token Count Scatter
        ax5 = plt.subplot(2, 3, 5)
        confidences = [r['classification']['confidence'] for r in analysis_results]
        token_counts = [r['input_info']['token_count'] for r in analysis_results]
        labels = [r['classification']['predicted_label'] for r in analysis_results]
        
        # Color by label
        color_map = {'neutral': 'pink', 'red': 'red', 'green': 'green'}
        colors = [color_map[label] for label in labels]
        
        ax5.scatter(token_counts, confidences, c=colors, alpha=0.6)
        ax5.set_xlabel('Token Count')
        ax5.set_ylabel('Confidence Score')
        ax5.set_title('Confidence vs Article Length', fontsize=12, fontweight='bold')
        
        # 6. Summary Statistics Text
        ax6 = plt.subplot(2, 3, 6)
        ax6.axis('off')
        
        stats_text = f"""
        Analysis Summary
        ________________
        
        Total Articles: {batch_stats['total_articles']}
        
        Label Distribution:
        • Neutral: {batch_stats['label_distribution'].get('neutral', 0)} 
          ({batch_stats['label_percentages'].get('neutral', 0):.1f}%)
        • Red: {batch_stats['label_distribution'].get('red', 0)} 
          ({batch_stats['label_percentages'].get('red', 0):.1f}%)
        • Green: {batch_stats['label_distribution'].get('green', 0)} 
          ({batch_stats['label_percentages'].get('green', 0):.1f}%)
        
        Confidence:
        • Mean: {batch_stats['confidence_statistics']['overall_mean']:.3f}
        • Std: {batch_stats['confidence_statistics']['overall_std']:.3f}
        
        Quality:
        • High Confidence (≥0.8): {batch_stats['quality_metrics']['high_confidence_count']}
        • Medium Confidence (0.5-0.8): {batch_stats['quality_metrics']['medium_confidence_count']}
        • Low Confidence (<0.5): {batch_stats['quality_metrics']['low_confidence_count']}
        """
        
        ax6.text(0.1, 0.9, stats_text, transform=ax6.transAxes, fontsize=10,
                verticalalignment='top', fontfamily='monospace')
        
        # Add main title
        fig.suptitle('Myanmar News Classification Analysis Report', 
                    fontsize=16, fontweight='bold', y=0.98)
        
        # Adjust layout and save
        plt.tight_layout()
        plt.subplots_adjust(top=0.93)
        
        # Save the visualization
        viz_path = os.path.join(output_dir, 'visual_analysis_report.png')
        plt.savefig(viz_path, dpi=300, bbox_inches='tight')
        plt.close()
        
        print(f"   📊 Visualization saved: {viz_path}")

print("✅ Batch analysis and reporting implementation complete")
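
The aggregation in `_generate_batch_statistics` can be tried on hand-written results before running a full batch. The sketch below uses only the standard library and made-up result dicts that follow the structure returned by `analyze_article`. `summarize` is a hypothetical helper covering a subset of the statistics; it uses `statistics.pstdev` because `np.std` computes the population standard deviation by default.

```python
import json
from collections import Counter
from statistics import mean, pstdev

def summarize(results):
    """Aggregate label counts and confidence statistics (hypothetical helper)."""
    labels = [r['classification']['predicted_label'] for r in results]
    confidences = [r['classification']['confidence'] for r in results]
    counts = Counter(labels)
    return {
        'total_articles': len(results),
        'label_distribution': dict(counts),
        'label_percentages': {k: v / len(results) * 100 for k, v in counts.items()},
        'overall_mean': mean(confidences),
        'overall_std': pstdev(confidences),
        'high_confidence_count': sum(1 for c in confidences if c >= 0.8),
    }

# Made-up results for illustration only
fake_results = [
    {'classification': {'predicted_label': 'red', 'confidence': 0.9}},
    {'classification': {'predicted_label': 'red', 'confidence': 0.7}},
    {'classification': {'predicted_label': 'green', 'confidence': 0.5}},
    {'classification': {'predicted_label': 'neutral', 'confidence': 0.9}},
]
stats = summarize(fake_results)
print(json.dumps(stats, indent=2))  # all values are built-in types, so this serializes
```

Because every value is a built-in `int` or `float`, the summary can be written to JSON without any conversion step.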

## 4. Production Deployment Example

### Complete Analysis Workflow
Demonstration of the full analysis system in action.

In [None]:
def run_myanmar_news_analysis(model_dir, test_articles_dir):
    """
    Complete production analysis workflow demonstration.
    
    This function shows how to:
    1. Initialize the analyzer with trained model
    2. Process individual articles
    3. Run batch analysis on multiple articles
    4. Generate comprehensive reports
    
    Args:
        model_dir (str): Directory with trained model artifacts
        test_articles_dir (str): Directory with test articles (.txt files)
    
    Returns:
        dict: Complete analysis results
    """
    print(f"🇲🇲 Myanmar News Analysis System")
    print(f"=" * 50)
    
    # Step 1: Initialize analyzer
    print(f"\n📥 Step 1: Loading trained model...")
    try:
        analyzer = MyanmarNewsAnalyzer(model_dir)
    except Exception as e:
        print(f"❌ Failed to initialize analyzer: {e}")
        return {'error': f'Initialization failed: {e}'}
    
    # Step 2: Single article demonstration (if any article exists)
    print(f"\n🔍 Step 2: Single article analysis demonstration...")
    sample_files = [f for f in os.listdir(test_articles_dir) if f.endswith('.txt')][:1]
    
    if sample_files:
        sample_file = os.path.join(test_articles_dir, sample_files[0])
        with open(sample_file, 'r', encoding='utf-8') as f:
            sample_text = f.read().strip()
        
        if sample_text:
            print(f"   📄 Analyzing sample article: {sample_files[0]}")
            single_result = analyzer.analyze_article(sample_text, sample_files[0])
            
            if 'error' not in single_result:
                classification = single_result['classification']
                print(f"   📊 Result: {classification['predicted_label']} ")
                print(f"       Confidence: {classification['confidence']:.3f} ({classification['confidence_level']})")
                print(f"       Reasoning: {single_result['interpretation']['reasoning'][:100]}...")
            else:
                print(f"   ❌ Single article analysis failed: {single_result['error']}")
    
    # Step 3: Batch analysis
    print(f"\n📂 Step 3: Batch analysis...")
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    output_dir = f"analysis_{timestamp}"
    
    batch_results = analyzer.analyze_multiple_articles(test_articles_dir, output_dir)
    
    if 'error' not in batch_results:
        batch_stats = batch_results['batch_statistics']
        
        print(f"\n📋 Step 4: Analysis Summary")
        print(f"   Total articles processed: {batch_stats['total_articles']}")
        print(f"   Label distribution:")
        for label, percentage in batch_stats['label_percentages'].items():
            count = batch_stats['label_distribution'][label]
            print(f"     {label}: {count} articles ({percentage:.1f}%)")
        
        print(f"   Average confidence: {batch_stats['confidence_statistics']['overall_mean']:.3f}")
        print(f"   High confidence predictions: {batch_stats['quality_metrics']['high_confidence_count']}")
        
        print(f"\n📁 Output files:")
        print(f"   Detailed results: {output_dir}/detailed_analysis.json")
        print(f"   Visual report: {output_dir}/visual_analysis_report.png")
        
        return {
            'status': 'success',
            'batch_results': batch_results,
            'output_directory': output_dir
        }
    else:
        print(f"   ❌ Batch analysis failed: {batch_results['error']}")
        return {'error': batch_results['error']}

# Example usage demonstration
def demonstrate_analysis_system():
    """
    Demonstration of how to use the analysis system.
    
    This shows the typical workflow for production deployment.
    """
    print("📚 Myanmar News Analysis System Demonstration")
    print("=" * 55)
    
    # Configuration (adjust paths as needed)
    model_directory = "../00_final_model"  # Directory with trained model
    test_articles_directory = "../data/model_tester/processed"  # Test articles
    
    # Check if directories exist
    if not os.path.exists(model_directory):
        print(f"❌ Model directory not found: {model_directory}")
        return
    
    if not os.path.exists(test_articles_directory):
        print(f"❌ Test articles directory not found: {test_articles_directory}")
        return
    
    # Run the complete analysis
    results = run_myanmar_news_analysis(model_directory, test_articles_directory)
    
    if results.get('status') == 'success':
        print(f"\n✅ Analysis completed successfully!")
        print(f"\n💡 Next steps:")
        print(f"   1. Review detailed analysis: {results['output_directory']}/detailed_analysis.json")
        print(f"   2. Examine visual report: {results['output_directory']}/visual_analysis_report.png")
        print(f"   3. Use individual article results for further processing")
        print(f"   4. Integrate with production systems as needed")
    else:
        print(f"\n❌ Analysis failed. Check error messages above.")

# Uncomment to run demonstration
# demonstrate_analysis_system()

print("✅ Production deployment example complete")

## 5. Analysis Interpretation Guide

### Understanding Model Outputs

**Classification Labels:**
- **Neutral (0):** Opposition/neutral perspective, characteristic of DVB News
- **Red (1):** Critical/independent perspective, characteristic of Khitthit News  
- **Green (2):** Government-friendly perspective, characteristic of Myawady News

**Confidence Levels:**
- **Very High (≥0.8):** Strong confidence, clear stylistic markers
- **High (0.6-0.8):** Good confidence, typical language patterns detected
- **Medium (0.4-0.6):** Moderate confidence, mixed signals present
- **Low (<0.4):** Uncertain classification, article may be atypical

**Analysis Quality Indicators:**
- **Token Count:** More tokens generally improve accuracy (minimum 20 recommended; the pipeline itself rejects articles with fewer than 5 tokens)
- **Myanmar Script Ratio:** Higher Myanmar content typically yields better results
- **Relative Confidence:** A large gap between the top two class probabilities indicates a more decisive prediction

### Practical Applications

**Media Monitoring:**
- Track sentiment trends across Myanmar news sources
- Identify bias patterns in reporting
- Monitor narrative changes over time

**Research Applications:**
- Analyze political discourse in Myanmar media
- Study propaganda and information warfare
- Compare government vs. opposition messaging

**Quality Assessment:**
- High confidence predictions are suitable for automated processing
- Medium confidence predictions may need human review
- Low confidence predictions require manual verification
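
One way to apply these review rules is to route each result by its confidence score. This is a sketch, not part of the analyzer: `route_prediction` and the queue names are hypothetical, and the 0.8 and 0.5 cutoffs are borrowed from the `quality_metrics` buckets in `_generate_batch_statistics`, so adjust them to your own tolerance for errors.

```python
def route_prediction(result, high=0.8, low=0.5):
    """Return 'auto', 'review', or 'manual' for one analyze_article result."""
    confidence = result['classification']['confidence']
    if confidence >= high:
        return 'auto'      # accept without review
    if confidence >= low:
        return 'review'    # queue for a human spot check
    return 'manual'        # verify by hand

# Made-up results for illustration only
batch = [
    {'file_info': {'filename': 'a.txt'}, 'classification': {'confidence': 0.91}},
    {'file_info': {'filename': 'b.txt'}, 'classification': {'confidence': 0.62}},
    {'file_info': {'filename': 'c.txt'}, 'classification': {'confidence': 0.41}},
]
queues = {'auto': [], 'review': [], 'manual': []}
for item in batch:
    queues[route_prediction(item)].append(item['file_info']['filename'])
print(queues)
```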

### Model Limitations

**Training Data Constraints:**
- Model trained on specific news sources (may not generalize to other outlets)
- Historical data (may not reflect current events/language changes)
- Limited domain coverage (news articles only)

**Technical Limitations:**
- Relies on the MyWord tokenizer; without it, the regex fallback in `tokenize_text` may produce tokens that differ from those seen in training
- Optimal performance on 100+ token articles
- Sensitive to text preprocessing quality
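
The MyWord dependency matters because the fallback path splits text very differently from a word segmenter. The standalone sketch below copies the fallback logic from `tokenize_text` (the function name `fallback_tokenize` and the sample string are illustrative) and shows that a Myanmar phrase written without spaces comes back as a single token, which the fitted vocabulary may not contain.

```python
import re

def fallback_tokenize(text):
    """Whitespace split followed by script-based regex splitting."""
    tokens = []
    for chunk in text.split():
        tokens.extend(re.findall(r'[\u1000-\u109F]+|[a-zA-Z0-9]+|[.,!?]', chunk))
    return tokens

# A Myanmar phrase written without spaces, followed by Latin text and digits
tokens = fallback_tokenize("မြန်မာသတင်း news 2024!")
print(tokens)
print(len(tokens))
```

The whole Myanmar run stays in one token because the regex only breaks on script boundaries, not on word boundaries.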

**Interpretation Caveats:**
- Classifications reflect source bias, not absolute truth
- Model detects stylistic patterns, not factual accuracy
- Individual articles may deviate from source norms

This analysis system supports the study of the Myanmar media landscape through automated sentiment classification, with an explanation and a confidence assessment for each prediction.