# 🚀 Twitter Sentiment Analysis - Clean Architecture Overview

**Author**: Lakshya Khetan  
**Email**: lakshyaketan00@gmail.com  
**Project**: Twitter Sentiment Analysis for Indian Elections  
**Version**: 2.0 (Restructured)

---

## 📖 Overview

This notebook walks through the restructured Twitter Sentiment Analysis project, which now follows **clean architecture principles**. What used to be a collection of monolithic notebooks is now a **production-ready, modular Python package**, with lightweight notebooks on top for analysis.

### 🎯 Key Improvements:
- ✅ **Modular Architecture**: Separated concerns into logical modules
- ✅ **Configuration Management**: YAML-based configuration with environment variables
- ✅ **Professional Logging**: Structured logging with file and console output
- ✅ **Type Hints**: Full type annotations for better code quality
- ✅ **Error Handling**: Comprehensive error handling and validation
- ✅ **Packaging**: Proper Python package with setup.py
- ✅ **Documentation**: Comprehensive docstrings and README
- ✅ **Testing Ready**: Structure prepared for unit and integration tests

## 🏗️ New Project Structure

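The code is organized into data, model, and utility layers under `src/`. Configuration, notebooks, tests, and generated artifacts (data, saved models, results) each live in their own top-level directory: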

```
Twitter-Sentiment-Analysis-for-Indian-Elections/
├── 📦 src/                          # Source code (production)
│   ├── 📁 data/                     # Data processing modules
│   │   ├── collector.py             # Twitter API data collection
│   │   ├── preprocessor.py          # Text preprocessing & sentiment
│   │   └── __init__.py
│   ├── 📁 models/                   # Machine learning models
│   │   ├── sentiment_models.py      # GloVe & LSTM model classes
│   │   ├── predictor.py             # Prediction engine
│   │   └── __init__.py
│   ├── 📁 utils/                    # Utility modules
│   │   ├── config.py                # Configuration management
│   │   ├── logger.py                # Logging utilities
│   │   ├── visualization.py         # Plotting & visualization
│   │   └── __init__.py
│   └── __init__.py
├── 📁 config/                       # Configuration files
│   ├── config.yaml                  # Main configuration
│   └── .env.template                # Environment variables template
├── 📁 notebooks/                    # Clean analysis notebooks
│   ├── 01_project_overview.ipynb    # This notebook
│   ├── 02_data_collection.ipynb     # Data collection demo
│   ├── 03_model_training.ipynb      # Model training
│   └── 04_predictions.ipynb         # Making predictions
├── 📁 tests/                        # Unit & integration tests
├── 📁 docs/                         # Documentation
├── 📁 scripts/                      # Automation scripts
├── 📁 data/                         # Data storage
├── 📁 models/                       # Saved models
├── 📁 results/                      # Output files
├── setup.py                         # Package setup
├── requirements.txt                 # Dependencies
└── README.md                        # Project documentation
```

In [None]:
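# Import system modules
import sys
from pathlib import Path

# Jupyter starts the kernel in the notebook's folder (notebooks/), so the
# project root is one level up. Path(".").parent is just ".", which would
# point at notebooks/src instead.
project_root = Path.cwd().resolve().parent
src_path = project_root / "src"

# Put src/ first on the import path so `utils`, `data` and `models` resolve
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"🚀 Project Root: {project_root}")
print(f"📦 Source Path: {src_path}")
print("✅ Python path updated for modular imports")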

In [None]:
# Import our clean, modular components
from utils.config import config
from utils.logger import get_logger, setup_logging
from data.collector import TwitterDataCollector
from data.preprocessor import TextPreprocessor
from models.sentiment_models import ModelFactory, ModelTrainer
from models.predictor import ElectionPredictor
from utils.visualization import SentimentVisualizer

# Setup logging
setup_logging(log_level="INFO")
logger = get_logger("notebook")

logger.info("🎯 All modules imported successfully!")
logger.info(f"📋 Configuration loaded: {len(config.config)} sections")

print("✅ Clean Architecture Components Loaded:")
print("   🔧 Configuration Management")
print("   📊 Data Collection & Preprocessing") 
print("   🤖 Machine Learning Models")
print("   🔮 Prediction Engine")
print("   📈 Visualization Tools")
print("   📝 Structured Logging")

## 🔧 Configuration Management Demo

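Settings live in **YAML configuration files** (`config/config.yaml`). Credentials such as the Twitter API keys are supplied through **environment variables** (template: `config/.env.template`) and substituted into the configuration at load time, so secrets never need to be committed to the repository.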
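
Below is a minimal sketch of how `${VAR}` placeholders in a loaded YAML mapping could be resolved from the environment. Several details are assumptions for illustration, not the project's actual `utils/config.py` code:

- the placeholder syntax;
- the `${VAR:-default}` fallback form;
- the helper name `substitute_env`.

Parsing the file itself (for example with PyYAML's `yaml.safe_load`) is assumed to have happened already.

```python
import os
import re
from typing import Any, Mapping

# Matches ${VAR} or ${VAR:-default}; this placeholder syntax is an assumption
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute_env(node: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${VAR} placeholders in strings inside dicts and lists."""
    if isinstance(node, dict):
        return {key: substitute_env(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute_env(item, env) for item in node]
    if isinstance(node, str):
        # Unset variables fall back to the default, or to an empty string
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2) or ""), node)
    return node  # numbers, booleans and None are left untouched


# Example: a mapping as it might look right after yaml.safe_load(...)
raw = {
    "twitter": {"api_key": "${TWITTER_API_KEY}", "timeout": 30},
    "logging": {"level": "${LOG_LEVEL:-INFO}"},
}
resolved = substitute_env(raw, env={"TWITTER_API_KEY": "abc123"})
print(resolved)
# {'twitter': {'api_key': 'abc123', 'timeout': 30}, 'logging': {'level': 'INFO'}}
```

Passing `env` explicitly, as in the example, keeps the function testable without touching the real process environment.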

In [None]:
# Demonstrate configuration management
print("🔧 Configuration Management Features:")
print("="*50)

# Show configuration structure
print(f"📋 Configuration Sections: {list(config.config.keys())}")
print()

# Show Twitter API config without printing any part of the credentials
twitter_config = config.get_twitter_config()
print("🐦 Twitter API Configuration:")
for key, value in twitter_config.items():
    print(f"   {key}: {'set (hidden)' if value else 'Not Set'}")
print()

# Show model configurations
print("🤖 Available Model Configurations:")
for model_name in ['glove', 'lstm']:
    model_config = config.get_model_config(model_name)
    print(f"   {model_name.upper()}: {len(model_config)} parameters")
print()

# Show data paths
data_paths = config.get_data_paths()
print("📁 Data Paths:")
for path_type, path in data_paths.items():
    print(f"   {path_type}: {path}")
print()

# Configuration flexibility demo
print("⚙️ Dynamic Configuration Access:")
print(f"   Max Features: {config.get('models.glove.max_features', 'default')}")
print(f"   Learning Rate: {config.get('models.lstm.training.learning_rate', 'default')}")
print(f"   Batch Size: {config.get('models.lstm.training.batch_size', 'default')}")

logger.info("Configuration management demonstration completed")
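
The dotted keys passed to `config.get()` above (for example `'models.lstm.training.batch_size'`) suggest a lookup that walks nested dictionaries one segment at a time. Below is a minimal sketch of that idea. It assumes the configuration is held as plain nested dicts; `get_dotted` is a hypothetical helper, not necessarily how `utils/config.py` implements it, and the values are made up.

```python
from typing import Any, Mapping


def get_dotted(cfg: Mapping[str, Any], key: str, default: Any = None) -> Any:
    """Resolve a key such as 'models.lstm.training.batch_size' in nested dicts."""
    node: Any = cfg
    for part in key.split("."):
        # Stop at the first missing segment or non-dict intermediate value
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical configuration values, for illustration only
cfg = {"models": {"lstm": {"training": {"batch_size": 64, "learning_rate": 0.001}}}}
print(get_dotted(cfg, "models.lstm.training.batch_size"))       # 64
print(get_dotted(cfg, "models.glove.max_features", "default"))  # default
```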

## 📊 Data Processing Pipeline Demo

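Data handling is **modularized** into reusable components with a **clear separation of concerns**:

- `TwitterDataCollector` fetches tweets.
- `TextPreprocessor` cleans them, assigns sentiment labels, and prepares them for modelling.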
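
Below is a minimal sketch of the kind of tweet cleaning that `TextPreprocessor.clean_text()` applies to the sample tweets in the next cell. The exact rules are assumptions inferred from those samples, not the module's confirmed behaviour:

- drop URLs, `@mentions` and `RT` markers;
- lowercase the text;
- strip `#`, punctuation and emoji while keeping the hashtag words.

`clean_tweet` is a hypothetical name.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RT_RE = re.compile(r"\bRT\b")
NON_WORD_RE = re.compile(r"[^\w\s]")  # punctuation, '#', emoji; keeps Unicode letters
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Hypothetical cleaning pipeline; the real clean_text() rules may differ."""
    text = URL_RE.sub(" ", text)      # drop links first so their punctuation goes too
    text = MENTION_RE.sub(" ", text)  # drop @user handles
    text = RT_RE.sub(" ", text)       # drop retweet markers (case-sensitive 'RT')
    text = NON_WORD_RE.sub("", text.lower())  # remove '#' and symbols, keep hashtag words
    return SPACES_RE.sub(" ", text).strip()


print(clean_tweet("Not happy with current govt policies RT @someone"))
# not happy with current govt policies
print(clean_tweet("Modi is doing great work for the country! #BJP #Development 😊"))
# modi is doing great work for the country bjp development
```

Using `[^\w\s]` rather than an ASCII-only filter keeps words in non-Latin scripts, which matters for tweets that mix English with Hindi.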

In [None]:
# Demonstrate text preprocessing capabilities
preprocessor = TextPreprocessor()

# Sample tweets for demonstration
sample_tweets = [
    "Modi is doing great work for the country! #BJP #Development 😊",
    "Not happy with current govt policies RT @someone",
    "Congress party has better vision for India's future https://example.com",
    "Election 2024 will be interesting to watch #Democracy"
]

print("🧹 Text Preprocessing Pipeline Demo:")
print("="*50)

for i, tweet in enumerate(sample_tweets, 1):
    print(f"\n📝 Tweet {i}:")
    print(f"   Original: {tweet}")
    
    # Clean the text
    clean_text = preprocessor.clean_text(tweet)
    print(f"   Cleaned:  {clean_text}")
    
    # Analyze sentiment; binary_sentiment is truthy for positive tweets
    sentiment = preprocessor.analyze_sentiment(clean_text)
    label = "Positive" if sentiment['binary_sentiment'] else "Negative"
    print(f"   Sentiment: {label} (polarity: {sentiment['polarity']:.3f})")

print("\n✅ Preprocessing Features:")
print("   🔧 Configurable text cleaning")
print("   🎯 Automated sentiment analysis")
print("   📊 Binary classification ready")
print("   🔤 Tokenization support")
print("   ⚖️ Data balancing utilities")

logger.info("Text preprocessing demonstration completed")
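
The preprocessing summary lists data balancing utilities. One common approach is random undersampling of the majority class, sketched below for binary labels. The notebook does not say whether `TextPreprocessor` uses undersampling, oversampling, or something else, so treat `undersample` as a hypothetical helper.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def undersample(texts: Sequence[str], labels: Sequence[int],
                seed: int = 42) -> Tuple[List[str], List[int]]:
    """Randomly drop majority-class rows until every class has the same count."""
    by_label: Dict[int, List[str]] = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    smallest = min(len(rows) for rows in by_label.values())

    rng = random.Random(seed)  # local RNG keeps the result reproducible
    pairs = []
    for label, rows in by_label.items():
        pairs.extend((text, label) for text in rng.sample(rows, smallest))
    rng.shuffle(pairs)  # mix the classes instead of leaving them in blocks
    return [text for text, _ in pairs], [label for _, label in pairs]


demo_texts = [f"tweet {i}" for i in range(10)]
demo_labels = [0] * 7 + [1] * 3  # 7 negative, 3 positive
bal_texts, bal_labels = undersample(demo_texts, demo_labels)
print(sorted(Counter(bal_labels).items()))  # [(0, 3), (1, 3)]
```

Undersampling discards data. With heavily skewed political tweets, oversampling or class weights in the loss may be preferable.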

## 🤖 Model Architecture Demo

Models follow an **object-oriented design**:

- `ModelFactory` creates a GloVe- or LSTM-based model from its name.
- `ModelTrainer` runs the **training pipeline**: history tracking, callbacks, model saving, and evaluation.
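
The `ModelFactory.create_model(model_type)` calls in the next cell suggest a name-to-class lookup. Below is a minimal, framework-free sketch of a registry-based factory under that assumption. Everything here is hypothetical: `SentimentModelBase`, `SimpleModelFactory`, the `register` decorator, `build()`, and the default parameter values. The real classes in `sentiment_models.py` build actual GloVe and LSTM networks. The names are chosen so they do not shadow the project's imported `ModelFactory`.

```python
from typing import Callable, Dict, Type


class SentimentModelBase:
    """Shared interface assumed for every model type."""

    def __init__(self, **params) -> None:
        self.params = params

    def build(self) -> str:
        raise NotImplementedError


class SimpleModelFactory:
    """Maps a model name to a registered SentimentModelBase subclass."""

    _registry: Dict[str, Type[SentimentModelBase]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[SentimentModelBase]], Type[SentimentModelBase]]:
        def decorator(model_cls: Type[SentimentModelBase]) -> Type[SentimentModelBase]:
            cls._registry[name] = model_cls
            return model_cls
        return decorator

    @classmethod
    def create_model(cls, name: str, **params) -> SentimentModelBase:
        model_cls = cls._registry.get(name.lower())
        if model_cls is None:
            raise ValueError(f"Unknown model type {name!r}; available: {sorted(cls._registry)}")
        return model_cls(**params)


@SimpleModelFactory.register("glove")
class GloveSketch(SentimentModelBase):
    def build(self) -> str:
        return f"GloVe embeddings (dim={self.params.get('embedding_dim', 100)}) + dense layers"


@SimpleModelFactory.register("lstm")
class LSTMSketch(SentimentModelBase):
    def build(self) -> str:
        return f"LSTM with {self.params.get('lstm_units', 64)} units"


# New model types only need a registered subclass; the factory itself is unchanged
sketch_model = SimpleModelFactory.create_model("lstm", lstm_units=128)
print(type(sketch_model).__name__, "->", sketch_model.build())  # LSTMSketch -> LSTM with 128 units
```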

In [None]:
# Demonstrate model architecture capabilities
print("🤖 Model Architecture Features:")
print("="*50)

# Show available models
print("📋 Available Model Types:")
available_models = ['glove', 'lstm']
for model_type in available_models:
    try:
        model = ModelFactory.create_model(model_type)
        config_params = len(config.get_model_config(model_type))
        print(f"   ✅ {model_type.upper()}: {model.__class__.__name__} ({config_params} config params)")
    except Exception as e:
        print(f"   ❌ {model_type.upper()}: Error - {e}")

print()

# Demonstrate model trainer capabilities
trainer = ModelTrainer()
print("🏋️ Model Trainer Features:")
print("   📊 Multi-model comparison support")
print("   📈 Training history tracking")
print("   💾 Automatic model saving")
print("   🔧 Configurable callbacks")
print("   📋 Performance evaluation")

print()

# Show configuration for LSTM model
lstm_config = config.get_model_config('lstm')
print("⚙️ LSTM Model Configuration:")
print(f"   Embedding Dimension: {lstm_config.get('embedding_dim', 'default')}")
print(f"   LSTM Units: {lstm_config.get('lstm_units', 'default')}")
print(f"   Bidirectional: {lstm_config.get('architecture', {}).get('bidirectional', 'default')}")
print(f"   Batch Size: {lstm_config.get('training', {}).get('batch_size', 'default')}")
print(f"   Learning Rate: {lstm_config.get('training', {}).get('learning_rate', 'default')}")

print()

# Show GloVe configuration
glove_config = config.get_model_config('glove')
print("⚙️ GloVe Model Configuration:")
print(f"   Embedding Dimension: {glove_config.get('embedding_dim', 'default')}")
print(f"   Max Features: {glove_config.get('max_features', 'default')}")
print(f"   Trainable Embedding: {glove_config.get('trainable', 'default')}")
print(f"   Dense Layers: {glove_config.get('architecture', {}).get('dense_layers', 'default')}")

logger.info("Model architecture demonstration completed")

## 📈 Visualization System Demo

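`SentimentVisualizer` produces the project's charts, including sentiment distributions, party comparisons, training curves, confusion matrices, and word clouds. Their **consistent styling** is read from the `visualization` section of the configuration, and the charts can be assembled into **automated reports**.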
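
The `sample_data` table in the next cell has one row per party, with counts and percentages. Below is a minimal sketch of how such rows could be derived from per-tweet binary labels (1 = positive, 0 = negative). `party_summary` is a hypothetical helper. The labels below simply reproduce the illustrative BJP counts from that table; they are not real results.

```python
from typing import Dict, Sequence


def party_summary(party: str, labels: Sequence[int]) -> Dict[str, object]:
    """Aggregate binary sentiment labels (1 = positive) into one comparison row."""
    total = len(labels)
    positive = sum(labels)
    negative = total - positive
    return {
        "Party": party,
        "Total Tweets": total,
        "Positive Tweets": positive,
        "Negative Tweets": negative,
        # Shares of the total, rounded to one decimal place as in the demo table
        "Positive %": round(100 * positive / total, 1) if total else 0.0,
        "Negative %": round(100 * negative / total, 1) if total else 0.0,
    }


bjp_labels = [1] * 660 + [0] * 1340  # reproduces the illustrative BJP counts
print(party_summary("BJP", bjp_labels))
# {'Party': 'BJP', 'Total Tweets': 2000, 'Positive Tweets': 660, 'Negative Tweets': 1340, 'Positive %': 33.0, 'Negative %': 67.0}
```

A list of such rows can be passed to `pd.DataFrame(rows)` to build a table shaped like `sample_data`.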

In [None]:
# Demonstrate visualization capabilities
import pandas as pd
import matplotlib.pyplot as plt

visualizer = SentimentVisualizer()

print("📈 Visualization System Features:")
print("="*50)

# Illustrative, hand-entered figures for the demo (not model output).
# Percentages are shares of 'Total Tweets', e.g. 660 / 2000 = 33.0%.
sample_data = pd.DataFrame({
    'Party': ['BJP', 'Congress'],
    'Total Tweets': [2000, 2000],
    'Positive Tweets': [660, 416],
    'Negative Tweets': [1340, 1584],
    'Positive %': [33.0, 20.8],
    'Negative %': [67.0, 79.2],
    'Avg Probability': [0.45, 0.38]
})

print("🎨 Available Visualization Types:")
print("   📊 Sentiment distribution charts")
print("   🔄 Party comparison plots")
print("   📈 Training history graphs")
print("   🎯 Confusion matrix heatmaps")
print("   ☁️ Word clouds")
print("   📋 Comprehensive reports")

print()
print("⚙️ Visualization Configuration:")
viz_config = config.get('visualization', {})
print(f"   Figure Size: {viz_config.get('figure_size', 'default')}")
print(f"   DPI: {viz_config.get('dpi', 'default')}")
print(f"   Style: {viz_config.get('style', 'default')}")
print(f"   Color Palette: {viz_config.get('color_palette', 'default')}")
print(f"   Save Format: {viz_config.get('save_format', 'default')}")

print()
print("📊 Sample Party Comparison Data:")
print(sample_data.to_string(index=False))

# Create a quick visualization demo
fig = visualizer.plot_party_comparison(sample_data, title="Clean Architecture Demo")
plt.show()

logger.info("Visualization system demonstration completed")

## 🔮 Prediction System Demo

`ElectionPredictor` is built for **production use**. It loads a trained model and its tokenizer, scores single texts, batches, or whole DataFrames, and aggregates the results per party for **comparison, export, and reporting**.
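
Below is a minimal sketch of batched scoring with a probability threshold, the pattern behind the batch processing, probability scoring, and binary classification features listed in the next cell. `toy_scorer` stands in for a trained model's forward pass; in the real `ElectionPredictor` that would be the loaded model applied to tokenized input. The keyword-based scorer, the batch size, the 0.5 threshold, and the helper names are all assumptions for illustration.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_texts(texts: Sequence[str],
                  score_batch: Callable[[Sequence[str]], List[float]],
                  batch_size: int = 32,
                  threshold: float = 0.5) -> List[Tuple[str, float, int]]:
    """Return (text, positive-class probability, binary label) for each input text."""
    results: List[Tuple[str, float, int]] = []
    for batch in iter_batches(texts, batch_size):
        # One model call per batch instead of one per text
        for text, prob in zip(batch, score_batch(batch)):
            results.append((text, prob, int(prob >= threshold)))
    return results


def toy_scorer(batch: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a trained model: keyword-based pseudo-probabilities."""
    return [0.8 if "well" in t.lower() else 0.3 for t in batch]


texts_to_score = [
    "Modi government's economic policies are working well",
    "Congress needs better leadership for the future",
    "Election commission should ensure fair voting process",
]
for text, prob, label in predict_texts(texts_to_score, toy_scorer, batch_size=2):
    print(f"{label} ({prob:.2f})  {text}")
```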

In [None]:
# Demonstrate prediction system (mock demo since no trained models yet)
print("🔮 Prediction System Features:")
print("="*50)

print("🎯 Prediction Capabilities:")
print("   📝 Single text prediction")
print("   📊 Batch text processing")  
print("   🗃️ DataFrame processing")
print("   🏛️ Political party analysis")
print("   📈 Comparative analysis")
print("   💾 Results export (CSV, JSON)")
print("   📋 Automated reporting")

print()
print("⚡ Production Features:")
print("   🔧 Configurable model loading")
print("   🔄 Tokenizer management")
print("   📊 Probability scoring")
print("   🎯 Binary classification")
print("   📈 Performance metrics")
print("   🗂️ Structured output")

# Mock prediction demonstration
sample_prediction_texts = [
    "Modi government's economic policies are working well",
    "Congress needs better leadership for the future",
    "Election commission should ensure fair voting process"
]

print()
print("🧪 Mock Prediction Demo:")
print("(Note: this simulation uses the preprocessor's sentiment analysis as a stand-in; real predictions require trained models)")

for i, text in enumerate(sample_prediction_texts, 1):
    # Stand-in for a model prediction: reuse the preprocessor's sentiment analysis
    clean_text = preprocessor.clean_text(text)
    sentiment_analysis = preprocessor.analyze_sentiment(clean_text)
    
    print(f"\n📝 Text {i}: {text}")
    print(f"   Cleaned: {clean_text}")
    print(f"   Predicted Sentiment: {'Positive' if sentiment_analysis['binary_sentiment'] else 'Negative'}")
    print(f"   Polarity Magnitude: {abs(sentiment_analysis['polarity']):.3f}")

print()
print("📋 Election Predictor Features:")
print("   🏛️ Multi-party sentiment analysis")
print("   📊 Comparative statistics")
print("   🎯 Winner prediction")
print("   📈 Margin analysis")
print("   📄 Automated summary generation")

logger.info("Prediction system demonstration completed")
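
The notebook does not spell out how the predictor's winner prediction and margin analysis work. One simple reading, sketched below as an assumption:

- rank parties by their share of positive tweets;
- report the gap between the top two in percentage points.

`compare_parties_by_share` is hypothetical and is not the signature of `ElectionPredictor.compare_parties()`. The inputs reuse the illustrative counts from the visualization demo, so the output is not a real forecast.

```python
from typing import Dict, Tuple


def compare_parties_by_share(positive: Dict[str, int],
                             total: Dict[str, int]) -> Tuple[str, float]:
    """Return the party with the highest positive share and its lead in percentage points."""
    if len(positive) < 2:
        raise ValueError("need at least two parties to compute a margin")
    shares = {party: 100 * positive[party] / total[party] for party in positive}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    (leader, top_share), (_, runner_up_share) = ranked[0], ranked[1]
    return leader, round(top_share - runner_up_share, 1)


leader, margin = compare_parties_by_share(
    positive={"BJP": 660, "Congress": 416},  # illustrative counts from the demo table
    total={"BJP": 2000, "Congress": 2000},
)
print(f"Leader by positive share: {leader} (+{margin} points)")
# Leader by positive share: BJP (+12.2 points)
```

Positive share on Twitter is a proxy for online sentiment, not vote share, so any "winner" derived this way is only as representative as the collected tweets.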

## 🚀 Key Architecture Improvements

### ✅ **Before vs After Comparison**

| **Aspect** | **Before (Monolithic)** | **After (Clean Architecture)** |
|------------|--------------------------|--------------------------------|
| **Structure** | 7 separate notebooks | Modular package with notebooks |
| **Code Reuse** | Copy-paste between notebooks | Importable modules |
| **Configuration** | Hardcoded values | YAML config + env variables |
| **Error Handling** | Basic try/except | Comprehensive error management |
| **Logging** | Print statements | Structured logging system |
| **Testing** | None | Test-ready structure |
| **Documentation** | Scattered comments | Professional docstrings |
| **Deployment** | Manual notebook running | Package installation |
| **Maintainability** | Low | High |
| **Scalability** | Limited | Production-ready |

### 🎯 **SOLID Principles Applied**

1. **Single Responsibility**: Each module has one clear purpose
2. **Open/Closed**: Extensible without modifying existing code  
3. **Liskov Substitution**: Each model class can stand in for the shared base model, so code that receives a model from `ModelFactory` works with any of them
4. **Interface Segregation**: Clean, focused interfaces
5. **Dependency Inversion**: Configuration-driven dependencies

### 🏗️ **Design Patterns Implemented**

- **Factory Pattern**: `ModelFactory` for creating models
- **Strategy Pattern**: Configurable preprocessing strategies
- **Observer Pattern**: Logging and monitoring
- **Template Method**: Base model with customizable implementations
- **Dependency Injection**: Configuration-driven setup
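
In the Template Method entry above, a base model fixes the overall workflow while subclasses supply the individual steps. Below is a minimal, framework-free sketch of that pattern. The step names (`build`, `fit`, `predict`, `evaluate`) and the toy majority-class subclass are hypothetical, not the project's actual base class.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class TemplateSentimentModel(ABC):
    """run() fixes the order of steps; subclasses fill in the steps themselves."""

    def run(self, X_train: Sequence[str], y_train: Sequence[int],
            X_test: Sequence[str], y_test: Sequence[int]) -> Dict[str, float]:
        self.build()
        self.fit(X_train, y_train)
        return self.evaluate(X_test, y_test)

    @abstractmethod
    def build(self) -> None: ...

    @abstractmethod
    def fit(self, X: Sequence[str], y: Sequence[int]) -> None: ...

    @abstractmethod
    def predict(self, X: Sequence[str]) -> List[int]: ...

    def evaluate(self, X: Sequence[str], y: Sequence[int]) -> Dict[str, float]:
        # Shared step with a default implementation: plain accuracy
        preds = self.predict(X)
        correct = sum(int(p == t) for p, t in zip(preds, y))
        return {"accuracy": correct / len(y)}


class MajorityClassModel(TemplateSentimentModel):
    """Toy baseline: always predicts the most frequent training label."""

    def build(self) -> None:
        self.majority = 0

    def fit(self, X: Sequence[str], y: Sequence[int]) -> None:
        self.majority = int(sum(y) * 2 > len(y))  # 1 only if positives are a strict majority

    def predict(self, X: Sequence[str]) -> List[int]:
        return [self.majority] * len(X)


metrics = MajorityClassModel().run(["a", "b", "c"], [0, 0, 1], ["d", "e"], [0, 1])
print(metrics)  # {'accuracy': 0.5}
```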

## 📚 Next Steps & Usage Guide

### 🎯 **How to Use the Restructured Project**

1. **📋 Setup Environment**:
   ```bash
   pip install -r requirements.txt
   cp config/.env.template .env
   # Edit .env with your Twitter API credentials
   ```

2. **📊 Data Collection**:
   ```python
   from src.data.collector import TwitterDataCollector
   collector = TwitterDataCollector()
   data = collector.collect_party_data('bjp', 'data/raw/')
   ```

3. **🧹 Data Preprocessing**:
   ```python  
   from src.data.preprocessor import TextPreprocessor
   preprocessor = TextPreprocessor()
   clean_data = preprocessor.preprocess_dataframe(data)
   ```

4. **🤖 Model Training**:
   ```python
   from src.models.sentiment_models import ModelTrainer
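   trainer = ModelTrainer()
   # X_train/y_train, X_val/y_val, X_test/y_test: train/validation/test splits prepared from clean_data (step 3)
   results = trainer.train_model('lstm', X_train, y_train, X_val, y_val, X_test, y_test)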
   ```

5. **🔮 Making Predictions**:
   ```python
   from src.models.predictor import ElectionPredictor
   predictor = ElectionPredictor('models/saved/best_model.h5')
   # bjp_data / congress_data: tweet data for each party (e.g. collected as in step 2)
   results = predictor.compare_parties({'bjp': bjp_data, 'congress': congress_data})
   ```

### 📖 **Available Notebooks**

- **`01_project_overview.ipynb`**: This overview (you are here!)
- **`02_data_collection.ipynb`**: Twitter data collection walkthrough
- **`03_model_training.ipynb`**: Model training and evaluation
- **`04_predictions.ipynb`**: Making predictions and analysis

### 🛠️ **Development Workflow**

1. **Configuration**: Modify `config/config.yaml` for settings
2. **Development**: Add features to `src/` modules
3. **Testing**: Add tests to `tests/` directory  
4. **Documentation**: Update docstrings and README
5. **Packaging**: Use `setup.py` for distribution

### 🎉 **Benefits Achieved**

✅ **Production Ready**: Professional code structure  
✅ **Maintainable**: Clear separation of concerns  
✅ **Scalable**: Easy to extend and modify  
✅ **Testable**: Structure supports comprehensive testing  
✅ **Configurable**: Environment-specific settings  
✅ **Documented**: Professional documentation standards  
✅ **Reusable**: Modular components for other projects

In [None]:
# Final summary and system status
print("🎉 TWITTER SENTIMENT ANALYSIS - RESTRUCTURED PROJECT SUMMARY")
print("="*70)

print("👨‍💻 Author: Lakshya Khetan")
print("📧 Email: lakshyaketan00@gmail.com")
print("🔗 GitHub: https://github.com/fusebox440/indian_election")
print(f"📅 Summary generated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}")

print()
print("✅ ARCHITECTURE TRANSFORMATION COMPLETE:")
print("   🏗️ Clean Architecture Principles Applied")
print("   📦 Modular Package Structure Created")  
print("   🔧 Configuration Management Implemented")
print("   📝 Professional Logging System Added")
print("   🎨 Visualization Framework Built")
print("   🤖 Model Factory Pattern Implemented")
print("   🔮 Production-Ready Prediction Engine")
print("   📚 Comprehensive Documentation")
print("   🧪 Test-Ready Structure")
print("   🚀 Professional Setup & Packaging")

print()
print("📊 PROJECT METRICS:")
print("   📁 Source Modules: 7 modules (data/, models/, utils/)")
print("   🔧 Configuration Files: 2 (YAML + env template)")
print("   📓 Clean Notebooks: 4 analysis notebooks")
print("   📋 Documentation: Professional README + docstrings")
print("   🧪 Test Structure: Ready for comprehensive testing")
print("   📦 Package Setup: Complete with setup.py")

print()
print("🚀 READY FOR:")
print("   🔬 Scientific Research & Publications")
print("   🏭 Production Deployment")
print("   👥 Team Collaboration")
print("   📈 Scalable Extensions")
print("   🔧 Continuous Integration")
print("   📊 Performance Monitoring")

logger.info("Project restructuring demonstration completed successfully!")
logger.info("Twitter Sentiment Analysis project is now production-ready with clean architecture!")

print("\n" + "="*70)
print("🎯 PROJECT RESTRUCTURING: ✅ COMPLETE")
print("="*70)