# ContextFlow Hybrid Chatbot Project

A sophisticated hybrid chatbot system combining **LSTM** and **Transformer** architectures with advanced inference strategies.

---

## Table of Contents
1. [Project Overview](#overview)
2. [Architecture](#architecture)
3. [Data Preparation](#data)
4. [Model Implementation](#model)
5. [Training Pipeline](#training)
6. [Advanced Inference](#inference)
7. [Web Application](#webapp)
8. [Deployment](#deployment)
9. [Testing & Evaluation](#testing)

---
## 1. Project Overview <a id="overview"></a>

### Features
- **Hybrid Model**: Combines LSTM for sequential processing and Transformer for attention mechanisms
- **Advanced Inference**: Beam Search, Temperature Sampling, Constrained Decoding
- **Context Awareness**: Maintains conversation history and context
- **Web Interface**: Flask backend with responsive UI
- **Docker Support**: Containerized deployment

### Tech Stack
- **Deep Learning**: PyTorch
- **Backend**: Flask
- **Frontend**: HTML/CSS/JavaScript
- **Deployment**: Docker, Docker Compose

---
## 2. Architecture <a id="architecture"></a>

### Hybrid Model Architecture

```
Input Text
    ↓
Tokenization
    ↓
Embedding Layer (256-dim)
    ↓
LSTM Layers (2 layers, 512 hidden)
    ↓
Transformer Layers (4 layers, 8 heads)
    ↓
Output Layer (vocab_size)
    ↓
Generated Response
```
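The layer sizes in the diagram determine most of the parameter budget. As a rough cross-check of the "Total parameters" printout in Section 4, the count can be worked out from the parameter shapes PyTorch uses for `nn.LSTM` (four gates and two bias vectors per layer) and `nn.TransformerEncoderLayer` (attention in/out projections, a feed-forward block of width `4 * hidden_dim`, two LayerNorms). This is a sketch under those assumptions; the helper names are hypothetical.

```python
def lstm_params(input_dim, hidden_dim, num_layers):
    """nn.LSTM: per layer, 4 gates x (input weights + recurrent weights + 2 bias vectors)."""
    total = 0
    for layer in range(num_layers):
        layer_input = input_dim if layer == 0 else hidden_dim
        total += 4 * hidden_dim * (layer_input + hidden_dim + 2)
    return total

def encoder_layer_params(d_model, dim_feedforward):
    """nn.TransformerEncoderLayer: self-attention, feed-forward block, two LayerNorms."""
    attention = 4 * d_model * d_model + 4 * d_model  # Q/K/V in-projection + out-projection, with biases
    feed_forward = 2 * d_model * dim_feedforward + dim_feedforward + d_model  # linear1 + linear2
    layer_norms = 2 * (2 * d_model)  # weight and bias for norm1 and norm2
    return attention + feed_forward + layer_norms

def count_parameters(vocab_size, embedding_dim=256, hidden_dim=512,
                     lstm_layers=2, transformer_layers=4):
    embedding = vocab_size * embedding_dim
    lstm = lstm_params(embedding_dim, hidden_dim, lstm_layers)
    transformer = transformer_layers * encoder_layer_params(hidden_dim, 4 * hidden_dim)
    output = hidden_dim * vocab_size + vocab_size  # fc_out weight + bias
    return embedding + lstm + transformer + output

print(f"{count_parameters(vocab_size=1000):,}")  # 17,056,744
```

Everything outside the embedding and output layers comes to a fixed 16,287,744 parameters under these settings; each vocabulary entry adds 256 + 512 + 1 = 769 more.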

In [None]:
# Import required libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import pickle
import os

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

---
## 3. Data Preparation <a id="data"></a>

### Custom Tokenizer Implementation

In [None]:
class Tokenizer:
    """Custom tokenizer for chatbot training"""
    
    def __init__(self):
        self.word2idx = {'<PAD>': 0, '<SOS>': 1, '<EOS>': 2, '<UNK>': 3}
        self.idx2word = {0: '<PAD>', 1: '<SOS>', 2: '<EOS>', 3: '<UNK>'}
        self.vocab_size = 4
    
    def fit(self, texts):
        """Build vocabulary from texts"""
        for text in texts:
            tokens = text.lower().split()
            for token in tokens:
                if token not in self.word2idx:
                    self.word2idx[token] = self.vocab_size
                    self.idx2word[self.vocab_size] = token
                    self.vocab_size += 1
    
    def encode(self, text):
        """Convert text to token IDs"""
        tokens = text.lower().split()
        return [self.word2idx.get(token, self.word2idx['<UNK>']) for token in tokens]
    
    def decode(self, token_ids):
        """Convert token IDs back to text"""
        tokens = [self.idx2word.get(idx, '<UNK>') for idx in token_ids]
        return ' '.join([t for t in tokens if t not in ['<PAD>', '<SOS>', '<EOS>']])
    
    def save(self, path):
        """Save tokenizer to file"""
        with open(path, 'wb') as f:
            pickle.dump(self, f)
    
    @staticmethod
    def load(path):
        """Load tokenizer from file"""
        with open(path, 'rb') as f:
            return pickle.load(f)
    
    def sos_id(self):
        return self.word2idx['<SOS>']
    
    def eos_id(self):
        return self.word2idx['<EOS>']
    
    def pad_id(self):
        return self.word2idx['<PAD>']
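`fit` and `encode` lowercase the text and split on whitespace, so punctuation stays attached to words: `you?` and `you` become separate vocabulary entries, and `you?` at inference time maps to `<UNK>` if only `you` was seen in training. A minimal sketch of a punctuation-aware split that could stand in for `text.lower().split()`; this is an assumption, not what the class above does.

```python
import re

# A run of word characters (letters, digits, underscore) or one punctuation character
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def simple_tokenize(text):
    """Lowercase and split into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())

print(simple_tokenize("Hello! How can I help you?"))
# ['hello', '!', 'how', 'can', 'i', 'help', 'you', '?']
```

Two side effects to weigh: contractions such as `i'm` become three tokens (`i`, `'`, `m`), and because `decode` joins tokens with spaces, output would read `hello ! how can i help you ?`.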

### Load and Prepare Training Data

In [None]:
# Load training data
data_path = 'data/merged_training_data.csv'

if os.path.exists(data_path):
    df = pd.read_csv(data_path)
    print(f"Loaded {len(df)} conversation pairs")
    print("\nSample data:")
    print(df.head())
    
    # Data statistics
    print("\n=== Data Statistics ===")
    print(f"Total pairs: {len(df)}")
    if 'input' in df.columns and 'response' in df.columns:
        print(f"Avg input length: {df['input'].str.split().str.len().mean():.2f} words")
        print(f"Avg response length: {df['response'].str.split().str.len().mean():.2f} words")
else:
    print(f"Data file not found at {data_path}")
    print("Creating sample data for demonstration...")
    
    # Sample data for demonstration
    sample_data = {
        'input': [
            'hello',
            'what is machine learning',
            'how are you',
            'what is deep learning',
            'explain neural networks'
        ],
        'response': [
            'Hello! How can I help you?',
            'Machine learning is a field of AI where computers learn from data',
            'I am doing well, thanks for asking!',
            'Deep learning uses neural networks with multiple layers',
            'Neural networks are computing systems inspired by biological brains'
        ]
    }
    df = pd.DataFrame(sample_data)

In [None]:
# Initialize and fit tokenizer
tokenizer = Tokenizer()

if 'input' in df.columns and 'response' in df.columns:
    all_texts = df['input'].tolist() + df['response'].tolist()
    tokenizer.fit(all_texts)
    print(f"Vocabulary size: {tokenizer.vocab_size}")
    print(f"Sample tokens: {list(tokenizer.word2idx.keys())[:20]}")

### Dataset Class

In [None]:
class ChatbotDataset(Dataset):
    """Custom dataset for chatbot training"""
    
    def __init__(self, inputs, responses, tokenizer, max_length=50):
        self.inputs = inputs
        self.responses = responses
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def __len__(self):
        return len(self.inputs)
    
    def __getitem__(self, idx):
        input_text = self.inputs[idx]
        response_text = self.responses[idx]
        
        # Encode
        input_ids = self.tokenizer.encode(input_text)
        response_ids = [self.tokenizer.sos_id()] + self.tokenizer.encode(response_text) + [self.tokenizer.eos_id()]
        
        # Pad or truncate
        input_ids = input_ids[:self.max_length]
        response_ids = response_ids[:self.max_length]
        
        input_ids += [self.tokenizer.pad_id()] * (self.max_length - len(input_ids))
        response_ids += [self.tokenizer.pad_id()] * (self.max_length - len(response_ids))
        
        return {
            'input_ids': torch.tensor(input_ids, dtype=torch.long),
            'response_ids': torch.tensor(response_ids, dtype=torch.long)
        }

# Create dataset
if 'input' in df.columns and 'response' in df.columns:
    dataset = ChatbotDataset(
        df['input'].tolist(),
        df['response'].tolist(),
        tokenizer,
        max_length=50
    )
    
    # Create dataloader
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    print(f"Created dataset with {len(dataset)} samples")
    print(f"Batch size: 32")
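Each response is wrapped as `<SOS> ... <EOS>`, truncated, and right-padded to `max_length`; during training the model reads every position but the last and is scored on every position but the first. The same steps on plain lists, using the special IDs assigned in `Tokenizer.__init__` (`<PAD>`=0, `<SOS>`=1, `<EOS>`=2); the helper names are hypothetical.

```python
PAD, SOS, EOS = 0, 1, 2  # special IDs assigned by Tokenizer.__init__

def frame_response(token_ids, max_length):
    """Wrap with <SOS>/<EOS>, truncate to max_length, then right-pad with <PAD>."""
    ids = ([SOS] + list(token_ids) + [EOS])[:max_length]
    return ids + [PAD] * (max_length - len(ids))

def teacher_forcing_pair(framed):
    """Model input drops the last position; the target drops the first (shift by one)."""
    return framed[:-1], framed[1:]

framed = frame_response([7, 8, 9], max_length=6)
print(framed)                                           # [1, 7, 8, 9, 2, 0]
print(teacher_forcing_pair(framed))                     # ([1, 7, 8, 9, 2], [7, 8, 9, 2, 0])
print(frame_response([7, 8, 9, 10, 11], max_length=4))  # [1, 7, 8, 9]
```

The last line shows that truncation can cut off `<EOS>`: responses longer than `max_length - 2` tokens never show the model an end marker during training.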

---
## 4. Model Implementation <a id="model"></a>

### Hybrid LSTM-Transformer Model

In [None]:
class HybridChatbotModel(nn.Module):
    """Hybrid model combining LSTM and Transformer architectures"""
    
    def __init__(self, vocab_size, embedding_dim=256, hidden_dim=512, 
                 num_layers=2, num_heads=8, dropout=0.1):
        super(HybridChatbotModel, self).__init__()
        
        # Embedding layer
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        
        # LSTM layers for sequential processing
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0,
            bidirectional=False
        )
        
        # Transformer encoder layers for attention
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim,
            nhead=num_heads,
            dim_feedforward=hidden_dim * 4,
            dropout=dropout,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=4)
        
        # Output layer
        self.fc_out = nn.Linear(hidden_dim, vocab_size)
        self.dropout = nn.Dropout(dropout)
        
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
    
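    def forward(self, x, hidden=None):
        # Embedding: (batch, seq_len) -> (batch, seq_len, embedding_dim)
        embedded = self.dropout(self.embedding(x))
        
        # LSTM processing: (batch, seq_len, hidden_dim)
        # nn.LSTM accepts hidden=None and then starts from zero states
        lstm_out, hidden = self.lstm(embedded, hidden)
        
        # Causal mask: True above the diagonal blocks attention to future positions.
        # Without it, the Transformer layers can read the very token they are trained
        # to predict, which makes training loss misleadingly low and breaks
        # left-to-right generation at inference time.
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1
        )
        
        # Transformer processing
        transformer_out = self.transformer(lstm_out, mask=causal_mask)
        
        # Output projection: (batch, seq_len, vocab_size)
        output = self.fc_out(transformer_out)
        
        return output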
    
    def init_hidden(self, batch_size, device):
        """Initialize hidden states"""
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(device)
        return (h0, c0)

# Model configuration
MODEL_CONFIG = {
    'embedding_dim': 256,
    'hidden_dim': 512,
    'num_layers': 2,
    'num_heads': 8,
    'dropout': 0.1
}

# Initialize model
model = HybridChatbotModel(
    vocab_size=tokenizer.vocab_size,
    **MODEL_CONFIG
).to(device)

# Model summary
print("=== Model Architecture ===")
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
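For left-to-right generation, position *t* should only attend to positions up to *t*; the LSTM is unidirectional, but a Transformer layer stacked on top can still look ahead unless it is masked. In PyTorch this is usually a square boolean mask passed as the `mask` argument of `nn.TransformerEncoder`, where `True` marks a blocked position. The pattern itself, sketched in plain Python:

```python
def causal_mask(seq_len):
    """mask[i][j] is True when query position i must NOT attend to key position j (j > i)."""
    return [[j > i for j in range(seq_len)] for i in range(seq_len)]

for row in causal_mask(4):
    print(''.join('x' if blocked else '.' for blocked in row))
# .xxx
# ..xx
# ...x
# ....
```

This is the same pattern as `torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)`.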

---
## 5. Training Pipeline <a id="training"></a>

In [None]:
# Training configuration
EPOCHS = 10
LEARNING_RATE = 0.001
CLIP_GRAD = 1.0

# Loss and optimizer
criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_id())
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=2)

print(f"Training configuration:")
print(f"  Epochs: {EPOCHS}")
print(f"  Learning rate: {LEARNING_RATE}")
print(f"  Gradient clipping: {CLIP_GRAD}")
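`ReduceLROnPlateau(mode='min', factor=0.5, patience=2)` halves the learning rate once the monitored loss has failed to improve for more than `patience` consecutive epochs. A simplified re-implementation for intuition only; it assumes PyTorch's default relative threshold of `1e-4` and no cooldown, and is not a substitute for the real scheduler.

```python
def simulate_plateau(losses, lr=0.001, factor=0.5, patience=2, threshold=1e-4):
    """Return the learning rate in effect after each scheduler.step(loss) call."""
    best = float('inf')
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best * (1 - threshold):  # 'min' mode with a relative threshold
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:          # more than `patience` epochs without improvement
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history

print(simulate_plateau([1.0, 0.9, 0.95, 0.92, 0.91, 0.93]))
# [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005]
```

The first reduction happens on the third consecutive epoch that fails to beat 0.9. Because the loop below calls `scheduler.step(train_loss)`, reductions follow training loss, not held-out loss.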

In [None]:
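def build_lm_batch(input_ids, response_ids, pad_id):
    """Pack each (input, response) pair into one left-to-right sequence.
    
    The model has no separate encoder, so the response is conditioned on the user
    input by feeding [input tokens, <SOS>, response tokens, <EOS>] as one sequence
    (AdvancedInference also feeds the encoded input first when generating).
    Targets for the input tokens are set to pad_id, so
    CrossEntropyLoss(ignore_index=pad_id) only scores <SOS> and the response.
    """
    sources, targets = [], []
    for inp, resp in zip(input_ids.tolist(), response_ids.tolist()):
        prompt = [t for t in inp if t != pad_id]
        response = [t for t in resp if t != pad_id]
        seq = prompt + response
        n_masked = max(len(prompt) - 1, 0)  # targets that are still prompt tokens
        sources.append(seq[:-1])
        targets.append([pad_id] * n_masked + seq[1 + n_masked:])
    
    # Right-pad to the longest sequence in the batch
    max_len = max(len(s) for s in sources)
    x = torch.full((len(sources), max_len), pad_id, dtype=torch.long)
    y = torch.full((len(targets), max_len), pad_id, dtype=torch.long)
    for row, (src, tgt) in enumerate(zip(sources, targets)):
        x[row, :len(src)] = torch.tensor(src, dtype=torch.long)
        y[row, :len(tgt)] = torch.tensor(tgt, dtype=torch.long)
    return x, y

def train_epoch(model, dataloader, criterion, optimizer, device, clip_grad=1.0):
    """Train for one epoch; returns the mean batch loss"""
    model.train()
    total_loss = 0
    pad_id = criterion.ignore_index  # the tokenizer's <PAD> ID
    
    progress_bar = tqdm(dataloader, desc='Training')
    for batch in progress_bar:
        x, y = build_lm_batch(batch['input_ids'], batch['response_ids'], pad_id)
        x, y = x.to(device), y.to(device)
        
        # Forward pass
        optimizer.zero_grad()
        output = model(x)
        
        # Calculate loss: logits (batch*seq, vocab) vs. targets (batch*seq)
        loss = criterion(output.reshape(-1, output.shape[-1]), y.reshape(-1))
        
        # Backward pass with gradient-norm clipping
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad)
        optimizer.step()
        
        total_loss += loss.item()
        progress_bar.set_postfix({'loss': loss.item()})
    
    return total_loss / len(dataloader)

def evaluate(model, dataloader, criterion, device):
    """Evaluate model; returns the mean batch loss"""
    model.eval()
    total_loss = 0
    pad_id = criterion.ignore_index
    
    with torch.no_grad():
        for batch in dataloader:
            x, y = build_lm_batch(batch['input_ids'], batch['response_ids'], pad_id)
            x, y = x.to(device), y.to(device)
            
            output = model(x)
            loss = criterion(output.reshape(-1, output.shape[-1]), y.reshape(-1))
            total_loss += loss.item()
    
    return total_loss / len(dataloader)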
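`CrossEntropyLoss(ignore_index=pad_id)` with the default `reduction='mean'` averages the negative log-likelihood only over targets that are not `<PAD>`, so padding contributes nothing to the loss or its gradient. A plain-Python version of that averaging, where `target_probs` is a hypothetical list standing in for the probability the model assigned to each correct token:

```python
import math

def masked_nll(target_probs, targets, pad_id=0):
    """Mean of -log p(target) over positions whose target is not pad_id."""
    kept = [-math.log(p) for p, t in zip(target_probs, targets) if t != pad_id]
    return sum(kept) / len(kept)

# The third target is padding, so only the first two count: (-ln 0.5 - ln 0.25) / 2
print(round(masked_nll([0.5, 0.25, 0.9], [5, 6, 0]), 4))  # 1.0397
```

If every target in a batch were `<PAD>`, the mean would be undefined (PyTorch returns `nan`); the `<SOS>`/`<EOS>` framing guarantees at least one real target per example.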

In [None]:
# Training loop
train_losses = []
best_loss = float('inf')

print("\n=== Starting Training ===")
for epoch in range(EPOCHS):
    print(f"\nEpoch {epoch + 1}/{EPOCHS}")
    
    # Train
    train_loss = train_epoch(model, dataloader, criterion, optimizer, device, CLIP_GRAD)
    train_losses.append(train_loss)
    
    # Update learning rate
    scheduler.step(train_loss)
    
    print(f"Train Loss: {train_loss:.4f}")
    print(f"Learning Rate: {optimizer.param_groups[0]['lr']:.6f}")
    
    # Save best model
    if train_loss < best_loss:
        best_loss = train_loss
        os.makedirs('models/checkpoints', exist_ok=True)
        torch.save({
            'epoch': epoch + 1,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': train_loss,
        }, f'models/checkpoints/checkpoint_epoch_{epoch + 1}.pt')
        print(f"✅ Saved checkpoint (best loss: {best_loss:.4f})")

print("\n=== Training Complete ===")
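`evaluate()` is defined above but never called, and the loop keeps checkpoints based on training loss, which can keep falling while the model memorizes the training pairs. A minimal sketch of a reproducible held-out split by index; the function name and the 10% ratio are assumptions.

```python
import random

def split_indices(n_items, val_fraction=0.1, seed=42):
    """Shuffle indices 0..n_items-1 with a fixed seed and split them into (train, val)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG: does not disturb the global seed
    n_val = max(1, int(n_items * val_fraction)) if n_items > 1 else 0
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(100)
print(len(train_idx), len(val_idx))  # 90 10
```

The index lists can be wrapped with `torch.utils.data.Subset(dataset, train_idx)` to build separate `DataLoader`s; `evaluate(model, val_loader, criterion, device)` then gives a loss to pass to `scheduler.step()` and to compare for checkpointing.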

In [None]:
# Plot training loss
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(train_losses) + 1), train_losses, marker='o', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Training Loss Over Epochs', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

---
## 6. Advanced Inference <a id="inference"></a>

### Inference Strategies

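1. **Beam Search**: Keeps the `beam_width` highest-scoring partial responses at each step
2. **Temperature Sampling**: Samples from a temperature-scaled, top-k/top-p filtered distribution for more varied responses
3. **Constrained Decoding**: Avoids forbidden words/phrases (not shown in this notebook)
4. **Ensemble Method**: Combines multiple strategies (not shown in this notebook)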
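One common form of constrained decoding is token banning: before the next token is chosen, the logits of forbidden token IDs are set to negative infinity so they receive zero probability. A plain-Python sketch of that single step; the function names and banned set are hypothetical and this is not the project's implementation. With a PyTorch logits tensor, the equivalent step would be `logits[banned_ids] = float('-inf')` before the softmax.

```python
import math

def ban_tokens(logits, banned_ids):
    """Return a copy of logits where every banned token ID scores -inf."""
    constrained = list(logits)
    for token_id in banned_ids:
        constrained[token_id] = -math.inf
    return constrained

def softmax(logits):
    """Numerically stable softmax; -inf logits get probability 0 (at least one must be finite)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # token 0 would normally win
probs = softmax(ban_tokens(logits, banned_ids={0}))
print(probs[0], probs.index(max(probs)))  # 0.0 1
```

Banning works on single token IDs; forbidding a multi-word phrase would also require checking the tokens generated so far before each step.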

In [None]:
class AdvancedInference:
    """Advanced inference strategies for better response generation"""
    
    def __init__(self, model, tokenizer, device='cuda'):
        self.model = model
        self.tokenizer = tokenizer
        self.device = device
        self.model.eval()
    
    def beam_search(self, input_text, beam_width=5, max_length=50, temperature=0.7):
        """
        Beam Search decoding: keeps the beam_width highest-scoring partial
        responses at each step instead of committing to a single token (greedy)
        
        Notes:
        - Scores are summed log-probabilities with no length normalization,
          so shorter responses are favoured
        - Output tends to be fluent but less varied than sampling
        """
        prompt_tokens = self.tokenizer.encode(input_text)
        eos_id = self.tokenizer.eos_id()
        
        # Each beam stores the full sequence (prompt + generated tokens) and its score
        beams = [{'tokens': prompt_tokens, 'score': 0.0}]
        completed_beams = []
        
        with torch.no_grad():
            for step in range(max_length):
                candidates = []
                
                for beam in beams:
                    beam_tokens = torch.tensor([beam['tokens']], dtype=torch.long).to(self.device)
                    
                    # Log-probabilities of the next token (log_softmax is more stable than log(softmax))
                    output = self.model(beam_tokens)
                    log_probs = torch.log_softmax(output[0, -1, :] / temperature, dim=-1)
                    top_log_probs, top_indices = torch.topk(log_probs, beam_width)
                    
                    for log_prob, token_id in zip(top_log_probs, top_indices):
                        candidates.append({
                            'tokens': beam['tokens'] + [token_id.item()],
                            'score': beam['score'] + log_prob.item()
                        })
                
                # Keep the top beam_width candidates; those ending in <EOS> are finished
                # and set aside so they are not extended past the end token
                candidates.sort(key=lambda x: x['score'], reverse=True)
                beams = []
                for candidate in candidates[:beam_width]:
                    if candidate['tokens'][-1] == eos_id:
                        completed_beams.append(candidate)
                    else:
                        beams.append(candidate)
                
                # Stop once every surviving hypothesis has finished
                if not beams:
                    break
        
        # If max_length was reached, unfinished beams compete as well
        completed_beams.extend(beams)
        best = max(completed_beams, key=lambda x: x['score'])
        
        # Decode only the generated part; decode() drops <SOS>, <EOS> and <PAD>
        return self.tokenizer.decode(best['tokens'][len(prompt_tokens):])
    
    def temperature_sampling(self, input_text, temperature=0.8, top_k=50, 
                             top_p=0.9, max_length=50):
        """
        Temperature Sampling: samples each token from the rescaled distribution,
        producing more varied responses than beam search
        
        Parameters:
        - temperature: < 1.0 sharpens the distribution (more conservative),
          1.0 leaves it unchanged, > 1.0 flattens it (more random); must be > 0
        - top_k: sample only from the K most likely tokens (0 disables)
        - top_p: sample only from the smallest set of tokens whose cumulative
          probability reaches p (nucleus sampling; 1.0 disables)
        """
        if temperature <= 0:
            raise ValueError("temperature must be > 0; use beam_search for deterministic output")
        
        input_ids = torch.tensor([self.tokenizer.encode(input_text)], dtype=torch.long).to(self.device)
        
        generated_tokens = []
        current_input = input_ids.clone()
        
        with torch.no_grad():
            for _ in range(max_length):
                output = self.model(current_input)
                logits = output[0, -1, :] / temperature
                
                # Top-K filtering: drop every token scoring below the K-th largest logit
                if top_k > 0:
                    kth_largest = torch.topk(logits, min(top_k, logits.size(-1))).values[-1]
                    logits[logits < kth_largest] = float('-inf')
                
                # Top-P (nucleus) filtering
                if 0.0 < top_p < 1.0:
                    sorted_logits, sorted_indices = torch.sort(logits, descending=True)
                    sorted_probs = torch.softmax(sorted_logits, dim=-1)
                    # Probability mass of the tokens ranked strictly above each token
                    mass_above = torch.cumsum(sorted_probs, dim=0) - sorted_probs
                    # Drop a token once the tokens above it already reach top_p; the
                    # top-ranked token has mass_above == 0 and is always kept
                    logits[sorted_indices[mass_above >= top_p]] = float('-inf')
                
                # Sample the next token from the filtered distribution
                probs = torch.softmax(logits, dim=-1)
                next_token = torch.multinomial(probs, 1).item()
                
                generated_tokens.append(next_token)
                if next_token == self.tokenizer.eos_id():
                    break
                
                current_input = torch.cat([
                    current_input,
                    torch.tensor([[next_token]], dtype=torch.long, device=self.device)
                ], dim=1)
        
        # decode() drops <SOS>, <EOS> and <PAD>
        return self.tokenizer.decode(generated_tokens)

# Initialize inference engine
inference_engine = AdvancedInference(model, tokenizer, device)
print("✅ Inference engine initialized")
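Beam search can beat greedy decoding when a less likely first token leads to a much more likely continuation. The toy below replaces the model with a hypothetical lookup table of next-token probabilities (`'S'` = start, `'E'` = end). Like `beam_search`, it scores hypotheses by summed log-probability; finished hypotheses (ending in `'E'`) are set aside rather than extended.

```python
import math

def toy_model(prefix):
    """Hypothetical next-token distributions keyed by the prefix so far."""
    table = {
        ('S',): {'a': 0.6, 'b': 0.4},
        ('S', 'a'): {'x': 0.4, 'y': 0.3, 'E': 0.3},
        ('S', 'b'): {'E': 0.9, 'x': 0.1},
    }
    return table.get(tuple(prefix), {'E': 1.0})

def greedy(next_token_probs, start, eos, max_steps=5):
    tokens, prob = [start], 1.0
    for _ in range(max_steps):
        token, p = max(next_token_probs(tokens).items(), key=lambda kv: kv[1])
        tokens.append(token)
        prob *= p
        if token == eos:
            break
    return tokens, prob

def toy_beam_search(next_token_probs, start, eos, beam_width=2, max_steps=5):
    beams, finished = [([start], 0.0)], []
    for _ in range(max_steps):
        candidates = [(tokens + [tok], score + math.log(p))
                      for tokens, score in beams
                      for tok, p in next_token_probs(tokens).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # unfinished beams compete if max_steps is hit
    return max(finished, key=lambda c: c[1])

best, score = toy_beam_search(toy_model, 'S', 'E')
print(best, round(math.exp(score), 2))   # ['S', 'b', 'E'] 0.36
print(greedy(toy_model, 'S', 'E')[0])    # ['S', 'a', 'x', 'E']
```

Greedy commits to `'a'` (0.6) and ends with probability 0.6 × 0.4 × 1.0 = 0.24, while the beam keeps `'b'` alive and finds 0.4 × 0.9 = 0.36.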

### Test Inference Methods

In [None]:
# Test different inference methods
test_input = "what is machine learning"

print(f"Input: {test_input}\n")

# Beam Search
print("=== Beam Search ===")
response_beam = inference_engine.beam_search(test_input, beam_width=5)
print(f"Response: {response_beam}\n")

# Temperature Sampling
print("=== Temperature Sampling ===")
response_temp = inference_engine.temperature_sampling(test_input, temperature=0.8)
print(f"Response: {response_temp}\n")
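The effect of `temperature`, `top_k` and `top_p` on a single decoding step can be checked without the model. A plain-Python sketch on hand-picked logits (not model output), following the filtering order used in `temperature_sampling`: scale by temperature, keep the top K, then keep the smallest set of tokens whose cumulative probability reaches `top_p`. The function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_candidates(logits, top_k=0, top_p=1.0):
    """Token IDs (highest first) that survive top-k then top-p filtering."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Probabilities are renormalized over the top-k survivors before the nucleus cut
    probs = softmax([logits[i] for i in order])
    keep, cumulative = [], 0.0
    for token_id, p in zip(order, probs):
        if cumulative >= top_p:  # the tokens ranked above already reach top_p
            break
        keep.append(token_id)
        cumulative += p
    return keep

print([round(p, 2) for p in softmax([2.0, 1.0, 0.0], temperature=0.5)])  # [0.87, 0.12, 0.02]
print([round(p, 2) for p in softmax([2.0, 1.0, 0.0], temperature=2.0)])  # [0.51, 0.31, 0.19]
print(nucleus_candidates([math.log(p) for p in [0.5, 0.3, 0.15, 0.05]], top_p=0.7))  # [0, 1]
print(nucleus_candidates([3.0, 1.0, 2.0, 0.0], top_k=2))  # [0, 2]
```

Lowering the temperature concentrates probability on the top token; with `top_p=0.7`, the first two tokens (0.5 + 0.3 = 0.8) are the smallest set reaching the threshold, so the third and fourth can never be sampled.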

---
## 7. Web Application <a id="webapp"></a>

### Flask Backend Structure

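```python
# backend/app_advanced.py
from flask import Flask, render_template, request, jsonify
from backend.inference import AdvancedInference

app = Flask(__name__)

# Initialize model and inference engine
# ...

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/api/chat', methods=['POST'])
def chat():
    # get_json(silent=True) returns None instead of raising on a missing/invalid JSON body
    data = request.get_json(silent=True) or {}
    user_message = data.get('message', '').strip()
    session_id = data.get('session_id')
    method = data.get('method', 'beam_search')
    
    if not user_message:
        return jsonify({'error': 'message is required'}), 400
    
    # Generate response
    if method == 'beam_search':
        response = inference_engine.beam_search(user_message)
    else:
        response = inference_engine.temperature_sampling(user_message)
    
    # Echo session_id so the response matches the documented API
    return jsonify({'bot_response': response, 'session_id': session_id})

if __name__ == '__main__':
    # Bind to 0.0.0.0 so the server is reachable through Docker's port mapping
    app.run(host='0.0.0.0', port=5000)
```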

### API Endpoints

#### POST /api/chat

**Request:**
```json
{
    "message": "Hello",
    "session_id": "user_123",
    "method": "beam_search"
}
```

**Response:**
```json
{
    "bot_response": "Hello! How can I help you?",
    "session_id": "user_123"
}
```
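The request carries a `session_id`, but the handler above does not use it to look up earlier turns, so each message is answered in isolation. One hypothetical way to add conversation context is to keep the last few exchanges per session in memory and prepend them to the new message before calling the inference engine. The class below is a sketch under those assumptions: storage is in-process, lost on restart, and not shared between worker processes.

```python
from collections import defaultdict, deque

class SessionHistory:
    """In-memory store of the last `max_turns` (user, bot) exchanges per session_id."""

    def __init__(self, max_turns=3):
        # deque(maxlen=...) silently drops the oldest turn once the limit is reached
        self._turns = defaultdict(lambda: deque(maxlen=max_turns))

    def add(self, session_id, user_message, bot_response):
        self._turns[session_id].append((user_message, bot_response))

    def build_prompt(self, session_id, user_message):
        """Concatenate earlier turns (oldest first) followed by the new message."""
        parts = []
        for user, bot in self._turns.get(session_id, ()):
            parts.extend([user, bot])
        parts.append(user_message)
        return ' '.join(parts)

history = SessionHistory(max_turns=2)
history.add('user_123', 'hello', 'hi there')
history.add('user_123', 'how are you', 'doing well')
history.add('user_123', 'what is ai', 'a field of computer science')
print(history.build_prompt('user_123', 'tell me more'))
# how are you doing well what is ai a field of computer science tell me more
```

In the handler, `build_prompt(session_id, user_message)` would replace `user_message` as the model input, followed by `add(session_id, user_message, response)`. The model here is trained on single-turn pairs, so whether longer, history-prefixed prompts help is untested.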

---
## 8. Deployment <a id="deployment"></a>

### Docker Configuration

#### Dockerfile
```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 5000

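# app_advanced.py imports `backend.inference`; running it as a script puts only
# backend/ on sys.path, so add the project root explicitly
ENV PYTHONPATH=/app

CMD ["python", "backend/app_advanced.py"]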
```

#### docker-compose.yml
```yaml
version: '3.8'

services:
  chatbot:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - ./models:/app/models
    environment:
      - FLASK_ENV=production
```

### Deployment Commands

```bash
# Build and run with Docker Compose
docker-compose up --build

# Access at http://localhost:5000
```
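Once the container is up, the endpoint can be smoke-tested from Python with only the standard library. The sketch below builds the POST request documented under API Endpoints; the helper name is hypothetical, and sending the request requires the server to be running on `localhost:5000`.

```python
import json
import urllib.request

def build_chat_request(message, session_id, method='beam_search',
                       url='http://localhost:5000/api/chat'):
    """Build (but do not send) a JSON POST request for /api/chat."""
    payload = json.dumps({'message': message, 'session_id': session_id, 'method': method})
    return urllib.request.Request(
        url,
        data=payload.encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

req = build_chat_request('Hello', 'user_123')
# Against the running container:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp)['bot_response'])
```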

---
## 9. Testing & Evaluation <a id="testing"></a>

In [None]:
# Interactive testing
def interactive_chat():
    """Interactive chat session"""
    print("\n=== ContextFlow Chatbot ===")
    print("Type 'quit' to exit\n")
    
    while True:
        user_input = input("You: ").strip()
        
        if user_input.lower() == 'quit':
            print("Goodbye!")
            break
        
        if not user_input:
            continue
        
        # Generate response
        response = inference_engine.beam_search(user_input)
        print(f"Bot: {response}\n")

# Uncomment to run interactive chat
# interactive_chat()

In [None]:
# Batch testing
test_cases = [
    "hello",
    "what is machine learning",
    "how are you",
    "explain deep learning",
    "what is a neural network"
]

print("=== Batch Testing ===")
for i, test_input in enumerate(test_cases, 1):
    print(f"\nTest {i}:")
    print(f"Input: {test_input}")
    
    response = inference_engine.beam_search(test_input)
    print(f"Response: {response}")
    print("-" * 50)

### Performance Metrics

In [None]:
import time

# Measure inference time
def measure_inference_time(inference_engine, test_input, method='beam_search', num_runs=10):
    """Measure average inference time"""
    times = []
    
    for _ in range(num_runs):
        start_time = time.time()
        
        if method == 'beam_search':
            _ = inference_engine.beam_search(test_input)
        else:
            _ = inference_engine.temperature_sampling(test_input)
        
        end_time = time.time()
        times.append(end_time - start_time)
    
    avg_time = np.mean(times)
    std_time = np.std(times)
    
    print(f"\n=== Inference Time ({method}) ===")
    print(f"Average: {avg_time:.4f}s")
    print(f"Std Dev: {std_time:.4f}s")
    print(f"Min: {min(times):.4f}s")
    print(f"Max: {max(times):.4f}s")
    
    return times

# Measure performance
test_input = "what is machine learning"
beam_times = measure_inference_time(inference_engine, test_input, 'beam_search', num_runs=5)
temp_times = measure_inference_time(inference_engine, test_input, 'temperature_sampling', num_runs=5)
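`measure_inference_time` times every run with `time.time()`, including the first, which can carry one-off costs such as CUDA context setup and memory allocation. A sketch of a variant that adds untimed warm-up runs, uses `time.perf_counter()` (intended for measuring short intervals), and reports the median alongside the mean; the function name is hypothetical. If the timed code runs on a GPU without a synchronizing call such as `.item()`, the callable should end with `torch.cuda.synchronize()`, since CUDA kernels run asynchronously.

```python
import statistics
import time

def benchmark(fn, num_runs=10, warmup=2):
    """Time fn() after `warmup` untimed calls; returns summary statistics in seconds."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {
        'mean': statistics.mean(times),
        'median': statistics.median(times),
        'stdev': statistics.stdev(times) if len(times) > 1 else 0.0,
        'min': min(times),
        'max': max(times),
    }

stats = benchmark(lambda: sum(range(100_000)), num_runs=5)
print({name: f"{value:.6f}s" for name, value in stats.items()})
```

With the engine above, the call would be `benchmark(lambda: inference_engine.beam_search("what is machine learning"), num_runs=5)`. The median is less sensitive than the mean to an occasional slow run.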

In [None]:
# Visualize inference times
fig, ax = plt.subplots(1, 2, figsize=(14, 5))

# Beam Search times
ax[0].bar(range(1, len(beam_times) + 1), beam_times, color='steelblue', alpha=0.7)
ax[0].axhline(np.mean(beam_times), color='red', linestyle='--', label=f'Mean: {np.mean(beam_times):.4f}s')
ax[0].set_xlabel('Run', fontsize=11)
ax[0].set_ylabel('Time (seconds)', fontsize=11)
ax[0].set_title('Beam Search Inference Time', fontsize=13, fontweight='bold')
ax[0].legend()
ax[0].grid(True, alpha=0.3)

# Temperature Sampling times
ax[1].bar(range(1, len(temp_times) + 1), temp_times, color='coral', alpha=0.7)
ax[1].axhline(np.mean(temp_times), color='red', linestyle='--', label=f'Mean: {np.mean(temp_times):.4f}s')
ax[1].set_xlabel('Run', fontsize=11)
ax[1].set_ylabel('Time (seconds)', fontsize=11)
ax[1].set_title('Temperature Sampling Inference Time', fontsize=13, fontweight='bold')
ax[1].legend()
ax[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---
## Summary

This notebook demonstrates the complete ContextFlow Hybrid Chatbot project:

1. ✅ **Custom Tokenizer** - Built vocabulary from training data
2. ✅ **Hybrid Model** - Combined LSTM and Transformer architectures
3. ✅ **Training Pipeline** - Trained model with gradient clipping and learning rate scheduling
4. ✅ **Advanced Inference** - Implemented Beam Search and Temperature Sampling
5. ✅ **Web Application** - Flask backend with REST API
6. ✅ **Docker Deployment** - Containerized application
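7. ✅ **Testing & Evaluation** - Qualitative batch tests and inference-latency benchmarks

### Key Features
- **Hybrid Architecture**: Combines sequential (LSTM) and attention (Transformer) mechanisms
- **Multiple Inference Strategies**: Beam Search and Temperature Sampling (Constrained Decoding is not shown in this notebook)
- **Context Awareness**: Conversation history keyed by `session_id` (not shown in this notebook; the inference methods above handle one message at a time)
- **Containerized**: Docker deployment with a Flask backend (Flask's built-in server; a production WSGI server such as Gunicorn is recommended for real traffic)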

### Next Steps
1. Fine-tune hyperparameters
2. Expand training dataset
3. Implement more advanced context management
4. Add user authentication
5. Deploy to cloud platform (AWS, GCP, Azure)