# 2. Transformer Architecture Advantages for POI
Captures long-range dependencies between POI visits
Understands relationships between non-consecutive visits
Can identify periodic patterns (weekly grocery shopping, monthly haircuts)
Potential Architecture Approaches:

# Conceptual architecture
class POITransformer:
    - Embedding layers:
        * POI ID embedding
        * Category embedding (restaurant, gym, store)
        * Time embedding (hour, day, season)
        * Geographical embedding (lat/long or region)
    
    - Transformer blocks:
        * Multi-head attention for POI relationships
        * Position encoding for sequence order
        * Temporal attention for time patterns
    
    - Output heads:
        * Next POI classification
        * Time-to-next-visit regression
        * Category prediction

**This advanced implementation demonstrates the key advantages of Transformers for POI prediction:**

Key Features:
1. Multi-Modal Embeddings
POI ID embedding
Category embedding (food, work, fitness, etc.)
Temporal embeddings (hour, day, season)
Geographical embedding (latitude/longitude)
Region embedding
2. Self-Attention Benefits
Long-range dependencies: The model can relate POI visits that are far apart in the sequence
Periodic patterns: Identifies weekly gym visits, monthly haircuts
Complex relationships: Understands that coffee shop → office, restaurant → cinema
3. Multiple Output Heads
Next POI prediction
Category prediction
Time-to-next-visit regression
4. Advanced Features
Temporal positional encoding for cyclical patterns
Causal masking for autoregressive prediction
Attention weight analysis for interpretability
Rich synthetic data with realistic patterns
5. Demonstrated Advantages
The attention analysis shows which past POIs influence future predictions
Periodic pattern detection (e.g., weekly shopping, bi-weekly gym)
Context-aware predictions based on time of day and day of week
The model is optimized for Mac M1 with a small dataset (200 sequences) and demonstrates how Transformers excel at capturing complex spatiotemporal patterns in POI trajectories.


In [1]:
"""
poi_transformer_advanced.py
Advanced Transformer Architecture for POI Prediction
Fixed version with proper references
"""

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import random
import math
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')

# Set device for Mac M1
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

# Set random seeds
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)

# ==================== Data Structures ====================

@dataclass
class POI:
    """Point of Interest data structure"""
    poi_id: int
    name: str
    category: int
    lat: float
    lon: float
    region: int

@dataclass
class Visit:
    """Visit data structure"""
    poi_id: int
    timestamp: int  # hour of day
    day_of_week: int
    season: int
    duration: float  # hours

Using device: cpu


In [None]:
# ==================== Advanced Data Generator ====================

class AdvancedPOIDataGenerator:
    """Generate rich synthetic POI data with patterns"""
    
    def __init__(self):
        # POI categories
        self.categories = {
            'food': 0,
            'work': 1,
            'fitness': 2,
            'shopping': 3,
            'entertainment': 4,
            'personal': 5,
            'transport': 6
        }
        
        # Detailed POI types with categories
        self.poi_database = {
            0: POI(0, 'home', self.categories['personal'], 40.7128, -74.0060, 0),
            1: POI(1, 'office', self.categories['work'], 40.7580, -73.9855, 1),
            2: POI(2, 'starbucks', self.categories['food'], 40.7489, -73.9680, 1),
            3: POI(3, 'gym', self.categories['fitness'], 40.7614, -73.9776, 1),
            4: POI(4, 'italian_restaurant', self.categories['food'], 40.7431, -73.9897, 2),
            5: POI(5, 'sushi_bar', self.categories['food'], 40.7516, -73.9755, 1),
            6: POI(6, 'walmart', self.categories['shopping'], 40.7424, -74.0055, 2),
            7: POI(7, 'cinema', self.categories['entertainment'], 40.7580, -73.9855, 1),
            8: POI(8, 'barber', self.categories['personal'], 40.7489, -73.9680, 0),
            9: POI(9, 'subway_station', self.categories['transport'], 40.7527, -73.9772, 1),
            10: POI(10, 'central_park', self.categories['entertainment'], 40.7829, -73.9654, 3),
            11: POI(11, 'pharmacy', self.categories['personal'], 40.7480, -73.9870, 0),
            12: POI(12, 'bank', self.categories['personal'], 40.7505, -73.9934, 1),
        }
        
        # Temporal patterns
        self.time_patterns = {
            'morning': (6, 12),
            'afternoon': (12, 17),
            'evening': (17, 21),
            'night': (21, 6)
        }
        
        # Periodic patterns (e.g., weekly gym, monthly barber)
        self.periodic_patterns = {
            'gym': {'frequency': 'bi-weekly', 'preferred_times': ['morning', 'evening']},
            'barber': {'frequency': 'monthly', 'preferred_times': ['afternoon']},
            'walmart': {'frequency': 'weekly', 'preferred_times': ['afternoon', 'evening']},
            'cinema': {'frequency': 'weekly', 'preferred_times': ['evening', 'night']},
        }
        
        # Long-range dependencies
        self.dependency_patterns = {
            'starbucks': ['office'],  # Coffee before work
            'gym': ['home', 'office'],  # Gym after work or from home
            'italian_restaurant': ['cinema'],  # Dinner and movie
            'pharmacy': ['home'],  # Medicine then home
        }
        
        self.num_pois = len(self.poi_database)
        self.num_categories = len(self.categories)
        self.num_regions = 4
        self.num_seasons = 4
        
    def calculate_distance(self, poi1: POI, poi2: POI) -> float:
        """Calculate distance between two POIs (simplified)"""
        return math.sqrt((poi1.lat - poi2.lat)**2 + (poi1.lon - poi2.lon)**2)
    
    def generate_trajectory_with_patterns(self, length: int = 20) -> Dict:
        """Generate trajectory with realistic patterns"""
        trajectory = []
        current_hour = 7
        current_day = random.randint(0, 6)
        current_season = random.randint(0, 3)
        visit_history = defaultdict(list)
        
        # Start from home
        current_poi_id = 0
        
        for step in range(length):
            # Record visit
            visit = Visit(
                poi_id=current_poi_id,
                timestamp=current_hour,
                day_of_week=current_day,
                season=current_season,
                duration=random.uniform(0.5, 3.0)
            )
            trajectory.append(visit)
            visit_history[self.poi_database[current_poi_id].name].append(step)
            
            # Determine next POI based on patterns
            next_poi_id = self._select_next_poi(
                current_poi_id, 
                current_hour, 
                current_day,
                visit_history,
                step
            )
            
            # Update time
            current_hour = (current_hour + int(visit.duration) + 1) % 24
            if current_hour < 7:  # New day
                current_day = (current_day + 1) % 7
            
            current_poi_id = next_poi_id
        
        return self._extract_features(trajectory)
    
    def _select_next_poi(self, current_poi_id: int, hour: int, day: int, 
                        history: Dict, step: int) -> int:
        """Select next POI based on multiple factors"""
        current_poi = self.poi_database[current_poi_id]
        candidates = []
        weights = []
        
        for poi_id, poi in self.poi_database.items():
            if poi_id == current_poi_id:
                continue
                
            weight = 1.0
            
            # Distance factor (prefer closer POIs)
            distance = self.calculate_distance(current_poi, poi)
            weight *= max(0.1, 1.0 - distance * 10)
            
            # Time appropriateness
            if 6 <= hour < 12 and poi.category == self.categories['food']:
                weight *= 2.0
            elif 12 <= hour < 14 and poi.category == self.categories['food']:
                weight *= 3.0
            elif 17 <= hour < 21 and poi.category in [self.categories['entertainment'], 
                                                       self.categories['food']]:
                weight *= 2.5
            
            # Periodic patterns
            if poi.name in self.periodic_patterns:
                pattern = self.periodic_patterns[poi.name]
                if poi.name not in history or not history[poi.name]:
                    weight *= 1.5
                elif pattern['frequency'] == 'weekly' and step - history[poi.name][-1] > 7:
                    weight *= 3.0
                elif pattern['frequency'] == 'monthly' and step - history[poi.name][-1] > 30:
                    weight *= 4.0
            
            # Long-range dependencies
            if current_poi.name in self.dependency_patterns.get(poi.name, []):
                weight *= 2.0
            
            candidates.append(poi_id)
            weights.append(weight)
        
        # Normalize weights
        total_weight = sum(weights)
        weights = [w/total_weight for w in weights]
        
        return random.choices(candidates, weights=weights)[0]
    
    def _extract_features(self, trajectory: List[Visit]) -> Dict:
        """Extract features from trajectory"""
        return {
            'poi_ids': [v.poi_id for v in trajectory],
            'categories': [self.poi_database[v.poi_id].category for v in trajectory],
            'timestamps': [v.timestamp for v in trajectory],
            'days': [v.day_of_week for v in trajectory],
            'seasons': [v.season for v in trajectory],
            'regions': [self.poi_database[v.poi_id].region for v in trajectory],
            'lats': [self.poi_database[v.poi_id].lat for v in trajectory],
            'lons': [self.poi_database[v.poi_id].lon for v in trajectory],
            'durations': [v.duration for v in trajectory]
        }
    
    def generate_dataset(self, num_sequences: int = 300) -> List[Dict]:
        """Generate dataset"""
        dataset = []
        for _ in range(num_sequences):
            dataset.append(self.generate_trajectory_with_patterns())
        return dataset

In [3]:
# ==================== Improved Multi-Modal Embedding ====================

class MultiModalEmbedding(nn.Module):
    """Enhanced multi-modal embedding with proper fusion"""
    
    def __init__(self, config):
        super().__init__()
        self.config = config
        d_model = config['d_model']
        
        # ============ Modality-specific embeddings ============
        self.poi_embedding = nn.Embedding(config['num_pois'], config['poi_embed_dim'])
        self.category_embedding = nn.Embedding(config['num_categories'], config['cat_embed_dim'])
        self.time_embedding = nn.Embedding(24, config['time_embed_dim'])
        self.day_embedding = nn.Embedding(7, config['day_embed_dim'])
        self.season_embedding = nn.Embedding(4, config['season_embed_dim'])
        self.region_embedding = nn.Embedding(config['num_regions'], config['region_embed_dim'])
        
        # Geographical embedding (continuous)
        self.geo_projection = nn.Sequential(
            nn.Linear(2, config['geo_embed_dim']),
            nn.LayerNorm(config['geo_embed_dim']),
            nn.ReLU()
        )
        
        # ============ Modality-specific projections to common space ============
        # Project each modality to d_model with normalization
        self.poi_proj = nn.Sequential(
            nn.Linear(config['poi_embed_dim'], d_model),
            nn.LayerNorm(d_model)
        )
        
        self.cat_proj = nn.Sequential(
            nn.Linear(config['cat_embed_dim'], d_model),
            nn.LayerNorm(d_model)
        )
        
        self.temporal_proj = nn.Sequential(
            nn.Linear(config['time_embed_dim'] + config['day_embed_dim'] + 
                     config['season_embed_dim'], d_model),
            nn.LayerNorm(d_model)
        )
        
        self.spatial_proj = nn.Sequential(
            nn.Linear(config['region_embed_dim'] + config['geo_embed_dim'], d_model),
            nn.LayerNorm(d_model)
        )
        
        # ============ Learnable modality weights (gating) ============
        self.modality_weights = nn.Parameter(torch.ones(4))  # 4 modality groups
        
        # ============ Non-linear fusion network ============
        self.fusion_network = nn.Sequential(
            nn.Linear(d_model, d_model * 2),
            nn.LayerNorm(d_model * 2),
            nn.GELU(),  # Better than ReLU for transformers
            nn.Dropout(config.get('dropout', 0.1)),
            nn.Linear(d_model * 2, d_model),
            nn.LayerNorm(d_model)
        )
        
        # Optional: Cross-modal attention for richer interactions
        self.use_cross_attention = config.get('use_cross_attention', False)
        if self.use_cross_attention:
            self.cross_modal_attention = CrossModalAttention(d_model, num_heads=4)
    
    def forward(self, poi_ids, categories, timestamps, days, seasons, regions, coords):
        # ============ Get embeddings ============
        poi_emb = self.poi_embedding(poi_ids)
        cat_emb = self.category_embedding(categories)
        time_emb = self.time_embedding(timestamps)
        day_emb = self.day_embedding(days)
        season_emb = self.season_embedding(seasons)
        region_emb = self.region_embedding(regions)
        geo_emb = self.geo_projection(coords)
        
        # ============ Group and project modalities ============
        # POI identity
        poi_features = self.poi_proj(poi_emb)
        
        # Categorical features
        cat_features = self.cat_proj(cat_emb)
        
        # Temporal features (combined)
        temporal_combined = torch.cat([time_emb, day_emb, season_emb], dim=-1)
        temporal_features = self.temporal_proj(temporal_combined)
        
        # Spatial features (combined)
        spatial_combined = torch.cat([region_emb, geo_emb], dim=-1)
        spatial_features = self.spatial_proj(spatial_combined)
        
        # ============ Apply learnable modality weights ============
        weights = torch.softmax(self.modality_weights, dim=0)
        
        # Weighted combination of modalities
        fused = (weights[0] * poi_features + 
                weights[1] * cat_features + 
                weights[2] * temporal_features + 
                weights[3] * spatial_features)
        
        # ============ Non-linear fusion ============
        fused = self.fusion_network(fused)
        
        # ============ Optional: Cross-modal attention ============
        if self.use_cross_attention:
            modalities = torch.stack([
                poi_features, cat_features, 
                temporal_features, spatial_features
            ], dim=1)  # (batch, 4, seq_len, d_model)
            fused = self.cross_modal_attention(fused, modalities)
        
        return fused

class CrossModalAttention(nn.Module):
    """Cross-modal attention for richer interactions"""
    
    def __init__(self, d_model, num_heads=4):
        super().__init__()
        self.attention = nn.MultiheadAttention(
            d_model, num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(d_model)
        
    def forward(self, query, modalities):
        """
        query: (batch, seq_len, d_model) - fused features
        modalities: (batch, num_modalities, seq_len, d_model)
        """
        batch_size, num_mod, seq_len, d_model = modalities.shape
        
        # Reshape for attention
        modalities_flat = modalities.view(batch_size, -1, d_model)
        
        # Cross-attention
        attended, _ = self.attention(query, modalities_flat, modalities_flat)
        
        # Residual connection
        return self.norm(query + attended)


In [4]:
# ==================== Temporal Positional Encoding ====================

class TemporalPositionalEncoding(nn.Module):
    """Temporal-aware positional encoding"""
    
    def __init__(self, d_model: int, max_len: int = 500):
        super().__init__()
        
        # Standard positional encoding
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * 
                           (-math.log(10000.0) / d_model))
        
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))
        
        # Temporal encoding for cyclical patterns
        self.hour_encoding = nn.Parameter(torch.randn(24, d_model // 4))
        self.day_encoding = nn.Parameter(torch.randn(7, d_model // 4))
        
    def forward(self, x, timestamps=None, days=None):
        # Add standard positional encoding
        x = x + self.pe[:, :x.size(1)]
        
        # Add temporal encodings if provided
        if timestamps is not None and days is not None:
            batch_size, seq_len = timestamps.shape
            
            # Get temporal encodings
            hour_enc = self.hour_encoding[timestamps]  # (batch, seq, d_model//4)
            day_enc = self.day_encoding[days]  # (batch, seq, d_model//4)
            
            # Pad to match d_model
            zeros = torch.zeros(batch_size, seq_len, x.size(-1) // 2).to(x.device)
            temporal_enc = torch.cat([hour_enc, day_enc, zeros], dim=-1)
            
            x = x + temporal_enc
        
        return x

In [5]:
# ==================== Transformer Block ====================

class TransformerBlock(nn.Module):
    """Custom Transformer block with multi-head attention"""
    
    def __init__(self, d_model, nhead, dim_feedforward, dropout):
        super().__init__()
        
        # Multi-head attention
        self.self_attn = nn.MultiheadAttention(
            d_model, nhead, dropout=dropout, batch_first=True
        )
        
        # Feed forward
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model)
        )
        
        # Layer normalization
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        
        # Dropout
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x, mask=None, return_attention=False):
        # Self-attention with residual connection
        attn_output, attn_weights = self.self_attn(
            x, x, x, attn_mask=mask, need_weights=return_attention
        )
        x = self.norm1(x + self.dropout(attn_output))
        
        # Feed forward with residual connection
        ff_output = self.feed_forward(x)
        x = self.norm2(x + self.dropout(ff_output))
        
        if return_attention:
            return x, attn_weights
        return x, None

In [6]:
# ==================== POI Transformer Model ====================

class POITransformer(nn.Module):
    """Advanced Transformer for POI prediction with multiple heads"""
    
    def __init__(self, config):
        super().__init__()
        self.config = config
        
        # Multi-modal embedding (using the improved version)
        self.embedding = MultiModalEmbedding(config)
        
        # Temporal positional encoding
        self.pos_encoding = TemporalPositionalEncoding(config['d_model'])
        
        # Transformer blocks
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(
                d_model=config['d_model'],
                nhead=config['nhead'],
                dim_feedforward=config['dim_feedforward'],
                dropout=config['dropout']
            ) for _ in range(config['num_layers'])
        ])
        
        # Output heads
        self.poi_prediction_head = nn.Linear(config['d_model'], config['num_pois'])
        self.category_prediction_head = nn.Linear(config['d_model'], config['num_categories'])
        self.time_regression_head = nn.Sequential(
            nn.Linear(config['d_model'], config['d_model'] // 2),
            nn.ReLU(),
            nn.Linear(config['d_model'] // 2, 1),
            nn.Softplus()  # Ensure positive output
        )
        
        # Dropout
        self.dropout = nn.Dropout(config['dropout'])
        
    def forward(self, batch_data, return_attention=False):
        # Unpack batch data
        poi_ids = batch_data['poi_ids']
        categories = batch_data['categories']
        timestamps = batch_data['timestamps']
        days = batch_data['days']
        seasons = batch_data['seasons']
        regions = batch_data['regions']
        coords = batch_data['coords']
        
        batch_size, seq_len = poi_ids.shape
        
        # Get embeddings using the improved multi-modal embedding
        x = self.embedding(poi_ids, categories, timestamps, days, seasons, regions, coords)
        
        # Add positional encoding
        x = self.pos_encoding(x, timestamps, days)
        
        # Apply dropout
        x = self.dropout(x)
        
        # Create causal mask
        mask = self.generate_causal_mask(seq_len).to(x.device)
        
        # Store attention weights if requested
        attention_weights = []
        
        # Apply transformer blocks
        for block in self.transformer_blocks:
            x, attn = block(x, mask, return_attention=return_attention)
            if return_attention:
                attention_weights.append(attn)
        
        # Apply output heads
        poi_logits = self.poi_prediction_head(x)
        category_logits = self.category_prediction_head(x)
        time_to_next = self.time_regression_head(x)
        
        outputs = {
            'poi_logits': poi_logits,
            'category_logits': category_logits,
            'time_to_next': time_to_next
        }
        
        if return_attention:
            outputs['attention_weights'] = attention_weights
        
        return outputs
    
    def generate_causal_mask(self, size: int) -> torch.Tensor:
        """Generate causal mask"""
        mask = torch.triu(torch.ones(size, size), diagonal=1)
        return mask.masked_fill(mask == 1, float('-inf'))

In [7]:
# ==================== Training and Evaluation ====================

class POIDataset(torch.utils.data.Dataset):
    """Dataset for POI trajectories"""
    
    def __init__(self, data, seq_length=15):
        self.data = data
        self.seq_length = seq_length
        
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        trajectory = self.data[idx]
        
        # Sample a subsequence
        if len(trajectory['poi_ids']) > self.seq_length + 1:
            start = random.randint(0, len(trajectory['poi_ids']) - self.seq_length - 1)
            end = start + self.seq_length
        else:
            start = 0
            end = len(trajectory['poi_ids']) - 1
        
        # Extract features
        batch_item = {
            'poi_ids': torch.tensor(trajectory['poi_ids'][start:end], dtype=torch.long),
            'categories': torch.tensor(trajectory['categories'][start:end], dtype=torch.long),
            'timestamps': torch.tensor(trajectory['timestamps'][start:end], dtype=torch.long),
            'days': torch.tensor(trajectory['days'][start:end], dtype=torch.long),
            'seasons': torch.tensor(trajectory['seasons'][start:end], dtype=torch.long),
            'regions': torch.tensor(trajectory['regions'][start:end], dtype=torch.long),
            'coords': torch.tensor(
                [[trajectory['lats'][i], trajectory['lons'][i]] 
                 for i in range(start, end)], 
                dtype=torch.float32
            ),
            'durations': torch.tensor(trajectory['durations'][start:end], dtype=torch.float32)
        }
        
        # Targets (next items)
        targets = {
            'next_poi': torch.tensor(trajectory['poi_ids'][start+1:end+1], dtype=torch.long),
            'next_category': torch.tensor(trajectory['categories'][start+1:end+1], dtype=torch.long),
            'next_duration': torch.tensor(trajectory['durations'][start+1:end+1], dtype=torch.float32)
        }
        
        return batch_item, targets

def collate_fn(batch):
    """Custom collate function for batching"""
    batch_data = defaultdict(list)
    batch_targets = defaultdict(list)
    
    for data, targets in batch:
        for key, value in data.items():
            batch_data[key].append(value)
        for key, value in targets.items():
            batch_targets[key].append(value)
    
    # Stack tensors
    for key in batch_data:
        batch_data[key] = torch.stack(batch_data[key])
    for key in batch_targets:
        batch_targets[key] = torch.stack(batch_targets[key])
    
    return dict(batch_data), dict(batch_targets)

def train_model(model, train_loader, config, num_epochs=30):
    """Train the model with multiple objectives"""
    model.train()
    optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'])
    
    # Loss functions
    poi_criterion = nn.CrossEntropyLoss()
    category_criterion = nn.CrossEntropyLoss()
    time_criterion = nn.MSELoss()
    
    losses_history = []
    
    for epoch in range(num_epochs):
        epoch_losses = defaultdict(float)
        
        for batch_data, targets in train_loader:
            # Move to device
            for key in batch_data:
                batch_data[key] = batch_data[key].to(device)
            for key in targets:
                targets[key] = targets[key].to(device)
            
            # Forward pass
            outputs = model(batch_data)
            
            # Calculate losses
            poi_loss = poi_criterion(
                outputs['poi_logits'].reshape(-1, config['num_pois']),
                targets['next_poi'].reshape(-1)
            )
            
            category_loss = category_criterion(
                outputs['category_logits'].reshape(-1, config['num_categories']),
                targets['next_category'].reshape(-1)
            )
            
            time_loss = time_criterion(
                outputs['time_to_next'].squeeze(-1),
                targets['next_duration']
            )
            
            # Combined loss
            total_loss = poi_loss + 0.5 * category_loss + 0.3 * time_loss
            
            # Backward pass
            optimizer.zero_grad()
            total_loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            
            # Record losses
            epoch_losses['poi'] += poi_loss.item()
            epoch_losses['category'] += category_loss.item()
            epoch_losses['time'] += time_loss.item()
            epoch_losses['total'] += total_loss.item()
        
        # Average losses
        for key in epoch_losses:
            epoch_losses[key] /= len(train_loader)
        
        losses_history.append(dict(epoch_losses))
        
        if (epoch + 1) % 10 == 0:
            print(f"Epoch {epoch+1}/{num_epochs}")
            print(f"  POI Loss: {epoch_losses['poi']:.4f}")
            print(f"  Category Loss: {epoch_losses['category']:.4f}")
            print(f"  Time Loss: {epoch_losses['time']:.4f}")
            print(f"  Total Loss: {epoch_losses['total']:.4f}")
    
    return losses_history


In [8]:
# ==================== Analysis and Visualization ====================

def analyze_attention_patterns(model, sample_data, generator):
    """Analyze attention patterns to show long-range dependencies"""
    model.eval()
    
    with torch.no_grad():
        # Prepare sample
        for key in sample_data:
            if isinstance(sample_data[key], torch.Tensor):
                sample_data[key] = sample_data[key].unsqueeze(0).to(device)
        
        # Get predictions with attention
        outputs = model(sample_data, return_attention=True)
        
        # Get attention weights from last layer
        attention = outputs['attention_weights'][-1].cpu().numpy()[0]
        
        # Analyze patterns
        seq_len = attention.shape[-1]
        poi_ids = sample_data['poi_ids'].cpu().numpy()[0]
        
        print("\n" + "="*60)
        print("ATTENTION PATTERN ANALYSIS")
        print("="*60)
        
        # Find long-range dependencies
        print("\nLong-range Dependencies Detected:")
        for i in range(seq_len):
            for j in range(max(0, i-5), i):
                if attention[i, j] > 0.15:  # Significant attention
                    poi_i = generator.poi_database[poi_ids[i]].name
                    poi_j = generator.poi_database[poi_ids[j]].name
                    distance = i - j
                    print(f"  {poi_i} (pos {i}) → {poi_j} (pos {j}) "
                          f"[distance: {distance}, weight: {attention[i, j]:.3f}]")
        
        # Find periodic patterns
        print("\nPeriodic Patterns (similar POIs attended):")
        poi_positions = defaultdict(list)
        for i, poi_id in enumerate(poi_ids):
            poi_positions[poi_id].append(i)
        
        for poi_id, positions in poi_positions.items():
            if len(positions) > 1:
                poi_name = generator.poi_database[poi_id].name
                print(f"  {poi_name} appears at positions: {positions}")
                
                # Check if later occurrences attend to earlier ones
                for i in range(1, len(positions)):
                    if positions[i] < seq_len and positions[i-1] < seq_len:
                        attn_weight = attention[positions[i], positions[i-1]]
                        if attn_weight > 0.1:
                            print(f"    → Self-attention weight: {attn_weight:.3f}")

def demonstrate_predictions(model, generator, config):
    """Demonstrate model predictions with different scenarios"""
    model.eval()
    
    print("\n" + "="*60)
    print("PREDICTION DEMONSTRATIONS")
    print("="*60)
    
    # Scenario 1: Morning routine
    print("\n1. Morning Routine Prediction:")
    morning_trajectory = {
        'poi_ids': torch.tensor([[0, 2, 1]], dtype=torch.long),  # home → starbucks → office
        'categories': torch.tensor([[5, 0, 1]], dtype=torch.long),
        'timestamps': torch.tensor([[7, 8, 9]], dtype=torch.long),
        'days': torch.tensor([[1, 1, 1]], dtype=torch.long),  # Monday
        'seasons': torch.tensor([[0, 0, 0]], dtype=torch.long),
        'regions': torch.tensor([[0, 1, 1]], dtype=torch.long),
        'coords': torch.tensor([[[40.7128, -74.0060], 
                                 [40.7489, -73.9680],
                                 [40.7580, -73.9855]]], dtype=torch.float32)
    }
    
    for key in morning_trajectory:
        morning_trajectory[key] = morning_trajectory[key].to(device)
    
    with torch.no_grad():
        outputs = model(morning_trajectory)
        
        # Next POI prediction
        poi_probs = F.softmax(outputs['poi_logits'][0, -1], dim=-1)
        top_pois = torch.topk(poi_probs, k=3)
        
        print("  Current sequence: Home → Starbucks → Office (Morning)")
        print("  Next POI predictions:")
        for prob, idx in zip(top_pois.values, top_pois.indices):
            poi_name = generator.poi_database[idx.item()].name
            print(f"    - {poi_name}: {prob.item():.3f}")
        
        # Category prediction
        cat_probs = F.softmax(outputs['category_logits'][0, -1], dim=-1)
        top_cats = torch.topk(cat_probs, k=3)
        print("  Next category predictions:")
        for prob, idx in zip(top_cats.values, top_cats.indices):
            cat_name = list(generator.categories.keys())[idx.item()]
            print(f"    - {cat_name}: {prob.item():.3f}")
        
        # Time prediction
        predicted_time = outputs['time_to_next'][0, -1].item()
        print(f"  Predicted time to next visit: {predicted_time:.2f} hours")
    
    # Scenario 2: Weekend pattern
    print("\n2. Weekend Entertainment Prediction:")
    weekend_trajectory = {
        'poi_ids': torch.tensor([[0, 10, 4]], dtype=torch.long),  # home → park → restaurant
        'categories': torch.tensor([[5, 4, 0]], dtype=torch.long),
        'timestamps': torch.tensor([[10, 14, 18]], dtype=torch.long),
        'days': torch.tensor([[6, 6, 6]], dtype=torch.long),  # Saturday
        'seasons': torch.tensor([[1, 1, 1]], dtype=torch.long),  # Summer
        'regions': torch.tensor([[0, 3, 2]], dtype=torch.long),
        'coords': torch.tensor([[[40.7128, -74.0060],
                                 [40.7829, -73.9654],
                                 [40.7431, -73.9897]]], dtype=torch.float32)
    }
    
    for key in weekend_trajectory:
        weekend_trajectory[key] = weekend_trajectory[key].to(device)
    
    with torch.no_grad():
        outputs = model(weekend_trajectory)
        poi_probs = F.softmax(outputs['poi_logits'][0, -1], dim=-1)
        top_pois = torch.topk(poi_probs, k=3)
        
        print("  Current sequence: Home → Central Park → Italian Restaurant (Weekend)")
        print("  Next POI predictions:")
        for prob, idx in zip(top_pois.values, top_pois.indices):
            poi_name = generator.poi_database[idx.item()].name
            print(f"    - {poi_name}: {prob.item():.3f}")

In [15]:
# ==================== Main Execution ====================

def main():
    print("="*60)
    print("ADVANCED POI TRANSFORMER")
    print("Demonstrating Self-Attention Benefits")
    print("="*60)
    
    # Configuration
    config = {
        'num_pois': 13,
        'num_categories': 7,
        'num_regions': 4,
        'poi_embed_dim': 32,
        'cat_embed_dim': 16,
        'time_embed_dim': 16,
        'day_embed_dim': 8,
        'season_embed_dim': 8,
        'region_embed_dim': 8,
        'geo_embed_dim': 16,
        'd_model': 128,
        'nhead': 8,
        'num_layers': 3,
        'dim_feedforward': 256,
        'dropout': 0.1,
        'learning_rate': 0.001,
        'use_cross_attention': True 
    }
    
    # Generate data
    print("\n1. Generating synthetic POI data with complex patterns...")
    generator = AdvancedPOIDataGenerator()
    data = generator.generate_dataset(num_sequences=200)  # Small dataset for M1
    
    # Create dataset and dataloader
    dataset = POIDataset(data, seq_length=10)
    train_loader = torch.utils.data.DataLoader(
        dataset, 
        batch_size=16, 
        shuffle=True,
        collate_fn=collate_fn
    )
    
    # Create model
    print("\n2. Creating Advanced POI Transformer...")
    model = POITransformer(config).to(device)
    num_params = sum(p.numel() for p in model.parameters())
    print(f"   Total parameters: {num_params:,}")
    
    # Train model
    print("\n3. Training model with multiple objectives...")
    losses = train_model(model, train_loader, config, num_epochs=30)
    
    if config.get("use_cross_attention", False):
        print("\n4. Analyzing cross-attention patterns...")    
    else:
        print("\n4. Analyzing self-attention patterns...")
    sample_data, _ = dataset[0]
    analyze_attention_patterns(model, sample_data, generator)
    
    # Demonstrate predictions
    print("\n5. Demonstrating predictions...")
    demonstrate_predictions(model, generator, config)
    
    print("\n" + "="*60)
    print("KEY TRANSFORMER ADVANTAGES DEMONSTRATED:")
    print("="*60)
    print("✓ Long-range dependencies: Model can relate distant POI visits")
    print("✓ Periodic patterns: Identifies weekly/monthly visit patterns")
    print("✓ Multi-modal fusion: Combines POI, category, time, and location")
    print("✓ Multiple predictions: POI, category, and time-to-next visit")
    print("✓ Attention analysis: Interpretable attention patterns")
    
    return model, generator


In [16]:

if __name__ == "__main__":
    model, generator = main()

ADVANCED POI TRANSFORMER
Demonstrating Self-Attention Benefits

1. Generating synthetic POI data with complex patterns...

2. Creating Advanced POI Transformer...
   Total parameters: 558,289

3. Training model with multiple objectives...
Epoch 10/30
  POI Loss: 1.8773
  Category Loss: 1.3618
  Time Loss: 0.5222
  Total Loss: 2.7149
Epoch 20/30
  POI Loss: 1.4354
  Category Loss: 1.0830
  Time Loss: 0.4954
  Total Loss: 2.1255
Epoch 30/30
  POI Loss: 1.2001
  Category Loss: 0.8899
  Time Loss: 0.4934
  Total Loss: 1.7930

4. Analyzing cross-attention patterns...

ATTENTION PATTERN ANALYSIS

Long-range Dependencies Detected:
  sushi_bar (pos 1) → starbucks (pos 0) [distance: 1, weight: 0.567]
  italian_restaurant (pos 2) → starbucks (pos 0) [distance: 2, weight: 0.451]
  italian_restaurant (pos 2) → sushi_bar (pos 1) [distance: 1, weight: 0.275]
  gym (pos 3) → starbucks (pos 0) [distance: 3, weight: 0.310]
  gym (pos 3) → sushi_bar (pos 1) [distance: 2, weight: 0.230]
  gym (pos 3) → i