# 🌍 Adaptive Travel Recommendation System - Interactive Demo

This notebook provides an interactive demonstration of three reinforcement learning algorithms for travel recommendations:
- **Epsilon-Greedy**
- **LinUCB (Contextual Bandit)**
- **Thompson Sampling**

You can interact with each model and see how it learns from your feedback!

## Setup and Imports

In [1]:
import sys
sys.path.append('..')

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import clear_output
import warnings
warnings.filterwarnings('ignore')

# Import agent classes from models module
from models.agents import EpsilonGreedyAgent, LinUCBAgent, ContextualThompsonSampling

print("✅ All libraries imported successfully!")

✅ All libraries imported successfully!


## Load Data

In [2]:
# Load user profiles
df_users = pd.read_csv('../data/gen/user_profiles.csv')
feature_columns = [col for col in df_users.columns if col != 'user_id']

# Convert to user profile dictionaries
user_profiles = []
for _, row in df_users.iterrows():
    user_id = int(row['user_id'])
    prefs = {col: row[col] for col in feature_columns}
    user_profiles.append({"id": user_id, "prefs": prefs})

print(f"✅ Loaded {len(user_profiles)} user profiles")

# Load places from dataset
df_places = pd.read_csv('../data/final_dataset.csv')
places = []
for _, row in df_places.iterrows():
    # Drop empty entries so the fallback type below can actually apply
    keywords = [k for k in row['Keywords'].split(', ') if k]
    primary_type = keywords[0] if keywords else 'du lịch'  # 'du lịch' = 'travel'
    places.append({
        "name": row['Location Name'],
        "type": primary_type,
        "keywords": keywords,
        "rating": row['Rating']
    })

print(f"✅ Loaded {len(places)} travel destinations")
print(f"✅ Feature types: {len(feature_columns)}")

✅ Loaded 20 user profiles
✅ Loaded 141 travel destinations
✅ Feature types: 12


## Helper Functions

In [3]:
def context_to_vector(user):
    """Convert user preferences to vector"""
    return np.array([user["prefs"].get(col, 0) for col in feature_columns])

def simulate_reward(user, place):
    """Simulate reward based on user preferences and place rating"""
    pref_score = user["prefs"].get(place["type"], 0)
    place_r = np.clip(place.get("rating", 3.0) / 5.0, 0, 1)
    utility = 0.7 * pref_score + 0.3 * place_r
    prob = np.clip(utility + np.random.normal(0, 0.05), 0, 1)
    rating = int(np.floor(prob * 5)) + 1
    return max(1, min(5, rating))
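
To make the arithmetic in `simulate_reward` concrete, here is a noise-free variant that can be checked by hand. The name `expected_rating` is hypothetical and not part of the notebook, and the example assumes preference scores lie in [0, 1].

```python
import numpy as np

def expected_rating(pref_score, place_rating):
    """Noise-free version of simulate_reward (illustration only)."""
    place_r = np.clip(place_rating / 5.0, 0, 1)   # scale the 0-5 rating to [0, 1]
    utility = 0.7 * pref_score + 0.3 * place_r    # same 70/30 weighting as above
    prob = np.clip(utility, 0, 1)
    rating = int(np.floor(prob * 5)) + 1          # bucket into 1..6, clamped below
    return max(1, min(5, rating))

# Preference 0.5 and place rating 4.0:
# utility = 0.7 * 0.5 + 0.3 * 0.8 = 0.59, floor(0.59 * 5) + 1 = 3
print(expected_rating(0.5, 4.0))
```

The final clamp matters only when the utility reaches 1.0, where the bucket formula alone would yield 6.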

## Load or Train Models

In [4]:
import pickle
import os

# Try to load pre-trained models
model_dir = '../saved_models'
models_loaded = False

try:
    with open(f'{model_dir}/epsilon_greedy.pkl', 'rb') as f:
        agent_egreedy = pickle.load(f)
    with open(f'{model_dir}/linucb.pkl', 'rb') as f:
        agent_linucb = pickle.load(f)
    with open(f'{model_dir}/thompson_sampling.pkl', 'rb') as f:
        agent_ts = pickle.load(f)
    
    print("✅ Pre-trained models loaded successfully!")
    models_loaded = True

except FileNotFoundError:
    print("⚠️  Pre-trained models not found. Please run train.ipynb first.")
    print("Creating new models with minimal training...")
    
    # Quick training function
    def quick_train(agent, agent_type, n_rounds=500):
        for _ in range(n_rounds):
            user = random.choice(user_profiles)
            
            if agent_type == "egreedy":
                arm = agent.select_arm()
                rating = simulate_reward(user, places[arm])
                reward = 1 + (rating - 1) / 4
                agent.update(arm, reward)
                
            else:  # contextual agents
                x = context_to_vector(user)
                ranked, _ = agent.select_arm(x)
                arm = random.choice(ranked[:3])
                rating = simulate_reward(user, places[arm])
                reward = 1 + (rating - 1) / 4
                agent.update(arm, x, reward)
    
    # Initialize and quick train
    agent_egreedy = EpsilonGreedyAgent(n_arms=len(places), epsilon=0.2)
    agent_linucb = LinUCBAgent(n_arms=len(places), n_features=len(feature_columns), alpha=0.1)
    agent_ts = ContextualThompsonSampling(n_arms=len(places), d=len(feature_columns), alpha=0.1)
    
    print("Training Epsilon-Greedy...")
    quick_train(agent_egreedy, "egreedy", 500)
    print("Training LinUCB...")
    quick_train(agent_linucb, "linucb", 500)
    print("Training Thompson Sampling...")
    quick_train(agent_ts, "ts", 500)
    
    print("✅ Models trained with 500 rounds each!")
    models_loaded = True

✅ Pre-trained models loaded successfully!
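
The cell above only reads the pickled agents. As a rough sketch of the other direction, the snippet below shows one way such files could be written and read back. `train.ipynb` is not shown here, so `save_agents` and `load_agent` are hypothetical helpers, and a plain dict stands in for a trained agent.

```python
import os
import pickle
import tempfile

def save_agents(agents, model_dir):
    """Pickle each agent to <model_dir>/<name>.pkl (hypothetical helper)."""
    os.makedirs(model_dir, exist_ok=True)
    for name, agent in agents.items():
        with open(os.path.join(model_dir, f"{name}.pkl"), "wb") as f:
            pickle.dump(agent, f)

def load_agent(model_dir, name):
    """Load one pickled agent; only unpickle files you trust."""
    with open(os.path.join(model_dir, f"{name}.pkl"), "rb") as f:
        return pickle.load(f)

# Round-trip a stand-in object through a temporary directory
demo_dir = tempfile.mkdtemp()
save_agents({"epsilon_greedy": {"values": [0.1, 0.2]}}, demo_dir)
restored = load_agent(demo_dir, "epsilon_greedy")
print(restored)
```

Unpickling a real agent also requires its class to be importable, which is why the notebook imports `models.agents` before loading.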


## Interactive Recommendation Functions

In [5]:
def interactive_recommendation_egreedy(agent):
    """Interactive recommendation using Epsilon-Greedy"""
    clear_output(wait=True)
    print("=" * 70)
    print("🎯 EPSILON-GREEDY TRAVEL RECOMMENDER")
    print("=" * 70)

    while True:
        # Get top-3 recommendations
        ranked = np.argsort(agent.values)[::-1][:3]

        print("\n✈️  Top 3 Recommendations for You:")
        print("-" * 70)
        for i, idx in enumerate(ranked):
            print(f"   {i+1}. {places[idx]['name']:30s} ({places[idx]['type']:15s}) | Score: {agent.values[idx]:.3f}")
        print("-" * 70)

        # Get user input (several picks may be entered, separated by spaces)
        clicks = input("\n👆 Click on recommendation (1-3) or 'q' to quit: ")
        if clicks.lower() == 'q':
            print("\n👋 Thank you for using the Travel Recommender!")
            break

        # Process clicks
        clicked_arms = []
        for c in clicks.split():
            try:
                idx = int(c) - 1
                if 0 <= idx < len(ranked):
                    clicked_arms.append(ranked[idx])
            except ValueError:  # skip tokens that are not integers
                continue

        if not clicked_arms:
            print("⚠️  No valid selection. Please try again.")
            continue

        # Get ratings and update; shown-but-ignored items get a small penalty
        for arm in ranked:
            if arm in clicked_arms:
                try:
                    rating = float(input(f"⭐ Rate '{places[arm]['name']}' (0-1): "))
                    rating = np.clip(rating, 0, 1)
                except ValueError:  # unparsable rating: fall back to the maximum
                    rating = 1
                agent.update(arm, rating)
            else:
                agent.update(arm, -0.1)

        print("\n✅ Model updated with your feedback!")
        clear_output(wait=True)
        print("=" * 70)
        print("🎯 EPSILON-GREEDY TRAVEL RECOMMENDER")
        print("=" * 70)

In [6]:
def interactive_recommendation_linucb(agent):
    """Interactive recommendation using LinUCB"""
    clear_output(wait=True)
    print("=" * 70)
    print("🎯 LinUCB CONTEXTUAL TRAVEL RECOMMENDER")
    print("=" * 70)

    max_id = len(user_profiles) - 1

    while True:
        # Get user ID (used as an index into user_profiles)
        user_input = input(f"\n👤 Enter user ID (0-{max_id}) or 'q' to quit: ")
        if user_input.lower() == 'q':
            print("\n👋 Thank you for using the Travel Recommender!")
            break

        try:
            user_id = int(user_input)
            if user_id < 0 or user_id > max_id:
                print(f"⚠️  Invalid user ID! Please enter a number between 0 and {max_id}.")
                continue
        except ValueError:
            print("⚠️  Invalid input!")
            continue

        user = user_profiles[user_id]
        x = context_to_vector(user)

        # Get top-3 recommendations with scores:
        # estimated reward plus an exploration bonus scaled by alpha
        scores = []
        for arm in range(len(places)):
            A_inv = np.linalg.inv(agent.A[arm])
            theta = A_inv @ agent.b[arm]
            score = theta @ x + agent.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(score)

        ranked = np.argsort(scores)[::-1][:3]

        print(f"\n✈️  Top 3 Recommendations for User {user_id}:")
        print("-" * 70)
        for i, idx in enumerate(ranked):
            print(f"   {i+1}. {places[idx]['name']:30s} ({places[idx]['type']:15s}) | Score: {scores[idx]:.3f}")
        print("-" * 70)

        # Get clicks (several picks may be entered, separated by spaces)
        clicks = input("\n👆 Click on recommendation (1-3) or 'q' to quit: ")
        if clicks.lower() == 'q':
            print("\n👋 Thank you for using the Travel Recommender!")
            break

        clicked_arms = []
        for c in clicks.split():
            try:
                idx = int(c) - 1
                if 0 <= idx < len(ranked):
                    clicked_arms.append(ranked[idx])
            except ValueError:  # skip tokens that are not integers
                continue

        if not clicked_arms:
            print("⚠️  No valid selection. Please try again.")
            continue

        # Get ratings and update; shown-but-ignored items get a small penalty
        for arm in ranked:
            if arm in clicked_arms:
                try:
                    rating = float(input(f"⭐ Rate '{places[arm]['name']}' (0-1): "))
                    rating = np.clip(rating, 0, 1)
                except ValueError:  # unparsable rating: fall back to the maximum
                    rating = 1
                agent.update(arm, x, rating)
            else:
                agent.update(arm, x, -0.1)

        print("\n✅ Model updated with your feedback!")
        clear_output(wait=True)
        print("=" * 70)
        print("🎯 LinUCB CONTEXTUAL TRAVEL RECOMMENDER")
        print("=" * 70)

In [7]:
def interactive_recommendation_ts(agent):
    """Interactive recommendation using Thompson Sampling"""
    clear_output(wait=True)
    print("=" * 70)
    print("🎯 THOMPSON SAMPLING TRAVEL RECOMMENDER")
    print("=" * 70)

    max_id = len(user_profiles) - 1

    while True:
        # Get user ID (used as an index into user_profiles)
        user_input = input(f"\n👤 Enter user ID (0-{max_id}) or 'q' to quit: ")
        if user_input.lower() == 'q':
            print("\n👋 Thank you for using the Travel Recommender!")
            break

        try:
            user_id = int(user_input)
            if user_id < 0 or user_id > max_id:
                print(f"⚠️  Invalid user ID! Please enter a number between 0 and {max_id}.")
                continue
        except ValueError:
            print("⚠️  Invalid input!")
            continue

        user = user_profiles[user_id]
        x = context_to_vector(user)

        # Get top-3 recommendations using Thompson Sampling:
        # draw one parameter vector per arm from its posterior, then score x
        sampled_rewards = []
        for arm in range(agent.n_arms):
            B_inv = np.linalg.inv(agent.B[arm])
            mu_hat = B_inv @ agent.f[arm]
            theta_sample = np.random.multivariate_normal(mu_hat, agent.alpha**2 * B_inv)
            sampled_rewards.append(theta_sample @ x)

        ranked = np.argsort(sampled_rewards)[::-1][:3]

        print(f"\n✈️  Top 3 Recommendations for User {user_id}:")
        print("-" * 70)
        for i, idx in enumerate(ranked):
            print(f"   {i+1}. {places[idx]['name']:30s} ({places[idx]['type']:15s}) | Score: {sampled_rewards[idx]:.3f}")
        print("-" * 70)

        # Get clicks (several picks may be entered, separated by spaces)
        clicks = input("\n👆 Click on recommendation (1-3) or 'q' to quit: ")
        if clicks.lower() == 'q':
            print("\n👋 Thank you for using the Travel Recommender!")
            break

        clicked_arms = []
        for c in clicks.split():
            try:
                idx = int(c) - 1
                if 0 <= idx < len(ranked):
                    clicked_arms.append(ranked[idx])
            except ValueError:  # skip tokens that are not integers
                continue

        if not clicked_arms:
            print("⚠️  No valid selection. Please try again.")
            continue

        # Get ratings and update; shown-but-ignored items get a reward of 0
        for arm in ranked:
            if arm in clicked_arms:
                try:
                    rating = float(input(f"⭐ Rate '{places[arm]['name']}' (0-1): "))
                    rating = np.clip(rating, 0, 1)
                except ValueError:  # unparsable rating: fall back to the maximum
                    rating = 1
                agent.update(arm, x, rating)
            else:
                agent.update(arm, x, 0)

        print("\n✅ Model updated with your feedback!")
        clear_output(wait=True)
        print("=" * 70)
        print("🎯 THOMPSON SAMPLING TRAVEL RECOMMENDER")
        print("=" * 70)
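
All three interactive functions repeat the same click-parsing loop. One possible refactor, sketched below, moves it into a shared helper. The name `parse_clicks` is hypothetical, and unlike the loops above this version also ignores a position that is entered twice.

```python
def parse_clicks(raw, ranked):
    """Map input such as "1 3" to the arms at those 1-based positions in ranked."""
    clicked = []
    for token in raw.split():
        try:
            pos = int(token) - 1
        except ValueError:
            continue  # skip tokens that are not integers
        # Keep only in-range positions, and each arm at most once
        if 0 <= pos < len(ranked) and ranked[pos] not in clicked:
            clicked.append(ranked[pos])
    return clicked

print(parse_clicks("1 3 x 9", [42, 7, 13]))
```

Each demo could then call `clicked_arms = parse_clicks(clicks, ranked)` in place of its inline loop.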

## 🎮 Try the Demos!

Choose one of the models below to interact with:

### Demo 1: Epsilon-Greedy

This model uses a simple exploration-exploitation strategy. It ignores user context and only learns which destinations are popular overall.
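
For readers who want to see the strategy itself, here is a minimal sketch of an epsilon-greedy agent with the same `select_arm` / `update` / `values` interface the notebook relies on. `SketchEpsilonGreedy` is a stand-in written for this explanation. It is not the `EpsilonGreedyAgent` from `models.agents`, whose internals are not shown here and may differ.

```python
import numpy as np

class SketchEpsilonGreedy:
    """Illustrative stand-in, NOT models.agents.EpsilonGreedyAgent."""

    def __init__(self, n_arms, epsilon=0.2, seed=0):
        self.epsilon = epsilon
        self.counts = np.zeros(n_arms, dtype=int)  # pulls per arm
        self.values = np.zeros(n_arms)             # running mean reward per arm
        self.rng = np.random.default_rng(seed)

    def select_arm(self):
        # With probability epsilon explore a random arm, otherwise exploit
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.values)))
        return int(np.argmax(self.values))

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Toy run with fixed (deterministic) rewards per arm
agent = SketchEpsilonGreedy(n_arms=3)
true_means = [0.2, 0.8, 0.5]
for _ in range(2000):
    arm = agent.select_arm()
    agent.update(arm, true_means[arm])
print(agent.values, int(np.argmax(agent.values)))
```

Because no context vector is involved, every user sees the same ranking, which is the behavior described above.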

In [8]:
# Run Epsilon-Greedy Interactive Demo
interactive_recommendation_egreedy(agent_egreedy)

🎯 EPSILON-GREEDY TRAVEL RECOMMENDER

✈️  Top 3 Recommendations for You:
----------------------------------------------------------------------
   1. Vườn quốc gia Núi Chúa         (tham quan      ) | Score: 0.015
   2. Chùa Tôn Thạnh                 (tham quan      ) | Score: 0.011
   3. Đỉnh Quế                       (thiên nhiên    ) | Score: -0.002
----------------------------------------------------------------------
⚠️  No valid selection. Please try again.

✈️  Top 3 Recommendations for You:
----------------------------------------------------------------------
   1. Vườn quốc gia Núi Chúa         (tham quan      ) | Score: 0.015
   2. Chùa Tôn Thạnh                 (tham quan      ) | Score: 0.011
   3. Đỉnh Quế                       (thiên nhiên    ) | Score: -0.002
----------------------------------------------------------------------
⚠️  No valid selection. Please try again.

✈️  Top 3 Recommendations for You:
-------

### Demo 2: LinUCB (Contextual Bandit)

This model considers user preferences to provide personalized recommendations. It adds an upper confidence bound to each estimated reward to balance exploration and exploitation.
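
The scoring loop in `interactive_recommendation_linucb` reads `agent.A`, `agent.b`, and `agent.alpha`. The sketch below shows how those quantities evolve under the standard disjoint LinUCB update. `SketchLinUCB` is a stand-in, and the assumption that `LinUCBAgent.update` follows the same rule is not confirmed by this notebook.

```python
import numpy as np

class SketchLinUCB:
    """Disjoint LinUCB sketch, a stand-in for models.agents.LinUCBAgent."""

    def __init__(self, n_arms, n_features, alpha=0.1):
        self.alpha = alpha
        self.A = [np.eye(n_features) for _ in range(n_arms)]    # per-arm design matrix
        self.b = [np.zeros(n_features) for _ in range(n_arms)]  # per-arm reward vector

    def scores(self, x):
        out = []
        for A, b in zip(self.A, self.b):
            theta = np.linalg.solve(A, b)                         # ridge estimate
            bonus = self.alpha * np.sqrt(x @ np.linalg.solve(A, x))
            out.append(theta @ x + bonus)                         # mean + confidence width
        return np.array(out)

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

agent = SketchLinUCB(n_arms=2, n_features=2)
x = np.array([1.0, 0.0])
before = agent.scores(x)      # both arms: 0 + 0.1 * sqrt(1) = 0.1
agent.update(0, x, 1.0)       # arm 0 observes reward 1 in context x
after = agent.scores(x)       # arm 0: 0.5 + 0.1 * sqrt(0.5)
print(before, after)
```

After a single positive observation the estimate for arm 0 rises while its exploration bonus shrinks. Arm 1 is untouched and keeps its full bonus.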

In [None]:
# Run LinUCB Interactive Demo
interactive_recommendation_linucb(agent_linucb)

### Demo 3: Thompson Sampling

This model uses Bayesian inference to learn user preferences. It naturally balances exploration and exploitation through probability matching.
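
To illustrate how sampling drives exploration, the sketch below tracks a single arm with the same quantities the demo reads (`B`, `f`, `alpha`). It assumes the usual linear-Gaussian update `B += x xᵀ`, `f += r x`. That is an assumption about `ContextualThompsonSampling.update`, which is not shown in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
d, alpha = 2, 0.1
B = np.eye(d)        # posterior precision for one arm
f = np.zeros(d)      # accumulated reward-weighted contexts
x = np.array([1.0, 0.0])

def sample_score(B, f, x):
    """Draw theta from N(B^-1 f, alpha^2 B^-1) and score the context x."""
    B_inv = np.linalg.inv(B)
    mu_hat = B_inv @ f
    theta = rng.multivariate_normal(mu_hat, alpha**2 * B_inv)
    return theta @ x

spread_before = np.std([sample_score(B, f, x) for _ in range(2000)])

# Feed 50 observations of reward 1.0 in the same context (assumed update rule)
for _ in range(50):
    B += np.outer(x, x)
    f += 1.0 * x

spread_after = np.std([sample_score(B, f, x) for _ in range(2000)])
print(spread_before, spread_after)
```

With little data the sampled scores vary widely, so rarely shown arms can still win a ranking. As evidence accumulates the samples concentrate around the estimated reward.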

In [14]:
# Run Thompson Sampling Interactive Demo
interactive_recommendation_ts(agent_ts)

🎯 THOMPSON SAMPLING TRAVEL RECOMMENDER

✈️  Top 3 Recommendations for User 2:
----------------------------------------------------------------------
   1. Sông Chày                      (tham quan      ) | Score: 0.215
   2. Cửa khẩu Hữu Nghị              (tham quan      ) | Score: 0.109
   3. Đền Trần Thương                (khám phá       ) | Score: 0.066
----------------------------------------------------------------------

✈️  Top 3 Recommendations for User 2:
----------------------------------------------------------------------
   1. Sông Chày                      (tham quan      ) | Score: 0.215
   2. Cửa khẩu Hữu Nghị              (tham quan      ) | Score: 0.109
   3. Đền Trần Thương                (khám phá       ) | Score: 0.066
----------------------------------------------------------------------


KeyboardInterrupt: Interrupted by user