<a href="https://colab.research.google.com/github/pimanzi/VisitRwandaBot/blob/main/visitRwandaBot_flan_t5_small.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install required PyTorch libraries
print("Installing PyTorch Hugging Face ecosystem...")
!pip install transformers datasets evaluate accelerate gradio wandb torch

print("All PyTorch libraries installed successfully!")

Installing PyTorch Hugging Face ecosystem...
Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6
All PyTorch libraries installed successfully!


In [None]:
!pip install rouge_score

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=2880990f4e04e9126f355d37d7e40c5ca082122d8caa48980eafc3c985e64c7e
  Stored in directory: /root/.cache/pip/wheels/85/9d/af/01feefbe7d55ef5468796f0c68225b6788e85d9d0a281e7a70
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


In [None]:
# Import PyTorch libraries only
import pandas as pd
import numpy as np
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    DataCollatorForSeq2Seq
)
from datasets import Dataset
import evaluate
from sklearn.model_selection import train_test_split
import gradio as gr
import warnings
import rouge_score
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

PyTorch version: 2.8.0+cu126
CUDA available: False
Device: cpu


Load Synthetic data


In [None]:
# Load the Rwanda Tourism Q&A dataset
df = pd.read_csv('rwanda_tourism_balanced_dataset.csv')

print("Dataset Overview:")
print(f"Total samples: {len(df)}")
print(f"Columns: {list(df.columns)}")
print("\nCategory distribution:")
print(df['category'].value_counts())

# Display first few samples
print("\n Sample data:")
print(df.head())

# Display random samples from each category
print("\n Random samples from each category:")
for category in df['category'].unique():
    sample = df[df['category'] == category].sample(1)
    print(f"\n {category}:")
    print(f"Q: {sample['question'].iloc[0]}")
    print(f"A: {sample['answer'].iloc[0][:100]}...")

Dataset Overview:
Total samples: 790
Columns: ['id', 'category', 'question', 'answer']

Category distribution:
category
National Parks           158
Cultural and heritage    158
Sports and Leisures      158
Foods and Dishes         158
Towns                    158
Name: count, dtype: int64

 Sample data:
   id        category                                           question  \
0   1  National Parks          How many national parks does Rwanda have?   
1   2  National Parks  What are the names of the national parks found...   
2   3  National Parks          Which national park should I visit first?   
3   4  National Parks             Tell me about Volcanoes National Park.   
4   5  National Parks            How many volcanoes are found in Rwanda?   

                                              answer  
0                    Rwanda has four national parks.  
1  The national parks in Rwanda are Volcanoes Nat...  
2  It depends on your interests. If you love wild...  
3  “In the heart 

In [None]:
# Split the dataset into training and test sets (80/20 split)
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42, stratify=df['category'])

print(" Dataset Split:")
print(f"Training samples: {len(train_df)}")
print(f"Test samples: {len(test_df)}")

print("\n Training set category distribution:")
print(train_df['category'].value_counts())

print("\n Test set category distribution:")
print(test_df['category'].value_counts())

 Dataset Split:
Training samples: 632
Test samples: 158

 Training set category distribution:
category
Towns                    127
Foods and Dishes         127
Cultural and heritage    126
National Parks           126
Sports and Leisures      126
Name: count, dtype: int64

 Test set category distribution:
category
Sports and Leisures      32
National Parks           32
Cultural and heritage    32
Towns                    31
Foods and Dishes         31
Name: count, dtype: int64
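
To illustrate what `stratify=df['category']` does in the split above, here is a minimal pure-Python sketch of a stratified split. It is not scikit-learn's implementation: the helper `stratified_split` and the toy rows are hypothetical, and the per-class rounding is deliberately simpler than what `train_test_split` uses.

```python
import random
from collections import Counter, defaultdict

def stratified_split(rows, key, test_size=0.2, seed=42):
    """Split rows so each class contributes about test_size of its members to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[key]].append(row)

    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's list is not shuffled
        rng.shuffle(members)
        n_test = round(len(members) * test_size)  # simplified rounding (assumption)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Toy data: two categories with ten rows each (hypothetical, not the real dataset)
rows = [{"category": c, "id": i} for c in ("Towns", "National Parks") for i in range(10)]
train_rows, test_rows = stratified_split(rows, "category")
print(Counter(r["category"] for r in test_rows))
```

Because every class is shuffled and cut separately, the class proportions in the test set mirror those of the full data, which is the property the category counts above show.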


##  Data Preprocessing

Next we tokenize the data with the Flan-T5-small tokenizer. Each input is a question prefixed with `question: ` and each target is the matching answer; both are truncated to 256 tokens and padded to the longest sequence in each tokenization batch.

In [None]:
# Initialize the tokenizer for Flan-T5-small
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set maximum sequence lengths
max_input_length = 256
max_target_length = 256

def preprocess_function(examples):
    """Tokenize the input questions and target answers"""
    # Prefix for better question answering performance
    inputs = ["question: " + q for q in examples["question"]]
    targets = examples["answer"]

    # Tokenize inputs
    model_inputs = tokenizer(
        inputs,
        max_length=max_input_length,
        truncation=True,
        padding=True
    )

    # Tokenize targets
    labels = tokenizer(
        targets,
        max_length=max_target_length,
        truncation=True,
        padding=True
    )

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Convert pandas DataFrames to Hugging Face Datasets
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

# Apply tokenization
train_tokenized = train_dataset.map(preprocess_function, batched=True)
test_tokenized = test_dataset.map(preprocess_function, batched=True)

# Remove unnecessary columns that cause tensor conversion issues
columns_to_remove = ['question', 'answer', 'category', '__index_level_0__']
train_tokenized = train_tokenized.remove_columns([col for col in columns_to_remove if col in train_tokenized.column_names])
test_tokenized = test_tokenized.remove_columns([col for col in columns_to_remove if col in test_tokenized.column_names])

print(" Tokenization completed!")
print(f"Training samples: {len(train_tokenized)}")
print(f"Test samples: {len(test_tokenized)}")
print(f"Training columns: {train_tokenized.column_names}")
print(f"Test columns: {test_tokenized.column_names}")

# Display a sample tokenized example
print("\n Sample tokenized data:")
sample = train_tokenized[0]
print("Input IDs:", sample['input_ids'][:10], "...")
print("Labels:", sample['labels'][:10], "...")
print("Input shape:", len(sample['input_ids']))
print("Labels shape:", len(sample['labels']))

 Tokenization completed!
Training samples: 632
Test samples: 158
Training columns: ['id', 'input_ids', 'attention_mask', 'labels']
Test columns: ['id', 'input_ids', 'attention_mask', 'labels']

 Sample tokenized data:
Input IDs: [822, 10, 363, 5449, 11228, 14721, 16, 1435, 5138, 58] ...
Labels: [11386, 11228, 14721, 16, 1435, 5138, 114, 27177, 6, 29162] ...
Input shape: 28
Labels shape: 115
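
One thing worth checking in the preprocessing cell: the targets are padded at tokenization time, so the label sequences contain the tokenizer's pad id (0 for T5 tokenizers). The cross-entropy loss used by Hugging Face seq2seq models skips only positions labelled -100, so padded label positions would otherwise count toward the loss. The sketch below shows one common remedy on plain lists. `mask_label_padding` is a hypothetical helper, not part of the notebook, and the token ids are illustrative.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def mask_label_padding(labels: List[List[int]], pad_token_id: int = 0) -> List[List[int]]:
    """Replace pad ids in every label sequence with IGNORE_INDEX."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in sequence]
        for sequence in labels
    ]

# Two already-padded label sequences (illustrative ids; 1 is the end-of-sequence id in T5)
padded = [[11386, 11228, 1, 0, 0], [363, 1, 0, 0, 0]]
masked = mask_label_padding(padded)
print(masked)
```

In the notebook this would be applied to `labels["input_ids"]` with `tokenizer.pad_token_id`. An alternative is to drop `padding=True` from the tokenizer calls and let `DataCollatorForSeq2Seq` pad each batch, since it pads labels with -100 by default.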


## Model Setup

Let's load the Flan-T5-small model and configure the training parameters for fine-tuning on our Rwanda Tourism dataset.

In [None]:
# Initialize the tokenizer for Flan-T5-small
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)

print(f" Tokenizer loaded: {model_name}")
print(f"Vocabulary size: {tokenizer.vocab_size}")

# Set device for model training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

 Tokenizer loaded: google/flan-t5-small
Vocabulary size: 32100
Using device: cpu


In [None]:
# Load the Flan-T5-small model
print("🔄 Loading Flan-T5-small model...")

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = model.to(device)

print(f" Model loaded successfully: {model_name}")

# Print model information
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f" Model Statistics:")
print(f"  Total parameters: {total_params:,}")
print(f"  Trainable parameters: {trainable_params:,}")
print(f"  Model size: ~{total_params * 4 / 1024 / 1024:.1f} MB")

# Data collator for sequence-to-sequence tasks
data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=model,
    padding=True,
    return_tensors="pt"
)

print(" Data collator configured")

🔄 Loading Flan-T5-small model...


 Model loaded successfully: google/flan-t5-small
 Model Statistics:
  Total parameters: 76,961,152
  Trainable parameters: 76,961,152
  Model size: ~293.6 MB
 Data collator configured
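
The "Model size" figure printed above assumes 4 bytes per parameter (float32 weights). As a quick check of that arithmetic, the sketch below recomputes it and shows what the same weights would take at 2 bytes per parameter. `param_memory_mb` is a hypothetical helper name.

```python
def param_memory_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed for the weights alone, in MiB."""
    return num_params * bytes_per_param / 1024 / 1024

total_params = 76_961_152  # parameter count reported by the cell above

fp32_mb = param_memory_mb(total_params)      # 4 bytes per parameter
fp16_mb = param_memory_mb(total_params, 2)   # 2 bytes per parameter

print(f"float32 weights: ~{fp32_mb:.1f} MB")
print(f"float16 weights: ~{fp16_mb:.1f} MB")
```

This covers the weights only. Training needs additional memory for gradients, optimizer state and activations, so the real footprint during fine-tuning is noticeably larger.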


In [None]:
# Configure training arguments for 6 epochs with overfitting protection
training_args = Seq2SeqTrainingArguments(
    output_dir="./rwanda_tourism_flan_t5",
    eval_strategy="epoch",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="epoch",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=6,
    predict_with_generate=True,
    fp16=torch.cuda.is_available(),
    push_to_hub=False,
    remove_unused_columns=True,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    report_to=["wandb"],
    dataloader_pin_memory=False,
    dataloader_num_workers=0,
    logging_dir='./logs',
    logging_first_step=True,
    save_steps=50,  # has no effect here: checkpoints follow save_strategy="epoch"
    warmup_steps=50,
    run_name="rwanda-tourism-flan-t5-6epochs-safe",
)
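
To make the schedule implied by these arguments concrete, the sketch below works out the number of optimizer steps and the learning rate at a given step. It assumes a single device, no gradient accumulation, and the Trainer's default linear schedule (linear warmup followed by linear decay to zero). `linear_schedule_lr` is a hypothetical helper that mirrors that rule, not a call into the library.

```python
import math

TRAIN_SAMPLES = 632   # training rows after the 80/20 split
BATCH_SIZE = 4        # per_device_train_batch_size
EPOCHS = 6            # num_train_epochs

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def linear_schedule_lr(step, base_lr=3e-4, warmup_steps=50, total=total_steps):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup_steps))

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, 25, 50, total_steps // 2, total_steps):
    print(f"step {step:4d}: lr = {linear_schedule_lr(step):.6f}")
```

The total of 948 steps agrees with the `train/global_step` value that W&B reports at the end of training.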

##  Model Training

Now let's fine-tune the Flan-T5 model on our Rwanda Tourism dataset.

In [None]:
# Initialize wandb for 6-epoch training with overfitting monitoring
import wandb

# Initialize wandb run
wandb.init(
    project="rwanda-tourism-chatbot",
    name="flan-t5-small-monitored",
    config={
        "model": "google/flan-t5-small",
        "dataset": "Rwanda Tourism Q&A",
        "learning_rate": 3e-4,
        "batch_size": 4,
        "epochs": 6,
        "max_input_length": 256,
        "max_target_length": 256,
        "overfitting_protection": "load_best_model_at_end",
        "metric": "eval_loss",
    }
)

# Initialize trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,
    eval_dataset=test_tokenized,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
print("-" * 60)

# Start training with enhanced monitoring
training_output = trainer.train()

print("-" * 60)
print("Training completed!")

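# Analyze training progression
# Note: training_output.training_loss is the loss averaged over all training
# steps, not the loss at the last step.
print(f"\nFinal Training Results:")
print(f"Final Training Loss: {training_output.training_loss:.4f}")
print(f"Training Time: {training_output.metrics['train_runtime']:.2f} seconds")

# Check training history for overfitting signs
log_history = trainer.state.log_history
# Per-step training losses are logged under 'loss'; 'train_loss' appears only
# once, in the end-of-run summary entry.
train_losses = [x['loss'] for x in log_history if 'loss' in x]
eval_losses = [x['eval_loss'] for x in log_history if 'eval_loss' in x]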

if len(eval_losses) >= 2:
    best_epoch = eval_losses.index(min(eval_losses)) + 1
    final_train_loss = train_losses[-1] if train_losses else "N/A"
    best_val_loss = min(eval_losses)
    final_val_loss = eval_losses[-1]

    print(f"\n Training Analysis:")
    print(f"Best model from epoch: {best_epoch}")
    print(f"Best validation loss: {best_val_loss:.4f}")
    print(f"Final validation loss: {final_val_loss:.4f}")

    if final_val_loss > best_val_loss + 0.02:
        print("Some overfitting detected - but best model was saved!")
    else:
        print("No overfitting detected - training was successful!")

# Save the best model to output_dir (load_best_model_at_end has already reloaded it into `model`)
trainer.save_model()
print(f"Best model saved to ./rwanda_tourism_flan_t5")

# Finish wandb
wandb.finish()
print("Training monitoring completed")

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: p-imanzi (p-imanzi-african-leadership-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin

wandb: Detected [huggingface_hub.inference, mcp] in use.
wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/


------------------------------------------------------------


Epoch  Training Loss  Validation Loss
1      1.2587         0.796812
2      0.913          0.621733
3      0.6492         0.535184
4      0.5637         0.483146
5      0.4966         0.456923
6      0.4897         0.448075


There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].


------------------------------------------------------------
Training completed!

Final Training Results:
Final Training Loss: 1.4527
Training Time: 26798.87 seconds

 Training Analysis:
Best model from epoch: 6
Best validation loss: 0.4481
Final validation loss: 0.4481
No overfitting detected - training was successful!
Best model saved to ./rwanda_tourism_flan_t5


Run history:
eval/loss                █▄▃▂▁▁
eval/runtime             ▁█▄▅▅▄
eval/samples_per_second  █▁▅▅▄▅
eval/steps_per_second    █▁▆▄▄▆
train/epoch              ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▅▆▆▆▆▆▇▇████
train/global_step        ▁▁▂▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▆▇▇▇▇▇███
train/grad_norm          ▇█▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/learning_rate      ▁███▇▇▇▇▆▆▆▆▆▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁▁
train/loss               ██▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
eval/loss                0.44808
eval/runtime             53.366
eval/samples_per_second  2.961
eval/steps_per_second    0.75
total_flos               38549043412992.0
train/epoch              6
train/global_step        948
train/grad_norm          1.35553
train/learning_rate      0.0
train/loss               0.4897


Training monitoring completed


##  Model Evaluation

Let's evaluate our fine-tuned model using ROUGE metrics and examine some example predictions.

In [None]:
# Load ROUGE metric for evaluation
rouge = evaluate.load("rouge")

def generate_response(question, max_length=256):
    """Generate response for a given question - PyTorch Version"""
    input_text = f"question: {question}"
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_length
    )

    # Move inputs to device
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Generate response
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            num_beams=4,
            early_stopping=True,
            do_sample=False,
            temperature=1.0
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Test on some examples from test set
print("Example Predictions:")
print("=" * 80)

for i in range(5):
    test_sample = test_df.iloc[i]
    question = test_sample['question']
    true_answer = test_sample['answer']
    predicted_answer = generate_response(question)

    print(f"Category: {test_sample['category']}")
    print(f"Question: {question}")
    print(f"True Answer: {true_answer[:150]}...")
    print(f"Predicted: {predicted_answer}")
    print("-" * 80)

Example Predictions:
Category: Sports and Leisures
Question: Are the caves safe and suitable for all tourists?
True Answer: Most caves are safe when visited with professional guides, but some involve rugged terrain, dark spaces, and narrow passages, so it’s recommended for ...
Predicted: Most caves are safe when visited with professional guides, but some are safe when visited with professional guides.
--------------------------------------------------------------------------------
Category: National Parks
Question: How do I get to Akagera?
True Answer: Self-drive access is through the southern Mutumba Gate. You can also arrive via scheduled or charter helicopter. 4×4 vehicles are recommended during t...
Predicted: Akagera, in Eastern Rwanda, is about 1 hour’s drive from Akagera International Airport. You can reach it by car or by road.
--------------------------------------------------------------------------------
Category: Sports and Leisures
Question: Why is Rwanda an excellent dest
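
The first prediction above repeats a whole clause. A rough way to quantify that kind of looping is the share of n-grams that are repeats. The sketch below is a simple heuristic under whitespace tokenization, and `repeated_ngram_ratio` is a hypothetical helper that the notebook does not define.

```python
def repeated_ngram_ratio(text: str, n: int = 2) -> float:
    """Fraction of n-grams in text that duplicate an earlier n-gram (0.0 means no repeats)."""
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

looping = ("Most caves are safe when visited with professional guides, "
           "but some are safe when visited with professional guides.")
clean = "Rwanda has four national parks."

print(f"looping prediction: {repeated_ngram_ratio(looping):.3f}")
print(f"clean sentence:     {repeated_ngram_ratio(clean):.3f}")
```

If such repetition turns out to be common, `model.generate` accepts `no_repeat_ngram_size` and `repetition_penalty` arguments that are designed to discourage it. Whether they improve the answers for this model would need to be measured, since they can also block legitimate repeats such as place names.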

In [None]:
# Evaluate ROUGE scores on a subset of test data
print("Computing ROUGE Scores...")

test_subset = test_df.sample(50, random_state=42)
predictions = []
references = []

for _, row in test_subset.iterrows():
    pred = generate_response(row['question'])
    predictions.append(pred)
    references.append(row['answer'])

# Calculate ROUGE scores
rouge_scores = rouge.compute(predictions=predictions, references=references)

print("ROUGE Evaluation Results:")
print(f"ROUGE-1: {rouge_scores['rouge1']:.4f}")
print(f"ROUGE-2: {rouge_scores['rouge2']:.4f}")
print(f"ROUGE-L: {rouge_scores['rougeL']:.4f}")
print(f"ROUGE-Lsum: {rouge_scores['rougeLsum']:.4f}")

# Evaluate on each category
print("\n Category-wise Performance:")
for category in test_df['category'].unique():
    category_data = test_df[test_df['category'] == category].sample(10, random_state=42)
    cat_predictions = [generate_response(row['question']) for _, row in category_data.iterrows()]
    cat_references = category_data['answer'].tolist()

    cat_rouge = rouge.compute(predictions=cat_predictions, references=cat_references)
    print(f"{category}: ROUGE-1 = {cat_rouge['rouge1']:.4f}")

Computing ROUGE Scores...
ROUGE Evaluation Results:
ROUGE-1: 0.4521
ROUGE-2: 0.2778
ROUGE-L: 0.4082
ROUGE-Lsum: 0.4107

 Category-wise Performance:
Sports and Leisures: ROUGE-1 = 0.5571
National Parks: ROUGE-1 = 0.2985
Cultural and heritage: ROUGE-1 = 0.4254
Towns: ROUGE-1 = 0.5617
Foods and Dishes: ROUGE-1 = 0.3912


In [None]:
#  Accuracy and F1 score
print(" COMPREHENSIVE EVALUATION METRICS")
print("=" * 60)

def calculate_f1_scores(predictions, references):
    """Calculate F1 scores for text generation"""
    f1_scores = []

    for pred, ref in zip(predictions, references):
        # Convert to word sets for F1 calculation
        pred_words = set(pred.lower().split())
        ref_words = set(ref.lower().split())

        # Calculate precision, recall, F1
        if len(pred_words) == 0:
            f1_scores.append(0.0)
            continue

        intersection = pred_words & ref_words
        precision = len(intersection) / len(pred_words) if pred_words else 0
        recall = len(intersection) / len(ref_words) if ref_words else 0

        if precision + recall == 0:
            f1_scores.append(0.0)
        else:
            f1 = 2 * (precision * recall) / (precision + recall)
            f1_scores.append(f1)

    return f1_scores

def calculate_accuracy_scores(predictions, references):
    """Calculate word-overlap "accuracy": the share of unique reference words found in the prediction (word-level recall)"""
    accuracy_scores = []

    for pred, ref in zip(predictions, references):
        pred_words = set(pred.lower().split())
        ref_words = set(ref.lower().split())

        if not ref_words:
            accuracy_scores.append(0.0)
            continue

        overlap = len(pred_words & ref_words)
        accuracy = overlap / len(ref_words) if ref_words else 0
        accuracy_scores.append(min(accuracy, 1.0))  # Cap at 1.0

    return accuracy_scores

def calculate_domain_relevance(predictions):
    """Calculate Rwanda tourism domain relevance"""
    rwanda_keywords = ['rwanda', 'kigali', 'gorilla', 'volcanoes', 'akagera', 'nyungwe', 'park', 'tourism']
    domain_scores = []

    for pred in predictions:
        pred_lower = pred.lower()
        keyword_count = sum(1 for keyword in rwanda_keywords if keyword in pred_lower)
        domain_score = min(keyword_count / 2, 1.0)
        domain_scores.append(domain_score)

    return domain_scores

# Calculate all metrics
f1_scores = calculate_f1_scores(predictions, references)
accuracy_scores = calculate_accuracy_scores(predictions, references)
domain_scores = calculate_domain_relevance(predictions)

# Calculate averages
avg_f1 = sum(f1_scores) / len(f1_scores)
avg_accuracy = sum(accuracy_scores) / len(accuracy_scores)
avg_domain_relevance = sum(domain_scores) / len(domain_scores)

# DISPLAY RESULTS
print("\n CORE PERFORMANCE METRICS:")
print(f"F1-Score: {avg_f1:.4f}")
print(f"Accuracy: {avg_accuracy:.4f}")
print(f"Domain Relevance: {avg_domain_relevance:.4f}")
print("\n ROUGE METRICS (Previously Calculated):")
print(f"ROUGE-1: {rouge_scores['rouge1']:.4f}")
print(f"ROUGE-2: {rouge_scores['rouge2']:.4f}")
print(f"ROUGE-L: {rouge_scores['rougeL']:.4f}")

print("\n RESPONSE QUALITY ANALYSIS:")
response_lengths = [len(pred.split()) for pred in predictions]
avg_length = sum(response_lengths) / len(response_lengths)
print(f"Average Response Length: {avg_length:.1f} words")

# Coherence check (simple heuristic)
coherent_responses = sum(1 for pred in predictions if len(pred.split()) >= 8 and not any(word in pred.lower() for word in ['sorry', 'unclear', 'not sure']))
coherence_rate = coherent_responses / len(predictions)
print(f"Response Coherence Rate: {coherence_rate:.4f}")

print("\n OVERALL PERFORMANCE ASSESSMENT:")
overall_score = (avg_f1 + avg_accuracy + rouge_scores['rouge1']) / 3
if overall_score >= 0.6:
    grade = "EXCELLENT"
elif overall_score >= 0.4:
    grade = "GOOD"
elif overall_score >= 0.3:
    grade = "FAIR"
else:
    grade = "NEEDS IMPROVEMENT"

print(f"Combined Score: {overall_score:.4f} - {grade}")

 COMPREHENSIVE EVALUATION METRICS

 CORE PERFORMANCE METRICS:
F1-Score: 0.4566
Accuracy: 0.4040
Domain Relevance: 0.3800

 ROUGE METRICS (Previously Calculated):
ROUGE-1: 0.4521
ROUGE-2: 0.2778
ROUGE-L: 0.4082

 RESPONSE QUALITY ANALYSIS:
Average Response Length: 24.4 words
Response Coherence Rate: 1.0000

 OVERALL PERFORMANCE ASSESSMENT:
Combined Score: 0.4376 - GOOD
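
A caveat when reading the F1 and accuracy numbers above: both are computed on sets of words, so word order and repetition do not affect them, and the "accuracy" value is the same quantity as set-level recall. The sketch below demonstrates this on made-up sentences. `set_prf` is a hypothetical helper that follows the same set logic as the functions in the cell above.

```python
def set_prf(prediction: str, reference: str):
    """Set-based precision, recall and F1 over lowercased whitespace tokens."""
    pred_words = set(prediction.lower().split())
    ref_words = set(reference.lower().split())
    if not pred_words or not ref_words:
        return 0.0, 0.0, 0.0
    overlap = len(pred_words & ref_words)
    precision = overlap / len(pred_words)
    recall = overlap / len(ref_words)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative sentences, not taken from the dataset
reference = "visitors can see gorillas and volcanoes"
concise = "gorillas and volcanoes"
repetitive = "volcanoes volcanoes volcanoes gorillas and volcanoes"

print("concise:   ", set_prf(concise, reference))
print("repetitive:", set_prf(repetitive, reference))
```

Both predictions get identical scores because their word sets are the same, so a looping answer is not penalized by these metrics. Count-based measures such as ROUGE are less forgiving in that respect.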


In [None]:
# 🔧 FIXED: Load the correct fine-tuned model for Gradio
print("Loading the fine-tuned baseline model for Gradio...")
gradio_model_path = "./rwanda_tourism_flan_t5"
gradio_tokenizer = AutoTokenizer.from_pretrained(gradio_model_path)
gradio_model = AutoModelForSeq2SeqLM.from_pretrained(gradio_model_path)
gradio_model = gradio_model.to(device)

print(f"Loaded fine-tuned model from: {gradio_model_path}")

def generate_response_gradio(question, max_length=256):
    """Generate response using the fine-tuned model - FIXED VERSION"""
    input_text = f"question: {question}"
    inputs = gradio_tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_length
    )

    # Move inputs to device
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Generate response using the FINE-TUNED model
    with torch.no_grad():
        outputs = gradio_model.generate(
            **inputs,
            max_length=max_length,
            num_beams=4,
            early_stopping=True,
            do_sample=False,
            temperature=1.0
        )

    response = gradio_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Enhanced chatbot function with domain filtering
def rwanda_tourism_chatbot(user_question):
    """
    Rwanda Tourism Chatbot using the fine-tuned model
    """
    # Keywords related to Rwanda tourism
    tourism_keywords = [
        'rwanda', 'kigali', 'volcanoes', 'gorilla', 'akagera', 'nyungwe', 'park', 'national',
        'tourism', 'travel', 'visit', 'museum', 'culture', 'heritage', 'food', 'dish',
        'sport', 'leisure', 'town', 'city', 'attractions', 'activities', 'accommodation',
        'safari', 'wildlife', 'mountain', 'lake', 'kivu', 'traditional', 'dance', 'music'
    ]

    # Check if question is tourism-related
    question_lower = user_question.lower()
    is_tourism_related = any(keyword in question_lower for keyword in tourism_keywords)

    if not is_tourism_related:
        return "🚫 I'm sorry, but I can only answer questions related to Rwanda tourism, culture, attractions, and travel. Please ask me about Rwanda's national parks, cultural heritage, foods, sports, or towns and destinations."

    # Generate response using the FINE-TUNED model
    try:
        response = generate_response_gradio(user_question)  # Using the fixed function
        if len(response.strip()) < 10:  # If response is too short, provide fallback
            return "🤔 I'm not sure about that specific aspect of Rwanda tourism. Could you please rephrase your question or ask about something more specific like national parks, cultural sites, or tourism activities?"
        return f"🇷🇼 {response}"
    except Exception as e:
        return f"😅 I encountered an issue generating a response: {str(e)}. Please try rephrasing your question about Rwanda tourism."

# Test the fixed function before launching Gradio
print("\n Testing the fixed model loading...")
test_question = "What can I see in Volcanoes National Park?"
test_response = generate_response_gradio(test_question)
print(f"Test Question: {test_question}")
print(f"Fine-tuned Model Response: {test_response}")

# Create Gradio interface with the corrected function
demo = gr.Interface(
    fn=rwanda_tourism_chatbot,
    inputs=gr.Textbox(
        lines=3,
        placeholder="Ask me anything about Rwanda tourism, culture, attractions, or travel...",
        label="Your Question"
    ),
    outputs=gr.Textbox(
        lines=5,
        label="Rwanda Tourism Assistant Response (Fine-tuned Model)"
    ),
    title="🇷🇼 Rwanda Tourism Chatbot - Fine-tuned Model",
    description="I'm your fine-tuned Rwanda Tourism Assistant! Ask me about national parks, cultural heritage, foods & dishes, sports & leisure activities, towns, and destinations in Rwanda.",
    examples=[
        "What national parks can I visit in Rwanda?",
        "Tell me about gorilla trekking in Volcanoes National Park",
        "What traditional foods should I try in Rwanda?",
        "What cultural attractions are there in Kigali?",
        "What sports activities are available in Rwanda?"
    ],
    theme="soft"
)

print("Launching Gradio with the fine-tuned model...")
# Launch the chatbot
demo.launch(share=True, debug=True)

Loading the fine-tuned baseline model for Gradio...
Loaded fine-tuned model from: ./rwanda_tourism_flan_t5

 Testing the fixed model loading...
Test Question: What can I see in Volcanoes National Park?
Fine-tuned Model Response: Volcanoes National Park is home to several volcanoes, including volcanoes, volcanoes, volcanoes, and volcanoes. Visitors can see volcanoes, volcanoes, volcanoes, volcanoes, and volcanoes.
Launching Gradio with the fine-tuned model...
Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://ab20dc5175d5d6dcc6.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://ab20dc5175d5d6dcc6.gradio.live
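
The domain filter in `rwanda_tourism_chatbot` tests keywords with plain substring matching, so a short keyword such as 'city' also matches inside 'electricity'. One possible refinement is to match whole words with a regular expression. The sketch below assumes a small keyword subset and an optional plural "s"; `build_keyword_pattern` and `is_on_topic` are hypothetical helpers, not part of the notebook.

```python
import re
from typing import Iterable

def build_keyword_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Compile a case-insensitive pattern matching any keyword as a whole word (optional plural 's')."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile(rf"\b(?:{alternatives})s?\b", re.IGNORECASE)

# Small subset of the notebook's keyword list, for illustration
TOPIC_PATTERN = build_keyword_pattern(["city", "sport", "lake", "park"])

def is_on_topic(question: str) -> bool:
    """True if the question contains at least one keyword as a standalone word."""
    return TOPIC_PATTERN.search(question) is not None

for q in ("How does electricity work?",
          "Is public transport reliable?",
          "Which national parks have a lake?"):
    substring_hit = any(k in q.lower() for k in ("city", "sport", "lake", "park"))
    print(f"{q!r}: substring={substring_hit}, whole-word={is_on_topic(q)}")
```

The trade-off is that whole-word matching is stricter: inflected or compound forms that are not listed, such as "lakeside", no longer pass the filter unless they are added as keywords.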




In [None]:
# Save the fine-tuned model and tokenizer locally - PyTorch Version
save_directory = "./rwanda_tourism_chatbot"

# Save model and tokenizer
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)

print(f"✅ Model and tokenizer saved to: {save_directory}")

#  a simple test to verify saved model
print("\n Testing saved model...")
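saved_tokenizer = AutoTokenizer.from_pretrained(save_directory)
saved_model = AutoModelForSeq2SeqLM.from_pretrained(save_directory)
saved_model = saved_model.to(device)
# test_input was never defined in this notebook; reuse test_question from the
# Gradio cell with the same "question: " prefix used during training
test_input = f"question: {test_question}"
inputs = saved_tokenizer(test_input, return_tensors="pt", truncation=True, max_length=256)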
inputs = {key: value.to(device) for key, value in inputs.items()}

with torch.no_grad():
    outputs = saved_model.generate(**inputs, max_length=256, num_beams=4, early_stopping=True)
response = saved_tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Saved model works correctly!")

✅ Model and tokenizer saved to: ./rwanda_tourism_chatbot

 Testing saved model...
Saved model works correctly!
