# FLAN-T5 Fine-tuning for Yelp Review Analysis

This notebook trains a FLAN-T5-small model to generate analysis text and recommendations for restaurant owners based on Yelp reviews.

**Author**: RLau33  
**Model**: google/flan-t5-small  
**Dataset**: Yelp/yelp_review_full  
**Task**: Text-to-Text Generation (Review Analysis)

## 1. Setup and Installation

In [29]:
# Install required packages
!pip install -q transformers datasets accelerate sentencepiece torch evaluate rouge-score

In [30]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [31]:
# Import libraries
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    DataCollatorForSeq2Seq
)
from datasets import load_dataset
import numpy as np
import evaluate
from huggingface_hub import login
import random

# Check GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")

Using device: cuda
GPU: Tesla T4


## 2. Login to Hugging Face

You need to login to upload the trained model to your Hugging Face account.

In [32]:
# Login to Hugging Face (you'll need your token)
# Get your token from: https://huggingface.co/settings/tokens
login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv‚Ä¶

## 3. Load and Prepare Dataset

In [33]:
# Load Yelp dataset
print("Loading Yelp dataset...")
dataset = load_dataset("Yelp/yelp_review_full")

print(f"Train samples: {len(dataset['train'])}")
print(f"Test samples: {len(dataset['test'])}")
print("\nSample data:")
print(dataset['train'][0])

Loading Yelp dataset...
Train samples: 650000
Test samples: 50000

Sample data:
{'label': 4, 'text': "dr. goldberg offers everything i look for in a general practitioner.  he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first.  really, what more do you need?  i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank."}


In [34]:
# Sample dataset for faster training (adjust as needed)
# Using 50,000 samples (10,000 per star rating) for training
# Using 5,000 samples (1,000 per star rating) for validation

def sample_balanced_dataset(dataset, samples_per_class):
    """Sample equal number of examples from each class"""
    sampled_indices = []
    for label in range(5):  # 0-4 (1-5 stars)
        label_indices = [i for i, example in enumerate(dataset) if example['label'] == label]
        sampled_indices.extend(random.sample(label_indices, min(samples_per_class, len(label_indices))))

    random.shuffle(sampled_indices)
    return dataset.select(sampled_indices)

# Sample datasets
train_dataset = sample_balanced_dataset(dataset['train'], samples_per_class=10000)
eval_dataset = sample_balanced_dataset(dataset['test'], samples_per_class=1000)

print(f"Sampled train size: {len(train_dataset)}")
print(f"Sampled eval size: {len(eval_dataset)}")

Sampled train size: 50000
Sampled eval size: 5000


## 4. Create Training Data with Analysis Templates

In [35]:
# Define star rating descriptions
STAR_DESCRIPTIONS = {
    0: {"rating": "1-star", "sentiment": "very negative", "quality": "poor"},
    1: {"rating": "2-star", "sentiment": "negative", "quality": "below average"},
    2: {"rating": "3-star", "sentiment": "neutral", "quality": "average"},
    3: {"rating": "4-star", "sentiment": "positive", "quality": "good"},
    4: {"rating": "5-star", "sentiment": "very positive", "quality": "excellent"}
}

# Define recommendation templates based on star ratings
RECOMMENDATION_TEMPLATES = {
    0: [
        "This {rating} review indicates serious issues. Immediate action required: investigate service quality, address customer complaints promptly, and implement staff training. Consider reaching out to this customer directly to resolve their concerns and prevent negative word-of-mouth.",
        "Critical feedback detected in this {rating} review. Priority actions: review operational procedures, enhance quality control, and improve customer service protocols. Implement a customer recovery strategy and monitor similar complaints to prevent recurring issues.",
        "This {rating} review signals urgent problems. Recommendations: conduct internal audit of service standards, retrain staff on customer engagement, and establish feedback loops. Consider offering service recovery to affected customers and communicate improvements publicly."
    ],
    1: [
        "This {rating} review shows dissatisfaction. Suggested improvements: analyze specific pain points mentioned, enhance staff training, and improve service consistency. Reach out to the customer for detailed feedback and implement corrective measures to prevent similar experiences.",
        "Below-average experience reflected in this {rating} review. Actions needed: identify service gaps, improve response times, and strengthen quality assurance. Consider customer retention strategies and monitor improvement metrics over the next quarter.",
        "This {rating} review indicates room for improvement. Focus on: upgrading service protocols, enhancing product quality, and improving customer communication. Implement regular staff performance reviews and customer satisfaction surveys."
    ],
    2: [
        "This {rating} review shows average satisfaction. Opportunities: differentiate your offerings, enhance unique value propositions, and improve memorable customer experiences. Focus on consistency and consider loyalty programs to convert neutral customers into advocates.",
        "Neutral feedback in this {rating} review. Growth strategies: identify what would elevate the experience from average to excellent, invest in staff development, and enhance ambiance or service speed. Monitor competitor offerings and innovate accordingly.",
        "This {rating} review suggests meeting but not exceeding expectations. Recommendations: introduce surprise-and-delight elements, personalize service, and gather specific feedback on improvement areas. Develop marketing campaigns highlighting your strengths."
    ],
    3: [
        "This {rating} review indicates good performance. Maintain momentum by: sustaining current quality standards, identifying minor improvement areas, and encouraging positive reviews. Leverage this feedback in marketing materials and continue staff recognition programs.",
        "Positive experience shown in this {rating} review. Build on success: maintain consistency, address any mentioned minor issues, and encourage customer loyalty through rewards programs. Use positive feedback for staff motivation and training examples.",
        "This {rating} review reflects strong satisfaction. Next steps: ensure quality consistency across all touchpoints, implement customer referral incentives, and showcase positive reviews on social media. Continue monitoring feedback to maintain high standards."
    ],
    4: [
        "Excellent {rating} review! Capitalize on this success: feature this review in marketing campaigns, maintain exceptional service standards, and implement referral programs. Recognize staff members mentioned and use this as a training benchmark for excellence.",
        "Outstanding {rating} review indicating exceptional experience. Leverage this: encourage the customer to share on social platforms, maintain the high standards that earned this praise, and document best practices for team training. Consider loyalty rewards for top advocates.",
        "This {rating} review showcases your strengths. Strategic actions: amplify positive feedback through social media and website testimonials, ensure service consistency, and engage with the reviewer to build long-term loyalty. Use as case study for staff excellence."
    ]
}

def generate_analysis_text(review_text, label):
    """Generate analysis text based on review and rating"""
    desc = STAR_DESCRIPTIONS[label]
    template = random.choice(RECOMMENDATION_TEMPLATES[label])
    return template.format(**desc)

# Test the function
sample_review = train_dataset[0]
sample_analysis = generate_analysis_text(sample_review['text'], sample_review['label'])
print("Sample Review (truncated):")
print(sample_review['text'][:200] + "...")
print(f"\nRating: {sample_review['label'] + 1} stars")
print(f"\nGenerated Analysis:")
print(sample_analysis)

Sample Review (truncated):
I'd been wanting to to go Gilligin's for a while and finally made it this past Friday. It was quite full despite being the Friday of Labor Day Weekend. This has a nice divy atmosphere and kind of make...

Rating: 3 stars

Generated Analysis:
This 3-star review suggests meeting but not exceeding expectations. Recommendations: introduce surprise-and-delight elements, personalize service, and gather specific feedback on improvement areas. Develop marketing campaigns highlighting your strengths.


## 5. Load Model and Tokenizer

In [36]:
# Load FLAN-T5 model and tokenizer
model_name = "google/flan-t5-small"

print(f"Loading {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

print(f"Model loaded successfully!")
print(f"Model parameters: {model.num_parameters():,}")

Loading google/flan-t5-small...
Model loaded successfully!
Model parameters: 76,961,152


## 6. Preprocess Data

In [37]:
# Preprocessing parameters
max_input_length = 512
max_target_length = 256

def preprocess_function(examples):
    """Preprocess data for FLAN-T5 training"""
    # Create input prompts
    inputs = [
        f"Analyze this restaurant review and provide recommendations for the owner: {text[:400]}"  # Truncate long reviews
        for text in examples['text']
    ]

    # Generate target analysis texts
    targets = [
        generate_analysis_text(text, label)
        for text, label in zip(examples['text'], examples['label'])
    ]

    # Tokenize inputs
    model_inputs = tokenizer(
        inputs,
        max_length=max_input_length,
        truncation=True,
        padding="max_length"
    )

    # Tokenize targets
    labels = tokenizer(
        targets,
        max_length=max_target_length,
        truncation=True,
        padding="max_length"
    )

    # üîë CRITICAL FIX: Replace pad tokens (0) in labels with -100 so they are ignored in loss computation
    # Use tokenizer.pad_token_id for robustness (though it's 0 for T5/FLAN-T5)
    pad_token_id = tokenizer.pad_token_id
    labels_ids = []
    for label_seq in labels["input_ids"]:
        # Replace any pad token with -100
        label_seq = [(token if token != pad_token_id else -100) for token in label_seq]
        labels_ids.append(label_seq)

    model_inputs["labels"] = labels_ids
    return model_inputs

# Preprocess datasets
print("Preprocessing training data...")
tokenized_train = train_dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=train_dataset.column_names
)

print("Preprocessing evaluation data...")
tokenized_eval = eval_dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=eval_dataset.column_names
)

print("Preprocessing complete!")

Preprocessing training data...


Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

Preprocessing evaluation data...


Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Preprocessing complete!


In [38]:
# Quick sanity check
assert -100 in tokenized_train[0]["labels"], "Labels should contain -100 for padding!"
assert tokenizer.pad_token_id not in tokenized_train[0]["labels"], "Pad token should be replaced by -100!"

## 7. Setup Training

In [39]:
# Define training arguments
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq
training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-yelp-analysis",
    eval_strategy="steps",
    eval_steps=500,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=3,
    predict_with_generate=True,
    fp16=torch.cuda.is_available(),  # Use mixed precision if GPU available
    logging_steps=100,
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    push_to_hub=False,  # We'll push manually after training
    report_to="none",  # Disable wandb/tensorboard
)

# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=model
)

# Initialize trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

print("Training setup complete!")

  trainer = Seq2SeqTrainer(


Training setup complete!


## 8. Train Model

In [None]:
# Start training
print("Starting training...")
print("This may take 2-4 hours depending on your GPU.")
print("="*50)

trainer.train()

print("\n" + "="*50)
print("Training complete!")

In [40]:
# Start training (QUICK DEBUG MODE)
print("Starting QUICK training (only 200 steps)...")
print("This should take ~5-10 minutes.")
print("="*50)

# ‚ö° ‰∏¥Êó∂ÈôêÂà∂ËÆ≠ÁªÉÊ≠•Êï∞ÔºàË¶ÜÁõñÂéüÈÖçÁΩÆÔºâ
trainer.args.max_steps = 200          # Âè™ËÆ≠ÁªÉ 200 Ê≠•
trainer.args.eval_steps = 50          # ÊØè 50 Ê≠•È™åËØÅ‰∏ÄÊ¨°
trainer.args.logging_steps = 25       # ÊØè 25 Ê≠•ÊâìÂç∞ loss
trainer.args.save_steps = 100         # ÊØè 100 Ê≠•‰øùÂ≠ò‰∏ÄÊ¨°
trainer.args.load_best_model_at_end = False  # Ë∑≥ËøáÂä†ËΩΩ best modelÔºàÂä†ÈÄüÔºâ

# ÂºÄÂßãËÆ≠ÁªÉ
trainer.train()

print("\n" + "="*50)
print("Quick training complete!")

Starting QUICK training (only 200 steps)...
This should take ~5-10 minutes.


Step,Training Loss,Validation Loss
50,0.0,
100,0.0,
150,0.0,
200,0.0,



Quick training complete!


## 9. Evaluate Model

In [41]:
# Evaluate on test set
print("Evaluating model...")
eval_results = trainer.evaluate()

print("\nEvaluation Results:")
for key, value in eval_results.items():
    print(f"{key}: {value:.4f}")

Evaluating model...



Evaluation Results:
eval_loss: nan
eval_runtime: 63.9998
eval_samples_per_second: 78.1250
eval_steps_per_second: 9.7660
epoch: 0.0320


## 10. Test Model with Sample Reviews

In [42]:
# Test with sample reviews
def generate_analysis(review_text, model, tokenizer, max_length=256):
    """Generate analysis for a review"""
    prompt = f"Analyze this restaurant review and provide recommendations for the owner: {review_text}"

    inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).to(model.device)

    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_beams=4,
        early_stopping=True,
        temperature=0.7
    )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Test with different star ratings
test_reviews = [
    ("The food was absolutely terrible and the service was even worse. Will never come back.", "1-star"),
    ("It was okay, nothing special. The food was average and service was slow.", "3-star"),
    ("Amazing experience! The food was delicious and the staff was incredibly friendly. Highly recommend!", "5-star")
]

print("Testing model with sample reviews:\n")
print("="*80)

for review, rating in test_reviews:
    print(f"\n{rating} Review:")
    print(f"Review: {review}")
    print(f"\nGenerated Analysis:")
    analysis = generate_analysis(review, model, tokenizer)
    print(analysis)
    print("="*80)

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Testing model with sample reviews:


1-star Review:
Review: The food was absolutely terrible and the service was even worse. Will never come back.

Generated Analysis:
The food was terrible and the service was terrible.

3-star Review:
Review: It was okay, nothing special. The food was average and service was slow.

Generated Analysis:
The food was average and the service was slow.

5-star Review:
Review: Amazing experience! The food was delicious and the staff was incredibly friendly. Highly recommend!

Generated Analysis:
The food was delicious and the staff was very friendly.


## 11. Save Model Locally

In [43]:
# Save model and tokenizer locally
output_dir = "./flan-t5-yelp-analysis-final"

print(f"Saving model to {output_dir}...")
trainer.save_model(output_dir)
tokenizer.save_pretrained(output_dir)

print("Model saved successfully!")

Saving model to ./flan-t5-yelp-analysis-final...
Model saved successfully!


## 12. Upload to Hugging Face Hub

In [44]:
# Upload to Hugging Face Hub
hub_model_id = "RLau33/flan-t5-yelp-analysis"

print(f"Uploading model to Hugging Face Hub: {hub_model_id}")
print("This may take several minutes...")

model.push_to_hub(hub_model_id)
tokenizer.push_to_hub(hub_model_id)

print(f"\nModel uploaded successfully!")
print(f"Model URL: https://huggingface.co/{hub_model_id}")

Uploading model to Hugging Face Hub: RLau33/flan-t5-yelp-analysis
This may take several minutes...


Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...nalysis/model.safetensors:   5%|5         | 16.7MB /  308MB            

README.md: 0.00B [00:00, ?B/s]

Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...elp-analysis/spiece.model: 100%|##########|  792kB /  792kB            


Model uploaded successfully!
Model URL: https://huggingface.co/RLau33/flan-t5-yelp-analysis


## 13. Create Model Card

In [45]:
# Create a model card
model_card = f"""
---
language: en
license: apache-2.0
tags:
- text2text-generation
- flan-t5
- yelp
- restaurant-analysis
datasets:
- Yelp/yelp_review_full
metrics:
- rouge
---

# FLAN-T5 Small for Yelp Review Analysis

This model is a fine-tuned version of `google/flan-t5-small` on the Yelp Review Full dataset.
It generates analysis text and actionable recommendations for restaurant owners based on customer reviews.

## Model Description

- **Model**: FLAN-T5-small (77M parameters)
- **Task**: Text-to-Text Generation
- **Training Data**: Yelp Review Full (50,000 samples)
- **Purpose**: Generate review analysis and owner recommendations

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("RLau33/flan-t5-yelp-analysis")
model = AutoModelForSeq2SeqLM.from_pretrained("RLau33/flan-t5-yelp-analysis")

review = "The food was amazing and service was excellent!"
prompt = f"Analyze this restaurant review and provide recommendations for the owner: {{review}}"

inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=256, num_beams=4)
analysis = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(analysis)
```

## Training Details

- **Base Model**: google/flan-t5-small
- **Training Samples**: 50,000 (10,000 per star rating)
- **Validation Samples**: 5,000 (1,000 per star rating)
- **Epochs**: 3
- **Learning Rate**: 5e-5
- **Batch Size**: 8

## Intended Use

This model is designed for:
- Restaurant review analysis
- Generating actionable recommendations for restaurant owners
- Customer feedback interpretation
- Business intelligence for hospitality industry

## Limitations

- Trained specifically on restaurant reviews
- May not generalize well to other domains
- Generated recommendations are template-based
- Maximum input length: 512 tokens
- Maximum output length: 256 tokens

## Author

Created by RLau33 for ISOM5240 course project.
"""

# Save model card
with open(f"{output_dir}/README.md", "w") as f:
    f.write(model_card)

print("Model card created!")

Model card created!


## ‚úÖ Training Complete!

Your FLAN-T5 model has been successfully trained and uploaded to Hugging Face Hub.

**Next Steps**:
1. Visit your model page: https://huggingface.co/RLau33/flan-t5-yelp-analysis
2. Test the model in the Hugging Face interface
3. Use the model in your Streamlit app

**Model ID for app.py**: `RLau33/flan-t5-yelp-analysis`