# 🧠 GenAI Travel Planner: Personalized Itinerary Generator

Welcome to the **GenAI Travel Planner**, a project developed as part of the [Gen AI Intensive Course Capstone 2025Q1](https://www.kaggle.com/competitions/gen-ai-intensive-course-capstone-2025q1).

In this notebook, we build a smart itinerary planner that:
- Generates **personalized travel plans** based on user preferences
- Uses real-world datasets of **museums**, **restaurants**, **natural attractions**, and **transit stations**
- Leverages **Generative AI capabilities** to produce structured and creative multi-day itineraries

### 🚀 GenAI Capabilities Demonstrated:
- **Structured Output / JSON Mode** — for day-by-day travel itineraries
- **Few-shot Prompting** — to guide the itinerary generation
- **Retrieval Augmented Generation (RAG)** — we extract real-world POIs from datasets and use them as context

Let’s get planning!

## 📁 Step 1: Load Datasets

We begin by loading four public datasets related to places of interest:
- **Museums Dataset**: Cultural institutions from the IMLS dataset
- **Yelp Restaurants Dataset**: Dining places with location and category
- **Valley Metro Stations**: Public transport stations in the Phoenix area
- **GNIS Natural Features**: Parks, lakes, trails, etc. from GNIS

These will be used as input for our retrieval and GenAI itinerary generation.

In [1]:
# Dataset sources:
# - Museums: https://www.kaggle.com/datasets/imls/museum-directory/data

import pandas as pd

# Load datasets
museums_df = pd.read_csv("/kaggle/input/museum-directory/museums.csv", low_memory=False)  # From IMLS Kaggle dataset
yelp_df = pd.read_csv("/kaggle/input/yelp-restaurants/yelp_restaurants.csv")  # Custom-cleaned subset from Yelp Academic
valley_metro_df = pd.read_csv("/kaggle/input/phoenix-valley-metro-rail-stations/ValleyMetroRailStations.csv")  # Phoenix Light Rail station dataset
gnis_df = pd.read_csv("/kaggle/input/gnisnational/gnis.csv")  # GNIS (Geographic Names Information System) features

## 🧹 Step 2: Preprocess the Datasets

We standardize and clean each dataset so that it is usable in our GenAI pipeline:

- Museums: Extract name, type, city/state, and location
- Yelp Restaurants: Keep name, cuisine categories, and coordinates
- Valley Metro: Extract station name, location, and address
- GNIS: Extract natural feature names, types, and geo-coordinates

All datasets are filtered to remove entries missing geolocation.

In [2]:
# --- 🏛️ Museums ---
museums_poi = museums_df[[
    'Museum Name',
    'Museum Type',
    'Latitude',
    'Longitude',
    'City (Administrative Location)',
    'State (Administrative Location)'
]].rename(columns={
    'Museum Name': 'name',
    'Museum Type': 'type',
    'Latitude': 'latitude',
    'Longitude': 'longitude',
    'City (Administrative Location)': 'city',
    'State (Administrative Location)': 'state'
}).dropna(subset=['latitude', 'longitude'])

# --- 🍽️ Yelp Restaurants ---
yelp_poi = yelp_df[[
    'name', 'categories', 'latitude', 'longitude', 'city', 'state'
]].dropna(subset=['latitude', 'longitude'])

# --- 🚉 Valley Metro Stations ---
metro_poi = valley_metro_df[[
    'StationName', 'POINT_Y', 'POINT_X', 'Address'
]].rename(columns={
    'StationName': 'name',
    'POINT_Y': 'latitude',
    'POINT_X': 'longitude',
    'Address': 'address'
}).dropna(subset=['latitude', 'longitude'])

# --- 🌄 GNIS Natural Features ---
gnis_poi = gnis_df[[
    'FEATURE_NAME', 'FEATURE_CLASS', 'PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'STATE_ALPHA'
]].rename(columns={
    'FEATURE_NAME': 'name',
    'FEATURE_CLASS': 'type',
    'PRIM_LAT_DEC': 'latitude',
    'PRIM_LONG_DEC': 'longitude',
    'STATE_ALPHA': 'state'
}).dropna(subset=['latitude', 'longitude'])

# ✅ Show Cleaned Samples
print("🏛️ Cleaned Museums:")
display(museums_poi.head())

print("🍽️ Cleaned Restaurants:")
display(yelp_poi.head())

print("🚉 Cleaned Metro Stations:")
display(metro_poi.head())

print("🌄 Cleaned Natural Features:")
display(gnis_poi.head())

🏛️ Cleaned Museums:


| | name | type | latitude | longitude | city | state |
|---|---|---|---|---|---|---|
| 0 | ALASKA AVIATION HERITAGE MUSEUM | HISTORY MUSEUM | 61.17925 | -149.97254 | ANCHORAGE | AK |
| 1 | ALASKA BOTANICAL GARDEN | ARBORETUM, BOTANICAL GARDEN, OR NATURE CENTER | 61.1689 | -149.76708 | ANCHORAGE | AK |
| 2 | ALASKA CHALLENGER CENTER FOR SPACE SCIENCE TEC... | SCIENCE & TECHNOLOGY MUSEUM OR PLANETARIUM | 60.56149 | -151.21598 | KENAI | AK |
| 3 | ALASKA EDUCATORS HISTORICAL SOCIETY | HISTORIC PRESERVATION | 60.5628 | -151.26597 | KENAI | AK |
| 4 | ALASKA HERITAGE MUSEUM | HISTORY MUSEUM | 61.17925 | -149.97254 | ANCHORAGE | AK |


🍽️ Cleaned Restaurants:


| | name | categories | latitude | longitude | city | state |
|---|---|---|---|---|---|---|
| 0 | Emerald Chinese Restaurant | Specialty Food\|Restaurants\|Dim Sum\|Imported Fo... | 43.605499 | -79.652289 | Mississauga | ON |
| 1 | Musashi Japanese Restaurant | Sushi Bars\|Restaurants\|Japanese | 35.092564 | -80.859132 | Charlotte | NC |
| 2 | Taco Bell | Restaurants\|Breakfast & Brunch\|Mexican\|Tacos\|T... | 33.495194 | -112.028588 | Phoenix | AZ |
| 3 | Marcos Pizza | Italian\|Restaurants\|Pizza\|Chicken Wings | 41.70852 | -81.359556 | Mentor-on-the-Lake | OH |
| 4 | Carluccios Tivoli Gardens | Restaurants\|Italian | 36.100016 | -115.128529 | Las Vegas | NV |


🚉 Cleaned Metro Stations:


| | name | latitude | longitude | address |
|---|---|---|---|---|
| 0 | 19th Ave / Dunlap | 33.56709 | -112.099389 | 1935 W Dunlap Ave |
| 1 | Center / Main St | 33.415098 | -111.83066 | 26 East Main Street |
| 2 | Northern / 19th Ave | 33.55319 | -112.09936 | 7832 N 19th Ave |
| 3 | Glendale / 19th Ave | 33.538643 | -112.099329 | 6813 N 19th Ave |
| 4 | 44th St / Washington | 33.44817 | -111.987983 | 4203 East Washington Street |


🌄 Cleaned Natural Features:


| | name | type | latitude | longitude | state |
|---|---|---|---|---|---|
| 0 | Agua Sal Creek | Stream | 36.461112 | -109.478439 | AZ |
| 1 | Agua Sal Wash | Valley | 36.546112 | -109.517607 | AZ |
| 2 | Aguaje Draw | Valley | 34.577496 | -109.213616 | AZ |
| 3 | Arlington State Wildlife Area | Park | 33.248655 | -112.773505 | AZ |
| 4 | Bar X Wash | Stream | 32.470904 | -109.936185 | AZ |
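
The select, rename, and drop-missing pattern in Step 2 is repeated once per dataset. A minimal sketch of how it could be factored into one helper follows. The names `clean_poi` and `column_map` and the toy data are hypothetical and not part of the notebook.

```python
import pandas as pd

def clean_poi(df, column_map, required=("latitude", "longitude")):
    """Select and rename columns, then drop rows missing any required field."""
    missing = [col for col in column_map if col not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    out = df[list(column_map)].rename(columns=column_map)
    return out.dropna(subset=list(required))

# Toy frame standing in for the Valley Metro data (values are made up)
toy = pd.DataFrame({
    "StationName": ["A St", "B St"],
    "POINT_Y": [33.4, None],          # second row has no latitude
    "POINT_X": [-112.0, -111.9],
    "Address": ["1 A St", "2 B St"],
})
metro_map = {
    "StationName": "name",
    "POINT_Y": "latitude",
    "POINT_X": "longitude",
    "Address": "address",
}
toy_poi = clean_poi(toy, metro_map)
print(toy_poi)
```

Checking for missing columns up front gives a clearer error than a failed column selection if a dataset's schema changes.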


## 🧠 Step 3: Generate Embeddings + Personalized Matching (with Gemini)

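To personalize the itinerary, we use **Gemini's embedding API** to match user interests to points of interest semantically.

The approach embeds:
- Each museum's name and type
- The user's interest string (e.g. "art and vegetarian food")

Cosine similarity between the two then ranks the candidates. The cell below only sets up the SDK and embeds the first 20 museums as a trial run; the interest embedding and the similarity ranking follow in Step 3.2.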

In [3]:
# ✅ Install Gemini SDK
!pip install -q google-generativeai


# 🔐 Load Gemini API Key from Kaggle Secrets
import google.generativeai as genai
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)

# 🧠 Full-dataset version, left disabled: it makes one API call per museum
# museums_poi["text"] = museums_poi["name"] + " - " + museums_poi["type"]
# poi_embeddings = []
# for text in museums_poi["text"]:
#     response = genai.embed_content(
#         model="models/embedding-001",
#         content=text,
#         task_type="RETRIEVAL_DOCUMENT"
#     )
#     poi_embeddings.append(response["embedding"])

# 🔢 Embed only the first 20 museums, combining name + type into one text field
sample_museums = museums_poi.head(20).copy()
sample_museums["text"] = sample_museums["name"] + " - " + sample_museums["type"]

poi_embeddings = []
for text in sample_museums["text"]:
    response = genai.embed_content(
        model="models/embedding-001",
        content=text,
        task_type="RETRIEVAL_DOCUMENT"
    )
    poi_embeddings.append(response["embedding"])
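
The ranking idea described for this step, cosine similarity followed by a top-k cut, can be sketched without any API calls. The vectors below are made-up 3-dimensional stand-ins for real embeddings, `top_k_cosine` is a hypothetical helper, and all vectors are assumed to be non-zero.

```python
import numpy as np

def top_k_cosine(query_vec, doc_vecs, k=5):
    """Return (indices, scores) of the k rows of doc_vecs most similar to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms
    scores = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    order = np.argsort(scores)[::-1][:k]  # highest score first
    return order, scores[order]

# Toy "embeddings" (made-up values, not real Gemini output)
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]
idx, sims = top_k_cosine(query, docs, k=2)
print(idx, sims)
```

The two documents that point in nearly the same direction as the query come back first, regardless of vector length.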

### 📍 Step 3.1: Region Filter for Arizona (AZ)

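To keep results relevant, we restrict the museum, restaurant, and natural-feature datasets to Arizona. The Valley Metro stations already cover only the Phoenix area, so they need no filter.
This allows us to generate personalized, localized itineraries and keeps the project scoped to a real-world region.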

In [4]:
# 🌵 Filter all POIs to Arizona
museums_poi = museums_poi[museums_poi["state"] == "AZ"]
yelp_poi = yelp_poi[yelp_poi["state"] == "AZ"]
gnis_poi = gnis_poi[gnis_poi["state"] == "AZ"]

### 🧠 Step 3.2: Unified POI Embeddings + Semantic Matching (Gemini)

We take the first 100 museums, restaurants, and natural features from the Arizona subsets, merge them into one table,
generate semantic embeddings using Gemini, and match them to the user’s interest.

This builds the foundation for our RAG pipeline.

In [5]:
# 🎯 Take the first 100 rows from each source (head(), not a random sample)
sample_museums = museums_poi.head(100).copy()
sample_yelp = yelp_poi.head(100).copy()
sample_gnis = gnis_poi.head(100).copy()

# 🏷️ Add a source column
sample_museums["source"] = "museum"
sample_yelp["source"] = "restaurant"
sample_gnis["source"] = "natural"

# ✍️ Create a unified text column; a missing type/category becomes an empty string
# so that the text to embed is not NaN
sample_museums["text"] = sample_museums["name"] + " - " + sample_museums["type"].fillna("")
sample_yelp["text"] = sample_yelp["name"] + " - " + sample_yelp["categories"].fillna("")
sample_gnis["text"] = sample_gnis["name"] + " - " + sample_gnis["type"].fillna("")

# 🧱 Unify the POIs
unified_pois = pd.concat([
    sample_museums[["name", "text", "latitude", "longitude", "source"]],
    sample_yelp[["name", "text", "latitude", "longitude", "source"]],
    sample_gnis[["name", "text", "latitude", "longitude", "source"]],
], ignore_index=True)

print(f"Total POIs to embed: {len(unified_pois)}")
display(unified_pois.head())

Total POIs to embed: 300


| | name | text | latitude | longitude | source |
|---|---|---|---|---|---|
| 0 | ADOBE MOUNTAIN RAILROAD MUSEUM AND DESERT RAIL... | ADOBE MOUNTAIN RAILROAD MUSEUM AND DESERT RAIL... | 33.69775 | -112.15233 | museum |
| 1 | AFRICAN AMERICAN MULTICULTURAL MUSEUM | AFRICAN AMERICAN MULTICULTURAL MUSEUM - HISTOR... | 33.45481 | -111.92598 | museum |
| 2 | AGUA CALIENTE | AGUA CALIENTE - HISTORIC PRESERVATION | 32.27952 | -110.72825 | museum |
| 3 | AJO HISTORICAL SOCIETY | AJO HISTORICAL SOCIETY - HISTORIC PRESERVATION | 32.36256 | -112.8721 | museum |
| 4 | ALWUN HOUSE FOUNDATION | ALWUN HOUSE FOUNDATION - ART MUSEUM | 33.45894 | -112.05542 | museum |


### 🎯 Step 3.2 Continued: Match Top POIs with User Interest (via Cosine Similarity)

We now embed the user's interest and compare it against our unified POI dataset using cosine similarity.
The top 10 results form the personalized context for Gemini to generate the itinerary.

In [6]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# 👤 User interest
user_interest = "art, nature, and vegetarian food"

# 🧠 Embed user interest using Gemini
user_embedding = genai.embed_content(
    model="models/embedding-001",
    content=user_interest,
    task_type="RETRIEVAL_QUERY"
)["embedding"]

# 🔢 Generate Gemini embeddings for each POI
poi_embeddings = []
for text in unified_pois["text"]:
    response = genai.embed_content(
        model="models/embedding-001",
        content=text,
        task_type="RETRIEVAL_DOCUMENT"
    )
    poi_embeddings.append(response["embedding"])

# 🔁 Convert embeddings to NumPy
poi_embeddings_np = np.array(poi_embeddings)
user_embedding_np = np.array(user_embedding).reshape(1, -1)

# 📈 Compute cosine similarity
cosine_similarities = cosine_similarity(user_embedding_np, poi_embeddings_np)[0]

# 🔍 Get top 10 matching POIs
top_indices = np.argsort(cosine_similarities)[-10:][::-1]
personalized_pois = unified_pois.iloc[top_indices].reset_index(drop=True)

# ✅ Show personalized POIs
print("🎯 Top 10 personalized POIs based on user interest:")
display(personalized_pois[["name", "source"]])

🎯 Top 10 personalized POIs based on user interest:


| | name | source |
|---|---|---|
| 0 | AWC CAMPUS GALLERY | museum |
| 1 | CENTER FOR CREATIVE PHOTOGRAPHY | museum |
| 2 | BEASLEY GALLERY | museum |
| 3 | ARIZONA STATE UNIVERSITY ART MUSEUM | museum |
| 4 | ART MUSEUM AT NELSON FINE ARTS CENTER | museum |
| 5 | BOYCE THOMPSON SOUTHWESTERN ARBORETUM | museum |
| 6 | CAREFREE DESERT GARDENS | museum |
| 7 | ARBORETUM AT ARIZONA STATE UNIVERSITY | museum |
| 8 | ALWUN HOUSE FOUNDATION | museum |
| 9 | ARIZONA FOLKLORE PRESERVE | museum |
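
The embedding loop above makes one API call per POI, and the notebook embeds some of the same texts again in other cells. One way to avoid repeat calls is a small text-to-vector cache, sketched here under the assumption that the embedding function is passed in as a plain callable. `embed_with_cache` and `fake_embed` are hypothetical names, and `fake_embed` only stands in for a wrapper around `genai.embed_content(...)["embedding"]`.

```python
def embed_with_cache(texts, embed_fn, cache=None):
    """Embed each text once, reusing vectors already stored in `cache`."""
    cache = {} if cache is None else cache
    vectors = []
    for text in texts:
        if text not in cache:
            cache[text] = embed_fn(text)  # only called on a cache miss
        vectors.append(cache[text])
    return vectors, cache

# Stand-in embedder so the sketch runs without an API key
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text)), 1.0]

first, cache = embed_with_cache(["A - ART MUSEUM", "B - Park"], fake_embed)
second, cache = embed_with_cache(["A - ART MUSEUM"], fake_embed, cache)
print(len(calls))  # number of real embedding calls made
```

If both document and query embeddings were cached, the key would also need to include the task type, since the notebook embeds the two with different `task_type` values.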


### 🗺️ Step 3.3: Location-Aware Personalized POI Matching

To make the itinerary realistic, we enhance retrieval by calculating the distance from the user's location to each POI using the Haversine formula.

We then select POIs in two stages:
- **Distance filter**: keep only POIs within **40 kilometers** of the user's location (Tempe, AZ), so that recommendations stay realistic and local
- **Semantic ranking**: sort the remaining POIs by similarity to the user's interest, using distance only to break ties

Because similarity is the first sort key, a closer POI outranks a farther one only when their similarity scores are equal.
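
For reference, the Haversine distance $d$ between two points with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ (all in radians) on a sphere of radius $R$ can be written in the same three steps that the `haversine` function in the next cell uses, where $R = 6371$ km:

$$
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
$$

This treats the Earth as a perfect sphere, which is an approximation but is adequate for a 40 km radius filter.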

In [7]:
from math import radians, cos, sin, sqrt, atan2
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# 🌍 Define user's location (Tempe, AZ)
user_lat = 33.4255
user_lon = -111.9400

# 📏 Haversine distance function
def haversine(lat1, lon1, lat2, lon2):
    R = 6371
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat/2)**2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon/2)**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    return R * c

# 📉 Compute distance for all POIs
unified_pois["distance_km"] = unified_pois.apply(
    lambda row: haversine(user_lat, user_lon, float(row["latitude"]), float(row["longitude"])),
    axis=1
)

# ✅ Filter to 40 km radius
nearby_pois = unified_pois[unified_pois["distance_km"] <= 40].copy().reset_index(drop=True)

# 🔢 Recompute Gemini embeddings only for nearby POIs
print(f"Recomputing embeddings for {len(nearby_pois)} nearby POIs...")
poi_embeddings = []
for text in nearby_pois["text"]:
    response = genai.embed_content(
        model="models/embedding-001",
        content=text,
        task_type="RETRIEVAL_DOCUMENT"
    )
    poi_embeddings.append(response["embedding"])

# 🧠 Re-embed the user interest (same query as in Step 3.2)
user_embedding = genai.embed_content(
    model="models/embedding-001",
    content=user_interest,
    task_type="RETRIEVAL_QUERY"
)["embedding"]

# 🔁 Cosine similarity
poi_embeddings_np = np.array(poi_embeddings)
user_embedding_np = np.array(user_embedding).reshape(1, -1)
cosine_similarities = cosine_similarity(user_embedding_np, poi_embeddings_np)[0]

# 📊 Add similarity score and rank
nearby_pois["similarity"] = cosine_similarities
ranked_pois = nearby_pois.sort_values(by=["similarity", "distance_km"], ascending=[False, True])
personalized_pois = ranked_pois.head(10).reset_index(drop=True)

# ✅ Show results
print("📍 Top 10 personalized & nearby POIs (within 40 km):")
display(personalized_pois[["name", "source", "distance_km", "similarity"]])

Recomputing embeddings for 133 nearby POIs...
📍 Top 10 personalized & nearby POIs (within 40 km):


| | name | source | distance_km | similarity |
|---|---|---|---|---|
| 0 | ARIZONA STATE UNIVERSITY ART MUSEUM | museum | 1.035829 | 0.596632 |
| 1 | ALWUN HOUSE FOUNDATION | museum | 11.336484 | 0.590015 |
| 2 | ARIZONA FOLKLORE PRESERVE | museum | 19.16745 | 0.589866 |
| 3 | CENTER FOR NATIVE AND URBAN WILDLIFE | museum | 10.816119 | 0.587132 |
| 4 | Mornin Moonshine | restaurant | 12.875929 | 0.581498 |
| 5 | ARIZONA ZOOLOGICAL SOCIETY | museum | 2.738513 | 0.579692 |
| 6 | ARIZONA AFRICAN ART MUSEUM | museum | 13.323803 | 0.576758 |
| 7 | CHALLENGER LEARNING CENTER OF ARIZONA | museum | 39.260943 | 0.572813 |
| 8 | Panera Bread | restaurant | 21.806448 | 0.56892 |
| 9 | ARIZONA MUSEUM OF NATURAL HISTORY | museum | 9.925565 | 0.56502 |
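
In the ranking above, distance matters only when two similarity scores are equal. If proximity should influence the order directly, one option is a weighted blend of similarity and closeness. The sketch below is not what the notebook does: `blended_score`, the weight `alpha=0.8`, and the linear closeness term are illustrative assumptions.

```python
def blended_score(similarity, distance_km, max_km=40.0, alpha=0.8):
    """Weighted mix of similarity and closeness; alpha is the similarity weight."""
    # Closeness is 1.0 at the user's location and falls linearly to 0.0 at max_km
    closeness = 1.0 - min(distance_km, max_km) / max_km
    return alpha * similarity + (1.0 - alpha) * closeness

# (name, distance_km, similarity) copied from three rows of the table above
rows = [
    ("ARIZONA STATE UNIVERSITY ART MUSEUM", 1.035829, 0.596632),
    ("CHALLENGER LEARNING CENTER OF ARIZONA", 39.260943, 0.572813),
    ("ARIZONA MUSEUM OF NATURAL HISTORY", 9.925565, 0.565020),
]
ranked = sorted(rows, key=lambda r: blended_score(r[2], r[1]), reverse=True)
for name, dist, sim in ranked:
    print(f"{blended_score(sim, dist):.3f}  {name}")
```

With these weights, the Arizona Museum of Natural History (about 10 km away) moves ahead of the Challenger Learning Center (about 39 km away), even though its similarity score is slightly lower.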


## 💬 Step 4: Gemini-Powered Itinerary Generation (Using Top 10 Nearby POIs)

We now generate a 2-day structured travel plan using Gemini 2.0 Flash (`gemini-2.0-flash`).  
The itinerary is grounded in the top 10 POIs within 40 km of the user's location, ranked by **semantic similarity** to the user’s interests.
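
The cell below prints the reply as raw text. To use the itinerary programmatically, the text has to be parsed, and a reply to a plain-language "return JSON" request may arrive wrapped in a Markdown code fence. A minimal sketch of a tolerant parser follows, assuming the reply is either bare JSON or fenced JSON. `parse_itinerary` and `sample_reply` are hypothetical, and the sample is not real model output.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this block readable

def parse_itinerary(raw_text):
    """Strip an optional Markdown code fence and parse the remaining JSON."""
    text = raw_text.strip()
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    plan = json.loads(text)  # raises json.JSONDecodeError on malformed output
    # Check the shape requested in the prompt: each day maps to three time slots
    for day, slots in plan.items():
        missing = {"Morning", "Afternoon", "Evening"} - set(slots)
        if missing:
            raise ValueError(f"{day} is missing slots: {sorted(missing)}")
    return plan

# Hypothetical, shortened reply
body = '{"Day 1": {"Morning": "a", "Afternoon": "b", "Evening": "c"}}'
sample_reply = FENCE + "json\n" + body + "\n" + FENCE
plan = parse_itinerary(sample_reply)
print(plan["Day 1"]["Morning"])
```

In the notebook this would be called as `parse_itinerary(response.text)`. The `google-generativeai` SDK also accepts a `response_mime_type` setting in its generation config (for example `"application/json"`); where the installed version supports it, that may make the fence stripping unnecessary.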

In [8]:
# 🧾 Format POIs for the prompt
poi_list = "\n".join([
    f"{i+1}. {row['name']} ({row['source']}, {round(row['distance_km'], 1)} km away)"
    for i, row in personalized_pois.iterrows()
])

# 🧠 Compose prompt for Gemini
prompt = f"""
You are a helpful travel planner.

The user is located in Tempe, Arizona and is interested in: {user_interest}.

Here are 10 recommended nearby places within 40 km that match their interests:
{poi_list}

Please create a 2-day travel itinerary using these locations.
Split each day into Morning, Afternoon, and Evening.
Be creative but practical.

Return the response in strict JSON format like this:

{{
  "Day 1": {{
    "Morning": "...",
    "Afternoon": "...",
    "Evening": "..."
  }},
  "Day 2": {{
    "Morning": "...",
    "Afternoon": "...",
    "Evening": "..."
  }}
}}
"""

# 🔮 Call Gemini 2.0 Flash
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content(prompt)

# 📦 Display Gemini's structured itinerary
print(response.text)

```json
{
  "Day 1": {
    "Morning": "Start your day immersed in art at the **ARIZONA STATE UNIVERSITY ART MUSEUM** (1.0 km). Explore the diverse collections and exhibitions. Allow at least 2 hours to fully appreciate the artwork.",
    "Afternoon": "Head to the **CENTER FOR NATIVE AND URBAN WILDLIFE** (10.8 km) to connect with nature and learn about local wildlife conservation efforts. This museum provides an engaging experience for those interested in the environment and animal life.",
    "Evening": "Enjoy a delicious vegetarian meal at **Mornin Moonshine** (12.9 km). Indulge in their plant-based offerings and enjoy the relaxed atmosphere. Check their menu beforehand for vegetarian options."
  },
  "Day 2": {
    "Morning": "Explore the **ARIZONA MUSEUM OF NATURAL HISTORY** (9.9 km). Journey through the history of Arizona's natural world, from dinosaurs to ancient civilizations. This offers a blend of history and natural science.",
    "Afternoon": "Discover the unique folk art and