# City Explorer: Multi-Source Attraction & Activity Discovery with Routing

This notebook is intentionally **readable** and **step-based**.

## What it does
1. Loads required environment variables (API keys + DB path)
2. Initializes a shared SQLite database (migrations + tables)
3. Retrieves **Top 10** items for a city (ranked by review count) from:
   - Google Places (New): tourist attractions
   - TripAdvisor Content API: attractions and activities (which categories come back depends on the API; results are filtered by category group before the snapshot is saved)
4. Uses a city-level snapshot cache (`city_top10`) and an item-level cache (`item_summary`)
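5. Provides an interactive UI using **ipywidgets**
6. Optionally finds the two items closest to a starting point (straight-line fallback until the Google Routes API is integrated)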
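
Step 3's ranking rule amounts to a sort on review count. The following is a minimal sketch, assuming each result has already been normalized into a dict with a hypothetical `review_count` key (Google Places (New) exposes the count as `userRatingCount`; the TripAdvisor field name depends on the endpoint). The pipeline signature printed further down (`n=10`, `search_pool=50`) suggests the Top 10 is cut from a larger candidate pool. `top_n_by_reviews` is an illustrative name, not a function in `src/`.

```python
from typing import Any, Dict, List


def top_n_by_reviews(
    items: List[Dict[str, Any]],
    n: int = 10,
    key: str = "review_count",  # hypothetical normalized field name
) -> List[Dict[str, Any]]:
    """Return the n items with the most reviews, highest first.

    Items missing the review field count as 0 reviews. Ties keep their
    original (API) order because sorted() is stable, even with reverse=True.
    """
    ranked = sorted(items, key=lambda item: item.get(key) or 0, reverse=True)
    return ranked[:n]


# Illustrative candidate pool (made-up names and counts).
pool = [
    {"name": "A", "review_count": 120},
    {"name": "B", "review_count": 4500},
    {"name": "C"},  # no review count -> treated as 0
    {"name": "D", "review_count": 980},
]
print([x["name"] for x in top_n_by_reviews(pool, n=3)])  # → ['B', 'D', 'A']
```

Sorting the whole pool is fine at this size (around 50 candidates); `heapq.nlargest` would only matter for much larger pools.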

## Required environment variables
- `GOOGLE_MAPS_API_KEY`
- `TRIPADVISOR_API_KEY`
- `DABN23_DB_PATH` (full file path, e.g. `G:\My Drive\dabn23_SharedDatabase\dabn23_cache.sqlite`)
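
`src/config.py` is described below as failing fast when something is missing. A minimal sketch of that idea, assuming the values are read straight from `os.environ` (the real module may also load a `.env` file or validate differently). `require_env` is a hypothetical helper name. It reports every missing variable at once instead of stopping at the first.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

REQUIRED_VARS = ("GOOGLE_MAPS_API_KEY", "TRIPADVISOR_API_KEY", "DABN23_DB_PATH")


def require_env(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return {name: value} for every required variable, or raise listing all missing ones."""
    env = os.environ if env is None else env
    # Treat unset and whitespace-only values the same way.
    values = {name: env.get(name, "").strip() for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Set them before starting Jupyter so the kernel inherits them."
        )
    return values
```

Passing `env=` explicitly makes the check easy to exercise without touching the real environment.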


In [1]:
# 0) Dependency check (optional)
# This notebook does NOT auto-install by default (cleaner + more reproducible).
AUTO_INSTALL = False

required = [
    ("requests", "requests"),
    ("pandas", "pandas"),
    ("ipywidgets", "ipywidgets"),
]

missing = []
for import_name, pip_name in required:
    try:
        __import__(import_name)
    except ImportError:
        missing.append(pip_name)

if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install command:")
    print("  pip install " + " ".join(missing))
    if AUTO_INSTALL:
        import sys, subprocess
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
        print("Installed. If imports still fail, restart the kernel and re-run.")
else:
    print("All required packages are installed.")


All required packages are installed.


In [2]:
# 1) Make sure we can import from /src (works when running from notebooks/ folder)
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print("Project root:", PROJECT_ROOT)


Project root: c:\Users\Samuel\Desktop\Git Repo\dabn23-project1\dabn23


In [3]:
# 2) Load configuration (API keys + DB path)
# config.py fails fast with a helpful error message if something is missing.

from src.config import GOOGLE_API_KEY, TA_API_KEY, DB_PATH

print("Google API key loaded (length):", len(GOOGLE_API_KEY))
print("TripAdvisor API key loaded (length):", len(TA_API_KEY))
print("DB_PATH:", DB_PATH)


Google API key loaded (length): 39
TripAdvisor API key loaded (length): 32
DB_PATH: G:\My Drive\dabn23_SharedDatabase\dabn23_places_cache.sqlite


In [4]:
# 3) Initialize the shared SQLite database (creates the file if it doesn't exist)

from pathlib import Path
from src.db import connect, migrate_if_needed, create_tables

# Ensure parent folder exists (SQLite can create the file, but not the folder)
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)

conn = connect(DB_PATH)
migrate_if_needed(conn)   # handles legacy schemas (e.g., place_ids_json -> item_ids_json)
create_tables(conn)

tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
print("✅ DB ready. Tables:", [t[0] for t in tables])


✅ DB ready. Tables: ['city_top10', 'item_summary']
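
The two cache tables and the legacy migration noted in the cell above (`place_ids_json -> item_ids_json`) could look roughly like this. It is a sketch using Python's built-in `sqlite3` on an in-memory database; the column layout and the `*_sketch` function names are assumptions, not the actual schema in `src/db.py`. The demo uses a separate `demo` connection so it does not overwrite the notebook's `conn`.

```python
import sqlite3


def create_tables_sketch(conn: sqlite3.Connection) -> None:
    # City-level snapshot: which item IDs made the Top 10 for a city and source.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS city_top10 (
               city          TEXT NOT NULL,
               source        TEXT NOT NULL,
               item_ids_json TEXT NOT NULL,
               fetched_at    TEXT DEFAULT CURRENT_TIMESTAMP,
               PRIMARY KEY (city, source)
           )"""
    )
    # Item-level cache: one summary row per item, reusable across cities.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS item_summary (
               source       TEXT NOT NULL,
               item_id      TEXT NOT NULL,
               summary_json TEXT NOT NULL,
               PRIMARY KEY (source, item_id)
           )"""
    )
    conn.commit()


def migrate_sketch(conn: sqlite3.Connection) -> None:
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # For a table that does not exist yet it yields nothing, so this is a no-op.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(city_top10)")}
    if "place_ids_json" in cols and "item_ids_json" not in cols:
        # RENAME COLUMN requires SQLite 3.25+.
        conn.execute("ALTER TABLE city_top10 RENAME COLUMN place_ids_json TO item_ids_json")
        conn.commit()


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE city_top10 (city TEXT, source TEXT, place_ids_json TEXT)")  # legacy layout
migrate_sketch(demo)
create_tables_sketch(demo)  # city_top10 already exists, so only item_summary is created
print([row[1] for row in demo.execute("PRAGMA table_info(city_top10)")])  # → ['city', 'source', 'item_ids_json']
```

Running the migration before `CREATE TABLE IF NOT EXISTS` matters: the create statement never alters an existing table, so an old column name would otherwise survive.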


In [5]:
# Sanity check: print the start of src/pipelines.py to confirm which version is imported
import src.pipelines as p
with open(p.__file__, "r", encoding="utf-8") as f:
    txt = f.read()

print("Length:", len(txt))
print(txt[:800])

Length: 5897
# src/pipelines.py
"""
Top-10 pipelines (snapshot + cache) for DABN23.

This module is meant to replace the notebook-defined pipeline functions so that:
- notebooks stay thin (just call functions)
- snapshot + caching logic lives in src/
- TripAdvisor group filtering is applied BEFORE snapshot is saved
"""

from __future__ import annotations

from typing import Any, Dict, List, Optional

from .cache import (
    get_city_snapshot_item_ids,
    save_city_snapshot_item_ids,
    get_cached_item_summary,
    upsert_item_summary,
)

from . import google_places as g
from . import tripadvisor as ta


def top10_google_attractions(
    conn,
    city: str,
    n: int = 10,
    language: str = "en",
    search_pool: int = 50,
) -> List[Dict[str, Any]]:
    """Top-N Google tourist attractions by revi

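The module docstring describes a two-level flow: reuse the city snapshot if one exists; otherwise fetch, filter, and save it; then resolve each item ID through the item cache. A minimal in-memory sketch of that control flow follows. The dict-based caches and the `fetch_candidates` / `fetch_details` callables are hypothetical stand-ins for `src/cache.py` and the API clients, whose real signatures are not shown here.

```python
from typing import Any, Callable, Dict, List, Tuple

Item = Dict[str, Any]


def top_n_with_cache(
    city: str,
    source: str,
    snapshots: Dict[Tuple[str, str], List[str]],   # stand-in for city_top10
    item_cache: Dict[Tuple[str, str], Item],        # stand-in for item_summary
    fetch_candidates: Callable[[str], List[Item]],  # hypothetical search call
    fetch_details: Callable[[str], Item],           # hypothetical details call
    keep: Callable[[Item], bool] = lambda item: True,
    n: int = 10,
) -> List[Item]:
    ids = snapshots.get((city, source))
    if ids is None:
        # Snapshot miss: fetch, filter BEFORE saving, rank by review count, keep n.
        candidates = [c for c in fetch_candidates(city) if keep(c)]
        candidates.sort(key=lambda c: c.get("review_count") or 0, reverse=True)
        ids = [c["id"] for c in candidates[:n]]
        snapshots[(city, source)] = ids
    results = []
    for item_id in ids:
        cached = item_cache.get((source, item_id))
        if cached is None:  # item miss: fetch once, then reuse
            cached = fetch_details(item_id)
            item_cache[(source, item_id)] = cached
        results.append(cached)
    return results


calls = {"search": 0, "details": 0}


def fake_search(city: str) -> List[Item]:
    calls["search"] += 1
    return [
        {"id": "a", "review_count": 10, "group": "Tours"},
        {"id": "b", "review_count": 99, "group": "Museums"},
        {"id": "c", "review_count": 50, "group": "Tours"},
    ]


def fake_details(item_id: str) -> Item:
    calls["details"] += 1
    return {"id": item_id}


snapshots: Dict[Tuple[str, str], List[str]] = {}
item_cache: Dict[Tuple[str, str], Item] = {}
for _ in range(2):  # the second run is served entirely from the caches
    out = top_n_with_cache(
        "Paris", "tripadvisor", snapshots, item_cache, fake_search, fake_details,
        keep=lambda item: item["group"] != "Museums",
    )
print([x["id"] for x in out], calls)  # → ['c', 'a'] {'search': 1, 'details': 2}
```

Filtering before the snapshot is written means a cached Top 10 never contains denied items, so a later cache hit needs no re-filtering.
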

In [6]:
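# 4) Define the search function that the UI calls

from src.pipelines import top10_city

# TripAdvisor category groups to keep (ALLOW) and to drop (DENY).
# The pipeline applies this filter before the city snapshot is saved.
ALLOW = ["Tours", "Food & Drink", "Outdoor Activities", "Boat Tours & Water Sports", "Nightlife", "Shopping"]
DENY  = ["Sights & Landmarks", "Museums"]

def city_search(city: str):
    """Return the Top 10 items for `city`, using the shared DB connection `conn`."""
    return top10_city(conn, city, allow_groups=ALLOW, deny_groups=DENY)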
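
`top10_city` receives `allow_groups` and `deny_groups`, but their exact semantics live in `src/pipelines.py`. One plausible reading, sketched here as an assumption: an item is kept if it belongs to at least one allowed group and to no denied group, so DENY wins when an item carries both. The `groups` key and the `passes_group_filter` name are hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def passes_group_filter(
    item: Dict[str, Any],
    allow_groups: Optional[Iterable[str]] = None,
    deny_groups: Optional[Iterable[str]] = None,
) -> bool:
    groups = set(item.get("groups") or [])  # hypothetical field: list of group names
    if deny_groups and groups & set(deny_groups):
        return False  # any denied group excludes the item
    if allow_groups:
        return bool(groups & set(allow_groups))  # need at least one allowed group
    return True  # no allow-list means "keep everything not denied"


ALLOW = ["Tours", "Food & Drink", "Outdoor Activities", "Boat Tours & Water Sports", "Nightlife", "Shopping"]
DENY = ["Sights & Landmarks", "Museums"]

print(passes_group_filter({"groups": ["Tours"]}, ALLOW, DENY))             # → True
print(passes_group_filter({"groups": ["Tours", "Museums"]}, ALLOW, DENY))  # → False
print(passes_group_filter({"groups": []}, ALLOW, DENY))                    # → False
```

Under this reading an untagged item is dropped whenever an allow-list is given; if the real pipeline keeps untagged items instead, only the `allow_groups` branch would change.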

## 5) Interactive search UI (ipywidgets)

Use the controls to choose:
- city
- data source (Google or TripAdvisor)
- type (attraction/activity)

Then click **Search Top 10**.


In [7]:
from src.ui import build_city_widget
build_city_widget(city_search)


## 6) Optional: "closest two" demo (fallback)

This uses straight-line (great-circle, Haversine) distance as a fallback, so the demo works before the
Google Routes API is integrated into `src/routing.py`.
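
The Haversine fallback can be sketched as follows. This is not the code in `src/routing.py`: it assumes each item carries `lat` and `lng` keys in decimal degrees and uses a mean Earth radius of 6371 km; items without coordinates are skipped. `haversine_km` and `closest_k` are hypothetical names, and the coordinates below are approximate values for illustration.

```python
import math
from typing import Any, Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against float rounding pushing the asin argument above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_k(start: Dict[str, Any], others: List[Dict[str, Any]], k: int = 2) -> List[Dict[str, Any]]:
    """Return the k items nearest to `start`, skipping items without coordinates."""
    located = [o for o in others if o.get("lat") is not None and o.get("lng") is not None]
    return sorted(
        located,
        key=lambda o: haversine_km(start["lat"], start["lng"], o["lat"], o["lng"]),
    )[:k]


eiffel = {"name": "Eiffel Tower", "lat": 48.8584, "lng": 2.2945}
landmarks = [
    {"name": "Louvre", "lat": 48.8606, "lng": 2.3376},
    {"name": "Arc de Triomphe", "lat": 48.8738, "lng": 2.2950},
    {"name": "Sacré-Cœur", "lat": 48.8867, "lng": 2.3431},
    {"name": "Unknown spot"},  # no coordinates -> ignored
]
print([p["name"] for p in closest_k(eiffel, landmarks)])  # → ['Arc de Triomphe', 'Louvre']
```

Great-circle distance ignores streets and rivers, so the ranking can differ from walking distance; that gap is what the Routes API integration would close.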


In [8]:
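from src.pipelines import top10_google_attractions
from src.routing import closest_two_fallback

# Example: find the two attractions closest to the first one among Google's Top 10.
# Each item needs lat/lng for the distance fallback.
city = "Paris"
results = top10_google_attractions(conn, city)  # signature: (conn, city, n=10, ...)

if len(results) < 3:
    raise ValueError(f"Need at least 3 results for {city!r} to pick a start and two neighbours, got {len(results)}")

start = results[0]
others = results[1:]

closest = closest_two_fallback(start, others)

print("Start:", start.get("name"))
print("Closest two (fallback distance):")
for c in closest:
    print(" -", c.get("name"))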
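
Once `src/routing.py` moves from the fallback to the Google Routes API, a natural call for "closest two" is a route matrix from the start item to the others. The sketch below only builds the request for the Routes API `computeRouteMatrix` method (endpoint, `X-Goog-Api-Key` and `X-Goog-FieldMask` headers, `WALK` travel mode) and does not send it; check the field names against the current Routes API reference before relying on it. `build_route_matrix_request` and the `lat`/`lng` item keys are assumptions; in the notebook the key would be `GOOGLE_API_KEY`.

```python
from typing import Any, Dict, List, Tuple

ROUTE_MATRIX_URL = "https://routes.googleapis.com/distanceMatrix/v2:computeRouteMatrix"


def _waypoint(item: Dict[str, Any]) -> Dict[str, Any]:
    # Assumes items carry decimal-degree "lat"/"lng" keys.
    return {"waypoint": {"location": {"latLng": {"latitude": item["lat"], "longitude": item["lng"]}}}}


def build_route_matrix_request(
    api_key: str,
    start: Dict[str, Any],
    others: List[Dict[str, Any]],
    travel_mode: str = "WALK",
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (url, headers, json_body) for a one-origin, many-destination route matrix."""
    headers = {
        "Content-Type": "application/json",
        "X-Goog-Api-Key": api_key,
        # Request only the fields needed to rank destinations.
        "X-Goog-FieldMask": "originIndex,destinationIndex,distanceMeters,duration,condition",
    }
    body = {
        "origins": [_waypoint(start)],
        "destinations": [_waypoint(o) for o in others],
        "travelMode": travel_mode,
    }
    return ROUTE_MATRIX_URL, headers, body


req_url, req_headers, req_body = build_route_matrix_request(
    "YOUR_API_KEY",
    {"lat": 48.8584, "lng": 2.2945},
    [{"lat": 48.8606, "lng": 2.3376}, {"lat": 48.8738, "lng": 2.2950}],
)
print(req_url, len(req_body["destinations"]))
# Sending it (not done here) could use:
# requests.post(req_url, headers=req_headers, json=req_body, timeout=30)
```

Each response element carries a `destinationIndex`, so the two smallest `distanceMeters` (or `duration`) values map straight back to items in `others`.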