# DABN23 – Google Places “Sight Finder” (Google Colab + SQLite on Google Drive)

This notebook:
1) Lets you enter a **city**
2) Finds the **top 10** places tagged as `tourist_attraction`, sorted by **number of reviews**
3) Fetches detailed fields (rating, review count, types, accessibility, opening hours, website/phone)
4) **Caches** results in a **SQLite database stored in Google Drive**
   → fewer API calls, faster repeat runs, and persistent storage across sessions.

## What gets stored in SQLite?
- `place_id` (primary key)
- compact fields: name, address, rating, review_count, types, accessibility, opening hours, website/phone
- the full `summary_json`
- `fetched_at_utc` timestamp


## 1) Load API key (Colab Secrets)

In [1]:
# Colab variant, disabled by wrapping it in a string literal.
# Remove the surrounding quotes to use it when running in Colab.
'''from google.colab import userdata

API_KEY = userdata.get("GOOGLE_MAPS_API_KEY")

if not API_KEY:
    raise RuntimeError(
        "API key not found. Add GOOGLE_MAPS_API_KEY in Colab Secrets (🔑 icon on the left) "
        "and re-run this cell."
    )

print("API key loaded (length):", len(API_KEY))'''

In [2]:
#pip install python-dotenv

In [3]:
# Load the API key locally (outside Colab) from a .env file
import os
from dotenv import load_dotenv

load_dotenv(override=True)  # reads .env and injects into os.environ

API_KEY = os.getenv("GOOGLE_MAPS_API_KEY")

if not API_KEY:
    raise RuntimeError(
        "API key not found. Make sure GOOGLE_MAPS_API_KEY is set in your .env file "
        "in the same directory as this notebook."
    )

print("API key loaded (length):", len(API_KEY))


RuntimeError: API key not found. Make sure GOOGLE_MAPS_API_KEY is set in your .env file in the same directory as this notebook.

## 2) Mount Google Drive and configure the SQLite database

We store the SQLite file in Google Drive so it persists across sessions.

Default path:
`/content/drive/MyDrive/dabn23_places_cache.sqlite`
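
The notebook is also run locally (see the `.env` branch in section 1), where `/content/drive` does not exist. A minimal sketch of picking the database path per environment follows. It assumes that Colab kernels have `google.colab` loaded in `sys.modules` and that a local run should keep the file in the current working directory; `resolve_db_path` is a hypothetical helper, not part of the notebook.

```python
import sys
from pathlib import Path

DB_FILENAME = "dabn23_places_cache.sqlite"

def resolve_db_path(filename: str = DB_FILENAME) -> str:
    """Return the Drive path inside Colab, otherwise a path in the working directory."""
    # Assumption: Colab kernels import google.colab at startup
    if "google.colab" in sys.modules:
        return f"/content/drive/MyDrive/{filename}"
    return str(Path.cwd() / filename)

print("SQLite DB path:", resolve_db_path())
```

In Colab, `drive.mount("/content/drive")` still has to run before the returned path is usable.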


In [None]:
from google.colab import drive
import os

drive.mount("/content/drive")

DB_PATH = "/content/drive/MyDrive/dabn23_places_cache.sqlite"
print("SQLite DB path:", DB_PATH)
print("DB exists already?", os.path.exists(DB_PATH))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
SQLite DB path: /content/drive/MyDrive/dabn23_places_cache.sqlite
DB exists already? True


## 3) Imports, endpoints, and DB setup

In [None]:
import requests
import sqlite3
import json
from datetime import datetime, timezone
from typing import Dict, Any, List, Optional

PLACES_TEXT_SEARCH_URL = "https://places.googleapis.com/v1/places:searchText"
PLACES_DETAILS_URL_TMPL = "https://places.googleapis.com/v1/places/{place_id}"

# Connect DB and create tables if needed
conn = sqlite3.connect(DB_PATH)
conn.execute("PRAGMA journal_mode=WAL;")

conn.execute("""
CREATE TABLE IF NOT EXISTS place_summary (
    place_id TEXT PRIMARY KEY,
    name TEXT,
    address TEXT,
    rating REAL,
    review_count INTEGER,
    category_primary TEXT,
    types_json TEXT,
    wheelchair_accessible_entrance INTEGER,
    opening_hours_json TEXT,
    website TEXT,
    phone TEXT,
    summary_json TEXT NOT NULL,
    fetched_at_utc TEXT NOT NULL
);
""")

conn.execute("""
CREATE TABLE IF NOT EXISTS city_top10 (
  city_key TEXT PRIMARY KEY,
  city_display TEXT,
  place_ids_json TEXT NOT NULL,
  created_at_utc TEXT NOT NULL
);
""")
conn.commit()

conn.execute("CREATE INDEX IF NOT EXISTS idx_review_count ON place_summary(review_count);")
conn.commit()

print("DB ready.")

DB ready.
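
To illustrate how `place_summary` and its `review_count` index are meant to be queried, here is a small sketch that runs against a throwaway in-memory database. The table is a reduced copy of the schema above and the two rows are made-up sample data, not API results.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # throwaway DB, the Drive file is untouched
demo.execute("""
CREATE TABLE place_summary (
    place_id TEXT PRIMARY KEY,
    name TEXT,
    review_count INTEGER,
    fetched_at_utc TEXT NOT NULL
)
""")
demo.execute("CREATE INDEX idx_review_count ON place_summary(review_count)")

# Made-up rows for illustration only
rows = [
    ("demo-1", "Sample Tower", 1200, "2026-01-01T00:00:00+00:00"),
    ("demo-2", "Sample Museum", 5400, "2026-01-01T00:00:00+00:00"),
]
demo.executemany("INSERT INTO place_summary VALUES (?, ?, ?, ?)", rows)

# Most-reviewed places first
top = demo.execute(
    "SELECT name, review_count FROM place_summary ORDER BY review_count DESC LIMIT 10"
).fetchall()
print(top)
```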


## 4) Places API functions + summary builder

In [None]:
def text_search_many(query: str, language_code: str = "en", max_results: int = 20) -> List[Dict[str, Any]]:
    # Text Search (New): get a list of candidate places
    headers = {
        "Content-Type": "application/json",
        "X-Goog-Api-Key": API_KEY,
        "X-Goog-FieldMask": ",".join([
            "places.id",
            "places.displayName",
            "places.formattedAddress",
            "places.rating",
            "places.userRatingCount",
            "places.primaryType",
            "places.types",
        ])
    }
    # Text Search (New) returns at most 20 places per request, so cap the requested count
    payload = {"textQuery": query, "languageCode": language_code, "maxResultCount": min(max_results, 20)}
    r = requests.post(PLACES_TEXT_SEARCH_URL, json=payload, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json().get("places", [])


def place_details(place_id: str, language_code: str = "en") -> Dict[str, Any]:
    # Place Details (New): fetch rich fields for one place_id
    url = PLACES_DETAILS_URL_TMPL.format(place_id=place_id)
    headers = {
        "X-Goog-Api-Key": API_KEY,
        "X-Goog-FieldMask": ",".join([
            "id",
            "displayName",
            "formattedAddress",
            "rating",
            "userRatingCount",
            "primaryType",
            "types",
            "accessibilityOptions",
            "regularOpeningHours",
            "websiteUri",
            "nationalPhoneNumber",
        ])
    }
    params = {"languageCode": language_code}
    r = requests.get(url, headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()


def summarize_place(place: Dict[str, Any]) -> Dict[str, Any]:
    # Normalize Place Details JSON into a compact dictionary
    name = (place.get("displayName") or {}).get("text")
    acc = place.get("accessibilityOptions") or {}
    hours = place.get("regularOpeningHours") or {}
    weekday_desc = hours.get("weekdayDescriptions") or []

    return {
        "name": name,
        "address": place.get("formattedAddress"),
        "rating": place.get("rating"),
        "review_count": place.get("userRatingCount"),
        "category_primary": place.get("primaryType"),
        "types": place.get("types", []),
        "wheelchair_accessible_entrance": acc.get("wheelchairAccessibleEntrance"),
        "opening_hours_weekday_descriptions": weekday_desc,
        "website": place.get("websiteUri"),
        "phone": place.get("nationalPhoneNumber"),
        "place_id": place.get("id"),
    }

def normalize_city(city: str) -> str:
    return city.strip().lower()
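

`text_search_many` issues a single request, so it can return only one page of results. A rough sketch of collecting several pages follows. It assumes that Text Search (New) returns a `nextPageToken` field when more results exist, accepts it back as `pageToken`, and requires `nextPageToken` to be listed in the field mask; check the current API reference before relying on this. `collect_pages` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for the real POST so the loop can run offline.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]

def collect_pages(fetch_page: Callable[[Optional[str]], Page], max_results: int = 50) -> List[Dict[str, Any]]:
    """Call fetch_page(token) until max_results places are collected or no token is left."""
    places: List[Dict[str, Any]] = []
    token: Optional[str] = None
    while len(places) < max_results:
        page = fetch_page(token)
        batch = page.get("places", [])
        places.extend(batch)
        token = page.get("nextPageToken")
        # Stop when the API reports no further page or returns an empty one
        if not token or not batch:
            break
    return places[:max_results]

# Offline stand-in for the API: three pages of 20 made-up ids each
def fake_fetch(token: Optional[str]) -> Page:
    start = int(token or 0)
    page: Page = {"places": [{"id": f"demo-{i}"} for i in range(start, start + 20)]}
    if start + 20 < 60:
        page["nextPageToken"] = str(start + 20)
    return page

pool = collect_pages(fake_fetch, max_results=50)
print(len(pool), pool[0]["id"], pool[-1]["id"])
```

In the notebook, `fetch_page` would wrap the `requests.post` call from `text_search_many` and add `"pageToken": token` to the payload whenever `token` is not `None`.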


## 5) SQLite cache helpers (load/save + TTL)

In [None]:
def get_city_snapshot_place_ids(city: str) -> Optional[List[str]]:
    """
    Returns the stored list of top-10 place_ids for this city if it exists,
    otherwise returns None.
    """
    city_key = normalize_city(city)

    cur = conn.execute(
        "SELECT place_ids_json FROM city_top10 WHERE city_key = ?",
        (city_key,)
    )
    row = cur.fetchone()

    if not row:
        return None

    return json.loads(row[0])

from datetime import datetime, timezone

def save_city_snapshot_place_ids(city: str, place_ids: List[str]) -> None:
    """
    Saves the computed top-10 place_ids for a city.
    If the city already exists, it overwrites (UPSERT).
    """
    city_key = normalize_city(city)

    conn.execute(
        """
        INSERT INTO city_top10 (city_key, city_display, place_ids_json, created_at_utc)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(city_key) DO UPDATE SET
          city_display = excluded.city_display,
          place_ids_json = excluded.place_ids_json,
          created_at_utc = excluded.created_at_utc
        """,
        (
            city_key,
            city.strip(),
            json.dumps(place_ids),
            datetime.now(timezone.utc).isoformat(),
        )
    )
    conn.commit()

def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()

def iso_to_dt(iso_str: str) -> datetime:
    return datetime.fromisoformat(iso_str)

def get_cached_summary(place_id: str, max_age_days: int = 30) -> Optional[Dict[str, Any]]:
    cur = conn.execute(
        "SELECT summary_json, fetched_at_utc FROM place_summary WHERE place_id = ?",
        (place_id,)
    )
    row = cur.fetchone()
    if not row:
        return None

    summary_json, fetched_at = row

    try:
        fetched_dt = iso_to_dt(fetched_at)
    except Exception:
        return None

    age_days = (datetime.now(timezone.utc) - fetched_dt).total_seconds() / 86400.0
    if age_days > max_age_days:
        return None

    return json.loads(summary_json)

def upsert_summary(summary: Dict[str, Any]) -> None:
    place_id = summary.get("place_id")
    if not place_id:
        return

    w = summary.get("wheelchair_accessible_entrance")
    w_int = 1 if w is True else 0 if w is False else None

    conn.execute(
        """INSERT INTO place_summary (
            place_id, name, address, rating, review_count, category_primary,
            types_json, wheelchair_accessible_entrance, opening_hours_json,
            website, phone, summary_json, fetched_at_utc
        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(place_id) DO UPDATE SET
            name=excluded.name,
            address=excluded.address,
            rating=excluded.rating,
            review_count=excluded.review_count,
            category_primary=excluded.category_primary,
            types_json=excluded.types_json,
            wheelchair_accessible_entrance=excluded.wheelchair_accessible_entrance,
            opening_hours_json=excluded.opening_hours_json,
            website=excluded.website,
            phone=excluded.phone,
            summary_json=excluded.summary_json,
            fetched_at_utc=excluded.fetched_at_utc
        """,
        (
            place_id,
            summary.get("name"),
            summary.get("address"),
            summary.get("rating"),
            summary.get("review_count"),
            summary.get("category_primary"),
            json.dumps(summary.get("types", []), ensure_ascii=False),
            w_int,
            json.dumps(summary.get("opening_hours_weekday_descriptions", []), ensure_ascii=False),
            summary.get("website"),
            summary.get("phone"),
            json.dumps(summary, ensure_ascii=False),
            utc_now_iso(),
        )
    )
    conn.commit()

def get_place_summary_cached(place_id: str, language_code: str = "en") -> dict:
    cached = get_cached_summary(place_id)

    if cached is not None:
        cached["_source"] = "cache"
        return cached

    details = place_details(place_id, language_code=language_code)
    summary = summarize_place(details)
    upsert_summary(summary)
    summary["_source"] = "api"
    return summary
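

The freshness rule inside `get_cached_summary` can be isolated into a pure function, which makes the TTL easy to check without touching the database. This is only a sketch: `is_fresh` is a hypothetical helper, and treating timestamps without an offset as UTC is an assumption the notebook does not state.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_fresh(fetched_at_iso: str, max_age_days: float = 30, now: Optional[datetime] = None) -> bool:
    """Return True if the ISO timestamp is at most max_age_days old."""
    now = now or datetime.now(timezone.utc)
    try:
        fetched = datetime.fromisoformat(fetched_at_iso)
    except ValueError:
        return False  # unreadable timestamp: treat the row as stale
    if fetched.tzinfo is None:
        # Assumption: naive timestamps were written in UTC
        fetched = fetched.replace(tzinfo=timezone.utc)
    return (now - fetched) <= timedelta(days=max_age_days)

reference = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(is_fresh("2026-02-19T17:13:47.640035+00:00", now=reference))  # about 9 days old
print(is_fresh("2026-01-01T00:00:00+00:00", now=reference))         # 59 days old
```

Passing `now` explicitly keeps the check deterministic; in the notebook the default (current UTC time) would be used.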


## 6) Top 10 tourist attractions in a city (ranked by review count) + caching

In [None]:
def top_tourist_attractions_by_reviews_static_city(
    city: str,
    n: int = 10,
    language_code: str = "en",
    search_pool: int = 50,
) -> List[Dict[str, Any]]:
    """
    Static city-level cache:
    - If the city already exists in city_top10 -> reuse the stored place_ids
    - Otherwise compute top N once, store place_ids, then reuse forever
    """

    # 1) Try to load the city snapshot (the stored list of place_ids)
    place_ids = get_city_snapshot_place_ids(city)

    if place_ids is not None:
        city_source = "city_snapshot"   # we did NOT recompute the top 10
    else:
        city_source = "computed"        # we WILL compute top 10 now

        # 2) Compute top N place_ids for the city (first time only)
        candidates = text_search_many(
            f"tourist attractions in {city}",
            language_code=language_code,
            max_results=search_pool
        )

        # Strict filter: only tourist attractions
        filtered = [
            p for p in candidates
            if "tourist_attraction" in (p.get("types") or [])
        ]

        # Sort by number of reviews (descending)
        filtered_sorted = sorted(
            filtered,
            key=lambda p: p.get("userRatingCount", 0) or 0,
            reverse=True
        )

        place_ids = [p["id"] for p in filtered_sorted[:n]]

        # 3) Save the snapshot so next time we don't recompute
        save_city_snapshot_place_ids(city, place_ids)

    # 4) Resolve place_ids -> detailed summaries (cached per place_id or fetched once)
    results: List[Dict[str, Any]] = []
    for pid in place_ids[:n]:
        s = get_place_summary_cached(pid, language_code=language_code)
        s["_city_source"] = city_source   # helpful for demo/table
        results.append(s)

    return results
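

The filter-and-sort step is the core of the ranking and is repeated in the force-recompute cell of section 9. A minimal sketch of it as a standalone helper, run on made-up candidates shaped like Text Search results; `rank_by_reviews` is a hypothetical name, not a function defined in this notebook.

```python
from typing import Any, Dict, List

def rank_by_reviews(candidates: List[Dict[str, Any]], n: int = 10) -> List[str]:
    """Keep places typed tourist_attraction and return the ids of the n most-reviewed ones."""
    attractions = [p for p in candidates if "tourist_attraction" in (p.get("types") or [])]
    # Missing or null review counts sort as 0
    attractions.sort(key=lambda p: p.get("userRatingCount") or 0, reverse=True)
    return [p["id"] for p in attractions[:n]]

# Made-up candidates for illustration only
sample = [
    {"id": "a", "types": ["tourist_attraction"], "userRatingCount": 120},
    {"id": "b", "types": ["restaurant"], "userRatingCount": 9000},
    {"id": "c", "types": ["museum", "tourist_attraction"], "userRatingCount": 4500},
    {"id": "d", "types": ["tourist_attraction"]},  # no review count yet
]
print(rank_by_reviews(sample, n=2))
```

The restaurant is dropped despite having the most reviews, because the filter runs before the sort.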


## 7) Interactive UI: enter a city + click Search (shows cache vs API source)

In [None]:
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

def results_to_dataframe(results: List[Dict[str, Any]]) -> pd.DataFrame:
    df = pd.DataFrame(results)
    cols = [
        "name",
        "rating",
        "review_count",
        "category_primary",
        "wheelchair_accessible_entrance",
        "address",
        "website",
        "phone",
        "place_id",
        "_city_source",
        "_source",
    ]
    cols = [c for c in cols if c in df.columns]
    return df[cols]

def print_opening_hours(summary: Dict[str, Any]) -> None:
    hours = summary.get("opening_hours_weekday_descriptions") or []
    if not hours:
        print("No opening hours available.")
        return
    for line in hours:
        print(line)

city_input = widgets.Text(
    value="Paris",
    description="City:",
    placeholder="e.g., Paris, Rome, Stockholm",
    layout=widgets.Layout(width="420px")
)

button = widgets.Button(description="Search", button_style="primary")
output = widgets.Output()

def on_button_click(_):
    with output:
        output.clear_output()
        city = city_input.value.strip()
        if not city:
            print("Please enter a city name.")
            return

        print(f"Searching top tourist attractions in {city} (ranked by review count)...")
        print("Place details are cached for 30 days (default of get_cached_summary).\n")

        try:
            results = top_tourist_attractions_by_reviews_static_city(
                city,
                n=10,
                language_code="en",
                search_pool=50,
            )

            if not results:
                print("No tourist attractions found (type=tourist_attraction). Try another city.")
                return

            display(results_to_dataframe(results))

            print("\nExample opening hours (top result):")
            print("-", results[0].get("name"), f"(source: {results[0].get('_source')})")
            print_opening_hours(results[0])

        except requests.HTTPError as e:
            resp = getattr(e, "response", None)
            if resp is not None:
                print("HTTPError:", resp.status_code)
                print(resp.text[:1200])
            else:
                print("HTTPError:", str(e))

button.on_click(on_button_click)

display(city_input, button, output)

Text(value='Paris', description='City:', layout=Layout(width='420px'), placeholder='e.g., Paris, Rome, Stockho…

Button(button_style='primary', description='Search', style=ButtonStyle())

Output()

## 8) Inspect city snapshots

In [None]:
df_cities = pd.read_sql_query(
    """
    SELECT
      city_display,
      city_key,
      created_at_utc,
      place_ids_json
    FROM city_top10
    ORDER BY created_at_utc DESC
    """,
    conn
)
df_cities


Unnamed: 0,city_display,city_key,created_at_utc,place_ids_json
0,Paris,paris,2026-02-19T17:13:47.640035+00:00,"[""ChIJLU7jZClu5kcR4PcOOO6p3I0"", ""ChIJD3uTd9hx5..."


In [None]:
conn.execute("SELECT COUNT(*) FROM city_top10").fetchone()


(1,)

## 9) Force recompute a city's top 10 snapshot (optional)

In [None]:
# Optional: force a recompute of a city's top 10 snapshot

import ipywidgets as widgets
from IPython.display import display

force_city_input = widgets.Text(
    value="Paris",
    description="City:",
    layout=widgets.Layout(width="400px")
)

force_button = widgets.Button(
    description="Force Recompute",
    button_style="warning"
)

force_output = widgets.Output()

def on_force_click(_):
    with force_output:
        force_output.clear_output()
        city = force_city_input.value.strip()

        if not city:
            print("Please enter a city name.")
            return

        print(f"Forcing recompute for {city}...\n")

        # Recompute top 10
        candidates = text_search_many(
            f"tourist attractions in {city}",
            language_code="en",
            max_results=50
        )

        filtered = [
            p for p in candidates
            if "tourist_attraction" in (p.get("types") or [])
        ]

        filtered_sorted = sorted(
            filtered,
            key=lambda p: p.get("userRatingCount", 0) or 0,
            reverse=True
        )

        place_ids = [p["id"] for p in filtered_sorted[:10]]

        # Overwrite snapshot in DB
        save_city_snapshot_place_ids(city, place_ids)

        print("City snapshot updated successfully.")

force_button.on_click(on_force_click)

display(force_city_input, force_button, force_output)


Text(value='Paris', description='City:', layout=Layout(width='400px'))



Output()

## 10) TripAdvisor (API version)

In [None]:
# Retrieve this machine's public IP address
import urllib.request

def get_public_ip():
    try:
        # Queries a free API that returns your public IP address
        public_ip = urllib.request.urlopen('https://api.ipify.org').read().decode('utf8')
        return public_ip
    except Exception as e:
        return f"Error retrieving public IP: {e}"

print(f"Your Public IP is: {get_public_ip()}")

Your Public IP is: 194.47.249.12


## 11) TripAdvisor (Selenium version)

In [None]:
#!pip install requests beautifulsoup4 selenium lxml
#uncomment if needed

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

def dismiss_cookies():
    """Handle TripAdvisor OneTrust cookie banner."""
    try:
        # Step 1: Wait for and click "Show Purposes"
        show_btn = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, "onetrust-pc-btn-handler"))
        )
        show_btn.click()
        time.sleep(1)  # Brief pause for expansion
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print("Show Purposes not found or already expanded")
    
    try:
        # Step 2: Wait for and click "Reject All"
        reject_btn = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, ".ot-pc-refuse-all-handler"))
        )
        reject_btn.click()
        time.sleep(2)  # Allow dismissal
        print("Cookies dismissed")
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print("Reject All not found")

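def get_tripadvisor_info(place_name: str, lang: str = 'en') -> dict:
    from urllib.parse import quote_plus  # local import keeps this cell self-contained

    # URL-encode the query so spaces, accents and '&' in place names survive
    search_url = f"https://www.tripadvisor.com/Search?q={quote_plus(place_name)}&searchNear=&o=0&{lang}"
    driver.get(search_url)
    time.sleep(2)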
    
    # Dismiss cookies immediately after load
    dismiss_cookies()
    
    # Proceed with search results
    try:
        # Find and click top result (adjust selector if needed via inspect)
        top_link = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "div.result a.review_count"))
        )
        top_link.click()
        time.sleep(3)
        
        # Extract data (refine selectors as needed)
        rating_elem = driver.find_element(By.CSS_SELECTOR, "svg[data-testid='icon-rating']")
        rating = rating_elem.get_attribute('aria-label')
        review_count = driver.find_element(By.CSS_SELECTOR, "[data-test-target='REVIEWS_HEADER_COUNT']").text
        
        return {
            'place': place_name,
            'rating': rating,
            'reviews': review_count,
            'url': driver.current_url
        }
    except Exception as e:
        return {'error': str(e), 'place': place_name}

In [None]:
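# TODO: work around TripAdvisor's anti-bot protection
# Set up Chrome (headless is left off so the browser window stays visible)
options = webdriver.ChromeOptions()
# options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

try:
    venice_info = get_tripadvisor_info("Venice")
    print(venice_info)
finally:
    # Always release the browser, even if the scrape fails
    driver.quit()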


Show Purposes not found or already expanded
Reject All not found
{'error': 'Message: invalid session id; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#invalidsessionidexception\nStacktrace:\nSymbols not available. Dumping unresolved backtrace:\n\t0x7ff74f444435\n\t0x7ff74f444490\n\t0x7ff74f1dd286\n\t0x7ff74f227eb3\n\t0x7ff74f2616d2\n\t0x7ff74f25b821\n\t0x7ff74f25aab3\n\t0x7ff74f1a6295\n\t0x7ff74f713b30\n\t0x7ff74f70e3b8\n\t0x7ff74f72e72a\n\t0x7ff74f4607e5\n\t0x7ff74f46978c\n\t0x7ff74f1a50fa\n\t0x7ff74f876798\n\t0x7fffbd12e8d7\n\t0x7fffbe6ac40c\n', 'place': 'Venice'}


## 12) Peak hours data using Selenium (Google Maps)

In [1]:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

def dismiss_google_consent(driver):
    """Dismiss the GDPR consent banner on Google Maps (EU only)."""
    try:
        accept_btn = WebDriverWait(driver, 8).until(
            EC.element_to_be_clickable((
                By.XPATH,
                '//button[.//span[contains(text(),"Accept all") '
                'or contains(text(),"Reject all")]]'
            ))
        )
        accept_btn.click()
        time.sleep(1)
        print("  Google consent dismissed.")
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print("  No consent popup found.")


In [2]:
import os
import sqlite3
import pathlib

# Path to TripAdvisor cache in the same folder as this notebook's working dir
TA_DB_PATH = str(pathlib.Path().cwd() / "dabn23_tripadvisor_cache.sqlite")

print("TripAdvisor DB path:", TA_DB_PATH)
print("Exists?", os.path.exists(TA_DB_PATH))

taconn = sqlite3.connect(TA_DB_PATH)
taconn.execute("PRAGMA journal_mode=WAL;")  # safe even if wal/shm files present


TripAdvisor DB path: c:\Users\Fabrizio\OneDrive\LUND DABE\SCRAP SCRAP\Jupiter Notebook\Project 1\dabn23_tripadvisor_cache.sqlite
Exists? True


<sqlite3.Cursor at 0x28bbf4c89c0>

In [3]:
import json
import sqlite3

def get_attraction_names(city: str, conn: sqlite3.Connection):
    """
    Look up stored attraction names for a city from ta_city_top10 / ta_place_summary.
    Returns a list of name strings, or None if city not found.
    """
    citykey = city.strip().lower()

    cur = conn.cursor()

    # Read the stored place_ids_json for this city
    row = cur.execute(
        "SELECT place_ids_json FROM ta_city_top10 WHERE city_key = ?",
        (citykey,)
    ).fetchone()

    if not row:
        # City not in DB
        return None

    place_ids = json.loads(row[0])
    if not place_ids:
        return []

    # Fetch names in the same ranked order
    placeholders = ",".join("?" * len(place_ids))
    name_rows = cur.execute(
        f"SELECT place_id, name FROM ta_place_summary "
        f"WHERE place_id IN ({placeholders})",
        place_ids
    ).fetchall()

    name_map = {pid: name for pid, name in name_rows}

    # Preserve original ranking order
    return [name_map[pid] for pid in place_ids if pid in name_map]
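

The final list comprehension matters because `WHERE place_id IN (...)` does not return rows in the order of the ids passed in. A small sketch of the same lookup on an in-memory database illustrates this. The two-column tables are reduced, assumed versions of `ta_city_top10` and `ta_place_summary` (the real schema of the TripAdvisor cache is not shown in this notebook), and all rows are made up.

```python
import json
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ta_city_top10 (city_key TEXT PRIMARY KEY, place_ids_json TEXT NOT NULL)")
demo.execute("CREATE TABLE ta_place_summary (place_id TEXT PRIMARY KEY, name TEXT)")

# Made-up data: the ranked order is p3, p1, p2
demo.execute("INSERT INTO ta_city_top10 VALUES (?, ?)", ("demo city", json.dumps(["p3", "p1", "p2"])))
demo.executemany(
    "INSERT INTO ta_place_summary VALUES (?, ?)",
    [("p1", "First Inserted"), ("p2", "Second Inserted"), ("p3", "Third Inserted")],
)

ranked_ids = json.loads(
    demo.execute("SELECT place_ids_json FROM ta_city_top10 WHERE city_key = ?", ("demo city",)).fetchone()[0]
)
placeholders = ",".join("?" * len(ranked_ids))
rows = demo.execute(
    f"SELECT place_id, name FROM ta_place_summary WHERE place_id IN ({placeholders})", ranked_ids
).fetchall()

# Re-apply the stored ranking, whatever order the database returned
name_map = dict(rows)
ranked_names = [name_map[pid] for pid in ranked_ids if pid in name_map]
print(ranked_names)
```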


In [4]:
from datetime import datetime

def current_hour_label() -> str:
    """
    Returns current hour in Google Maps format, e.g.:
      14:xx  →  '2 pm'
      09:xx  →  '9 am'
      12:xx  →  '12 pm'
      00:xx  →  '12 am'
    """
    now = datetime.now()
    hour = now.hour

    if hour == 0:
        return "12 am"
    elif hour < 12:
        return f"{hour} am"
    elif hour == 12:
        return "12 pm"
    else:
        return f"{hour - 12} pm"

print(f"Current hour label: '{current_hour_label()}'")


Current hour label: '12 pm'
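
`current_hour_label` reads the system clock, so its edge cases (midnight, noon) can only be observed at those times, and it reflects the machine's local time rather than the attraction's. A sketch of a pure variant that takes the hour as an argument makes the mapping checkable at any time; `hour_label` is a hypothetical name.

```python
def hour_label(hour: int) -> str:
    """Convert a 24-hour clock hour (0-23) to a label such as '2 pm'."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    suffix = "am" if hour < 12 else "pm"
    # hour % 12 is 0 at midnight and noon, which both display as 12
    return f"{hour % 12 or 12} {suffix}"

for h in (0, 9, 12, 14, 23):
    print(h, "->", hour_label(h))
```

`current_hour_label()` could then be reduced to `hour_label(datetime.now().hour)`.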


In [5]:
import re
from selenium.common.exceptions import NoSuchElementException, TimeoutException

def get_current_busyness(driver, attraction_name: str) -> int | None:
    """
    Searches Google Maps for attraction_name and returns current busyness
    as an integer percentage (e.g. 77), or None if not available.
    """
    print(f"\n  Searching: {attraction_name}")

    # --- 1. Navigate to Google Maps and search ---
    driver.get("https://www.google.com/maps")
    time.sleep(2)
    dismiss_google_consent(driver)

    # Find the search box by its name attribute
    search_bar = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.NAME, "q"))
    )
    search_bar.clear()
    search_bar.send_keys(attraction_name)

    # Click the search button
    search_btn = driver.find_element(By.CSS_SELECTOR, "button.mL3xi")
    search_btn.click()
    time.sleep(3)

    # --- 2. Handle disambiguation (list vs. direct result) ---
    try:
        # If we land on a results list, click the first result
        first_result = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "a.hfpxzc"))
        )
        first_result.click()
        time.sleep(3)
        print("    → Clicked top result from list")
    except TimeoutException:
        # Already on the place page directly
        print("    → Landed directly on place page")

    # --- 3. Check for peak hours parent element ---
    try:
        peak_section = WebDriverWait(driver, 6).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.UmE4Qe"))
        )
    except TimeoutException:
        print("    ✗ No peak hours data available")
        return None

    # --- 4. Find the hourly bar matching the current time ---
    target_label = current_hour_label()
    hourly_bars = peak_section.find_elements(By.CSS_SELECTOR, "div.dpoVLd")

    # Exact label match: a plain substring test would let "2 pm" match "12 pm"
    label_re = re.compile(rf"(?<!\d){re.escape(target_label)}\b")
    for bar in hourly_bars:
        aria = bar.get_attribute("aria-label") or ""
        if label_re.search(aria):
            # Extract percentage with regex: "77% busy at 2 pm."
            match = re.search(r"(\d+)%", aria)
            if match:
                pct = int(match.group(1))
                print(f"    ✓ Current busyness: {pct}% (matched '{aria.strip()}')")
                return pct

    print(f"    ✗ No bar found for '{target_label}' (place may be closed now)")
    return None
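

When matching hour labels, note that `'2 pm'` is a substring of `'12 pm'` (and `'1 am'` of `'11 am'`). The sketch below isolates the label matching and percentage parsing so they can be checked on plain strings, without a browser. `busyness_for_label` is a hypothetical helper, and the sample labels are made up in the `"77% busy at 2 pm."` format noted in the code comment above; the real `aria-label` text on Google Maps may differ.

```python
import re
from typing import List, Optional

def busyness_for_label(aria_labels: List[str], target_label: str) -> Optional[int]:
    """Return the percentage from the first label that mentions target_label exactly."""
    # (?<!\d) stops '2 pm' from matching inside '12 pm'
    label_re = re.compile(rf"(?<!\d){re.escape(target_label)}\b")
    for aria in aria_labels:
        if label_re.search(aria):
            match = re.search(r"(\d+)%", aria)
            if match:
                return int(match.group(1))
    return None

# Made-up labels, in chronological order as the bars would appear
labels = ["40% busy at 12 pm.", "55% busy at 1 pm.", "77% busy at 2 pm."]
print(busyness_for_label(labels, "2 pm"))
```

With a plain `in` test, the lookup for `"2 pm"` would stop at the `12 pm` bar, since it comes first in the list.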


In [6]:
from datetime import datetime

def scrape_peak_hours(city: str, conn: sqlite3.Connection):
    """
    Scrape the current busyness of every stored attraction for `city`.
    Uses the module-level `driver` and always quits it before returning.
    Returns {"Crowdedness_now_HH:MM": {name: pct_or_None}}, or None if the
    city is not in the database.
    """
    now_label = datetime.now().strftime("%H:%M")
    results_key = f"Crowdedness_now_{now_label}"

    try:
        # Step 1: check city in DB
        print(f"Looking up '{city}' in ta_city_top10...")
        names = get_attraction_names(city, conn)

        if names is None:
            print("No data for this city!")
            return None

        print(f"Found {len(names)} attractions: {names}\n")

        # Step 2: scrape each attraction
        crowdedness = {}
        for name in names:
            crowdedness[name] = get_current_busyness(driver, name)
            time.sleep(2)  # polite delay between searches
    finally:
        # Runs on success, on errors, and on KeyboardInterrupt
        driver.quit()
        print("\nDriver closed.")

    # Step 3: wrap in the named dict and print
    output = {results_key: crowdedness}
    print(f"\n{'='*50}")
    print(f"Results: {results_key}")
    print('='*50)
    for attraction, value in crowdedness.items():
        display_val = f"{value}%" if value is not None else "N/A"
        print(f"  {attraction:<45} {display_val}")

    return output




In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# --- Run it ---

options = Options()

options.add_argument(
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

options.add_argument("--lang=en-US")
options.add_experimental_option("prefs", {
    "intl.accept_languages": "en-US,en",
    "profile.default_content_settings.geolocation": 2  # Block location access
})  # Sets Accept-Language header for sites like Google Maps
# Headless intentionally OFF so you can see the browser
# options.add_argument("--headless=new")  # leave this commented out

# Reduce obvious automation signals (these options do not change the IP or geolocation)
options.add_experimental_option("useAutomationExtension", False)
options.add_experimental_option("excludeSwitches", ["enable-automation"])


driver = webdriver.Chrome(options=options)
print("Driver started.")

result = scrape_peak_hours("Paris", taconn)

print("\nFinal result dict:")
print(result)

# scrape_peak_hours() already quits the driver, so it must not be closed again here
print("Done.")



Driver started.
Looking up 'Paris' in ta_city_top10...
Found 10 attractions: ['The Paris Catacombs', 'Paris Metro', 'Paris by Mouth', 'Cathedrale Notre-Dame de Paris', 'ExperienceFirst Paris', 'Galeries Lafayette Paris Haussmann', 'Big Bus Paris', 'The Seine', 'Tootbus Paris', 'Musee Carnavalet - Histoire de Paris']


  Searching: The Paris Catacombs
  Google consent dismissed.
    → Landed directly on place page
    ✗ No peak hours data available

  Searching: Paris Metro
  No consent popup found.
    → Clicked top result from list
    ✗ No bar found for '12 pm' (place may be closed now)

  Searching: Paris by Mouth
  No consent popup found.


KeyboardInterrupt: 

In [12]:
# Run this if the scrape was interrupted and the browser is still open
driver.quit()
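
Manual cleanup like the cell above is easy to forget after an interrupted run. One possible alternative is a context manager that always releases the browser. This is a sketch under stated assumptions: `managed_driver` and `FakeDriver` are hypothetical names, and the fake class stands in for `webdriver.Chrome` so the example runs without a browser.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator

@contextmanager
def managed_driver(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create a driver with factory() and always quit it, even on errors or Ctrl+C."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()

# Stand-in for webdriver.Chrome so the sketch runs without a browser
class FakeDriver:
    def __init__(self) -> None:
        self.closed = False

    def quit(self) -> None:
        self.closed = True

fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as drv:
        raise KeyboardInterrupt  # simulate interrupting a long scrape
except KeyboardInterrupt:
    pass
print("driver closed:", fake.closed)
```

With Selenium this would read `with managed_driver(lambda: webdriver.Chrome(options=options)) as driver:`, and functions that quit the driver themselves would drop their own `quit()` calls.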