## Phase 1 - Georgia Transit Location Repository (Using the Google Maps API)

**1. Goal:**
   To create a statewide dataset (`.csv`) of potential public transit stop and hub locations using the Google Places API.

**2. Inputs:**
    *   Google API Key.
    *   Geographic Area Definition (Georgia Bounding Box).
    *   List of relevant Transit Place Types (e.g., `bus_stop`, `subway_station`).
    *   Grid search parameters (step size, search radius).
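
The grid parameters determine how many requests a run issues. As a rough sketch, assuming the bounding box, step sizes, and six place types configured in the code further down, a lower bound on the request count (one page per grid point and type, ignoring pagination and retries) could be estimated as follows. The helper name `count_grid_points` is illustrative and not part of the notebook.

```python
def count_grid_points(min_lat, max_lat, min_lon, max_lon, lat_step, lon_step):
    """Count cell-centre points using the same stepping as the grid loop."""
    rows = 0
    lat = min_lat + lat_step / 2.0
    while lat < max_lat:
        rows += 1
        lat += lat_step
    cols = 0
    lon = min_lon + lon_step / 2.0
    while lon < max_lon:
        cols += 1
        lon += lon_step
    return rows, cols


rows, cols = count_grid_points(30.35, 35.00, -85.60, -80.84, 0.08, 0.08)
n_points = rows * cols
n_types = 6  # assumed: one Nearby Search call per place type per grid point
min_requests = n_points * n_types  # lower bound: pagination and retries add more

print(f"{rows} rows x {cols} columns = {n_points} grid points")
print(f"at least {min_requests} requests")
```

Halving the step size roughly quadruples the number of grid points, so the request count grows quickly as the grid gets finer.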

**3. Outputs:**
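    *   `georgia_transit_locations_with_hub_<YYYYMMDD_HHMM>.csv`: A timestamped CSV file containing one row per unique potential transit location, with these columns:
        *   `place_id`
        *   `name`
        *   `latitude`
        *   `longitude`
        *   `types` (from the Google API, joined with `;`)
        *   `vicinity`
        *   `is_potential_hub` (boolean flag based on type/name rules)
        *   `source_type` (the place type whose query first returned the location)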
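
A minimal sketch of reading the output back, assuming pandas is available and that `types` is stored as a semicolon-separated string, as the collection code writes it. The two sample rows are made up for illustration and stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real output file.
sample_csv = io.StringIO(
    "place_id,name,latitude,longitude,types,vicinity,is_potential_hub,source_type\n"
    "id_1,Example Transit Center,33.75,-84.39,transit_station;point_of_interest,Atlanta,True,transit_station\n"
    "id_2,Example St @ 1st Ave,33.76,-84.40,bus_stop;point_of_interest,Atlanta,False,bus_stop\n"
)

df = pd.read_csv(sample_csv)

# Split the semicolon-joined type string back into a list per row.
df["types"] = df["types"].str.split(";")

# Keep only the rows flagged as potential hubs.
hubs = df[df["is_potential_hub"]]
print(hubs[["place_id", "name", "types"]])
```

For the real file, the `io.StringIO` object would be replaced by the path of the timestamped CSV.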

**4. Method:**
    1.  Generate grid points covering Georgia.
    2.  For each grid point, iterate through the specified list of transit `place_type`s.
    3.  For each type, call the Google Places API (Nearby Search - Legacy) to find locations matching that `type` within the defined `radius`.
    4.  Handle API pagination to retrieve multiple pages of results if necessary.
    5.  For each result found:
        *   Check if the `place_id` is already recorded (de-duplication).
        *   If new, apply rules to determine `is_potential_hub`.
        *   Add the location's details to a list.
    6.  Periodically and finally, save the complete list of unique locations to the output CSV file.
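
Whether the grid in step 1 reaches every part of the state depends on how the search radius compares with the grid spacing. The sketch below estimates the fraction of one grid cell that falls inside a single search circle. It assumes roughly 111.32 km per degree of latitude, a cosine correction for longitude, and circles small enough to stay inside their own cell. The function name and the reference latitude of 32.7 degrees are illustrative choices, not values from the notebook.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate; treated as constant here


def cell_coverage_fraction(lat_deg, lat_step, lon_step, radius_m):
    """Fraction of one grid cell covered by one search circle at its centre."""
    cell_h_km = lat_step * KM_PER_DEG_LAT
    cell_w_km = lon_step * KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    radius_km = radius_m / 1000.0
    if 2 * radius_km > min(cell_h_km, cell_w_km):
        raise ValueError("circle spills past the cell; this simple ratio no longer applies")
    return (math.pi * radius_km ** 2) / (cell_h_km * cell_w_km)


fraction = cell_coverage_fraction(32.7, 0.08, 0.08, 2000)
print(f"cell covered by one search circle: {fraction:.0%}")
```

With a 2000 m radius and 0.08 degree steps the ratio is well under one, so places that lie between circles would not be returned. Raising the radius or shrinking the step trades completeness against request count.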

---

## Code

In [14]:
import os
import re
import csv
import time
import requests
from datetime import datetime

# --- Configuration ---

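# Read the key from the environment instead of hardcoding it in the notebook.
# (The variable name GOOGLE_MAPS_API_KEY is an assumption; use whatever your setup defines.)
API_KEY = os.environ.get("GOOGLE_MAPS_API_KEY", "")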
BASE_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

# Timestamped output file to avoid overwrite
timestamp = datetime.now().strftime("%Y%m%d_%H%M")
OUTPUT_FILE = f"georgia_transit_locations_with_hub_{timestamp}.csv"

# Bounding box over Georgia
MIN_LAT = 30.35
MAX_LAT = 35.00
MIN_LON = -85.60
MAX_LON = -80.84

# Grid and search settings
LAT_STEP = 0.08
LON_STEP = 0.08
RADIUS_METERS = 2000

# Transit types (query each separately)
PLACE_TYPES = [
    "bus_station", "bus_stop", "light_rail_station",
    "subway_station", "train_station", "transit_station"
]

# Rules for identifying "potential hubs"
HUB_INDICATING_TYPES = {"bus_station", "subway_station", "train_station", "transit_station"}
HUB_NAME_KEYWORDS_REGEX = re.compile(r'\b(station|center|hub)\b', re.IGNORECASE)

# Tracking sets and data
found_place_ids = set()
all_places_data = []

# --- Functions ---

def generate_grid_points(min_lat, max_lat, min_lon, max_lon, lat_step, lon_step):
    """Return the cell-centre points of a regular lat/lon grid over the bounding box."""
    points = []
    lat = min_lat + lat_step / 2.0
    while lat < max_lat:
        lon = min_lon + lon_step / 2.0
        while lon < max_lon:
            points.append({"lat": lat, "lon": lon})
            lon += lon_step
        lat += lat_step
    return points

def fetch_nearby_places(lat, lon, radius, place_type, api_key, page_token=None):
    params = {
        "location": f"{lat},{lon}",
        "radius": radius,
        "type": place_type,
        "key": api_key,
    }
    if page_token:
        params["pagetoken"] = page_token
        time.sleep(2)  # A next_page_token needs a short delay before it becomes valid

    try:
        response = requests.get(BASE_URL, params=params, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error fetching {place_type} at {lat:.4f},{lon:.4f}: {e}")
        return None

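def check_potential_hub(place_types_list, place_name):
    """Return True if any type is hub-indicating or the name contains 'station', 'center', or 'hub'."""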
    if HUB_INDICATING_TYPES.intersection(place_types_list):
        return True
    if place_name and HUB_NAME_KEYWORDS_REGEX.search(place_name):
        return True
    return False

def save_progress(filename, data):
    try:
        if data:
            fieldnames = data[0].keys()
            with open(filename, 'w', newline='', encoding='utf-8') as f_out:
                writer = csv.DictWriter(f_out, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(data)
            print(f"Progress saved: {len(data)} places → {filename}")
        else:
            print("No data to save yet.")
    except IOError as e:
        print(f"Error writing file: {e}")

# --- Main Execution ---

print("Generating grid points...")
grid_points = generate_grid_points(MIN_LAT, MAX_LAT, MIN_LON, MAX_LON, LAT_STEP, LON_STEP)
print(f"Total grid points to search: {len(grid_points)}\n")

for i, point in enumerate(grid_points):
    print(f"\n--- Grid Point {i+1}/{len(grid_points)} — ({point['lat']:.4f}, {point['lon']:.4f}) ---")
    grid_point_new_places = 0

    for place_type in PLACE_TYPES:
        next_page_token = None

        while True:
            results_json = fetch_nearby_places(
                point["lat"], point["lon"], RADIUS_METERS, place_type, API_KEY, next_page_token
            )

            if results_json and results_json.get("status") == "OK":
                for place in results_json.get("results", []):
                    place_id = place.get("place_id")
                    if place_id and place_id not in found_place_ids:
                        found_place_ids.add(place_id)
                        place_name = place.get("name", "")
                        place_types_list = place.get("types", [])

                        is_hub = check_potential_hub(set(place_types_list), place_name)

                        place_data = {
                            "place_id": place_id,
                            "name": place_name,
                            "latitude": place.get("geometry", {}).get("location", {}).get("lat", ""),
                            "longitude": place.get("geometry", {}).get("location", {}).get("lng", ""),
                            "types": ";".join(place_types_list),
                            "vicinity": place.get("vicinity", ""),
                            "is_potential_hub": is_hub,
                            "source_type": place_type,
                        }
                        all_places_data.append(place_data)
                        grid_point_new_places += 1

                next_page_token = results_json.get("next_page_token")
                if not next_page_token:
                    break
            elif results_json and results_json.get("status") == "OVER_QUERY_LIMIT":
                print("⚠️ Rate limit hit. Sleeping 60 seconds...")
                time.sleep(60)
                continue
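            else:
                # ZERO_RESULTS is expected for empty areas; report anything else
                # (e.g. REQUEST_DENIED, INVALID_REQUEST) before moving on.
                status = results_json.get("status") if results_json else None
                if status not in (None, "ZERO_RESULTS"):
                    print(f"Unexpected status for {place_type}: {status}")
                break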

    print(f"New places found at this point: {grid_point_new_places}")

    time.sleep(0.2)

    if (i + 1) % 10 == 0:
        print(f"\n💾 [Checkpoint] Saving progress at {i+1} grid points...")
        save_progress(OUTPUT_FILE, all_places_data)

# --- Final Save ---

print(f"\nFinished all grid points. Total unique places found: {len(all_places_data)}")
print("Saving final results...")
save_progress(OUTPUT_FILE, all_places_data)
print("🎉 Done.")

Generating grid points...
Total grid points to search: 3480


--- Grid Point 1/3480 — (30.3900, -85.5600) ---
New places found at this point: 4

--- Grid Point 2/3480 — (30.3900, -85.4800) ---
New places found at this point: 0

--- Grid Point 3/3480 — (30.3900, -85.4000) ---
New places found at this point: 1

--- Grid Point 4/3480 — (30.3900, -85.3200) ---
New places found at this point: 0

--- Grid Point 5/3480 — (30.3900, -85.2400) ---
New places found at this point: 0

--- Grid Point 6/3480 — (30.3900, -85.1600) ---
New places found at this point: 2

--- Grid Point 7/3480 — (30.3900, -85.0800) ---
New places found at this point: 1

--- Grid Point 8/3480 — (30.3900, -85.0000) ---
New places found at this point: 1

--- Grid Point 9/3480 — (30.3900, -84.9200) ---
New places found at this point: 0

--- Grid Point 10/3480 — (30.3900, -84.8400) ---
New places found at this point: 1

💾 [Checkpoint] Saving progress at 10 grid points...
Progress saved: 10 places → georgia_transit_locations_w

In [19]:
# Data Quality Check 

import pandas as pd

df = pd.DataFrame(all_places_data) 

duplicates = df[df.duplicated()]
print("🔁 Duplicated Rows:\n", duplicates)

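# Note: fields absent from the API response were stored as "" during collection,
# so isnull() flags only true nulls, not empty strings.
missing_rows = df[df.isnull().any(axis=1)]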
print("\n❓ Rows with Missing Values:\n", missing_rows)

🔁 Duplicated Rows:
 Empty DataFrame
Columns: [place_id, name, latitude, longitude, types, vicinity, is_potential_hub, source_type]
Index: []

❓ Rows with Missing Values:
 Empty DataFrame
Columns: [place_id, name, latitude, longitude, types, vicinity, is_potential_hub, source_type]
Index: []
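
Because the collection loop stores absent fields as empty strings rather than `None`, `isnull()` alone cannot flag them. A minimal sketch of a stricter check, using a small made-up frame in place of `all_places_data`:

```python
import pandas as pd

# Hypothetical rows; the second has an empty vicinity, stored as "".
rows = [
    {"place_id": "id_1", "name": "Example Station", "vicinity": "Atlanta"},
    {"place_id": "id_2", "name": "Example Stop", "vicinity": ""},
]
df = pd.DataFrame(rows)

# Flag a cell as missing if it is null or an empty string.
missing_mask = df.isna() | df.eq("")
missing_rows = df[missing_mask.any(axis=1)]

# Check duplicates on place_id only, since that is the de-duplication key.
duplicate_ids = df[df.duplicated(subset="place_id", keep=False)]

print("Rows with missing or empty values:\n", missing_rows)
print("Rows sharing a place_id:\n", duplicate_ids)
```

Checking duplicates on `place_id` alone is a design choice here: two rows with the same ID but slightly different text would pass a whole-row comparison.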
