## Data exploration

Adapted from: https://www.kaggle.com/code/nagellette/taxi-trajectory-data-analysis

In [None]:
import numpy as np
import os
import pandas as pd
import rich
import sys
from functools import partial
from tqdm import tqdm

In [None]:
data_path = "data/train.csv"

In [None]:
def load_data(path: str, nrows=10000):
    df = pd.read_csv(path, nrows=nrows)
    return df

_df = load_data(data_path)
_df.head()

The column we care about is `POLYLINE`. It stores, as a string, the list of GPS coordinates recorded along the taxi's route, with each coordinate written as a `[longitude, latitude]` pair. The first and last coordinates are the trip's start and end points, and the ones in between trace the path the taxi took. We preprocess the data here so that it is easier to work with in `geopandas` later.
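Because each raw `POLYLINE` value is a JSON array of `[longitude, latitude]` pairs, it can also be parsed with the standard-library `json` module instead of string replacement and regular expressions. The sketch below assumes that layout; `parse_polyline_json` is a hypothetical helper, and the sample string is illustrative rather than a specific row from `train.csv`.

```python
import json


def parse_polyline_json(polyline: str) -> list:
    """Parse a raw POLYLINE string into a list of (longitude, latitude) tuples.

    Assumes the string is a JSON array of [lon, lat] pairs, e.g.
    "[[-8.618643,41.141412],[-8.618499,41.141376]]".
    """
    return [(float(lon), float(lat)) for lon, lat in json.loads(polyline)]


raw = "[[-8.618643,41.141412],[-8.618499,41.141376],[-8.620326,41.14251]]"
points = parse_polyline_json(raw)
start, end = points[0], points[-1]  # trip origin and destination
print(len(points), start, end)
```

An empty trip (`"[]"`) parses to an empty list, which makes such rows easy to filter out.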

In [None]:
df = _df.copy()


def preprocess_data(df: pd.DataFrame) -> None:
    """Clean the raw trips in place.

    Raw POLYLINE strings look like "[[-8.61,41.14],[-8.62,41.15]]". Rows with
    missing GPS data or an empty polyline ("[]") are dropped, and every comma is
    replaced by a space so that each point becomes "[lon lat]", the format the
    regex in the next cells expects.
    """
    df.drop(df[df["MISSING_DATA"] == True].index, inplace=True)
    df.drop(df[df["POLYLINE"] == "[]"].index, inplace=True)
    # "[[-8.61,41.14],[-8.62,41.15]]" -> "[[-8.61 41.14] [-8.62 41.15]]"
    df["POLYLINE"] = df["POLYLINE"].str.replace(",", " ", regex=False)


preprocess_data(df)
df.head()

In [None]:
coords = df["POLYLINE"].iloc[0]  # first remaining trip; label 0 may have been dropped
coords


In [None]:
import re

# Each match is an (x, y) = (longitude, latitude) pair of strings.
pattern = r"\[(-?\d+\.\d+) (-?\d+\.\d+)\]"
matches = re.findall(pattern, coords)
for x, y in matches:
    print(f"x (lon): {x}, y (lat): {y}")

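Next, we parse each polyline into a list of `(longitude, latitude)` tuples and store it as a shapely `LineString` in a new `geometry` column. Trips with fewer than two points are dropped, because a `LineString` needs at least two coordinates.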
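Shapely geometries are planar, so `LineString.length` on raw longitude/latitude pairs is measured in degrees rather than meters. When a trip length in meters is needed, one option is to sum great-circle (haversine) distances between consecutive points. A minimal sketch, assuming `(lon, lat)` tuples in degrees and a spherical Earth with a mean radius of 6,371 km; `haversine_m` and `trip_length_m` are hypothetical names:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical model)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trip_length_m(points):
    """Sum the haversine distances between consecutive polyline points."""
    return sum(haversine_m(p, q) for p, q in zip(points, points[1:]))


# One degree of latitude along a meridian is about 111.2 km on this sphere.
print(round(trip_length_m([(0.0, 0.0), (0.0, 1.0)]) / 1000, 1))  # 111.2
```

For a `LineString` named `line` from the `geometry` column, `trip_length_m(list(line.coords))` would give its approximate length in meters.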

In [None]:
import re

from shapely.geometry import LineString

# Matches one "[lon lat]" pair produced by preprocess_data.
COORD_PATTERN = re.compile(r"\[(-?\d+\.\d+) (-?\d+\.\d+)\]")


def parse_polyline(polyline: str) -> list:
    """Convert "[[lon lat] [lon lat] ...]" into [(lon, lat), ...] floats."""
    return [(float(x), float(y)) for x, y in COORD_PATTERN.findall(polyline)]


def add_geometry_column(df: pd.DataFrame) -> None:
    """Add a shapely LineString `geometry` column to df, in place."""
    tqdm.pandas()  # enables Series.progress_apply
    points = df["POLYLINE"].progress_apply(parse_polyline)
    df["geo_len"] = points.apply(len)
    # A LineString needs at least two points, so drop shorter trips.
    df.drop(df[df["geo_len"] < 2].index, inplace=True)
    df["geometry"] = points.loc[df.index].apply(LineString)


add_geometry_column(df)
df.head()

Now that every trip has a `LineString` geometry, we can wrap the data in a `geopandas` `GeoDataFrame`.

In [None]:
import geopandas as gpd

# Coordinates are WGS84 longitude/latitude, i.e. EPSG:4326.
gdf = gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")
gdf.head()

In [None]:
gdf["geometry"].values[-1]

## Assign labels

This is the core of the project: the objective is to decide whether a driver is lost, parking, or driving normally. We will use the `geopandas` library to help with this task later.
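The labeling rules are still a TODO in this notebook. As one possible starting point, and only an assumption rather than the project's method, a rule-based sketch could flag a trip as `PARKING` when its last couple of minutes are spent crawling, and as `LOST` when its path is much longer than the straight line between its endpoints. It assumes `(lon, lat)` points sampled every 15 seconds (the interval documented for the Porto taxi dataset) and uses an equirectangular distance approximation, which is adequate for short hops between samples. `label_trip`, `_dist_m`, and every threshold are hypothetical and would need tuning against hand-labeled trips.

```python
import math

SAMPLE_INTERVAL_S = 15      # assumed seconds between GPS samples
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
# Hypothetical thresholds, not taken from the project; tune on labeled trips.
PARKING_TAIL_STEPS = 8      # final steps to inspect (8 x 15 s = 2 minutes)
PARKING_SPEED_MPS = 2.0     # below this speed the car counts as crawling
LOST_DETOUR_RATIO = 2.0     # path length / straight-line distance


def _dist_m(p, q):
    """Equirectangular distance in meters between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    return EARTH_RADIUS_M * math.hypot(x, lat2 - lat1)


def label_trip(points):
    """Label a trip given as (lon, lat) points: 'PARKING', 'LOST' or 'NORMAL'."""
    if len(points) < 2:
        raise ValueError("a trip needs at least two points")
    steps = [_dist_m(p, q) for p, q in zip(points, points[1:])]
    tail = steps[-PARKING_TAIL_STEPS:]
    # PARKING: every one of the final PARKING_TAIL_STEPS steps is slow.
    if len(tail) == PARKING_TAIL_STEPS and all(
        d / SAMPLE_INTERVAL_S < PARKING_SPEED_MPS for d in tail
    ):
        return "PARKING"
    # LOST: the driven path is much longer than the direct route
    # (direct distance clamped to 1 m so round trips do not divide by zero).
    direct = _dist_m(points[0], points[-1])
    if sum(steps) / max(direct, 1.0) > LOST_DETOUR_RATIO:
        return "LOST"
    return "NORMAL"
```

A genuine round trip also produces a large detour ratio, which is one reason these rules are only a starting point.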

In [None]:
# TODO: take polylines and convert to labels `['LOST','PARKING','NORMAL']`

## Google Maps API Setup

For geocoding and reverse geocoding, we use the Google Maps Platform. You will need a Google Cloud Platform account with the Geocoding API and the Places API enabled (both are used below), plus an API key. Instructions for creating a key are here: https://developers.google.com/maps/documentation/geocoding/get-api-key. For our application, we put the API key in a file called `GOOGLE_MAPS_API_KEY.txt` in the same directory as this notebook.
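Reading the key with a plain `f.read()` keeps any trailing newline in the string, and that newline would then be sent as part of the key. A small sketch of a more forgiving loader that strips whitespace and, as an assumption this notebook does not make, checks a `GOOGLE_MAPS_API_KEY` environment variable before falling back to the text file; `load_api_key` is a hypothetical helper:

```python
import os
from pathlib import Path


def load_api_key(path="GOOGLE_MAPS_API_KEY.txt", env_var="GOOGLE_MAPS_API_KEY"):
    """Return the API key from `env_var` if set, else from the file at `path`.

    Surrounding whitespace (including a trailing newline) is stripped.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"no API key found in ${env_var} or {path}")
    return key
```

In the notebook this would be used as `api_key = load_api_key()`.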

In [None]:
from geopy.geocoders import GoogleV3
api_key_file = "GOOGLE_MAPS_API_KEY.txt"

In [None]:
with open(api_key_file, "r") as f:
    api_key = f.read().strip()  # drop the trailing newline, which would corrupt the key
adapter = GoogleV3(api_key=api_key)

We'll use the Chihuly Garden and Glass museum in Seattle as an example. We first geocode its address to get latitude and longitude coordinates. From there, we can find the parking spaces nearest to the museum and display them on a map. This is another part of the project we are working on.

In [None]:
glass_garden_address = "305 Harrison St, Seattle, WA 98109"  # Chihuly Garden and Glass
loc = adapter.geocode(glass_garden_address)
loc

In [None]:
lat, lon = loc.latitude, loc.longitude
loc.point

### Find parking spots near location
https://stackoverflow.com/questions/23025011/google-place-api-for-parking-spots

https://developers.google.com/maps/documentation/places/web-service/search-nearby#PlaceSearchRequests
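Per the Nearby Search documentation linked above, results are ranked by prominence when a `radius` is given, while `rankby=distance` (which requires omitting `radius`) returns the closest matches first. A response can also carry a non-`OK` `status` such as `ZERO_RESULTS` or `REQUEST_DENIED`, in which case indexing `res["results"][0]` fails. The sketch below builds the URL with `urllib.parse.urlencode` and checks the status; `build_nearby_search_url` and `extract_results` are hypothetical helpers targeting the same legacy Nearby Search endpoint used in this notebook.

```python
from typing import Optional
from urllib.parse import urlencode

NEARBY_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def build_nearby_search_url(lat: float, lng: float, api_key: str,
                            place_type: str = "parking",
                            radius_m: Optional[int] = 1000) -> str:
    """Build a Nearby Search URL; radius_m=None requests rankby=distance instead."""
    params = {"location": f"{lat},{lng}", "type": place_type, "key": api_key}
    if radius_m is None:
        params["rankby"] = "distance"  # closest first; radius must not be sent
    else:
        params["radius"] = radius_m  # meters; results ranked by prominence
    return f"{NEARBY_SEARCH_URL}?{urlencode(params)}"


def extract_results(response: dict) -> list:
    """Return response["results"], treating ZERO_RESULTS as an empty list."""
    status = response.get("status")
    if status == "ZERO_RESULTS":
        return []
    if status != "OK":
        raise RuntimeError(f"Places API returned {status}: {response.get('error_message', '')}")
    return response["results"]
```

A request would then look like `extract_results(requests.get(build_nearby_search_url(lat, lon, api_key), timeout=10).json())`.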

In [None]:
import requests

places_url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
params = {
    "location": f"{lat},{lon}",
    "radius": 1000,  # meters
    "type": "parking",
    "key": api_key,
}

In [None]:
# requests URL-encodes `params`, so coordinates and key need no manual escaping.
res = requests.get(places_url, params=params, timeout=10).json()
res["results"][0]

In [None]:
from typing import Optional


def get_nearest_parking(lat: float = 0.0, lon: float = 0.0,
                        query: Optional[str] = None, radius: int = 1000):
    """Return one page of Places Nearby Search results of type "parking".

    If `query` is given, it is geocoded first and overrides `lat`/`lon`.
    `radius` is in meters.
    """
    if query is not None:
        loc = adapter.geocode(query)
        lat, lon = loc.latitude, loc.longitude
    params = {
        "location": f"{lat},{lon}",
        "radius": radius,
        "type": "parking",
        "key": api_key,
    }
    res = requests.get(
        "https://maps.googleapis.com/maps/api/place/nearbysearch/json",
        params=params,
        timeout=10,
    ).json()
    return res["results"]


nearby_parking = get_nearest_parking(lat, lon)
nearby_parking[0]["geometry"]["location"]

In [None]:
for parking in nearby_parking:
    # Separate names keep the museum's `lat`/`lon` intact for the map below.
    p_lat = parking["geometry"]["location"]["lat"]
    p_lng = parking["geometry"]["location"]["lng"]
    print(parking["vicinity"], (p_lat, p_lng))


### Visualize using `gmaps`

In [None]:
!jupyter nbextension enable --py --sys-prefix widgetsnbextension
!jupyter nbextension enable --py --sys-prefix gmaps

In [None]:
import gmaps
gmaps.configure(api_key=api_key)

In [None]:
fig = gmaps.figure(center=(lat, lon), zoom_level=12)
fig
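The figure above is only centered on the museum; the parking spots still need to be added as a layer. A minimal sketch of a hypothetical helper, `parking_markers`, that turns the Places results into `(lat, lng)` marker positions and info-box labels, falling back to a result's `name` when it has no `vicinity`:

```python
def parking_markers(results):
    """Split Places results into (lat, lng) marker locations and info-box labels."""
    locations, labels = [], []
    for place in results:
        loc = place["geometry"]["location"]
        locations.append((loc["lat"], loc["lng"]))
        # Prefer the street-level 'vicinity'; fall back to the place name.
        labels.append(place.get("vicinity") or place.get("name", "parking"))
    return locations, labels
```

In the notebook, the markers could then be drawn with `locations, labels = parking_markers(nearby_parking)` followed by `fig.add_layer(gmaps.marker_layer(locations, info_box_content=labels))`.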