# Processing data for Yestermap

In [None]:
import pandas as pd
import geopandas
import json

I downloaded my location history from Google Takeout.

In [None]:
with open("data/Takeout/Location History/Location History.json") as f:
    data = json.load(f)

In [None]:
from pandas import json_normalize
df = json_normalize(data, "locations")

Preparing coordinates for conversion into a Geopandas dataframe. The `E7` fields store degrees multiplied by 10^7, so dividing by 10^7 gives decimal degrees.

In [None]:
df["latitude"] = df["latitudeE7"] / 10 ** 7
df["longitude"] = df["longitudeE7"] / 10 ** 7
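
As a quick sanity check of the E7 conversion, here is a minimal sketch in plain Python. The record below is made up for illustration (it is not from my data), and `e7_to_degrees` is a hypothetical helper name.

```python
# Hypothetical record shaped like one entry of the "locations" list.
sample = {
    "timestampMs": "1577836800000",
    "latitudeE7": 145995120,
    "longitudeE7": 1209842190,
}

def e7_to_degrees(value_e7):
    """Convert an E7-encoded integer coordinate to decimal degrees."""
    return value_e7 / 10 ** 7

latitude = e7_to_degrees(sample["latitudeE7"])
longitude = e7_to_degrees(sample["longitudeE7"])
print(latitude, longitude)
```

The results should land inside the valid ranges of -90 to 90 for latitude and -180 to 180 for longitude, which is an easy way to catch a wrong divisor.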

In [None]:
df

Convert to Geopandas dataframe keeping only the timestamp and geometry.

In [None]:
gdf = geopandas.GeoDataFrame(df, geometry=geopandas.points_from_xy(df.longitude, df.latitude))[["timestampMs", "geometry"]]

Setting the CRS in preparation for spatial joins.

In [None]:
gdf = gdf.set_crs(epsg=4326)

I'm resetting the index here so each point keeps its original row number in an `index` column. We'll need it later on.

In [None]:
gdf = gdf.reset_index()

In [None]:
gdf

I got the geometries for PH cities from https://gadm.org/. I had to convert the shapefile to GeoJSON and simplify it.

In [None]:
ph_cities = geopandas.read_file("data/ph_cities.geojson")

Combining city and province because PH likes to repeat location names :P

In [None]:
ph_cities["NAME"] = ph_cities["NAME_2"].str.upper() + ", " + ph_cities["NAME_1"].str.upper()
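
To show why the province is needed, here is a small sketch in plain Python. The rows are hypothetical stand-ins for the `NAME_2` (city) and `NAME_1` (province) columns, and `combined_name` is a made-up helper.

```python
# Hypothetical (city, province) pairs; one city name appears in two provinces.
rows = [
    ("San Fernando", "La Union"),
    ("San Fernando", "Pampanga"),
    ("Baguio", "Benguet"),
]

def combined_name(city, province):
    """Build the same "CITY, PROVINCE" label used for the NAME column."""
    return f"{city.upper()}, {province.upper()}"

city_only = {city.upper() for city, _ in rows}
combined = {combined_name(city, province) for city, province in rows}

# The city-only labels collapse two places into one; the combined labels do not.
print(len(city_only), len(combined))
```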

In [None]:
ph_cities = ph_cities[["NAME", "geometry"]]

In [None]:
ph_cities

Running a spatial join on PH cities and my location history coordinates. We're only keeping the city geometries to reduce granularity.

In [None]:
# `predicate` replaced the deprecated `op` keyword in geopandas 0.10.
gdf_ph = geopandas.sjoin(ph_cities, gdf, how="inner", predicate="intersects")

In [None]:
gdf_ph

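Here's where the index from above comes in handy. I'm getting all the remaining locations that didn't fall within any of the PH city geometries.

In [None]:
# Keep the points whose original row number (the "index" column) never shows up in the PH join result.
gdf_missing = gdf[~gdf["index"].isin(gdf_ph["index"])]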
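
The idea in this step is an anti-join on the original row numbers: take every point, then drop the ones that already matched a PH city. A minimal sketch with plain Python and made-up ids (none of these values come from my data):

```python
# Hypothetical row ids of every location point.
all_ids = [0, 1, 2, 3, 4]

# Hypothetical ids that matched a city in the spatial join.
# An id may repeat if a point intersects more than one polygon.
matched_ids = [1, 1, 3]

matched = set(matched_ids)
missing_ids = [i for i in all_ids if i not in matched]
print(missing_ids)
```

Using a set for the matched ids keeps the membership check cheap even with a long location history.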

In [None]:
gdf_missing

I got the world city geometries from https://github.com/drei01/geojson-world-cities.

In [None]:
world_cities = geopandas.read_file("data/world_cities.geojson")

In [None]:
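# Join against the world cities loaded above; `predicate` replaced the deprecated `op` keyword in geopandas 0.10.
gdf_world = geopandas.sjoin(world_cities, gdf_missing, how="inner", predicate="intersects")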

In [None]:
gdf_world

Combining and cleaning the results.

In [None]:
gdf_cleaned = pd.concat([gdf_ph, gdf_world])

In [None]:
gdf_cleaned = gdf_cleaned.set_crs(epsg=4326)

Getting the centroids of the cities since I only need points.

In [None]:
gdf_cleaned["longitude"] = gdf_cleaned.geometry.centroid.x
gdf_cleaned["latitude"] = gdf_cleaned.geometry.centroid.y
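
For intuition on what a polygon centroid is, here is a rough sketch of the area-weighted centroid formula for a simple polygon in plain Python. It is only an illustration: `polygon_centroid` is a hypothetical helper, the square is made up, and I'm not claiming this is how geopandas computes it internally.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) vertices."""
    area2 = 0.0  # twice the signed area, accumulated edge by edge
    cx = cy = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The formula divides by 6 * area, which equals 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)

# Hypothetical 2 x 2 square with its corner at the origin.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
centroid_x, centroid_y = polygon_centroid(square)
print(centroid_x, centroid_y)
```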

In [None]:
gdf_output = gdf_cleaned.rename(columns={"NAME": "name"})[["timestampMs", "name", "longitude", "latitude"]]

The final output is an ndjson file, which I'll be loading into Firestore.

In [None]:
gdf_output.to_json("data/location_history.ndjson", orient="records", lines=True)
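
For reference, ndjson is just one JSON object per line. Here is a minimal sketch of the format using only the standard library; the records are made up to match the output columns and are not real rows from my history.

```python
import json

# Hypothetical records with the same fields as the output dataframe.
records = [
    {"timestampMs": "1577836800000", "name": "CITY A, PROVINCE A", "longitude": 120.98, "latitude": 14.6},
    {"timestampMs": "1577840400000", "name": "CITY B, PROVINCE B", "longitude": 121.05, "latitude": 14.65},
]

# Write: serialize each record on its own line.
ndjson = "\n".join(json.dumps(record) for record in records)

# Read: parse each line independently.
parsed = [json.loads(line) for line in ndjson.splitlines()]
print(ndjson)
```

Because every line is a complete JSON object, the file can be read and uploaded one record at a time.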