In this notebook, I show how to extract roads from OpenStreetMap and how to create grid points from the road data.
The generated grid points may be used for "snap to grid". Please refer to the [original notebook](https://www.kaggle.com/robikscube/indoor-navigation-snap-to-grid-post-processing) for the details of "snap to grid".
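As a rough illustration of the idea, "snap to grid" replaces a predicted position with the nearest grid point when that grid point is close enough. The sketch below is not the implementation from the linked notebook; the function names, the sample coordinates, and the 5 m threshold are assumptions made only for this example.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    r = 6_371_000.0  # mean Earth radius in meters (approximation)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def snap_to_grid(point, grid, threshold_m):
    """Return the nearest grid point if it lies within threshold_m, else the point itself."""
    nearest = min(grid, key=lambda g: haversine_m(point[0], point[1], g[0], g[1]))
    if haversine_m(point[0], point[1], nearest[0], nearest[1]) <= threshold_m:
        return nearest
    return point

# Hypothetical grid points (lat, lng) and two hypothetical predictions
grid = [(37.33350, -121.89060), (37.33360, -121.89060)]
print(snap_to_grid((37.33351, -121.89061), grid, threshold_m=5.0))  # close to the grid
print(snap_to_grid((37.33400, -121.89000), grid, threshold_m=5.0))  # far from the grid
```

The threshold matters: if it is too large, points that are legitimately off the road get pulled onto it.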

Actually, I haven't yet managed to apply these grid points to "snap to grid" successfully because of some problems, and I'm still trying to figure it out.
Please comment if you find any mistakes or have any ideas.

Reference sites:  
https://medium.com/@brendan_ward/how-to-leverage-geopandas-for-faster-snapping-of-points-to-lines-6113c94e59aa

In [None]:
import numpy as np
import pandas as pd
import glob
import os
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
from pathlib import Path
import plotly.express as px

In [None]:
data_dir = Path("../input/google-smartphone-decimeter-challenge")
train_df = pd.read_csv(data_dir / "baseline_locations_train.csv")

In [None]:
# Load the ground truth for every (collection, phone) pair
gt_dfs = []
for (collection_name, phone_name), _ in tqdm(train_df.groupby(["collectionName", "phoneName"])):
    path = data_dir / f"train/{collection_name}/{phone_name}/ground_truth.csv"
    gt_dfs.append(pd.read_csv(path))
# Concatenate once at the end instead of inside the loop
gt_df = pd.concat(gt_dfs, ignore_index=True)
gt_df.head()

In [None]:
fig = px.scatter_mapbox(gt_df,

                    # Columns holding the point coordinates
                    lat="latDeg",
                    lon="lngDeg",
                    text='phoneName',

                    # Column that determines the color of each series
                    color="collectionName",

                    zoom=9,
                    center={"lat":37.423576, "lon":-122.094132},
                    height=600,
                    width=800)
fig.update_layout(mapbox_style='stamen-terrain')
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_layout(title_text="GPS traffic")
fig.show()

## Target place
Let's take '2021-04-29-US-SJC-2' as an example.

In [None]:
target_collection = '2021-04-29-US-SJC-2'
target_gt_df = gt_df[gt_df["collectionName"]==target_collection].reset_index(drop=True)
target_collection

In [None]:
fig = px.scatter_mapbox(target_gt_df,

                    # Columns holding the point coordinates
                    lat="latDeg",
                    lon="lngDeg",
                    text='phoneName',

                    # Column that determines the color of each series
                    color="collectionName",

                    zoom=15,
                    center={"lat":37.33351, "lon":-121.8906},
                    height=600,
                    width=800)
fig.update_layout(mapbox_style='stamen-terrain')
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_layout(title_text="GPS traffic")
fig.show()

## Import geospatial libraries

In [None]:
!pip install osmnx momepy geopandas

In [None]:
from shapely.geometry import Point
import osmnx as ox
import momepy
import geopandas as gpd

In [None]:
# change pd.DataFrame -> gpd.GeoDataFrame
target_gt_df["geometry"] = [Point(p) for p in target_gt_df[["lngDeg", "latDeg"]].to_numpy()]
target_gt_gdf = gpd.GeoDataFrame(target_gt_df, geometry=target_gt_df["geometry"])
target_gt_gdf.head(5)

In [None]:
target_gt_gdf.plot()

We can get road data from OpenStreetMap by building a bounding box around the ground-truth points.
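Both the bounding-box padding (`offset = 0.1**5`) and the snapping `tolerance = 0.0005` used later in this notebook are expressed in degrees, which is hard to judge by eye. The sketch below converts a small angular delta to approximate meters at the latitude of the target area. The constant of about 111,320 m per degree and the helper name `degrees_to_meters` are assumptions for illustration, not part of any library.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # rough approximation, good enough for a sanity check

def degrees_to_meters(delta_deg, lat_deg):
    """Return (north_south_m, east_west_m) spanned by delta_deg degrees at latitude lat_deg."""
    north_south = delta_deg * METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

lat = 37.33351  # latitude of the map center used for the target collection
for name, value in [("offset", 0.1**5), ("tolerance", 0.0005)]:
    ns, ew = degrees_to_meters(value, lat)
    print(f"{name}: {ns:.1f} m north-south, {ew:.1f} m east-west")
```

Under this approximation the offset is on the order of one meter, while the tolerance is on the order of tens of meters.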

In [None]:
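# Get road data from OpenStreetMap with osmnx
offset = 0.1**5  # padding in degrees around each point
bbox = target_gt_gdf.bounds + [-offset, -offset, offset, offset]
west = bbox["minx"].min()   # smallest longitude
east = bbox["maxx"].max()   # largest longitude
south = bbox["miny"].min()
north = bbox["maxy"].max()
# Positional (north, south, east, west) order; newer osmnx releases may expect a different bbox argument
G = ox.graph.graph_from_bbox(north, south, east, west, network_type='drive')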

In [None]:
ox.folium.plot_graph_folium(G)

In [None]:
nodes, edges = momepy.nx_to_gdf(G)

In [None]:
nodes.head()

In [None]:
edges.head()

In this notebook, I use only the edge data.

In [None]:
edges.plot()

Since the graph still contains extra roads, we keep only the roads relevant to the trajectory.
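The filtering below relies on `edges.sindex.intersection`, which returns the positions of the lines whose bounding box intersects a query box. The following sketch shows only that bounding-box predicate with a plain linear scan. A real spatial index uses a tree structure to avoid testing every line, and the boxes here are made-up values for illustration.

```python
def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def candidate_lines(query_box, line_boxes):
    """Positions of the lines whose bounding box intersects query_box."""
    return [i for i, box in enumerate(line_boxes) if boxes_intersect(query_box, box)]

# Hypothetical bounding boxes of three road segments (lng/lat order)
line_boxes = [
    (-121.8910, 37.3330, -121.8900, 37.3340),
    (-121.8800, 37.3400, -121.8790, 37.3410),
    (-121.8906, 37.3335, -121.8895, 37.3350),
]

# A point padded by a small offset, like one row of `bbox` in this notebook
offset = 1e-5
x, y = -121.8906, 37.33351
query = (x - offset, y - offset, x + offset, y + offset)
print(candidate_lines(query, line_boxes))
```

Because only bounding boxes are compared, the result is a coarse candidate list. The exact point-to-line distance still has to be computed afterwards, which is what the "Find closest road" section does.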

In [None]:
edges = edges.dropna(subset=["geometry"]).reset_index(drop=True)
hits = bbox.apply(lambda row: list(edges.sindex.intersection(row)), axis=1)
tmp = pd.DataFrame({
    # index of points table
    "pt_idx": np.repeat(hits.index, hits.apply(len)),
    # ordinal position of line - access via iloc later
    "line_i": np.concatenate(hits.values)
})
# Join back to the lines on line_i; we use reset_index() to 
# give us the ordinal position of each line
tmp = tmp.join(edges.reset_index(drop=True), on="line_i")
# Join back to the original points to get their geometry
# rename the point geometry as "point"
tmp = tmp.join(target_gt_gdf.geometry.rename("point"), on="pt_idx")
# Convert back to a GeoDataFrame, so we can do spatial ops
tmp = gpd.GeoDataFrame(tmp, geometry="geometry", crs=target_gt_gdf.crs)

In [None]:
tmp.head()

## Find closest road

In [None]:
tmp["snap_dist"] = tmp.geometry.distance(gpd.GeoSeries(tmp.point))

# Discard any lines farther than the tolerance from the point
tolerance = 0.0005  # in degrees, because the coordinates are lat/lng
tmp = tmp.loc[tmp.snap_dist <= tolerance]
# Sort on ascending snap distance, so that closest goes to top
tmp = tmp.sort_values(by=["snap_dist"])

# group by the index of the points and take the first, which is the
# closest line 
closest = tmp.groupby("pt_idx").first()
# construct a GeoDataFrame of the closest lines
closest = gpd.GeoDataFrame(closest, geometry="geometry")
closest = closest.drop_duplicates("line_i").reset_index(drop=True)

In [None]:
closest.plot()

In [None]:
closest.head()

Now we have the road data corresponding to the target trajectory. These features may be useful for modeling.
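The `snap_dist` values above are planar point-to-line distances computed directly on lng/lat coordinates. To make that concrete, here is a minimal sketch of the distance from a point to a single line segment, together with the closest point on the segment. The function name is hypothetical, and a `LineString` with several segments would take the minimum over all of them.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to segment a-b, plus the closest point on the segment."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Project p onto the infinite line, then clamp to the segment
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy), (cx, cy)

dist, closest_pt = point_segment_distance((1.0, 1.0), (0.0, 0.0), (2.0, 0.0))
print(dist, closest_pt)
```

The returned closest point is what one would use to snap a position onto the road itself rather than onto a precomputed grid point.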
  
Next, I generate grid points from the road data.

## Generate road points
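The cell in this section calls `interpolate(dist, normalized=True)` on each road geometry, which returns the point located at a given fraction of the line's total length. As a rough sketch of what that means, the pure-Python function below walks along a polyline until it reaches the requested fraction. The function name and the toy line are assumptions for illustration only.

```python
import math

def interpolate_normalized(coords, fraction):
    """Point located at `fraction` (0 to 1) of the total length of a polyline."""
    segments = list(zip(coords, coords[1:]))
    seg_lengths = [math.dist(a, b) for a, b in segments]
    target = fraction * sum(seg_lengths)
    for (a, b), length in zip(segments, seg_lengths):
        if length > 0 and target <= length:
            t = target / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= length
    return coords[-1]  # fallback for rounding at the very end of the line

# Toy L-shaped line of total length 8, sampled like the loop in this section
line = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
split = 4
points = [interpolate_normalized(line, i / split) for i in range(split)]
print(points)
```

Two consequences of this scheme are worth keeping in mind. The fractions `0/split` to `(split-1)/split` never reach 1.0, so the end point of each line is not sampled. Also, because every line gets the same number of points, the spacing between grid points is proportional to the length of each road segment.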

In [None]:
line_points_list = []
split = 50  # param: number of points sampled along each LineString
for dist in range(0, split, 1):
    dist = dist/split
    line_points = closest["geometry"].interpolate(dist, normalized=True)
    line_points_list.append(line_points)
line_points = pd.concat(line_points_list).reset_index(drop=True)
line_points = line_points.reset_index().rename(columns={0:"geometry"})
line_points["lngDeg"] = line_points["geometry"].x
line_points["latDeg"] = line_points["geometry"].y

In [None]:
fig = px.scatter_mapbox(line_points,

                    # Columns holding the point coordinates
                    lat="latDeg",
                    lon="lngDeg",

                    zoom=15,
                    center={"lat":37.33351, "lon":-121.8906},
                    height=600,
                    width=800)
fig.update_layout(mapbox_style='stamen-terrain')
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_layout(title_text="GPS traffic")
fig.show()

In this notebook, I shared how to detect roads and how to create grid points from them.

I hope it helps. Thanks!