# Groundtruthing using GMaps

## Preparations

modules

In [1]:
import gmaps  # plotting locations on gmaps
import ipywidgets  # plotting next to each other
import pandas as pd  # working with dataframes
from ipywidgets.embed import embed_minimal_html  # saving html files
from helper_functions import (create_lat_long_tuples, aggregate_counts, 
                              extract_polygon)  # own functions

options

In [2]:
# automatic reloading of modules and functions before each cell
%load_ext autoreload
%autoreload 2

In [3]:
# insert your own gmaps api key here
api_key = ''

loading data

In [4]:
df = pd.read_csv('data/vienna_scooter_positions.csv', index_col=0)

## Groundtruthing

First we need to configure our API access and restrict to unique locations.

In [5]:
gmaps.configure(api_key=api_key)

In [6]:
uloc = df[['id', 'lat', 'lon']].drop_duplicates()

In [7]:
print('We only have %s unique locations out of the %s observations.'%(len(uloc), len(df)))

We only have 58182 unique locations out of the 869641 observations.


In [8]:
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(create_lat_long_tuples(uloc), 
                                  point_radius=7))
embed_minimal_html('plots/all_obs.html', views=[fig])
fig

Figure(layout=FigureLayout(height='420px'))
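
The helper `create_lat_long_tuples` lives in `helper_functions` and is not shown in this notebook. A minimal sketch of what such a helper might look like, assuming the columns are named `lat` and `lon` as in `uloc` (the function name `lat_long_tuples_sketch` and the toy values are hypothetical):

```python
import pandas as pd


def lat_long_tuples_sketch(frame):
    """Hypothetical stand-in: return a list of (lat, lon) tuples from a dataframe."""
    return list(zip(frame['lat'], frame['lon']))


# Toy data for illustration only, not taken from the scooter dataset
toy = pd.DataFrame({'id': [1, 2],
                    'lat': [48.185, 48.226],
                    'lon': [16.416, 16.361]})
points = lat_long_tuples_sketch(toy)
print(points)
```

A list of `(lat, lon)` pairs is one of the location formats a gmaps heatmap layer accepts, which is presumably why the real helper exists.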

We can already see a couple of hotspots with many location observations over the entire day. Interestingly, these are not confined to the inner city but also include more distant places such as Erdberg. According to https://en.wikipedia.org/wiki/Decimal_degrees, four decimal places are sufficient to identify an individual street. We can use this to find the hotspots that hold the most scooters over the whole day.
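
To make the precision claim concrete, the sketch below estimates how large one rounding step is on the ground near Vienna. It assumes a spherical Earth with a mean radius of 6,371 km and a latitude of about 48.2°; the function name `cell_size_m` is made up for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: mean Earth radius, spherical approximation
VIENNA_LAT_DEG = 48.2       # assumption: approximate latitude of Vienna


def cell_size_m(digits, lat_deg=VIENNA_LAT_DEG):
    """Return the (north-south, east-west) size in metres of one rounding step."""
    step_deg = 10 ** -digits
    metres_per_deg = math.pi * EARTH_RADIUS_M / 180
    north_south = step_deg * metres_per_deg
    # Meridians converge towards the poles, so a degree of longitude shrinks with cos(lat)
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


for digits in (3, 4):
    ns, ew = cell_size_m(digits)
    print(f"{digits} decimal places: about {ns:.0f} m north-south, {ew:.0f} m east-west")
```

Under these assumptions four decimal places correspond to cells of roughly 11 m by 7 m, and three decimal places to roughly 111 m by 74 m.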

In [9]:
hotspots_4dp = aggregate_counts(uloc, digits=4, newname='scooter_count')\
                                .sort_values('scooter_count', ascending=False)
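
`aggregate_counts` is one of the notebook's own helper functions and its code is not shown here. The following is only a guess at its behaviour, assuming it rounds `lat` and `lon` to `digits` decimal places and counts the rows per rounded coordinate pair. The name `aggregate_counts_sketch` and the toy data are hypothetical.

```python
import pandas as pd


def aggregate_counts_sketch(frame, digits, newname):
    """Hypothetical stand-in: count rows per (lat, lon) cell after rounding."""
    rounded = frame.assign(lat=frame['lat'].round(digits),
                           lon=frame['lon'].round(digits))
    # size() counts the rows in each group; reset_index names the count column
    return rounded.groupby(['lat', 'lon']).size().reset_index(name=newname)


# Toy data for illustration only
toy = pd.DataFrame({'id': [1, 2, 3, 4],
                    'lat': [48.18512, 48.18514, 48.18549, 48.22601],
                    'lon': [16.41603, 16.41604, 16.41649, 16.36102]})

fine = aggregate_counts_sketch(toy, digits=4, newname='scooter_count')
coarse = aggregate_counts_sketch(toy, digits=3, newname='scooter_count')
print(fine)
print(coarse)
```

In the toy data the first three points fall into two cells at four decimal places but into a single cell at three, which is the coarsening effect used in the rest of the analysis.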

In [10]:
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(create_lat_long_tuples(hotspots_4dp[:3]), 
                                  weights=hotspots_4dp.scooter_count[:3],
                                  point_radius=20))
fig.add_layer(gmaps.traffic_layer())
fig

Figure(layout=FigureLayout(height='420px'))

The top three hotspots are all in the western part of the city and not necessarily in the inner city: the *Vienna University*, the *Linke Wienzeile* close to Naschmarkt, and the recreational area close to the *Burg*. **Third main finding**: Students seem to be a good target group.

In [11]:
len(hotspots_4dp)

17354

More than 17 thousand distinct zones do not seem feasible for a daily reallocation, to the best of my knowledge. For a scalable solution we repeat the process with three decimal places.

In [12]:
hotspots_3dp = aggregate_counts(df, digits=3, newname='scooter_count')

In [13]:
print('This leaves us with %s unique locations.'%len(hotspots_3dp))

This leaves us with 2050 unique locations.


This is a number of districts that we can classify by activity level, which has the potential to be a valuable input for reallocation. Keep in mind that we have 997 scooters in total to allocate across these locations, so many of them will be left empty.
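
A quick back-of-the-envelope check of that statement, using only the two numbers reported above (the variable names are chosen for this illustration):

```python
n_scooters = 997   # scooters in total, as stated in the text
n_cells = 2050     # locations at three decimal places, from the output above

avg_per_cell = n_scooters / n_cells
# Even if every scooter stood in its own cell, this many cells would stay empty
min_empty = n_cells - n_scooters

print(f"{avg_per_cell:.2f} scooters per cell on average")
print(f"at least {min_empty} cells ({min_empty / n_cells:.0%}) hold no scooter")
```

So at any single moment at least 1053 of the 2050 cells, a little over half, cannot contain a scooter.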

The next question is whether this corresponds to where the scooters are placed in the morning and where they are left in the evening. As a proxy, we use the first and the last observation of each scooter.

In [14]:
# first & last locations
df_sorted = df.sort_values(['id', 'time']).reset_index(drop=True)
floc = df_sorted.groupby('id').first()
lloc = df_sorted.groupby('id').last()

In [15]:
len(floc) == len(lloc) == len(df.id.drop_duplicates())

True

In [16]:
center = df_sorted[['lat', 'lon']].agg('median')

In [17]:
fig1 = gmaps.figure(center=center, zoom_level=11)
fig1.add_layer(gmaps.heatmap_layer(create_lat_long_tuples(floc), 
                                   point_radius=7))
fig1.layout.width = '50%'
fig2 = gmaps.figure(center=center, zoom_level=11)
fig2.add_layer(gmaps.heatmap_layer(create_lat_long_tuples(lloc), 
                                   point_radius=7))
fig2.layout.width = '50%'


html_header = ipywidgets.HTML('<h2>First and last locations of scooters</h2>')
ipywidgets.VBox([html_header, ipywidgets.HBox([fig1, fig2])])

VBox(children=(HTML(value='<h2>First and last locations of scooters</h2>'), HBox(children=(Figure(layout=Figur…

We can see that the scooters are more widely spread at the beginning of the day than at the end. A possible explanation could be that they are mostly being used to get into the city.

Now we want to combine this information and compare the number of scooters in the morning with the number in the evening for the 2050 locations with three decimal places.

In [18]:
morning_count = aggregate_counts(floc, digits=3, newname='morning_count')
evening_count = aggregate_counts(lloc, digits=3, newname='evening_count')

In [19]:
locations_3dp = hotspots_3dp.merge(morning_count, how='outer', on=['lat', 'lon'])\
                            .merge(evening_count, how='outer', on=['lat', 'lon']).fillna(0)

In [20]:
locations_3dp.sort_values('morning_count', ascending=False, inplace=True)
locations_3dp.reset_index(drop=True, inplace=True)
locations_3dp.head()

| | lat | lon | scooter_count | morning_count | evening_count |
|---|---|---|---|---|---|
| 0 | 48.185 | 16.416 | 2634 | 61.0 | 3.0 |
| 1 | 48.187 | 16.413 | 746 | 18.0 | 1.0 |
| 2 | 48.186 | 16.416 | 478 | 17.0 | 0.0 |
| 3 | 48.226 | 16.361 | 3234 | 13.0 | 5.0 |
| 4 | 48.2 | 16.365 | 1993 | 11.0 | 1.0 |


This is definitely a finding. Locations with high numbers of scooters in the morning also show relatively high activity throughout the day. Unfortunately, only a few scooters remain there once the day is over. Reallocating scooters back to these high-activity areas is the **fourth action point**.
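
One way to quantify this is the difference between morning and evening counts per location. The sketch below rebuilds the five rows shown in the output above and adds a `deficit` column; that column name is an assumption introduced here and not part of the original analysis.

```python
import pandas as pd

# The five rows from the locations_3dp.head() output above
head = pd.DataFrame({
    'lat': [48.185, 48.187, 48.186, 48.226, 48.200],
    'lon': [16.416, 16.413, 16.416, 16.361, 16.365],
    'scooter_count': [2634, 746, 478, 3234, 1993],
    'morning_count': [61.0, 18.0, 17.0, 13.0, 11.0],
    'evening_count': [3.0, 1.0, 0.0, 5.0, 1.0],
})

# Hypothetical measure: scooters that would have to be brought back overnight
head['deficit'] = head['morning_count'] - head['evening_count']

print(head.sort_values('deficit', ascending=False))
print('Scooters to bring back to these five cells:', head['deficit'].sum())
```

For these five cells alone the deficit adds up to 110 scooters, more than a tenth of the fleet of 997.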

Let's take an exemplary look at how this area would look as the first three polygons of a grid system.

In [21]:
pol1 = extract_polygon(locations_3dp, i=1, level=3, color='blue')
pol2 = extract_polygon(locations_3dp, i=2, level=3, color='red')
pol3 = extract_polygon(locations_3dp, i=3, level=3, color='green')

In [22]:
fig = gmaps.figure(center=pol1.path[0], zoom_level=15)
fig.add_layer(gmaps.drawing_layer(features=[pol1, pol2, pol3],
                                   show_controls=False))
fig

Figure(layout=FigureLayout(height='420px'))
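
`extract_polygon` is another helper whose code is not part of this notebook. As a rough idea of the geometry involved, the sketch below computes the four corners of the grid cell that a rounded coordinate represents, assuming the cell is centred on the rounded value and extends half a rounding step in each direction. The function name `cell_corners` is hypothetical, and the real helper may construct its polygons differently.

```python
def cell_corners(lat, lon, digits=3):
    """Hypothetical sketch: corners of the grid cell centred on a rounded coordinate."""
    half = 0.5 * 10 ** -digits  # half a rounding step in degrees
    return [
        (lat - half, lon - half),  # south-west
        (lat - half, lon + half),  # south-east
        (lat + half, lon + half),  # north-east
        (lat + half, lon - half),  # north-west
    ]


# Centre of the busiest morning cell from the table above
corners = cell_corners(48.185, 16.416)
for corner in corners:
    print(corner)
```

A list of corner tuples like this is presumably what ends up as the `path` of each polygon, since the cell above reads `pol1.path[0]` to centre the map.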

These allocation polygons, combined with the domain knowledge of the operations team, can be the basis for a reallocation algorithm that runs in the early hours of the day or even during periods of low activity.

## Saving results

Before we move to feature engineering for scooters and locations, we want to save our new dataframes.
Additionally, we already know the last location of each scooter, which will be helpful for relocating them overnight.

In [23]:
scooter_lloc = lloc.drop('time', axis=1)
# round to the same three-decimal grid used for locations_3dp
scooter_lloc['lat'] = scooter_lloc['lat'].round(3)
scooter_lloc['lon'] = scooter_lloc['lon'].round(3)

In [24]:
scooter_lloc.shape

(997, 2)

In [25]:
locations_3dp.to_csv('data/locations_3dp.csv', index=False)
df_sorted.to_csv('data/df_sorted.csv', index=False)
scooter_lloc.to_csv('data/scooter_last_location.csv')