In [None]:
import sys, os
sys.path.append(os.path.abspath('..'))
%load_ext autoreload
%autoreload 2

from modules.config import *
from modules import h3_visualization

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

**Note**:
Before committing we removed all cell outputs. We understand that this is inconvenient, but the plotly maps are too large to be included in the repository.
We are sorry and hope that the execution of this notebook does not take too long. 

As an alternative we uploaded a version with the outputs to [sciebo](https://uni-koeln.sciebo.de/s/uOWeBconyLHOzLf) (the password is the same as for the sciebo folder as provided on ilias).

# Spatial And Temporal Analysis Of Availability
In this notebook we will analyse the temporal and spatial availability of bicycles.
We will also vary the spatial and temporal resolution to see how the displayed information changes.

In [None]:
availability_all = pd.read_parquet(AVAILABILITY_PATH)

In [None]:
availability_all.head()

First we define functions that return the data in different shapes, from which we
can easily create plots.  
These functions also take the spatial and temporal resolution as input.
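
As a minimal sketch of the selection step used by these helpers, the toy frame below assumes that the availability data is indexed by `h3_res`, `time_interval_length`, `hex_id`, and `datetime`, in that order. The level names, their order, and all values are assumptions made for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical toy data; level names, level order and values are assumptions.
index = pd.MultiIndex.from_tuples(
    [
        (8, 6, "hex_a", pd.Timestamp("2019-01-01 00:00")),
        (9, 1, "hex_b", pd.Timestamp("2019-01-01 00:00")),
        (9, 6, "hex_b", pd.Timestamp("2019-01-01 00:00")),
        (9, 6, "hex_b", pd.Timestamp("2019-01-01 06:00")),
    ],
    names=["h3_res", "time_interval_length", "hex_id", "datetime"],
)
toy = pd.DataFrame({"n_bikes": [4, 3, 2, 6]}, index=index).sort_index()

# Selecting on the first two levels drops them and keeps the remaining levels,
# which reset_index() then turns into ordinary columns.
subset = toy.xs((9, 6)).reset_index()

# Average availability per hexagon for the chosen resolution pair.
per_hexagon = subset.groupby("hex_id")["n_bikes"].mean()
```

Here `subset` contains only the rows for H3 resolution 9 and 6-hour intervals, with `hex_id` and `datetime` as columns.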

In [None]:
def get_availability(h3_res: int, time_interval_length: int):
    return availability_all.xs((h3_res, time_interval_length)).reset_index()


def get_average_availability_per_hexagon(h3_res: int, time_interval_length: int):
    return get_availability(h3_res, time_interval_length).groupby("hex_id").mean()


def get_average_availability_per_hexagon_per_freq(
    h3_res: int, time_interval_length: int, freq: str
):
    return (
        get_availability(h3_res, time_interval_length)
        .set_index(["hex_id", "datetime"])
        .groupby([pd.Grouper(level="hex_id"), pd.Grouper(level="datetime", freq=freq)])
        .sum()  # note: sums the intervals within each period, despite "average" in the name
        .reset_index()
    )


def get_average_daily_availability(h3_res: int, time_interval_length: int):
    return (
        get_average_availability_per_hexagon_per_freq(h3_res, time_interval_length, "D")
        .groupby("datetime")
        .mean()
    )


## Availability Per Hexagon

First we will look at the average availability of bicycles in each hexagon.

In [None]:
def plot_availability_per_hexagon(h3_res: int, time_interval_length: int):
    h3_visualization.plot_choropleth(
        get_average_availability_per_hexagon(h3_res, time_interval_length).reset_index(),
        hex_col="hex_id",
        color_by_col="n_bikes",

        hover_name="hex_id",
        hover_data=['n_bikes'],
        labels={'n_bikes': '# available bikes'},
        opacity=0.7,
        color_continuous_scale="blues",

        zoom=10,
        width=800,
        height=600,
        center={"lat": 51.3397, "lon": 12.3731},
        mapbox_style="open-street-map",
    )

In [None]:
plot_availability_per_hexagon(9, 6)

We can see that the hexagon with the highest average availability, roughly 6 bicycles per 6-hour time interval, is in the west of the city.
When investigating the surrounding area using Google Street View, we can see that the hexagon mostly consists of residential buildings and some restaurants.
![](../figures/availability_max_location.png)
It seems that many people use bicycles to get home, while fewer people use them to leave home.
Let us also look at the land use for that hexagon.

### Compare With Land Use

In [None]:
top_available_hexagons = (
    get_average_availability_per_hexagon(9, 6)
    .reset_index()
    .sort_values("n_bikes", ascending=False)
    .head(3)
)
top_available_hexagons


In [None]:
landuse = pd.read_parquet(HEXAGONS_WITH_LAND_USE_PATH)
landuse.loc[top_available_hexagons['hex_id'].iloc[0]].idxmax()

As we can see, the hexagon is mostly covered by `land_use_3`, which is "Continuous urban fabric". This is consistent with our observation from Google Street View.
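
The lookup above can be illustrated on a toy table. This sketch assumes that the land use frame is indexed by `hex_id` and holds one numeric column per land use class; the column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical land use shares per hexagon; names and values are assumptions.
landuse_toy = pd.DataFrame(
    {
        "land_use_1": [0.1, 0.7],
        "land_use_3": [0.8, 0.2],
        "land_use_5": [0.1, 0.1],
    },
    index=pd.Index(["hex_a", "hex_b"], name="hex_id"),
)

# For a single hexagon, idxmax() returns the column label with the largest value.
dominant = landuse_toy.loc["hex_a"].idxmax()

# The same idea for all hexagons at once, row by row.
dominant_all = landuse_toy.idxmax(axis=1)
```

Using `idxmax(axis=1)` would make it possible to compare the dominant land use of all three top hexagons at once, instead of only the first one.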

The hexagons next to the highest-availability hexagon that lie along the main road also have an increased availability of bicycles.

If we turn our attention to the center of the map, we can see three neighboring hexagons with high availability. These hexagons contain Leipzig's main train station, which is very busy. It therefore seems plausible that the availability of bicycles in these hexagons is high.

### Vary Spatial Resolution

Let us look at what happens when we decrease the spatial resolution.

In [None]:
plot_availability_per_hexagon(8, 6)

As we can see, the patterns of the previous resolution are still visible. However, the hexagon with the highest availability is now in the center of the map, where the train station is. At this resolution we can no longer identify the smaller hotspot from the previous resolution.

### Availability Per Hexagon Per Month

Next let's plot the availability per hexagon again, but this time for each month. Then we can try to find seasonal patterns in the spatial availability of bicycles.
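
The sketch below shows, on hypothetical toy data, how grouping by hexagon and by month works with `pd.Grouper`. It uses the month-start alias `"MS"` as an assumption to stay compatible across pandas versions, so the group labels fall on the first day of each month. It also contrasts `sum()` with `mean()`, because the two aggregations answer different questions.

```python
import pandas as pd

# Hypothetical toy data; hexagon ids and values are assumptions.
toy = pd.DataFrame(
    {
        "hex_id": ["hex_a", "hex_a", "hex_a", "hex_b"],
        "datetime": pd.to_datetime(
            [
                "2019-01-01 00:00",
                "2019-01-01 06:00",
                "2019-02-01 00:00",
                "2019-01-15 00:00",
            ]
        ),
        "n_bikes": [2, 4, 5, 1],
    }
)

# Group by hexagon and by calendar month (labelled with the month start).
grouped = toy.set_index(["hex_id", "datetime"]).groupby(
    [pd.Grouper(level="hex_id"), pd.Grouper(level="datetime", freq="MS")]
)

# Sum adds up all intervals of a month; mean gives the average per interval.
monthly_sum = grouped["n_bikes"].sum().reset_index()
monthly_mean = grouped["n_bikes"].mean().reset_index()
```

With a sum, months that contain more intervals produce larger values even if the availability per interval is unchanged. A mean is easier to compare between months of different length.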

In [None]:
availability_month_hex = get_average_availability_per_hexagon_per_freq(
    9, 6, "M"
)

In [None]:
availability_month_hex['month'] = availability_month_hex.datetime.dt.month

In [None]:
# only for vscode
# https://github.com/microsoft/vscode-jupyter/issues/4364#issuecomment-817352686
import plotly.io as pio
pio.renderers.default = 'notebook_connected'

In [None]:
h3_visualization.plot_choropleth(
    availability_month_hex.reset_index(),
    hex_col="hex_id",
    color_by_col="n_bikes",
    animation_frame="month",

    hover_data=['n_bikes'],
    labels={'n_bikes': '# available bikes'},

    opacity=0.7,
    color_continuous_scale="blues",

    center={"lat": 51.3397, "lon": 12.3731},
    height=600,
    width=800,
    zoom=10,
    mapbox_style="open-street-map",
)


Regarding seasonal patterns, we can see a small increase in availability in August in the far east of the map, next to a lake called "Kulkwitzer See". This increase could be due to people visiting the lake in the summer. However, we cannot observe the same behaviour for other lakes in the area.  
We also observe that during the winter the availability at the train station in the center of the map is very high, while it decreases compared to other hotspots during the summer. This could be because people prefer public transport to bicycles when it is cold outside and only use bicycles to get to the nearest train station (e.g. the main train station).

### Daily Availability
Next we will look at the daily availability of bicycles.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
get_average_daily_availability(9,6).plot(ax=ax)
ax.set_ylabel('Number of bikes')
ax.set_xlabel('Date')
ax.set_title('Daily availability')
plt.show()

**Note:** When plotted only along the time dimension, the availability approximates the fleet size.
We can clearly see that the availability of bicycles increases in the summer months. Most likely NextBike increases the fleet size during the summer because they expect higher demand.  
Interestingly, we see a drop in availability in the middle of March, followed by an increase. It is possible that NextBike takes a large proportion of bicycles out of the system for maintenance before the summer starts.
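
One way to make the fleet-size reading explicit is to sum the available bicycles over all hexagons for each time interval and then average those totals per day. The following is a sketch on hypothetical toy data and is not the computation performed by `get_average_daily_availability`.

```python
import pandas as pd

# Hypothetical toy data: two hexagons, two days, two 6-hour intervals per day.
toy = pd.DataFrame(
    {
        "hex_id": ["hex_a", "hex_b"] * 4,
        "datetime": pd.to_datetime(
            [
                "2019-03-01 00:00", "2019-03-01 00:00",
                "2019-03-01 06:00", "2019-03-01 06:00",
                "2019-03-02 00:00", "2019-03-02 00:00",
                "2019-03-02 06:00", "2019-03-02 06:00",
            ]
        ),
        "n_bikes": [3, 2, 4, 2, 5, 3, 6, 2],
    }
)

# Total number of available bicycles in the whole system for each interval.
total_per_interval = toy.groupby("datetime")["n_bikes"].sum()

# Daily average of those totals, a rough proxy for the number of idle bicycles.
daily_total = total_per_interval.resample("D").mean()
```

Because bicycles that are currently rented are not counted as available, such a total is a lower bound on the fleet size rather than an exact figure.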


In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
get_average_daily_availability(9,1).plot(ax=ax)
ax.set_ylabel('Number of bikes')
ax.set_xlabel('Date')
ax.set_title('Daily availability')
plt.show()

Varying the temporal resolution results in a very similar graph with more fluctuations.