# Policy Analysis Notebook

This notebook supports the analysis of shared micro-mobility policies.
It is the central hub where parameters are declared and from which the functions for data processing, analysis, and visualisation are run, while the "back-end" code stays hidden and the notebook stays concise.

### Imports

In [None]:
import datetime as dt
import pandas as pd  # used directly in the cells below (pd.read_csv, pd.to_datetime)
import scripts.style
from scripts.api_smhi_weather import *
from scripts.data_processing import *
from scripts.data_visualisations import *

### Policy Parameters

**Path**: Path to the trip data, ideally placed in `data/`, and assumed to be a *CSV file*

**City Name**: The city where the policy was implemented

**Before Start / End date**: The timeframe of the selected period before the policy introduction (earliest and latest)

**After Start / End date**: The timeframe of the selected period after the policy introduction (earliest and latest)

**City Coordinates**: The area to which the policy analysis is restricted, in the following order: North, South, East, West.
Used for data cleaning (set a broad perimeter if the data has already been cleaned of spatial outliers). Examples from previous datasets:
* STH (Stockholm): 59.39, 59.28, 18.18, 17.94
* GBG (Gothenburg): 57.74, 57.67, 12.02, 11.90

In [None]:
data_path = 'data/data_raw.csv'
city_name = 'Stockholm'

before_start_date = dt.date(2022, 8, 5)
before_end_date = dt.date(2022, 8, 27)

after_start_date = dt.date(2022, 9, 15)
after_end_date = dt.date(2022, 9, 30)

city_coordinates = (59.39, 59.28, 18.18, 17.94)
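
Because every later step depends on these values, it can be worth checking them before running the rest of the notebook. The cell below is a minimal sketch of such a check. `check_policy_parameters` is a hypothetical helper, not part of the `scripts` package, and it assumes a bounding box that does not cross the antimeridian (true for Swedish cities).

```python
import datetime as dt


def check_policy_parameters(before_start, before_end, after_start, after_end, coordinates):
    """Raise ValueError if the periods or the bounding box are inconsistent.

    Hypothetical helper for illustration, not part of the `scripts` package.
    """
    north, south, east, west = coordinates  # order used in this notebook
    if not before_start <= before_end:
        raise ValueError("before period: start date is after end date")
    if not after_start <= after_end:
        raise ValueError("after period: start date is after end date")
    if not before_end < after_start:
        raise ValueError("before and after periods overlap")
    if not (-90 <= south < north <= 90):
        raise ValueError("expected North > South, both within [-90, 90]")
    if not (-180 <= west < east <= 180):
        raise ValueError("expected East > West, both within [-180, 180]")


# The values declared above pass the check silently.
check_policy_parameters(
    dt.date(2022, 8, 5), dt.date(2022, 8, 27),
    dt.date(2022, 9, 15), dt.date(2022, 9, 30),
    (59.39, 59.28, 18.18, 17.94),
)
```

A common mistake this catches is entering the coordinates in South, North order, which would otherwise lead to an empty dataset after cleaning.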

### Preprocessing

The data is read and parsed according to the methods in `scripts/data_processing`. A brief visualisation of the distribution over days, weekdays and operators follows as a sanity check.

In [None]:
df = pd.read_csv(data_path)
df = parse_data(df)
df = clean_data(df, city_coordinates)
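
The implementation of `clean_data` lives in `scripts/data_processing` and is not shown here. As an illustration of the kind of spatial filter that the City Coordinates parameter describes, the sketch below keeps only trips whose origin lies inside the bounding box. The column names `o_lat` and `o_lng` are the ones used later in this notebook; the function name `filter_bounding_box` and the toy data are assumptions, and the real function may also handle destinations or other checks.

```python
import pandas as pd


def filter_bounding_box(trips, coordinates, lat_col="o_lat", lng_col="o_lng"):
    """Keep rows whose origin lies inside (north, south, east, west), edges included."""
    north, south, east, west = coordinates
    inside = trips[lat_col].between(south, north) & trips[lng_col].between(west, east)
    return trips[inside]


# Toy data: the second trip starts far outside the Stockholm box and is dropped.
trips = pd.DataFrame({"o_lat": [59.33, 57.70, 59.30], "o_lng": [18.06, 11.97, 18.00]})
inside = filter_bounding_box(trips, (59.39, 59.28, 18.18, 17.94))
print(len(inside))
```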

In [None]:
vis_dataset_overview(df, before_start_date, after_end_date)

In [None]:
# counts number of hours per day where no data was recorded, visualisation follows later
df_missing = missing_data_hours(df, before_start_date, before_end_date, after_start_date, after_end_date)
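
To make the idea of "hours without data" concrete, here is a minimal sketch of one way to count them. It is written in plain Python on a list of trip start timestamps so that it does not depend on the dataset's column names; `count_missing_hours` is a hypothetical name, and `missing_data_hours` in `scripts/data_processing` may work differently.

```python
import datetime as dt


def count_missing_hours(trip_starts, start_date, end_date):
    """Return {date: number of clock hours (0-23) without any trip start}."""
    seen = {(t.date(), t.hour) for t in trip_starts}
    missing = {}
    day = start_date
    while day <= end_date:
        missing[day] = sum((day, hour) not in seen for hour in range(24))
        day += dt.timedelta(days=1)
    return missing


# Toy data: on 5 August two trips fall in the same hour and one in another hour;
# there are no trips at all on 6 August.
starts = [
    dt.datetime(2022, 8, 5, 8, 15),
    dt.datetime(2022, 8, 5, 8, 40),
    dt.datetime(2022, 8, 5, 17, 5),
]
missing = count_missing_hours(starts, dt.date(2022, 8, 5), dt.date(2022, 8, 6))
```

Note that night-time hours with no trips are counted as well, so a non-zero count does not by itself indicate an outage; unusually high counts are the signal to look for.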

### Weather Parameters

In order to filter out any potential effects of the weather on usage behaviour, such as intense rain-/snowfall or very low temperatures, weather data from [SMHI](https://www.smhi.se/data/meteorologi/ladda-ner-meteorologiska-observationer/#param=airTemperatureMinAndMaxOnceEveryDay,stations=active) is fetched.

Currently, the parameters are minimum and maximum temperature, as well as daily precipitation in mm. More information can be found on SMHI's website. The cells below help select a weather station that records all three parameters and lies within a 30 km radius of the dataset's geographic midpoint.

In [None]:
# Min temperature, max temperature, and precipitation in mm (daily sum)
params = [19, 20, 5]
data_mid_lat = df.o_lat.mean()
data_mid_lon = df.o_lng.mean()

weather_stations = get_available_stations(params, data_mid_lat, data_mid_lon)
map_stations(weather_stations, data_mid_lat, data_mid_lon)
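
How `get_available_stations` measures the distance to the midpoint is not shown in this notebook. A common choice for a radius check like the 30 km limit is the haversine (great-circle) distance, sketched below. The function names and the station list are hypothetical and serve only as illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def stations_within(stations, mid_lat, mid_lon, radius_km=30):
    """Keep (id, lat, lon) tuples that lie within radius_km of the midpoint."""
    return [s for s in stations if haversine_km(mid_lat, mid_lon, s[1], s[2]) <= radius_km]


# Hypothetical station list: IDs and coordinates are made up for illustration.
stations = [("A", 59.35, 18.05), ("B", 57.70, 11.97)]
nearby = stations_within(stations, 59.33, 18.06)
```

With a midpoint in central Stockholm, only station "A" remains; station "B" sits near Gothenburg, several hundred kilometres away.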

##### Select a weather station from the plot above and declare its ID (hover to retrieve) in the cell below:

In [None]:
station_key = 98230  # change ID to the desired station
df_weather = fetch_weather_data(station_key, params, before_start_date, after_end_date)

In [None]:
plot_weather(df_weather, df_missing, city_name)

### Data Selection

The plot above shows the weather conditions during the timeframe, as well as the number of hours per day in which no trips were recorded (e.g. due to a system outage), so that outlier days can be filtered out. In addition, the dataset is split into a before and an after period, and additional columns are computed.

##### Select in the cell below which days should not be included in the dataset, based on weather conditions and missing data throughout the day:

In [None]:
# dataset is split into before and after
df_before = df[(df.isodate >= pd.to_datetime(before_start_date)) & (df.isodate <= pd.to_datetime(before_end_date))]
df_after = df[(df.isodate >= pd.to_datetime(after_start_date)) & (df.isodate <= pd.to_datetime(after_end_date))]

In [None]:
# update days which should be filtered out
df_before = df_before[~df_before.Date.isin([5, 12, 14, 20, 25, 28])]
df_after = df_after[~df_after.Date.isin([16, 17, 18, 19, 20, 21, 22, 26, 30])]
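
The lists above appear to contain day-of-month values, which works as long as each period stays within a single calendar month. If a period spans a month boundary, excluding full calendar dates avoids ambiguity. The sketch below shows one way to do that; `drop_days` is a hypothetical helper, and it assumes that `isodate` is a datetime column, as the comparison with `pd.to_datetime` above suggests.

```python
import datetime as dt

import pandas as pd


def drop_days(trips, excluded_dates, date_col="isodate"):
    """Remove all rows whose calendar date is listed in excluded_dates."""
    return trips[~trips[date_col].dt.date.isin(list(excluded_dates))]


# Toy data with the assumed datetime column `isodate`.
trips = pd.DataFrame({"isodate": pd.to_datetime(["2022-08-05", "2022-08-06", "2022-09-05"])})
kept = drop_days(trips, [dt.date(2022, 8, 5)])
```

In the toy data, 5 September is kept even though it shares its day of month with the excluded 5 August.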