# Build the data set

To pursue our project goal, we assemble a dataset that contains the following data:

- daily number of counted bike riders
- daily mean temperature
- a boolean indicating whether the day is a business day (Monday to Friday) or a weekend day

To this end, we import daily count totals from several bike counting stations in Baden-Württemberg for the years 2021 and 2022.

For each counter location, we determine the daily mean temperature.

In addition, we determine for each date from 2021-01-01 to 2022-12-31 whether it is a business day or a weekend day.

We then combine all of these data into a single dataset, which can be found further below as `combined_daily_dat`.

Starting from bike rider counts from https://www.mobidata-bw.de/dataset/eco-counter-fahrradzahler (hourly in the raw data, daily totals in the cleaned dataset imported below) and temperature data from https://dev.meteostat.net/python/, we construct a dataset of the following form:

| location | daily mean temperature | is business day | daily bike rider count |
| --- | --- | --- | --- |
| ... | ... | ... | ... |

_daily mean temperature_ is measured in °C.  
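_is business day_ is either $0$ (Saturday or Sunday) or $1$ (Monday, Tuesday, Wednesday, Thursday or Friday). In the notebook it is stored as the boolean column `is_busday` (`False`/`True`); public holidays falling on a weekday count as business days.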
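
The business-day flag does not depend on the count data, so it can also be derived as a small calendar table covering the whole period from 2021-01-01 to 2022-12-31. The sketch below is a minimal illustration using only pandas; the function name `build_calendar` is hypothetical. It encodes the flag as the integers $0$ and $1$ used in the table above and, like the notebook, ignores public holidays.

```python
import pandas as pd


def build_calendar(start: str, end: str) -> pd.DataFrame:
    """Return one row per day from start to end (inclusive) with a 0/1 business-day flag."""
    dates = pd.date_range(start=start, end=end, freq="D")
    return pd.DataFrame({
        "date": dates.date,  # plain datetime.date objects
        # dayofweek: Monday=0 ... Sunday=6, so values < 5 are Monday to Friday
        "is_busday": (dates.dayofweek < 5).astype(int),
    })


calendar = build_calendar("2021-01-01", "2022-12-31")
print(len(calendar))                # → 730
print(calendar["is_busday"].sum())  # → 521
```

2021 starts on a Friday and 2022 on a Saturday, so the two years contain 261 and 260 Monday-to-Friday days respectively.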

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from meteostat import Point, Daily

## Import bike rider counts data

We import a cleaned dataset that contains the daily number of bike riders at different counting stations in Baden-Württemberg in 2021 and 2022.

In [2]:
# import data
daily_bike_rider_counts_cleaned = pd.read_pickle('./../data/processed/daily_bike_rider_counts_cleaned.pkl')

In [3]:
# check whether import of data worked
daily_bike_rider_counts_cleaned

|  | standort | counter_site | channel_name | channel_id | longitude | latitude | date | rider_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-01 | 521 |
| 1 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-02 | 1131 |
| 2 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-03 | 764 |
| 3 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-04 | 1607 |
| 4 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-05 | 1668 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59090 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-25 | 795 |
| 59091 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-26 | 529 |
| 59092 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-27 | 556 |
| 59093 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-28 | 517 |
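
Displaying the frame is a quick visual check. A slightly more systematic check could assert that the expected columns are present and that all dates fall into the study period. The helper `check_bike_counts` below is hypothetical and not part of the original notebook; its column list is taken from the output above.

```python
import datetime

import pandas as pd

# columns visible in the output of the cleaned bike rider counts
EXPECTED_COLUMNS = ["standort", "counter_site", "channel_name", "channel_id",
                    "longitude", "latitude", "date", "rider_count"]


def check_bike_counts(df: pd.DataFrame,
                      start: datetime.date = datetime.date(2021, 1, 1),
                      end: datetime.date = datetime.date(2022, 12, 31)) -> list[str]:
    """Return a list of problems found in the imported data (empty if none)."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    dates = pd.to_datetime(df["date"])
    if dates.min() < pd.Timestamp(start) or dates.max() > pd.Timestamp(end):
        problems.append("dates outside the study period")
    if (df["rider_count"] < 0).any():
        problems.append("negative rider counts")
    return problems


# one row copied from the output above
sample = pd.DataFrame({
    "standort": ["Stadt Freiburg"],
    "counter_site": ["FR1 Dreisam / Otto-Wels-Str."],
    "channel_name": ["FR1 Dreisam / Hindenburgstr."],
    "channel_id": [101014585],
    "longitude": [7.862301],
    "latitude": [47.99054],
    "date": [datetime.date(2021, 1, 1)],
    "rider_count": [521],
})
print(check_bike_counts(sample))  # → []
```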


## Add business day information

In [4]:
# determine for each date whether it is a business day
daily_bike_rider_counts_cleaned['is_busday'] = daily_bike_rider_counts_cleaned['date'].apply(lambda x: x.weekday() < 5) # for Monday-Friday: weekday < 5
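
The `weekday() < 5` rule treats every Monday to Friday as a business day, including public holidays such as 2021-01-01 (flagged `True` in the output below). NumPy's `np.is_busday` gives the same weekend-only result in a vectorized way and optionally accepts a list of holidays. In the sketch below, the two holidays are only illustrative examples (New Year's Day and Epiphany, a public holiday in Baden-Württemberg), not a complete holiday calendar.

```python
import datetime

import numpy as np
import pandas as pd

dates = pd.Series([datetime.date(2021, 1, 1),   # Friday, New Year's Day
                   datetime.date(2021, 1, 2),   # Saturday
                   datetime.date(2021, 1, 4),   # Monday
                   datetime.date(2021, 1, 6)])  # Wednesday, Epiphany

# convert to numpy datetime64[D], the type np.is_busday works on
days = pd.to_datetime(dates).to_numpy().astype("datetime64[D]")

# weekend-only rule, equivalent to weekday() < 5
is_busday = np.is_busday(days)

# same rule, additionally excluding an illustrative (incomplete) list of holidays
example_holidays = np.array(["2021-01-01", "2021-01-06"], dtype="datetime64[D]")
is_busday_excl_holidays = np.is_busday(days, holidays=example_holidays)

print(is_busday.tolist())                # → [True, False, True, True]
print(is_busday_excl_holidays.tolist())  # → [False, False, True, False]
```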

In [5]:
# check whether column was added correctly
daily_bike_rider_counts_cleaned

|  | standort | counter_site | channel_name | channel_id | longitude | latitude | date | rider_count | is_busday |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-01 | 521 | True |
| 1 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-02 | 1131 | False |
| 2 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-03 | 764 | False |
| 3 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-04 | 1607 | True |
| 4 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-05 | 1668 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59090 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-25 | 795 | False |
| 59091 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-26 | 529 | True |
| 59092 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-27 | 556 | True |
| 59093 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-28 | 517 | True |


## Collect daily average temperature information for counter sites

The helper below fetches the daily average temperature (meteostat's `tavg` column, in °C) for a time period at a specific location, given by latitude and longitude.

In [6]:
def get_daily_average_temp_for_time_period(lat, lon, start, end):
    '''
    Get daily average temperature for specific location and time period.
    
    Parameters
    ----------
    lat : float
        Latitude in decimal degrees
    lon : float
        Longitude in decimal degrees
    start : datetime.datetime
        Start date of the period for which to get the average temperature
    end : datetime.datetime
        End date of the period for which to get the average temperature
    
    Returns
    -------
    pandas.Series
        Daily average temperature in °C for the location and time period,
        indexed by date.
    '''
    # location
    loc = Point(lat, lon)

    # import daily data
    data = Daily(loc, start, end)
    data = data.fetch()
    data = data['tavg']

    return data
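
Depending on the weather station, meteostat's daily data can contain missing values in the `tavg` column. One possible workaround, which the notebook does not use, is to fall back to the midpoint of the daily minimum and maximum temperatures (`tmin`, `tmax`); this is only an approximation of the true daily mean. The sketch below operates on a frame shaped like the result of `Daily(...).fetch()`; the function name `tavg_with_fallback` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def tavg_with_fallback(daily: pd.DataFrame) -> pd.Series:
    """Return tavg, filling missing values with the midpoint of tmin and tmax."""
    midpoint = (daily["tmin"] + daily["tmax"]) / 2
    # fillna aligns on the date index; rows where tmin or tmax is missing stay NaN
    return daily["tavg"].fillna(midpoint)


# toy frame with meteostat's column names, indexed by date
toy = pd.DataFrame(
    {"tavg": [1.4, np.nan, np.nan],
     "tmin": [-1.0, -2.0, np.nan],
     "tmax": [3.0, 4.0, 5.0]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)
print(tavg_with_fallback(toy).tolist())  # → [1.4, 1.0, nan]
```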

Collect the daily average temperature for all counter sites for the whole of 2021 and 2022.

In [7]:
# retrieve daily average temperature for all counter sites in 2021 and 2022

# counter sites
unique_counter_sites = daily_bike_rider_counts_cleaned.groupby(by=['standort', 'counter_site', 'longitude', 'latitude'], as_index=False).size()

# time period
mystart = datetime(2021, 1, 1)
myend = datetime(2022, 12, 31)

# collect one temperature dataframe per counter site
temp_frames = []

# for each counter site
for pos in unique_counter_sites.itertuples(index=False):

    # get temperature data for counter site
    temp_dat_for_counter_site = get_daily_average_temp_for_time_period(pos.latitude, pos.longitude, mystart, myend)

    # to enable merging of temperature data and bike rider counts, create a
    # dataframe of daily average temperature data that contains date and
    # counter site information
    temp_frames.append(pd.DataFrame({
        'date': temp_dat_for_counter_site.index.date,
        'counter_site': pos.counter_site,
        'temperature': temp_dat_for_counter_site.values,
    }))

# concatenate all per-site dataframes once instead of growing an initially
# empty dataframe inside the loop
temp_df = pd.concat(temp_frames, ignore_index=True)

# display
temp_df

|  | date | counter_site | temperature |
| --- | --- | --- | --- |
| 0 | 2021-01-01 | FR1 Dreisam / Otto-Wels-Str. | 1.4 |
| 1 | 2021-01-02 | FR1 Dreisam / Otto-Wels-Str. | 0.8 |
| 2 | 2021-01-03 | FR1 Dreisam / Otto-Wels-Str. | -0.5 |
| 3 | 2021-01-04 | FR1 Dreisam / Otto-Wels-Str. | 0.0 |
| 4 | 2021-01-05 | FR1 Dreisam / Otto-Wels-Str. | -0.7 |
| ... | ... | ... | ... |
| 18245 | 2022-12-27 | Blautal Lupferbrücke | 3.5 |
| 18246 | 2022-12-28 | Blautal Lupferbrücke | 3.3 |
| 18247 | 2022-12-29 | Blautal Lupferbrücke | 6.6 |
| 18248 | 2022-12-30 | Blautal Lupferbrücke | 6.6 |
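
In the output above, the last row for Blautal Lupferbrücke is dated 2022-12-30 although the requested period ends on 2022-12-31, which suggests that at least one day is missing for that site. A per-site check for missing dates could look like the following sketch; `missing_dates_per_site` is a hypothetical helper, and the toy frame only mimics the structure of `temp_df`.

```python
import datetime

import pandas as pd


def missing_dates_per_site(temp: pd.DataFrame, start: datetime.date, end: datetime.date) -> dict:
    """Map each counter site to the dates in [start, end] that have no temperature value."""
    expected = set(pd.date_range(start, end, freq="D").date)
    result = {}
    for site, group in temp.groupby("counter_site"):
        # dates for which a non-missing temperature exists
        present = set(group.loc[group["temperature"].notna(), "date"])
        missing = sorted(expected - present)
        if missing:
            result[site] = missing
    return result


# toy data mimicking temp_df: one site with a gap at the end of the period
toy_temp = pd.DataFrame({
    "date": [datetime.date(2022, 12, 29), datetime.date(2022, 12, 30)],
    "counter_site": ["Blautal Lupferbrücke"] * 2,
    "temperature": [6.6, 6.6],
})
print(missing_dates_per_site(toy_temp, datetime.date(2022, 12, 29), datetime.date(2022, 12, 31)))
# → {'Blautal Lupferbrücke': [datetime.date(2022, 12, 31)]}
```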


## Combine bike rider counts, weekday and temperature information

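Merge the temperature data into the bike rider counts, which already contain the business day information. The merge is a left join on the columns shared by both dataframes, `counter_site` and `date`, so every bike-count row is kept.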
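
With a left join, days without a matching temperature silently get `NaN`. Two optional arguments of `pd.merge` make such problems visible: `validate='many_to_one'` raises an error if the temperature table contains duplicate `(counter_site, date)` pairs, and `indicator=True` adds a `_merge` column showing which rows found no partner. Both `date` columns must also have the same type (here `datetime.date` objects); otherwise the keys cannot be matched. The toy frames below are hypothetical and only mimic the notebook's columns.

```python
import datetime

import pandas as pd

counts = pd.DataFrame({
    "counter_site": ["FR1 Dreisam / Otto-Wels-Str."] * 3,
    "date": [datetime.date(2021, 1, 1), datetime.date(2021, 1, 2), datetime.date(2021, 1, 3)],
    "rider_count": [521, 1131, 764],
})
temps = pd.DataFrame({
    "counter_site": ["FR1 Dreisam / Otto-Wels-Str."] * 2,
    "date": [datetime.date(2021, 1, 1), datetime.date(2021, 1, 2)],  # 2021-01-03 missing on purpose
    "temperature": [1.4, 0.8],
})

merged = pd.merge(
    counts, temps,
    how="left",
    on=["counter_site", "date"],
    validate="many_to_one",  # error if temps has duplicate (counter_site, date) keys
    indicator=True,          # adds a '_merge' column ('both' or 'left_only' here)
)

# bike-count rows that did not get a temperature value
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["counter_site", "date"]])
```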

In [8]:
# add temperature data to the daily bike rider counts (left join on the shared columns)
combined_daily_dat = pd.merge(daily_bike_rider_counts_cleaned, temp_df, how='left', on=['counter_site', 'date'])
combined_daily_dat

|  | standort | counter_site | channel_name | channel_id | longitude | latitude | date | rider_count | is_busday | temperature |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-01 | 521 | True | 1.4 |
| 1 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-02 | 1131 | False | 0.8 |
| 2 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-03 | 764 | False | -0.5 |
| 3 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-04 | 1607 | True | 0.0 |
| 4 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 101014585 | 7.862301 | 47.99054 | 2021-01-05 | 1668 | True | -0.7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53257 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-25 | 795 | False | 8.9 |
| 53258 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-26 | 529 | True | 7.6 |
| 53259 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-27 | 556 | True | 8.2 |
| 53260 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 104057246 | 9.957680 | 48.40080 | 2021-04-28 | 517 | True | 10.8 |


In [9]:
# reorder columns so that the dependent variable (rider_count) is last;
# channel_id is not needed further and is dropped
combined_daily_dat = combined_daily_dat[['standort', 'counter_site', 'channel_name', 'longitude', 'latitude', 'date', 'temperature', 'is_busday', 'rider_count']]
combined_daily_dat

|  | standort | counter_site | channel_name | longitude | latitude | date | temperature | is_busday | rider_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 7.862301 | 47.99054 | 2021-01-01 | 1.4 | True | 521 |
| 1 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 7.862301 | 47.99054 | 2021-01-02 | 0.8 | False | 1131 |
| 2 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 7.862301 | 47.99054 | 2021-01-03 | -0.5 | False | 764 |
| 3 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 7.862301 | 47.99054 | 2021-01-04 | 0.0 | True | 1607 |
| 4 | Stadt Freiburg | FR1 Dreisam / Otto-Wels-Str. | FR1 Dreisam / Hindenburgstr. | 7.862301 | 47.99054 | 2021-01-05 | -0.7 | True | 1668 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53257 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 9.957680 | 48.40080 | 2021-04-25 | 8.9 | False | 795 |
| 53258 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 9.957680 | 48.40080 | 2021-04-26 | 7.6 | True | 529 |
| 53259 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 9.957680 | 48.40080 | 2021-04-27 | 8.2 | True | 556 |
| 53260 | Stadt Ulm | Blautal Lupferbrücke | Blautal Lupferbrücke Ri. Süden | 9.957680 | 48.40080 | 2021-04-28 | 10.8 | True | 517 |


## Export data

In [10]:
# export dataframe
combined_daily_dat.to_pickle('./../data/processed/combined_daily_dat.pkl')
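
To make sure the exported pickle can be loaded again by later notebooks, a round-trip check can read the file back and compare it with the original. The sketch below writes a small toy frame to a temporary directory instead of `./../data/processed/`, so it runs anywhere. Pickle files should only be loaded from trusted sources.

```python
import os
import tempfile

import pandas as pd

# toy frame with a few of the exported columns (values copied from the output above)
toy = pd.DataFrame({
    "counter_site": ["Blautal Lupferbrücke"],
    "temperature": [10.8],
    "is_busday": [True],
    "rider_count": [517],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "combined_daily_dat.pkl")
    toy.to_pickle(path)
    reloaded = pd.read_pickle(path)

# raises an AssertionError if values, dtypes or index differ
pd.testing.assert_frame_equal(toy, reloaded)
```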