# BICYCLE COUNTER AT BERG
Number of bicycle crossings to Slovakia and Austria at Berg, 2016 to 2019, by date and time of transit

## The problem
The mayors of Bratislava and Berg have agreed that the main road between the two cities will undergo maintenance sometime next year. However, the work must be scheduled for a period when it does not disrupt daily commuters. Can you analyze the data and identify the best time for the maintenance?

The mayors have a few specific times in mind and would like to know whether people will be commuting on that road at those times.

In [None]:
from glob import glob
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Data Loading

In [None]:
df = pd.read_csv("Berg_bicycle_counter_2016_2019.csv")
df["date"] = pd.to_datetime(df["date"])

# Your training data: 2016 - 2018
df_train = df[df["date"] < pd.Timestamp(2019,1,1)].copy()

# This is "future" data. 2019 is the year in which the maintenance will take place. Use this data only for testing your algorithms
df_test_filtered = df[df["date"] >= pd.Timestamp(2019,1,1)].sample(100)
df_test = df_test_filtered[["date", "is_holiday", "tavg", "wspd", "pres"]].copy()
y_true = df_test_filtered[["hastrip"]]

In [None]:
df_train.head(20)

### Dataset overview
* "date" -> year, month, day, hour
* "Do Slovenska" -> Number of bicycle trips to Slovakia
* "Do Rakuska" -> Number of bicycle trips to Austria
* "hastrip" -> Flag that shows whether there was any trip at that time (1 is yes and 0 is no)
* "is_holiday" -> Flag that shows whether the given day was a holiday in either Austria or Slovakia (1 is yes and 0 is no)
* "Hradza Berg" -> Total number of bicycle trips
* "tavg" -> average temperature on that day
* "wspd" -> average windspeed on that day
* "pres" -> average air pressure on that day

In [None]:
# Example of the dates the mayors will ask about
df_test.head(10)

In [None]:
print("How many NaNs do we have in the training set? ", df_train.isna().sum())
print("-"*100)
print("How many NaNs do we have in the test set? ", df_test.isna().sum())

## Data Manipulation

In [None]:
# Data manipulation
# The training set contains NaNs, so drop every row that has one
df_train = df_train.dropna(axis=0)

In [None]:
#Data Transformation Example
df_train['day'] = df_train.date.dt.day
df_train['month'] = df_train.date.dt.month
df_train['year'] = df_train.date.dt.year
df_train['hour'] = df_train.date.dt.hour
df_train['weekday'] = df_train.date.dt.weekday
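
The same date features are needed again later for the test set, so it may help to wrap the transformation in a small function. The sketch below assumes a hypothetical helper name, `add_date_features`, and uses two made-up timestamps rather than the Berg data.

```python
import pandas as pd


def add_date_features(frame: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Return a copy of `frame` with day, month, year, hour and weekday columns.

    `add_date_features` is a hypothetical helper, not part of the original notebook.
    """
    out = frame.copy()
    dates = out[date_col].dt
    out["day"] = dates.day
    out["month"] = dates.month
    out["year"] = dates.year
    out["hour"] = dates.hour
    out["weekday"] = dates.weekday  # Monday is 0, Sunday is 6
    return out


# Two made-up timestamps, only to illustrate the output
example = pd.DataFrame(
    {"date": pd.to_datetime(["2018-07-14 08:00", "2018-12-24 17:00"])}
)
example = add_date_features(example)
print(example)
```

With such a helper, the training and test sets could both be transformed with a single call each, which keeps the two feature sets in sync.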

## Exploratory Data Analysis (EDA)

In [None]:
#Data Analysis example
#Example 1
plt.figure(figsize=(10,5))
plt.title("Distribution of bicycle trip counts per month")
sns.boxplot(x="month", y="Hradza Berg", data=df_train)

In [None]:
#Example 2
# Sum only the trip counts: summing the whole frame can fail on the datetime column in recent pandas versions
df_grouped_month = df_train.groupby("month")["Hradza Berg"].sum().reset_index()
plt.figure(figsize=(10,5))
plt.title("Total number of bicycle trips per month")
sns.pointplot(x="month", y="Hradza Berg", data=df_grouped_month)

In [None]:
# Example 3
plt.figure()
sns.violinplot(x="hastrip", y="tavg", data=df_train, inner="quart")

In [None]:
# Example 4
plt.figure()
ax = sns.barplot(data=df_train, x='is_holiday', y="Hradza Berg")

### 🤔 Can you think of additional analysis that would be useful to present to the Mayor? Add them here!
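
One possible additional analysis is the share of records with at least one trip, broken down by hour of day and weekday. The sketch below uses a synthetic stand-in for `df_train` (the `demo` frame and its trip rule are made up) so that it runs on its own. With the real data, the same `pivot_table` call could be applied to `df_train` directly.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_train: eight full weeks of hourly records (not the Berg data)
rng = np.random.default_rng(0)
n_hours = 24 * 7 * 8
dates = pd.Timestamp("2018-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
demo = pd.DataFrame({"date": dates})
demo["hour"] = demo["date"].dt.hour
demo["weekday"] = demo["date"].dt.weekday

# Made-up rule: trips are likely during the day and rare at night
daytime = demo["hour"].between(7, 19)
trip_probability = np.where(daytime, 0.8, 0.1)
demo["hastrip"] = (rng.random(n_hours) < trip_probability).astype(int)

# Share of records with at least one trip, per hour (rows) and weekday (columns)
share = demo.pivot_table(
    index="hour", columns="weekday", values="hastrip", aggfunc="mean"
)
print(share.round(2))
```

The resulting table can be passed to `sns.heatmap(share)` to show at a glance which hour and weekday combinations are usually free of commuters.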

## Train an algorithm

In [None]:
# Prepare the data
columns_used = ["month", "hour", "tavg", "pres"]
X_train = df_train[columns_used]
y_train = df_train["hastrip"]

# Prepare the model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

In [None]:
# Test on the dates the mayors gave us
df_test['day'] = df_test.date.dt.day
df_test['month'] = df_test.date.dt.month
df_test['year'] = df_test.date.dt.year
df_test['hour'] = df_test.date.dt.hour
df_test['weekday'] = df_test.date.dt.weekday
X_test = df_test[columns_used]

y_pred = model.predict(X_test)

In [None]:
# Evaluate the results

from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score
acc_score = accuracy_score(y_true, y_pred)
print("Accuracy (%): ", acc_score*100)
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay.from_estimator replaces it
ConfusionMatrixDisplay.from_estimator(model, X_test, y_true, values_format='d', cmap=plt.cm.Blues)
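
Accuracy alone can be hard to interpret when one class is much more common than the other, so it may be worth comparing the model with a trivial baseline. The sketch below runs on synthetic data with a made-up trip rule (the `X_demo` and `has_trip` arrays are assumptions, not the Berg data) and reports both accuracy and balanced accuracy for a majority-class baseline and a logistic regression.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Synthetic data (not the Berg data): hour of day and average temperature
rng = np.random.default_rng(42)
n = 1000
hour = rng.integers(0, 24, size=n)
tavg = rng.normal(12, 8, size=n)
# Made-up rule: a trip happens during the day when it is not too cold
has_trip = ((hour >= 7) & (hour <= 19) & (tavg > 5)).astype(int)
X_demo = np.column_stack([hour, tavg])

# Simple split: first 800 rows for fitting, last 200 for evaluation
X_fit, X_eval = X_demo[:800], X_demo[800:]
y_fit, y_eval = has_trip[:800], has_trip[800:]

estimators = {
    "majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, estimator in estimators.items():
    estimator.fit(X_fit, y_fit)
    pred = estimator.predict(X_eval)
    results[name] = {
        "accuracy": accuracy_score(y_eval, pred),
        "balanced_accuracy": balanced_accuracy_score(y_eval, pred),
    }
    print(name, results[name])
```

A model is only convincing if it clearly beats the baseline. The baseline always predicts the same class, so its balanced accuracy is 0.5 whenever both classes occur in the evaluation data.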

### 🤔 How can you improve your model so that the Mayor trusts you?

Here are some tips:

+ Use a more powerful model (you can find several [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble))
+ Add more features to your model (for instance, `is_holiday`)
+ Normalize your features (check [this](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html))
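
The tips above can be combined in a short experiment. The sketch below is only one possible approach. It runs on synthetic data with a made-up trip rule rather than the Berg data, adds `is_holiday` as a feature, and compares a scaled logistic regression with a random forest using 5-fold cross-validation. Note that it swaps in `StandardScaler` (per-feature standardization) instead of the `normalize` function linked above, which rescales each row rather than each column.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for df_train (values are made up)
rng = np.random.default_rng(7)
n = 600
demo = pd.DataFrame(
    {
        "month": rng.integers(1, 13, size=n),
        "hour": rng.integers(0, 24, size=n),
        "tavg": rng.normal(12, 8, size=n),
        "pres": rng.normal(1015, 8, size=n),
        "is_holiday": rng.integers(0, 2, size=n),
    }
)
# Made-up rule: a trip happens during the day when it is not too cold
demo["hastrip"] = (demo["hour"].between(7, 19) & (demo["tavg"] > 5)).astype(int)

features = ["month", "hour", "tavg", "pres", "is_holiday"]
X, y = demo[features], demo["hastrip"]

candidates = {
    "scaled logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each candidate
scores = {}
for name, estimator in candidates.items():
    scores[name] = cross_val_score(estimator, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Cross-validation on the training years gives a more stable estimate than a single split. The 2019 test sample should still be kept for the final check only.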