# Project Group - 

Members: Arbman, Kelvin; Houterman, Simon; Koetsier, Lars; Linders, Joris 

Student numbers: 

# Research Objective

*Requires data modeling and quantitative research in Transport, Infrastructure & Logistics*

Research Question
* Application project
* Road safety-data
* COVID-data
* Location: Germany
* Timespan: five years before pandemic - 2021 (mid-pandemic)

How was the road safety in Germany affected during the COVID pandemic compared to the previous five years?


# Contribution Statement

*Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling*

**Author 1**:

**Author 2**:

**Author 3**:

# Data Used

* Road Safety
* Covid Positive Tests
* Covid Deaths
* Covid Intensive Care Cases
* Traffic Intensity?


# Libaries Used

In [3]:
import pandas as pd
import numpy as np
import math
import scipy

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

import plotly.io as pio

import datetime


# Data Pipeline

In [4]:
# Pipeline road Safety Data

# File path
file_path = '/Users/kelvinarbman/Documents/Github_test/data/Road Safety Germany.csv'

# Open File
df_road_safety_germany = pd.read_csv(file_path,  delimiter=';')

# Adjust Data

# Step 1: This file has the columns Year (YYYY) & Month (mmm)(de), for convience we will add a date column as the table index. 
#         This column will contain the date of the first day of the corresponding month.
# - Step 1.1: Convert the month values (mmm)(de) (e.i. Januar, Februar, März, ..) to month (mm) values (i.e. 01, 02, 03, ..)
df_road_safety_germany = df_road_safety_germany.replace(['Januar', 'Februar', 'März', 'April', 'Mai', 'Juni', 'Juli', 
                                                         'August', 'September', 'Oktober', 'November', 'Dezember'], 
                                                        ['01','02','03','04','05','06','07','08','09','10','11','12'])

# - Step 1.2: Add [Date] column to dataframe
df_road_safety_germany['Date'] = '01-' + df_road_safety_germany['Month'] + '-' + df_road_safety_germany['Year'].astype(str)
# - Convert string to date value
df_road_safety_germany['Date'] = pd.to_datetime(df_road_safety_germany['Date'], format = '%d-%m-%Y')
# - Set date column as index 
df_road_safety_germany.set_index('Date', inplace=True)

# Step 2: Clean Data, this dataframe contains '...' values for future dates since there is no data available yet.
#         These values will be replaced by empty values 
df_road_safety_germany = df_road_safety_germany.replace({'...': None})



# Step 3: Select only the necesarry columns from the dataframe 
df_road_safety_germany = df_road_safety_germany[['Unfälle mit Personenschaden - Insgesamt', 
                                                'Schwerwiegende Unfälle mit Sachschaden i.e.S - Insgesamt',
                                                'Sonst. Unfälle unter dem Einfluss berausch. Mittel - Insgesamt',
                                                'Übrige Sachschadensunfälle - Insgesamt', 
                                                'Insgesamt - Insgesamt']]

# Step 4: Rename column names from German to English
df_road_safety_germany = df_road_safety_germany.rename(columns={
    'Unfälle mit Personenschaden - Insgesamt': 'Accidents involving human injury', 
    'Schwerwiegende Unfälle mit Sachschaden i.e.S - Insgesamt': 'Serious accidents with material damange',
    'Sonst. Unfälle unter dem Einfluss berausch. Mittel - Insgesamt': 'Accidents under the influence of toxins', 
    'Übrige Sachschadensunfälle - Insgesamt': 'Other accidents',
    'Insgesamt - Insgesamt': 'Total accidents'})

# Step 5: Convert to int
#         First drop future dates without values
df_road_safety_germany = df_road_safety_germany.dropna()
df_road_safety_germany = df_road_safety_germany.astype({'Accidents involving human injury': 'int64'})

# Read File
df_road_safety_germany

Unnamed: 0_level_0,Accidents involving human injury,Serious accidents with material damange,Accidents under the influence of toxins,Other accidents,Total accidents
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2011-01-01,16448,7045,1142,151332,175967
2011-02-01,16227,6138,1043,137738,161146
2011-03-01,21569,5919,1114,154508,183110
2011-04-01,26411,5717,1245,156631,190004
2011-05-01,30831,6099,1291,170684,208905
...,...,...,...,...,...
2022-01-01,16819,5731,1118,152944,176612
2022-02-01,16067,4889,1110,148594,170660
2022-03-01,21398,4457,1184,164299,191338
2022-04-01,20758,5082,1206,168157,195203


In [8]:
# Pipeline Covid Positive Test

# File path
file_path = 'data/positive covid tests germany.csv'

# Open File
df_positive_covid_test_germany = pd.read_csv(file_path,  delimiter=';')

# Adjust Data
# Step 1: This file has the column Year_Week (YYYY_ww) for convience we will add a date column as the table index. 
#         This column will contain the date of the first day of the corresponding week.
# - Step 1.1: create date column
df_positive_covid_test_germany['Date'] = None
# - Step 1.1: Convert the year and week data to dates using the strptime function. 
#             For the year 2020 the week needs to be offset by 7 days, this is due to different interpretations of a 53th 
#             week in 2019.
for i in range(len(df_positive_covid_test_germany)):
    d = df_positive_covid_test_germany['Year_Week'][i]
    if d.startswith('2020'):
        df_positive_covid_test_germany['Date'][i] = datetime.datetime.strptime(d + '-1', "%Y_%W-%w") - datetime.timedelta(days=7)
    elif d.startswith('2021'):
        df_positive_covid_test_germany['Date'][i] = datetime.datetime.strptime(d + '-1', "%Y_%W-%w")
    else:
        df_positive_covid_test_germany['Date'][i] = datetime.datetime.strptime(d + '-1', "%Y_%W-%w")

# - Set date column as index 
df_positive_covid_test_germany.set_index('Date', inplace=True)

# Step 2: Clean Data



# Step 3: Select only the necesarry columns from the dataframe 
df_positive_covid_test_germany = df_positive_covid_test_germany[['Gesamt']]

# Step 4: Rename column names from German to English
df_positive_covid_test_germany = df_positive_covid_test_germany.rename(columns={
    'Gesamt': 'Number of Positive COVID-19 tests per 100.000 capita'})

df_positive_covid_test_germany['Number of Positive COVID-19 tests per 100.000 capita'] = df_positive_covid_test_germany['Number of Positive COVID-19 tests per 100.000 capita'].astype(float)

# Read File
df_positive_covid_test_germany

ValueError: could not convert string to float: '1,09'

In [5]:
# Pipeline Covid Deaths

# File path
file_path = '/Users/kelvinarbman/Documents/Github_test/data/COVID Deaths Germany.csv'

# Open File
df_covid_deaths_germany = pd.read_csv(file_path,  delimiter=';')

# Adjust Data
# Step 1: This file has the column Year_Month (YYYY-mm), for convience we will add a date column as the table index. 
#         This column will contain the date of the first day of the corresponding month.
df_covid_deaths_germany['Date'] = df_covid_deaths_germany['Year-Month'] + '-01'
# - Convert string to date value
df_covid_deaths_germany['Date'] = pd.to_datetime(df_covid_deaths_germany['Date'], format = '%Y-%m-%d')
# - Set date column as index 
df_covid_deaths_germany.set_index('Date', inplace=True)

# Step 2: Clean Data

# Step 3: Select only the necesarry columns from the dataframe 
df_covid_deaths_germany = df_covid_deaths_germany[['Number of Covid Deaths']]

# Step 4: Rename column names


# Read File
df_covid_deaths_germany

Unnamed: 0_level_0,Number of Covid Deaths
Date,Unnamed: 1_level_1
2020-03-01,1120
2020-04-01,6069
2020-05-01,1572
2020-06-01,320
2020-07-01,135
2020-08-01,152
2020-09-01,206
2020-10-01,1480
2020-11-01,8604
2020-12-01,22035


In [31]:
# Pipeline Covid Intensive Care Cases

# File path
file_path = 'data/intensive care covid cases germany.csv'

# Open File
df_ic_cases_covid_germany = pd.read_csv(file_path,  delimiter=',')


# Adjust Data
# Step 1: Convert date data from datetime to date
for i in range(len(df_ic_cases_covid_germany)):
    d = df_ic_cases_covid_germany['date'][i] 
    df_ic_cases_covid_germany['date'][i] = datetime.datetime.strptime(d, '%Y-%m-%dT%H:%M:%S%z').date()


# Step 2: Clean Data

# Step 3: Select only the necesarry columns from the dataframe 

# Step 4: Rename column names
df_ic_cases_covid_germany = df_ic_cases_covid_germany.rename(columns={
    'date': 'Date',
    'COVID-19-Fälle': 'Covid Cases on IC'})

df_ic_cases_covid_germany.set_index('Date', inplace=True)
# Read File
df_ic_cases_covid_germany

# ------------ Remark ---------------
# I have no idea why this error is showing, but is the only way I can convert the datetime value to a date value
# ERROR: 
    # A value is trying to be set on a copy of a slice from a DataFrame
    #
    # See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
    # df_ic_cases_covid_germany['date'][i] = datetime.strptime(d, '%Y-%m-%dT%H:%M:%S%z').date()



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0_level_0,Covid Cases on IC
Date,Unnamed: 1_level_1
2020-03-20,200
2020-03-21,308
2020-03-22,364
2020-03-23,451
2020-03-24,616
...,...
2022-10-05,1294
2022-10-06,1344
2022-10-07,1366
2022-10-08,1406


# Chapter 1: Analyse Road Safety Data
* Highlight seasonal trends


In [32]:
df = df_road_safety_germany
fig = px.line(df, x= df_road_safety_germany.index, y= "Accidents involving human injury")
fig.show()

In [33]:
df2 = df_positive_covid_test_germany
fig = px.line(df2, x= df_positive_covid_test_germany.index, y= "Number of Positive COVID-19 tests per 100.000 capita")
fig.show()

## Chapter 2: Analyse Covid Data
* Highlight Covid Trends
* Highlight big change in positive tests
* Hightlight relationship between tests, deaths and IC cases

* Introduce timeline of memorable covid desicions in Germany
    * Possibly simulate the timeline automatically


## Chapter 3: Analyse Relationship between Road Safety Data & Covid Data
* 
