# Report on Analysis of Road Accidents and Related Traffic Signs in Berlin in 2020

## Summary

This project report examines the road accidents that occurred in and around Berlin in 2020. We mainly try to identify which types of traffic signs are most commonly associated with road accidents, and whether certain signs are overrepresented at accident sites. The findings could inform targeted traffic safety measures aimed at reducing the number of accidents and injuries on Berlin's roads.

Throughout this report, we try to answer the following three questions to gain actionable insight:

1. Which areas in Berlin are more prone to road accidents?
2. Which traffic signs occur the most, and at which locations?
3. Are there specific traffic signs which appear more frequently near the accident location than others?
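
To make the third question concrete, the following is a minimal sketch of one possible way to measure "near an accident" and "overrepresented"; it is not the method used in this report. It assumes that accidents are given as `(lat, lon)` pairs and signs as `(lat, lon, sign_type)` triples. The function names, the 50 m radius, and the sign labels `"stop"` and `"yield"` are all hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sign_types_near_accidents(accidents, signs, radius_m=50.0):
    """Count signs per type that lie within radius_m of at least one accident.

    Brute-force comparison of every sign with every accident (hypothetical helper).
    """
    accidents = list(accidents)
    near = Counter()
    for s_lat, s_lon, s_type in signs:
        if any(haversine_m(s_lat, s_lon, a_lat, a_lon) <= radius_m
               for a_lat, a_lon in accidents):
            near[s_type] += 1
    return near


def overrepresentation(near_counts, all_counts):
    """Share of a sign type near accidents divided by its overall share.

    A ratio above 1 suggests the type is overrepresented near accidents.
    """
    near_total = sum(near_counts.values())
    all_total = sum(all_counts.values())
    return {t: (near_counts[t] / near_total) / (all_counts[t] / all_total)
            for t in near_counts}


# Made-up illustration data, not taken from the Berlin datasets
accidents = [(52.5200, 13.4050)]
signs = [
    (52.5201, 13.4050, "stop"),
    (52.5200, 13.4051, "yield"),
    (52.6000, 13.5000, "stop"),
    (52.6000, 13.5001, "stop"),
]

near = sign_types_near_accidents(accidents, signs)
ratios = overrepresentation(near, Counter(t for _, _, t in signs))
print(near, ratios)
```

The pairwise loop is only practical for small samples. For the full datasets, a spatial index such as scikit-learn's `BallTree` with the haversine metric would likely be a better fit.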

### Datasets

We will be considering two datasets for this analysis:

1. Name: Datasource-3502300782194642410: Strassenverkehrsunfälle nach Unfallort in Berlin 2020.
   <br>URL: https://www.statistik-berlin-brandenburg.de/opendata/AfSBBB_BE_LOR_Strasse_Strassenverkehrsunfaelle_2020_Datensatz.csv

2. Name: Datasource-7259270924735185334: Traffic Signs: Berlin, 2020.
   <br>URL: https://www.mcloud.de/downloads/mcloud/722EDEC3-38BA-4FE2-B087-18C0434CA34E/traffic_sign_analysis.json


## Install dependencies

In [15]:
%pip install pandas
%pip install plotly
%pip install 'SQLAlchemy==1.4.46'
%pip install nbformat
%pip install scikit-learn
%pip install matplotlib

Note: you may need to restart the kernel to use updated packages.


## Load data

In [19]:
import pandas as pd

# Read the table Location_Traffic_Accidents into a pandas DataFrame
df = pd.read_sql_table('Location_Traffic_Accidents', 'sqlite:///my_database.db')

# Read the table Location_Traffic_Signs into a pandas DataFrame
df_road_signs = pd.read_sql_table('Location_Traffic_Signs', 'sqlite:///my_database.db')

## 1. Which areas in Berlin are more prone to road accidents?

To answer the first question, we use plotly to draw a scatter plot of all accident locations on an OpenStreetMap base map, which gives a visual impression of where accidents cluster.

In [23]:
import plotly.graph_objects as go
from plotly.offline import iplot

# Create a scatter plot of accident locations
fig = go.Figure(data=go.Scattermapbox(
    lat=df['Latitude'],
    lon=df['Longitude'],
    mode='markers',
    marker=dict(
        size=9,
        color='red',
        opacity=0.7
    ),
    text=df['Categorie_of_the_accident'],
))

# Set the layout with OpenStreetMap style
fig.update_layout(
    mapbox=dict(
        style='open-street-map',
        center=dict(
            lat=df['Latitude'].mean(),
            lon=df['Longitude'].mean()
        ),
        zoom=10
    )
)

# Render the interactive figure inline in the Jupyter notebook
iplot(fig, show_link=False)
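
A scatter plot shows where accidents happened, but overlapping markers make it hard to compare how often they occur in different areas. As a rough complement, the sketch below counts accidents per grid cell of latitude and longitude. The function name `count_by_grid_cell` and the cell size of 0.01 degrees are illustrative assumptions, and the coordinates are made up rather than taken from the dataset.

```python
import math
from collections import Counter


def count_by_grid_cell(latitudes, longitudes, cell_deg=0.01):
    """Count points per square grid cell of cell_deg degrees (hypothetical helper).

    Each cell is identified by the integer pair
    (floor(lat / cell_deg), floor(lon / cell_deg)).
    """
    counts = Counter()
    for lat, lon in zip(latitudes, longitudes):
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Made-up coordinates for illustration only
lats = [52.5234, 52.5238, 52.4871]
lons = [13.4051, 13.4057, 13.3502]

cell_counts = count_by_grid_cell(lats, lons)

# Cells with the most accidents come first
for cell, n in cell_counts.most_common(2):
    print(cell, n)
```

With the data loaded above, the same function could be called as `count_by_grid_cell(df['Latitude'], df['Longitude'])`. Note that a degree of longitude covers less ground than a degree of latitude at Berlin's latitude, so the cells are not square in metres.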


## Who runs train stops in Germany and where?
To answer our initial question, we use plotly to draw a scatterplot of all train stops in the dataset, overlaying it on a map from OpenStreetMap.

The train stops will be colored based on the `Betreiber_Name`, allowing us to see what area an operator services.