# Austin's Car Crash
#### By: Luca Comba, Hung Tran, Steven Tran

<img src="https://upload.wikimedia.org/wikipedia/en/thumb/a/a0/Seal_of_Austin%2C_TX.svg/1024px-Seal_of_Austin%2C_TX.svg.png" width="100" height="100">

#### Table of Contents
1. [Introduction](#introduction)
2. [Feature Selection](#feature-selection)
3. [Modeling](#modeling)
4. [Conclusion](#conclusion)


In [None]:
# imports
import pandas as pd
import numpy as np

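from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# fixed seed so the train/test split is reproducible
RANDOM_SEED = 42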

In [2]:
# read data
df = pd.read_csv('data/austin_car_crash_cleaned.csv')

# Introduction

<div id="introduction" />

The original dataset contains records of traffic crashes in Austin, Texas, from 2010 to the present. It has 216,088 instances and 45 features, both numerical and categorical. The dataset is available at [Austin Crash Report Data](https://catalog.data.gov/dataset/vision-zero-crash-report-data).

## Data Cleaning

The data cleaning steps are documented in [cleaning.ipynb](./cleaning.ipynb).


## Exploratory Data Analysis

The exploratory data analysis is documented in [exploratory.ipynb](./exploratory.ipynb).

# Feature Selection
<div id="feature-selection" />

# Modeling
<div id="modeling" />

## Feature Selection

In [None]:
# feature variables: every column except the target
X = df.drop('Outcome', axis=1)
# target variable
y = df['Outcome']

## Splitting the Data

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=RANDOM_SEED) 
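
The split above is purely random. If the `Outcome` classes turn out to be imbalanced (an assumption, since it is not checked in this notebook), passing `stratify=y` to `train_test_split` keeps the class proportions the same in the training and test sets. The sketch below illustrates this on synthetic data; the columns `speed_limit` and `hour` are hypothetical stand-ins, not columns known to exist in the crash dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

RANDOM_SEED = 42

# Synthetic stand-in for the crash data: 1,000 rows, 10% positive class.
rng = np.random.default_rng(RANDOM_SEED)
demo = pd.DataFrame({
    "speed_limit": rng.integers(20, 80, size=1000),  # hypothetical feature
    "hour": rng.integers(0, 24, size=1000),          # hypothetical feature
    "Outcome": np.repeat([0, 1], [900, 100]),        # imbalanced target
})

X_demo = demo.drop("Outcome", axis=1)
y_demo = demo["Outcome"]

# stratify=y_demo preserves the class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=RANDOM_SEED, stratify=y_demo
)

print("positive rate (train):", y_tr.mean())
print("positive rate (test): ", y_te.mean())
```

Whether stratification is worth it for the real data depends on how skewed `Outcome` is, which `y.value_counts(normalize=True)` would show.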

## Pipeline

In [None]:
# build a pipeline that standardizes the features, then fits a logistic regression
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# fit the pipeline on the training data
pipe.fit(X_train, y_train)

# predict labels for the held-out test set
y_pred = pipe.predict(X_test)

# compute accuracy; use a new variable name so the accuracy_score function is not overwritten
accuracy = accuracy_score(y_test, y_pred)
print('accuracy score : ', accuracy)
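
`StandardScaler` only accepts numeric input, so the pipeline above assumes every column of `X` is already numeric. If the cleaned data still holds categorical columns (an assumption, since the introduction describes the features as both numerical and categorical), one option is to route each column type through its own transformer. The sketch below shows that pattern on synthetic data and also prints a confusion matrix alongside accuracy. The columns `speed_limit` and `weather`, and the way the target is generated, are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

RANDOM_SEED = 42
rng = np.random.default_rng(RANDOM_SEED)
n = 500

# Synthetic stand-in with one numeric and one categorical feature.
demo = pd.DataFrame({
    "speed_limit": rng.integers(20, 80, size=n).astype(float),  # hypothetical
    "weather": rng.choice(["clear", "rain", "fog"], size=n),    # hypothetical
})
# Hypothetical binary target, loosely tied to speed_limit plus noise.
demo["Outcome"] = (demo["speed_limit"] + rng.normal(0, 10, size=n) > 55).astype(int)

X_demo = demo.drop("Outcome", axis=1)
y_demo = demo["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=RANDOM_SEED, stratify=y_demo
)

numeric_cols = ["speed_limit"]
categorical_cols = ["weather"]

# Scale numeric columns; one-hot encode categorical columns.
preprocess = make_column_transformer(
    (StandardScaler(), numeric_cols),
    (OneHotEncoder(handle_unknown="ignore"), categorical_cols),
)

demo_pipe = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
demo_pipe.fit(X_tr, y_tr)
y_hat = demo_pipe.predict(X_te)

demo_accuracy = accuracy_score(y_te, y_hat)
demo_cm = confusion_matrix(y_te, y_hat)  # rows: true class, columns: predicted class
print("accuracy:", demo_accuracy)
print(demo_cm)
```

The confusion matrix is included because accuracy alone can look good on an imbalanced target even when the minority class is rarely predicted correctly.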

# Conclusion
<div id="conclusion" />