# Streamlit Model Preparation
In this notebook we will create more tailored models for each of our four cities. We will use a [free weather API](https://www.weatherapi.com/) that returns current weather data for any GPS coordinates we choose, and we will translate its response into weather features that our models can use.
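
To make the translation step concrete, here is a minimal sketch of how a weather payload could be mapped onto our feature columns. The input keys (`temp_f`, `feelslike_f`, `humidity`, `pressure_in`, `vis_miles`, `wind_mph`, `precip_in`, `wind_dir`, `condition`) are our assumptions about the API's "current" object and should be checked against its documentation. Using `feelslike_f` as a stand-in for wind chill is also an assumption, and `weather_to_features` is a hypothetical helper, not the code used in the app.

```python
# Assumed API field -> training column. The API field names are assumptions.
WEATHER_FIELD_MAP = {
    'temp_f': 'temperature(f)',
    'feelslike_f': 'wind_chill(f)',  # stand-in for wind chill (assumption)
    'humidity': 'humidity(%)',
    'pressure_in': 'pressure(in)',
    'vis_miles': 'visibility(mi)',
    'wind_mph': 'wind_speed(mph)',
    'precip_in': 'precipitation(in)',
}


def weather_to_features(current, known_columns):
    """Translate a weather payload (dict) into a dict of model features.

    One-hot flags are only set when the resulting column name exists in
    known_columns, so unseen wind directions or conditions are ignored.
    """
    features = {
        column: float(current[field])
        for field, column in WEATHER_FIELD_MAP.items()
        if field in current
    }

    # One-hot style flags for the categorical fields.
    candidates = []
    if 'wind_dir' in current:
        candidates.append(f"wind_direction_{current['wind_dir']}")
    condition_text = current.get('condition', {}).get('text')
    if condition_text:
        candidates.append(f'weather_condition_{condition_text}')

    for column in candidates:
        if column in known_columns:
            features[column] = 1
    return features


# Hand-written sample payload, not a real API response.
sample_current = {
    'temp_f': 37.0, 'feelslike_f': 31.0, 'humidity': 79,
    'pressure_in': 29.59, 'vis_miles': 10.0, 'wind_mph': 8.1,
    'precip_in': 0.0, 'wind_dir': 'W', 'condition': {'text': 'Overcast'},
}
sample_columns = {'wind_direction_W', 'weather_condition_Overcast'}
sample_features = weather_to_features(sample_current, sample_columns)
```

Condition labels returned by the API may not match the labels in our training data exactly, so a real app would likely need an explicit lookup table between the two.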

As always, let's start with our imports and get our data loaded in. We'll be looking at all four cities we've explored so far!

In [1]:
import pandas as pd
import numpy as np
import pickle
import warnings

import xgboost as xg
from sklearn.model_selection import train_test_split
pd.set_option("display.max_columns", None)

np.random.seed(42)

In [2]:
# Filter deprecation warnings from the XGBoost classifier. Comment out the line below if you would like to keep the normal warning behavior.
warnings.filterwarnings("ignore")

In [3]:
# There's a slight discrepancy in how the Chicago data tracks twilight (single columns instead of one-hot columns).
# We won't be using any twilight features in this model, so we remove all related columns from every dataframe.
twilight_features = ['sunrise_sunset', 'civil_twilight', 'nautical_twilight', 'astronomical_twilight']
twilight_dummies = [f'{feature}_{value}' for feature in twilight_features for value in ('Day', 'Night', 'nan')]

atl_data = pd.read_csv('../data/atl_df_cleaned.csv').drop(columns=twilight_dummies)
bos_data = pd.read_csv('../data/bos_df_cleaned.csv').drop(columns=twilight_dummies)
chi_data = pd.read_csv('../data/chi_df_cleaned.csv').drop(columns=twilight_features)
dia_data = pd.read_csv('../data/dia_df_cleaned.csv').drop(columns=twilight_dummies)
chi_data.head()

Unnamed: 0,severity,start_lat,start_lng,end_lat,end_lng,distance(mi),temperature(f),wind_chill(f),humidity(%),pressure(in),visibility(mi),wind_speed(mph),precipitation(in),amenity,bump,crossing,give_way,junction,no_exit,railway,roundabout,station,stop,traffic_calming,traffic_signal,turning_loop,start_time_ep,end_time_ep,weather_timestamp_ep,year,month,week,accident_duration,side_L,side_R,wind_direction_CALM,wind_direction_E,wind_direction_ENE,wind_direction_ESE,wind_direction_N,wind_direction_NE,wind_direction_NNE,wind_direction_NNW,wind_direction_NW,wind_direction_S,wind_direction_SE,wind_direction_SSE,wind_direction_SSW,wind_direction_SW,wind_direction_VAR,wind_direction_W,wind_direction_WNW,wind_direction_WSW,weather_condition_Clear,weather_condition_Cloudy,weather_condition_Fog,weather_condition_Freezing Rain,weather_condition_Heavy Rain,weather_condition_Heavy Snow,weather_condition_Light Drizzle,weather_condition_Light Freezing Drizzle,weather_condition_Light Freezing Fog,weather_condition_Light Freezing Rain,weather_condition_Light Ice Pellets,weather_condition_Light Rain,weather_condition_Light Rain Showers,weather_condition_Light Rain with Thunder,weather_condition_Light Snow,weather_condition_Light Thunderstorms and Snow,weather_condition_Mist,weather_condition_Overcast,weather_condition_Partly Cloudy,weather_condition_Rain,weather_condition_Snow,weather_condition_Thunder,weather_condition_Thunderstorms and Rain,day_Friday,day_Monday,day_Saturday,day_Sunday,day_Thursday,day_Tuesday,day_Wednesday,hour_00,hour_01,hour_02,hour_03,hour_04,hour_05,hour_06,hour_07,hour_08,hour_09,hour_10,hour_11,hour_12,hour_13,hour_14,hour_15,hour_16,hour_17,hour_18,hour_19,hour_20,hour_21,hour_22,hour_23,weather_condition_Light Snow with Thunder,weather_condition_Smoke / Windy,weather_condition_Light Snow and Sleet,weather_condition_Haze / Windy,weather_condition_Heavy Snow / Windy,weather_condition_Blowing Dust / Windy,weather_condition_Heavy Blowing Snow,is_DST
0,2,42.30596,-87.96015,42.306916,-87.960918,0.077,37.0,31.0,79.0,29.59,10.0,8.1,0.0,0,0,0,0,0,0,0,0,0,0,0,1,0,1480517000.0,1480539000.0,1480518000.0,2016,11,48,21600.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,2,42.17587,-88.13577,42.17462,-88.135794,0.086,37.0,28.1,76.0,29.62,10.0,15.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1,0,1480518000.0,1480540000.0,1480518000.0,2016,11,48,21600.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,3,42.175897,-88.135769,42.174612,-88.135794,0.089,37.0,28.1,76.0,29.62,10.0,15.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1,0,1480519000.0,1480541000.0,1480518000.0,2016,11,48,22524.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,3,41.860591,-87.992749,41.86059,-87.9927,0.003,37.0,30.4,76.0,29.61,10.0,9.2,0.0,0,0,0,0,0,0,0,0,0,0,0,1,0,1480522000.0,1480544000.0,1480522000.0,2016,11,48,21600.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,3,41.75033,-87.66344,41.75033,-87.66344,0.0,37.9,30.0,73.0,29.65,10.0,12.7,0.0,0,0,0,0,0,0,0,0,1,0,0,1,0,1480524000.0,1480545000.0,1480524000.0,2016,11,48,21600.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Through the weather API we won't be able to collect data on road features such as crossings, junctions, or traffic signals. We'll lose a bit of accuracy by dropping them, but we gain a lot in user-friendliness for the app we'll be designing.

In [4]:
def build_models(df_dict):
    '''Takes a dictionary mapping city names to their dataframes and fits an XGBoost model for each city. Each model is pickled, along with its column names, under the lowercase city name so that it can be used in other notebooks or applications. Returns a list of strings reporting the training and testing accuracy of each model.'''
    model_info = []
    for city in df_dict.keys():
        # drop features that can't be collected in the app
        drop_features = ['severity','end_lat','end_lng','distance(mi)','amenity','bump','crossing','give_way','junction','no_exit','railway',
                         'roundabout','station','stop','traffic_calming','traffic_signal','turning_loop','end_time_ep','weather_timestamp_ep',
                         'accident_duration','side_R','is_DST']

        # assign features and target
        X = df_dict[city].drop(drop_features, axis=1)
        y = df_dict[city]['severity']
        
        # split
        X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
        
        # fit
        xg_model = xg.sklearn.XGBClassifier(random_state=42)
        xg_model.fit(X_train,y_train)
        
        # store
        with open(f'./models/{city.lower()}.pkl', mode='wb') as pickle_out:
            pickle.dump(xg_model, pickle_out)
        with open(f'./models/{city.lower()}_cols.pkl', mode='wb') as pickle_out:
            pickle.dump(X_train.columns, pickle_out)
        
        # store metrics
        model_info.append(f'{city}: Training Accuracy: {xg_model.score(X_train,y_train)} Testing Accuracy: {xg_model.score(X_test,y_test)}')
        
    return model_info
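
Since each model is pickled next to its column names, another notebook or the app can load both and line up new inputs with the training columns. The sketch below shows one way this could work. `load_city_artifacts` and `align_row` are hypothetical helpers, and the default `./models` directory simply mirrors the path used in `build_models`.

```python
import pickle
from pathlib import Path


def load_city_artifacts(city, model_dir='./models'):
    """Load the pickled model and its column names for one city."""
    model_dir = Path(model_dir)
    with open(model_dir / f'{city.lower()}.pkl', mode='rb') as pickle_in:
        model = pickle.load(pickle_in)
    with open(model_dir / f'{city.lower()}_cols.pkl', mode='rb') as pickle_in:
        # The notebook pickles a pandas Index; list() also accepts a plain list.
        columns = list(pickle.load(pickle_in))
    return model, columns


def align_row(features, columns):
    """Order a feature dict like the training columns.

    Columns missing from the dict are filled with 0 and extra keys are
    dropped, which suits the one-hot encoded features used here.
    """
    return [features.get(column, 0) for column in columns]
```

With pandas available, the aligned row can be wrapped as `pd.DataFrame([align_row(features, columns)], columns=columns)` and passed to `model.predict`.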

In [5]:
frame_dict = {
    'Atlanta': atl_data,
    'Boston' : bos_data,
    'Chicago': chi_data,
    'Denver' : dia_data
}

build_models(frame_dict)



['Atlanta: Training Accuracy: 0.8484369709965015 Testing Accuracy: 0.6624915368991199',
 'Boston: Training Accuracy: 0.9628818570020804 Testing Accuracy: 0.8489326765188834',
 'Chicago: Training Accuracy: 0.8927722997819649 Testing Accuracy: 0.8327801639984381',
 'Denver: Training Accuracy: 0.9142683772538142 Testing Accuracy: 0.734789391575663']
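
One quick way to compare these scores is the gap between training and testing accuracy, since a larger gap suggests more overfitting. The sketch below copies the accuracies reported above into a dictionary; `accuracy_gaps` is a hypothetical helper added only for illustration.

```python
# (training accuracy, testing accuracy) copied from the output above
reported_scores = {
    'Atlanta': (0.8484369709965015, 0.6624915368991199),
    'Boston': (0.9628818570020804, 0.8489326765188834),
    'Chicago': (0.8927722997819649, 0.8327801639984381),
    'Denver': (0.9142683772538142, 0.734789391575663),
}


def accuracy_gaps(scores):
    """Return training minus testing accuracy per city, rounded to 3 places."""
    return {city: round(train - test, 3) for city, (train, test) in scores.items()}


gaps = accuracy_gaps(reported_scores)
# Print the cities from smallest to largest gap.
for city, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f'{city}: {gap:.3f}')
# Chicago: 0.060
# Boston: 0.114
# Denver: 0.179
# Atlanta: 0.186
```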

We can see here that the cities do not all perform the same. Boston and Chicago reach the highest testing accuracy (about 0.85 and 0.83), while Denver and Atlanta trail at about 0.73 and 0.66. Every model also scores noticeably higher on the training set than on the testing set, which points to some overfitting. Chicago has the smallest gap, likely because we spent the most time on EDA and cleaning there. We might need to adjust our datasets to focus on different coordinates to get a better picture of each city. It is also possible that traffic flows are simply harder to model in some cities because of their overall infrastructure layout. Still, we can get some degree of usefulness from the other cities we explored!

If you want to see the mechanics of the app, head on over to the streamlit_deployment.py file. If you want to see it in action, enter `streamlit run streamlit_deployment.py` into your terminal to try out a usable version!