This machine learning model uses regression to predict the number of fatalities resulting from automobile-related incidents, given the hour, day of the week, and weather. It uses data from this source: https://github.com/spencergoff/automobile-fatality-prediction

The Features:
 
* day: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
* hour: 12:00am-12:59am, 1:00am-1:59am, 2:00am-2:59am ... 11:00pm-11:59pm
* weather: Clear, Rain, Sleet or Hail, Snow, Fog or Smoke or Smog, Severe Crosswinds, Other, Cloudy, Blowing Snow, Freezing Rain or Drizzle

The Label: 
* num_fatalities: 0 or any positive integer

In [66]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

In [24]:
data = pd.read_csv('auto_fatalities_data_numeric.csv')

In [25]:
data.head() #shows the first part of the data set

| | day | hour | weather | num_fatalities |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 211 |
| 1 | 0 | 0 | 1 | 17 |
| 2 | 0 | 0 | 2 | 1 |
| 3 | 0 | 0 | 3 | 2 |
| 4 | 0 | 0 | 4 | 5 |
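The notebook does not say how the category names were turned into the integer codes shown in the preview. A minimal sketch, assuming the codes simply follow the order in which the days and weather conditions are listed at the top of this notebook (the names `DAY_CODES`, `WEATHER_CODES`, and `encode` are hypothetical and do not appear in the original data pipeline):

```python
# Assumption: integer codes follow the order of the feature lists above.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]
WEATHER = ["Clear", "Rain", "Sleet or Hail", "Snow", "Fog or Smoke or Smog",
           "Severe Crosswinds", "Other", "Cloudy", "Blowing Snow",
           "Freezing Rain or Drizzle"]

# Map each category name to its position in the list.
DAY_CODES = {name: code for code, name in enumerate(DAYS)}
WEATHER_CODES = {name: code for code, name in enumerate(WEATHER)}

def encode(day, hour, weather):
    """Return the (day, hour, weather) integer triple for one observation."""
    return (DAY_CODES[day], hour, WEATHER_CODES[weather])

print(encode("Sunday", 0, "Rain"))  # (0, 0, 1)
```

Under this assumed mapping, the second row of the preview (day 0, hour 0, weather 1) would read as Sunday, 12:00am-12:59am, Rain.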


In [27]:
features = data.drop('num_fatalities',axis=1) #the features are day, hour, and weather

In [28]:
label = data['num_fatalities']

In [29]:
# training data is chosen at random and makes up 70% of the rows (random_state fixes the shuffle so the split is reproducible)
# the other 30% of the rows are held out for testing, which is used to evaluate the model after training
# x values are the feature columns (day, hour, and weather)
# y values are the label column (num_fatalities)
x_train, x_test, y_train, y_test = train_test_split(features,label,test_size=0.3,
                                                    random_state=101)
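The split described in the comments above can be pictured with a few lines of plain Python. This is only an illustrative sketch: `simple_train_test_split` is a hypothetical helper, not scikit-learn's implementation, and it will not reproduce the row assignment that `train_test_split` produces with `random_state=101`.

```python
import math
import random

def simple_train_test_split(rows, test_size=0.3, seed=101):
    """Shuffle the rows reproducibly, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)        # fixed seed -> same split every run
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train_rows, test_rows = simple_train_test_split(range(10))
print(len(train_rows), len(test_rows))  # 7 3
```

Every row lands in exactly one of the two sets, so the test rows are never seen during training.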

** Scale the Feature Data **

** Use sklearn preprocessing to create a MinMaxScaler for the feature data (day, hour, and weather; the label num_fatalities is left unscaled). This scales each feature column so that its smallest training value becomes 0, its largest becomes 1, and the values in between are mapped proportionally. Fit the scaler only on the training data so that no information from the test set leaks into preprocessing; as a consequence, transformed test values can fall outside [0, 1] if they lie beyond the training range. Use the scaler to transform x_test and x_train. Then wrap the scaled arrays in pd.DataFrame to re-create two dataframes (i.e. tables) of scaled data. **

In [37]:
scaler_model = MinMaxScaler()

In [38]:
scaler_model.fit(x_train)

MinMaxScaler(copy=True, feature_range=(0, 1))

In [40]:
x_test = pd.DataFrame(data=scaler_model.transform(x_test),columns=x_test.columns,index=x_test.index)

In [41]:
x_train = pd.DataFrame(data=scaler_model.transform(x_train),columns=x_train.columns,index=x_train.index)

In [43]:
x_test.head() #displays the first 5 rows of x_test

| | day | hour | weather |
|---|---|---|---|
| 1423 | 0.833333 | 0.956522 | 0.333333 |
| 850 | 0.5 | 0.565217 | 0.0 |
| 808 | 0.5 | 0.347826 | 0.888889 |
| 868 | 0.5 | 0.608696 | 0.888889 |
| 901 | 0.5 | 0.782609 | 0.111111 |
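The scaled values in this preview can be checked by hand with the min-max formula `(value - min) / (max - min)`. The sketch below assumes the raw training ranges are day 0-6, hour 0-23, and weather 0-9, which the notebook does not state explicitly; `min_max_scale` is a hypothetical helper, not part of scikit-learn.

```python
def min_max_scale(value, lo, hi):
    """Map value from [lo, hi] onto [0, 1]; lo and hi come from the training data."""
    return (value - lo) / (hi - lo)

# Assumed raw ranges: day 0-6, hour 0-23, weather 0-9.
print(round(min_max_scale(5, 0, 6), 6))    # 0.833333 (day 5)
print(round(min_max_scale(22, 0, 23), 6))  # 0.956522 (hour 22)
print(round(min_max_scale(3, 0, 9), 6))    # 0.333333 (weather 3)
```

Under those assumed ranges, the three results match the first row of the preview (index 1423), which would correspond to day 5, hour 22, and weather 3 before scaling.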


** Create the necessary tf.feature_column objects for the estimator. **

In [52]:
day_feature_column = tf.feature_column.numeric_column('day')
hour_feature_column = tf.feature_column.numeric_column('hour')
weather_feature_column = tf.feature_column.numeric_column('weather')

In [53]:
feat_cols = [day_feature_column,hour_feature_column,weather_feature_column]

** Create the input function for the estimator object. **

In [54]:
input_func = tf.estimator.inputs.pandas_input_fn(x=x_train,y=y_train,batch_size=10,
                                               num_epochs=1,shuffle=True)

** Create the estimator model using a DNNRegressor **

In [55]:
# 3 hidden layers of 6 neurons each (twice the 3 input features) is a reasonable starting point
model = tf.estimator.DNNRegressor(hidden_units=[6,6,6],feature_columns=feat_cols)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yh/ftwq6yz50j970hfx8k08g4880000gn/T/tmpna55gig6', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1086b3940>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


** Train the model **

In [56]:
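# steps=1000 is only an upper bound: input_func uses num_epochs=1, so training
# stops as soon as the training data has been seen once (step 118 in the log below)
model.train(input_fn=input_func,steps=1000)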

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/yh/ftwq6yz50j970hfx8k08g4880000gn/T/tmpna55gig6/model.ckpt.
INFO:tensorflow:loss = 33508.07, step = 1
INFO:tensorflow:global_step/sec: 572.688
INFO:tensorflow:loss = 13581.41, step = 101 (0.175 sec)
INFO:tensorflow:Saving checkpoints for 118 into /var/folders/yh/ftwq6yz50j970hfx8k08g4880000gn/T/tmpna55gig6/model.ckpt.
INFO:tensorflow:Loss for final step: 33193.812.


<tensorflow.python.estimator.canned.dnn.DNNRegressor at 0x1086b34e0>
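The log shows the final checkpoint at step 118 even though `steps=1000` was requested. A rough back-of-the-envelope check suggests why: with `num_epochs=1` the input function yields each training row once, so the number of steps is the number of batches in one pass. The sketch below assumes the CSV has one row per (day, hour, weather) combination, which is a guess about the dataset rather than something the notebook states.

```python
# Assumption (not stated in the notebook): one row per
# (day, hour, weather) combination, i.e. 7 * 24 * 10 rows.
n_rows = 7 * 24 * 10             # 1680, hypothetical row count

# test_size=0.3 holds out ceil(30%) of the rows; integer arithmetic avoids float error.
n_test = -(-n_rows * 3 // 10)
n_train = n_rows - n_test

batch_size = 10
steps_per_epoch = -(-n_train // batch_size)  # ceil(n_train / batch_size)

print(n_train, steps_per_epoch)  # 1176 118
```

If the assumption holds, one epoch is exhausted after 118 batches, which is consistent with the checkpoint step in the log. Training for longer would require a larger `num_epochs` (or `num_epochs=None`) in the input function.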

** Create a prediction input function and create a list of predictions on the test data. **

In [57]:
pred_input_func = tf.estimator.inputs.pandas_input_fn(x=x_test,batch_size=10,num_epochs=1,shuffle=False)

In [58]:
predictions = list(model.predict(pred_input_func))
print(predictions[:5])

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/yh/ftwq6yz50j970hfx8k08g4880000gn/T/tmpna55gig6/model.ckpt-118
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[{'predictions': array([48.224464], dtype=float32)}, {'predictions': array([41.28799], dtype=float32)}, {'predictions': array([21.93421], dtype=float32)}, {'predictions': array([27.125229], dtype=float32)}, {'predictions': array([45.599342], dtype=float32)}]


** Calculate the RMSE (Root Mean Squared Error). **

In [69]:
# each prediction is a dict holding a one-element array; pull out the scalar value
final_predictions = [pred['predictions'][0] for pred in predictions]

In [60]:
mean_squared_error(y_test,final_predictions)**0.5

121.06305816930738
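For reference, the RMSE is the square root of the mean of the squared differences between the true and predicted values, so it is expressed in the same units as the label (fatalities). A minimal plain-Python sketch on made-up numbers (the `rmse` helper and the toy values are illustrative only and are not taken from the dataset):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values: two perfect predictions and one that is off by 3.
# Squared errors are 0, 0, 9; their mean is 3; the RMSE is sqrt(3), about 1.73.
print(rmse([1, 2, 3], [1, 2, 6]))
```

Because the errors are squared before averaging, a few large misses raise the RMSE much more than many small ones.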