# Finding Indicators of Heavy Traffic on I-94

## Aim
Identify correlations between high traffic volumes and environmental variables such as weather and time of day, in order to better target strategies for reducing the negative impact of slow-moving traffic on the local economy.

## Methodology
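The data is likely suitable for parametric methods, and its relationships are likely monotonic, perhaps linear for date and time. At 48,204 rows, the data set is large enough for either calculation.

Therefore, correlations can best be calculated using:
- Pearson coefficient for continuous data with roughly linear relationships.
- Spearman rank coefficient for ordinal (ranked) data and for monotonic but non-linear relationships. Nominal categories such as weather_main have no natural order, so they need to be ranked or encoded first.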
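A minimal sketch of how the two coefficients differ, using pandas `DataFrame.corr` on synthetic data (not the I-94 data): `y` rises monotonically but not linearly with `x`, so Spearman's coefficient is 1 while Pearson's is lower.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: y = exp(x / 4) is monotonic but not linear in x.
x = np.arange(1, 21, dtype=float)
demo = pd.DataFrame({"x": x, "y": np.exp(x / 4)})

# DataFrame.corr returns a correlation matrix; pick out the x-y entry for each method.
pearson = demo.corr(method="pearson").loc["x", "y"]
spearman = demo.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")   # below 1: the relationship is not a straight line
print(f"Spearman: {spearman:.3f}")  # 1.000: the ranks of x and y agree exactly
```

`DataFrame.corr` is used rather than `Series.corr` because, in pandas, the Series version delegates Spearman to SciPy while the DataFrame version does not.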

## Results
- Strong correlations
- Weak correlations


# Data Set
## Source:
John Hogue, john.d.hogue '@' live.com, Social Data Science & General Mills

Available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume).

## Data Set Information:
Hourly Interstate 94 westbound traffic volume for MN DoT ATR station 301, roughly midway between Minneapolis and St Paul, MN. Hourly weather features and holidays are included to capture their impact on traffic volume.



## Data Dictionary
|#|Column|Type|Description|
|---|:----|:---|:------|
|0|holiday|Categorical|US national holidays plus a regional holiday, the Minnesota State Fair|
|1|temp|Numeric|Average temperature in kelvin|
|2|rain_1h|Numeric|Amount of rain, in mm, that occurred in the hour|
|3|snow_1h|Numeric|Amount of snow, in mm, that occurred in the hour|
|4|clouds_all|Numeric|Percentage of cloud cover|
|5|weather_main|Categorical|Short textual description of the current weather|
|6|weather_description|Categorical|Longer textual description of the current weather|
|7|date_time|DateTime|Hour the data was collected, in local CST|
|8|traffic_volume|Numeric|Hourly I-94 ATR 301 reported westbound traffic volume|
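Two columns in the dictionary are worth converting at load time: `temp` is in kelvin, and `date_time` is read as plain text (`object`) unless parsed. A minimal sketch, assuming a Celsius column is wanted for readability; it runs on two rows copied from the `head()` output rather than the full CSV, and `temp_c` is a hypothetical column name.

```python
import io
import pandas as pd

# Two rows copied from traffic.head(); the empty first field is the blank holiday value.
sample_csv = io.StringIO(
    "holiday,temp,rain_1h,snow_1h,clouds_all,weather_main,weather_description,date_time,traffic_volume\n"
    ",288.28,0.0,0.0,40,Clouds,scattered clouds,2012-10-02 09:00:00,5545\n"
    ",289.36,0.0,0.0,75,Clouds,broken clouds,2012-10-02 10:00:00,4516\n"
)

# parse_dates turns the date_time strings into datetime64 values.
sample = pd.read_csv(sample_csv, parse_dates=["date_time"])

# Convert kelvin to degrees Celsius (0 °C = 273.15 K); temp_c is a hypothetical column name.
sample["temp_c"] = sample["temp"] - 273.15

print(sample[["date_time", "temp", "temp_c"]])
print(sample["date_time"].dtype)  # a datetime64 dtype
```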

```python
import pandas as pd

# Load the UCI Metro Interstate Traffic Volume CSV from the working directory
traffic = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
traffic.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48204 entries, 0 to 48203
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   holiday              48204 non-null  object 
 1   temp                 48204 non-null  float64
 2   rain_1h              48204 non-null  float64
 3   snow_1h              48204 non-null  float64
 4   clouds_all           48204 non-null  int64  
 5   weather_main         48204 non-null  object 
 6   weather_description  48204 non-null  object 
 7   date_time            48204 non-null  object 
 8   traffic_volume       48204 non-null  int64  
dtypes: float64(3), int64(2), object(4)
memory usage: 3.3+ MB
```


```python
traffic.head()
```

| |holiday|temp|rain_1h|snow_1h|clouds_all|weather_main|weather_description|date_time|traffic_volume|
|---|---|---|---|---|---|---|---|---|---|
|0| |288.28|0.0|0.0|40|Clouds|scattered clouds|2012-10-02 09:00:00|5545|
|1| |289.36|0.0|0.0|75|Clouds|broken clouds|2012-10-02 10:00:00|4516|
|2| |289.58|0.0|0.0|90|Clouds|overcast clouds|2012-10-02 11:00:00|4767|
|3| |290.13|0.0|0.0|90|Clouds|overcast clouds|2012-10-02 12:00:00|5026|
|4| |291.14|0.0|0.0|75|Clouds|broken clouds|2012-10-02 13:00:00|4918|


```python
traffic.tail()
```

| |holiday|temp|rain_1h|snow_1h|clouds_all|weather_main|weather_description|date_time|traffic_volume|
|---|---|---|---|---|---|---|---|---|---|
|48199| |283.45|0.0|0.0|75|Clouds|broken clouds|2018-09-30 19:00:00|3543|
|48200| |282.76|0.0|0.0|90|Clouds|overcast clouds|2018-09-30 20:00:00|2781|
|48201| |282.73|0.0|0.0|90|Thunderstorm|proximity thunderstorm|2018-09-30 21:00:00|2159|
|48202| |282.09|0.0|0.0|90|Clouds|overcast clouds|2018-09-30 22:00:00|1450|
|48203| |282.12|0.0|0.0|90|Clouds|overcast clouds|2018-09-30 23:00:00|954|


# General Distribution of Traffic Volume
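One way to start is a summary of `traffic_volume` plus counts in equal-width bins. A minimal sketch: `volume_bins` is a hypothetical helper built on `pandas.cut`, and the demo uses only the ten `traffic_volume` values from the head and tail listings, so its numbers say nothing about the full data set.

```python
import pandas as pd

def volume_bins(traffic: pd.DataFrame, bins: int = 10) -> pd.Series:
    """Count rows in equal-width traffic_volume bins, in bin order (hypothetical helper)."""
    return pd.cut(traffic["traffic_volume"], bins=bins).value_counts(sort=False)

# The ten traffic_volume values from the head() and tail() listings.
demo = pd.DataFrame({"traffic_volume": [5545, 4516, 4767, 5026, 4918,
                                        3543, 2781, 2159, 1450, 954]})

print(demo["traffic_volume"].describe())
counts = volume_bins(demo, bins=5)
print(counts)
```

On the full frame, `traffic["traffic_volume"].plot.hist(bins=...)` draws the same idea as a chart, provided matplotlib is installed.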


# Day / Night Distribution of Traffic Volume
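The data set does not define day and night, so a split needs an assumed cut-off. A minimal sketch, assuming day runs from 07:00 up to 19:00; `split_day_night`, the `period` column, and the cut-off hours are all hypothetical choices, and the demo uses four rows from the head and tail listings.

```python
import pandas as pd

def split_day_night(traffic: pd.DataFrame, day_start: int = 7, night_start: int = 19) -> pd.DataFrame:
    """Label each row 'day' or 'night' from the hour of date_time.

    day_start and night_start are assumed cut-offs, not values from the data set.
    """
    hours = pd.to_datetime(traffic["date_time"]).dt.hour
    is_day = (hours >= day_start) & (hours < night_start)
    return traffic.assign(period=is_day.map({True: "day", False: "night"}))

demo = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2012-10-02 13:00:00",
                  "2018-09-30 19:00:00", "2018-09-30 23:00:00"],
    "traffic_volume": [5545, 4918, 3543, 954],
})
labelled = split_day_night(demo)
# Summary statistics of traffic_volume for each period.
print(labelled.groupby("period")["traffic_volume"].describe())
```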

# Monthly Distribution of Traffic Volume
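A minimal sketch of a monthly breakdown: extract the month from `date_time` and average `traffic_volume` per month, pooling all years together. `mean_volume_by_month` is a hypothetical helper, and the demo rows come from the head and tail listings.

```python
import pandas as pd

def mean_volume_by_month(traffic: pd.DataFrame) -> pd.Series:
    """Mean traffic_volume per calendar month (1 = January), pooled across years."""
    month = pd.to_datetime(traffic["date_time"]).dt.month.rename("month")
    return traffic.groupby(month)["traffic_volume"].mean()

demo = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2012-10-02 10:00:00",
                  "2018-09-30 19:00:00", "2018-09-30 23:00:00"],
    "traffic_volume": [5545, 4516, 3543, 954],
})
by_month = mean_volume_by_month(demo)
print(by_month)
```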

# Daily Distribution of Traffic Volume
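For a day-of-week breakdown, pandas `Series.dt.dayofweek` numbers weekdays from Monday = 0 to Sunday = 6. A minimal sketch; `mean_volume_by_weekday` is a hypothetical helper, and the demo rows (a Tuesday in 2012 and a Sunday in 2018) come from the head and tail listings.

```python
import pandas as pd

def mean_volume_by_weekday(traffic: pd.DataFrame) -> pd.Series:
    """Mean traffic_volume per weekday; pandas numbers Monday as 0 and Sunday as 6."""
    weekday = pd.to_datetime(traffic["date_time"]).dt.dayofweek.rename("weekday")
    return traffic.groupby(weekday)["traffic_volume"].mean()

demo = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2012-10-02 10:00:00",   # Tuesday
                  "2018-09-30 19:00:00", "2018-09-30 23:00:00"],  # Sunday
    "traffic_volume": [5545, 4516, 3543, 954],
})
by_weekday = mean_volume_by_weekday(demo)
print(by_weekday)
```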

# Hourly Distribution of Traffic Volume
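A minimal sketch of an hourly profile: average `traffic_volume` for each hour of the day and report the busiest hour. `mean_volume_by_hour` is a hypothetical helper; the demo uses four rows from the head and tail listings, so its peak hour is illustrative only.

```python
import pandas as pd

def mean_volume_by_hour(traffic: pd.DataFrame) -> pd.Series:
    """Mean traffic_volume for each hour of the day (0-23)."""
    hour = pd.to_datetime(traffic["date_time"]).dt.hour.rename("hour")
    return traffic.groupby(hour)["traffic_volume"].mean()

demo = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2012-10-02 10:00:00",
                  "2018-09-30 22:00:00", "2018-09-30 23:00:00"],
    "traffic_volume": [5545, 4516, 1450, 954],
})
by_hour = mean_volume_by_hour(demo)
print(by_hour)
print("Busiest hour:", by_hour.idxmax())
```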

# Correlation: Traffic Volume and Weather

## Continuous variables: 
- temp
- rain_1h
- snow_1h
- clouds_all
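A minimal sketch that applies both coefficients from the Methodology to each continuous weather column against `traffic_volume`. `weather_correlations` is a hypothetical helper, and the demo uses only the ten rows from the head and tail listings, so the values are illustrative, not results. `rain_1h` and `snow_1h` are 0.0 in all ten rows, so their coefficients come out as NaN (a constant column has no variance to correlate).

```python
import pandas as pd

CONTINUOUS = ["temp", "rain_1h", "snow_1h", "clouds_all"]

def weather_correlations(traffic: pd.DataFrame) -> pd.DataFrame:
    """Pearson and Spearman coefficients of each continuous weather column with traffic_volume."""
    cols = CONTINUOUS + ["traffic_volume"]
    return pd.DataFrame({
        method: traffic[cols].corr(method=method)["traffic_volume"].drop("traffic_volume")
        for method in ("pearson", "spearman")
    })

# The ten rows from the head() and tail() listings (weather and volume columns only).
demo = pd.DataFrame({
    "temp": [288.28, 289.36, 289.58, 290.13, 291.14, 283.45, 282.76, 282.73, 282.09, 282.12],
    "rain_1h": [0.0] * 10,
    "snow_1h": [0.0] * 10,
    "clouds_all": [40, 75, 90, 90, 75, 75, 90, 90, 90, 90],
    "traffic_volume": [5545, 4516, 4767, 5026, 4918, 3543, 2781, 2159, 1450, 954],
})
table = weather_correlations(demo)
print(table)
```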

## Categorical variables:
- weather_main
- weather_description
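Spearman's coefficient needs an ordering, and the values of weather_main (Clouds, Thunderstorm, ...) have none. One option is to swap in a different measure, the correlation ratio η, which gives the share of variance in `traffic_volume` explained by the category means (0 = none, 1 = all):

$$\eta^2 = \frac{\sum_k n_k(\bar{y}_k - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$

A minimal sketch; `correlation_ratio` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd

def correlation_ratio(categories, values) -> float:
    """Correlation ratio (eta) between a nominal variable and a numeric one.

    eta**2 = between-category sum of squares / total sum of squares.
    Returns NaN when the numeric values have no variance.
    """
    df = pd.DataFrame({"cat": list(categories), "val": list(values)}).astype({"val": float})
    grand_mean = df["val"].mean()
    groups = df.groupby("cat")["val"]
    # Each category mean's squared distance from the grand mean, weighted by category size.
    ss_between = (groups.count() * (groups.mean() - grand_mean) ** 2).sum()
    ss_total = ((df["val"] - grand_mean) ** 2).sum()
    if ss_total == 0:
        return float("nan")
    return float(np.sqrt(ss_between / ss_total))

# Toy check: categories that fully separate the values give eta = 1,
# categories with identical means give eta = 0.
print(correlation_ratio(["a", "a", "b", "b"], [1, 1, 5, 5]))  # 1.0
print(correlation_ratio(["a", "b", "a", "b"], [1, 1, 5, 5]))  # 0.0
```

On the full data this would be called as `correlation_ratio(traffic["weather_main"], traffic["traffic_volume"])`, and likewise for weather_description.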