# Guided Project: Analysis of the I-94 Traffic Dataset.

This project analyses the I-94 traffic dataset to identify the most significant drivers of heavy traffic on the route and to draw out insights that could help reduce it.

The goal of this project is to determine indicators of heavy westbound traffic on the I-94 highway on monthly, weekly and hourly time frames, using weather indicators and time indicators.

Question: What is the greatest contributor to traffic slowness on the I-94 highway?

## 1. Exploring the Dataset

In [6]:
#Import the dataset into a pandas dataframe
import pandas as pd

traffic_data = pd.read_csv("Metro_Interstate_Traffic_Volume.csv")

print(traffic_data.head())

print(traffic_data.tail())

print(traffic_data.info())

ModuleNotFoundError: No module named 'pandas'

In [None]:
#Import matplotlib library to enable visualizations of the dataset.
import matplotlib.pyplot as plt
%matplotlib inline 
#Needed for jupyter to generate visualizations.


#Plot a quick histogram to get an idea of how traffic volume is distributed in the dataset.
traffic_data["traffic_volume"].plot.hist()

In [None]:
#Use the dataframe series describe function to look into stats about traffic volume.
traffic_data["traffic_volume"].describe()



# Time Indicators

From the above information, it is worthwhile to consider the effects of daytime and nighttime on traffic volume on the I-94 highway. 25% of the time, 1,193 cars or fewer pass per hour, which suggests either nighttime or road downtime.
75% of the time, 4,933 cars or fewer pass per hour, so the busiest quarter of hours exceeds that figure, most likely during daytime travel.
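The quartile reading above can be sketched on a small example. The counts below are invented for illustration and are not taken from the I-94 data; the point is only that the `25%` and `75%` rows of `describe()` are the same values that `quantile(0.25)` and `quantile(0.75)` return.

```python
import pandas as pd

# Hypothetical hourly counts, invented for illustration (not I-94 data).
hourly_counts = pd.Series([200, 400, 1200, 3000, 4500, 5000, 5200, 6000])

# The quartiles that describe() reports in its "25%" and "75%" rows.
q25 = hourly_counts.quantile(0.25)
q75 = hourly_counts.quantile(0.75)

# Share of observations at or below each quartile.
share_at_or_below_q25 = (hourly_counts <= q25).mean()
share_at_or_below_q75 = (hourly_counts <= q75).mean()

print(q25, q75)
print(share_at_or_below_q25, share_at_or_below_q75)
```

On the real dataset the same two calls on `traffic_data["traffic_volume"]` should reproduce the quartiles quoted in the text.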

A good starting point for analysis would be to section the dataset into two parts (daytime and nighttime) to give our analysis better direction.

Daytime data: 7 a.m. to 7 p.m. (12 hours)

Nighttime data: 7 p.m. to 7 a.m. (12 hours)

NB: The time periods chosen are arbitrary and serve only as a rough guide.
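As a minimal sketch of the split described above, the hypothetical helper `is_daytime` below treats 7 a.m. as the first daytime hour and 7 p.m. as the first nighttime hour, so every hour of the day falls into exactly one group. The helper name is an assumption made for illustration and is not part of the dataset or of pandas.

```python
def is_daytime(hour: int) -> bool:
    """Hypothetical helper: True for hours 7 to 18, i.e. 7 a.m. up to but not including 7 p.m."""
    return 7 <= hour < 19


# Label each of the 24 hours and collect the two groups.
day_hours = [hour for hour in range(24) if is_daytime(hour)]
night_hours = [hour for hour in range(24) if not is_daytime(hour)]

print("Daytime hours:  ", day_hours)
print("Nighttime hours:", night_hours)
```

Both groups contain 12 hours, which matches the 12-hour windows listed above.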


In [None]:
#Transform the date_time column to a pandas datetime object to enable ease in working with month, week, and hour metrics.
traffic_data["date_time"] = pd.to_datetime(traffic_data["date_time"])
print(traffic_data["date_time"].head())
print(type(traffic_data["date_time"]))



In [None]:
# Start with "hour" metric by using the dt.hour attribute to extract the hours in the date_time column in the traffic data dataset.
traffic_data["hour"] = traffic_data["date_time"].dt.hour

print(traffic_data["hour"].head())

In [None]:
#The code block below isolates daytime and nighttime hour data to enable a clearer and more distinct analysis.
daytime_data = traffic_data[(traffic_data["hour"] >= 7) & (traffic_data["hour"] < 19)]
print(daytime_data.head())

nighttime_data = traffic_data[(traffic_data["hour"] >= 19) | (traffic_data["hour"] < 7)]
print(nighttime_data.head())


In [None]:
#Plotting a simple visualization to display the relationship existing between day and night traffic volumes on an hourly basis.

#Create a figure to hold the two histograms side by side.
plt.figure(figsize=(11, 3.5))

plt.subplot(1, 2, 1)
plt.hist(daytime_data["traffic_volume"])
plt.title("Traffic Volume - Daytime")
plt.xlabel("Traffic volume")
plt.ylabel("Frequency")
plt.ylim([0, 8000])
plt.xlim([-500, 8000])  #The x-axis shows traffic volume, not the hour of day.

plt.subplot(1, 2, 2)
plt.hist(nighttime_data["traffic_volume"])
plt.title("Traffic Volume - Nighttime")
plt.xlabel("Traffic volume")
plt.ylabel("Frequency")
plt.ylim([0, 8000])
plt.xlim([-500, 8000])  #Same axis limits as the daytime plot for a fair comparison.

plt.show()





NameError: name 'plt' is not defined

From the above visualizations, the daytime histogram indicates higher traffic volume during the day, whilst the nighttime histogram indicates light traffic at night. Including nighttime data in this analysis should therefore be reconsidered.

Since our main goal is to find indicators of heavy traffic, working solely with daytime data is the better choice.

The next steps involve working with time indicators such as month, day of the week and time of day to characterize heavy traffic behavior.
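The three time indicators can be pulled out of a datetime column with the pandas `.dt` accessor. The sketch below uses three made-up timestamps rather than the I-94 data, purely to show what each attribute returns.

```python
import pandas as pd

# Hypothetical timestamps, invented for illustration (not I-94 data).
timestamps = pd.to_datetime(pd.Series([
    "2016-01-04 08:00:00",  # a Monday
    "2016-07-09 17:00:00",  # a Saturday
    "2016-12-25 23:00:00",  # a Sunday
]))

# Extract the month (1-12), day of week (0 = Monday) and hour (0-23).
parts = pd.DataFrame({
    "month": timestamps.dt.month,
    "dayofweek": timestamps.dt.dayofweek,
    "hour": timestamps.dt.hour,
})

print(parts)
```

Each extracted column can then serve as a `groupby` key for averaging traffic volume.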



In [None]:
#Compute the mean traffic volume for each month (January to December)
#to show which months have the highest traffic volume.

traffic_data["month"] = traffic_data["date_time"].dt.month
#numeric_only=True skips text columns, which recent pandas versions refuse to average.
by_month = traffic_data.groupby("month").mean(numeric_only=True)
print(by_month["traffic_volume"])

#Generate a line plot to visualize the traffic volume over the months of the year.
plt.plot(by_month["traffic_volume"])
plt.title("Traffic Volume - Months of the Year")
plt.xticks([1,2,3,4,5,6,7,8,9,10,11,12],['January','February','March','April','May','June','July',"August","September","October","November","December"],rotation=45)
plt.show()

From the line plot generated above for monthly traffic volume intensity, a steady rise was observed over the first half of the year with a sudden drop indicated in July. A drastic drop was also observed in the month of December.

In [None]:
# Compute the mean traffic volume for each day of the week.

traffic_data["dayofweek"] = traffic_data["date_time"].dt.dayofweek
# numeric_only=True skips text columns, which recent pandas versions refuse to average.
by_dayofweek = traffic_data.groupby("dayofweek").mean(numeric_only=True)
print(by_dayofweek["traffic_volume"])

# NB: 0 is Monday, 1 is Tuesday and so forth.

# Generate a line plot to visualize the traffic volume over the days of the week.
plt.plot(by_dayofweek["traffic_volume"])
plt.title("Traffic Volume - Days of the Week")
plt.xticks([0,1,2,3,4,5,6],['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'],rotation=45)
plt.show()

From the above visualization, traffic volume declines significantly on Saturdays and Sundays, so the weekend carries less traffic than weekdays.

In [None]:
# Prepare hourly averages to capture traffic volume over the hours of each day.
# Because weekends pull down the overall averages, it makes more sense to split the data into weekdays and weekends.
traffic_data["hour"] = traffic_data["date_time"].dt.hour
week_days = traffic_data[traffic_data["dayofweek"] <= 4].copy() # 0-4 == Monday-Friday
week_end = traffic_data[traffic_data["dayofweek"] >= 5].copy() # 5-6 == Saturday-Sunday
by_hour_weekday = week_days.groupby("hour").mean(numeric_only=True)
by_hour_weekend = week_end.groupby("hour").mean(numeric_only=True)

print(by_hour_weekday["traffic_volume"])
print(by_hour_weekend["traffic_volume"])



In [None]:
# Plot line graphs side by side to visualize how traffic volume changes through the hours of the day.

#For the weekday data.
plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.plot(by_hour_weekday["traffic_volume"])
plt.title("Traffic Volume - Weekdays")
plt.xlabel("Hour of day")
plt.ylabel("Mean traffic volume")
plt.ylim([0,7000])
plt.xlim([0,23])  #Show all 24 hours, including the evening.


#For the weekend data.
plt.subplot(1,2,2)
plt.plot(by_hour_weekend["traffic_volume"])
plt.title("Traffic Volume - Weekend")
plt.xlabel("Hour of day")
plt.ylabel("Mean traffic volume")
plt.ylim([0,7000])
plt.xlim([0,23])

plt.show()




The above visualization shows how traffic volume varies over the hours of the day. With the data split into weekdays and weekends, weekday traffic volume peaks in the later hours of the day (i.e. 6 p.m. - 7 p.m.), whilst weekend traffic volume stays comparatively low throughout the day.

A high-level summary can be drawn from the visualizations on the monthly, weekly and hourly time scales.
1. Traffic is generally heavier during the warmer months, i.e. March - October, than during the colder months, i.e. November - February.

2. Traffic is heavier on weekdays than on weekends.

3. On weekdays (business days), peak traffic volume occurs in the later hours of the day, i.e. 4 p.m. - 8 p.m.
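The peak hour in each group can also be read off numerically rather than from the plot. The sketch below uses a small invented table (not I-94 data) and `idxmax()` to return the hour with the highest mean volume for weekdays and weekends separately.

```python
import pandas as pd

# Hypothetical observations, invented for illustration (not I-94 data).
sample = pd.DataFrame({
    "dayofweek": [0, 0, 0, 1, 1, 1, 5, 5, 6, 6],
    "hour": [7, 12, 16, 7, 12, 16, 12, 16, 12, 16],
    "traffic_volume": [6000, 4500, 6200, 5800, 4700, 6400, 4200, 4000, 3800, 3600],
})

# Monday to Friday are encoded as 0 to 4.
is_weekday = sample["dayofweek"] <= 4

# Mean volume per hour within each group.
weekday_by_hour = sample[is_weekday].groupby("hour")["traffic_volume"].mean()
weekend_by_hour = sample[~is_weekday].groupby("hour")["traffic_volume"].mean()

# idxmax() returns the index label (the hour) of the largest mean.
weekday_peak_hour = weekday_by_hour.idxmax()
weekend_peak_hour = weekend_by_hour.idxmax()

print("Weekday peak hour:", weekday_peak_hour)
print("Weekend peak hour:", weekend_peak_hour)
```

Applied to `by_hour_weekday["traffic_volume"]` and `by_hour_weekend["traffic_volume"]`, the same call would give the peak hours for the real data.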

# Weather Indicators

Weather is another plausible contributor to heavy traffic. The dataset contains several weather metrics, e.g. temperature and weather description.

Correlation values give a first measure of the relationship between traffic volume and the numeric weather metrics provided by the dataset.
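As a rough illustration of what the correlation step computes, the sketch below builds a tiny invented table (not I-94 data) and correlates every numeric column with `traffic_volume`. Selecting numeric columns first with `select_dtypes` is a choice made here so that text columns such as `weather_main` do not interfere.

```python
import pandas as pd

# Hypothetical observations, invented for illustration (not I-94 data).
sample = pd.DataFrame({
    "temp": [270.0, 280.0, 290.0, 300.0],
    "rain_1h": [0.0, 2.0, 0.0, 1.0],
    "weather_main": ["Snow", "Rain", "Clear", "Rain"],
    "traffic_volume": [3000, 3500, 4000, 4500],
})

# Keep only numeric columns, then take each column's Pearson correlation with traffic volume.
numeric_columns = sample.select_dtypes(include="number")
correlations = numeric_columns.corr()["traffic_volume"]

print(correlations)
```

In this made-up table `temp` rises in lockstep with `traffic_volume`, so its coefficient is 1; real data would give much less tidy values.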

In [None]:
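#Restrict to numeric columns; recent pandas versions raise an error when text columns are passed to corr().
traffic_data.select_dtypes(include="number").corr()["traffic_volume"]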



From the correlation values above, temp has the strongest correlation with traffic volume on the I-94 highway among the weather metrics.

A scatter plot of the two metrics shows the relationship visually.

In [None]:
plt.figure(figsize=(6,4))
plt.scatter(traffic_data["traffic_volume"], traffic_data['temp'])
plt.title("Traffic Volume and Temperature")
plt.xlabel("Traffic Volume")
plt.ylabel("Temperature (K)")
plt.ylim([230,320])
plt.xlim([-500,8000])

plt.show()

The above visualization shows that temperature is not a reliable indicator of heavy traffic, in spite of having the leading correlation value among the weather metrics.

A more promising route to finding indicators of heavy traffic is to look into the categorical weather-related columns: weather_main and weather_description.
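For a categorical column, the natural summary is the mean traffic volume per category. The sketch below shows the idea on a small invented table (not I-94 data), sorting the categories so the heaviest one comes first.

```python
import pandas as pd

# Hypothetical observations, invented for illustration (not I-94 data).
sample = pd.DataFrame({
    "weather_main": ["Clear", "Clouds", "Clouds", "Rain", "Rain", "Snow"],
    "traffic_volume": [3000, 3600, 3800, 3200, 3400, 3100],
})

# Mean traffic volume per weather category, heaviest first.
mean_by_weather = (
    sample.groupby("weather_main")["traffic_volume"]
    .mean()
    .sort_values(ascending=False)
)

print(mean_by_weather)
```

Selecting the `traffic_volume` column before calling `mean()` keeps the result a single Series and avoids averaging unrelated columns.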



In [None]:
#Average the numeric columns within each weather category.
by_weather_main = traffic_data.groupby('weather_main').mean(numeric_only=True)
by_weather_description = traffic_data.groupby('weather_description').mean(numeric_only=True)

In [None]:
by_weather_main['traffic_volume'].plot.barh()
plt.xlabel('Traffic Volume')
plt.show()

From the visualization above, the weather categories associated with the heaviest traffic volume include Clouds and Haze. The highest mean traffic volume falls within the 3,000 - 4,000 car range.



In [None]:
plt.figure(figsize=(7,10))
by_weather_description["traffic_volume"].plot.barh()
plt.xlabel("Traffic Volume")
plt.show()

According to the visualization above, mean traffic volume exceeds 5,000 for weather_description categories such as shower snow. Other notable categories include sleet, proximity shower rain, light shower snow and freezing rain.
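The categories above the 5,000 mark can be listed directly instead of being read off the bar chart. The sketch below applies that threshold to a small invented table (not I-94 data); the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical observations, invented for illustration (not I-94 data).
sample = pd.DataFrame({
    "weather_description": [
        "shower snow", "shower snow", "light rain", "light rain", "sky is clear",
    ],
    "traffic_volume": [5600, 5200, 4700, 4900, 4300],
})

# Mean traffic volume per description.
mean_by_description = sample.groupby("weather_description")["traffic_volume"].mean()

# Keep only the descriptions whose mean exceeds the threshold used in the text.
threshold = 5000
heavy_traffic_descriptions = mean_by_description[mean_by_description > threshold]

print(heavy_traffic_descriptions)
```

The same boolean filter on `by_weather_description["traffic_volume"]` would list the qualifying categories for the real data.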

# Conclusion 

After analysing the weather and time indicators, the following notes can be made:

Time indicators

i. Traffic is usually heavier during warm months (March to October) than during cold months (November to February).
ii. Traffic is usually heavier on business days than on weekends.
iii. On business days, the rush hours are around 7:00 and 16:00.

Weather indicators

i. Shower snow
ii. Light rain and snow
iii. Proximity thunderstorm with drizzle