<h1 id="tfghcccitle" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> TPS-MAR2022 | Forecasting twelve hours of traffic flow</h1>


<center><img src="https://accoladetechnology.com/wp-content/uploads/2018/01/iStock-614703824.jpg"></center>

## Goal
We will forecast twelve hours of traffic flow in a major U.S. metropolitan area. Time, space, and directional features give us the chance to model interactions across a network of roadways.

## Data

**train.csv** - the training set, comprising measurements of traffic congestion across 65 roadways from April through September of 1991.
* row_id - a unique identifier for this instance
* time - the 20-minute period in which each measurement was taken
* x - the east-west midpoint coordinate of the roadway
* y - the north-south midpoint coordinate of the roadway
* direction - the direction of travel of the roadway. EB indicates "eastbound" travel, for example, while SW indicates a "southwest" direction of travel.
* congestion - congestion levels for the roadway during each hour; the target. The congestion measurements have been normalized to the range 0 to 100.

**test.csv** - the test set; you will make hourly predictions for roadways identified by a coordinate location and a direction of travel on the day of 1991-09-30.

**sample_submission.csv** - a sample submission file in the correct format

<h1 id="eetiteele" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> Libraries </h1>

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.metrics import mean_absolute_error
sns.set()

<h1 id="titssjjkvle" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> Load data </h1>
Remember that pandas can parse date columns while reading a dataset (see the `parse_dates` argument below).

In [None]:
path='../input/tabular-playground-series-mar-2022'

train=pd.read_csv(path+'/train.csv', parse_dates=['time'])
test=pd.read_csv(path+'/test.csv', parse_dates=['time'])
sample=pd.read_csv(path+'/sample_submission.csv')
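
To see what `parse_dates` changes, here is a minimal sketch on an in-memory CSV. The two rows are made up for illustration and only mimic the column layout of train.csv.

```python
import io
import pandas as pd

# Hypothetical two-row sample with the same columns as train.csv
csv_text = """row_id,time,x,y,direction,congestion
0,1991-04-01 00:00:00,0,0,EB,70
1,1991-04-01 00:20:00,0,0,EB,65
"""

# Without parse_dates the time column is read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))
# With parse_dates it becomes a datetime column, so the .dt accessor works
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'])

print(pd.api.types.is_datetime64_any_dtype(raw['time']))
print(pd.api.types.is_datetime64_any_dtype(parsed['time']))
print(parsed['time'].dt.minute.tolist())
```

Parsing at load time means the datetime features further down can be extracted directly with `.dt`.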

<h1 id="tikhgdstle" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> Understanding directions and roadways </h1>

## Quick Look at Data

In [None]:
train

## Unique directions

In [None]:
list(train.direction.unique())

## Let's see how many directions there are for each location


In [None]:
#Get directions for each location
dirs_per_loc=train.groupby(['x','y']).direction.unique().reset_index()
dirs_per_loc

The x and y columns give the midpoint of each location, and each location may have between 3 and 8 directions.
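
A quick way to check the number of directions per location is to count them with `nunique`. The sketch below runs on a small made-up frame (the name `toy` and its values are assumptions); on the real data the same calls would be applied to `train`.

```python
import pandas as pd

# Hypothetical toy frame with the same columns as train
toy = pd.DataFrame({
    'x': [0, 0, 0, 1, 1, 1, 1],
    'y': [0, 0, 0, 2, 2, 2, 2],
    'direction': ['EB', 'WB', 'NB', 'EB', 'WB', 'NB', 'SB'],
})

# Number of distinct directions at each (x, y) location
n_dirs = (toy.groupby(['x', 'y']).direction
             .nunique()
             .reset_index(name='n_directions'))

# Number of distinct roadways, i.e. (x, y, direction) combinations
n_roadways = toy.groupby(['x', 'y', 'direction']).ngroups

print(n_dirs)
print(n_roadways)
```

Taking `n_dirs.n_directions.min()` and `.max()` on the real data gives the range of directions per location.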

## The mean congestion for each roadway
x, y, direction -> Roadway

In [None]:
#Group by roadway
mean_con_per_road=train.groupby(['x','y','direction']).congestion.mean().reset_index()
mean_con_per_road

<div class="alert alert-info"><h3>  As you can see there are 65 roadways </h3>
</div>

## Let's convert directions to coordinates according to the following tip

<center><img src="https://s22.picofile.com/file/8448096142/2022_03_06_17_25_211.jpg"></center>

### The function below returns coordinates for each direction

In [None]:
#Unit offsets [dx, dy] for each direction of travel
DIR_NUMS = {'EB':[1,0],
            'NB':[0,1],
            'SB':[0,-1],
            'WB':[-1,0],
            'NE':[1,1],
            'SW':[-1,-1],
            'NW':[-1,1],
            'SE':[1,-1]}

def get_dir_num(dir_str):
    #Return [dx, dy] for a direction code; an unknown code raises KeyError
    return DIR_NUMS[dir_str]
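
If the offsets are needed as columns rather than one call per row, the same lookup can be applied with `Series.map`. This is only a sketch: `DIR_OFFSETS` and `toy` are assumed names, and the table repeats the direction-to-offset mapping used in `get_dir_num`.

```python
import pandas as pd

# Direction code -> (dx, dy) offset, repeating the mapping used in get_dir_num
DIR_OFFSETS = {'EB': (1, 0), 'NB': (0, 1), 'SB': (0, -1), 'WB': (-1, 0),
               'NE': (1, 1), 'SW': (-1, -1), 'NW': (-1, 1), 'SE': (1, -1)}

# Hypothetical toy frame
toy = pd.DataFrame({'direction': ['EB', 'SW', 'NW']})

# Look up the x and y offset of every row
toy['dx'] = toy['direction'].map(lambda d: DIR_OFFSETS[d][0])
toy['dy'] = toy['direction'].map(lambda d: DIR_OFFSETS[d][1])

print(toy)
```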

## A simple way to visualize roadways and their Avg congestion
By iterating over the rows of the dataframe 'mean_con_per_road' that we created earlier, we can draw each roadway with plt.arrow(). The mean congestion (scaled to 0-1) is used as alpha, so more congested roadways appear more opaque.


In [None]:
plt.figure(figsize=(10,10))

#Plot directions for each location 
for index,row in mean_con_per_road.iterrows():
    plt.arrow(row.x,row.y,  # x,y
              get_dir_num(row.direction)[0]/4, #keep the arrows short by dividing
              get_dir_num(row.direction)[1]/4, 
              head_width = 0.02,
              linewidth= 5, 
              color='red',
              alpha=row.congestion/100 #Use mean congestion as alpha
             )
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Roadways & Avg. Congestion intensity',size=15)
plt.show()

<h1 id="tikkgggtle" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> Distribution of congestion </h1>

In [None]:
plt.figure(figsize=(12,7))
sns.histplot(x=train.congestion).set_title('Distribution of congestion',size=15)
plt.axvline(x=train.congestion.mean(),c='red',ls=':',label='Mean')
plt.axvline(x=train.congestion.median(),c='green',ls=':',label='Median')

plt.legend()
plt.show()

<h1 id="sdffajk" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> Roadways with Avg congestion greater than 50 </h1>

In [None]:
tmp=mean_con_per_road[mean_con_per_road.congestion>50].sort_values(by='congestion',ascending=False)
tmp.head(10)

In [None]:
#Combine x,y
tmp['Location']='x'+ tmp.x.astype(str) + 'y'+tmp.y.astype(str)
#Plot
fig = px.bar(tmp, y='congestion',
                  x='Location',
                  color='direction',
                  barmode='group',
                  labels={'Location':'Location','congestion': 'Avg. congestion'},
                  height=400,
                  title="Roadways with Avg congestion greater than 50"
                   )
fig.show()

<div class="alert alert-info"><h3> x2, y0, WB is the most congested roadway </h3>

</div>



<h1 id="wekjds" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> Roadways with Avg congestion less than 50 </h1>

In [None]:
tmp=mean_con_per_road[mean_con_per_road.congestion<50].sort_values(by='congestion',ascending=True)
tmp.head(10)

In [None]:
#Combine x,y
tmp['Location']='x'+ tmp.x.astype(str) + 'y'+tmp.y.astype(str)
#Plot
fig = px.bar(tmp, y='congestion',
                  x='Location',
                  color='direction',
                  barmode='group',
                  labels={'Location':'Location','congestion': 'Avg. congestion'},
                  height=400,
                  title="Roadways with Avg congestion less than 50"
                   )
fig.show()

<div class="alert alert-info"><h3> x2, y3, SW is the least congested roadway </h3>
</div>


<h1 id="werhd" style="color:white;background:#0087B6;padding:8px;border-radius:8px"> Working with datetime feature </h1>

### Separating month, week, day, weekday, hour and minute

In [None]:
def add_datetime_features(df):
    df['Month']=df['time'].dt.month
    df['week']=df['time'].dt.isocalendar().week.astype(int)
    df['day']=df['time'].dt.day
    df['weekday']=df['time'].dt.weekday
    df['hour']=df['time'].dt.hour
    df['minute']=df['time'].dt.minute

#Apply
add_datetime_features(train)
add_datetime_features(test)

#Drop time
train.drop('time',axis=1,inplace=True)
test.drop('time',axis=1,inplace=True)
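
As a sanity check on these features, the sketch below applies the same `.dt` accessors to two hand-picked timestamps: the first slot to be predicted and a slot one week earlier. The names `ts` and `features` are assumptions for illustration.

```python
import pandas as pd

# Two hand-picked timestamps: the first slot to predict and one from the Monday before
ts = pd.Series(pd.to_datetime(['1991-09-30 12:00:00', '1991-09-23 12:20:00']))

features = pd.DataFrame({
    'Month': ts.dt.month,
    'week': ts.dt.isocalendar().week.astype(int),  # ISO week number
    'day': ts.dt.day,
    'weekday': ts.dt.weekday,                      # Monday is 0
    'hour': ts.dt.hour,
    'minute': ts.dt.minute,
})

print(features)
```

Both dates are Mondays (weekday 0), and they fall in ISO weeks 40 and 39 respectively, which is why those week numbers are used as filters in the forecasting section.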

## Daily Avg congestion for each month

In [None]:
tmp=train.groupby(['day','Month']).congestion.mean().reset_index()
tmp.Month=tmp.Month.map({4:"April", 5:"May", 6:"June", 7:"July",8:'August',9:'September'})
fig = px.line(tmp,x='day',y='congestion',facet_row='Month',
              title='Daily congestion per month',markers=True,height=800)           

fig.show()

## Weekly Average congestion for each location

In [None]:
tmp=train.groupby(['x','y','week']).congestion.mean().reset_index()
tmp

In [None]:
#Combine the columns x,y 
tmp['Location']='x'+ tmp.x.astype(str) + 'y'+tmp.y.astype(str)
mean=tmp.congestion.mean()
fig = px.line(tmp, y='congestion',
                    x="week",
                    color='Location',
                    labels={'y':'Location (x,y)'},
                    height=500,
                   title='Avg congestion for each location per week'
                   )

fig.add_hline(y=mean,line_dash="dot",annotation_text="Mean")

fig.show()

## Avg congestion for each location per weekday

In [None]:
tmp=train.groupby(['x','y','weekday']).congestion.mean().reset_index()
tmp

In [None]:
#Combine the columns x,y 
tmp['Location']='x'+ tmp.x.astype(str) + 'y'+tmp.y.astype(str)
mean=tmp.congestion.mean()

fig = px.histogram(tmp, y='congestion',
                    x="weekday",
                    color='Location',
                    barmode='group',
                    histfunc='avg',
                    labels={'y':'Location (x,y)'},
                    height=450,
                   title='Avg congestion for each location per day'
                   )
fig.add_hline(y=mean,line_dash="dot",annotation_text="Mean")

fig.update_layout(
    xaxis = dict(
        tickmode = 'array',
        tickvals = [0,1, 2, 3, 4, 5, 6,],
        ticktext = ["Mon", "Tue", "Wed", "Thu",'Fri','Sat','Sun']
    ))              
fig.update_yaxes(range = [18,65])

fig.show()

## Hourly Average congestion for each location 

In [None]:
tmp=train.groupby(['x','y','hour']).congestion.mean().reset_index()
tmp

In [None]:
#Combine the columns x,y 
tmp['Location']='x'+ tmp.x.astype(str) + 'y'+tmp.y.astype(str)
mean=tmp.congestion.mean()

fig = px.histogram(tmp, y='congestion',
                    x="hour",
                    color='Location',
                   barmode='group',
                    histfunc='avg',
                    labels={'y':'Location (x,y)'}, height=450,
                   title='Avg congestion for each location per hour')
fig.add_hline(y=mean,line_dash="dot",annotation_text="Mean")
fig.update_yaxes(range = [18,70])

fig.show()

## Hourly Avg congestion for each day of the week

In [None]:
tmp=train.groupby(['hour','weekday']).congestion.mean().reset_index()
tmp

In [None]:
tmp=train.groupby(['hour','weekday']).congestion.mean().reset_index()
tmp.weekday=tmp.weekday.map({0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3:'Thursday',
                     4:'Friday', 5:'Saturday', 6:'Sunday'
                                  })
fig = px.line(tmp,x='hour',y='congestion',color='weekday',
              title='Average congestion during the day',markers=True)

fig.show()

<div class="alert alert-info">
   <h3> Congestion is lower on weekends, and its hourly pattern differs from that of the working days.</h3>
    <h3>
    The working days (Monday to Friday) show similar hourly patterns.</h3>
    
    
</div>
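
The weekend observation can also be checked numerically by splitting the mean on a weekend flag. The helper name `weekend_vs_weekday_mean` is an assumption, and the toy values are made up purely to show the shape of the output; on the real data the function would be called on `train`.

```python
import pandas as pd

def weekend_vs_weekday_mean(df):
    """Mean congestion split by a weekend flag (weekday 5 = Saturday, 6 = Sunday)."""
    is_weekend = df['weekday'].isin([5, 6]).rename('is_weekend')
    return df.groupby(is_weekend)['congestion'].mean()

# Hypothetical values, not taken from the dataset
toy = pd.DataFrame({'weekday': [0, 1, 5, 6], 'congestion': [50, 54, 40, 44]})
summary = weekend_vs_weekday_mean(toy)

print(summary)
```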

<h1 id="sdfsdf" style="color:white;background:#0087B6;padding:8px;border-radius:8px">  Forecasting traffic flow</h1> 
Let's see which time period we are asked to predict congestion for.

In [None]:
test

#### As you can see, we should predict congestion for Monday, September 30, from 12:00 to 23:40

## Here are some ways to do this
### Previous Monday afternoon

In [None]:
previous_monday_congestion=train[(train.weekday==0) &
                                 (train.hour>=12) &
                                 (train.week==39)].congestion.to_list()

### The mean congestion of all Mondays 

In [None]:
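#Afternoon congestion of each Monday, one column per ISO week
#Note: range(14,39) covers weeks 14 to 38, so the most recent Monday (week 39) is not included
#Columns are aligned by row position, which assumes every Monday has the same rows in the same order
All_Mondays_Congestion=pd.DataFrame()
for w in range(14,39):
    All_Mondays_Congestion['week_'+str(w)]=train[(train.weekday==0) &
                                                 (train.hour>=12) &
                                                 (train.week==w)].congestion.to_list()
#Row-wise mean across the weekly columns
All_Mondays_Congestion['mean']=All_Mondays_Congestion.mean(axis=1)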

### The median congestion of all Mondays 

In [None]:
All_Mondays_Congestion['median']=All_Mondays_Congestion.drop('mean',axis=1).median(axis=1)
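
The lists above are lined up by row position. An alternative that matches rows by key instead is sketched below: group the Monday-afternoon rows by roadway and time slot, take the median, and merge the result onto the test rows. The function name `monday_afternoon_median` and the toy frames are assumptions. Note that this variant uses every Monday present in the frame it is given, so on the real data its output may differ from the loop above.

```python
import pandas as pd

def monday_afternoon_median(train_df, test_df):
    """Median Monday-afternoon congestion per roadway and time slot, merged onto test rows."""
    keys = ['x', 'y', 'direction', 'hour', 'minute']
    mondays = train_df[(train_df.weekday == 0) & (train_df.hour >= 12)]
    medians = mondays.groupby(keys)['congestion'].median().reset_index()
    # A left merge keeps the test row order and matches on keys, not on position
    return test_df.merge(medians, on=keys, how='left')

# Hypothetical toy data: one roadway, one time slot, three Mondays
toy_train = pd.DataFrame({
    'x': [0, 0, 0], 'y': [0, 0, 0], 'direction': ['EB'] * 3,
    'weekday': [0, 0, 0], 'hour': [12, 12, 12], 'minute': [0, 0, 0],
    'congestion': [40, 50, 90],
})
toy_test = pd.DataFrame({'row_id': [100], 'x': [0], 'y': [0],
                         'direction': ['EB'], 'hour': [12], 'minute': [0]})

pred = monday_afternoon_median(toy_train, toy_test)
print(pred)
```

Any test row without a matching key ends up with NaN after the merge, which makes gaps in the training data visible instead of silently shifting values.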

## Plot predictions

In [None]:
plt.figure(figsize=(13,6))

mask=(train.weekday==0) &( train.week==40)
#Plot congestion until 11:40
sns.lineplot(x=train[mask].hour,
             y=train[mask].congestion,err_style='bars',ci=None)

#Plot predictions for rest of the day
mask=(test.weekday==0) &( test.week==40)

sns.lineplot(x=test[mask].hour,
             y=All_Mondays_Congestion['median'],
             err_style='bars',label='Mondays_Median',linestyle='--',ci=None)

sns.lineplot(x=test[mask].hour,
             y=All_Mondays_Congestion['mean'],
             err_style='bars',label='Mondays_Mean',linestyle='--',ci=None)

sns.lineplot(x=test[mask].hour,
             y=previous_monday_congestion,
             err_style='bars',label='Previous Monday',linestyle='--',ci=None)

plt.legend()
plt.title('Predictions for 12:00 ~ 23:40, Monday, September 30',size=15)
plt.show()

## Scores after submission

In [None]:
method=['Previous Monday','All_Mondays_Mean','All_Mondays_Median']
mae=[6.829,5.015,4.991]
fig=px.bar(x=method,y=mae,text_auto='',
           labels={'x':'Method','y':'Score ( MAE )'},
          title='Scores taken on test data')
fig.show()
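
For reference, MAE (mean absolute error) is the average of the absolute differences between predicted and actual values. A minimal sketch with made-up numbers follows; the helper `mae` is an assumption written with NumPy, and it computes the same quantity as sklearn's `mean_absolute_error` imported at the top.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical values, not taken from the competition
actual = [40, 55, 60]
predicted = [42, 50, 60]

score = mae(actual, predicted)  # (2 + 5 + 0) / 3
print(score)
```

Lower is better, so in the bar chart above the shortest bar is the best-scoring method.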


### Submission

In [None]:
sample['congestion']=All_Mondays_Congestion['median'].round()
sample.to_csv('submission.csv',index=False)

In [None]:
sample