## Simple First Exploratory Analysis for traffic jam

As a rule of thumb, I usually: *Make any model/submission as fast as I can. Any model!*

So I start by making the first predictions using constants or by copying values from earlier data points. After that, we can move on to proper modeling.

In this notebook I will go through the data in order to find some useful information. 

Thanks to [AMBROSM](https://www.kaggle.com/ambrosm/tpsmar22-eda-which-makes-sense) for the first insights.

#### **If you liked it, please upvote. It helps me a lot!!**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from cycler import cycler
from IPython import display
import datetime

In [None]:
PATH ='../input/tabular-playground-series-mar-2022/'

train = pd.read_csv(PATH + 'train.csv', index_col = 'row_id', parse_dates=['time'])
test = pd.read_csv(PATH + 'test.csv', index_col = 'row_id', parse_dates=['time'])

# Keep the lines below in case we need the date and the time of day as separate columns later.
train['day'] = train.time.dt.date
train['time_day'] = train.time.dt.time

test['day'] = test.time.dt.date
test['time_day'] = test.time.dt.time

test.head()

### Information about the roadways

From the tables below we can see that there are 12 unique combinations of (x,y), that is, every possible pair on the 3x4 grid.

Moreover, there are 65 roadways. With the exception of (0,0,WB), every road can be travelled in both directions, so it is possible to go and return along the same road.

In [None]:
roadways = train[['x','y']].drop_duplicates()
print('Unique combinations of x,y: ')
display.display(roadways.T)

road_dir = train[['x','y','direction']].drop_duplicates()
print('Unique combinations of x,y AND direction: ')
display.display(road_dir.T)

In [None]:
display.display(road_dir.groupby(['x','y']).count().T)

road_dir_test = test[['x','y','direction']].drop_duplicates()
display.display(road_dir_test.groupby(['x','y']).count().T)
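
The two count tables above can also be compared programmatically. Here is a minimal sketch that lists any roadway appearing in the test set but not in the training set. The helper `missing_roadways` and the toy frames are hypothetical and only illustrate the call; with the real data one would pass `train` and `test`.

```python
import pandas as pd

def missing_roadways(train_df, test_df, keys=('x', 'y', 'direction')):
    """Return the roadways present in test_df but absent from train_df."""
    keys = list(keys)
    train_keys = train_df[keys].drop_duplicates()
    test_keys = test_df[keys].drop_duplicates()
    # indicator=True adds a '_merge' column telling where each row came from
    merged = test_keys.merge(train_keys, on=keys, how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only', keys].reset_index(drop=True)

# Toy frames with made-up values, only to illustrate the call
toy_train = pd.DataFrame({'x': [0, 0, 1], 'y': [0, 0, 0], 'direction': ['EB', 'NB', 'EB']})
toy_test = pd.DataFrame({'x': [0, 1, 1], 'y': [0, 0, 0], 'direction': ['EB', 'EB', 'WB']})

unseen = missing_roadways(toy_train, toy_test)
print(unseen)
```

An empty result on the real data would mean every roadway we have to predict has history in the training set.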

## See the paths

A plot makes the directions of the roadways clearer.

In [None]:
dir_dict = {'EB': (1, 0), 'NB': (0, 1), 'SB': (0, -1), 'WB': (-1, 0), 
            'NE': (1, 1), 'SE': (1, -1), 'NW': (-1, 1), 'SW': (-1, -1)}

plt.figure(figsize=(10,10))
plt.scatter(roadways.x, roadways.y)
plt.gca().set_aspect('equal')
for _, x, y, d in road_dir.itertuples():
    dx, dy = dir_dict[d]
    dx, dy = dx/4, dy/4
    plt.plot([x, x+dx], [y, y+dy], 'b')
plt.gca().xaxis.set_major_locator(MaxNLocator(integer=True)) # only integer labels
plt.gca().yaxis.set_major_locator(MaxNLocator(integer=True)) # only integer labels
plt.xlabel('x')
plt.ylabel('y')
plt.show()


## TEST dataset

Before going further in the Exploratory Data Analysis (EDA), it is good practice to have a look at the test set.

Knowing what we need to predict gives us good insight.

#### **Remember**, this is used just for the public leaderboard.

In [None]:
print("Dataset - TEST" )
print("Days in the test: ", test.time.dt.date.unique() )
print("Day of the week: ", test.time.dt.dayofweek.unique(), test.time.dt.day_name().unique())
print("Period: FROM ", test.time.dt.time.min(), ' TO ', test.time.dt.time.max())
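
Besides the first and last timestamps, it can help to know how many time slots we have to predict and how far apart they are. A small sketch of that summary follows; the helper name `describe_time_coverage` and the toy timestamps are my own assumptions, and with the real data one would pass `test.time`.

```python
import pandas as pd

def describe_time_coverage(times):
    """Summarise timestamps: first, last, number of distinct slots, most common step."""
    unique_times = pd.Series(pd.to_datetime(times)).drop_duplicates().sort_values()
    steps = unique_times.diff().dropna()
    return {
        'start': unique_times.iloc[0],
        'end': unique_times.iloc[-1],
        'n_slots': len(unique_times),
        'step': steps.mode().iloc[0],
    }

# Toy timestamps (made up): 4 slots, each repeated for 3 hypothetical roadways
toy_times = list(pd.date_range('2000-01-03 08:00', periods=4, freq='20min')) * 3

coverage = describe_time_coverage(toy_times)
print(coverage)
```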

## Roadway conditions

Let's see if there is a pattern during the day on a roadway.

Consider (x,y) = (2,2) on the previous Monday.

In [None]:
x,y = 2,2

day = pd.Timestamp(1991,9,23)

road1 = train[(train.x==x) & (train.y==y) & (train.day==day.date()) & (train.direction=='EB') ]
road2 = train[(train.x==x) & (train.y==y) & (train.day==day.date()) & (train.direction=='WB') ]
# road1 = train[ (train.x==x) & (train.y==y) & (train.direction==d1) & (train.day ==) ]
# road2 = train[ (train.x==x) & (train.y==y) & (train.direction==d2) ]

fig, ax = plt.subplots(figsize=(20,7))
ax.plot(road1.time, road1.congestion );
ax.plot(road2.time, road2.congestion );
ax.legend(['EB','WB'] );

What if we check all 12 points in opposite directions on the previous Monday?

In [None]:
day = pd.Timestamp(1991,9,23)  # <- Previous Monday
fig, ax = plt.subplots( 4,3, figsize=(20,10), sharey=True, sharex=True )

for x in range(3):
    for y in range(4):
        d1, d2 = 'EB','WB'

        road1 = train[(train.x==x) & (train.y==y) & (train.day==day.date()) & (train.direction==d1) ]
        road2 = train[(train.x==x) & (train.y==y) & (train.day==day.date()) & (train.direction==d2) ]
        
        ax[y,x].plot(road1.time, road1.congestion );
        ax[y,x].plot(road2.time, road2.congestion );
        ax[y,x].legend(['EB','WB'] );
        ax[y,x].set_title([x,y])
        
fig.suptitle('Eastbound vs. westbound on the previous Monday');

In [None]:
day = pd.Timestamp(1991,9,23)  # <- Previous Monday
fig, ax = plt.subplots(4,3,figsize=(20,10), sharey=True, sharex=True )

for x in range(3):
    for y in range(4):
        d1, d2 = 'NB','SB'

        road1 = train[(train.x==x) & (train.y==y) & (train.day==day.date()) & (train.direction==d1) ]
        road2 = train[(train.x==x) & (train.y==y) & (train.day==day.date()) & (train.direction==d2) ]
        
        ax[y,x].plot(road1.time, road1.congestion );
        ax[y,x].plot(road2.time, road2.congestion );
        ax[y,x].legend(['NB','SB'] );
        ax[y,x].set_title([x,y]);
fig.suptitle('Northbound vs. southbound on the previous Monday');

Honestly, I was expecting something like morning traffic in the eastbound direction and afternoon traffic in the westbound direction.

However, the plots above show no such opposite behavior on the previous Monday. So there does not seem to be a "pendular movement" between opposite directions of the same road from morning to afternoon.
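
The visual impression can be backed by a number. One possible check, sketched below, is the correlation between the two opposite directions at each point: a clearly negative value would suggest a pendular pattern, while a positive value means both directions rise and fall together. The helper `opposite_direction_corr` and the toy data are hypothetical; with the real data one would pass the rows of `train` for the chosen day.

```python
import pandas as pd

def opposite_direction_corr(df, d1, d2):
    """Pearson correlation between congestion in d1 and d2, per (x, y) point."""
    sub = df[df['direction'].isin([d1, d2])]
    # One row per (x, y, time), one column per direction
    wide = sub.pivot_table(index=['x', 'y', 'time'], columns='direction', values='congestion')
    return wide.groupby(level=['x', 'y']).apply(lambda g: g[d1].corr(g[d2]))

# Toy data (made up): EB rises while WB falls, i.e. a perfectly pendular road
times = pd.date_range('2000-01-03 06:00', periods=4, freq='20min')
toy = pd.DataFrame({
    'time': list(times) * 2,
    'x': 0,
    'y': 0,
    'direction': ['EB'] * 4 + ['WB'] * 4,
    'congestion': [10, 20, 30, 40, 40, 30, 20, 10],
})

corr = opposite_direction_corr(toy, 'EB', 'WB')
print(corr)
```

A single day gives few points per road, so the resulting values should be read as a rough indication only.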

## Submission file

Preparing a submission file requires some care for it to be accepted. So I usually make a simple prediction first to check that the file is correctly prepared.

In this case, every congestion value is set to 50. (This gives 13.829 on the Public Leaderboard.)

In [None]:
# # First submission based on congestion = 50
# # test was loaded with index_col='row_id', so row_id is the index, not a column
# submission = pd.DataFrame({'congestion': 50}, index=test.index)
# submission.to_csv('submission.csv')
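
The value 50 is just a guess. This notebook does not state the leaderboard metric; assuming it is the mean absolute error (MAE), the best constant is the median of the target, because the median minimizes the MAE. A minimal sketch with made-up numbers follows; with the real data one would use `train.congestion`, ideally scored on a held-out part of it.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error; y_pred may be a constant or an array."""
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - y_pred)))

# Toy target values (made up), only to compare two constants
toy_congestion = np.array([10, 20, 30, 40, 90])

score_50 = mae(toy_congestion, 50)
score_median = mae(toy_congestion, np.median(toy_congestion))
print(score_50, score_median)
```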

Another possible first approach is to use the previous Monday's data as the prediction for the test dataset.
(This gives 6.829 on the Public Leaderboard.)

In [None]:
# Second submission using the previous Monday's (Sept 23, 1991) congestion data.

day = pd.Timestamp(1991,9,23)  # <- Previous Monday

# Get the traffic from this specific day
traffic = train[(train.day==day.date())]

# Select the columns to use for merging
columns_merge = ['x', 'y', 'direction', 'time_day'] 

# Reload the test as a fresh DataFrame
test = pd.read_csv(PATH + 'test.csv', parse_dates=['time'])  # Don't use index_col='row_id'
test['time_day'] = test.time.dt.time

final = pd.merge(test,traffic, on =columns_merge, how='left')
# display.display(final)

# Build the submission in one step to avoid modifying a slice of `final` in place
submission = final[['row_id', 'congestion']].set_index('row_id')
display.display(submission)

submission.to_csv('submission_previous_monday.csv')
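
One caveat with the left merge: any (x, y, direction, time_day) combination that is absent from the previous Monday ends up as NaN, and a file with missing values may be rejected. I have not verified whether this happens here, so the sketch below only shows a possible safety check. The helper `fill_missing_predictions` is hypothetical, and the fallback of 50 simply reuses the constant from the first submission.

```python
import numpy as np
import pandas as pd

def fill_missing_predictions(pred, fallback):
    """Count NaN predictions and replace them with a fallback constant."""
    n_missing = int(pred['congestion'].isna().sum())
    filled = pred.assign(congestion=pred['congestion'].fillna(fallback))
    return filled, n_missing

# Toy submission (made up) with one missing prediction
toy_pred = pd.DataFrame(
    {'row_id': [100, 101, 102], 'congestion': [40.0, np.nan, 55.0]}
).set_index('row_id')

filled, n_missing = fill_missing_predictions(toy_pred, fallback=50)
print(n_missing)
print(filled)
```

With the real data this would be called on `submission` just before `to_csv`.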
