# Solving the March TPS Without Machine Learning

We don't need machine learning to predict the congestion for September 30. It suffices to determine the median congestion for every place and time of day and submit these medians. (See [this discussion post](https://www.kaggle.com/c/tabular-playground-series-mar-2022/discussion/310642) for a more thorough explanation.)

To be precise: September 30, 1991 was a Monday, so we calculate the median over the working days (Monday to Friday) only. The [EDA](https://www.kaggle.com/ambrosm/tpsmar22-eda-which-makes-sense) has shown that Saturdays and Sundays have much less traffic and therefore don't help predict a Monday.
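Why the median rather than the mean? The competition scores submissions by mean absolute error (MAE), and the single constant that minimizes the absolute error over a set of observations is their median. A small illustration with made-up readings for one place and time slot (the numbers are invented; `mae` is a hypothetical helper):

```python
import numpy as np

# Made-up congestion readings for a single place and time slot
values = np.array([20, 35, 36, 38, 40, 41, 95])

def mae(constant, observations):
    """Mean absolute error when predicting `constant` for every observation."""
    return np.abs(observations - constant).mean()

# Congestion is an integer between 0 and 100, so try every possible constant
candidates = np.arange(101)
errors = np.array([mae(c, values) for c in candidates])
best = candidates[errors.argmin()]

print('median:', np.median(values))       # 38.0
print('best constant under MAE:', best)   # 38
print('MAE at median:', mae(np.median(values), values))
print('MAE at mean:', mae(values.mean(), values))
```

The outlier 95 drags the mean upward, which costs absolute error on the six other readings, while the median ignores how extreme the outlier is.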


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
from cycler import cycler

# Plot style: blue axes background, yellow as the first color of the cycle
oldcycler = plt.rcParams['axes.prop_cycle']
plt.rcParams['axes.facecolor'] = '#0057b8'  # blue
plt.rcParams['axes.prop_cycle'] = cycler(color=['#ffd700'] +  # yellow
                                         oldcycler.by_key()['color'][1:])
```

```python
# Read the data; row_id becomes the index and 'time' is parsed as datetime
train = pd.read_csv('../input/tabular-playground-series-mar-2022/train.csv',
                    index_col='row_id', parse_dates=['time'])
test = pd.read_csv('../input/tabular-playground-series-mar-2022/test.csv',
                   index_col='row_id', parse_dates=['time'])
```

```python
# Feature engineering: workday flag and time of day
for df in [train, test]:
    df['workday'] = df.time.dt.weekday < 5  # Monday to Friday
    df['hour'] = df.time.dt.hour
    df['minute'] = df.time.dt.minute
```

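The `workday` flag relies on pandas numbering weekdays from Monday=0 to Sunday=6. A quick sketch (the week shown is chosen only for illustration) confirms that September 30, 1991 falls on a Monday and shows how `weekday < 5` splits a week:

```python
import pandas as pd

# pandas numbers weekdays Monday=0 ... Sunday=6
test_day = pd.Timestamp('1991-09-30')
print(test_day.day_name(), test_day.weekday())  # Monday 0

# One illustrative week (Monday 23 to Sunday 29 September 1991),
# flagged the same way as the notebook's 'workday' feature
week = pd.Series(pd.date_range('1991-09-23', periods=7, freq='D'))
flags = pd.DataFrame({'date': week,
                      'day': week.dt.day_name(),
                      'workday': week.dt.weekday < 5})
print(flags)
```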
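```python
# Compute the median congestion for every place and time of day,
# separately for workdays and weekends
medians = (train
           .groupby(['x', 'y', 'direction', 'workday', 'hour', 'minute'])
           .congestion.median()
           .astype(int))  # truncates half-integer medians, e.g. 37.5 -> 37
medians
```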

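To see what the `groupby` produces, here is the same computation on a tiny made-up frame (every value is invented for illustration): one road segment, two 20-minute slots, three workday observations per slot and one weekend observation per slot. The result is a Series with a `MultiIndex`, one entry per key combination:

```python
import pandas as pd

# Made-up stand-in for train.csv after feature engineering
toy = pd.DataFrame({
    'x': 0, 'y': 0, 'direction': 'EB',
    'workday': [True, True, True, True, True, True, False, False],
    'hour':    [12, 12, 12, 12, 12, 12, 12, 12],
    'minute':  [0, 0, 0, 20, 20, 20, 0, 20],
    'congestion': [40, 50, 90, 30, 31, 35, 10, 12],
})

keys = ['x', 'y', 'direction', 'workday', 'hour', 'minute']
# One median per (place, direction, workday, time of day) combination
toy_medians = toy.groupby(keys).congestion.median().astype(int)
print(toy_medians)
```

The workday slot at 12:00 gets 50 even though one reading is 90: a single unusual day barely moves the median, whereas the mean would be 60.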
```python
# Write the submission file: look up the median for each test row
sub = test.merge(medians,
                 left_on=['x', 'y', 'direction', 'workday', 'hour', 'minute'],
                 right_index=True)[['congestion']]
sub.reset_index(inplace=True)  # turn row_id back into a column
assert len(sub) == len(test)   # the (inner) merge must not drop any test row
sub.to_csv('submission_no_machine_learning.csv', index=False)
sub
```

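The lookup is a merge of the test rows' key columns against the `MultiIndex` of the medians. A toy sketch of the same merge (the lookup table and `row_id` values are invented) uses `how='left'` with `validate='many_to_one'` instead of the default inner join: the left join keeps every test row, so a missing key shows up as `NaN` rather than as a dropped row, and `validate` raises if the lookup table had duplicate keys:

```python
import pandas as pd

keys = ['x', 'y', 'direction', 'workday', 'hour', 'minute']

# Hypothetical lookup table, indexed like the notebook's `medians`
lookup = pd.Series(
    [50, 31],
    index=pd.MultiIndex.from_tuples(
        [(0, 0, 'EB', True, 12, 0), (0, 0, 'EB', True, 12, 20)], names=keys),
    name='congestion',
)

# Hypothetical test rows, indexed by row_id like test.csv
toy_test = pd.DataFrame(
    {'x': 0, 'y': 0, 'direction': 'EB', 'workday': True,
     'hour': 12, 'minute': [0, 20]},
    index=pd.Index([100, 101], name='row_id'),
)

sub_toy = toy_test.merge(lookup, left_on=keys, right_index=True,
                         how='left', validate='many_to_one')[['congestion']]
assert sub_toy['congestion'].notna().all(), 'some test keys have no median'
sub_toy = sub_toy.reset_index()  # row_id back to a column
print(sub_toy)
```

Both variants catch missing keys; the inner join plus the length assertion in the notebook is equally valid, the left join just makes it easy to see which rows lack a median.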
```python
# Plot the distribution of the test predictions
# compared to all Monday afternoons in the training data
bins = np.linspace(-0.5, 100.5, 102)  # one bin per integer congestion 0..100
monday_afternoon = (train.time.dt.weekday == 0) & (train.time.dt.hour >= 12)
plt.figure(figsize=(16, 3))
plt.hist(train.congestion[monday_afternoon], bins=bins,
         density=True, label='Train', color='#ffd700')
plt.hist(sub['congestion'], bins=bins,
         density=True, rwidth=0.5, label='Test predictions', color='r')
plt.xlabel('Congestion')
plt.ylabel('Relative frequency')
plt.title('Congestion on Monday afternoons')
plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=1))
plt.legend()
plt.show()
```
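The histogram is only a visual sanity check. One hedged way to put a number on the approach is to treat a Monday that the training data fully covers as if it were the test day: compute the medians from the earlier days only and score that Monday's afternoon by MAE. The sketch below does this on synthetic data (the congestion signal and noise are invented); `holdout_mae` is a hypothetical helper, and with the real data one would pass it `train` and a suitable Monday:

```python
import numpy as np
import pandas as pd

KEYS = ['x', 'y', 'direction', 'workday', 'hour', 'minute']

def holdout_mae(df, holdout_day):
    """Fit medians on days before `holdout_day`, score that day's afternoon."""
    day = df.time.dt.normalize()
    past = df[day < holdout_day]
    future = df[(day == holdout_day) & (df.time.dt.hour >= 12)]
    medians = past.groupby(KEYS).congestion.median()
    pred = future.merge(medians.rename('pred'), left_on=KEYS,
                        right_index=True, how='left')['pred']
    return np.abs(pred - future.congestion).mean()

# Synthetic data: two road segments, 20-minute readings, Mon 2 to Mon 23 Sept 1991
rng = np.random.default_rng(0)
times = pd.date_range('1991-09-02', '1991-09-23 23:40', freq='20min')
toy = pd.concat(
    [pd.DataFrame({'time': times, 'x': x, 'y': 0, 'direction': d})
     for x, d in [(0, 'EB'), (1, 'NB')]],
    ignore_index=True,
)
toy['workday'] = toy.time.dt.weekday < 5
toy['hour'] = toy.time.dt.hour
toy['minute'] = toy.time.dt.minute
# Invented signal: busier on workdays and late afternoons, plus noise
signal = 30 + 10 * toy.workday + 15 * (toy.hour >= 15) + 5 * toy.x
noisy = signal + rng.normal(0, 5, len(toy))
toy['congestion'] = np.clip(noisy, 0, 100).round().astype(int)

mae_median = holdout_mae(toy, pd.Timestamp('1991-09-23'))
print(f'Hold-out MAE of the median baseline: {mae_median:.2f}')
```

On real data the number would of course differ; the point is the procedure, which mirrors the submission exactly but on a day whose true congestion is known.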