#### What are you trying to do in this notebook?
In this competition, we'll forecast twelve hours of traffic flow in a major U.S. metropolitan area. Time, space, and directional features give you the chance to model interactions across a network of roadways.
- Provide animations for time-space congestion visualisations.
- Animate how congestion changes over time for all locations and roadways.
- Show the data distribution for each roadway.
- Show the data flow over time.
- Animate histograms of the roadways.

#### Why are you trying it?
- Predict test congestion using an ensemble of gradient-boosted trees.
- Round the predictions to the nearest integer, etc.

**row_id**: a unique identifier for this instance

**time**: the 20-minute period in which each measurement was taken

**x**: the east-west midpoint coordinate of the roadway

**y**: the north-south midpoint coordinate of the roadway

**direction**: the direction of travel of the roadway. EB indicates "eastbound" travel, while SW indicates a "southwest" direction of travel.

**congestion**: congestion levels for the roadway during each time period; the target. The congestion measurements have been normalized to the range 0 to 100.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
!pip install -U lightautoml

In [None]:
import os
import time

In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss, accuracy_score
from sklearn.model_selection import train_test_split
import torch

In [None]:
from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task
from lightautoml.report.report_deco import ReportDeco

In [None]:
N_THREADS = 4 
RANDOM_STATE = 42
TIMEOUT = 5 * 3600
TARGET_NAME = 'congestion'

In [None]:
np.random.seed(RANDOM_STATE)
torch.set_num_threads(N_THREADS)

In [None]:
INPUT_DIR = '../input/tabular-playground-series-mar-2022/'

In [None]:
train_data = pd.read_csv(INPUT_DIR + 'train.csv', dtype={'time': str})
print(train_data.shape)
train_data.head()

In [None]:
test_data = pd.read_csv(INPUT_DIR + 'test.csv', dtype={'time': str})
print(test_data.shape)
test_data.head()

In [None]:
submission = pd.read_csv(INPUT_DIR + 'sample_submission.csv')
print(submission.shape)
submission.head()

In [None]:
dir_mapper = {'EB': [1,0], 
              'NB': [0,1], 
              'SB': [0,-1], 
              'WB': [-1,0], 
              'NE': [1,1], 
              'SW': [-1,-1], 
              'NW': [-1,1], 
              'SE': [1,-1]}


def feature_engineering(data):
    """Add time, direction, and interaction features to `data` in place."""
    data['time'] = pd.to_datetime(data['time'])
    data['month'] = data['time'].dt.month
    data['weekday'] = data['time'].dt.weekday
    data['hour'] = data['time'].dt.hour
    data['minute'] = data['time'].dt.minute
    data['converted_direction_coord_0'] = data['direction'].map(lambda x: dir_mapper[x][0])
    data['converted_direction_coord_1'] = data['direction'].map(lambda x: dir_mapper[x][1])
    data['is_month_start'] = data['time'].dt.is_month_start.astype('int')
    data['is_month_end'] = data['time'].dt.is_month_end.astype('int')
    data['hour+minute'] = data['time'].dt.hour * 60 + data['time'].dt.minute
    data['is_weekend'] = (data['time'].dt.dayofweek > 4).astype('int')
    data['is_afternoon'] = (data['time'].dt.hour > 12).astype('int')
    data['x+y'] = data['x'].astype('str') + data['y'].astype('str')
    data['x+y+direction'] = data['x'].astype('str') + data['y'].astype('str') + data['direction'].astype('str')
    data['x+y+direction0'] = data['x'].astype('str') + data['y'].astype('str') + data['converted_direction_coord_0'].astype('str')
    data['x+y+direction1'] = data['x'].astype('str') + data['y'].astype('str') + data['converted_direction_coord_1'].astype('str')
    data['hour+direction'] = data['hour'].astype('str') + data['direction'].astype('str')
    data['hour+x+y'] = data['hour'].astype('str') + data['x'].astype('str') + data['y'].astype('str')
    data['hour+direction+x'] = data['hour'].astype('str') + data['direction'].astype('str') + data['x'].astype('str')
    data['hour+direction+y'] = data['hour'].astype('str') + data['direction'].astype('str') + data['y'].astype('str')
    data['hour+direction+x+y'] = data['hour'].astype('str') + data['direction'].astype('str') + data['x'].astype('str') + data['y'].astype('str')
    data['hour+x'] = data['hour'].astype('str') + data['x'].astype('str')
    data['hour+y'] = data['hour'].astype('str') + data['y'].astype('str')

In [None]:
for data in [train_data, test_data]:
    feature_engineering(data)

In [None]:
train_data.head()

In [None]:
task = Task('reg', metric='mae', loss='mae')

In [None]:
roles = {'target': TARGET_NAME,
         'drop': ['row_id']
         }

In [None]:
automl = TabularAutoML(task = task,
                       timeout = TIMEOUT,
                       cpu_limit = N_THREADS,
                       reader_params = {'n_jobs': N_THREADS, 'random_state': RANDOM_STATE},
                       general_params = {'use_algos': [['lgb']]}
                      )

In [None]:
oof_pred = automl.fit_predict(train_data, roles = roles, verbose=1)
print('oof_pred:\n{}\nShape = {}'.format(oof_pred, oof_pred.shape))

In [None]:
fast_fi = automl.get_feature_scores('fast')
fast_fi.set_index('Feature')['Importance'].plot.bar(figsize=(20, 10), grid=True)

#### Did it work?
This notebook aims to provide animations for time-space congestion visualizations. The idea is to animate how congestion changes over time across all 12 locations and 65 roadways. For a detailed EDA, please visit the notebook.

Most top solutions to the March TPS competition follow the same three-step pattern:

- Predict test congestion using an ensemble of gradient-boosted trees
- Replace some predictions with so-called "special values" (EDA introducing the special values)
- Round the predictions to the nearest integer (Why rounding improves the score)

In this notebook, we generalize step 2: rather than replacing some predictions with special values (which are medians of the training data), we clip all predictions to quantiles of the training data.
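
The clipping and rounding steps described above might look roughly like the sketch below. It is only an illustration under assumptions: the helper name `clip_to_training_quantiles`, the grouping keys, and the 0.25/0.75 quantile levels are placeholders that do not come from this notebook, and the toy data is made up purely to show the mechanics.

```python
import numpy as np
import pandas as pd


def clip_to_training_quantiles(train, test, preds, keys, target="congestion",
                               lower_q=0.25, upper_q=0.75):
    """Clip each prediction to the [lower_q, upper_q] quantile band of the
    training target for the same group, then round to the nearest integer.

    Hypothetical helper: the quantile levels are assumptions to be tuned.
    """
    grouped = train.groupby(keys)[target]
    bounds = pd.DataFrame({
        "lower": grouped.quantile(lower_q),
        "upper": grouped.quantile(upper_q),
    }).reset_index()

    # A left join keeps the test row order; groups unseen in train get NaN bounds.
    merged = test[keys].merge(bounds, on=keys, how="left")

    # Unseen groups keep their raw prediction (no clipping applied).
    lower = merged["lower"].fillna(-np.inf).to_numpy()
    upper = merged["upper"].fillna(np.inf).to_numpy()

    clipped = np.clip(np.asarray(preds, dtype=float), lower, upper)
    return np.round(clipped).astype(int)


# Toy data (made up) with two roadways in train and one unseen roadway in test.
toy_train = pd.DataFrame({
    "x": [0] * 5 + [1] * 5,
    "y": [0] * 5 + [2] * 5,
    "direction": ["EB"] * 5 + ["NB"] * 5,
    "congestion": [10, 20, 30, 40, 50, 50, 60, 70, 80, 90],
})
toy_test = pd.DataFrame({
    "x": [0, 1, 0, 2],
    "y": [0, 2, 0, 3],
    "direction": ["EB", "NB", "EB", "SW"],
})
toy_preds = [5.2, 95.0, 33.4, 47.6]

result = clip_to_training_quantiles(
    toy_train, toy_test, toy_preds, keys=["x", "y", "direction"]
)
print(result)
```

In this notebook's setting, the grouping keys would presumably also include time features such as `weekday`, `hour`, and `minute`, and the predictions would come from `automl.predict(test_data)`; both are assumptions rather than something the cells above do.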

#### What did you not understand about this process?
Everything needed is provided on the competition data page, and I had no problems working with it. If anything I do in this notebook is unclear, please leave a comment.

#### What else do you think you can try as part of this approach?
Forecast twelve hours of traffic flow in a U.S. metropolis. The time series in this dataset are labelled with both location coordinates and a direction of travel, a combination of features that will test our skill at spatio-temporal forecasting within a highly dynamic traffic network.