In [1]:
#| default_exp app

In this competition, you'll forecast twelve hours of traffic flow in a major U.S. metropolitan area. Time, space, and directional features give you the chance to model interactions across a network of roadways.

Files and Field Descriptions
----------------------------

*   train.csv - the training set, comprising measurements of traffic congestion across 65 roadways from April through September of 1991.
    *   row\_id - a unique identifier for this instance
    *   time - the 20-minute period in which each measurement was taken
    *   x - the east-west midpoint coordinate of the roadway
    *   y - the north-south midpoint coordinate of the roadway
    *   direction - the direction of travel of the roadway. EB indicates "eastbound" travel, for example, while SW indicates a "southwest" direction of travel.
    *   congestion - congestion levels for the roadway during each hour; the target. The congestion measurements have been normalized to the range 0 to 100.
*   test.csv - the test set; you will make hourly predictions for roadways identified by a coordinate location and a direction of travel on the day of 1991-09-30.
*   sample\_submission.csv - a sample submission file in the correct format

Source
------

This dataset was derived from the [Chicago Traffic Tracker - Historical Congestion Estimates](https://data.cityofchicago.org/Transportation/Chicago-Traffic-Tracker-Historical-Congestion-Esti/sxs8-h27x) dataset.

### Imports and Downloading Datasets

In [2]:
#| export
from fastai.tabular.all import *

from sklearn.ensemble import RandomForestRegressor

In [3]:
#| export
try: import fastkaggle
except ModuleNotFoundError:
    !pip install -Uq fastkaggle

from fastkaggle import *

In [4]:
#| export
comp = 'tabular-playground-series-mar-2022'
path = setup_comp(comp, install='fastai')

### Transform Data

In [5]:
#| export
train_df = pd.read_csv(path/"train.csv")
test_df = pd.read_csv(path/"test.csv")
sample_df = pd.read_csv(path/"sample_submission.csv")

Combine training and test sets

In [6]:
#| export
comb_df = pd.concat([train_df, test_df]).reset_index(drop=True)
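
Concatenating the two frames lets each later transformation run once over both sets. As a minimal sketch on invented toy rows (the values are illustrative, not taken from the competition files), the test rows carry no `congestion` value after the concat, which gives one possible way to separate the sets again later. Whether the notebook splits them back this way is an assumption, and the `toy_*` names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for train_df / test_df; the values are invented for illustration.
toy_train = pd.DataFrame({
    "row_id": [0, 1],
    "time": ["1991-04-01 00:00:00", "1991-04-01 00:20:00"],
    "congestion": [70, 49],
})
toy_test = pd.DataFrame({
    "row_id": [2],
    "time": ["1991-09-30 12:00:00"],
})

# Same call as in the cell above: stack the frames and rebuild a clean 0..n-1 index.
toy_comb = pd.concat([toy_train, toy_test]).reset_index(drop=True)

# The test frame has no target column, so pandas fills `congestion` with NaN there.
is_test = toy_comb["congestion"].isna()
toy_train_again = toy_comb[~is_test]
toy_test_again = toy_comb[is_test].drop(columns="congestion")
```

This round trip only works if the training target itself contains no missing values; otherwise a separate marker column or the original row counts would be a safer way to tell the sets apart.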

Convert `time` to datetime format and split into separate `date` and `time_of_day` columns

In [7]:
#| export
comb_df['date'] = pd.to_datetime(comb_df.time)

In [8]:
#| export
comb_df['time_of_day'] = comb_df.date.dt.time
comb_df['date'] = comb_df.date.dt.date

In [9]:
#| export
comb_df['time_of_day'] = pd.to_timedelta(comb_df.time_of_day.astype(str))
comb_df['date'] = pd.to_datetime(comb_df.date)
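
The three cells above leave `date` as a datetime at midnight of each day and `time_of_day` as a timedelta since midnight. A possibly more compact route to the same two columns, sketched here on invented timestamps, uses `Series.dt.normalize()`. This is an alternative for comparison, not what the notebook runs, and `toy_df` is a hypothetical name.

```python
import pandas as pd

# Invented timestamps in the same string format as the `time` column.
toy_df = pd.DataFrame({
    "time": ["1991-04-01 00:00:00", "1991-04-01 00:20:00", "1991-09-30 23:40:00"],
})

# Parse once, then derive both columns from the parsed timestamps.
ts = pd.to_datetime(toy_df["time"])
toy_df["date"] = ts.dt.normalize()            # same day, time set to 00:00:00
toy_df["time_of_day"] = ts - toy_df["date"]   # elapsed time since midnight

print(toy_df[["date", "time_of_day"]])
```

Keeping `time_of_day` as a timedelta makes it straightforward to convert to a numeric feature later, for example with `toy_df["time_of_day"].dt.total_seconds()`.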


Get the indexes of the training (~80%) and validation (~20%) sets

In [28]:
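# Positional split: roughly the first 80% of rows for training, the remainder for validation.
# np.where returns a 1-tuple of arrays, so the indexes themselves are train_idxs[0] and valid_idxs[0].
train_idxs = np.where(train_df.index <= (round(len(train_df) * .8)))
valid_idxs = np.where(train_df.index >= len(train_idxs[0]))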
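
The same contiguous split can be sketched in plain Python, which may be easier to check by eye. The helper name `contiguous_split` and its `train_frac` parameter are hypothetical. Unlike the cell above, which uses `<=` and so keeps one extra row in the training block, this version puts exactly `round(n_rows * train_frac)` rows in training.

```python
def contiguous_split(n_rows, train_frac=0.8):
    """Return (train_idxs, valid_idxs) as plain lists of row positions."""
    cut = round(n_rows * train_frac)   # first position of the validation block
    return list(range(cut)), list(range(cut, n_rows))

# Small worked example: 10 rows -> positions 0..7 for training, 8..9 for validation.
train_pos, valid_pos = contiguous_split(10)
print(train_pos, valid_pos)
```

If the rows are in time order, a contiguous split like this holds out the latest measurements for validation, which mirrors the forecasting task more closely than a random split would.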

In [23]:
import nbdev
nbdev.export.nb_export('traffic-flow.ipynb', 'app')
print("export successful")

export successful
