# 03 | Transform Time Series into Features and Targets #

In this notebook, we put together the pieces of code that we implemented in the previous notebooks. Rather than copying and pasting that code, we import the functions from the Python modules that we create in our `src` directory, which keeps the code easier to maintain.

In [11]:
# Always reload our imports to make sure they reflect changes.
%reload_ext autoreload
%autoreload 2

In [12]:
from src.data import load_raw_data

We can now download the available data for 2022 using our `load_raw_data` function.

In [13]:
rides = load_raw_data(year=2022)
rides

Downloading file 2022-01
Downloading file 2022-02
Downloading file 2022-03
Downloading file 2022-04
Downloading file 2022-05
Downloading file 2022-06
Downloading file 2022-07
Downloading file 2022-08
Downloading file 2022-09
Downloading file 2022-10
Downloading file 2022-11
Downloading file 2022-12


| | pickup_datetime | pickup_location_id |
|---|---|---|
| 0 | 2022-01-01 00:35:40 | 142 |
| 1 | 2022-01-01 00:33:43 | 236 |
| 2 | 2022-01-01 00:53:21 | 166 |
| 3 | 2022-01-01 00:25:21 | 114 |
| 4 | 2022-01-01 00:36:48 | 68 |
| ... | ... | ... |
| 3399544 | 2022-12-31 23:46:00 | 16 |
| 3399545 | 2022-12-31 23:13:24 | 75 |
| 3399546 | 2022-12-31 23:00:49 | 168 |
| 3399547 | 2022-12-31 23:02:50 | 238 |


## Transform data into time series data ##
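We can also import our premade functions. Here we use `transform_raw_data_into_ts_data`. Judging by the zero-ride rows in its output, it also fills in the hourly slots that have no rides, which is the job of `add_missing_slots`.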

In [16]:
from src.data import transform_raw_data_into_ts_data

### Transform into time series data ###

In [17]:
ts_data = transform_raw_data_into_ts_data(rides)
ts_data

100%|██████████| 265/265 [00:06<00:00, 43.65it/s]


| | pickup_hour | rides | pickup_location_id |
|---|---|---|---|
| 0 | 2022-01-01 00:00:00 | 0 | 1 |
| 1 | 2022-01-01 01:00:00 | 0 | 1 |
| 2 | 2022-01-01 02:00:00 | 0 | 1 |
| 3 | 2022-01-01 03:00:00 | 0 | 1 |
| 4 | 2022-01-01 04:00:00 | 1 | 1 |
| ... | ... | ... | ... |
| 2321395 | 2022-12-31 19:00:00 | 2 | 265 |
| 2321396 | 2022-12-31 20:00:00 | 2 | 265 |
| 2321397 | 2022-12-31 21:00:00 | 7 | 265 |
| 2321398 | 2022-12-31 22:00:00 | 3 | 265 |
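
The body of `transform_raw_data_into_ts_data` lives in `src/data.py` and is not shown in this notebook. As a rough illustration only, the general idea of counting rides per hour and location, then filling the hours without rides with zeros, could be sketched with pandas as below. The helper name `hourly_counts_sketch` and the toy data are assumptions made for this example, not the project's implementation.

```python
import pandas as pd


def hourly_counts_sketch(rides: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: rides per (hour, location), with empty hours set to 0."""
    rides = rides.copy()
    # Truncate each pickup timestamp to the start of its hour.
    rides["pickup_hour"] = rides["pickup_datetime"].dt.floor("h")
    counts = (
        rides.groupby(["pickup_hour", "pickup_location_id"])
        .size()
        .rename("rides")
    )
    # Build the full grid of hours x locations so that slots with no rides exist.
    full_hours = pd.date_range(
        rides["pickup_hour"].min(), rides["pickup_hour"].max(), freq="h"
    )
    locations = sorted(rides["pickup_location_id"].unique())
    full_index = pd.MultiIndex.from_product(
        [full_hours, locations], names=["pickup_hour", "pickup_location_id"]
    )
    # Slots missing from `counts` are filled with 0 rides.
    return counts.reindex(full_index, fill_value=0).reset_index()


# Toy data (made up): two rides at location 142 in hour 00, one at 236 in hour 02.
toy = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2022-01-01 00:35:40", "2022-01-01 00:53:21", "2022-01-01 02:10:00",
    ]),
    "pickup_location_id": [142, 142, 236],
})
ts = hourly_counts_sketch(toy)
print(ts)
```

With three hours and two locations, the toy result has six rows, most of them with zero rides, which mirrors the many zero-ride rows in the real output above.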


In [18]:
from src.data import transform_ts_data_into_features_and_target

features, targets = transform_ts_data_into_features_and_target(
    ts_data,
    input_seq_len=24*28*1, # 28 days of hourly data, roughly one month
    step_size=24,
)

print(f'{features.shape=}')
print(f'{targets.shape=}')

100%|██████████| 265/265 [00:38<00:00,  6.94it/s]

features.shape=(89305, 674)
targets.shape=(89305,)
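
These shapes are consistent with a sliding-window scheme: each example uses `input_seq_len` consecutive hourly counts as features and the hour that follows as the target, and the window start advances by `step_size` hours. The sketch below is an assumption about how such windows could be indexed, not the code in `src.data`, and the name `window_indices_sketch` is hypothetical. Under that assumption, 672 lagged counts plus two extra columns (presumably `pickup_hour` and `pickup_location_id`) would account for the 674 feature columns.

```python
def window_indices_sketch(n_hours, input_seq_len, step_size):
    """Hypothetical sketch: (first, last_exclusive, target) index triples per window."""
    indices = []
    first = 0
    last = input_seq_len  # exclusive end of the feature window
    # The target sits at position `last`, so it must still be inside the series.
    while last < n_hours:
        indices.append((first, last, last))
        first += step_size
        last += step_size
    return indices


# One location in 2022 has 24 * 365 = 8760 hourly slots.
windows = window_indices_sketch(n_hours=24 * 365, input_seq_len=24 * 28, step_size=24)
print(len(windows))        # 337 windows per location
print(len(windows) * 265)  # 89305 rows across 265 locations
```

With `step_size=24`, each location contributes one example per day once the first 28 days are available, which is why the row count is 265 × 337 rather than one row per hour.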





We can now save our output as tabular data. We create `tabular_data` from our `features` and add a new column, `target_rides_next_hour`, which holds our `targets`. Then we import our paths and save the output in the appropriate directory.

In [19]:
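# `tabular_data` is another name for `features` (no copy is made),
# so the target column is added to that same DataFrame.
tabular_data = features
tabular_data["target_rides_next_hour"] = targets

from src.paths import TRANSFORMED_DATA_DIR

# Save features and target together as a single Parquet file.
tabular_data.to_parquet(TRANSFORMED_DATA_DIR / "tabular_data.parquet")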
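
One detail about the cell above: assigning `features` to `tabular_data` binds a second name to the same DataFrame rather than copying it, so the target column also ends up on `features`. The toy example below uses made-up values and a hypothetical column name `rides_previous_1_hour` to show the effect, along with the `.copy()` alternative for cases where `features` should stay untouched.

```python
import pandas as pd

targets_toy = pd.Series([4, 6])

# Plain assignment: both names refer to the same DataFrame object.
features_toy = pd.DataFrame({"rides_previous_1_hour": [3, 5]})
aliased = features_toy
aliased["target_rides_next_hour"] = targets_toy
print("target_rides_next_hour" in features_toy.columns)  # True

# Explicit copy: the original DataFrame keeps only its feature column.
features_toy2 = pd.DataFrame({"rides_previous_1_hour": [3, 5]})
copied = features_toy2.copy()
copied["target_rides_next_hour"] = targets_toy
print("target_rides_next_hour" in features_toy2.columns)  # False
```

For this notebook the shared object is harmless, since `features` is not used again after saving, and avoiding the copy saves memory on a table of this size.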

After this step, we are done with the data transformation stage. We can move on to the next stage: building models (yay!).