# **Demo: Ridge Regression for predicting taxi demand**

(YellowCab, NY)

Data: [https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page)


## **Contents**

- [About](#About)
- [1. Plotting the dataset](#1.-Plotting-the-dataset)
- [2. Displaying original data as timeseries](#2.-Displaying-original-data-as-timeseries)
- [3. Real demand vs. forecast](#3.-Real-demand-vs.-forecast)

## **About**

This notebook demonstrates the model's predictions for June 2016. The model is a ridge regression with the following feature space:
- periodic functions (sines and cosines) expected to capture hourly patterns within a week (angular frequency $\omega = 2 \pi/168$, where $168 = 24 \cdot 7$ is the number of hours in a week);
- one-hot encoded seasonality feature (three 0-1 features, encoding seasons as $000$, $100$, $010$ and $001$);
- their pairwise interactions, followed by feature selection by thresholding the coefficients;
- additional features such as average ride distance per hour, average number of passengers per hour, average ride duration, and average cost:
    - computed separately for each of the $102$ regions;
    - logarithmically scaled, $x \to \log(1+x)$.
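
As a rough illustration of the feature space listed above, the sketch below builds weekly sine/cosine features, the three 0-1 season columns, their pairwise products, and a $\log(1+x)$-scaled extra column, then fits a ridge model in closed form and thresholds the coefficients. Everything in it is an assumption made for illustration: the function names, the number of harmonics, the penalty `alpha`, the threshold, and the synthetic data are hypothetical and are not taken from the notebook's `backend` module.

```python
import numpy as np

HOURS_PER_WEEK = 24 * 7  # 168


def fourier_features(hours, n_harmonics=3, period=HOURS_PER_WEEK):
    """Sine/cosine pairs for harmonics k = 1..n_harmonics of a weekly period."""
    hours = np.asarray(hours, dtype=float)
    k = np.arange(1, n_harmonics + 1)
    angles = 2.0 * np.pi * np.outer(hours, k) / period
    return np.hstack([np.sin(angles), np.cos(angles)])


def season_dummies(season):
    """Encode seasons 0..3 as 000, 100, 010, 001 (season 0 is the baseline)."""
    season = np.asarray(season, dtype=int)
    return np.eye(4)[season][:, 1:]


def pairwise_interactions(a, b):
    """All products between a column of `a` and a column of `b`."""
    return (a[:, :, None] * b[:, None, :]).reshape(len(a), -1)


def build_features(hours, season, extras, n_harmonics=3):
    """Stack periodic, season, interaction, and log-scaled extra features."""
    periodic = fourier_features(hours, n_harmonics)
    seasons = season_dummies(season)
    interactions = pairwise_interactions(periodic, seasons)
    return np.hstack([periodic, seasons, interactions, np.log1p(extras)])


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge fit; the intercept is handled by centring and is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


# Synthetic data for one hypothetical region: four weeks, one "season" per week.
rng = np.random.default_rng(0)
hours = np.arange(4 * HOURS_PER_WEEK)
season = np.repeat(np.arange(4), HOURS_PER_WEEK)
extras = rng.uniform(0.0, 10.0, size=(hours.size, 1))  # stand-in for e.g. average distance
y = 5.0 + 3.0 * np.sin(2.0 * np.pi * hours / HOURS_PER_WEEK) + rng.normal(0.0, 0.1, hours.size)

X = build_features(hours, season, extras)
w, b = ridge_fit(X, y, alpha=1.0)

# Feature selection by thresholding: keep columns with a non-negligible coefficient.
keep = np.abs(w) >= 0.05
print(X.shape, int(keep.sum()))
```

In a setup like this, the model would typically be refit on `X[:, keep]` after thresholding; whether the notebook does so is not stated.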

**Note:** Run the following two cells first to load the modeling results and precompute the output.

In [1]:
# letting the interpreter know where to look for custom modules
import sys
sys.path.append("./src")

# importing custom modules
import backend # for processing the results
import interaction # for plotting the widgets

In [2]:
# extracting results
results: backend.ResultsDict = backend.init()

[*$\leftarrow$ back to Contents*](#Contents)

## **1. Plotting the dataset**

- **The "Datetime" slider** lets you select a date and hour and plot the corresponding demand.

In [3]:
interaction.plot_dataset(results);


[*$\leftarrow$ back to Contents*](#Contents)

## **2. Displaying original data as timeseries**

- **Region index**: select one of the $102$ regions to plot the corresponding timeseries

In [4]:
interaction.plot_timeseries(results);


[*$\leftarrow$ back to Contents*](#Contents)

## **3. Real demand vs. forecast**

- **Region index**: select one of the $102$ regions
- **Time lag**: select the lag (delay) between observations and predictions, i.e., how many hours ahead the demand should be predicted
- **Show real data**: whether to display the original data
- **Show predicted**: whether to display the forecast
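
To make the time lag concrete, here is a minimal sketch of how an hourly series can be paired with the values observed a fixed number of hours later, which is the quantity a forecast with that lag tries to match. The helper `pair_with_lag` and the toy numbers are hypothetical; the way `interaction.plot_paired_timeseries` aligns the two series internally is not shown in this notebook. The sketch assumes a lag of at least one hour.

```python
import numpy as np


def pair_with_lag(demand, lag):
    """Pair each observation at hour t with the demand observed `lag` hours later."""
    demand = np.asarray(demand)
    if not 1 <= lag < demand.size:
        raise ValueError("lag must be between 1 and len(demand) - 1")
    return demand[:-lag], demand[lag:]


# Toy hourly demand for one hypothetical region.
hourly = np.array([10, 12, 15, 11, 9, 14])
observed, target = pair_with_lag(hourly, lag=2)

print(observed.tolist())  # [10, 12, 15, 11]
print(target.tolist())    # [15, 11, 9, 14]
```

A larger lag leaves fewer usable pairs, since the last `lag` observations have no matching future value.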


In [5]:
interaction.plot_paired_timeseries(results);


[*$\leftarrow$ back to Contents*](#Contents)