# Stream Regression
---

## `BikeSharing` dataset

**Description:** This dataset contains the hourly count of rental bikes in the Capital Bikeshare system between 2011 and 2012, together with the corresponding weather and seasonal information. The task is to predict the number of bikes rented in each hour.

**Features:** 10
 
|       Attribute      | Description |
|:--------------------:|:-----------------------------|
| `season`             | Season (1: winter, 2: spring, 3: summer, 4: fall) |
| `yr`                 | Year (0: 2011, 1: 2012) |
| `mnth`               | Month (1 to 12) |
| `hr`                 | Hour (0 to 23) |
| `workingday`         | 1 if the day is neither a weekend day nor a holiday, otherwise 0 |
| `weathersit`         | 1: Clear, Few clouds, Partly cloudy; 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog |
| `temp`               | Normalized temperature in Celsius, computed as (t - t_min)/(t_max - t_min) with t_min = -8, t_max = +39 |
| `atemp`              | Normalized "feels like" temperature in Celsius, computed as (t - t_min)/(t_max - t_min) with t_min = -16, t_max = +50 |
| `hum`                | Normalized humidity: values divided by 100 (max) |
| `windspeed`          | Normalized wind speed: values divided by 67 (max) |


**Target:** `cnt` (count of total rental bikes)
 
**Samples:** 17,379
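
The `temp`, `atemp`, `hum` and `windspeed` columns are stored normalized. As a minimal sketch in plain Python (not part of the River workflow below), the formulas in the table can be inverted to recover the raw readings; the helper names and the example row are hypothetical.

```python
# Inverse of the min-max normalization described in the table:
# x_norm = (x - x_min) / (x_max - x_min)  =>  x = x_norm * (x_max - x_min) + x_min

TEMP_MIN, TEMP_MAX = -8.0, 39.0      # `temp` bounds in Celsius (from the table)
ATEMP_MIN, ATEMP_MAX = -16.0, 50.0   # `atemp` bounds in Celsius (from the table)
HUM_MAX = 100.0                      # `hum` was divided by 100
WINDSPEED_MAX = 67.0                 # `windspeed` was divided by 67


def denormalize(x_norm: float, x_min: float, x_max: float) -> float:
    """Map a min-max normalized value back to its original scale."""
    return x_norm * (x_max - x_min) + x_min


def raw_weather(row: dict) -> dict:
    """Recover raw weather readings from one normalized BikeSharing row."""
    return {
        "temp": denormalize(row["temp"], TEMP_MIN, TEMP_MAX),
        "atemp": denormalize(row["atemp"], ATEMP_MIN, ATEMP_MAX),
        "hum": row["hum"] * HUM_MAX,
        "windspeed": row["windspeed"] * WINDSPEED_MAX,
    }


# A made-up row, for illustration only
example = {"temp": 0.5, "atemp": 0.5, "hum": 0.8, "windspeed": 0.0}
print(raw_weather(example))
```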


```python
import pandas as pd
from river.stream import iter_pandas
from river.metrics import Metrics, MAE, MSE, RMSE
from river.evaluate import progressive_val_score
from river.preprocessing import StandardScaler
from river.drift import ADWIN
```

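```python
data = pd.read_csv("../datasets/BikeSharing.csv")
# Every column except the last one, which is the target `cnt`
features = data.columns[:-1]
# Subset of features fed to the linear model (see below)
numerical_features = ['mnth', 'hr', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed']
```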

## Linear Regression

---
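[Linear Regression](https://riverml.xyz/0.10.1/api/linear-model/LinearRegression/) is River's online linear regression model: its weights are updated one sample at a time with stochastic gradient descent.
Note that a linear model treats every input as a number and therefore cannot make proper use of **categorical** features (their integer codes would be read as magnitudes), so we only feed it the **numerical** ones listed in `numerical_features`.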
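
To illustrate what "online" means here, the sketch below trains a tiny linear regressor one sample at a time with plain SGD, using a separate learning rate for the intercept in the spirit of `intercept_lr`. The class `OnlineLinearRegression` and the toy data are hypothetical, and the sketch is simplified: it assumes a loss of 0.5 * (y_pred - y)², and River's version offers further options (optimizers, losses, regularization) that are not reproduced.

```python
class OnlineLinearRegression:
    """Hypothetical online linear regressor trained with plain SGD.
    Assumes the loss 0.5 * (y_pred - y)**2, whose gradient w.r.t. y_pred is the error."""

    def __init__(self, lr: float = 0.01, intercept_lr: float = 0.1):
        self.lr = lr                      # step size for the feature weights
        self.intercept_lr = intercept_lr  # separate step size for the bias
        self.weights: dict = {}           # one weight per feature name, created lazily
        self.intercept = 0.0

    def predict_one(self, x: dict) -> float:
        return self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x: dict, y: float) -> None:
        error = self.predict_one(x) - y   # d(loss)/d(y_pred)
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.intercept -= self.intercept_lr * error


# Toy noise-free stream where y = 2 * a + 1 (not BikeSharing data)
model = OnlineLinearRegression(lr=0.05, intercept_lr=0.1)
for _ in range(500):
    for a in (-1.0, 0.0, 1.0):
        model.learn_one({"a": a}, 2 * a + 1)

print(round(model.weights["a"], 3), round(model.intercept, 3))
```

Because the model only ever sees one sample per update, it needs no stored dataset, which is what makes it suitable for a stream.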

```python
from river.linear_model import LinearRegression

# Scale the inputs, then fit an online linear regression
model = (StandardScaler() |
         LinearRegression(intercept_lr=.1))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
# Only the numerical features are streamed to the linear model
stream = iter_pandas(X=data[numerical_features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 702.412945, MSE: 46,184,160.882792, RMSE: 6,795.892942
[2,000] MAE: 383.854206, MSE: 23,096,328.899143, RMSE: 4,805.864012
[3,000] MAE: 277.286153, MSE: 15,400,176.912274, RMSE: 3,924.305915
[4,000] MAE: 228.175123, MSE: 11,553,093.91782, RMSE: 3,398.984248
[5,000] MAE: 198.79005, MSE: 9,244,948.59481, RMSE: 3,040.550706
[6,000] MAE: 179.532748, MSE: 7,706,202.294215, RMSE: 2,776.004736
[7,000] MAE: 165.79017, MSE: 6,607,121.644385, RMSE: 2,570.43219
[8,000] MAE: 154.355256, MSE: 5,782,527.79314, RMSE: 2,404.68871
[9,000] MAE: 144.412695, MSE: 5,140,926.032258, RMSE: 2,267.361028
[10,000] MAE: 138.396775, MSE: 4,628,152.295476, RMSE: 2,151.314086
[11,000] MAE: 137.222862, MSE: 4,209,863.137087, RMSE: 2,051.795101
[12,000] MAE: 136.258552, MSE: 3,861,365.169476, RMSE: 1,965.035666
[13,000] MAE: 135.525125, MSE: 3,566,563.510415, RMSE: 1,888.534752
[14,000] MAE: 134.804133, MSE: 3,313,879.447976, RMSE: 1,820.406396
[15,000] MAE: 134.999736, MSE: 3,095,206.450546, RMSE: 1,759

MAE: 132.359553, MSE: 2,675,088.190485, RMSE: 1,635.569684
```
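
`progressive_val_score` evaluates a model by progressive validation: each sample is first used to test the model and only then to train it, so every reported error is measured on data the model had not yet learned from. The sketch below reimplements that predict-then-learn loop in plain Python with a hypothetical running-mean baseline and hand-rolled MAE/MSE/RMSE accumulators (not River's classes). It also shows why the log above reports an MAE of about 132 but an RMSE above 1,600: squaring lets a few very large errors dominate MSE and RMSE.

```python
import math


class RunningRegressionMetrics:
    """Running MAE, MSE and RMSE, updated one (y_true, y_pred) pair at a time."""

    def __init__(self):
        self.n = 0
        self.abs_sum = 0.0
        self.sq_sum = 0.0

    def update(self, y_true: float, y_pred: float) -> None:
        self.n += 1
        self.abs_sum += abs(y_true - y_pred)
        self.sq_sum += (y_true - y_pred) ** 2

    @property
    def mae(self) -> float:
        return self.abs_sum / self.n

    @property
    def mse(self) -> float:
        return self.sq_sum / self.n

    @property
    def rmse(self) -> float:
        return math.sqrt(self.mse)


class MeanRegressor:
    """Hypothetical baseline that always predicts the running mean of past targets."""

    def __init__(self):
        self.n, self.mean = 0, 0.0

    def predict_one(self, x: dict) -> float:
        return self.mean

    def learn_one(self, x: dict, y: float) -> None:
        self.n += 1
        self.mean += (y - self.mean) / self.n


def progressive_validation(stream, model, metrics):
    """Test-then-train: score each sample *before* the model learns from it."""
    for x, y in stream:
        y_pred = model.predict_one(x)  # 1. predict on a sample not yet learned
        metrics.update(y, y_pred)      # 2. update the running metrics
        model.learn_one(x, y)          # 3. only then learn from it
    return metrics


# Made-up stream: three ordinary targets followed by one outlier
stream = [({}, 10.0), ({}, 12.0), ({}, 11.0), ({}, 1000.0)]
m = progressive_validation(stream, MeanRegressor(), RunningRegressionMetrics())
print(f"MAE: {m.mae:.2f}, MSE: {m.mse:.2f}, RMSE: {m.rmse:.2f}")
```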

## KNNRegressor
---

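[KNNRegressor](https://riverml.xyz/0.10.1/api/neighbors/KNNRegressor/) is the k-nearest-neighbors (KNN) adaptation for regression tasks: it stores the most recent `window_size` samples and predicts the target by aggregating the targets of the nearest stored neighbors.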
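
A minimal sketch of a sliding-window KNN regressor, assuming brute-force Euclidean search and a plain mean of the neighbors' targets. `WindowKNNRegressor` is a hypothetical class for illustration; River's implementation uses its own search structure and parameters.

```python
import math
from collections import deque


class WindowKNNRegressor:
    """Hypothetical sliding-window KNN regressor (brute-force search)."""

    def __init__(self, n_neighbors: int = 5, window_size: int = 1000):
        self.n_neighbors = n_neighbors
        # deque(maxlen=...) drops the oldest sample once the window is full
        self.window = deque(maxlen=window_size)

    def learn_one(self, x: dict, y: float) -> None:
        self.window.append((x, y))

    def predict_one(self, x: dict) -> float:
        if not self.window:
            return 0.0

        def dist(other: dict) -> float:
            # Euclidean distance over the features present in x
            return math.sqrt(sum((x[k] - other.get(k, 0.0)) ** 2 for k in x))

        nearest = sorted(self.window, key=lambda xy: dist(xy[0]))[: self.n_neighbors]
        return sum(y for _, y in nearest) / len(nearest)


knn = WindowKNNRegressor(n_neighbors=2, window_size=3)
for v, target in [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]:
    knn.learn_one({"v": v}, target)

# (0.0, 0.0) has been evicted; the two nearest stored points to v=2.6 are v=3 and v=2
print(knn.predict_one({"v": 2.6}))
```

The window bounds both memory and prediction cost, and it lets the model forget old behavior, at the price of ignoring anything older than `window_size` samples.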

```python
from river.neighbors import KNNRegressor

# Neighbors are searched among the 1000 most recent samples
model = (StandardScaler() |
         KNNRegressor(window_size=1000))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 31.739783, MSE: 2,028.1698, RMSE: 45.035206
[2,000] MAE: 37.879492, MSE: 2,891.47506, RMSE: 53.772438
[3,000] MAE: 46.324594, MSE: 4,634.606493, RMSE: 68.077944
[4,000] MAE: 50.804996, MSE: 5,602.46592, RMSE: 74.849622
[5,000] MAE: 54.460437, MSE: 6,454.568712, RMSE: 80.340331
[6,000] MAE: 56.444697, MSE: 6,929.107473, RMSE: 83.241261
[7,000] MAE: 58.613655, MSE: 7,455.649051, RMSE: 86.3461
[8,000] MAE: 58.880998, MSE: 7,510.97703, RMSE: 86.665893
[9,000] MAE: 59.053509, MSE: 7,535.757209, RMSE: 86.808739
[10,000] MAE: 60.668418, MSE: 7,943.33842, RMSE: 89.125408
[11,000] MAE: 64.124217, MSE: 9,037.029738, RMSE: 95.063293
[12,000] MAE: 67.172315, MSE: 10,018.612197, RMSE: 100.093018
[13,000] MAE: 69.249937, MSE: 10,744.05968, RMSE: 103.653556
[14,000] MAE: 70.864456, MSE: 11,299.61068, RMSE: 106.299627
[15,000] MAE: 72.733986, MSE: 11,939.682621, RMSE: 109.268855
[16,000] MAE: 75.343299, MSE: 12,891.33336, RMSE: 113.540008
[17,000] MAE: 76.193881, MSE: 13,119.190772, RMSE:

MAE: 75.776292, MSE: 13,010.496556, RMSE: 114.063564
```

## AMRules
---


[AMRules](https://riverml.xyz/0.10.1/api/rules/AMRules/) is a rule-based algorithm for incremental regression. Like Hoeffding Trees, it relies on the Hoeffding bound to decide when to expand its rule set, and it uses the Variance-Ratio heuristic to evaluate candidate splits. It also has capabilities not usually found in decision trees: each rule carries its own drift detector, and the model can flag anomalous samples.
As **prediction type**, we use *adaptive*, which dynamically selects between the target mean and a linear regression model for each incoming example.
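
AMRules and the Hoeffding trees in the following sections decide when enough evidence has accumulated with the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)): with probability 1 − δ, the mean of n observations of a variable with range R lies within ε of its true mean, so a candidate split is accepted when its merit beats the runner-up's by more than ε. The sketch below computes ε for the `delta=0.01` used in the cell; the range R = 1 and the merit values are assumptions for illustration, not what River computes for the Variance-Ratio heuristic.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best: float, second_best: float, value_range: float,
                 delta: float, n: int) -> bool:
    """Accept the best split once it beats the runner-up by more than epsilon."""
    return best - second_best > hoeffding_bound(value_range, delta, n)


# delta=0.01 as in the AMRules cell; value_range=1.0 is an assumption
for n in (100, 1_000, 10_000):
    print(n, round(hoeffding_bound(1.0, 0.01, n), 4))

# Hypothetical merits of the two best candidate splits after 100 samples
print(should_split(0.9, 0.7, 1.0, 0.01, 100))
```

A larger `delta` shrinks ε and makes the model expand sooner but with less confidence; more samples shrink ε as 1/sqrt(n).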

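```python
from river.rules import AMRules

model = (StandardScaler() |
         AMRules(
             delta=0.01,              # split significance (confidence is 1 - delta)
             n_min=100,               # weight a rule observes between expansion attempts
             drift_detector=ADWIN(),  # drift detector attached to each rule
             pred_type='adaptive'     # choose between the mean and a linear model
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```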

```
[1,000] MAE: 33.507247, MSE: 5,622.116784, RMSE: 74.980776
[2,000] MAE: 52.32084, MSE: 250,668.713621, RMSE: 500.668267
[3,000] MAE: 59.942408, MSE: 170,708.104513, RMSE: 413.168373
[4,000] MAE: 64.838398, MSE: 131,021.232855, RMSE: 361.968552
[5,000] MAE: 66.454994, MSE: 106,766.973268, RMSE: 326.752159
[6,000] MAE: 67.594834, MSE: 90,646.082624, RMSE: 301.074879
[7,000] MAE: 67.937541, MSE: 79,049.252776, RMSE: 281.15699
[8,000] MAE: 67.494007, MSE: 70,179.43775, RMSE: 264.91402
[9,000] MAE: 72.857313, MSE: 124,283.437323, RMSE: 352.538561
[10,000] MAE: 73.03285, MSE: 112,950.450191, RMSE: 336.081017
[11,000] MAE: 76.19873, MSE: 105,021.626139, RMSE: 324.070403
[12,000] MAE: 77.735706, MSE: 97,800.329656, RMSE: 312.730442
[13,000] MAE: 79.476189, MSE: 91,907.855189, RMSE: 303.163083
[14,000] MAE: 81.420017, MSE: 87,042.543348, RMSE: 295.029733
[15,000] MAE: 83.86915, MSE: 83,063.12027, RMSE: 288.206732
[16,000] MAE: 86.241087, MSE: 79,751.272963, RMSE: 282.402679
[17,000] MAE: 86.666

MAE: 86.357952, MSE: 74,611.607453, RMSE: 273.151254
```

## HoeffdingTreeRegressor
---
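[HoeffdingTreeRegressor](https://riverml.xyz/0.10.1/api/tree/HoeffdingTreeRegressor/) is the Hoeffding Tree (HT) adaptation for regression tasks. As **leaf prediction**, we use *adaptive*, which dynamically selects between the target mean and a linear regression model for each incoming example. The categorical features are declared in `nominal_attributes` so that the tree treats them as nominal rather than continuous.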
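
With *adaptive* leaf prediction, each leaf keeps both a mean predictor and a linear model and answers with whichever has recently been more accurate; `model_selector_decay` controls how quickly past errors are forgotten. A hypothetical sketch of that selection rule, assuming exponentially decayed squared errors (`AdaptiveLeafSelector` is not a River class, and the targets and predictions are made up):

```python
class AdaptiveLeafSelector:
    """Hypothetical 'adaptive' leaf rule: track an exponentially decayed squared
    error for a mean predictor and a linear predictor, and answer with whichever
    currently has the smaller error."""

    def __init__(self, decay: float = 0.9):
        self.decay = decay      # closer to 1 => past errors are remembered longer
        self.err_mean = 0.0
        self.err_model = 0.0

    def update(self, y: float, pred_mean: float, pred_model: float) -> None:
        # older errors are multiplied by `decay` at every update
        self.err_mean = self.decay * self.err_mean + (y - pred_mean) ** 2
        self.err_model = self.decay * self.err_model + (y - pred_model) ** 2

    def choose(self, pred_mean: float, pred_model: float) -> float:
        return pred_model if self.err_model < self.err_mean else pred_mean


sel = AdaptiveLeafSelector(decay=0.9)
# The linear model starts badly...
sel.update(10.0, pred_mean=9.0, pred_model=0.0)
sel.update(10.0, pred_mean=9.0, pred_model=2.0)
early_choice = sel.choose(9.0, 10.0)   # the mean is still trusted
# ...then predicts perfectly while the mean stays off by 1
for _ in range(40):
    sel.update(10.0, pred_mean=9.0, pred_model=10.0)
late_choice = sel.choose(9.0, 10.0)    # the linear model now has the lower error
print(early_choice, late_choice)
```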

```python
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
         HoeffdingTreeRegressor(
             grace_period=100,            # samples a leaf observes between split attempts
             leaf_prediction='adaptive',  # choose between the mean and a linear model
             model_selector_decay=0.9,    # decay of the errors used for that choice
             nominal_attributes=['season', 'yr', 'workingday', 'weathersit'],
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 27.29901, MSE: 1,559.328408, RMSE: 39.488333
[2,000] MAE: 30.468168, MSE: 2,049.217485, RMSE: 45.268283
[3,000] MAE: 37.571582, MSE: 3,413.954358, RMSE: 58.429054
[4,000] MAE: 43.010506, MSE: 4,560.455028, RMSE: 67.531141
[5,000] MAE: 46.193905, MSE: 5,225.581332, RMSE: 72.288183
[6,000] MAE: 47.445984, MSE: 5,422.90597, RMSE: 73.640383
[7,000] MAE: 48.851862, MSE: 5,692.930826, RMSE: 75.451513
[8,000] MAE: 49.042905, MSE: 5,718.460227, RMSE: 75.620501
[9,000] MAE: 50.076157, MSE: 7,030.353824, RMSE: 83.847205
[10,000] MAE: 50.912125, MSE: 7,060.652796, RMSE: 84.027691
[11,000] MAE: 54.014824, MSE: 8,076.920767, RMSE: 89.871691
[12,000] MAE: 57.837782, MSE: 9,254.905104, RMSE: 96.202417
[13,000] MAE: 60.724616, MSE: 10,137.81184, RMSE: 100.686701
[14,000] MAE: 62.519811, MSE: 10,680.495191, RMSE: 103.346481
[15,000] MAE: 64.521247, MSE: 11,405.238515, RMSE: 106.795311
[16,000] MAE: 66.76976, MSE: 12,381.526658, RMSE: 111.272309
[17,000] MAE: 67.615282, MSE: 12,581.119585,

MAE: 68.229484, MSE: 12,735.514685, RMSE: 112.851738
```

## HoeffdingAdaptiveTreeRegressor
---
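[HoeffdingAdaptiveTreeRegressor](https://riverml.xyz/0.10.1/api/tree/HoeffdingAdaptiveTreeRegressor/) is the Hoeffding Adaptive Tree (HAT) adaptation for regression tasks: it uses ADWIN to monitor the performance of its branches and replaces a branch with a new one when its accuracy decreases. As **leaf prediction**, we use *adaptive*, which dynamically selects between the target mean and a linear regression model for each incoming example.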
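
ADWIN keeps a window of recent values (for HAT, errors) and drops the oldest part of the window whenever two sub-windows have significantly different means. The sketch below is a naive O(n) version of the basic ADWIN test (often called ADWIN0) for values in [0, 1]; `SimpleADWIN` is a hypothetical class, and River's `ADWIN` instead uses a compressed bucket structure that is not reproduced here.

```python
import math


class SimpleADWIN:
    """Naive ADWIN0-style change detector for values in [0, 1] (illustration only)."""

    def __init__(self, delta: float = 0.002):
        self.delta = delta
        self.window: list = []

    def update(self, value: float) -> bool:
        """Add a value; return True if a change was detected (and old data dropped)."""
        self.window.append(value)
        detected = False
        while self._find_cut():
            self.window.pop(0)  # drop the oldest element and test again
            detected = True
        return detected

    def _find_cut(self) -> bool:
        n = len(self.window)
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for i in range(1, n):  # W0 = window[:i] (older), W1 = window[i:] (newer)
            head_sum += self.window[i - 1]
            n0, n1 = i, n - i
            mu0, mu1 = head_sum / n0, (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                return True
        return False


detector = SimpleADWIN(delta=0.002)
# Abrupt change: 100 zeros followed by 100 ones
flags = [detector.update(v) for v in [0.0] * 100 + [1.0] * 100]
print(flags.index(True), len(detector.window))
```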

```python
from river.tree import HoeffdingAdaptiveTreeRegressor

model = (StandardScaler() |
         HoeffdingAdaptiveTreeRegressor(
             grace_period=100,            # samples a leaf observes between split attempts
             leaf_prediction='adaptive',  # choose between the mean and a linear model
             model_selector_decay=0.9,
             seed=1,                      # for reproducibility
             nominal_attributes=['season', 'yr', 'workingday', 'weathersit']
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 25.202601, MSE: 1,374.689305, RMSE: 37.076803
[2,000] MAE: 27.202401, MSE: 2,109.47032, RMSE: 45.92897
[3,000] MAE: 35.12362, MSE: 3,486.829491, RMSE: 59.049382
[4,000] MAE: 45.017906, MSE: 5,331.458947, RMSE: 73.01684
[5,000] MAE: 49.826018, MSE: 6,215.198049, RMSE: 78.836527
[6,000] MAE: 52.383747, MSE: 6,718.679383, RMSE: 81.967551
[7,000] MAE: 54.121729, MSE: 7,007.600091, RMSE: 83.71141
[8,000] MAE: 54.35347, MSE: 7,032.944518, RMSE: 83.862653
[9,000] MAE: 56.149199, MSE: 9,929.448866, RMSE: 99.64662
[10,000] MAE: 58.360644, MSE: 10,116.438989, RMSE: 100.58051
[11,000] MAE: 62.242584, MSE: 11,190.768373, RMSE: 105.786428
[12,000] MAE: 64.264621, MSE: 11,540.118874, RMSE: 107.424945
[13,000] MAE: 65.723855, MSE: 11,796.184878, RMSE: 108.610243
[14,000] MAE: 66.25704, MSE: 11,848.512306, RMSE: 108.850872
[15,000] MAE: 66.771425, MSE: 11,920.62793, RMSE: 109.181628
[16,000] MAE: 67.32521, MSE: 12,072.578733, RMSE: 109.875287
[17,000] MAE: 67.679485, MSE: 12,114.233686, R

MAE: 67.402293, MSE: 12,008.804075, RMSE: 109.584689
```

## AdaptiveRandomForestRegressor
---
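[AdaptiveRandomForestRegressor](https://riverml.xyz/0.10.1/api/ensemble/AdaptiveRandomForestRegressor/) is the Adaptive Random Forest (ARF) adaptation for regression tasks: an ensemble of Hoeffding trees trained with online bagging, where each tree considers a random subset of the features at every split and is reset when its drift detector fires. As **leaf prediction**, we use *adaptive*, which dynamically selects between the target mean and a linear regression model for each incoming example.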
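
ARF combines two sources of randomness: online bagging, where each tree learns from an incoming sample k times with k drawn from a Poisson distribution, and local random subspaces, where each split only considers a random subset of the features. A minimal sketch of both, assuming λ = 6 and a subset size of about √m; `poisson` and `local_feature_subset` are hypothetical helpers, not River functions, and the feature list mirrors the cell below (which drops `season`).

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's method for drawing a Poisson(lam) sample (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def local_feature_subset(features: list, rng: random.Random) -> list:
    """ARF-style *local* subspace: a fresh random subset of about sqrt(m)
    features, drawn each time a leaf evaluates split candidates."""
    k = max(1, round(math.sqrt(len(features))))
    return rng.sample(features, k)


rng = random.Random(1)
features = ["yr", "mnth", "hr", "workingday", "weathersit",
            "temp", "atemp", "hum", "windspeed"]

# Online bagging: tree i learns from the incoming sample weights[i] times
weights = [poisson(6.0, rng) for _ in range(10)]
print(weights)
print(local_feature_subset(features, rng))
```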

```python
from river.ensemble import AdaptiveRandomForestRegressor

model = (StandardScaler() |
         AdaptiveRandomForestRegressor(
             n_models=10,
             seed=1,
             model_selector_decay=0.9,
             nominal_attributes=['season', 'yr', 'workingday', 'weathersit'],
             leaf_prediction='adaptive'
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
# Unlike the other models, this run excludes the `season` feature
stream = iter_pandas(X=data[features.drop('season')], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 31.969551, MSE: 2,006.951761, RMSE: 44.799015
[2,000] MAE: 37.89668, MSE: 3,144.176443, RMSE: 56.072956
[3,000] MAE: 51.634231, MSE: 7,187.403275, RMSE: 84.778554
[4,000] MAE: 64.651297, MSE: 10,820.615256, RMSE: 104.022186
[5,000] MAE: 69.928343, MSE: 11,942.02479, RMSE: 109.279572
[6,000] MAE: 71.028873, MSE: 11,984.503073, RMSE: 109.473755
[7,000] MAE: 71.256277, MSE: 11,785.941078, RMSE: 108.563074
[8,000] MAE: 70.609918, MSE: 11,383.398658, RMSE: 106.693011
[9,000] MAE: 69.376219, MSE: 10,934.674829, RMSE: 104.568996
[10,000] MAE: 69.190854, MSE: 10,989.176972, RMSE: 104.829275
[11,000] MAE: 72.6409, MSE: 12,489.426892, RMSE: 111.756104
[12,000] MAE: 75.98469, MSE: 13,562.781318, RMSE: 116.459355
[13,000] MAE: 78.593623, MSE: 14,264.382619, RMSE: 119.433591
[14,000] MAE: 80.886205, MSE: 14,873.670908, RMSE: 121.95766
[15,000] MAE: 83.320798, MSE: 15,759.551706, RMSE: 125.537053
[16,000] MAE: 86.465837, MSE: 16,969.649723, RMSE: 130.267608
[17,000] MAE: 87.271889, MSE:

MAE: 87.247241, MSE: 16,941.555193, RMSE: 130.15973
```

## SRPRegressor
---
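[SRPRegressor](https://riverml.xyz/0.10.1/api/ensemble/SRPRegressor/) is the Streaming Random Patches (SRP) adaptation for regression tasks: each ensemble member is trained on its own random subset of the features and on a resampled version of the stream. Here the base model is a `HoeffdingTreeRegressor`, and as **leaf prediction** we use *adaptive*, which dynamically selects between the target mean and a linear regression model for each incoming example.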
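
Where ARF draws a new feature subset at every split, SRP draws one subset per ensemble member up front and keeps it for every sample (a global subspace); in the random-patches setting each member additionally receives Poisson-weighted copies of every sample, as in online bagging. A minimal sketch of the global-subspace part; the function names are hypothetical and 0.6 is only an illustrative subspace fraction.

```python
import random


def global_subspaces(features: list, n_models: int, subspace_size: float,
                     seed: int) -> list:
    """SRP-style *global* subspaces: each ensemble member gets one random
    feature subset, drawn once up front and reused for every sample."""
    rng = random.Random(seed)
    k = max(1, round(subspace_size * len(features)))
    return [sorted(rng.sample(features, k)) for _ in range(n_models)]


def project(x: dict, subspace: list) -> dict:
    """Keep only the features a given ensemble member is allowed to see."""
    return {f: x[f] for f in subspace if f in x}


features = ["season", "yr", "mnth", "hr", "workingday", "weathersit",
            "temp", "atemp", "hum", "windspeed"]
subspaces = global_subspaces(features, n_models=10, subspace_size=0.6, seed=1)

x = {f: 0.0 for f in features}   # placeholder sample
for i, sub in enumerate(subspaces[:3]):
    print(i, sorted(project(x, sub)))
```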

```python
from river.ensemble import SRPRegressor
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
         SRPRegressor(
             n_models=10,
             seed=1,
             drift_detector=ADWIN(delta=0.001),    # signals drift for a member
             warning_detector=ADWIN(delta=0.01),   # signals an early warning
             # base estimator used for every ensemble member
             model=HoeffdingTreeRegressor(
                 grace_period=100,
                 leaf_prediction='adaptive',
                 model_selector_decay=0.9,
                 nominal_attributes=['season', 'yr', 'workingday', 'weathersit']
             )
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 24.848872, MSE: 1,319.556808, RMSE: 36.325705
[2,000] MAE: 28.781345, MSE: 1,783.193626, RMSE: 42.227877
[3,000] MAE: 35.216968, MSE: 2,887.231078, RMSE: 53.732961
[4,000] MAE: 41.23148, MSE: 3,942.859123, RMSE: 62.79219
[5,000] MAE: 43.933795, MSE: 4,325.948176, RMSE: 65.771941
[6,000] MAE: 45.693313, MSE: 4,645.891973, RMSE: 68.16078
[7,000] MAE: 47.018871, MSE: 4,844.525095, RMSE: 69.602623
[8,000] MAE: 47.714774, MSE: 4,948.561424, RMSE: 70.346012
[9,000] MAE: 47.972327, MSE: 4,993.645842, RMSE: 70.665733
[10,000] MAE: 48.787203, MSE: 5,160.183908, RMSE: 71.83442
[11,000] MAE: 51.728682, MSE: 5,897.840229, RMSE: 76.797397
[12,000] MAE: 54.078749, MSE: 6,502.903993, RMSE: 80.640585
[13,000] MAE: 55.925903, MSE: 6,964.978038, RMSE: 83.456444
[14,000] MAE: 57.313277, MSE: 7,315.742505, RMSE: 85.532114
[15,000] MAE: 58.688875, MSE: 7,691.267785, RMSE: 87.699873
[16,000] MAE: 60.447046, MSE: 8,245.244208, RMSE: 90.803327
[17,000] MAE: 61.326826, MSE: 8,442.000715, RMSE: 91.

MAE: 60.974581, MSE: 8,372.197939, RMSE: 91.499716
```

#### Note that SRPRegressor, which trains each base model on a global random subspace of the features, achieves the lowest final MAE, MSE and RMSE of all the models above and takes less time to execute than AdaptiveRandomForestRegressor
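
To make the comparison explicit, the snippet below collects the final MAE and RMSE printed at the end of each run above and ranks the models by each metric. The numbers are copied from the logs; only the dictionary layout is added.

```python
# Final scores copied from the last line of each run above
final_scores = {
    "LinearRegression": {"MAE": 132.359553, "RMSE": 1635.569684},
    "KNNRegressor": {"MAE": 75.776292, "RMSE": 114.063564},
    "AMRules": {"MAE": 86.357952, "RMSE": 273.151254},
    "HoeffdingTreeRegressor": {"MAE": 68.229484, "RMSE": 112.851738},
    "HoeffdingAdaptiveTreeRegressor": {"MAE": 67.402293, "RMSE": 109.584689},
    "AdaptiveRandomForestRegressor": {"MAE": 87.247241, "RMSE": 130.15973},
    "SRPRegressor": {"MAE": 60.974581, "RMSE": 91.499716},
}

# Lower is better for both metrics
for metric in ("MAE", "RMSE"):
    ranking = sorted(final_scores, key=lambda name: final_scores[name][metric])
    print(metric, "->", ", ".join(ranking))
```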