# Stream Regression
---

## `BikeSharing` dataset

**Description:** This dataset contains the hourly count of rental bikes in the Capital Bikeshare system for the years 2011 and 2012, together with the corresponding weather and seasonal information. The task is to predict the number of bikes rented in each hour.

**Features:** 10
 
|       Attribute      | Description |
|:--------------------:|:-----------------------------|
| `season`             | Season (1: winter, 2: spring, 3: summer, 4: fall) |
| `yr`                 | Year (0: 2011, 1: 2012) |
| `mnth`               | Month (1 to 12) |
| `hr`                 | Hour (0 to 23) |
| `workingday`         | 1 if the day is neither a weekend nor a holiday, otherwise 0 |
| `weathersit`         | Weather situation: 1) Clear, few clouds, partly cloudy; 2) Mist + cloudy, mist + broken clouds, mist + few clouds, mist; 3) Light snow, light rain + thunderstorm + scattered clouds, light rain + scattered clouds; 4) Heavy rain + ice pellets + thunderstorm + mist, snow + fog |
| `temp`               | Normalized temperature in Celsius, computed as (t - t_min)/(t_max - t_min) with t_min = -8 and t_max = +39 |
| `atemp`              | Normalized "feels-like" temperature in Celsius, computed as (t - t_min)/(t_max - t_min) with t_min = -16 and t_max = +50 |
| `hum`                | Normalized humidity: values are divided by 100 (the maximum) |
| `windspeed`          | Normalized wind speed: values are divided by 67 (the maximum) |


**Target:** `cnt` (count of total rental bikes)
 
**Samples:** 17,379
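
Because `temp`, `atemp`, `hum` and `windspeed` are stored in normalized form, it can help to map them back to physical units when inspecting the data or a model's errors. The following is a minimal sketch that simply inverts the formulas given in the attribute table; `denormalize`, `SCALES` and `MAXIMA` are hypothetical names, not part of the dataset or of River.

```python
# Hypothetical helper: inverts the normalizations described in the attribute table.
# temp/atemp:     x = (t - t_min) / (t_max - t_min)  =>  t = x * (t_max - t_min) + t_min
# hum, windspeed: x = v / v_max                      =>  v = x * v_max

SCALES = {
    "temp": (-8.0, 39.0),    # (t_min, t_max) in degrees Celsius
    "atemp": (-16.0, 50.0),  # (t_min, t_max) in degrees Celsius
}
MAXIMA = {
    "hum": 100.0,       # humidity maximum used for normalization
    "windspeed": 67.0,  # wind-speed maximum used for normalization
}


def denormalize(name: str, value: float) -> float:
    """Map a normalized feature value back to its original scale."""
    if name in SCALES:
        t_min, t_max = SCALES[name]
        return value * (t_max - t_min) + t_min
    if name in MAXIMA:
        return value * MAXIMA[name]
    raise KeyError(f"{name!r} is not a normalized feature")


print(denormalize("temp", 0.5))  # 15.5, the midpoint of [-8, 39]
```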


```python
import pandas as pd
from river.stream import iter_pandas
from river.metrics import Metrics, MAE, MSE, RMSE
from river.evaluate import progressive_val_score
from river.preprocessing import StandardScaler
from river.drift import ADWIN
```

```python
data = pd.read_csv("../datasets/BikeSharing.csv")
features = data.columns[:-1]  # all columns except the last one (assumed to be the target `cnt`)
```
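
Every model in this notebook is scored with `progressive_val_score`. River documents it as a test-then-train (prequential) loop: each sample is first used to make a prediction, which updates the metrics, and only then to train the model. The printed `MAE`, `MSE` and `RMSE` are therefore running values over all samples seen so far, with `RMSE` equal to the square root of `MSE`. The sketch below reproduces that loop without dependencies; it assumes a model exposing River-style `predict_one`/`learn_one` methods, and `MeanRegressor` and `prequential_eval` are hypothetical stand-ins, not River code.

```python
import math


class MeanRegressor:
    """Hypothetical stand-in model: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n, self.mean = 0, 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def prequential_eval(stream, model, print_every=0):
    """Test-then-train loop returning the running MAE, MSE and RMSE."""
    abs_sum = sq_sum = 0.0
    n = 0
    for x, y in stream:
        err = y - model.predict_one(x)  # 1) predict before the label is used for training
        abs_sum += abs(err)
        sq_sum += err * err
        n += 1
        model.learn_one(x, y)           # 2) only then learn from the sample
        if print_every and n % print_every == 0:
            mse = sq_sum / n
            print(f"[{n:,}] MAE: {abs_sum / n:.6f}, MSE: {mse:.6f}, RMSE: {math.sqrt(mse):.6f}")
    if n == 0:
        raise ValueError("the stream is empty")
    mse = sq_sum / n
    return {"MAE": abs_sum / n, "MSE": mse, "RMSE": math.sqrt(mse)}


toy_stream = [({"hr": h}, float(c)) for h, c in enumerate([10, 20, 30, 40])]
print(prequential_eval(toy_stream, MeanRegressor(), print_every=2))
```

On this four-sample toy stream the running MAE is 13.75 and the MSE 206.25: every prediction is made before the model has learned from that sample's label, which is exactly what keeps the evaluation honest on a stream.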

## Linear Regression

---
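[Linear Regression](https://riverml.xyz/0.10.1/api/linear-model/LinearRegression/) is River's online linear regression model: its weights are updated one sample at a time by stochastic gradient descent on the squared loss. In the cell below, `intercept_lr=.1` sets the learning rate of the intercept term.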
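
The run below diverges: its errors climb into the billions, far beyond any plausible hourly bike count. One plausible explanation, not verified here, is that plain SGD becomes unstable when the step size is too large relative to the scale of the inputs and target. The dependency-free sketch below (not River's implementation; `OnlineLinearRegression` and `prequential_mae` are hypothetical names) shows the per-sample SGD update, and how the same noiseless toy problem is learned with a modest learning rate but blows up with a large one.

```python
class OnlineLinearRegression:
    """Hypothetical sketch: linear regression fitted one sample at a time by SGD."""

    def __init__(self, lr=0.01, intercept_lr=0.01):
        self.lr = lr
        self.intercept_lr = intercept_lr
        self.weights = {}
        self.intercept = 0.0

    def predict_one(self, x):
        return self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # Gradient of the squared loss 0.5 * (y_pred - y)**2 w.r.t. weight i is (y_pred - y) * x_i
        err = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * err * v
        self.intercept -= self.intercept_lr * err


def prequential_mae(lr, n_steps=200):
    """Test-then-train MAE on the noiseless toy target y = 3*x1 - 2*x2 + 5."""
    model = OnlineLinearRegression(lr=lr, intercept_lr=lr)
    total = 0.0
    for i in range(n_steps):
        x = {"x1": (i % 7) / 7, "x2": (i % 5) / 5}
        y = 3 * x["x1"] - 2 * x["x2"] + 5
        total += abs(model.predict_one(x) - y)
        model.learn_one(x, y)
    return total / n_steps


print(prequential_mae(lr=0.1))  # modest step size: errors shrink and stay bounded
print(prequential_mae(lr=5.0))  # step size too large: errors grow by orders of magnitude
```

With `lr=5.0` every update overshoots the target along the current input direction by a factor of at least four, so the error compounds from sample to sample; this is the mechanism one would check first when tuning the optimizer's learning rate.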

```python
from river.linear_model import LinearRegression

model = (StandardScaler() |
         LinearRegression(intercept_lr=.1))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 734.364648, MSE: 50,294,204.597895, RMSE: 7,091.840706
[2,000] MAE: 4,247,443,511.282103, MSE: 1,902,404,398,197,491,892,224., RMSE: 43,616,561,054.231361
[3,000] MAE: 3,100,783,550.074068, MSE: 1,269,106,155,664,069,361,664., RMSE: 35,624,516,216.561722
[4,000] MAE: 2,340,220,990.163503, MSE: 951,831,377,281,212,219,392., RMSE: 30,851,764,573.217075
[5,000] MAE: 1,879,635,607.990033, MSE: 761,465,588,073,537,470,464., RMSE: 27,594,665,935.168293
[6,000] MAE: 1,570,550,669.094805, MSE: 634,554,851,180,935,905,280., RMSE: 25,190,372,192.187553
[7,000] MAE: 1,348,913,300.604527, MSE: 543,904,262,035,912,130,560., RMSE: 23,321,755,123.401672
[8,000] MAE: 1,182,636,695.717268, MSE: 475,916,324,834,122,727,424., RMSE: 21,815,506,522.520229
[9,000] MAE: 20,678,359,536.491112, MSE: 107,792,216,736,977,771,298,816., RMSE: 328,317,250,136.172058
[10,000] MAE: 19,090,827,085.329998, MSE: 97,019,197,942,366,059,102,208., RMSE: 311,479,048,962.151001
[11,000] MAE: 17,465,810,909.00408

MAE: 11,151,382,853.59709, MSE: 55,825,768,580,521,006,202,880., RMSE: 236,274,773,474.700439
```

## KNNRegressor
---

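[KNNRegressor](https://riverml.xyz/0.10.1/api/neighbors/KNNRegressor/) is the k-nearest-neighbours (KNN) adaptation for regression: it keeps a sliding window of the most recent samples (here `window_size=1000`) and predicts by aggregating (by default, averaging) the targets of the nearest neighbours in that window.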
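
The sliding-window idea can be sketched without dependencies. `WindowKNNRegressor` below is hypothetical: it stores the last `window_size` samples in a bounded `deque`, finds the `n_neighbors` closest ones by brute-force Euclidean distance, and returns the mean of their targets. River's `KNNRegressor` may use a different search strategy and different defaults.

```python
import math
from collections import deque


class WindowKNNRegressor:
    """Hypothetical sketch: k-NN regression over a sliding window of recent samples."""

    def __init__(self, n_neighbors=5, window_size=1000):
        self.n_neighbors = n_neighbors
        self.window = deque(maxlen=window_size)  # the oldest sample is evicted automatically

    def learn_one(self, x, y):
        self.window.append((x, y))

    def predict_one(self, x):
        if not self.window:
            return 0.0  # nothing learned yet

        def dist(other):
            # Euclidean distance over the features of the query
            return math.sqrt(sum((x[k] - other.get(k, 0.0)) ** 2 for k in x))

        nearest = sorted(self.window, key=lambda pair: dist(pair[0]))[: self.n_neighbors]
        return sum(y for _, y in nearest) / len(nearest)  # mean of the neighbours' targets


knn = WindowKNNRegressor(n_neighbors=2, window_size=3)
for h, c in [(0, 10.0), (1, 12.0), (8, 80.0), (9, 90.0)]:
    knn.learn_one({"hr": h}, c)
print(knn.predict_one({"hr": 8.5}))  # 85.0: mean of the samples at hours 8 and 9
print(knn.predict_one({"hr": 0}))    # 46.0: the hour-0 sample has already left the window
```

Because the window holds only three samples, the query for `hr=0` is answered from hours 1 and 8; the window size therefore trades memory of the past for responsiveness to recent behaviour.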

```python
from river.neighbors import KNNRegressor

model = (StandardScaler() |
         KNNRegressor(window_size=1000))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 31.739783, MSE: 2,028.1698, RMSE: 45.035206
[2,000] MAE: 37.879492, MSE: 2,891.47506, RMSE: 53.772438
[3,000] MAE: 46.324594, MSE: 4,634.606493, RMSE: 68.077944
[4,000] MAE: 50.804996, MSE: 5,602.46592, RMSE: 74.849622
[5,000] MAE: 54.460437, MSE: 6,454.568712, RMSE: 80.340331
[6,000] MAE: 56.444697, MSE: 6,929.107473, RMSE: 83.241261
[7,000] MAE: 58.613655, MSE: 7,455.649051, RMSE: 86.3461
[8,000] MAE: 58.880998, MSE: 7,510.97703, RMSE: 86.665893
[9,000] MAE: 59.053509, MSE: 7,535.757209, RMSE: 86.808739
[10,000] MAE: 60.668418, MSE: 7,943.33842, RMSE: 89.125408
[11,000] MAE: 64.124217, MSE: 9,037.029738, RMSE: 95.063293
[12,000] MAE: 67.172315, MSE: 10,018.612197, RMSE: 100.093018
[13,000] MAE: 69.249937, MSE: 10,744.05968, RMSE: 103.653556
[14,000] MAE: 70.864456, MSE: 11,299.61068, RMSE: 106.299627
[15,000] MAE: 72.733986, MSE: 11,939.682621, RMSE: 109.268855
[16,000] MAE: 75.343299, MSE: 12,891.33336, RMSE: 113.540008
[17,000] MAE: 76.193881, MSE: 13,119.190772, RMSE:

MAE: 75.776292, MSE: 13,010.496556, RMSE: 114.063564
```

## AMRules
---


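[AMRules](https://riverml.xyz/0.10.1/api/rules/AMRules/) (Adaptive Model Rules) is a rule-based algorithm for incremental regression. Like Hoeffding Trees, it relies on the Hoeffding bound to decide when to expand its rules, and it uses the Variance-Ratio heuristic to evaluate candidate splits. It also offers two capabilities not usually found in decision trees: each rule has its own drift detector (here `drift_detector=ADWIN()`), and a rule is removed when drift is detected; and, after a warm-up period, each rule flags anomalous instances, which are then not used for training.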
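
The two ingredients named above can be sketched in a few lines. `variance_ratio` computes a common form of the variance-ratio merit of a candidate split (one minus the size-weighted child variances relative to the parent variance), and `should_split` applies the generic Hoeffding-tree test: expand when the best merit beats the runner-up by more than the Hoeffding bound eps = sqrt(R^2 ln(1/delta) / (2n)), or when eps has shrunk below a tie threshold. All names are hypothetical and AMRules' exact expansion rule may differ; `delta=0.01` and `n=100` merely echo the `delta` and `n_min` values used in the cell below.

```python
import math
from statistics import pvariance


def variance_ratio(parent, left, right):
    """Merit of a binary split: 1 - sum_i (n_i / n) * (var_i / var_parent).

    When left and right partition parent, the weighted child variances never
    exceed the parent variance, so the merit lies in [0, 1] (higher is better).
    """
    var = pvariance(parent)
    if var == 0:
        return 0.0
    n = len(parent)
    return 1.0 - sum(len(part) / n * pvariance(part) / var for part in (left, right) if part)


def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the true mean is within this distance of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_merit, delta, n, tie_threshold=0.05):
    """Generic Hoeffding-tree split test (not necessarily AMRules' exact rule)."""
    eps = hoeffding_bound(1.0, delta, n)  # variance-ratio merits lie in [0, 1], so R = 1
    return best_merit - second_merit > eps or eps < tie_threshold


counts = [10.0, 12.0, 11.0, 90.0, 95.0, 92.0]  # toy hourly counts
hours = [3, 4, 5, 17, 18, 19]
# Candidate A: hr <= 5 versus hr > 5; candidate B: even versus odd hours
merit_a = variance_ratio(counts,
                         [c for h, c in zip(hours, counts) if h <= 5],
                         [c for h, c in zip(hours, counts) if h > 5])
merit_b = variance_ratio(counts,
                         [c for h, c in zip(hours, counts) if h % 2 == 0],
                         [c for h, c in zip(hours, counts) if h % 2 == 1])
print(round(merit_a, 3), round(merit_b, 3))
print(should_split(merit_a, merit_b, delta=0.01, n=100))  # True: A is clearly better
```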

```python
from river.rules import AMRules

model = (StandardScaler() |
         AMRules(delta=0.01, n_min=100, drift_detector=ADWIN()))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 33.507247, MSE: 5,622.116784, RMSE: 74.980776
[2,000] MAE: 52.32084, MSE: 250,668.713621, RMSE: 500.668267
[3,000] MAE: 59.942408, MSE: 170,708.104513, RMSE: 413.168373
[4,000] MAE: 64.838398, MSE: 131,021.232855, RMSE: 361.968552
[5,000] MAE: 66.454994, MSE: 106,766.973268, RMSE: 326.752159
[6,000] MAE: 67.594834, MSE: 90,646.082624, RMSE: 301.074879
[7,000] MAE: 67.937541, MSE: 79,049.252776, RMSE: 281.15699
[8,000] MAE: 67.494007, MSE: 70,179.43775, RMSE: 264.91402
[9,000] MAE: 72.857313, MSE: 124,283.437323, RMSE: 352.538561
[10,000] MAE: 73.03285, MSE: 112,950.450191, RMSE: 336.081017
[11,000] MAE: 76.19873, MSE: 105,021.626139, RMSE: 324.070403
[12,000] MAE: 77.735706, MSE: 97,800.329656, RMSE: 312.730442
[13,000] MAE: 79.476189, MSE: 91,907.855189, RMSE: 303.163083
[14,000] MAE: 81.420017, MSE: 87,042.543348, RMSE: 295.029733
[15,000] MAE: 83.86915, MSE: 83,063.12027, RMSE: 288.206732
[16,000] MAE: 86.241087, MSE: 79,751.272963, RMSE: 282.402679
[17,000] MAE: 86.666

MAE: 86.357952, MSE: 74,611.607453, RMSE: 273.151254
```

## HoeffdingTreeRegressor
---
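[HoeffdingTreeRegressor](https://riverml.xyz/0.10.1/api/tree/HoeffdingTreeRegressor/) is the Hoeffding Tree (HT) adaptation for regression: the tree grows incrementally, and a leaf is split once the Hoeffding bound indicates that enough samples have been seen to choose the best split with high confidence. In the cell below, `grace_period=100` is the number of samples a leaf observes between split attempts, `leaf_prediction='adaptive'` lets each leaf choose between predicting the target mean and using a linear model, and `nominal_attributes` lists the features treated as categorical.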
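
The cell below sets `leaf_prediction='adaptive'` and `model_selector_decay=0.9`. River's documentation describes the adaptive mode as choosing dynamically between the leaf's target mean and the leaf's model, based on their squared errors faded by `model_selector_decay` (the closer to 1, the more weight past errors keep). The sketch below shows that selection rule for a single leaf with one numeric feature; `AdaptiveLeaf` is hypothetical, and its SGD linear model is only a stand-in for River's actual leaf model.

```python
class AdaptiveLeaf:
    """Hypothetical sketch of an 'adaptive' leaf: use the predictor with the lower faded squared error."""

    def __init__(self, decay=0.9, lr=0.1):
        self.decay = decay
        self.lr = lr
        self.n = 0
        self.mean = 0.0            # 'mean' predictor: running mean of the targets
        self.w, self.b = 0.0, 0.0  # 'model' predictor: one-feature linear model fitted by SGD
        self.err_mean = 0.0        # faded squared error of the 'mean' predictor
        self.err_model = 0.0       # faded squared error of the 'model' predictor

    def _model(self, x):
        return self.w * x + self.b

    def predict_one(self, x):
        # Whichever predictor has the lower faded error wins (ties go to the mean)
        return self._model(x) if self.err_model < self.err_mean else self.mean

    def learn_one(self, x, y):
        # 1) Score both predictors on this sample before updating them
        self.err_mean = self.decay * self.err_mean + (y - self.mean) ** 2
        self.err_model = self.decay * self.err_model + (y - self._model(x)) ** 2
        # 2) Update the running mean
        self.n += 1
        self.mean += (y - self.mean) / self.n
        # 3) One SGD step on the squared loss for the linear model
        grad = self._model(x) - y
        self.w -= self.lr * grad * x
        self.b -= self.lr * grad


linear_leaf = AdaptiveLeaf(decay=0.9)
for i in range(1000):
    x = (i % 10) / 10
    linear_leaf.learn_one(x, 10 * x)  # target depends linearly on x
print(linear_leaf.predict_one(0.5))   # close to 5, answered by the linear model

flat_leaf = AdaptiveLeaf(decay=0.9)
for i in range(50):
    flat_leaf.learn_one((i % 10) / 10, 3.0)  # constant target
print(flat_leaf.predict_one(0.7))     # 3.0, answered by the running mean
```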

```python
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
         HoeffdingTreeRegressor(
             grace_period=100,
             leaf_prediction='adaptive',
             model_selector_decay=0.9,
             nominal_attributes=['season', 'yr', 'mnth', 'hr', 'workingday', 'weathersit']
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 34.260726, MSE: 2,139.367317, RMSE: 46.253295
[2,000] MAE: 42.346298, MSE: 3,291.316475, RMSE: 57.369996
[3,000] MAE: 52.958645, MSE: 5,600.563035, RMSE: 74.83691
[4,000] MAE: 63.271664, MSE: 8,084.230651, RMSE: 89.91235
[5,000] MAE: 69.716108, MSE: 9,546.598223, RMSE: 97.706695
[6,000] MAE: 73.072453, MSE: 10,444.064334, RMSE: 102.196205
[7,000] MAE: 76.233093, MSE: 11,300.225546, RMSE: 106.302519
[8,000] MAE: 76.445232, MSE: 11,309.974771, RMSE: 106.348365
[9,000] MAE: 76.828003, MSE: 13,481.589067, RMSE: 116.110245
[10,000] MAE: 78.29345, MSE: 13,637.388587, RMSE: 116.77923
[11,000] MAE: 83.407495, MSE: 15,347.522996, RMSE: 123.88512
[12,000] MAE: 87.987304, MSE: 16,844.497732, RMSE: 129.786354
[13,000] MAE: 92.091729, MSE: 18,236.753275, RMSE: 135.043524
[14,000] MAE: 95.831337, MSE: 19,707.705072, RMSE: 140.384134
[15,000] MAE: 99.632477, MSE: 21,300.677383, RMSE: 145.947516
[16,000] MAE: 102.719366, MSE: 22,622.784604, RMSE: 150.408725
[17,000] MAE: 104.01061, MSE: 2

MAE: 103.479574, MSE: 22,772.319826, RMSE: 150.905003
```

## HoeffdingAdaptiveTreeRegressor
---
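[HoeffdingAdaptiveTreeRegressor](https://riverml.xyz/0.10.1/api/tree/HoeffdingAdaptiveTreeRegressor/) is the Hoeffding Adaptive Tree (HAT) adaptation for regression: an ADWIN drift detector monitors the error of each branch, and when drift is detected an alternate subtree is grown in the background and replaces the original branch if it becomes more accurate.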
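
ADWIN, which HAT uses to monitor its branches and which is passed explicitly as `drift_detector` in the AMRules and SRP cells, keeps a window of recent values and drops the older part of the window whenever two sub-windows have significantly different means. The sketch below implements only the basic, Hoeffding-style cut test of the original ADWIN algorithm: for every split of the window into W0 and W1 of sizes n0 and n1, with m = 1/(1/n0 + 1/n1) and delta' = delta/n, a change is signalled when |mean(W0) - mean(W1)| > sqrt(ln(4/delta') / (2m)). It omits the bucket compression that makes the real detector efficient, so each update costs O(n); `NaiveADWIN` is a hypothetical name, not River's `ADWIN`.

```python
import math


class NaiveADWIN:
    """Unoptimized sketch of ADWIN's basic cut test (no bucket compression)."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = []

    def update(self, x):
        """Add x; return True if a change was detected (the older sub-window is dropped)."""
        self.window.append(x)
        drift = False
        cut = True
        while cut and len(self.window) >= 2:
            cut = False
            n = len(self.window)
            total = sum(self.window)
            delta_prime = self.delta / n
            head = 0.0
            for i in range(1, n):  # W0 = window[:i], W1 = window[i:]
                head += self.window[i - 1]
                n0, n1 = i, n - i
                mean0, mean1 = head / n0, (total - head) / n1
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
                if abs(mean0 - mean1) > eps_cut:
                    self.window = self.window[i:]  # forget the outdated sub-window
                    drift = cut = True
                    break
        return drift


detector = NaiveADWIN(delta=0.002)
flags = [detector.update(0.0) for _ in range(100)] + [detector.update(1.0) for _ in range(100)]
print(flags.index(True))  # first sample at which the jump from 0 to 1 was detected
```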

```python
from river.tree import HoeffdingAdaptiveTreeRegressor

model = (StandardScaler() |
         HoeffdingAdaptiveTreeRegressor(
             grace_period=100,
             leaf_prediction='adaptive',
             model_selector_decay=0.9,
             seed=1,
             nominal_attributes=['season', 'yr', 'mnth', 'hr', 'workingday', 'weathersit']
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 35.001807, MSE: 2,270.739109, RMSE: 47.652273
[2,000] MAE: 41.847828, MSE: 3,281.055302, RMSE: 57.280497
[3,000] MAE: 53.082905, MSE: 5,618.142971, RMSE: 74.954273
[4,000] MAE: 64.499359, MSE: 8,418.810768, RMSE: 91.754078
[5,000] MAE: 71.987436, MSE: 10,218.737638, RMSE: 101.087772
[6,000] MAE: 76.150126, MSE: 11,249.063401, RMSE: 106.061602
[7,000] MAE: 79.551245, MSE: 12,202.853803, RMSE: 110.466528
[8,000] MAE: 79.707999, MSE: 12,179.059142, RMSE: 110.358775
[9,000] MAE: 79.515422, MSE: 12,988.892638, RMSE: 113.968823
[10,000] MAE: 80.638873, MSE: 13,134.159107, RMSE: 114.604359
[11,000] MAE: 84.774982, MSE: 14,555.286035, RMSE: 120.64529
[12,000] MAE: 88.168529, MSE: 15,676.450732, RMSE: 125.205634
[13,000] MAE: 91.539597, MSE: 16,858.005642, RMSE: 129.838383
[14,000] MAE: 94.445932, MSE: 17,981.476185, RMSE: 134.095027
[15,000] MAE: 97.789672, MSE: 19,257.35417, RMSE: 138.770869
[16,000] MAE: 100.953162, MSE: 20,664.418485, RMSE: 143.751238
[17,000] MAE: 102.433648,

MAE: 101.844693, MSE: 20,906.880953, RMSE: 144.592119
```

## AdaptiveRandomForestRegressor
---
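[AdaptiveRandomForestRegressor](https://riverml.xyz/0.10.1/api/ensemble/AdaptiveRandomForestRegressor/) is the Adaptive Random Forest (ARF) adaptation for regression: an ensemble of Hoeffding trees (here `n_models=10`) whose diversity comes from online resampling of the training samples and from considering a random subset of features at each split, while a drift detector attached to each tree triggers its replacement when drift is detected.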
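
ARF's resampling is online bagging: instead of drawing bootstrap samples, each tree trains on every incoming sample k times, with k drawn from a Poisson distribution. The ARF paper uses lambda = 6, which we assume matches River's default. The sketch below shows only this mechanism, with a running mean standing in for each Hoeffding tree and without the per-split feature sampling or the drift detectors; `poisson`, `RunningMean` and `OnlineBagging` are hypothetical names.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class RunningMean:
    """Stand-in base model (ARF uses Hoeffding trees): predicts the mean target seen so far."""

    def __init__(self):
        self.n, self.mean = 0, 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


class OnlineBagging:
    """Online bagging: each member learns each sample k ~ Poisson(lam) times."""

    def __init__(self, make_model, n_models=10, lam=6.0, seed=1):
        self.models = [make_model() for _ in range(n_models)]
        self.lam = lam
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        for model in self.models:
            for _ in range(poisson(self.lam, self.rng)):  # per-member sample weight
                model.learn_one(x, y)

    def predict_one(self, x):
        # Average the members' predictions
        return sum(m.predict_one(x) for m in self.models) / len(self.models)


bag = OnlineBagging(RunningMean, n_models=10, lam=6.0, seed=1)
for y in range(20):
    bag.learn_one({}, float(y))
print([round(m.predict_one({}), 2) for m in bag.models])  # members weighted the samples differently
print(bag.predict_one({}))
```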

```python
from river.ensemble import AdaptiveRandomForestRegressor

model = (StandardScaler() |
         AdaptiveRandomForestRegressor(
             n_models=10,
             seed=1,
             model_selector_decay=0.9,
             nominal_attributes=['season', 'yr', 'mnth', 'hr', 'workingday', 'weathersit']
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 73,390.266636, MSE: 2,478,472,297,465.033203, RMSE: 1,574,316.454041
[2,000] MAE: 101,803.6373, MSE: 3,432,470,623,264.700195, RMSE: 1,852,692.803264
[3,000] MAE: 41,118,242.996925, MSE: 739,898,386,154,961,792., RMSE: 860,173,462.828842
[4,000] MAE: 34,195,568.978765, MSE: 569,261,378,689,023,232., RMSE: 754,494,121.04338
[5,000] MAE: 32,315,922.651088, MSE: 506,357,461,448,505,984., RMSE: 711,587,985.739294
[6,000] MAE: 26,941,860.253836, MSE: 421,965,365,653,812,096., RMSE: 649,588,612.626339
[7,000] MAE: 23,093,074.847054, MSE: 361,684,599,132,727,424., RMSE: 601,402,194.153569
[8,000] MAE: 20,315,134.673458, MSE: 316,496,947,997,419,456., RMSE: 562,580,614.665507
[9,000] MAE: 595,195,477.072112, MSE: 101,258,607,543,250,845,696., RMSE: 10,062,733,601.922037
[10,000] MAE: 876,060,405.260741, MSE: 110,606,335,643,349,467,136., RMSE: 10,516,954,675.349203
[11,000] MAE: 819,411,830.691403, MSE: 100,807,872,717,806,927,872., RMSE: 10,040,312,381.485296
[12,000] MAE: 753,73

MAE: 536,743,210.290123, MSE: 63,952,909,276,441,198,592., RMSE: 7,997,056,288.187623
```

## SRPRegressor
---
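[SRPRegressor](https://riverml.xyz/0.10.1/api/ensemble/SRPRegressor/) is the Streaming Random Patches (SRP) adaptation for regression: each base model is trained on a random patch, i.e. a random subset of the features combined with online resampling of the samples. Unlike ARF, the feature subset is drawn per base model rather than at each split, and the base model is not restricted to decision trees; here it is a `HoeffdingTreeRegressor`. Each member has its own drift detector (`ADWIN(delta=0.001)`) and warning detector (`ADWIN(delta=0.01)`): a warning starts training a background model, which replaces the member when drift is confirmed.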
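
A random patch combines a per-member feature subset with Poisson-weighted online bagging. The sketch below draws each member's subspace once, using 60% of the features and lambda = 6; these values mirror what we believe are River's `SRPRegressor` defaults (`subspace_size`, `lam`), which is an assumption. A tiny SGD linear model stands in for the Hoeffding trees, and the drift and warning detectors are omitted; `RandomPatches`, `LinearSGD` and `poisson` are hypothetical names, not River classes.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class LinearSGD:
    """Tiny linear model trained by SGD; stands in for a Hoeffding tree."""

    def __init__(self, lr=0.01):
        self.lr, self.weights, self.bias = lr, {}, 0.0

    def predict_one(self, x):
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        err = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * err * v
        self.bias -= self.lr * err


class RandomPatches:
    """Each member sees a fixed random feature subset and Poisson-weighted samples."""

    def __init__(self, features, n_models=10, subspace_size=0.6, lam=6.0, seed=1):
        self.rng = random.Random(seed)
        k = max(1, round(subspace_size * len(features)))
        self.subspaces = [sorted(self.rng.sample(features, k)) for _ in range(n_models)]
        self.models = [LinearSGD() for _ in range(n_models)]
        self.lam = lam

    def learn_one(self, x, y):
        for subspace, model in zip(self.subspaces, self.models):
            x_sub = {f: x[f] for f in subspace}           # random subspace of the features
            for _ in range(poisson(self.lam, self.rng)):  # online bagging weight
                model.learn_one(x_sub, y)

    def predict_one(self, x):
        preds = [m.predict_one({f: x[f] for f in s}) for s, m in zip(self.subspaces, self.models)]
        return sum(preds) / len(preds)


features = ["season", "yr", "mnth", "hr", "workingday",
            "weathersit", "temp", "atemp", "hum", "windspeed"]
srp = RandomPatches(features, n_models=10, subspace_size=0.6, seed=1)
x = {f: 0.5 for f in features}
for _ in range(20):
    srp.learn_one(x, 100.0)
print(srp.subspaces[0])   # the 6 features the first member is allowed to use
print(srp.predict_one(x))
```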

```python
from river.ensemble import SRPRegressor
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
         SRPRegressor(
             n_models=10,
             seed=1,
             drift_detector=ADWIN(delta=0.001),
             warning_detector=ADWIN(delta=0.01),
             model=HoeffdingTreeRegressor(
                 grace_period=100,
                 leaf_prediction='adaptive',
                 model_selector_decay=0.9,
                 nominal_attributes=['season', 'yr', 'mnth', 'hr', 'workingday', 'weathersit']
             )
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)
```

```
[1,000] MAE: 30.304474, MSE: 1,757.366478, RMSE: 41.920955
[2,000] MAE: 35.91108, MSE: 2,463.020948, RMSE: 49.628832
[3,000] MAE: 43.131877, MSE: 3,792.085638, RMSE: 61.579913
[4,000] MAE: 51.327357, MSE: 5,509.607297, RMSE: 74.226729
[5,000] MAE: 56.775195, MSE: 6,693.651539, RMSE: 81.814739
[6,000] MAE: 60.424789, MSE: 7,513.512907, RMSE: 86.680522
[7,000] MAE: 63.234028, MSE: 8,115.817745, RMSE: 90.087834
[8,000] MAE: 63.841815, MSE: 8,204.824294, RMSE: 90.580485
[9,000] MAE: 63.014434, MSE: 7,993.930374, RMSE: 89.408782
[10,000] MAE: 63.456958, MSE: 8,091.877784, RMSE: 89.954865
[11,000] MAE: 67.528103, MSE: 9,221.403159, RMSE: 96.028137
[12,000] MAE: 71.615129, MSE: 10,451.284559, RMSE: 102.231524
[13,000] MAE: 75.307547, MSE: 11,592.865917, RMSE: 107.670172
[14,000] MAE: 78.564245, MSE: 12,666.556327, RMSE: 112.545797
[15,000] MAE: 81.963688, MSE: 13,831.954871, RMSE: 117.609332
[16,000] MAE: 84.853817, MSE: 14,872.001225, RMSE: 121.950815
[17,000] MAE: 85.857387, MSE: 15,099.753

MAE: 85.153796, MSE: 14,936.651044, RMSE: 122.215592
```