## Baseline Regressor (DummyModel)

A baseline regressor makes predictions with simple rules, possibly without using any features. Its score is a reference point that any real model should beat.
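To make "without using any features" concrete, here is a minimal sketch. The tiny dataset is made up for illustration: a `DummyRegressor` fitted on it returns the same prediction for every row, whatever feature values it is given, because it only stores a summary of the training targets.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Made-up toy data (illustration only): 5 samples, 2 features.
X_train = np.array([[0.1, 2.0], [0.4, 1.0], [0.9, 0.5], [1.5, 3.0], [2.0, 0.0]])
y_train = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# Very different feature values still get the same prediction,
# since the model only remembers the mean of y_train (14.0).
X_new = np.array([[100.0, -5.0], [0.0, 0.0], [-3.0, 42.0]])
predictions = baseline.predict(X_new)
print(predictions)  # [14. 14. 14.]
```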

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor

### Load dataset

In [None]:
# we are using scikit-learn's built-in diabetes toy dataset
dataset = load_diabetes()

print("Dataset features")
print(dataset.feature_names)
print("Total samples in data:", len(dataset.data))

dataset.data[:5, :]

### Dataset description

In [None]:
X_df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
X_df.head()

In [None]:
X_df.mean(axis=0)

In [None]:
np.linalg.norm(X_df, axis=0)

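All the features in the dataset are mean-centered (their means are approximately zero) and have unit L2 norm, i.e. the squares of each column sum to 1. So the features are already normalized: according to the scikit-learn documentation, each one was centered and scaled by its standard deviation times the square root of `n_samples`.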
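The two observations above can be checked programmatically instead of by eye. This is a small sketch using `np.allclose`; the tolerance is an arbitrary choice that absorbs floating-point noise.

```python
import numpy as np
from sklearn.datasets import load_diabetes

X = load_diabetes().data  # the scaled features scikit-learn returns by default

column_means = X.mean(axis=0)
column_norms = np.linalg.norm(X, axis=0)

# Each column should be mean-centered ...
print("centered:", np.allclose(column_means, 0.0, atol=1e-6))
# ... and have unit L2 norm (sum of squares of each column is 1).
print("unit norm:", np.allclose(column_norms, 1.0, atol=1e-6))
```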

### Dummy Regressor Strategies

Ref: https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyRegressor.html

`DummyRegressor` supports four strategies for making predictions, selected with the `strategy` parameter:

* “mean”: always predicts the mean of the training set
* “median”: always predicts the median of the training set
* “quantile”: always predicts a specified quantile of the training set, provided with the quantile parameter.
* “constant”: always predicts a constant value that is provided by the user.
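Each strategy boils down to a single number learned from (or given for) the targets, which a fitted model exposes as `constant_`. The sketch below recomputes those numbers with NumPy for comparison. The `quantile=0.25` and `constant=100.0` values are arbitrary, and the quantile comparison assumes the unweighted fit matches `np.percentile`'s default linear interpolation.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor

X, y = load_diabetes(return_X_y=True)

# Value each strategy should end up predicting, computed by hand.
expected = {
    "mean": np.mean(y),
    "median": np.median(y),
    "quantile": np.percentile(y, 25),  # matches quantile=0.25 below (assumed interpolation)
    "constant": 100.0,                 # arbitrary user-supplied value
}

models = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    "quantile": DummyRegressor(strategy="quantile", quantile=0.25),
    "constant": DummyRegressor(strategy="constant", constant=100.0),
}

learned_values = {}
for name, model in models.items():
    model.fit(X, y)
    # constant_ is stored as a (1, n_outputs) array; take the single value.
    learned_values[name] = float(np.ravel(model.constant_)[0])
    print(f"{name:>8}: learned {learned_values[name]:.2f}, expected {expected[name]:.2f}")
```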

### Strategy: Mean

This strategy uses the mean value of the target variable as the prediction for every sample.
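`score()` returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. When the prediction is the training mean, the residual sum of squares equals the total sum of squares, so $R^2$ on the training data is 0. A short sketch computing it by hand alongside scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)

model = DummyRegressor(strategy="mean").fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - SS_res / SS_tot; predicting the training mean makes SS_res == SS_tot.
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y, y_pred), model.score(X, y))
```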

In [None]:
x = dataset.data
y = dataset.target

# the mean is the constant that minimizes squared error; on the training data this gives R^2 = 0
dummy_model = DummyRegressor(strategy='mean')

dummy_model.fit(x,y)

print(dummy_model.score(x,y))


### Strategy: Median

This strategy uses the median value of the target variable as the prediction for every sample.
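The median is the constant that minimizes mean absolute error, whereas the mean minimizes squared error (which is what $R^2$ is built on). That is why the median baseline gets a slightly negative training $R^2$ yet a lower MAE. A sketch comparing both metrics on the training data:

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

X, y = load_diabetes(return_X_y=True)

results = {}
for strategy in ("mean", "median"):
    y_pred = DummyRegressor(strategy=strategy).fit(X, y).predict(X)
    mae = mean_absolute_error(y, y_pred)
    mse = mean_squared_error(y, y_pred)
    results[strategy] = (mae, mse)
    print(f"{strategy:>6}: MAE={mae:.2f}, MSE={mse:.2f}")
```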


In [None]:
x = dataset.data
y = dataset.target

# predict the median of the training targets for every sample
dummy_model = DummyRegressor(strategy='median')

dummy_model.fit(x,y)

print(dummy_model.score(x,y))


### Strategy: Constant

This strategy predicts a user-provided constant value for every sample.
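For any constant prediction $c$, expanding $\sum_i (y_i - c)^2 = \sum_i (y_i - \bar{y})^2 + n(\bar{y} - c)^2$ gives $R^2 = -(\bar{y} - c)^2 / \mathrm{Var}(y)$, with the population variance. So a constant far from the target mean, such as 3 for this dataset, produces a strongly negative score. A sketch checking the formula against `score()`:

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor

X, y = load_diabetes(return_X_y=True)
c = 3.0  # same constant as the cell below

model = DummyRegressor(strategy="constant", constant=c).fit(X, y)
r2_sklearn = model.score(X, y)

# Closed form for a constant prediction: R^2 = -(mean(y) - c)^2 / var(y),
# where y.var() is the population variance (ddof=0).
r2_formula = -((y.mean() - c) ** 2) / y.var()

print(r2_sklearn, r2_formula)
```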


In [None]:
x = dataset.data
y = dataset.target

# predict the user-supplied constant 3 for every sample
dummy_model = DummyRegressor(strategy='constant', constant=3)

dummy_model.fit(x,y)

print(dummy_model.score(x,y))


### Strategy: Quantile

Given a quantile, this strategy finds the corresponding value of the training targets and uses it as the prediction.
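Because every constant prediction scores $R^2 \le 0$ on its own training data, the "best" quantile is simply the one whose value lands closest to the target mean. A sketch that sweeps a grid of quantiles (the grid spacing of 0.01 is an arbitrary choice) and reports the highest-scoring one:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor

X, y = load_diabetes(return_X_y=True)

# 99 quantiles: 0.01, 0.02, ..., 0.99
quantiles = np.round(np.linspace(0.01, 0.99, 99), 2)
scores = np.array([
    DummyRegressor(strategy="quantile", quantile=q).fit(X, y).score(X, y)
    for q in quantiles
])

best_q = quantiles[np.argmax(scores)]
print(f"best quantile on the grid: {best_q}, R^2 = {scores.max():.4f}")
# If the mean lies above the median, the best quantile lies above 0.5.
print(f"mean = {y.mean():.2f}, median = {np.median(y):.2f}")
```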


In [None]:
x = dataset.data
y = np.asarray(dataset.target, dtype='float')

# predict the 0.554 quantile of the training targets; for a constant prediction, R^2 is highest when the value is closest to the target mean
dummy_model = DummyRegressor(strategy='quantile', quantile=0.554)

dummy_model.fit(x,y)

print(dummy_model.score(x,y))
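All the cells above score each baseline on the same data it was fitted on. A fairer reference for comparing against a real model is a held-out split, using `train_test_split` from the imports at the top. In this sketch, `test_size=0.25` and `random_state=0` are arbitrary choices:

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Arbitrary split settings chosen for illustration.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

baselines = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    "quantile (0.554)": DummyRegressor(strategy="quantile", quantile=0.554),
    "constant (3)": DummyRegressor(strategy="constant", constant=3),
}

test_scores = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)  # learn the constant from the training split only
    test_scores[name] = model.score(X_test, y_test)
    print(f"{name:>16}: test R^2 = {test_scores[name]:.4f}")
```

A feature-based regressor trained on the same `X_train` is only worth keeping if it clearly beats these held-out scores.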
