# Exercises 18: Classes

Let's get familiar with writing classes. Classes help you organize your code, make it more reusable, and reduce code duplication through inheritance.

## Ex. 18.1
We start with the class from the course and add new functionality to it. I've modified the class slightly by adding documentation. Have a look at how these docstrings appear when you call the `help` function on the class.

In [None]:
class Calculator():
    """This class tracks a current value and lets you apply successive
    operations to that value.
    """
    
    def __init__(self):
        self.current_state = 0

    def print_state(self):
        """Prints the current_state.
        """
        print(self.current_state)

    def add(self, x):
        """Adds x to the current_state
        """
        self.current_state += x
        self.print_state()

In [None]:
help(Calculator)

### Ex. 18.1.1
Add a method `substract` that subtracts a given value from the `current_state` of the calculator, then try it out.
You will need to either copy the whole definition of the class above and add the new `substract` method, or create a class that inherits from the `Calculator` class above and additionally defines the `substract` method.

In [None]:
class Calculator():
    """This class tracks a current value and lets you apply successive
    operations to that value.
    """
    
    def __init__(self):
        self.current_state = 0

    def print_state(self):
        """Prints the current_state.
        """
        print(self.current_state)

    def add(self, x):
        """Adds x to the current_state
        """
        self.current_state += x
        self.print_state()
    
    def substract(self, x):
        """Subtracts x from the current_state
        """
        self.add(-x)

In [None]:
calc = Calculator()
calc.add(4)
calc.substract(7)
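
The inheritance route mentioned in the exercise text can be sketched as follows. This is only an illustration: `SubtractingCalculator` is a hypothetical name, and a trimmed-down `Calculator` (without printing) is repeated so that the block runs on its own.

```python
class Calculator:
    """Trimmed-down copy of the course class, repeated so this block runs alone."""

    def __init__(self):
        self.current_state = 0

    def add(self, x):
        """Adds x to the current_state."""
        self.current_state += x


class SubtractingCalculator(Calculator):  # hypothetical name
    """Inherits __init__ and add; only the new method is written here."""

    def substract(self, x):
        """Subtracts x from the current_state by reusing add."""
        self.add(-x)


calc = SubtractingCalculator()
calc.add(4)
calc.substract(7)
print(calc.current_state)
```

The subclass contains no copy of `__init__` or `add`, so a later change to `Calculator` is picked up automatically.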

### Ex. 18.1.2
Now we add a `reset` method, which sets the `current_state` back to 0.

In [None]:
class ResettableCalculator(Calculator):
    """This class tracks a current value, lets you apply successive
    operations to that value and reset it to 0.
    """

    def reset(self):
        """Reset the current_state to 0"""
        self.current_state = 0
        

In [None]:
calc = ResettableCalculator()
calc.add(10)
calc.reset()
calc.print_state()

### Ex. 18.1.3
Finally, modify the `__init__` method so that it takes a second parameter `initial_value`, which is used to set the initial value of `current_state`. Give it a default value of `0`, so that `current_state` is set to 0 when `initial_value` is not specified.

In [None]:
class Calculator():
    """This class tracks a current value and lets you apply successive
    operations to that value.
    """
    
    def __init__(self, initial_value=0):
        self.current_state = initial_value
        
    def print_state(self):
        """Prints the current_state.
        """
        print(self.current_state)

    def add(self, x):
        """Adds x to the current_state
        """
        self.current_state += x
        self.print_state()
    
    def substract(self, x):
        """Subtracts x from the current_state
        """
        self.add(-x)
        
    def reset(self):
        """Reset the current_state to 0"""
        self.current_state = 0


In [None]:
calc = Calculator(8)
calc.print_state()

calc_default = Calculator()
calc_default.print_state()
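
The same behaviour can also be reached without copying the whole class. A minimal sketch, assuming a trimmed-down `Calculator` and a hypothetical subclass name `InitialisedCalculator`, overrides only `__init__` and calls the parent initialiser through `super()`:

```python
class Calculator:
    """Trimmed-down stand-in for the class used in the exercises above."""

    def __init__(self):
        self.current_state = 0

    def add(self, x):
        self.current_state += x


class InitialisedCalculator(Calculator):  # hypothetical name
    def __init__(self, initial_value=0):
        # Let the parent set up its attributes first, then override the start value.
        super().__init__()
        self.current_state = initial_value


calc = InitialisedCalculator(8)                        # positional argument
calc_default = InitialisedCalculator()                 # falls back to the default of 0
calc_keyword = InitialisedCalculator(initial_value=3)  # keyword argument
print(calc.current_state, calc_default.current_state, calc_keyword.current_state)
```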

## Ex. 18.2: Scaler
Let's now turn to data analysis and training models.

Below I've made the basic imports, loaded the diabetes data and made a copy of both the features and the target.
We also split the data into a learning set and a prediction set. I've defined `n_learn` as the number of data points we will use for learning and split the data into `train_data` and `test_data` (and `train_target` and `test_target`) using `sklearn.model_selection.train_test_split`.

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
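try:
    # Recent scikit-learn versions can return the raw, unscaled features.
    diabetes = load_diabetes(scaled=False)
except TypeError:
    # Older versions do not accept `scaled`: fall back to the scaled data
    # and recode the sex column as 1/2.
    diabetes = load_diabetes()
    diabetes.data[:, 1] = np.where(diabetes.data[:, 1] > 0, 2, 1)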

features = np.copy(diabetes.data)
target = np.copy(diabetes.target)

n_learn = 280
train_data, test_data, train_target, test_target = train_test_split(
    features, target, train_size=n_learn
)

### Ex. 18.2.1
We now write a class that we will use to normalise the data before using it to train a model. The class (`StandardScaler1D`) should have an `__init__` method, a `fit` method and a `transform` method. For simplicity we limit ourselves to one-dimensional data.
- Normalisation of `data` is done by subtracting the average and dividing by the standard deviation: `data = (data - np.average(data)) / np.std(data)`. Here, however, the average and the standard deviation are computed and stored on the scaler in the `fit` method, then used in the `transform` method.
- I prepared the backbone of the class for you. Fill in wherever necessary.
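
To make the arithmetic concrete before writing the class, here is a small sketch on hypothetical toy data (the arrays `train` and `test` are made up for illustration and are not part of the diabetes dataset). The statistics are computed once on the training values and then reused unchanged on other values:

```python
import numpy as np

# Hypothetical toy data with mean 5 and standard deviation 2.
train = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
test = np.array([3.0, 5.0, 9.0])

# "fit": compute the statistics once, on the training data only.
mean_ = np.average(train)
scale_ = np.std(train)  # population standard deviation (ddof=0), NumPy's default

# "transform": reuse the stored statistics on any data.
train_normed = (train - mean_) / scale_
test_normed = (test - mean_) / scale_

print(mean_, scale_)
print(train_normed)
print(test_normed)
```

The training values end up with average 0 and standard deviation 1 by construction, while the test values are only shifted and rescaled by the training statistics.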

In [None]:
class StandardScaler1D:
    
    def __init__(self):
        """Initializes the mean_ and scale_ values to None. Those will then be set when 
        fitting the scaler to some data"""
        self.mean_ = None
        self.scale_ = None
    
    def fit(self, data):
        """Extract the average and standard deviation from the data and save them in the
        corresponding attributes (mean_ and scale_)
        """
        self.mean_ = np.average(data)
        self.scale_ = np.std(data)
    
    def transform(self, data):
        """Use the stored mean_ and scale_ attributes to normalise the data and return it.
        """
        if self.mean_ is None or self.scale_ is None:
            print("You must fit the scaler before you can use it to normalise data")
            return None
        return (data - self.mean_) / self.scale_


### Ex. 18.2.2
We now test our `StandardScaler1D` on the first column of `train_data` and `test_data`, which I have prepared in `train_data0` and `test_data0`.
- Use the scaler to learn on `train_data0` how to normalise the data.
- Apply that normalisation to `train_data0` and call the normalised data `train_data0_normed`.
- Apply that normalisation to `test_data0` and call the normalised data `test_data0_normed`. This is closer to a real situation: fitting the normaliser is part of training a model, so it must be done on `train_data0` only, never on `test_data0`.
- Check that the normalised data have averages close to 0 and standard deviations close to 1.

In [None]:
train_data0 = train_data[:,0]
test_data0 = test_data[:,0]

In [None]:
scaler = StandardScaler1D()

In [None]:
# After initialisation and before fitting to the data, mean_ and scale_ are None
print("scaler mean is None before fit:", scaler.mean_ is None)

# We train the scaler on the training data:
scaler.fit(train_data0)

# Fitting trained the scaler, which set mean_ and scale_
print(scaler.mean_)

In [None]:
# Applying the normalisation to the training data normalises it exactly, as the
# scaler was fitted on that data
train_data0_normed = scaler.transform(train_data0)
print("train average: {}, train std: {}".format(np.average(train_data0_normed), np.std(train_data0_normed)))

# If the test data is a large and representative enough sample,
# applying the scaler to it should also give good normalisation, i.e.
# mean and std close to 0 and 1 respectively.
test_data0_normed = scaler.transform(test_data0)
print("test average: {}, test std: {}".format(np.average(test_data0_normed), np.std(test_data0_normed)))

### Ex. 18.2.3 (Supplementary)
Redo the exercise above, this time writing a scaler that works for multidimensional data.
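
One possible ingredient for the multidimensional case is the `axis` argument of `np.average` and `np.std`. The sketch below uses a hypothetical 3x2 array to show that `axis=0` yields one statistic per column, and that NumPy broadcasting then applies those per-column values to every row:

```python
import numpy as np

# Hypothetical data: 3 samples (rows) of 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 60.0]])

col_mean = np.average(data, axis=0)  # one mean per column, shape (2,)
col_std = np.std(data, axis=0)       # one standard deviation per column, shape (2,)

# Broadcasting applies the per-column statistics to every row.
normed = (data - col_mean) / col_std

print(col_mean.shape, normed.shape)
print(np.average(normed, axis=0), np.std(normed, axis=0))
```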

In [None]:
class StandardScaler:
    
    def __init__(self):
        """Initializes the mean_ and scale_ values to None. Those will then be set when 
        fitting the scaler to some data"""
        self.mean_ = None
        self.scale_ = None
    
    def fit(self, data):
        """Extract the average and standard deviation from the data and save them in the
        corresponding attributes (mean_ and scale_)
        """
        self.mean_ = np.average(data, axis=0)
        self.scale_ = np.std(data, axis=0)
    
    def transform(self, data):
        """Use the stored mean_ and scale_ attributes to normalise the data and return it.
        """
        if self.mean_ is None or self.scale_ is None:
            print("You must fit the scaler before you can use it to normalise data")
            return None
        return (data - self.mean_) / self.scale_


In [None]:
scaler = StandardScaler()

In [None]:
# After initialisation and before fitting to the data, mean_ and scale_ are None
print(scaler.mean_ is None)

# We train the scaler on the training data:
scaler.fit(train_data)

# Fitting trained the scaler, which set mean_ and scale_
print(scaler.mean_)

In [None]:
# Applying the normalisation to the training data normalises it exactly, as the
# scaler was fitted on that data
train_data_normed = scaler.transform(train_data)
print("train average: {}, train std: {}".format(np.average(train_data_normed, axis=0), np.std(train_data_normed, axis=0)))

# If the test data is a large and representative enough sample,
# applying the scaler to it should also give good normalisation, i.e.
# mean and std close to 0 and 1 respectively.
test_data_normed = scaler.transform(test_data)
print("test average: {}, test std: {}".format(np.average(test_data_normed, axis=0), np.std(test_data_normed, axis=0)))

## Ex. 18.3: Estimator
In this exercise we will build a simple estimator ourselves. As in Exercise 17.1.2, we will do a linear fit of the target data (disease progression) against the BMI. As a reminder, this was done as follows:

In [None]:
def line(x, a, b): 
    return a*x + b

bmi_index = diabetes.feature_names.index("bmi")
train_feature = train_data[:, bmi_index]
fit_result = optimize.curve_fit(line, train_feature, train_target)
print(fit_result)
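
A straight-line fit can also be cross-checked independently of `curve_fit`. The sketch below is not the method used in this course: it relies on `np.polyfit` with degree 1, applied to hypothetical noise-free toy data, which returns the slope and the intercept in that order.

```python
import numpy as np

# Hypothetical noise-free data lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# For degree 1, polyfit returns the coefficients highest power first:
# [slope, intercept].
a, b = np.polyfit(x, y, 1)
print(a, b)
```

On the diabetes data, `np.polyfit(train_feature, train_target, 1)` should give values close to those in `fit_result[0]`, since both approaches minimise the same squared error.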

Here we will instead create a class that will allow us to do the linear fit and then use the fitted result to predict the disease progression from the data. I wrote the class for you.

In [None]:
class SimpleLinearEstimator:
    
    def __init__(self):
        """self.a and self.b will be used to store the slope and intercept
        of our linear model. They get initialized to None here and
        will be set when fitting the estimator to some data.
        """
        self.a = None
        self.b = None

    def model(self, x, a, b): 
        return a*x + b
    
    def fit(self, features, target):
        """This method fits the model, i.e. it searches for the optimal weights
        a and b, so that the output of the model when applied on features will match
        target as closely as possible.
        """
        fit_result = optimize.curve_fit(self.model, features, target, method="dogbox")
        # curve_fit returns a tuple containing the optimal weights as first
        # element. We store them on our instance in self.a and self.b:
        self.a = fit_result[0][0]
        self.b = fit_result[0][1]
        
    def predict(self, features):
        """This method applies the optimized model on features and returns
        the predicted value for the target.
        """
        if self.a is None or self.b is None:
            print("You must fit the estimator before you can use it to make predictions")
            return None
        return self.model(features, self.a, self.b)
    
    def score(self, features, target):
        """Evaluate the quality of the estimator. This is done by using the fitted model
        to predict the target values from the data and compare them with target. The metric
        used for this comparison is the mean squared error.
        """
        if self.a is None or self.b is None:
            print("You must fit the estimator before you can score it")
            return None
        predicted = self.predict(features)
        return np.average((predicted - target)**2.0)
        

### Ex. 18.3.1
- Use the class defined above to make a linear fit of `train_target` vs `train_feature` (disease progression vs BMI).
- Use the estimator to predict the disease progression from the BMI (on `test_feature = test_data[:, 2]`).
- Plot the predicted disease progression against the real disease progression (`test_target`) to check the quality of your prediction.
- Score the estimator (print the mean squared error made when using the estimator to predict `test_target` from `test_feature`).
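
As a reminder of what the score measures, the mean squared error can be worked through by hand on a few hypothetical numbers (the arrays below are made up and unrelated to the diabetes data):

```python
import numpy as np

# Hypothetical predictions and true values.
predicted = np.array([2.0, 4.0, 6.0])
target = np.array([1.0, 4.0, 8.0])

squared_errors = (predicted - target) ** 2.0  # element-wise squared differences
mse = np.average(squared_errors)              # their average is the score
print(squared_errors, mse)
```

The differences are 1, 0 and -2, so the squared errors are 1, 0 and 4 and their average is 5/3. A lower value means a better prediction, and large individual errors weigh more heavily because they are squared.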

In [None]:
# We instantiate the estimator.
estimator = SimpleLinearEstimator()
# After instantiation the parameters of the linear regression are undefined
print("a, b before fit", estimator.a, estimator.b)

# We then train the estimator on the training data
estimator.fit(train_feature, train_target)
# This will fit the linear model to the data, and the estimator will store the
# parameters of the fit.
print("a, b after fit", estimator.a, estimator.b)

# We now use the estimator to predict the disease progression from the BMI
# for the test set:
test_feature = test_data[:, bmi_index]
predicted_target = estimator.predict(test_feature)

In [None]:
# We plot the real disease progression of the test set against our prediction
plt.figure()
plt.plot(test_target, predicted_target, "x")

# We add the diagonal here, which corresponds to a perfect prediction
plt.plot(plt.xlim(), plt.xlim(), "--")
plt.xlabel("Disease Progression")
plt.ylabel("Predicted Progression")
plt.show()

In [None]:
# Finally we score our estimator.
print("Mean squared error on prediction:", estimator.score(test_feature, test_target))

## Supplementary

### Ex. 18.4
Write an estimator `LinearEstimator` (inspired by `SimpleLinearEstimator`) that fits the disease progression using all 10 features instead of a single one. We can actually write it so that it fits a linear model to any number of features. Then test the `LinearEstimator` on the diabetes dataset, similarly to what was done in 18.3.1.

*Hints*:
- *`model` should take `x` as parameter as above and then an arbitrary number of weights (use `*weights` as described in the supplementary slides and exercises of part 11 of the course on Functions)*
- *`optimize.curve_fit` needs to know how many parameters `model` needs, as it will call that function during the fit. It cannot guess it here as `model` takes an arbitrary number of parameters, hence you will need to pass it an additional argument `p0=np.ones(n_params)`, where `n_params` is the number of parameters `model` takes (not counting `x`).*
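
The two hints can be tried out in isolation before writing the class. The sketch below is one possible reading of them, using a standalone function and hypothetical toy data; it only evaluates the model at the starting guess and does not run `curve_fit`.

```python
import numpy as np


def model(x, *weights):
    # weights arrives as a tuple: one weight per feature, then the intercept last.
    w = np.asarray(weights)
    return np.sum(x * w[:-1], axis=1) + w[-1]


# Hypothetical data: 2 samples, 3 features.
x = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])

n_params = x.shape[1] + 1    # 3 feature weights + 1 intercept
p0 = np.ones(n_params)       # the starting guess that would be passed to curve_fit
prediction = model(x, *p0)   # unpack the array into separate positional arguments
print(n_params, prediction)
```

With all weights equal to 1, each prediction is simply the sum of the features of that row plus 1.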

In [None]:
class LinearEstimator:
    
    def __init__(self):
        """self.weights will be used to store the weights
        of our linear model. It gets initialized to None here and
        will be set when fitting the estimator to some data.
        """
        self.weights = None

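    def model(self, x, *weights):
        # One weight per feature column, followed by the intercept as last weight.
        return np.sum(x * weights[:-1], axis=1) + weights[-1]

    def get_n_params(self, features):
        # Number of feature columns plus one for the intercept.
        return features.shape[1] + 1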
    
    def fit(self, features, target):
        """This method fits the model, i.e. it searches for the optimal weights,
        so that the output of the model when applied on features will match
        target as closely as possible.
        """
        self.n_params = self.get_n_params(features)
        res = optimize.curve_fit(self.model, features, target, p0=np.ones(self.n_params), method="dogbox")
        
        # curve_fit returns a tuple containing the optimal weights as first
        # element. We store them on our instance in self.weights:
        self.weights = res[0]
        
    def predict(self, features):
        """This method applies the optimized model on features and returns
        the predicted value for the target.
        """
        if self.weights is None:
            print("You must fit the estimator before you can use it to make predictions")
            return None
        return self.model(features, *self.weights)

    def score(self, features, target):
        if self.weights is None:
            print("You must fit the estimator before you can score it")
            return None
        predicted = self.predict(features)
        return np.average((predicted - target)**2.0)

In [None]:
estimator = LinearEstimator()
estimator.fit(train_data_normed, train_target)
predicted_target = estimator.predict(test_data_normed)

In [None]:
plt.figure()
plt.plot(test_target, predicted_target, "x")
plt.plot(plt.xlim(), plt.xlim(), "--")
plt.xlabel("Disease Progression")
plt.ylabel("Predicted Progression")
plt.show()

In [None]:
print("Mean squared error on prediction:", estimator.score(test_data_normed, test_target))