In [1]:
import pandas as pd
import numpy as np

# sklearn-style predictive models

### Rough `sklearn` strategy

A `sklearn` model (a regressor or classifier) has a few basic methods. The most important are `__init__`, `fit`, `predict`, and (for classifiers) `predict_proba`.

In [2]:
class ModelClass:

    def __init__(self, *args, **kwargs):
        '''
        Establishes hyperparameters
        '''
        pass
    
    
    def fit(self, X, y): 
        '''
        X = training data features
        y = training data labels/targets
        
        Takes in training data; stores whatever
        information is needed to make future predictions
        '''
        pass
    
    
    def predict(self, X):  
        '''
        X = new data points (features only)
        
        Returns prediction for X
        '''
        
        pass
    
    
    def predict_proba(self, X):  
        '''
        X = new data points (features only)
        
        Only for classification models: returns the probability
        of each row of X belonging to each class
        '''
        
        pass
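
To make the four methods concrete, here is a minimal sketch of a toy classifier that fills in the skeleton above. The class name `Frequency_Classifier` and the attributes `classes_` and `class_probs_` are hypothetical names chosen for illustration; the model ignores the features and only uses how often each label appears in the training data.

```python
import numpy as np


class Frequency_Classifier:
    """Hypothetical toy classifier: ignores X and predicts by class frequency."""

    def __init__(self):
        pass  # no hyperparameters

    def fit(self, X, y):
        # Store the distinct labels and how often each occurs in training
        self.classes_, counts = np.unique(y, return_counts=True)
        self.class_probs_ = counts / counts.sum()
        return self

    def predict(self, X):
        # Always predict the most frequent training label
        most_common = self.classes_[np.argmax(self.class_probs_)]
        return np.full(len(X), most_common)

    def predict_proba(self, X):
        # One row per data point, one column per class
        return np.tile(self.class_probs_, (len(X), 1))


X_toy = np.zeros((4, 2))                  # features are ignored by this toy model
y_toy = np.array(["a", "b", "b", "b"])

clf = Frequency_Classifier().fit(X_toy, y_toy)
labels = clf.predict(X_toy[:2])           # one label per row
probs = clf.predict_proba(X_toy[:2])      # shape (n_rows, n_classes)
```

Note that `predict` returns one value per row of `X`, while `predict_proba` returns one row per data point and one column per class.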


### Making our own practice class:  Mean Regressor 

One of the simplest possible models is to *always predict the average*.

Let's turn this model into an `sklearn`-style class.

In [3]:
class Mean_Regressor():

    def __init__(self, *args, **kwargs):    # no hyperparameters
        pass

    def fit(self, X, y):
        # Store the mean of the training targets; X is ignored
        self.mean_prediction = np.mean(y)
        return self

    def predict(self, X):
        # Return the stored mean once for every row of X
        return np.full(X.shape[0], self.mean_prediction)
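
For comparison, `sklearn` ships a baseline that follows the same idea: `DummyRegressor` with `strategy="mean"`. The sketch below fits it on synthetic data (the arrays `X_demo` and `y_demo` are made up for illustration) and should behave like the mean regressor above.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Synthetic data, used only for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.normal(loc=20.0, scale=5.0, size=20)

# Same three steps as any sklearn model: create, fit, predict
baseline = DummyRegressor(strategy="mean")
baseline.fit(X_demo, y_demo)
preds = baseline.predict(X_demo[:5])      # every entry is the mean of y_demo
```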

Let's import some sample data. The `X` will represent the features, and the `y` what we're trying to predict.

In [4]:
#importing sample dataset
# (load_boston was removed in scikit-learn 1.2, so this cell needs an older version)
from sklearn.datasets import load_boston

boston = load_boston(return_X_y=False)

X = pd.DataFrame(boston.data[:, (0, 5, 6)],
                 columns = ['Crime_Rate',
                            'Avg_Rooms',
                            'Pct_built_b4_1940'])
y = boston.target

In [5]:
X.shape

(506, 3)

In [6]:
X.head()

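|   | Crime_Rate | Avg_Rooms | Pct_built_b4_1940 |
|---|------------|-----------|-------------------|
| 0 | 0.00632    | 6.575     | 65.2              |
| 1 | 0.02731    | 6.421     | 78.9              |
| 2 | 0.02729    | 7.185     | 61.1              |
| 3 | 0.03237    | 6.998     | 45.8              |
| 4 | 0.06905    | 7.147     | 54.2              |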


The `y` variable is the target: the median value of homes, in $1000's.

In [7]:
y[:5]

array([24. , 21.6, 34.7, 33.4, 36.2])

First, we create a model.

In [8]:
model = Mean_Regressor()

Next, we fit it to our data.

In [9]:
model.fit(X, y)

<__main__.Mean_Regressor at 0x7fabdcf0e880>

And now we can make some predictions!

Note that they are all the same. This isn't a very good model.

In [10]:
model.predict(X[0:10])

array([22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632,
       22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632])
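
One way to put a number on "not very good" is to measure the error of a constant mean prediction. The sketch below uses only the five target values shown earlier (not the full dataset) and hypothetical variable names; when the model is scored on the same data it was fit to, its mean squared error equals the variance of the targets and its R² is zero.

```python
import numpy as np

# The first five target values displayed above
y_sample = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

# A mean regressor fit on y_sample predicts this constant for every row
preds = np.full(y_sample.shape[0], y_sample.mean())

# Mean squared error of the constant prediction
mse = np.mean((y_sample - preds) ** 2)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_sample - preds) ** 2)
ss_tot = np.sum((y_sample - y_sample.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```

Any model that actually uses the features should aim to beat this baseline.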

Let's say we have a new town with these features. Note that `sklearn` models expect that **X is a 2-dimensional object**, so we need to pass in an array with a single row.

In [11]:
'''
Crime_Rate            0.09
Avg_Rooms             6.41
Pct_built_b4_1940    84.10
'''

new_town = np.array([[.09, 6.41, 84.10]])
new_town.shape

(1, 3)
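
If the features arrive as a flat, 1-dimensional array instead, they can be reshaped into a single row first. A small sketch (the variable names are illustrative):

```python
import numpy as np

row = np.array([0.09, 6.41, 84.10])   # 1-dimensional, shape (3,)

# Two equivalent ways to get a 2-dimensional array with one row
as_2d = row.reshape(1, -1)            # -1 lets NumPy infer the number of columns
also_2d = row[np.newaxis, :]          # add a leading axis
```

Both results have shape `(1, 3)`, matching `new_town` above.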

We can use our model to predict the median value of homes in the new town.

In [12]:
model.predict(new_town)

array([22.53280632])