## OOP with Scikit-Learn

* Mutable data types can be modified after they are initialized. For example, a list is a mutable data type in Python.
* Immutable data types can't be modified after they are initialized. For example, a string is an immutable data type in Python.
* Most scikit-learn objects are mutable, which means that calling methods such as fit on them changes their internal data.
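
The three points above can be illustrated with a minimal sketch. The variable names are made up for illustration; the only scikit-learn behaviour assumed is that fit stores learned attributes (such as mean_) on the object itself.

```python
from sklearn.preprocessing import StandardScaler

# A list is mutable: append changes the same object in place
numbers = [1, 2, 3]
numbers.append(4)

# A string is immutable: upper() returns a new string and leaves the original alone
word = "fit"
upper_word = word.upper()

# An estimator is mutable: fit() adds learned attributes to the existing object
scaler = StandardScaler()
had_mean_before = hasattr(scaler, "mean_")
scaler.fit([[1.0], [2.0], [3.0]])
had_mean_after = hasattr(scaler, "mean_")

print(numbers)
print(word, upper_word)
print(had_mean_before, had_mean_after)
```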

### Scikit-Learn Classes
Scikit-learn has four main types of classes to be aware of. A single class can belong to more than one type:

* ESTIMATOR

An estimator is defined by having a fit method. There are two typical forms for this method:

estimator.fit(data)
and

estimator.fit(X, y)
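
As a rough sketch of the two forms, the snippet below fits one estimator with data only and another with features and a target. The toy values are invented for illustration. It also shows that fit returns the estimator itself, which is why a notebook cell ending in a fit call displays the estimator.

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy data, made up for this sketch
X = [[1], [2], [3], [4]]
y = [2, 4, 6, 8]

# Form 1: fit(data), no target needed
scaler = StandardScaler()
returned_scaler = scaler.fit(X)

# Form 2: fit(X, y), features plus target
reg = LinearRegression()
returned_reg = reg.fit(X, y)

# In both cases fit returns the same (now fitted) object
print(returned_scaler is scaler)
print(returned_reg is reg)
```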

* TRANSFORMER

A transformer is an estimator that has a transform method:

transformer.transform(data)

An example of a transformer is StandardScaler, which standardizes features by removing the mean and scaling to unit variance.



In [1]:
# Import class from scikit-learn
from sklearn.preprocessing import StandardScaler

# Instantiate the scaler (same step for all estimators, though specific args differ)
scaler = StandardScaler()

In [2]:
# When the estimator is first instantiated, these are all of its attributes:
scaler.__dict__

{'with_mean': True, 'with_std': True, 'copy': True}

In [3]:
# Fit the scaler on the data
data = [[10], [20], [30], [40], [50]]
scaler.fit(data)


StandardScaler()

In [4]:
scaler.__dict__

{'with_mean': True,
 'with_std': True,
 'copy': True,
 'n_features_in_': 1,
 'n_samples_seen_': 5,
 'mean_': array([30.]),
 'var_': array([200.]),
 'scale_': array([14.14213562])}

In [5]:
# access these fitted attributes
scaler.var_

array([200.])

In [6]:
scaler.mean_

array([30.])

In [7]:
# Transform the data using the fitted scaler
scaler.transform(data)

array([[-1.41421356],
       [-0.70710678],
       [ 0.        ],
       [ 0.70710678],
       [ 1.41421356]])
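
The transformed values can be reproduced by hand from the fitted attributes: each value has mean_ subtracted and is then divided by scale_ (the square root of var_). The sketch below checks this on the same data, and also shows fit_transform, which combines the two steps in a single call.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[10], [20], [30], [40], [50]], dtype=float)
scaler = StandardScaler().fit(data)

# Standardize manually: (x - mean) / standard deviation
manual = (data - scaler.mean_) / scaler.scale_

# Compare with what transform returns
same = np.allclose(manual, scaler.transform(data))
print(same)

# fit_transform fits and transforms in one call
combined = StandardScaler().fit_transform(data)
print(np.allclose(combined, manual))
```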

Some additional examples of transformers (that aren't also predictors) are:  
OneHotEncoder: used to convert categorical features into one-hot encoded features   
CountVectorizer: used to convert text data into a matrix of token counts
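
A small sketch of both transformers on invented toy inputs follows. Both return sparse matrices by default, so toarray() is used here only to make the results easy to inspect.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# One categorical feature with two categories (toy data)
colors = [["red"], ["green"], ["red"]]
encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()
print(encoder.categories_)  # categories learned during fit, in sorted order
print(encoded)              # one column per category

# Two short documents (toy data)
docs = ["the cat sat", "the cat ate the fish"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs).toarray()
print(vectorizer.vocabulary_)  # maps each token to its column index
print(counts)                  # one row per document, one column per token
```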

* PREDICTOR

A predictor is an estimator that has a predict method:

predictor.predict(X)

The predict method is called after the fit method. Predictors appear in both supervised and unsupervised learning. An example is LinearRegression:

In [9]:
# Import the class from scikit-learn
from sklearn.linear_model import LinearRegression
# Instantiate the model
lr = LinearRegression()

In [10]:
# Attributes right after instantiation:
lr.__dict__

{'fit_intercept': True, 'normalize': False, 'copy_X': True, 'n_jobs': None}

In [12]:
# Fit the linear regression on the data.
# X holds the features and y the target, where y = 10x + 5
X = [[1], [2], [3], [4], [5]]
y = [15, 25, 35, 45, 55]

lr.fit(X, y)

LinearRegression()

In [13]:
lr.__dict__

{'fit_intercept': True,
 'normalize': False,
 'copy_X': True,
 'n_jobs': None,
 'n_features_in_': 1,
 'coef_': array([10.]),
 '_residues': 1.7452973362415567e-29,
 'rank_': 1,
 'singular_': array([3.16227766]),
 'intercept_': 4.999999999999993}

In [14]:
# Access the fitted attributes
print(lr.intercept_)
print(lr.coef_[0])

4.999999999999993
10.000000000000002


In [15]:
# Call the predict method on the training features
lr.predict(X)

array([15., 25., 35., 45., 55.])
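
Prediction is usually more interesting on inputs the model has not seen. As a sketch, the snippet below refits the same model and predicts for two new x values (chosen arbitrarily here), then checks the result against intercept_ + coef_ * x computed by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]
y = [15, 25, 35, 45, 55]
lr = LinearRegression().fit(X, y)

# New inputs that were not part of the training data
X_new = [[6], [10]]
predictions = lr.predict(X_new)

# The same values computed directly from the fitted attributes
manual = lr.intercept_ + lr.coef_[0] * np.array([6, 10])

print(predictions)
print(np.allclose(predictions, manual))
```

Since the data follow y = 10x + 5 exactly, the predictions should be approximately 65 and 105, up to floating-point error.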

Some additional examples of predictors (that aren't also transformers) are:  
LogisticRegression: a classifier that uses the logistic regression algorithm   
KNeighborsRegressor: a regressor that uses the k-nearest neighbors algorithm   
DecisionTreeClassifier: a classifier that uses the decision tree algorithm   

* MODEL

A model is an estimator that has a score method. There are two typical forms for this method:

model.score(X, y)
and 
model.score(X) 



In [16]:
# For a linear regression model, score returns R-squared
lr.score(X, y)

1.0
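
The score of 1.0 reflects a perfect fit. To see what R-squared measures, the sketch below uses a slightly noisy target (values invented for illustration) and computes 1 - SS_res / SS_tot by hand, comparing it with lr.score and with sklearn.metrics.r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = [[1], [2], [3], [4], [5]]
# Hypothetical noisy target, roughly y = 10x + 5
y = np.array([15, 27, 33, 46, 54])

lr = LinearRegression().fit(X, y)
pred = lr.predict(X)

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2)
print(np.isclose(manual_r2, lr.score(X, y)))
print(np.isclose(manual_r2, r2_score(y, pred)))
```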

An example of a model that produces a score with just X is PCA (principal component analysis), whose score method returns the average log-likelihood of the samples:

In [19]:
from sklearn.decomposition import PCA
pca = PCA(n_components=1)

In [20]:
pca.__dict__

{'n_components': 1,
 'copy': True,
 'whiten': False,
 'svd_solver': 'auto',
 'tol': 0.0,
 'iterated_power': 'auto',
 'random_state': None}

In [21]:
# DATA
X = [[1, 11], [2, 12], [3, 14], [4, 16], [5, 28]]
# fit
pca.fit(X)

PCA(n_components=1)

In [22]:
pca.score(X)

-4.299494986505569
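
The negative number is a log-likelihood, not an accuracy-style score. As a sketch, PCA also exposes score_samples, which gives the log-likelihood of each sample; score is the mean of those values.

```python
import numpy as np
from sklearn.decomposition import PCA

X = [[1, 11], [2, 12], [3, 14], [4, 16], [5, 28]]
pca = PCA(n_components=1).fit(X)

# One log-likelihood per sample
per_sample_ll = pca.score_samples(X)

# score(X) is the average of the per-sample values
average_ll = pca.score(X)

print(per_sample_ll)
print(np.isclose(average_ll, per_sample_ll.mean()))
```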

If it has a fit method, it's an estimator    
If it has a transform method, it's a transformer   
If it has a predict method, it's a predictor   
If it has a score method, it's a model   
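
These rules can be checked programmatically with hasattr. The helper below, describe_roles, is a hypothetical function written for this sketch and is not part of scikit-learn.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def describe_roles(obj):
    """Return the roles an object plays, based on which methods it has."""
    checks = [
        ("fit", "estimator"),
        ("transform", "transformer"),
        ("predict", "predictor"),
        ("score", "model"),
    ]
    return [role for method, role in checks if hasattr(obj, method)]


scaler_roles = describe_roles(StandardScaler())
lr_roles = describe_roles(LinearRegression())
pca_roles = describe_roles(PCA())

print("StandardScaler:", scaler_roles)
print("LinearRegression:", lr_roles)
print("PCA:", pca_roles)
```

The output shows that one class can play several roles at once: PCA, for example, is an estimator, a transformer, and a model.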

You always need to call the fit method before you can call the transform, predict, or score methods.
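
A short sketch of what happens when this order is not respected: calling transform on an unfitted StandardScaler raises scikit-learn's NotFittedError, while the same call works after fit.

```python
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler

data = [[10], [20], [30]]
scaler = StandardScaler()

# Calling transform before fit fails
try:
    scaler.transform(data)
    raised = False
except NotFittedError as error:
    raised = True
    print("Not fitted yet:", error)

# After fit, the same call succeeds
scaler.fit(data)
transformed = scaler.transform(data)
print(transformed.shape)
```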