# OOP with Scikit-Learn - Lab done by `Eugene Maina`

## Introduction

Now that you have learned some of the basics of object-oriented programming with scikit-learn, let's practice applying it!

## Objectives:

In this lab you will:

* Recall the distinction between mutable and immutable types
* Define the four main inherited object types in scikit-learn
* Instantiate scikit-learn transformers and models
* Invoke scikit-learn methods
* Access scikit-learn attributes

## Mutable and Immutable Types

For each example below, think to yourself whether it is a mutable or immutable type. Then expand the details tag to reveal the answer.

<ol>
    <li>
        <details>
            <summary style="cursor: pointer">Python dictionary (click to reveal)</summary>
            <p><strong>Mutable.</strong> For example, the <code>update</code> method can be used to modify values within a dictionary.</p>
            <p></p>
        </details>
    </li>
    <li>
        <details>
            <summary style="cursor: pointer">Python tuple (click to reveal)</summary>
            <p><strong>Immutable.</strong> A tuple cannot be changed after it is created. If you want a modified version of a tuple, you need to build a new tuple and use <code>=</code> to reassign the variable.</p>
            <p></p>
        </details>
    </li>
    <li>
        <details>
            <summary style="cursor: pointer">pandas <code>DataFrame</code> (click to reveal)</summary>
            <p><strong>Mutable.</strong> Passing the <code>inplace=True</code> argument to various methods allows you to modify a dataframe in place.</p>
            <p></p>
        </details>
    </li>
    <li>
        <details>
            <summary style="cursor: pointer">scikit-learn <code>OneHotEncoder</code> (click to reveal)</summary>
            <p><strong>Mutable.</strong> Calling the <code>fit</code> method causes the transformer to store information about the data that is passed in, modifying its internal attributes.</p>
            <p></p>
        </details>
    </li>
</ol>
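
The four answers above can be checked in code. The following is a minimal sketch, not part of the original lab; the variable names (`prices`, `point`, `df`, `encoder`) and the toy data are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# 1. Dictionary: mutable, so `update` changes the existing object
prices = {"apple": 1.0}
original_prices = prices
prices.update({"apple": 1.5, "pear": 2.0})
dict_is_same_object = prices is original_prices    # True

# 2. Tuple: immutable, so a "modified" tuple is a new object that is reassigned
point = (1, 2)
original_point = point
point = point + (3,)
tuple_is_same_object = point is original_point     # False

# 3. DataFrame: mutable, e.g. dropping a column with inplace=True
df = pd.DataFrame({"a": [3, 4], "b": [2, 1]})
df.drop(columns=["a"], inplace=True)
remaining_columns = list(df.columns)                # ["b"]

# 4. OneHotEncoder: mutable, since `fit` stores new attributes on the object
encoder = OneHotEncoder()
has_categories_before = hasattr(encoder, "categories_")   # False
encoder.fit(pd.DataFrame({"color": ["red", "green", "red"]}))
has_categories_after = hasattr(encoder, "categories_")    # True

print(dict_is_same_object, tuple_is_same_object)
print(remaining_columns)
print(has_categories_before, has_categories_after)
```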

## The Data

For this lab we'll use data from the built-in iris dataset:

In [1]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True, as_frame=True)

In [2]:
X

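| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |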


In [3]:
y

0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: target, Length: 150, dtype: int32
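
Before fitting anything, it can help to confirm what `load_iris` returned. This is a small optional sketch rather than a step from the lab; it only checks the types, shapes, and class counts of `X` and `y`.

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True, as_frame=True)

# With as_frame=True, X is a pandas DataFrame and y is a pandas Series
print(type(X).__name__, X.shape)   # DataFrame (150, 4)
print(type(y).__name__, y.shape)   # Series (150,)

# Number of rows belonging to each of the three target classes
class_counts = y.value_counts().sort_index().to_dict()
print(class_counts)
```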

## Scikit-Learn Classes

For the following exercises, follow the documentation link to understand the class you are working with, but **do not** worry about understanding the underlying algorithm. The goal is just to get used to creating and using these types of objects.

### Estimators

For all estimators, the steps are:

1. Import the class from the `sklearn` library
2. Instantiate an object from the class
3. Pass in the appropriate data to the `fit` method

#### `MinMaxScaler` ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html))

Import this scaler, instantiate an object called `scaler` with default parameters, and `fit` the scaler on `X`.

In [4]:
# Import
from sklearn.preprocessing import MinMaxScaler

# Instantiate
scaler = MinMaxScaler()
# Fit
scaler.fit(X)

#### `DecisionTreeClassifier` ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html))

Import the classifier, instantiate an object called `clf` (short for "classifier") with default parameters, and `fit` the classifier on `X` and `y`.

In [5]:
# Import
from sklearn.tree import DecisionTreeClassifier
# Instantiate
clf = DecisionTreeClassifier()
# Fit
clf.fit(X, y)

### Transformers

One of the two objects instantiated above (`scaler` or `clf`) is a transformer. Which one is it? Consult the documentation.

---

<details>
    <summary style="cursor: pointer">Hint (click to reveal)</summary>
    <p>The class with a <code>transform</code> method is a transformer.</p>
</details>

---

#### Using the transformer, print out two of the fitted attributes along with descriptions from the documentation.

---

<details>
    <summary style="cursor: pointer">Hint (click to reveal)</summary>
    <p>Attributes ending with <code>_</code> are fitted attributes.</p>
</details>

In [17]:
# Your code here

print(scaler.__doc__)


Transform features by scaling each feature to a given range.

    This estimator scales and translates each feature individually such
    that it is in the given range on the training set, e.g. between
    zero and one.

    The transformation is given by::

        X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
        X_scaled = X_std * (max - min) + min

    where min, max = feature_range.

    This transformation is often used as an alternative to zero mean,
    unit variance scaling.

    `MinMaxScaler` doesn't reduce the effect of outliers, but it linearily
    scales them down into a fixed range, where the largest occuring data point
    corresponds to the maximum value and the smallest one corresponds to the
    minimum value. For an example visualization, refer to :ref:`Compare
    MinMaxScaler with other scalers <plot_all_scaling_minmax_scaler_section>`.

    Read more in the :ref:`User Guide <preprocessing_scaler>`.

    Parameters
    ----------
    feature_r
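
The docstring describes the class, but the exercise asks for fitted attributes themselves. One possible answer is sketched below, assuming the scaler is fitted on the iris `X` as in the earlier cell; `data_min_` and `data_max_` are two of the fitted attributes documented for `MinMaxScaler`, and `fitted_names` is a hypothetical helper variable used here to list every public attribute ending in `_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True, as_frame=True)
scaler = MinMaxScaler().fit(X)

# List the public attributes that end with an underscore (set during fit)
fitted_names = sorted(
    name for name in vars(scaler)
    if name.endswith("_") and not name.startswith("_")
)
print(fitted_names)

# data_min_: per-feature minimum seen in the data passed to fit
print("data_min_:", scaler.data_min_)

# data_max_: per-feature maximum seen in the data passed to fit
print("data_max_:", scaler.data_max_)
```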

#### Now, call the `transform` method on the transformer and pass in `X`. Assign the result to `X_scaled`

In [14]:
# Your code here
X_scaled = scaler.transform(X)
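
To see what `transform` actually did, the result can be compared against the formula quoted in the `MinMaxScaler` docstring. The sketch below assumes the default `feature_range` of (0, 1) and the default output type (a NumPy array); `manual` is a hypothetical name for the hand-computed version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True, as_frame=True)
scaler = MinMaxScaler().fit(X)
X_scaled = scaler.transform(X)

# By default the result is a NumPy array with the same shape as X
print(type(X_scaled).__name__, X_scaled.shape)

# Recompute the scaling by hand using the docstring formula
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
matches_formula = np.allclose(X_scaled, manual.to_numpy())
print(matches_formula)

# Every column now spans the range from 0 to 1
print(X_scaled.min(axis=0), X_scaled.max(axis=0))
```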

### Predictors and Models

The other of the two scikit-learn objects instantiated above (`scaler` or `clf`) is a predictor and a model. Which one is it? Consult the documentation.

---

<details>
    <summary style="cursor: pointer">Hint (click to reveal)</summary>
    <p>The class with a <code>predict</code> method and a <code>score</code> method is a predictor and a model.</p>
</details>

---
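
The hints in this section and the previous one can also be checked programmatically instead of by reading the documentation. The sketch below uses Python's built-in `hasattr` on freshly instantiated objects; the `capabilities` dictionary is a hypothetical name introduced for illustration.

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

scaler = MinMaxScaler()
clf = DecisionTreeClassifier()

# Record which of the common scikit-learn methods each object exposes
capabilities = {}
for name, obj in [("scaler", scaler), ("clf", clf)]:
    capabilities[name] = {
        method: hasattr(obj, method)
        for method in ("fit", "transform", "predict", "score")
    }
    print(name, capabilities[name])
```

Both objects have `fit`, so both are estimators. Only the one with `transform` is a transformer, and only the one with `predict` and `score` is a predictor and a model.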

#### Using the predictor, print out two of the fitted attributes along with descriptions from the documentation.

In [19]:
# Your code here
print(clf.score(X, y))

print(clf.__doc__)


1.0
A decision tree classifier.

    Read more in the :ref:`User Guide <tree>`.

    Parameters
    ----------
    criterion : {"gini", "entropy", "log_loss"}, default="gini"
        The function to measure the quality of a split. Supported criteria are
        "gini" for the Gini impurity and "log_loss" and "entropy" both for the
        Shannon information gain, see :ref:`tree_mathematical_formulation`.

    splitter : {"best", "random"}, default="best"
        The strategy used to choose the split at each node. Supported
        strategies are "best" to choose the best split and "random" to choose
        the best random split.

    max_depth : int, default=None
        The maximum depth of the tree. If None, then nodes are expanded until
        all leaves are pure or until all leaves contain less than
        min_samples_split samples.

    min_samples_split : int or float, default=2
        The minimum number of samples required to split an internal node:

        - If int, then 
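
As with the scaler, the exercise asks for fitted attributes rather than the docstring. One possible answer is sketched below, assuming the classifier is fitted on the iris `X` and `y`; `random_state=0` is an addition made here only so the sketch is reproducible, and is not part of the original cell.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)

# random_state is an assumption added for reproducibility
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# classes_: the class labels seen during fit
print("classes_:", clf.classes_)

# feature_importances_: normalized importance of each feature (sums to 1)
print("feature_importances_:", clf.feature_importances_)

# n_features_in_: number of features seen during fit
print("n_features_in_:", clf.n_features_in_)
```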

#### Now, call the `predict` method on the predictor, passing in `X`. Assign the result to `y_pred`

In [20]:
# Your code here

y_pred = clf.predict(X)

#### Now, call the `score` method on the predictor, passing in `X` and `y`

In [21]:
# Your code here

clf.score(X, y)

1.0
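
For a classifier, `score` returns the mean accuracy on the data passed in. The sketch below checks this by comparing `score` with `accuracy_score` from `sklearn.metrics` and with a hand-computed fraction of correct predictions; `manual_accuracy` is a hypothetical name, and `random_state=0` is an assumption added for reproducibility.

```python
import math

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

y_pred = clf.predict(X)

# Three ways of computing the same number
score = clf.score(X, y)                           # built-in method
metric = accuracy_score(y, y_pred)                # metrics function
manual_accuracy = float((y_pred == y).mean())     # fraction of correct predictions

print(score, metric, manual_accuracy)
print(math.isclose(score, metric) and math.isclose(score, manual_accuracy))
```

Note that the true labels `y` go into `score`, not the predictions. Scoring predictions against themselves would always give 1.0, whatever the quality of the model.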

#### What does that score represent? Write your answer below

In [23]:
"""
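The score is the mean accuracy: the fraction of rows in X whose predicted class matches the true label in y. A score of 1.0 means the Decision Tree Classifier correctly predicted the class for 100% of the data points. Because X and y are the same data the classifier was fitted on, this measures fit to the training data, not performance on unseen data.

"""

'\nThe score is the mean accuracy: the fraction of rows in X whose predicted class matches the true label in y. A score of 1.0 means the Decision Tree Classifier correctly predicted the class for 100% of the data points. Because X and y are the same data the classifier was fitted on, this measures fit to the training data, not performance on unseen data.\n\n'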

## Summary

In this lab, you practiced identifying mutable and immutable types as well as identifying transformers, predictors, and models using scikit-learn. You also instantiated scikit-learn objects, invoked the most common scikit-learn methods, and accessed some scikit-learn attributes.