# Exercise: Regression and Classification Machine Learning

In this exercise, we'll dive deeper into ML concepts by creating a regression model and a classification model.

Your tasks for this exercise are:
1. Load the iris dataset into a dataframe
2. Create a LinearRegression model and fit it to the dataset
3. Score the regression model on the dataset and predict its values
4. Create a RidgeClassifier model and fit it to the dataset, using `alpha=3.0` when initializing the model
5. Score the classification model on the dataset and predict its values

In [1]:
import numpy as np
import pandas as pd
import sklearn
from sklearn import datasets

In [2]:
# Load in the iris dataset
iris = datasets.load_iris()

In [3]:
iris.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

In [4]:
# Create the iris `data` dataset as a dataframe and name the columns with `feature_names`
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])

# Include the target as well
df['target'] = iris['target']

In [5]:
# Check your dataframe with `.head()`
df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


## Regression ML

In [6]:
from sklearn.linear_model import LinearRegression

In [7]:
# Fit a standard regression model, as we've done in other exercises
reg = LinearRegression().fit(df[iris['feature_names']], df['target'])

In [8]:
# Score the model on the same dataset (for a regressor, `score` returns R^2, not accuracy)
reg.score(df[iris['feature_names']], df['target'])

0.93042236753315966
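
The number above is the coefficient of determination (R^2), which measures how much of the variance in `target` the fitted line explains. As a rough sketch, the same value can be rebuilt by hand from the residuals. The block reloads the data so that it runs on its own, and the names `X`, `y`, `ss_res`, `ss_tot`, and `r2_manual` are introduced here for illustration only:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# Reload the data so this cell is self-contained
iris = datasets.load_iris()
X, y = iris["data"], iris["target"]

reg = LinearRegression().fit(X, y)
pred = reg.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Both values should agree up to floating-point error
print(r2_manual, reg.score(X, y))
```

Because R^2 is computed from squared distances to the numeric target, it says nothing directly about how many samples would land in the right class.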

In [9]:
# Predict values: the outputs are continuous, so they are not directly usable as class labels
reg.predict(df[iris['feature_names']])

array([ -8.26582725e-02,  -3.85897565e-02,  -4.81896914e-02,
         1.26087761e-02,  -7.61081708e-02,   5.68023484e-02,
         3.76259158e-02,  -4.45599433e-02,   2.07050198e-02,
        -8.13030749e-02,  -1.01728663e-01,   8.84875996e-05,
        -8.86050221e-02,  -1.01834705e-01,  -2.26997797e-01,
        -4.36405904e-02,  -3.39982044e-02,  -2.16688605e-02,
        -3.26854579e-02,  -1.22408563e-02,  -4.30562522e-02,
         5.31726003e-02,  -1.23012138e-01,   1.77258467e-01,
         6.81889023e-02,  -4.16362637e-03,   1.00119019e-01,
        -7.09322806e-02,  -8.92083742e-02,   1.99107233e-02,
         1.33606216e-02,   3.35222953e-02,  -1.58465961e-01,
        -1.57523171e-01,  -8.13030749e-02,  -1.03812269e-01,
        -1.49254996e-01,  -8.13030749e-02,  -6.41916305e-03,
        -5.55340896e-02,  -3.33948524e-02,   7.45644153e-02,
        -1.52672524e-02,   2.17673798e-01,   1.39549109e-01,
         3.33738018e-02,  -5.05301301e-02,  -1.45154068e-02,
        -9.07545163e-02,

In [10]:
# If we really wanted to, we could round each regression value to the nearest integer
# and have it "act" like a classification model
# `np.rint` does the rounding; `np.abs` removes the sign from the -0. it yields for small negative predictions
# This is not required, but something to keep in mind for future reference
reg_cls = np.abs(np.rint(reg.predict(df[iris["feature_names"]])))
reg_cls

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  2.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  2.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  1.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  1.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.])

In [11]:
# Evaluate accuracy
sum(reg_cls == df["target"]) / df.shape[0]

0.9733333333333334
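
One caveat with the rounding trick: nothing stops a regression output from rounding to a value outside the valid labels, and `np.abs` would turn a rounded `-1` into `1`. A possible variant, offered only as a sketch and not as the exercise's required solution, clips the rounded values to the label range instead and checks the result with `accuracy_score` from `sklearn.metrics`. The names `X`, `y`, `n_classes`, `reg_labels`, and `acc` are chosen here for illustration:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# Reload the data so this cell is self-contained
iris = datasets.load_iris()
X, y = iris["data"], iris["target"]
reg = LinearRegression().fit(X, y)

# Round to the nearest integer, then force the result into [0, n_classes - 1]
n_classes = len(iris["target_names"])
reg_labels = np.clip(np.rint(reg.predict(X)), 0, n_classes - 1).astype(int)

# Fraction of samples whose rounded prediction equals the true label
acc = accuracy_score(y, reg_labels)
print(acc)
```

This only works here because the iris labels 0, 1, 2 happen to be ordered along the features; for labels with no natural order, rounding a regression output is not meaningful.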

## Classification ML

In [12]:
from sklearn.linear_model import RidgeClassifier

In [13]:
# Fit a ridge classifier, which suits this problem because predicting the species is a classification task
clf = RidgeClassifier(alpha=3.0).fit(df[iris['feature_names']], df['target'])

In [14]:
# Score the model (for a classifier, `score` returns mean accuracy)
clf.score(df[iris['feature_names']], df['target'])

0.85999999999999999

In [15]:
# Predict the class values for the dataset; these are already integer labels, so they will look much better!
clf.predict(df[iris['feature_names']])

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 1,
       1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
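
Under the hood, `RidgeClassifier` encodes each class as a -1/+1 target, fits ridge regression to those targets, and predicts the class with the highest score. The sketch below makes that visible through `decision_function`. It reloads the data to stay self-contained, and `X`, `y`, `scores`, and `manual_pred` are names chosen for this illustration:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import RidgeClassifier

# Reload the data so this cell is self-contained
iris = datasets.load_iris()
X, y = iris["data"], iris["target"]
clf = RidgeClassifier(alpha=3.0).fit(X, y)

# One column of scores per class: shape is (n_samples, n_classes)
scores = clf.decision_function(X)

# For each sample, pick the class whose score is highest
manual_pred = clf.classes_[np.argmax(scores, axis=1)]

print(scores.shape)
print(np.array_equal(manual_pred, clf.predict(X)))
```

So the classifier is still a regression at heart; the difference from the rounding approach is that each class gets its own score instead of sharing one numeric axis.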

In [16]:
# Evaluate accuracy by hand; this matches the value returned by `clf.score`
sum(clf.predict(df[iris['feature_names']]) == df['target']) / df.shape[0]

0.86
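
A single accuracy number hides which classes are being mixed up. As an optional follow-up, and assuming `confusion_matrix` from `sklearn.metrics` is an acceptable tool for this exercise, the sketch below counts predictions per true class. The names `X`, `y`, and `cm` are introduced here for illustration:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import confusion_matrix

# Reload the data so this cell is self-contained
iris = datasets.load_iris()
X, y = iris["data"], iris["target"]
clf = RidgeClassifier(alpha=3.0).fit(X, y)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y, clf.predict(X))
print(cm)

# The diagonal holds the correct predictions, so its share is the accuracy
print(np.trace(cm) / cm.sum())
```

Off-diagonal entries show where the errors fall, which is more informative than the overall score when deciding whether a model is good enough.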