# Applying Machine Learning to the Iris Dataset

## Loading the data

In [5]:
# import load_iris function from datasets module
from sklearn.datasets import load_iris

# save "bunch" object containing iris dataset and its attributes
iris = load_iris()

# store feature matrix in "X"
X = iris.data

# store response vector in "y"
y = iris.target

In [7]:
# print the shape of X and y
print(X.shape)
print(y.shape)

(150, 4)
(150,)
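The shapes above say that there are 150 observations with 4 features each, and one response value per observation. As a small sketch of what those columns and response values stand for, the bunch object also carries `feature_names` and `target_names` attributes:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# names of the four feature columns in iris.data
print(iris.feature_names)

# species names; the integers in iris.target index into this array
print(iris.target_names)
```

Each integer in `y` (0, 1 or 2) is a position in `iris.target_names`.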


## Scikit-learn 4-step modeling pattern

**Step 1**: Import the class you plan to use

In [16]:
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

**Step 2**: "Instantiate" the "estimator"
* "Estimator" is scikit-learn's term for a model
* "Instantiate" means "make an instance of"

In [10]:
knn = KNeighborsClassifier(n_neighbors=1)

* Can specify tuning parameters (aka "hyperparameters") during this step
* All parameters not specified are set to their defaults
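One way to check which values were set explicitly and which fell back to defaults is the estimator's `get_params()` method, which returns the parameters as a dictionary. A minimal sketch:

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

# get_params() returns every constructor parameter and its current value
params = knn.get_params()

print(params["n_neighbors"])  # the value passed explicitly
print(params["weights"])      # not passed, so left at its default
```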

In [12]:
print(knn)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')


**Step 3**: Fit the model with data (aka "model training")
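* Model learns the relationship between X and y
* Occurs in-place: `fit` updates the `knn` object itself, so the result does not need to be assigned to a new variable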

In [13]:
knn.fit(X,y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')
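The output above is the fitted estimator itself, because `fit` returns the same object it was called on. The sketch below checks this and also prints `classes_`, the class labels the model saw during training. The name `returned` is only for illustration, and `load_iris(return_X_y=True)` is used here as a shortcut for getting `X` and `y` directly.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# return_X_y=True gives the feature matrix and response vector directly
X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=1)

# fit modifies knn in-place and also returns it
returned = knn.fit(X, y)

print(returned is knn)  # same object, not a copy
print(knn.classes_)     # class labels seen during training
```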

**Step 4**: Predict the response for a new observation
* New observations are called "out-of-sample" data
* Uses the information the model learned during the training process

In [17]:
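# a single observation with four feature values
new_sample = np.array([3, 5, 4, 2])

# predict expects a 2D array of shape (n_samples, n_features)
new_sample = new_sample.reshape(1, -1)
knn.predict(new_sample)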

array([2])

* Returns a NumPy array
* Can predict for multiple observations at once

In [19]:
X_new = np.array([[3, 5, 4, 2], [5, 4, 3, 2]])
knn.predict(X_new)

array([2, 1])
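The predictions are integer codes. A minimal sketch of turning them back into species names, assuming the model is fit exactly as above, is to index `iris.target_names` with the predicted array (the variable names `pred` and `species` are only for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(iris.data, iris.target)

X_new = np.array([[3, 5, 4, 2], [5, 4, 3, 2]])

# integer class codes, one per row of X_new
pred = knn.predict(X_new)

# index the array of species names with the predicted codes
species = iris.target_names[pred]
print(species)
```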

## Using a different value for K

In [20]:
# instantiate the model (using the value k=5)
knn = KNeighborsClassifier(n_neighbors=5)

# fit the model with data
knn.fit(X, y)

# predict the response for new observations
knn.predict(X_new)

array([1, 1])
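To compare the two settings side by side, the instantiate, fit and predict steps can be wrapped in a loop over the values of K. This is only a sketch; the `results` dictionary is a hypothetical helper for collecting the predictions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_new = np.array([[3, 5, 4, 2], [5, 4, 3, 2]])

# collect predictions for each value of K
results = {}
for k in (1, 5):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    results[k] = knn.predict(X_new)

for k, pred in results.items():
    print(k, pred)
```

Because each K gives a different model, the same observation can receive different predictions, as the outputs for K=1 and K=5 above show.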

## Using a different classification model

In [25]:
# import the class
from sklearn.linear_model import LogisticRegression

# instantiate the model (using the default parameters)
logreq = LogisticRegression()

# fit the model with data
logreq.fit(X, y)

# predict the response for new observations
logreq.predict(X_new)

array([2, 0])
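Logistic regression can also report a probability for each class through `predict_proba`. The sketch below is a variation on the cell above, not a reproduction of it: it sets `max_iter=1000`, an assumed value chosen to give the solver room to converge in recent scikit-learn versions, so its predictions may differ from the output shown above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_new = np.array([[3, 5, 4, 2], [5, 4, 3, 2]])

# max_iter=1000 is an assumption, not the setting used in the original cell
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)

# one row per observation, one column per class; each row sums to 1
proba = logreg.predict_proba(X_new)
print(proba.round(3))

# predict returns the class with the highest probability in each row
print(logreg.predict(X_new))
```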