# Classification with Scikit-Learn

In this lesson, you will learn the basic functionality of Scikit-Learn, one of the most widely used machine learning packages for Python. We will use the Iris flower dataset as an example.

### Cheat sheet
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf

## Concepts

|   concept   | description |
|:-----------:|:-----------:|
| Estimators | what models are called in Scikit-Learn |
| m.fit() | trains a model on data |
| m.predict() | creates a prediction for unknown data |
| m.transform() | transforms features (in some models) |
| train_test_split() | splits data into a training and a test portion |
| random_state | parameter for reproducible random numbers |
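
The concepts in the table can be sketched in a few lines. This is a minimal illustration, not part of the lesson's own cells: `StandardScaler` is an assumed example of an estimator that offers `transform()`, and the variable names are chosen only for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler  # assumed example of a transformer
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# train_test_split() with random_state gives the same split on every run
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# An estimator with transform(): fit() learns mean and variance,
# transform() rescales the features
scaler = StandardScaler()
scaler.fit(Xtrain)
Xtrain_scaled = scaler.transform(Xtrain)
Xtest_scaled = scaler.transform(Xtest)

# An estimator with predict(): fit() trains, predict() labels unseen data
m = SVC(kernel="linear", C=1.0)
m.fit(Xtrain_scaled, ytrain)
ypred = m.predict(Xtest_scaled)
print(ypred[:5])
```

Note that the scaler is fitted on the training portion only and then applied to both portions, so no information from the test data leaks into training.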

### 1. Load the example data

In [1]:
from sklearn import datasets

In [2]:
iris = datasets.load_iris()

In [3]:
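X = iris.data[:, :2]  # use only the first two features (sepal length and sepal width)
y = iris.target       # class labels 0, 1, 2 (see iris.target_names)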

### 2. Constructing a model in Scikit-Learn

In [4]:
from sklearn import svm
from sklearn.model_selection import train_test_split

In [5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y,
                                                test_size=0.5,
                                                random_state=42)
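
To see what `test_size` and `random_state` do, the following sketch splits the full Iris data twice with the same seed and compares the results. The variable names are hypothetical and chosen only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Two splits with the same random_state
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.5, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.5, random_state=42)

# test_size=0.5 puts half of the 150 samples in each portion
print(a_train.shape, a_test.shape)

# Same seed, same rows in the same order
print(np.array_equal(a_train, b_train))
```

Leaving out `random_state` shuffles the rows differently on every run, so scores would vary slightly between runs.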

In [6]:
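# Support vector classifier with a linear kernel;
# C controls regularization (a smaller C means stronger regularization)
model = svm.SVC(kernel='linear', C=1.0)
model.fit(Xtrain, ytrain)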

print("Train score: ", model.score(Xtrain, ytrain))
print("Test score: ", model.score(Xtest, ytest))

Train score:  0.7866666666666666
Test score:  0.76
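
For a classifier, `score()` returns the mean accuracy, i.e. the fraction of correctly predicted labels. The sketch below rebuilds the model from the cells above and checks this by hand; `accuracy_score` and the name `manual` are additions made only for this illustration.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.5, random_state=42
)

model = svm.SVC(kernel="linear", C=1.0)
model.fit(Xtrain, ytrain)

# Fraction of test samples whose predicted label matches the true label
ypred = model.predict(Xtest)
manual = np.mean(ypred == ytest)

# All three values should agree
print(manual, accuracy_score(ytest, ypred), model.score(Xtest, ytest))
```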


### 3. Predictions for unknown data

In [7]:
import numpy as np

Xnew = np.array([[5.0, 3.3], [4.7, 2.1]])
ypred = model.predict(Xnew)

print("predictions:")
# loop variable is named yp so that the target array y is not overwritten
for x, yp in zip(Xnew, ypred):
    label = iris.target_names[yp]
    print(x, yp, "->", label)

predictions:
[5.  3.3] 0 -> setosa
[4.7 2.1] 1 -> versicolor
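
`predict()` expects a 2D array with one row per sample and the same number of columns as the training data (two here). A single flower therefore has to be reshaped into one row first. The sketch below rebuilds the model so that it runs on its own; the name `single` is hypothetical.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.5, random_state=42
)
model = svm.SVC(kernel="linear", C=1.0).fit(Xtrain, ytrain)

# One flower as a 1D array of two features
single = np.array([5.0, 3.3])

# reshape(1, -1) turns it into a 2D array with shape (1, 2)
single_2d = single.reshape(1, -1)
pred = model.predict(single_2d)

print(single_2d.shape, pred.shape)
print(iris.target_names[pred[0]])
```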


## Exercise

1. Evaluate an SVC classifier on the iris data.
2. Use an SVC classifier on the Titanic data.
3. Find out which classification methods are listed on the Scikit-Learn website. Use one or more of them.
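4. Use sklearn.dummy.DummyClassifier so that each type of iris is predicted with a chance of about 33%. In recent Scikit-Learn versions the default strategy is "prior", which always predicts the most frequent class, so you may need to set strategy="uniform" or strategy="stratified".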
5. Change the parameters so that sklearn.dummy.DummyClassifier predicts 1 for any data point.
6. Evaluate both classifiers on the same dataset.