### Introduction

When we created our machine learning model in the notebook [03 first steps with machine learning](./03_first_steps_with_machine_learning.ipynb), we called `fit(X, y)`, passing in the complete dataset:

```
X, y = iris.data, iris.target

classifier = LinearSVC()

model = classifier.fit(X, y)
```

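The problem we now face is how to test our model's accuracy. If we score the model on the same data it was trained on, the result only tells us how well it fits data it has already seen, not how well it predicts new data.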

### Setup random state

Many methods in Scikit-learn use randomness in their operation. These methods usually have a `random_state` parameter, which lets you provide a fixed value so that others running your code get reproducible results.
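
As a minimal sketch of what a fixed `random_state` buys you, the snippet below calls `train_test_split` twice with the same value and checks that both calls hold back the same test labels. The value `42` and the variable names `y_test_a` and `y_test_b` are arbitrary choices made for this illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Two calls with the same random_state (42 is an arbitrary choice)
# shuffle the data in exactly the same way
_, _, _, y_test_a = train_test_split(X, y, test_size=0.33, random_state=42)
_, _, _, y_test_b = train_test_split(X, y, test_size=0.33, random_state=42)

# Both calls held back the same samples for testing
print((y_test_a == y_test_b).all())  # True
```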

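Rather than setting the random state on each method, we can set it once for the whole notebook. When `random_state` is left unset, Scikit-learn falls back to NumPy's global random number generator, so seeding NumPy has the same effect: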

In [1]:
# set the random state to one to ensure everyone running the code 
# gets the same results
import numpy as np
np.random.seed(1)

### Splitting our dataset

Typically we split the dataset so that one portion is used to train the model and the remainder is held back to test it.
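
To make the idea concrete, here is a hand-rolled sketch of a shuffled split using only NumPy. It is an illustration of the concept on made-up toy data, not how Scikit-learn implements it; the names `X_toy`, `y_toy`, and the 25% test fraction are assumptions chosen for this example.

```python
import numpy as np

# Toy data (hypothetical): 12 samples with 2 features each, one label per sample
X_toy = np.arange(24).reshape(12, 2)
y_toy = np.arange(12)

rng = np.random.default_rng(0)          # fixed seed so the shuffle is repeatable
shuffled = rng.permutation(len(X_toy))  # random ordering of the row indices

n_test = int(len(X_toy) * 0.25)         # hold back 25% of the rows for testing
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

# Index features and labels with the same indices so they stay aligned
X_toy_train, y_toy_train = X_toy[train_idx], y_toy[train_idx]
X_toy_test, y_toy_test = X_toy[test_idx], y_toy[test_idx]

print(X_toy_train.shape, X_toy_test.shape)  # (9, 2) (3, 2)
```

Shuffling before slicing matters because many datasets, iris included, are stored sorted by class, so an unshuffled slice could leave a whole class out of the training data.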


In [2]:
from sklearn.datasets import load_iris
iris = load_iris()

X, y = iris.data, iris.target

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

print('Training data size : ', X_train.shape, y_train.shape)

# Exercise: uncomment the next line and print the size of your test data
# print('Test data size : ',???.???, ???.???)

Training data size :  (100, 4) (100,)


### Fitting the model

Now that we have split the data, we can train the model using **only** the training data: 

In [3]:
from sklearn.svm import LinearSVC
classifier = LinearSVC()

# Train the model (using just the training data)
model = classifier.fit(X_train, y_train)



### Scoring the model

Finally, we can test the accuracy of the model using the [score](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html?highlight=linear%20svc#sklearn.svm.LinearSVC.score) method, which returns the mean accuracy on the test data and labels we pass in:

In [4]:
model.score(X_test, y_test)

0.92
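
To see what that number means, the sketch below computes the accuracy by hand as the fraction of test samples predicted correctly and compares it with `score`. It is self-contained and passes `random_state=1` to `train_test_split` directly, which is an assumption made for this example, so the value it prints may differ from the one shown above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# random_state=1 is an arbitrary choice for this illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1
)

model = LinearSVC().fit(X_train, y_train)

# Accuracy by hand: the fraction of test samples predicted correctly
predictions = model.predict(X_test)
manual_accuracy = np.mean(predictions == y_test)

# For a classifier, score() reports the same quantity
print(manual_accuracy, model.score(X_test, y_test))
```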

We will look at quantifying the quality of predictions in more detail in the next notebook.

### Summary

In this notebook we covered the basics of splitting our dataset into training and test data, fitting the model on the training data only, and scoring it on the held-back test data.

**Exercise:**

Try retraining your model with the parameters `penalty='l1', dual=False` and obtain the score.

The score with the default parameter values (`classifier = LinearSVC()`) in the *Fitting the model* section was `0.92`. Did you get a score of `0.9` with the parameters `penalty='l1', dual=False`?

### Navigation

[Previous](./07_estimator_parameters.ipynb) | [Next](./09_model_performance.ipynb) notebook