#  X for our data, and y for our answer labels.

### Supervised learning estimators implement a slightly different set of methods:
* **.predict()**: After training your machine learning model, you can predict the labels of new, never-before-seen samples
* **.predict_proba()**: For some estimators, you can also see the probability of a new sample belonging to each label
* **.score()**: Scores how well your model fits the data and labels you pass in
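
As a rough illustration of the three methods above, here is a minimal sketch. The choice of `KNeighborsClassifier`, the toy data, and the names `model` and `new_samples` are assumptions made for this example only; any classifier that supports `.predict_proba()` would do.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed for illustration): one feature per sample.
# scikit-learn expects a 2-D feature array, hence the nested lists.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Train a simple nearest-neighbors classifier.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Two samples the model has never seen.
new_samples = [[1.5], [8.5]]

# .predict() returns one label per sample.
predicted_labels = model.predict(new_samples)

# .predict_proba() returns one row per sample and one column per label.
label_probabilities = model.predict_proba(new_samples)

# .score() returns the mean accuracy on whatever data and labels are passed in.
mean_accuracy = model.score(X, y)

print(predicted_labels)
print(label_probabilities)
print(mean_accuracy)
```

Note that scoring on the same data the model was trained on, as done here for brevity, tends to give an optimistic number; held-out data is a fairer check.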

### SciKit-Learn helps you split your data:

In [1]:
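# sklearn.cross_validation was removed in newer releases; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split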

In [2]:
data   = [0,1,2,3,4, 5,6,7,8,9]

In [3]:
labels = [0,0,0,0,0, 1,1,1,1,1]

In [4]:
data_train, data_test, label_train, label_test = train_test_split(data, labels, test_size=0.5, random_state=7)

In [5]:
data_train

[9, 7, 3, 6, 4]

In [6]:
label_train

[1, 1, 0, 1, 0]

In [7]:
data_test

[8, 5, 0, 2, 1]

In [8]:
label_test

[1, 1, 0, 0, 0]

### Tinkering with test_size, random_state

**Setting the test_size to 0.7**

In [9]:
data_train, data_test, label_train, label_test = train_test_split(data, labels, test_size=0.7, random_state=7)

In [10]:
data_train

[3, 6, 4]

In [11]:
label_train

[0, 1, 0]

In [12]:
data_test

[8, 5, 0, 2, 1, 9, 7]

In [13]:
label_test

[1, 1, 0, 0, 0, 1, 1]
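
To see how `test_size` controls the proportions, the following sketch records the train and test sizes for a few settings. It assumes a recent scikit-learn where `train_test_split` is imported from `sklearn.model_selection`; the `split_sizes` dictionary is just a helper for this example. A float is read as a fraction of the samples, while an integer is read as an absolute number of test samples.

```python
from sklearn.model_selection import train_test_split

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Map each test_size setting to (number of training samples, number of test samples).
split_sizes = {}
for test_size in (0.5, 0.7, 3):
    data_train, data_test, label_train, label_test = train_test_split(
        data, labels, test_size=test_size, random_state=7
    )
    split_sizes[test_size] = (len(data_train), len(data_test))
    print(test_size, split_sizes[test_size])
```

A larger test set gives a more stable accuracy estimate but leaves fewer samples to train on, so the right value depends on how much data you have.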

**Setting the random_state to 3**

**If you absolutely need the results to come back identically every time, such as when you're doing a demo, you can pass in the optional random_state parameter to make the split reproducible.**

In [16]:
data_train, data_test, label_train, label_test = train_test_split(data, labels, test_size=0.5, random_state=3)

In [17]:
data_train

[6, 7, 0, 3, 8]
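
A quick way to convince yourself that `random_state` makes the split repeatable is to run it twice with the same seed and compare. This is a small sketch, assuming a recent scikit-learn; the variable names `first_split` and `second_split` are made up for the example.

```python
from sklearn.model_selection import train_test_split

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Two independent calls with the same seed.
first_split = train_test_split(data, labels, test_size=0.5, random_state=3)
second_split = train_test_split(data, labels, test_size=0.5, random_state=3)

# Each result is [data_train, data_test, label_train, label_test].
same_seed_matches = first_split == second_split
print(same_seed_matches)

# The shuffle keeps every sample paired with its own label.
data_train, data_test, label_train, label_test = first_split
pairs_stay_aligned = all(
    labels[sample] == label for sample, label in zip(data_train, label_train)
)
print(pairs_stay_aligned)
```

Leaving `random_state` unset gives a different shuffle on each call, which is usually what you want outside of demos and debugging.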

### After you've trained your model against the training data (data_train, label_train), the next step is testing it. You'll use the .predict() method of your model, passing in the testing data (data_test) to create an array of predictions. Then you'll gauge its accuracy against the true label_test answers. SciKit-Learn also has a method to help you do that:

In [18]:
from sklearn.metrics import accuracy_score

In [19]:
predictions = my_model.predict(data_test) 

NameError: name 'my_model' is not defined

In [23]:
predictions = [0,0,0,1,0]

In [20]:
label_train

[1, 1, 0, 0, 1]

In [24]:
accuracy_score(label_train, predictions)

0.20000000000000001

In [25]:
accuracy_score(label_train, predictions, normalize=False)

1
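
Putting the pieces together, here is a minimal end-to-end sketch of the split, train, predict, and score workflow. The model choice (`KNeighborsClassifier` with one neighbor) and the reshaping of `data` into `X` are assumptions for illustration; `my_model` stands in for whatever estimator you actually train. The predictions are compared against `label_test`, the true answers for the held-out samples.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Estimators expect a 2-D feature array: one row per sample, one column per feature.
X = [[value] for value in data]

X_train, X_test, label_train, label_test = train_test_split(
    X, labels, test_size=0.5, random_state=7
)

# Hypothetical stand-in model: classify each sample by its single nearest neighbor.
my_model = KNeighborsClassifier(n_neighbors=1)
my_model.fit(X_train, label_train)

# Predict labels for the held-out samples only.
predictions = my_model.predict(X_test)

# Fraction of test samples labeled correctly.
fraction_correct = accuracy_score(label_test, predictions)

# With normalize=False, the raw count of correct predictions is returned instead.
count_correct = accuracy_score(label_test, predictions, normalize=False)

print(fraction_correct, count_correct)
```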