> Reference:
+ [machinelearningmastery: data resampling](http://machinelearningmastery.com/evaluate-performance-machine-learning-algorithms-python-using-resampling/)

We can’t train a machine learning algorithm on a dataset and then judge it by its predictions on that same dataset: a model that has overfit the training data would score well there without generalizing to new data.

We must evaluate our machine learning algorithms on data that is not used to train the algorithm.

The evaluation is an estimate that we can use to talk about how well we think the algorithm may actually do in practice. It is not a guarantee of performance.

Once we estimate the performance of our algorithm, we can then re-train the final algorithm on the entire training dataset and get it ready for operational use.
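
The two-step workflow described above (estimate first, then re-train) can be sketched roughly as follows. This is only an illustration, assuming scikit-learn is installed: the synthetic data from `make_classification`, the variable names, and the `max_iter` setting are placeholders that do not come from the referenced tutorial, and "the entire training dataset" is read here as all labelled rows available.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=7)

# Step 1: estimate performance on rows the model has not seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7)
eval_model = LogisticRegression(max_iter=1000)
eval_model.fit(X_train, y_train)
estimate = eval_model.score(X_test, y_test)
print("Estimated accuracy: {:.3%}".format(estimate))

# Step 2: once the estimate is acceptable, refit the same configuration
# on all available rows to obtain the model intended for operational use.
final_model = LogisticRegression(max_iter=1000)
final_model.fit(X, y)
```

The estimate from step 1 describes the configuration, not `final_model` itself, which has no held-out data left to be scored on.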

# Train and Test Sets #

**Pluses**
+ Fast
+ Good for large datasets

**Minuses**
+ High variance (the accuracy estimate depends on which rows land in the train and test sets, so a different split can give a noticeably different estimate)

In addition to specifying the size of the split, we also specify the random seed. Because the split of the data is random, we want to ensure that the results are reproducible. 
This is important if we want to compare this result to the estimated accuracy of another machine learning algorithm or the same algorithm with a different configuration.
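
To make the role of the seed concrete, here is a small standard-library sketch of a seeded random split. `split_rows` is a hypothetical helper written for this note; it is not how scikit-learn implements its splitting, only a picture of the idea that the same seed yields the same shuffle.

```python
import random

def split_rows(rows, test_fraction, seed):
    """Hypothetical helper: shuffle row indices with a seeded RNG, then cut."""
    rng = random.Random(seed)              # private generator, seeded for repeatability
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

rows = list(range(12))                     # stand-in for 12 data rows
train_a, test_a = split_rows(rows, 0.33, seed=7)
train_b, test_b = split_rows(rows, 0.33, seed=7)
train_c, test_c = split_rows(rows, 0.33, seed=8)

print(test_a == test_b)   # same seed, so the two splits are identical
print(test_a == test_c)   # a different seed will usually give a different split
```

Because two algorithms evaluated with the same seed see exactly the same test rows, a difference in their scores is not caused by the split itself.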

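```python
# Evaluate using a train and a test set (67% / 33%)
import pandas
from sklearn.linear_model import LogisticRegression
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection.
from sklearn.model_selection import train_test_split

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:, 0:8]  # first eight columns: input features
Y = array[:, 8]    # last column: class label
test_size = 0.33
seed = 7           # fixed seed so the random split is reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# Older scikit-learn releases defaulted to the liblinear solver; it is pinned
# here so the fit does not depend on the installed version's default.
model = LogisticRegression(solver='liblinear')
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
print("Accuracy: {:.3%}".format(result))
```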

Accuracy: 75.591%
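
The accuracy printed above comes from a single split with `seed = 7`. The high variance listed under Minuses can be seen by repeating the same evaluation with several seeds. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_classification` rather than the dataset above, so its numbers are not comparable to the result shown; the range of ten seeds and the `max_iter` value are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

scores = []
for seed in range(10):
    # Same data, same model configuration; only the split changes.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

spread = max(scores) - min(scores)
print("min: {:.3%}  max: {:.3%}  spread: {:.3%}".format(
    min(scores), max(scores), spread))
```

The spread between the lowest and highest score gives a rough sense of how much a single train/test estimate can move with the choice of split.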
