# Creating Hold-out environment for Model learning and testing

In the hold-out method, we randomly divide the data set into two parts: one part is used for learning (training) the model and the other for testing it. In practice, a 70-30 split between training and testing is a common choice. Scikit learn provides a built-in function, train_test_split, that randomly divides a data set into training and test samples according to the split proportion entered by the user. Each training and test set should include both the indicator variables (the input features) and the target variable (the class to be predicted).
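
To make the idea concrete, the random 70-30 division can be sketched by hand with NumPy. This is only an illustration of the principle, not how scikit learn implements it internally; the toy arrays `X` and `y`, the seed, and all variable names are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy data: 10 data points, 2 indicator features, 1 class label each
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

rng = np.random.default_rng(seed=0)   # fixed seed so the shuffle is repeatable
indices = rng.permutation(len(X))     # random ordering of the row indices

n_test = round(0.3 * len(X))          # 30% of the rows go to the test part
test_idx, train_idx = indices[:n_test], indices[n_test:]

# Features and labels are indexed with the same row indices so they stay aligned
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```

Every data point ends up in exactly one of the two parts, which is the property the hold-out method relies on.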



<img src = "fig/holdout.png">

<img src = "fig/dataset2.png">

# Importing the train_test_split function from scikit learn for Hold-out environment

In [1]:
from sklearn.model_selection import train_test_split # using scikit learn for hold-out

# Creating Hold-out environment for sample data set in scikit learn

# 1. Loading data set

In [2]:
# Loading library
from sklearn import datasets
dataset_wine = datasets.load_wine()

# 2. Creating Hold-out environment for built in data set

In [3]:
winedata_train, winedata_test, winetarget_train, winetarget_test = train_test_split(dataset_wine.data, dataset_wine.target, test_size=0.3)

# The pair of arrays winedata_train and winetarget_train will be used for learning
# the supervised model,
# whereas winedata_test and winetarget_test will be used for testing it.
print(winetarget_test)

[2 0 2 1 2 1 2 1 2 0 2 1 1 1 2 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 2 2 1 1
 1 2 0 0 2 1 0 2 1 0 0 2 1 2 2 0 1]
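
Because the split is random, the printed test labels will differ from run to run. A minimal sketch of a repeatable split follows, assuming the `random_state` and `stratify` parameters of `train_test_split`; the seed value 42 and the shorter variable names are arbitrary choices for this example, not part of the original notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset_wine = datasets.load_wine()

# random_state fixes the shuffle so the same split is produced on every run;
# stratify keeps the class proportions similar in the training and test parts
X_train, X_test, y_train, y_test = train_test_split(
    dataset_wine.data,
    dataset_wine.target,
    test_size=0.3,
    random_state=42,
    stratify=dataset_wine.target,
)

# The wine data set has 178 data points and 13 features
print(X_train.shape, X_test.shape)
```

With `test_size=0.3`, 54 of the 178 data points go to the test set, which matches the 54 labels printed above, and the remaining 124 are used for training.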


# Creating Hold-out environment for user defined data set

# 1. Import pandas

In [4]:
import pandas as pd


# 2. Load user specific data set

In [6]:
# Load the data set from the local machine. It is a data set for predicting liver disorder.
My_dataset = pd.read_csv('liver_dataset.csv')

# 3. Creating Hold-out environment for user specific data set

In [7]:
# My_data contains, for every data point in My_dataset, the first through 6th features (the indicator features)
My_data = My_dataset.iloc[:,0:6].values
print(My_data)

# My_target contains the class information, which is the 7th column of My_dataset, for every data point

My_target=My_dataset.iloc[:,6].values 
print(My_target)

[[85. 92. 45. 27. 31.  0.]
 [85. 64. 59. 32. 23.  0.]
 [86. 54. 33. 16. 54.  0.]
 ...
 [98. 77. 55. 35. 89. 15.]
 [91. 68. 27. 26. 14. 16.]
 [98. 99. 57. 45. 65. 20.]]
[1 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 1 1 1 1
 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 2 2
 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1
 1 1 1 1 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2
 2 1 1 2 2 2 2 1 2 1 1 1]


In [8]:
liverdata_train, liverdata_test, livertarget_train, livertarget_test = train_test_split(My_data, My_target, test_size=0.3)

In [9]:
print(livertarget_test)

[1 2 2 2 1 1 2 1 1 1 2 2 1 1 2 1 1 1 1 1 2 2 2 1 1 2 2 2 1 1 2 2 2 1 2 2 2
 1 2 2 1 2 2 2 1 2 2 1 2 1 1 1 2 2 2 2 2 2 1 2 1 1 1 1 1 1 2 1 1 2 2 1 2 2
 2 1 1 2 2 2 1 2 1 2 2 2 2 1 1 2 2 2 2 2 1 1 1 2 1 2 2 1 1 2]
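
Once the hold-out sets exist, the training pair is used to fit a model and the test pair to evaluate it. The sketch below shows that flow end to end. Since liver_dataset.csv is not available here, it builds a hypothetical stand-in DataFrame with six feature columns and a class column holding the values 1 and 2; the column names, the random data, and the choice of a k-nearest-neighbours classifier are all assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for the liver data: 6 indicator columns + 1 class column
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 6)),
                    columns=[f"f{i}" for i in range(1, 7)])
demo["label"] = rng.integers(1, 3, size=100)  # random classes 1 and 2

# Same column selection as in the notebook: first 6 columns, then the 7th
data = demo.iloc[:, 0:6].values
target = demo.iloc[:, 6].values

data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=0.3, random_state=0)

# Learn on the training pair only, then evaluate on the held-out test pair
model = KNeighborsClassifier(n_neighbors=5)
model.fit(data_train, target_train)
predictions = model.predict(data_test)
accuracy = accuracy_score(target_test, predictions)

print(f"Hold-out accuracy: {accuracy:.2f}")
```

The labels in this stand-in are random, so the accuracy should hover around chance level; with real data the same steps give an estimate of how the model performs on data points it has not seen during learning.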
