# Machine Learning - Decision Tree Model with Wine Dataset

The Wine dataset is a classic machine learning dataset, often used as a benchmark for classification algorithms. It contains the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars, and the task is to predict a wine's cultivar (its class) from its chemical properties.

The data was collected by M. Forina et al. for the PARVUS package (Institute of Pharmaceutical and Food Analysis and Technologies, Genoa, Italy) and is distributed through the UCI Machine Learning Repository; scikit-learn ships a copy. It has been used extensively in research comparing the performance of different classification models.

The Wine dataset contains 178 instances, each representing one wine sample. Its 13 features (input variables) are numerical results of the chemical analysis, such as alcohol content, malic acid and ash. The target variable is the class label, which indicates the cultivar the wine was derived from. There are three classes:

*  Class 0: 59 instances of wine from the first cultivar
*  Class 1: 71 instances of wine from the second cultivar
*  Class 2: 48 instances of wine from the third cultivar
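The sizes and class counts above can be checked directly. A minimal sketch, assuming scikit-learn and NumPy are installed, that loads the bundled copy with `load_wine` and counts the samples per class with `np.bincount`:

```python
import numpy as np
from sklearn.datasets import load_wine

# load_wine returns a Bunch with .data (features) and .target (class labels)
wine = load_wine()

print(wine.data.shape)           # (178, 13): 178 samples, 13 features
print(np.bincount(wine.target))  # [59 71 48]: samples in class 0, 1, 2
```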

Because it is small (178 instances, 13 numeric features, no missing values), models train on it in seconds, which makes the Wine dataset convenient both for teaching machine learning and for quickly testing and comparing classification algorithms.

Dataset documentation: https://scikit-learn.org/stable/datasets/toy_dataset.html#wine-dataset

In [80]:
# import necessary libraries
import numpy as np
from sklearn.datasets import load_wine
from sklearn import tree

In [81]:
# load the wine dataset into a variable named "wine"

In [83]:
# we want to hold out the first sample of each class as test data
# there are 59 instances of class 0, 71 instances of class 1 and 48 instances of class 2, and the instances are sorted by class in ascending order
# what is the index of the first occurrence of class 0?
# what is the index of the first occurrence of class 1?
# what is the index of the first occurrence of class 2?
# create a list called test_idx with the above indexes
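Because the labels are sorted by class, each class starts where the earlier ones end: class 1 at index 59 and class 2 at 59 + 71 = 130. A sketch (assuming NumPy and scikit-learn) that derives these indexes instead of hard-coding them, once from cumulative class counts and once with `np.unique(..., return_index=True)`, which also works when the labels are not sorted:

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Start of each class block = total count of all earlier classes
counts = np.bincount(wine.target)                       # [59 71 48]
starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
print(starts)                                           # [  0  59 130]

# np.unique with return_index gives the first occurrence of each label
_, first_idx = np.unique(wine.target, return_index=True)
test_idx = first_idx.tolist()
print(test_idx)                                         # [0, 59, 130]
```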

In [85]:
# print the target of the wine dataset

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2])

In [87]:
# print the data/features of the wine dataset

array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
        1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
        8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
        8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
        5.600e+02]])
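The printed array hides the column names. A small sketch that pairs each of the 13 features with its value in the first sample (row 0), using the `feature_names` attribute that `load_wine` provides; the first values should match the first row printed above (alcohol 14.23, malic_acid 1.71, ash 2.43):

```python
from sklearn.datasets import load_wine

wine = load_wine()

# Column j of wine.data holds the feature named wine.feature_names[j]
for name, value in zip(wine.feature_names, wine.data[0]):
    print(f"{name:<30} {value}")
```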

In [89]:
# use the np.delete function to remove the test_idx samples from the wine target to create a subset of wine targets/labels for training called 'train_target'

In [90]:
# use the np.delete function (with axis=0, so whole rows are removed) to remove the test_idx samples from the wine data to create a subset of wine data for training called 'train_data'
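One detail matters for the 2-D data matrix: when no `axis` is given, `np.delete` works on the flattened array. A sketch, assuming `test_idx = [0, 59, 130]` (the first index of each class), contrasting the two calls:

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
test_idx = [0, 59, 130]  # assumed: first sample of each class

# Without axis, np.delete flattens the 178x13 array and removes 3 single
# numbers, leaving a 1-D array of 178 * 13 - 3 = 2311 values
flat = np.delete(wine.data, test_idx)
print(flat.shape)          # (2311,)

# With axis=0 it removes the three rows (samples) instead
train_data = np.delete(wine.data, test_idx, axis=0)
train_target = np.delete(wine.target, test_idx)  # 1-D, so no axis needed
print(train_data.shape, train_target.shape)      # (175, 13) (175,)
```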

In [91]:
# using the test_idx, create a subset of wine targets for testing called 'test_target'

In [92]:
# using the test_idx, create a subset of wine data for testing called 'test_data'
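The test subsets only need NumPy integer-array indexing. A sketch, again assuming `test_idx = [0, 59, 130]`, that also checks that none of the held-out rows would remain among the training indexes:

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
test_idx = [0, 59, 130]  # assumed: first sample of each class

# Indexing with a list of positions selects those rows, in that order
test_target = wine.target[test_idx]
test_data = wine.data[test_idx]
print(test_target)       # [0 1 2]
print(test_data.shape)   # (3, 13)

# Sanity check: training indexes are everything except test_idx
train_idx = np.setdiff1d(np.arange(len(wine.target)), test_idx)
print(len(train_idx), np.intersect1d(train_idx, test_idx).size)  # 175 0
```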

In [94]:
# create a new decision tree classifier called 'clf'

In [96]:
# use the fit method to train the decision tree classifier with train_data and train_target

In [98]:
# use the predict method to predict the targets of test_data and store the result in a variable named "prediction"
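The create, fit and predict cells above can be sketched end to end as follows, assuming the same first-sample-per-class split. `random_state=0` is an added assumption, not part of the original steps: scikit-learn permutes features at each split, so fixing the seed makes repeated runs build the same tree when several splits are equally good.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_wine

wine = load_wine()
test_idx = [0, 59, 130]  # assumed: first sample of each class

train_data = np.delete(wine.data, test_idx, axis=0)
train_target = np.delete(wine.target, test_idx)
test_data = wine.data[test_idx]

# Create the classifier (random_state=0 is an assumption for reproducibility)
clf = tree.DecisionTreeClassifier(random_state=0)
# Learn the split thresholds from the 175 training samples
clf.fit(train_data, train_target)
# Predict one class label per test row
prediction = clf.predict(test_data)
print(prediction)
```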

In [100]:
# print the prediction

array([0, 1, 2])

In [102]:
# print the original test_target and compare it with the prediction

array([0, 1, 2])
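Comparing by eye works for three samples. A sketch of the same comparison done programmatically, using the two arrays exactly as printed above (hard-coded so the snippet runs on its own):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Values copied from the two outputs above
prediction = np.array([0, 1, 2])
test_target = np.array([0, 1, 2])

print(prediction == test_target)                # [ True  True  True]
print(np.array_equal(prediction, test_target))  # True
print(accuracy_score(test_target, prediction))  # 1.0
```

With only three test samples the accuracy can only be 0, 1/3, 2/3 or 1, so a perfect score here says little about how well the tree generalizes.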

In [104]:
# copy and modify the code from Week13-02 to export a visualization of the decision tree to a PDF file

True
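The Week13-02 code is not reproduced here. As an alternative that avoids the Graphviz dependency, scikit-learn's `tree.plot_tree` draws a fitted tree with Matplotlib, `savefig` can write the figure as a PDF, and `tree.export_text` gives a plain-text version. A sketch under those assumptions; the file name `wine_tree.pdf` and `random_state=0` are hypothetical choices:

```python
import os

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn import tree
from sklearn.datasets import load_wine

wine = load_wine()
test_idx = [0, 59, 130]  # assumed: first sample of each class
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(np.delete(wine.data, test_idx, axis=0), np.delete(wine.target, test_idx))

# Draw the tree with feature and class names, then save it as a PDF
fig, ax = plt.subplots(figsize=(20, 10))
tree.plot_tree(clf, feature_names=list(wine.feature_names),
               class_names=list(wine.target_names), filled=True, ax=ax)
fig.savefig("wine_tree.pdf")  # hypothetical output file name
plt.close(fig)
print(os.path.exists("wine_tree.pdf"))  # True if the file was written

# Text rendering of the same tree for a quick look in the notebook
report = tree.export_text(clf, feature_names=list(wine.feature_names))
print(report)
```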

# Exercise 
Find a dataset similar to Wine and Iris and apply the same steps to practice training a decision tree classifier.
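As a starting point, the steps above can be wrapped in one function and pointed at another scikit-learn dataset loader. A sketch shown with Iris; the helper name `first_sample_split` is hypothetical, and it picks the first sample of each class with `np.unique(..., return_index=True)`, so the labels do not need to be sorted:

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris


def first_sample_split(data, target):
    """Hypothetical helper: hold out the first sample of each class as test data."""
    # Index of the first occurrence of each label (labels need not be sorted)
    _, test_idx = np.unique(target, return_index=True)
    train_data = np.delete(data, test_idx, axis=0)
    train_target = np.delete(target, test_idx)
    return train_data, train_target, data[test_idx], target[test_idx]


# Swap load_iris for another loader to try a different dataset
dataset = load_iris()
train_data, train_target, test_data, test_target = first_sample_split(
    dataset.data, dataset.target)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(train_data, train_target)
prediction = clf.predict(test_data)
print("test_target:", test_target)
print("prediction: ", prediction)
```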