# Classification
## Objective

As a result of completing this exercise you should be able to:

- Understand the concept of Classification (binary and multi-class classifier)
- Understand the basic idea of Logistic Regression
- Understand the basic idea of K Nearest Neighbors (KNN)
- Understand the basic idea of Decision Tree
- Understand the basic idea of Support Vector Machine (SVM)
- Build Binary Classifier using `sklearn` (scikit-learn library)
- Build Multi-class Classifier using `sklearn`

## Instructions

### Section
This homework includes two coding sections (binary classification and multi-class classification). You will build classification models using different approaches (i.e., logistic regression, KNN, decision tree, and SVM) with the scikit-learn module in Python.

### Submission
The assignment should be submitted on Canvas. You will submit a single zip file.  

- Put all your work into a folder, which should contain:
    - ISAT341_hw4.ipynb, which includes your solutions for the first section and your code for the second section
    - a subfolder called "datasets", which contains the test data 'housing_or.csv'
    - (only if you use photos or images to show your work) a subfolder called "images", which contains all the images you displayed in the Jupyter Notebook
- Compress your homework folder into a zip file
- Name your zip file hw4_$\lt$your JMU eid$\gt$. For example, Dr. Yang's eid is yang4cx, so the submission would be hw4_yang4cx

### Some useful webpages from the Scikit-Learn library
- https://scikit-learn.org/stable/tutorial/basic/tutorial.html
- https://scikit-learn.org/stable/tutorial/statistical_inference/settings.html
- https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
- https://scikit-learn.org/stable/modules/neighbors.html
- https://scikit-learn.org/stable/modules/tree.html
- https://scikit-learn.org/stable/modules/svm.html

## Load Data

Let's build the following classifiers with the scikit-learn built-in dataset "wine".

In [1]:
from sklearn import datasets
wine = datasets.load_wine()
list(wine.keys())

['data', 'target', 'target_names', 'DESCR', 'feature_names']

In [2]:
print(wine.DESCR)

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0

In [3]:
print(wine.data)
print(wine.data.shape)

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 ...
 [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]
 [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]
 [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]]
(178, 13)


In [4]:
print(wine.target)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]


## KNN

#### First, fit or train the model

#### Second, assign a new X (13 features or attributes) and use the model you built to predict the y (the class)

In [23]:
# You may copy one row from the wine.data to be your new X
X_new = wine.data[0].reshape(1,-1)
print(X_new)
X_new.shape

[[1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00 3.060e+00
  2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]]


(1, 13)
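The two KNN steps above (fit, then predict) could be sketched as follows. This is a minimal sketch, not the expected solution: the variable names `knn_model` and `knn_pred` are illustrative, and `n_neighbors=5` is an assumed setting (it is also the scikit-learn default).

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

wine = datasets.load_wine()

# Step 1: fit (train) the model on all 178 samples.
# n_neighbors=5 is an assumed choice, not one prescribed by the assignment.
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(wine.data, wine.target)

# Step 2: build a new X with 13 features and predict its class.
X_new = wine.data[0].reshape(1, -1)
knn_pred = knn_model.predict(X_new)
print(knn_pred)
```

Because KNN relies on distances, features with large ranges (such as Proline) dominate the result on unscaled data; standardizing the features first is a common design choice.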

## Decision Tree

In [24]:
from sklearn.tree import DecisionTreeClassifier

tree_model = DecisionTreeClassifier(criterion='gini', 
                                    max_depth=4, 
                                    random_state=1)
tree_model.fit(wine.data, wine.target)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=4,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=1, splitter='best')

In [25]:
tree_model.predict(X_new)

array([0])
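To see how the fitted tree reaches a prediction, its split rules can be printed as text. The sketch below refits the same model as above and uses `export_text` from `sklearn.tree`; the variable name `rules` is illustrative.

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

wine = datasets.load_wine()

# Same settings as the tree_model fitted above.
tree_model = DecisionTreeClassifier(criterion='gini',
                                    max_depth=4,
                                    random_state=1)
tree_model.fit(wine.data, wine.target)

# Render the learned decision rules, labeling each split with its feature name.
rules = export_text(tree_model, feature_names=list(wine.feature_names))
print(rules)

# The actual depth never exceeds the max_depth limit.
print(tree_model.get_depth())
```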

## Logistic Regression
Please use the X and y prepared in the data preparation step below to build a binary classifier using logistic regression (with scikit-learn, `sklearn`).

### Before we continue building the following binary classifiers, let's prepare the data to include only two classes (class_0 vs. class_1) 

In [7]:
class0_or_class1 = (wine.target == 0) | (wine.target == 1)
X = wine.data[class0_or_class1]
print(X)
print(X.shape)
y = wine.target[class0_or_class1]
print(y)

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 ...
 [1.179e+01 2.130e+00 2.780e+00 ... 9.700e-01 2.440e+00 4.660e+02]
 [1.237e+01 1.630e+00 2.300e+00 ... 8.900e-01 2.780e+00 3.420e+02]
 [1.204e+01 4.300e+00 2.380e+00 ... 7.900e-01 2.570e+00 5.800e+02]]
(130, 13)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]


### Binary Classifier (Two Classes)
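One possible shape for a binary logistic regression classifier is sketched below. It is a sketch under assumptions rather than the expected solution: the name `log_model` is illustrative, and the `StandardScaler` step is an added choice that helps the solver converge on features with very different scales.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine()

# Keep only class_0 and class_1, as in the data preparation step.
class0_or_class1 = (wine.target == 0) | (wine.target == 1)
X = wine.data[class0_or_class1]
y = wine.target[class0_or_class1]

# Standardize the features, then fit logistic regression.
log_model = make_pipeline(StandardScaler(), LogisticRegression())
log_model.fit(X, y)

# Predict the class and class probabilities for one sample.
X_new = X[0].reshape(1, -1)
print(log_model.predict(X_new))
print(log_model.predict_proba(X_new))

# Accuracy on the training data (optimistic, since no test split is used).
train_acc = log_model.score(X, y)
print(train_acc)
```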

### [Bonus] Multi-class Classifier (Three Classes)
Please use the original data (with all 178 instances) and target (with all three classes) to build a multi-class logistic regression classifier. Use scikit-learn (`sklearn`).
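A multi-class version could look like the sketch below. `LogisticRegression` handles three classes without extra configuration; the name `multi_model` and the scaling step are assumptions made for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine()

# Use all 178 instances and all three classes.
multi_model = make_pipeline(StandardScaler(), LogisticRegression())
multi_model.fit(wine.data, wine.target)

# predict_proba returns one probability per class for each sample.
X_new = wine.data[0].reshape(1, -1)
proba = multi_model.predict_proba(X_new)
print(multi_model.predict(X_new))
print(proba)

# Accuracy on the training data (no held-out test set here).
multi_acc = multi_model.score(wine.data, wine.target)
print(multi_acc)
```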

## SVM

### Before we continue building the following binary classifiers, let's prepare the data to include only two classes (class_0 vs. class_1) 

In [None]:
class0_or_class1 = (wine.target == 0) | (wine.target == 1)
X = wine.data[class0_or_class1]
print(X)
print(X.shape)
y = wine.target[class0_or_class1]
print(y)

### Binary Classifier (Two Classes)
Please use the prepared X and y in the previous step to build a binary classifier using a support vector machine (with scikit-learn, `sklearn`).
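A linear SVM on the two-class data might be set up as in the sketch below. The name `svm_model`, the value `C=1.0`, and the scaling step are assumptions for illustration, not requirements of the assignment.

```python
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = datasets.load_wine()

# Keep only class_0 and class_1.
class0_or_class1 = (wine.target == 0) | (wine.target == 1)
X = wine.data[class0_or_class1]
y = wine.target[class0_or_class1]

# SVMs are sensitive to feature scale, so standardize before fitting.
svm_model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
svm_model.fit(X, y)

# Predict the class of one sample and report training accuracy.
X_new = X[0].reshape(1, -1)
svm_pred = svm_model.predict(X_new)
svm_acc = svm_model.score(X, y)
print(svm_pred)
print(svm_acc)
```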

### [Bonus] Kernel SVM (Two Classes)
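A kernel SVM replaces the linear decision boundary with a non-linear one. The sketch below uses the RBF kernel; the name `rbf_model` and the settings `C=1.0` and `gamma='scale'` are assumed values chosen for illustration.

```python
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = datasets.load_wine()

# Keep only class_0 and class_1.
class0_or_class1 = (wine.target == 0) | (wine.target == 1)
X = wine.data[class0_or_class1]
y = wine.target[class0_or_class1]

# RBF kernel SVM; gamma controls how far the influence of one sample reaches.
rbf_model = make_pipeline(StandardScaler(),
                          SVC(kernel='rbf', C=1.0, gamma='scale'))
rbf_model.fit(X, y)

# Training accuracy only; a held-out split would give a fairer estimate.
rbf_acc = rbf_model.score(X, y)
print(rbf_acc)
```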