## Prepare Python environment


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

%matplotlib inline

In [None]:
random_state = 5 # use this to control randomness across runs e.g., dataset partitioning

## Preparing the Glass Dataset (2 points)

We will use the Glass Identification dataset from the UCI Machine Learning Repository. Details about the data can be found [here](https://archive.ics.uci.edu/ml/datasets/glass+identification). The objective is to identify the class of glass from the following attributes (the first nine are descriptive features; the last is the target):

1.  RI: refractive index
2.  Na: Sodium
3.  Mg: Magnesium
4.  Al: Aluminum
5.  Si: Silicon
6.  K: Potassium
7.  Ca: Calcium
8.  Ba: Barium
9.  Fe: Iron
10. Type of glass (Target label)

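The classes of glass are listed below by label value (label 4, vehicle_windows_non_float_processed, has no examples in this dataset):

- 1: building_windows_float_processed
- 2: building_windows_non_float_processed
- 3: vehicle_windows_float_processed
- 5: containers
- 6: tableware
- 7: headlamps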

Identification of glass from its content can be used for forensic analysis.

### Loading the dataset

In [None]:
# Download and load the dataset
import os
if not os.path.exists('glass.csv'):
    !wget "https://github.com/jha-lab/ece364_2024/blob/main/data/glass.csv?raw=true" -O glass.csv
df = pd.read_csv('glass.csv')
# Display the first five instances in the dataset
df.head(5)

In [None]:
# Additional features to be added to the data
df['Ca_Na'] = df.Ca*df.Na
df['Al_Mg'] = df.Al*df.Mg
df['Ca_Mg'] = df.Ca*df.Mg
df['Ca_RI'] = df.Ca*df.RI

### Extract target and descriptive features (0.5 points)

In [None]:
# Store all the features from the data in X
X = # insert your code here
# Store all the labels in y
y = # insert your code here

In [None]:
# Convert data to numpy array
X = # insert your code here
y = # insert your code here

### Create training and validation datasets (0.5 points)


Split the data into training and validation sets using `train_test_split`. See [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) for details. To get consistent results across runs, set `random_state` to the value defined earlier. Use 80% of the data for training and 20% for validation.
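As a minimal sketch of how the split behaves, the snippet below runs `train_test_split` on a small toy array rather than the glass data. The names `X_toy`, `y_toy`, and the toy values are hypothetical and only illustrate the `test_size` and `random_state` arguments.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for real features and labels (hypothetical values).
X_toy = np.arange(40, dtype=float).reshape(20, 2)
y_toy = np.array([0, 1] * 10)

# test_size=0.2 holds out 20% of the rows; a fixed random_state
# makes the same rows land in each partition on every run.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=5
)
print(X_tr.shape, X_va.shape)  # (16, 2) (4, 2)
```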

In [None]:
X_train, X_val, y_train, y_val = # insert your code here

### Preprocess the dataset (1 point)

#### Preprocess the data by standardizing each feature to have zero mean and unit standard deviation. This can be done using the `StandardScaler` class. See [here](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) for more details.
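The sketch below illustrates the usual pattern on toy arrays (hypothetical values, not the glass data): the scaler's mean and standard deviation are estimated from the training rows only, and the same statistics are then reused for the validation rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy arrays with hypothetical values.
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_val = np.array([[2.0, 40.0]])

toy_scaler = StandardScaler()
# fit_transform estimates per-feature mean and std on the training rows.
toy_train_scaled = toy_scaler.fit_transform(toy_train)
# transform reuses those training statistics; it does not refit.
toy_val_scaled = toy_scaler.transform(toy_val)

print(toy_train_scaled.mean(axis=0), toy_train_scaled.std(axis=0))
print(toy_val_scaled)
```

Note that the validation rows are generally not zero-mean after this step, because they are scaled with the training statistics.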


In [None]:
# Define the scaler for scaling the data
scaler = # insert your code here

# Normalize the training data
X_train = # insert your code here

# Apply the scaler fitted on the training data to the validation data (same transformation, no refitting)
X_val = # insert your code here

## Training error-based models (18 points)


#### We will use the `sklearn` library to train a Multinomial Logistic Regression classifier and Support Vector Machines.


### Exercise 1:  Learning a Multinomial Logistic Regression classifier (4 points)

#### Use `sklearn`'s `SGDClassifier` to train a multinomial logistic regression classifier (i.e., using a one-versus-rest scheme) with stochastic gradient descent. Review Ch. 7 and see [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier) for more details.

#### Set `random_state` to the value defined above, and increase `n_iter_no_change` to 100 and `max_iter` to 1000 to facilitate better convergence.

#### Report the model's accuracy over the training and validation sets.


In [None]:
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

In [None]:
# insert your code here

#### Explain any performance difference observed between the training and validation datasets.

**ANS**:

### Exercise 2: Learning a Support Vector Machine (SVM) (14 points)

#### Use `sklearn`'s `SVC` class to train an SVM (i.e., using a [one-versus-one scheme](https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-one)). Review Ch. 7 and see [here](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) for more details.
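In a one-versus-one scheme, one binary classifier is trained for every pair of classes, so the number of classifiers grows as k(k-1)/2 for k classes. The short sketch below simply counts those pairs for six classes; the variable names are illustrative.

```python
from itertools import combinations

n_classes = 6  # number of glass classes listed earlier
# Each unordered pair of classes gets its own binary classifier.
class_pairs = list(combinations(range(n_classes), 2))
print(len(class_pairs))  # 15
```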


In [None]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

#### Exercise 2a: Warm up (2 points)

#### Train an SVM with a linear kernel. Set `random_state` to the value defined above. Keep all other parameters at their defaults.

#### Report the model's accuracy over the training and validation sets.

In [None]:
# insert your code here

#### Exercise 2b: Evaluate a polynomial kernel function (4 points)

#### Fit an SVM with a polynomial kernel function and vary the degree among {1, 2, 3, 4}. Note that `degree=1` yields a linear kernel.
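To make the role of the degree concrete, the sketch below evaluates the polynomial kernel in the form documented for scikit-learn, $(\gamma \langle x, z \rangle + c_0)^d$, on two toy vectors. The helper `poly_kernel` and the vectors are hypothetical, and `gamma=1.0`, `coef0=0.0` are chosen for readability rather than matching `SVC`'s defaults exactly.

```python
def poly_kernel(x, z, degree, gamma=1.0, coef0=0.0):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree for two vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

x_vec = [1.0, 2.0]
z_vec = [3.0, 4.0]  # <x_vec, z_vec> = 11

# With degree=1 and coef0=0 the kernel is just a scaled dot product (linear).
poly_values = {d: poly_kernel(x_vec, z_vec, degree=d) for d in (1, 2, 3, 4)}
print(poly_values)  # {1: 11.0, 2: 121.0, 3: 1331.0, 4: 14641.0}
```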

#### For each fitted classifier, report its accuracy over the training and validation sets.

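#### As before, set `random_state` to the value defined above. Set the regularization parameter `C=3`; in `SVC`, the regularization strength is inversely proportional to `C`. When the data is not linearly separable, a larger `C` penalizes margin violations more heavily, which encourages the model to fit the training data. Keep all other parameters at their default values.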

In [None]:
# insert your code here

#### Explain the effect of increasing the degree of the polynomial.

**ANS**:

#### Exercise 2c: Evaluate the radial basis kernel function (6 points)

#### Fit an SVM with a radial basis function (RBF) kernel and vary the kernel coefficient $\gamma$ (which is inversely related to the kernel's length scale) among {0.01, 0.1, 1, 10, 100}.
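As a rough illustration of what $\gamma$ controls, the sketch below evaluates the RBF kernel $\exp(-\gamma \lVert x - z \rVert^2)$ for two toy points at unit distance. The helper `rbf_kernel_value` and the points are hypothetical; the point is only that a larger $\gamma$ makes the similarity decay faster with distance.

```python
import math

def rbf_kernel_value(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) for two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

p = [0.0, 0.0]
q = [1.0, 0.0]  # squared distance to p is 1

rbf_values = [rbf_kernel_value(p, q, g) for g in (0.01, 0.1, 1, 10, 100)]
for g, k in zip((0.01, 0.1, 1, 10, 100), rbf_values):
    print(f"gamma={g}: k={k:.3e}")
```

With a small $\gamma$ the two points still look nearly identical to the kernel, while with a large $\gamma$ their similarity is essentially zero, so each training point influences only its immediate neighborhood.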

#### For each fitted classifier, report its accuracy over the training and validation sets.

#### As before, set `random_state` to the value defined above. Set the regularization parameter `C=10`; in `SVC`, the regularization strength is inversely proportional to `C`. When the data is not linearly separable, a larger `C` penalizes margin violations more heavily, which encourages the model to fit the training data (read more [here](https://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html)). Keep all other parameters at their default values.

In [None]:
# insert your code here

#### Comment on the effect of increasing or reducing the kernel coefficient $\gamma$. Also, compare the performance of the classifiers trained with the RBF kernel against those trained with the linear and polynomial kernels (i.e., Ex. 2a and 2b).

**ANS**:

#### Exercise 2d: Briefly state the main difference between the logistic regression classifier and the SVM. (2 points)

**ANS**: