## General Instructions

# Submission Guidelines

## 1. Complete the Jupyter Notebook
- Fill in the missing code sections marked with `TODO`.  
- Ensure all required implementations are completed correctly.  

## 2. Include All Outputs
- Run all cells in the notebook before submission.  
- Ensure that outputs (e.g., printed results, tables, confusion matrices) are visible.  
- Submissions with missing outputs will be penalized.  

## 3. Submit the Notebook File (`.ipynb`) along with a PDF of handwritten solutions
- **Do NOT** submit the dataset file, `sample.ipynb`, or any saved plots.  

## 4. Code Clarity & Documentation
- Clearly define any extra variables you introduce.  
- Use appropriate comments to explain modifications or additions to the code.  

⚠️ **Failure to adhere to these guidelines may result in deductions.**


## Gaussian Discriminant Classification

This assignment focuses on obesity classification using Gaussian Discriminant Analysis (GDA). The goal is to predict whether an individual is obese based on health-related features, using three variants of GDA classifiers. The dataset contains attributes such as gender, family history of overweight, dietary habits, physical activity levels, and physiological measurements.

### Rubric (Total = 70 points):
1. **Data Preprocessing**:  
   - Encode categorical variables (e.g., "yes"/"no" to 1/0).  **[3 points]**
   - Remove redundant columns (`CAEC`, `CALC`, `MTRANS`).  **[3 points]**

2. **Model Implementation**:  
   Implement three GDA classifiers with different covariance assumptions:
   - **Helper functions**: Functions to split class data and estimate parameters  **[20 points]**
   - **C1**: Class-dependent covariance matrices.  **[12 points]**
   - **C2**: Shared covariance matrix across classes.  **[12 points]**
   - **C3**: Diagonal covariance matrix (feature independence).  **[12 points]**

3. **Evaluation**:  
   - Train/test split (`test_size=0.2`).  
   - Compare performance using confusion matrices.  
   - Analyze why one model outperforms others.  
     1. **Q1**: Which model (C1, C2, or C3) performs best? **[4 points]**
     2. **Q2**: Why does this model perform best?  **[4 points]**

#### Dataset:
- Features: `Gender`, `Age`, `Height`, `Weight`, `family_history_with_overweight`, etc.  
- Target: `Obese` (1 = Not Obese, 2 = Obese).  


All code that needs to be filled in is marked with the word "*TODO*". So make sure you do not forget anything.

In [10]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [11]:
#
# TODO: Load in the `data.csv` file into a Pandas DataFrame.
#
df = pd.read_csv("data.csv")

df.head()

Unnamed: 0,Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,CAEC,SMOKE,CH2O,SCC,FAF,TUE,CALC,MTRANS,NObeyesdad
0,Female,21.0,1.62,64.0,yes,no,2.0,3.0,Sometimes,no,2.0,no,0.0,1.0,no,Public_Transportation,Normal_Weight
1,Female,21.0,1.52,56.0,yes,no,3.0,3.0,Sometimes,yes,3.0,yes,3.0,0.0,Sometimes,Public_Transportation,Normal_Weight
2,Male,23.0,1.8,77.0,yes,no,2.0,3.0,Sometimes,no,2.0,no,2.0,1.0,Frequently,Public_Transportation,Normal_Weight
3,Male,27.0,1.8,87.0,no,no,3.0,3.0,Sometimes,no,2.0,no,2.0,0.0,Frequently,Walking,Overweight_Level_I
4,Male,22.0,1.78,89.8,no,no,2.0,1.0,Sometimes,no,2.0,no,0.0,0.0,Sometimes,Public_Transportation,Overweight_Level_II


In [12]:
df["NObeyesdad"].unique()   # Inspect the unique class labels in the NObeyesdad column


array(['Normal_Weight', 'Overweight_Level_I', 'Overweight_Level_II',
       'Obesity_Type_I', 'Insufficient_Weight', 'Obesity_Type_II',
       'Obesity_Type_III'], dtype=object)

In [13]:
not_obese = ["Insufficient_Weight", "Normal_Weight"]

# Replace the categorical values with 1 (not obese) and 2 (obese), respectively.
df["Obese"] = np.where(np.isin(df["NObeyesdad"], not_obese), 1, 2)

# Drop the NObeyesdad column (axis=1 selects columns; inplace=True modifies df directly).
df.drop("NObeyesdad", inplace=True, axis=1)

# View the class imbalance: return_counts=True also returns how often each label occurs.
np.unique(df["Obese"], return_counts=True)

(array([1, 2]), array([ 559, 1552]))

In [14]:
bad_cols = ["CAEC", "CALC", "MTRANS"]

#
# TODO: Remove (drop) the columns listed in `bad_cols` from the DataFrame.

df = df.drop(columns=bad_cols)

df.head()

Unnamed: 0,Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,SMOKE,CH2O,SCC,FAF,TUE,Obese
0,Female,21.0,1.62,64.0,yes,no,2.0,3.0,no,2.0,no,0.0,1.0,1
1,Female,21.0,1.52,56.0,yes,no,3.0,3.0,yes,3.0,yes,3.0,0.0,1
2,Male,23.0,1.8,77.0,yes,no,2.0,3.0,no,2.0,no,2.0,1.0,1
3,Male,27.0,1.8,87.0,no,no,3.0,3.0,no,2.0,no,2.0,0.0,2
4,Male,22.0,1.78,89.8,no,no,2.0,1.0,no,2.0,no,0.0,0.0,2


In [17]:
mapping = {
    "Gender": {"Female": 1.0, "Male": 0.0},
    "family_history_with_overweight": {"yes": 1.0, "no": 0.0},
    "SMOKE": {"yes": 1.0, "no": 0.0},
    "FAVC": {"yes": 1.0, "no": 0.0},
    "SCC": {"yes": 1.0, "no": 0.0},
}

#
# TODO: Replace the values in the columns listed above in `mapping` with the new values.
#
# Opt in to pandas' future no-silent-downcasting behavior to avoid the
# FutureWarning that `replace` raised in this notebook.
pd.set_option('future.no_silent_downcasting', True)

# A nested dict applies each inner mapping only to its own column,
# e.g. 'yes' -> 1.0 and 'no' -> 0.0 in the yes/no columns.
df = df.replace(mapping)

df.head()

Unnamed: 0,Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,SMOKE,CH2O,SCC,FAF,TUE,Obese
0,1.0,21.0,1.62,64.0,1.0,0.0,2.0,3.0,0.0,2.0,0.0,0.0,1.0,1
1,1.0,21.0,1.52,56.0,1.0,0.0,3.0,3.0,1.0,3.0,1.0,3.0,0.0,1
2,0.0,23.0,1.8,77.0,1.0,0.0,2.0,3.0,0.0,2.0,0.0,2.0,1.0,1
3,0.0,27.0,1.8,87.0,0.0,0.0,3.0,3.0,0.0,2.0,0.0,2.0,0.0,2
4,0.0,22.0,1.78,89.8,0.0,0.0,2.0,1.0,0.0,2.0,0.0,0.0,0.0,2


In [16]:
x_cols = list(df.columns[:-1])  #This line creates a list, x_cols, containing the names of all the columns except the last one.
y_cols = list(df.columns[-1:])  #This line creates a list, y_cols, containing the name of the last column. 

x = df[x_cols].to_numpy()   #converts the feature columns into a NumPy array
y = df[y_cols].to_numpy().reshape(-1) #converts the target column into a 1D NumPy array y, where each element corresponds to a target value for a sample in x.

#many ML algorithms expect the target variable to be a 1D array with shape (n,) instead of (n, 1)

# Feature dimension (should be 13)
d = x.shape[1]

print(f"Number of dimensions: {d}")
print(x.shape)
print(y.shape)

Number of dimensions: 13
(2111, 13)
(2111,)


In [21]:
!pip install scikit-learn
import sklearn






In [25]:
from sklearn.model_selection import train_test_split  #train_test_split is a utility function used to split datasets into training and testing subsets.

Xtrain, Xtest, ytrain, ytest = train_test_split(
    x, y, test_size = 0.2, random_state = 42)   #test_size=0.2: This specifies the proportion of the dataset to include in the test split. Here, 0.2 means 20% of the data will be used for testing, and the remaining 80% will be used for training.

print(Xtrain.shape)
print(ytrain.shape)
print(Xtest.shape)
print(ytest.shape)

(1688, 13)
(1688,)
(423, 13)
(423,)


## Helper Functions

You can implement the helper functions before or after implementing the Gaussian Discriminant classifiers below, though we recommend implementing the helper functions first.

In [28]:
# Helper functions are small functions that each perform one specific task, which keeps the
# code modular and reusable. They handle work that a main function needs but that is complex
# enough to be separated out for clarity and better structure.
# For Gaussian Discriminant Analysis and other machine learning algorithms, typical helper tasks are:
#   - Data preprocessing: normalizing or scaling data, handling missing values, etc.
#   - Parameter estimation: computing means, variances, or covariances.
#   - Matrix operations: matrix multiplication, inversion, or other linear algebra tasks.



def splitData(features, labels):
    """
    Hint: Separate the features according to the corresponding labels. For example,
    if `features = [[1,1],[2,2],[3,3],[4,4]]` and `labels = [1,1,1,2]`, the 
    resulting arrays will be `features1 = [[1,1],[2,2],[3,3]]` 
    and `features2 = [[4,4]]`.
    
    Input:
        features: n*d matrix (n is the number of samples, d is the number of 
            dimensions of the feature)
        labels: n vector
        
    Output:
        features1: n1*d
        features2: n2*d
        
    Notes: 
        n1+n2 = n, where n1 is the number of samples from class 1 and n2 is the 
        number of samples from class 2.
    
    """
    # Placeholders to store the separated features (feature1, feature2), 
    # can be ignored, removed or replaced with any following implementations
    #features1 = np.zeros([np.sum(labels == 1), features.shape[1]])  
    #features2 = np.zeros([np.sum(labels == 2), features.shape[1]])

    #
    # TODO: Fill in your code below! 
    features1 = features[labels == 1]  # Features corresponding to class 1
    features2 = features[labels == 2]  # Features corresponding to class 2
    
    return features1, features2


def computeMean(features):
    """
    Compute the mean of input features
    
    Hint: Try to explore np.mean() for convenience
    
    Input 
        features: n*d array
    
    Output: 
        d-dimensional array
        
    """
    # Placeholders to store the mean for one class can be ignored, removed, 
    # or replaced with any following implementations
    #m = np.zeros(features.shape[1])

    #
    # TODO: Fill in your code below! 
    m = np.mean(features, axis=0)
    
    return m


def computeCov(features):
    """
    Compute the covariance of input features
    
    Hint: Try to explore np.cov() for convenience
    
    Input: 
        features: n*d array
        
    Output: 
        d*d array
        
    """
    # Placeholders to store the covariance matrix for one class can be 
    # ignored, removed, or replaced with any following implementations
    #S = np.eye(features.shape[1])

    #
    # TODO: Fill in your code below! 
    #
    S = np.cov(features, rowvar=False)  ## rowvar=False ensures columns are treated as variables
    return S


def computePrior(labels):
    """
    Compute the class priors from the labels
    
    Hint: p1 = numOf class1 / numOf all the data; same as p2
    
    Input: 
        labels: Array of shape (n,)
        
    Output: 
        Array of length 2
        
    """
    # placeholders to store the priors for both class
    # can be ignored, removed or replaced with any following implementations
    #p = np.array([0.5, 0.5])

    #
    # TODO: Fill in your code below!  
    # Compute priors
    p1 = np.sum(labels == 1) / len(labels)  # Probability of class 1
    p2 = np.sum(labels == 2) / len(labels)  # Probability of class 2
    
    p = np.array([p1, p2])
    return p
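
As a quick sanity check, the toy example from the `splitData` docstring can be run through the same NumPy operations the helpers rely on. This is only a sketch on made-up data; the variable names (`mean1`, `cov1`, `priors`) are illustrative and not part of the assignment template.

```python
import numpy as np

# Toy data from the splitData docstring: three class-1 rows and one class-2 row.
features = np.array([[1, 1], [2, 2], [3, 3], [4, 4]], dtype=float)
labels = np.array([1, 1, 1, 2])

# Boolean masks select the rows that belong to each class.
features1 = features[labels == 1]
features2 = features[labels == 2]

# Per-class mean (one value per column) and covariance (columns are the variables).
mean1 = features1.mean(axis=0)
cov1 = np.cov(features1, rowvar=False)

# Priors are the relative class frequencies.
priors = np.array([np.mean(labels == 1), np.mean(labels == 2)])

print(features1.shape, features2.shape)
print(mean1)
print(cov1)
print(priors)
```

Note that a class with a single sample, like class 2 here, has no usable sample covariance, so in practice each class needs several rows before `np.cov` gives a meaningful estimate.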

### Gaussian Discriminant C1

For this assignment, you are going to implement 3 classifiers and corresponding helper functions.

### Mathematical Formulation:

The Gaussian Discriminant Analysis (GDA) model assumes that the class-conditional probability density functions are Gaussian:

$$
P(X \mid Y = k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2} (X - \mu_k)^T \Sigma_k^{-1} (X - \mu_k)\right)
$$

where:
- $\mu_k$ is the mean vector for class $k$.
- $\Sigma_k$ is the covariance matrix for class $k$.
- $|\Sigma_k|$ is the determinant of $\Sigma_k$.
- $\Sigma_k^{-1}$ is the inverse of the covariance matrix.

The discriminant function for class $k$ is given by:

$$
g_k(X) = -\frac{1}{2} (X - \mu_k)^T \Sigma_k^{-1} (X - \mu_k) - \frac{1}{2} \log |\Sigma_k| + \log P(Y=k)
$$

where $P(Y=k)$ is the prior probability of class $k$.
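
For reference, the discriminant can be obtained from the log-posterior in two steps, using only the symbols defined above. By Bayes' rule:

$$
\log P(Y = k \mid X) = \log P(X \mid Y = k) + \log P(Y = k) - \log P(X)
$$

Taking the logarithm of the Gaussian density gives:

$$
\log P(X \mid Y = k) = -\frac{d}{2} \log(2\pi) - \frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (X - \mu_k)^T \Sigma_k^{-1} (X - \mu_k)
$$

The terms $-\frac{d}{2} \log(2\pi)$ and $\log P(X)$ are the same for every class, so dropping them leaves $g_k(X)$. Choosing the class with the larger $g_k(X)$ is therefore equivalent to choosing the class with the larger posterior probability.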

### Implementation Tips:
- Compute $|\Sigma|$ using `np.linalg.det()`.
- Compute $\Sigma^{-1}$ using `np.linalg.inv()`.
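
The tips above are enough for this assignment. As an optional alternative, the sketch below evaluates the same discriminant with `np.linalg.slogdet` and `np.linalg.solve`, which avoids forming the determinant and the explicit inverse and tends to be more robust when the determinant is very small. The helper name `log_discriminant` and the test data are hypothetical and not part of the template.

```python
import numpy as np

def log_discriminant(X, mean, cov, prior):
    """Evaluate g_k for every row of X (hypothetical helper, not part of the template)."""
    diff = X - mean                            # (n, d) deviations from the class mean
    sign, logdet = np.linalg.slogdet(cov)      # log|cov| without computing |cov| itself
    if sign <= 0:
        raise ValueError("covariance matrix is not positive definite")
    solved = np.linalg.solve(cov, diff.T).T    # each row is cov^{-1} (x - mean)
    quad = np.sum(diff * solved, axis=1)       # quadratic form for each row
    return -0.5 * quad - 0.5 * logdet + np.log(prior)

# Made-up data for a quick comparison against the det/inv formulation.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
mean = np.zeros(3)
cov = np.diag([1.0, 2.0, 0.5])

g = log_discriminant(X, mean, cov, prior=0.3)
reference = np.array([
    -0.5 * (x - mean) @ np.linalg.inv(cov) @ (x - mean)
    - 0.5 * np.log(np.linalg.det(cov))
    + np.log(0.3)
    for x in X
])
print(np.allclose(g, reference))
```

Both formulations compute the same quantity; the difference only matters numerically, for example when the covariance matrix is close to singular.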

In [32]:
class GaussianDiscriminant_C1:
    """
    Classifier initialization
    
    Args
      k: number of classes (2 for this assignment)
      d: number of features; feature dimensions (13 for this assignment)
      
    """
    def __init__(self, k = 2, d = 13):
        self.m = np.zeros((k, d))    # m1 and m2, stored as a k*d (2*13) matrix
        self.S = np.zeros((k, d, d)) # S1 and S2, stored as k matrices of size d*d (13*13)
        self.p = np.zeros(k)         # p1 and p2, stored as a length-k vector

    def fit(self, Xtrain, ytrain):
        """
        Compute the parameters for both classes based on the training data
        
        """
        # Step 1: Split the data into two parts based on the labels
        Xtrain1, Xtrain2 = splitData(Xtrain, ytrain)

        # Step 2: Compute the parameters for each class
        # m1, S1 for class 1
        self.m[0,:] = computeMean(Xtrain1)
        self.S[0,:,:] = computeCov(Xtrain1)
        
        # m2, S2 for class 2
        self.m[1,:]  = computeMean(Xtrain2)
        self.S[1,:,:] = computeCov(Xtrain2)
        
        # Priors for both class
        self.p = computePrior(ytrain)

    def predict(self, Xtest):
        """
        Predict the labels for test data
        
        Input:
            Xtest: n*d array
            
        Output:
            predictions: n (all entries will be either number 1 or 2 to denote the labels)
            
        """
        # Placeholders to store the predictions can be ignored, removed or replaced with any 
        # following implementations
        #predictions = np.zeros(Xtest.shape[0])

        #
        # TODO: Fill in your code below! 
        #
        # Step1: plug in the test data features and compute the discriminant functions for both 
        # classes (you need to choose the correct discriminant functions). You will finally get two 
        # lists of discriminant values (g1, g2), both of shape n (n is the number of rows in Xtest).


        #
        # TODO: Fill in your code below! 
        # Number of test samples
        n = Xtest.shape[0]
        
        # Initialize arrays for storing discriminant function values
        g1 = np.zeros(n)  # Discriminant function for class 1
        g2 = np.zeros(n)  # Discriminant function for class 2

        # Determinant and inverse of each class covariance, computed once outside the loop
        detS1 = np.linalg.det(self.S[0])
        invS1 = np.linalg.inv(self.S[0])
        
        detS2 = np.linalg.det(self.S[1])
        invS2 = np.linalg.inv(self.S[1])

        for i in range(n):
            # --- For Class 1 ---
            diff1 = Xtest[i] - self.m[0]
            quad_term_1 = -0.5 * np.dot(diff1, np.dot(invS1, diff1))  # quadratic term for g1
            determinant_term_1 = -0.5 * np.log(detS1 + 1e-12)         # determinant term for g1 (1e-12 guards against log(0))
            prior_term_1 = np.log(self.p[0] + 1e-12)                  # prior term for g1
            g1[i] = quad_term_1 + determinant_term_1 + prior_term_1

            # --- For Class 2 ---
            diff2 = Xtest[i] - self.m[1]
            quad_term_2 = -0.5 * np.dot(diff2, np.dot(invS2, diff2))  # quadratic term for g2
            determinant_term_2 = -0.5 * np.log(detS2 + 1e-12)         # determinant term for g2
            prior_term_2 = np.log(self.p[1] + 1e-12)                  # prior term for g2
            g2[i] = quad_term_2 + determinant_term_2 + prior_term_2

        # Step2: If g1>g2, choose class1, otherwise choose class 2, you can convert g1 and g2 into 
        # your final predictions. 
        # Ex: g1 = [0.1, 0.2, 0.4, 0.3], g2 = [0.3, 0.3, 0.3, 0.4], => predictions = [2, 2, 1, 2]
        predictions = np.where(g1 > g2, 1, 2)

        return predictions


### Now test your implementation by evaluating the performance of C1 on test data

In [33]:
# Define the model with a Gaussian Discriminant function (class-dependent covariance)
clf = GaussianDiscriminant_C1(2, d)
clf.fit(Xtrain,ytrain)

# Evaluate on test data
predictions = clf.predict(Xtest)
confusion_matrix = np.array([
    [sum((ytest==1) & (predictions==1)), sum((ytest==2) & (predictions==1))],
    [sum((ytest==1) & (predictions==2)), sum((ytest==2) & (predictions==2))]
])

print("Confusion Matrix for Gaussian Discriminant with class-dependent covariance")
print(confusion_matrix)

Confusion Matrix for Gaussian Discriminant with class-dependent covariance
[[108  25]
 [ 10 280]]


In [50]:
accuracy_C1_classifier = (108 + 280) / (108+25+10+280)

print("Accuracy of C1 classifier:", accuracy_C1_classifier)


Accuracy of C1 classifier: 0.91725768321513
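
The accuracy above is typed in by hand from the printed matrix. A small sketch of computing the confusion matrix and accuracy directly from the label arrays is shown below, which avoids copying numbers between cells. The helper name `confusion_and_accuracy` and the toy labels are hypothetical; the row/column layout follows the evaluation cell above (rows are predicted classes, columns are actual classes).

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, classes=(1, 2)):
    """Return (confusion matrix, accuracy); rows = predicted class, columns = actual class."""
    cm = np.array([
        [np.sum((y_pred == p) & (y_true == a)) for a in classes]
        for p in classes
    ])
    # Correct predictions sit on the diagonal.
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy

# Made-up labels for illustration only.
y_true = np.array([1, 1, 2, 2, 2, 1])
y_pred = np.array([1, 2, 2, 2, 1, 1])

cm, acc = confusion_and_accuracy(y_true, y_pred)
print(cm)
print(acc)
```

In the notebook, the same helper could be called with `ytest` and `predictions` in place of the toy arrays.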


### Gaussian Discriminant C2

In [42]:
class GaussianDiscriminant_C2:
    """
    Classifier initialization
    
    Args
      k: number of classes (2 for this assignment)
      d: number of features; feature dimensions (13 for this assignment)
      
    """
    def __init__(self, k=2, d=13):
        self.m = np.zeros((k, d))        # m1 and m2, stored as a k*d (2*13) matrix
        self.S = np.zeros((k, d, d))     # S1 and S2, stored as k matrices of size d*d (13*13)
        self.shared_S = np.zeros((d, d)) # the shared covariance S that will be used for both classes
        self.p = np.zeros(k)             # p1 and p2, stored as a length-k vector

    def fit(self, Xtrain, ytrain):
        """
        Compute the parameters for both classes based on the training data
        
        """
        # Step 1: Split the data into two parts based on the labels
        Xtrain1, Xtrain2 = splitData(Xtrain, ytrain)

        # Step 2: Compute the parameters for each class
        # m1, S1 for class1
        self.m[0,:] = computeMean(Xtrain1)
        self.S[0,:,:] = computeCov(Xtrain1)
        
        # m2, S2 for class2
        self.m[1,:] = computeMean(Xtrain2)
        self.S[1,:,:] = computeCov(Xtrain2)
        
        # Priors for both class
        self.p = computePrior(ytrain)

        #
        # TODO: Fill in your code below! 
        #
        # Step 3: Compute the shared covariance matrix that is used for both class
        # shared_S = p1 * S1 + p2 * S2
        self.shared_S = (self.p[0] * self.S[0]) + (self.p[1] * self.S[1])

    def predict(self, Xtest):
        """
        Predict the labels for test data.
        
        Input:
            Xtest: n*d
            
        Output:
            predictions: n (all entries will be either number 1 or 2 to denote the labels)
            
        """
        # Placeholders to store the predictions can be ignored, removed or 
        # replaced with any following implementations
        #predictions = np.zeros(Xtest.shape[0])

        #
        # TODO: Fill in your code below! 
        n = Xtest.shape[0]

        # Initialize arrays for storing discriminant function values
        g1 = np.zeros(n)  # Discriminant function for class 1
        g2 = np.zeros(n)  # Discriminant function for class 2

        # Step1: plug in the test data features and compute the discriminant functions for 
        # both classes (you need to choose the correct discriminant functions). You will finally 
        # get two lists of discriminant values (g1, g2), both of shape n (n is the number 
        # of rows in Xtest).

        # Determinant and inverse of the shared covariance matrix. The identity matrix is added
        # first as regularization; note that the classifier therefore uses shared_S + I rather
        # than shared_S itself, so the results below depend on this choice.
        reg_S = self.shared_S + np.eye(self.shared_S.shape[0])
        shared_S_det = np.linalg.det(reg_S)
        shared_S_inv = np.linalg.inv(reg_S)

        for i in range(n):
            # Discriminant for class 1 using the shared covariance matrix
            diff1 = Xtest[i] - self.m[0]
            quad_term_1 = -0.5 * np.dot(np.dot(diff1, shared_S_inv), diff1)
            determinant_term_1 = -0.5 * np.log(shared_S_det)
            prior_term_1 = np.log(self.p[0])
            g1[i] = quad_term_1 + determinant_term_1 + prior_term_1

            # Discriminant for class 2 using the shared covariance matrix
            diff2 = Xtest[i] - self.m[1]
            quad_term_2 = -0.5 * np.dot(np.dot(diff2, shared_S_inv), diff2)
            determinant_term_2 = -0.5 * np.log(shared_S_det)
            prior_term_2 = np.log(self.p[1])
            g2[i] = quad_term_2 + determinant_term_2 + prior_term_2

        # Step2: If g1>g2, choose class1, otherwise choose class 2, you can convert g1 and g2 
        # into your final predictions. 
        #
        # Ex: g1 = [0.1, 0.2, 0.4, 0.3], g2 = [0.3, 0.3, 0.3, 0.4], => predictions = [2, 2, 1, 2]
        predictions = np.where(g1 > g2, 1, 2)

        return predictions



### Now test your implementation by evaluating the performance of C2 on test data

In [43]:
# Define the model with a Gaussian Discriminant function (class-independent covariance)
clf = GaussianDiscriminant_C2(2, d)
clf.fit(Xtrain,ytrain)

# Evaluate on test data
predictions = clf.predict(Xtest)
confusion_matrix = np.array([
    [sum((ytest==1) & (predictions==1)), sum((ytest==2) & (predictions==1))],
    [sum((ytest==1) & (predictions==2)), sum((ytest==2) & (predictions==2))]
])

print("Confusion Matrix for Gaussian Discriminant with class-independent covariance")
print(confusion_matrix)

Confusion Matrix for Gaussian Discriminant with class-independent covariance
[[103  17]
 [ 15 288]]


In [47]:
accuracy_C2_classifier = (103 + 288) / (103+17+15+288)

print("Accuracy of C2 classifier:", accuracy_C2_classifier)


Accuracy of C2 classifier: 0.9243498817966903
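
One consequence of sharing a single covariance matrix is that the quadratic terms in $g_1 - g_2$ cancel, so the decision boundary is linear in the features. The sketch below checks this numerically on made-up parameters; `w`, `b`, and the random data are illustrative assumptions, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

# Made-up class means, priors, and one shared positive-definite covariance.
m1, m2 = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
S = A @ A.T + d * np.eye(d)
p1, p2 = 0.3, 0.7
S_inv = np.linalg.inv(S)

def g(x, m, p):
    # Shared-covariance discriminant. The log-determinant term is left out
    # because it is identical for both classes and cancels in g1 - g2.
    diff = x - m
    return -0.5 * diff @ S_inv @ diff + np.log(p)

# Linear form: g1(x) - g2(x) = w @ x + b
w = S_inv @ (m1 - m2)
b = -0.5 * (m1 @ S_inv @ m1 - m2 @ S_inv @ m2) + np.log(p1 / p2)

X = rng.normal(size=(10, d))
quad_diff = np.array([g(x, m1, p1) - g(x, m2, p2) for x in X])
lin_diff = X @ w + b
print(np.allclose(quad_diff, lin_diff))
```

This is why the shared-covariance variant has far fewer effective parameters than the class-dependent one: only one covariance matrix is estimated, and the classifier reduces to a linear rule.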


### Gaussian Discriminant C3

In [36]:
class GaussianDiscriminant_C3:
    """
    Classifier initialization
    
    Args
      k: number of classes (2 for this assignment)
      d: number of features; feature dimensions (13 for this assignment)
      
    """
    def __init__(self, k = 2, d = 13):
        self.m = np.zeros((k, d))        # m1 and m2, stored as a k*d (2*13) matrix
        self.S = np.zeros((k, d, d))     # S1 and S2, stored as k matrices of size d*d (13*13)
        self.shared_S = np.zeros((d, d)) # the shared covariance S that will be used for both classes
        self.p = np.zeros(k)             # p1 and p2, stored as a length-k vector

    def fit(self, Xtrain, ytrain):
        """
        Compute the parameters for both classes based on the training data
        
        """
        # Step 1: Split the data into two parts based on the labels
        Xtrain1, Xtrain2 = splitData(Xtrain, ytrain)

        # Step 2: Compute the parameters for each class
        # m1, S1 for class1
        self.m[0,:] = computeMean(Xtrain1)
        self.S[0,:,:] = computeCov(Xtrain1)
        
        # m2, S2 for class2
        self.m[1,:]  = computeMean(Xtrain2)
        self.S[1,:,:] = computeCov(Xtrain2)
        
        # Priors for both class
        self.p = computePrior(ytrain)

        #
        # TODO: Fill in your code below! 
        #
        # Step 3: Compute the diagonal of S1 and S2. Since we assume each feature is independent, 
        # the off-diagonal entries are set to 0.
        #
        # [[1,2],[2,4]] => [[1,0],[0,4]], try np.diag() twice
        self.S[0, :, :] = np.diag(np.diag(self.S[0]))  # Keep only the diagonal elements of S1
        self.S[1, :, :] = np.diag(np.diag(self.S[1]))  # Keep only the diagonal elements of S2

        # Step 4: Compute the shared covariance matrix that is used for both classes 
        # shared_S = p1 * S1 + p2 * S2
        self.shared_S = (self.p[0] * self.S[0]) + (self.p[1] * self.S[1])

        
    def predict(self, Xtest):
        """
        Predict the labels for test data
        
        Input:
            Xtest: n*d
            
        Output:
            predictions: n (all entries will be either number 1 or 2 to denote the labels)
            
        """
        # Placeholders to store the predictions can be ignored, removed or replaced with any 
        # following implementations
        #predictions = np.zeros(Xtest.shape[0])

        #
        # TODO: Fill in your code below! 
        n = Xtest.shape[0]
        # Initialize arrays for storing discriminant function values
        g1 = np.zeros(n)  # Discriminant function for class 1
        g2 = np.zeros(n)  # Discriminant function for class 2
        
        # Step1: plug in the test data features and compute the discriminant functions for both 
        # classes (you need to choose the correct discriminant functions). You will finally get two 
        # lists of discriminant values (g1, g2), both of shape n (n is the number of rows in Xtest).
        #
        # Please note here, currently we assume shared_S is a d*d diagonal matrix, the non-capital 
        # si^2 in the lecture formula will be the i-th entry on the diagonal.

        #
        # TODO: Fill in your code below! 
        # Extract diagonal elements of the shared covariance matrix (since it's diagonal)
        diag_shared_S = np.diag(self.shared_S)

        for i in range(n):
            x = Xtest[i, :]
            
            # Compute the discriminant function g(x) for both classes
            for c in range(2):  # Class 1 (c=0) and Class 2 (c=1)
                mean_vec = self.m[c, :]  # Mean vector of class c
                prior = self.p[c]  # Prior probability of class c

                # Compute discriminant function for diagonal covariance matrix
                variance_term = -0.5 * np.sum(((x - mean_vec) ** 2) / (diag_shared_S + 1e-12))
                determinant_term = -0.5 * np.sum(np.log(diag_shared_S + 1e-12))  # Log of determinant
                prior_term = np.log(prior + 1e-12)  # Log of prior probability
                
                g_x = variance_term + determinant_term + prior_term  # Final discriminant function value
                
                # Store the computed value
                if c == 0:
                    g1[i] = g_x  # Class 1 discriminant function
                else:
                    g2[i] = g_x  # Class 2 discriminant function

        # Step2: If g1>g2, choose class1, otherwise choose class 2, you can convert g1 and g2 into 
        # your final predictions
        #
        # Ex: g1 = [0.1, 0.2, 0.4, 0.3], g2 = [0.3, 0.3, 0.3, 0.4], => predictions = [2,2,1,2]
        predictions = np.where(g1 > g2, 1, 2)

        return predictions


### Now test your implementation by evaluating the performance of C3 on test data

In [37]:
# Define the model with a Gaussian Discriminant function (diagonal covariance)
clf = GaussianDiscriminant_C3(2, d)
clf.fit(Xtrain, ytrain)

# Evaluate on test data
predictions = clf.predict(Xtest)
confusion_matrix = np.array([
    [sum((ytest==1) & (predictions==1)),sum((ytest==2) & (predictions==1))],
    [sum((ytest==1) & (predictions==2)),sum((ytest==2) & (predictions==2))]
])

print("Confusion Matrix for Gaussian Discriminant with diagonal covariance")
print(confusion_matrix)

Confusion Matrix for Gaussian Discriminant with diagonal covariance
[[ 96  28]
 [ 22 277]]


In [45]:
accuracy_C3_classifier = (96 + 277) / (96 + 28 + 22 + 277)
print("Accuracy of C3 classifier:", accuracy_C3_classifier)


Accuracy of C3 classifier: 0.8817966903073287
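
With a diagonal covariance matrix, the quadratic form reduces to a sum of squared differences scaled by the per-feature variances, and the log-determinant reduces to a sum of log-variances. That is why the C3 `predict` method never needs a matrix inverse. The sketch below verifies both identities on made-up values; all names and numbers in it are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

# Made-up per-feature variances (the diagonal entries s_i^2), a sample, and a mean.
variances = rng.uniform(0.5, 2.0, size=d)
S = np.diag(variances)
x = rng.normal(size=d)
m = rng.normal(size=d)

# Full-matrix form, as used for C1 and C2.
quad_full = (x - m) @ np.linalg.inv(S) @ (x - m)
logdet_full = np.log(np.linalg.det(S))

# Diagonal shortcut, as used for C3.
quad_diag = np.sum((x - m) ** 2 / variances)
logdet_diag = np.sum(np.log(variances))

print(np.allclose(quad_full, quad_diag), np.allclose(logdet_full, logdet_diag))
```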


### Q1: Which model performs the best? 

Model C2 (shared covariance) performs best: it has the highest test accuracy of the three models (92.43%).


### Q2: Why does this model perform the best?
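Reading the confusion matrices with rows as predicted classes and columns as actual classes, and treating class 1 (not obese) as the positive class, C2 has the fewest false positives (17, versus 25 for C1 and 28 for C3). It has slightly more false negatives than C1 (15 versus 10) but fewer than C3 (22). C2 also has the highest accuracy (92.43%), compared with C1 (91.73%) and C3 (88.18%).

C1 estimates a separate full covariance matrix for each class, which means more parameters and a higher risk of overfitting, particularly for the smaller not-obese class. C3 oversimplifies the model by assuming that the features are independent of one another, which discards the correlations between them. C2 sits between the two: instead of using a separate covariance matrix per class, it pools the two class covariances into a single prior-weighted matrix, `shared_S = p1 * S1 + p2 * S2`. This smooths out class-specific variation and balances bias and variance, which likely explains its better generalization on this test split. The C2 implementation also adds the identity matrix to the shared covariance before inverting it, which acts as additional regularization and may contribute to the result.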
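
The comparison can be reproduced from the three confusion matrices printed earlier. The sketch below hard-codes those reported matrices (rows are predicted classes, columns are actual classes) and derives accuracy and the two error counts for each model; the dictionary and variable names are illustrative.

```python
import numpy as np

# Confusion matrices as printed in the evaluation cells
# (rows: predicted class, columns: actual class).
reported = {
    "C1": np.array([[108, 25], [10, 280]]),
    "C2": np.array([[103, 17], [15, 288]]),
    "C3": np.array([[96, 28], [22, 277]]),
}

summary = {}
for name, cm in reported.items():
    accuracy = np.trace(cm) / cm.sum()
    obese_as_not_obese = int(cm[0, 1])   # actual class 2 predicted as class 1
    not_obese_as_obese = int(cm[1, 0])   # actual class 1 predicted as class 2
    summary[name] = (accuracy, obese_as_not_obese, not_obese_as_obese)
    print(f"{name}: accuracy={accuracy:.4f}, "
          f"obese predicted as not obese={obese_as_not_obese}, "
          f"not obese predicted as obese={not_obese_as_obese}")

best = max(summary, key=lambda name: summary[name][0])
print("Highest accuracy:", best)
```

Because the classes are imbalanced (559 not obese versus 1552 obese in the full dataset), the two error counts are worth reading alongside accuracy rather than relying on accuracy alone.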