# **Week 2 : Linear Classifiers and Logistic Regression**

This week we explore our first useful machine learning model, the linear classifier. Resources for learning about it can be found [here](https://github.com/Ihsoj-Mahos/WiDS-Week2/tree/master/resources). The objective of this assignment is to design a linear classifier that distinguishes the red points from the blue points of a labelled dataset in $\mathbb{R}^2$. In particular, this assignment deals with the binary classification problem.


<img src="https://stanford.edu/~shervine/teaching/cs-221/illustrations/linear-classifier.png?79f320ac5ba3e9d5dae2c573007dbfb6"
 style="float:center;width:200px;height:200px;">

# **Importing Libraries**

In [1]:
# Import Libraries here
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

# **Dataset Generation**

First, we will generate a dataset for which the classification problem needs to be solved. In this dataset, each element is a vector and each label is either 0 or 1, where 0 corresponds to red and 1 corresponds to blue. Note that the two classes cannot be separated perfectly by a straight line, because about 5% of the points carry the opposite label. You don't have to edit the cells below.

The data is a $2 \times n$ matrix in which each column holds the $(x,y)$ coordinates of one point. It is stored in the variable `X`. The labels form a $1 \times n$ matrix stored in the variable `y`, whose entries give the label in $\{0,1\}$ of the corresponding point.
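To make this layout concrete, here is a small sketch with made-up numbers. The names `toy_data` and `toy_labels` are hypothetical and are not part of the assignment.

```python
import numpy as np

# Three hypothetical points, stored one per column: (1, 2), (-3, 0.5), (4, -1)
toy_data = np.array([[1.0, -3.0, 4.0],
                     [2.0,  0.5, -1.0]])
# One label per column of toy_data
toy_labels = np.array([[1, 0, 1]])

print(toy_data.shape)    # (2, 3)
print(toy_labels.shape)  # (1, 3)

# Column j holds the j-th point; toy_labels[0, j] is its label
second_point = toy_data[:, 1]
second_label = toy_labels[0, 1]
```

Keeping points in columns means a single matrix product can score every point at once.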

You don't need to study our data generation process. It's fine if you do, but don't spend too much time on it :)

In [2]:
# Random number generator
rng = np.random.default_rng(seed = 1)

y_positive = np.abs(rng.normal(0,1,5000)*20)
labels = rng.binomial(1,0.95,5000)
labels_positive = labels
x_positive = rng.normal(0,1,5000)*20

y_negative = -1*np.abs(rng.normal(0,1,5000)*20)
labels_negative = 1-labels_positive
x_negative = rng.normal(0,1,5000)*20

In [4]:
x = np.concatenate((x_positive, x_negative))
y = np.concatenate((y_positive, y_negative))
labels = np.concatenate((labels_positive, labels_negative))
data_and_labels = np.vstack((x,y,labels))

In [5]:
# Shuffle the columns with the seeded generator so that every run gives the same dataset
shuffled = data_and_labels[:, rng.permutation(data_and_labels.shape[1])]
x = shuffled[:1,:]
y = shuffled[1:2,:]
labels = shuffled[2:3,:]

In [6]:
theta = np.pi/6

rot_matrix = np.array([[np.cos(theta),-1*np.sin(theta)],[np.sin(theta), np.cos(theta)]])

In [7]:
data = shuffled[:2,:]
data = rot_matrix@data
X_plot = data[:1,:]
y_plot = data[1:2,:] + 5

X = np.vstack((X_plot,y_plot))
y = labels


In [None]:
fig = plt.figure(figsize=(20,10))

colors = ['red','blue']  # label 0 -> red, label 1 -> blue

plt.scatter(X_plot, y_plot, c=labels, cmap=matplotlib.colors.ListedColormap(colors), s=5)
plt.show()

# **Binary Classifier**

Now that we have the dataset variable loaded, let's construct the binary classifier using logistic regression. That is, we need to estimate the parameters $W$ and $b$, where

\begin{equation}
\begin{aligned}
z &= Wx + b \\
a &= \sigma(z) = \frac{1}{1+e^{-z}} \\
L(a, y) &= -\left(y \log(a) + (1-y) \log(1-a)\right)
\end{aligned}
\end{equation}

In machine-learning terminology, the function $\sigma(z)$ is called an [activation function](https://en.wikipedia.org/wiki/Activation_function). Activation functions will be covered next week :)
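As a quick check of the three equations above, the following sketch evaluates them for a single point. The parameter values, the point, and the helper names `sigmoid` and `log_loss` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(a, y):
    # Loss for one predicted probability a and its true label y
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Hypothetical parameters and a single point
W = np.array([[0.5, -0.25]])   # 1 x 2 row vector
b = 0.1
x = np.array([[2.0], [4.0]])   # 2 x 1 column vector
y = 1

z = W @ x + b          # 0.5*2 - 0.25*4 + 0.1 = 0.1
a = sigmoid(z)         # about 0.525
loss = log_loss(a, y)  # about 0.644
```

Here $W$ is a row vector, matching $z = Wx + b$. Code that stores the weights as a column vector needs `W.T` in the product instead.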

# **Gradient Descent**

Here, you have to implement the gradient descent algorithm for the binary classifier and return the parameters W, b. 
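The update rule follows from the chain rule. As a sketch, write $x^{(i)}$ for the $i$-th column of the data, $a^{(i)}$ and $y^{(i)}$ for its predicted probability and label, $J$ for the loss averaged over all $n$ points, and $\alpha$ for the learning rate (these symbols are notation introduced here, not part of the assignment). Since $\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}$ and $\frac{\partial a}{\partial z} = a(1-a)$, we get $\frac{\partial L}{\partial z} = a - y$, and therefore

\begin{equation}
\begin{aligned}
\frac{\partial J}{\partial W} &= \frac{1}{n} \sum_{i=1}^{n} \left(a^{(i)} - y^{(i)}\right) \left(x^{(i)}\right)^{T} \\
\frac{\partial J}{\partial b} &= \frac{1}{n} \sum_{i=1}^{n} \left(a^{(i)} - y^{(i)}\right) \\
W &\leftarrow W - \alpha \frac{\partial J}{\partial W}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}
\end{aligned}
\end{equation}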

**Bonus (Optional)** : Plot the loss function as a function of the number of iterations.

In [17]:
def grad_descent(X, y, num_iter = 1000, lr = 0.01) : 

    # INSERT CODE BELOW
    # Initialize the parameters (one weight per input feature)
    W = np.zeros((X.shape[0], 1))
    b = 0
    
    # Initialize an empty list to store the loss values for each iteration
    loss_values = []
    
    for i in range(num_iter):
        # Compute the dot product of the input features and the parameters
        z = np.dot( W.T,X) + b
        
        # Apply the sigmoid function to get the predicted probability
        a = 1 / (1 + np.exp(-z))
        
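        # Compute the loss (clip a away from 0 and 1 so that the logs stay finite)
        a_safe = np.clip(a, 1e-12, 1 - 1e-12)
        loss = - (y * np.log(a_safe) + (1 - y) * np.log(1 - a_safe))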
        
        # Compute the gradient of the loss with respect to the parameters
        dW = (1 / X.shape[1]) * np.dot(X, (a - y).T)
        db = (1 / X.shape[1]) * np.sum(a - y)
        
        # Update the parameters
        W = W - lr * dW
        b = b - lr * db
        
        # Append the current loss value to the list of loss values
        loss_values.append(np.mean(loss))
        
   
    
    # INSERT CODE ABOVE

    return (W, b)
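For the bonus, one possible approach is to return the loss history along with the parameters. The sketch below is a minimal illustration on a small synthetic dataset. The names `grad_descent_with_loss`, `plot_loss`, `X_toy`, and `y_toy` are hypothetical, and the toy data is not the assignment dataset.

```python
import numpy as np

def grad_descent_with_loss(X, y, num_iter=1000, lr=0.01):
    """Hypothetical variant of grad_descent that also returns the loss history."""
    n = X.shape[1]
    W = np.zeros((X.shape[0], 1))
    b = 0.0
    loss_values = []
    for _ in range(num_iter):
        a = 1 / (1 + np.exp(-(W.T @ X + b)))
        # Clip so that log never receives exactly 0 or 1
        a_safe = np.clip(a, 1e-12, 1 - 1e-12)
        loss = -(y * np.log(a_safe) + (1 - y) * np.log(1 - a_safe))
        loss_values.append(float(np.mean(loss)))
        # Gradient step on the mean loss
        W = W - lr * (X @ (a - y).T) / n
        b = b - lr * np.sum(a - y) / n
    return W, b, loss_values

def plot_loss(loss_values):
    """Draw the mean loss against the iteration number."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without a display
    plt.plot(range(1, len(loss_values) + 1), loss_values)
    plt.xlabel("Iteration")
    plt.ylabel("Mean loss")
    plt.show()

# Small synthetic problem: the label is 1 when the second coordinate is positive
toy_rng = np.random.default_rng(0)
X_toy = toy_rng.normal(0, 1, (2, 200))
y_toy = (X_toy[1:2, :] > 0).astype(float)

W_toy, b_toy, losses = grad_descent_with_loss(X_toy, y_toy, num_iter=200, lr=0.1)
```

Calling `plot_loss(losses)` in the notebook draws the curve. With zero-initialized parameters, the first recorded loss equals $\log 2$, because every prediction starts at 0.5.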

# **Plotting the decision boundary**

Given the labelled dataset and the parameters W, b, plot the decision boundary along with the dataset (with the appropriate coloring).
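The classifier predicts label 1 when $\sigma(z) \geq 0.5$, which happens exactly when $z \geq 0$, so the decision boundary is the straight line $W_1 x_1 + W_2 x_2 + b = 0$. The sketch below computes two endpoints of that line. The helper `boundary_line` and the demo parameters are hypothetical, and the helper assumes $W_2 \neq 0$.

```python
import numpy as np

def boundary_line(W, b, x_min, x_max):
    """Return endpoints of the line W[0]*x1 + W[1]*x2 + b = 0 (assumes W[1] != 0)."""
    w1, w2 = float(W[0, 0]), float(W[1, 0])
    x1 = np.array([x_min, x_max])
    # Solve the line equation for x2 at both ends of the x1 range
    x2 = -(w1 * x1 + b) / w2
    return x1, x2

# Hypothetical parameters: the boundary is x2 = 2*x1 + 1
W_demo = np.array([[2.0], [-1.0]])
b_demo = 1.0
x1, x2 = boundary_line(W_demo, b_demo, -1.0, 3.0)
```

Passing the two arrays to `plt.plot(x1, x2)` on top of the scatter plot draws the boundary as a line, which is cheaper than shading a mesh.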

In [27]:
def plot(X, y, W, b):
    # INSERT CODE BELOW
    # Create a mesh of points over the input space
    x_min, x_max = X[0,:].min() - .5, X[0,:].max() + .5
    y_min, y_max = X[1,:].min() - .5, X[1,:].max() + .5
    h = 0.5  # grid step; a much finer step makes the mesh too large for this data range
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Flatten the grid and stack it into a 2 x m matrix, matching the layout of X
    input_points = np.vstack((xx.ravel(), yy.ravel()))

    # Compute the predicted probability for each point in the mesh
    z = np.dot(W.T, input_points) + b
    a = 1 / (1 + np.exp(-z))

    # Convert the probabilities to class labels and reshape them back to the grid shape
    predicted_labels = np.round(a).reshape(xx.shape)

    # Plot the decision regions first, then the data on top so the points stay visible
    plt.contourf(xx, yy, predicted_labels, cmap=plt.cm.Spectral, alpha=0.3)
    plt.scatter(X[0,:], X[1,:], c=y.reshape(-1), s=5, cmap=plt.cm.Spectral)
    plt.show()
    return

# **Accuracy**

Now, let us calculate the accuracy, i.e. the percentage of points classified correctly by the classifier, given by:

\begin{equation}
\text{accuracy} = 100 \times \frac{\text{Correctly classified points}}{\text{Total Number of Points}}
\end{equation}
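As a small worked example of this formula, the sketch below uses made-up labels and predictions. The names `y_true` and `y_pred` are hypothetical.

```python
import numpy as np

y_true = np.array([[1, 0, 1, 1, 0]])
y_pred = np.array([[1, 0, 0, 1, 1]])

correct = np.sum(y_pred == y_true)   # 3 predictions match
total = y_true.size                  # 5 points in total
acc = 100 * correct / total          # 60.0
```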


For this purpose, we will define two functions, for predicting the labels and accuracy.

In [11]:
def predict(X, y, W, b) : 
    '''
    Inputs 
    -> X : A numpy array of vectors denoting positions of points
    -> y : A numpy array containing labels
    -> W, b : Parameters for the model

    Returns : 
    -> A numpy array containing predicted labels for the dataset using the classifier model.
    -> Make sure that the dimensions of the input y and the output "preds" are the same.
    '''

    # INSERT CODE BELOW

    # Compute the dot product of the input features and the parameters
    z = np.dot(W.T, X) + b
    
    # Apply the sigmoid function to get the predicted probability
    a = 1 / (1 + np.exp(-z))
    
    # Convert the predicted probabilities to class labels
    preds = np.round(a)
    
    # Assert that the dimensions of the input y and the output "preds" are the same

    # INSERT CODE ABOVE

    assert(preds.shape == y.shape)

    return preds

For the accuracy function, we need to return the accuracy as described by the equation above.

In [19]:
def accuracy(X, y, preds) : 
    '''
    Inputs 
    -> X : A numpy array of vectors denoting positions of points
    -> y : A numpy array containing labels
    -> preds : Predicted labels by the model

    Returns : 
    -> A floating point number denoting the % accuracy of the model
    '''

    # INSERT CODE BELOW
    # Calculate the accuracy of the classifier by dividing the number of
    # correctly classified points by the total number of points.
    # Comparing the predicted labels with the true labels using == gives a
    # boolean array, where True means the prediction is correct and False
    # means it is incorrect. The mean of this array is the fraction of
    # correct predictions, and multiplying it by 100 gives the accuracy
    # as a percentage.
    accuracy = (preds == y).mean() * 100


    # INSERT CODE ABOVE

    return accuracy

# **Combining the functions**

Now, we are done with all the elements we need. Let's combine them into a program. (Feel free to edit the number of iterations to observe how the line changes)

In [None]:
# The number of iterations
num_iter = 2000

W, b = grad_descent(X, y, num_iter)

plot(X, y, W, b)
predictions = predict(X, y, W, b)
result = accuracy(X, y, predictions)

print(f'The accuracy of the model is : {result} %')

# **Bonus (Optional)**

We need to examine how well the model performs as the number of iterations and the size of the input data change. Plot the accuracy of the model as the number of iterations varies. This part of the assignment is open-ended: you can choose how to sample the number of iterations, try varying the size of the input data, or even combine the two! We have not provided a template for this part, so you can choose the plotting scheme.

In [None]:
# BONUS PART
# Define a list num_iterations containing a few values of the number of
# iterations. Then loop over this list, call grad_descent() with each
# value of num_iter, and append the accuracy of the model to the
# accuracies list.
 
num_iterations = [100, 500, 1000, 2000, 5000]
accuracies = []

for i in num_iterations:
    W, b = grad_descent(X, y, num_iter=i, lr=0.01)
    preds = predict(X, y, W, b)
    acc = accuracy(X, y, preds)
    accuracies.append(acc)
# Create a line plot of the accuracy as a function of the number of iterations
plt.plot(num_iterations, accuracies)
plt.xlabel("Number of iterations")
plt.ylabel("Accuracy (%)")
plt.show()
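Varying the size of the input data can be explored in the same way. The sketch below is one possible setup on synthetic data, not the assignment dataset. The helpers `fit_logistic` and `accuracy_pct`, the data, and the chosen sizes are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, num_iter=500, lr=0.1):
    """Plain gradient descent for logistic regression on a (features x n) matrix."""
    W, b, n = np.zeros((X.shape[0], 1)), 0.0, X.shape[1]
    for _ in range(num_iter):
        a = 1 / (1 + np.exp(-(W.T @ X + b)))
        W = W - lr * (X @ (a - y).T) / n
        b = b - lr * np.sum(a - y) / n
    return W, b

def accuracy_pct(X, y, W, b):
    """Percentage of points whose thresholded prediction matches the label."""
    preds = (W.T @ X + b >= 0).astype(float)  # sigma(z) >= 0.5 exactly when z >= 0
    return float((preds == y).mean() * 100)

# Hypothetical synthetic data: label 1 when x2 > x1, with about 5% of labels flipped
demo_rng = np.random.default_rng(0)
X_all = demo_rng.normal(0, 1, (2, 2000))
y_all = (X_all[1:2, :] > X_all[0:1, :]).astype(float)
flip = demo_rng.random((1, 2000)) < 0.05
y_all = np.where(flip, 1 - y_all, y_all)

sizes = [50, 200, 1000, 2000]
size_accuracies = []
for n in sizes:
    # Train on the first n points only
    W_n, b_n = fit_logistic(X_all[:, :n], y_all[:, :n])
    # Score on the full set so that every run is judged on the same points
    size_accuracies.append(accuracy_pct(X_all, y_all, W_n, b_n))
```

Plotting `size_accuracies` against `sizes` with `plt.plot` gives the accuracy as a function of the training set size.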


# **Submission Instructions**

Upload this notebook to your GitHub Classroom repository under the name Week2.ipynb.