# Part 4: Logistic Regression

In this part, you will complete a Do-It-Yourself (DIY) implementation of binary logistic regression in an object-oriented pattern that corresponds with the Scikit-Learn API.

**Learning objectives.** You will:
1. Write object-oriented code for a Python class, matching standard API patterns.
2. Apply numerical Python (NumPy) to efficiently implement binary logistic regression, including code to fit the model to data using the gradient descent algorithm. 
3. Evaluate your implementation compared to the Scikit-Learn standard on synthetic data. 
4. Perform an ablation study on the impact of the learning rate hyperparameter for fitting a logistic regression model.

## Task 1

First, we will use Scikit-Learn to develop a baseline logistic regression model to which we can compare our DIY implementation. Run the following code to generate synthetic data for use in this part of the assignment. Observe that the predictive target is coded as 0 or 1, that the `sigmoid` function is defined for you, and that the code also splits the synthetic data into train and test sets for you.

Use Scikit-Learn to fit a [logistic regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#logisticregression) on the train set with the parameter setting `penalty=None` (this trains a basic model without applying any regularization). Evaluate and report the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) of the model on both the train set and the test set.

In [1]:
# Run but do not modify this code

import numpy as np
from sklearn.model_selection import train_test_split

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

np.random.seed(2024)
n = 1000
features = 20

X = np.random.normal(size = (n, features))
weights = np.random.normal(size = features)
probs = sigmoid(X @ weights + np.random.normal(scale=0.01)) 
y = np.random.binomial(n=1, p=probs)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2024)

In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score as acc

# Baseline model: no regularization, generous iteration cap so the solver converges.
model = LogisticRegression(max_iter=100000, penalty=None)
model.fit(X_train, y_train)

# predict() returns 0/1 class labels, not probabilities.
train_preds = model.predict(X_train)
test_preds = model.predict(X_test)
print(f'train accuracy:{acc(y_train, train_preds)}')
print(f'test accuracy:{acc(y_test, test_preds)}')

train accuracy:0.8714285714285714
test accuracy:0.88


## Task 2

Complete the following class to implement binary logistic regression. Some important notes about the implementation:

1. Remember that the Scikit-Learn API treats an input `X` array, whether to `fit` or `predict`, as a matrix with a row for every data point and a column for every feature. 
2. For `fit`, every row in `X` corresponds to a given output in `y`. You don't need to return anything; just optimize the internal model weights (which should be stored as instance variables). For `predict_proba` and `predict`, you should return a NumPy array with one element (corresponding to a probability or a 0/1 value) for every row in the input `X`.
3. Remember that logistic regression models the probability of outputting `1` as a sigmoid of a linear function of features. This has several implications. One is that the number of weights in your model should equal the number of features, which is the number of columns of the `X` matrix passed to the `fit` method. We recommend that you initialize these weights as random normally distributed values, for example by using NumPy's [random.normal](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html). Another implication is that `predict_proba` should return the sigmoid activation of the [dot product](https://numpy.org/doc/stable/reference/generated/numpy.dot.html) (multiply element-wise then add together) of your model's weights and the given input. You do **not** need to include a bias term for this implementation.
4. For the `predict` function, you should use a simple thresholding of 0.5. That is, you should calculate the probabilities using the `predict_proba` method and return `1` if the probability is greater than 0.5 and `0` otherwise. You can assume that `y` will consist exclusively of `0`s or `1`s for the purpose of this implementation.
5. The `fit` method should implement gradient descent on the negative log-likelihood (equivalently, it maximizes the log-likelihood). For a given feature/weight dimension $j$ and a particular data point $x$ with label $y$, the partial derivative of the negative log-likelihood with respect to weight $w_j$ is $(a - y)x_j$, where $a$ is the activation (the predicted probability) associated with example $x$. For each feature, this quantity should be averaged over all training data. The vector of all such values forms the gradient $\vec{\nabla}$. The gradient descent learning update should then be $\vec{w'} = \vec{w} - \eta \vec{\nabla}$, where $\eta$ is the learning rate `lr` passed to the constructor, $\vec{w}$ are the previous weights, and $\vec{w'}$ are the weights for the next iteration. The algorithm should proceed for `iters` iterations.
6. You will note the `fit` method takes an optional `verbose` parameter. While it is not required, we highly recommend that you include code in the `fit` method that, when `verbose` is `True`, provides additional logging or printing of information about the training process to help debug. 
7. The `pass` statements are syntactic placeholders that should be removed when you implement a method.
8. Finally, note that your implementation will be much more efficient if you use vectorized NumPy operations and avoid for loops or nested for loops over large amounts of data.
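
Notes 5 and 8 fit together: averaging $(a - y)x_j$ over all rows for every feature $j$ is the same as the single matrix-vector product $X^\top(a - y)/n$. The sketch below compares a looped and a vectorized version of that gradient. It is only an illustration, not the required solution; the helper names `gradient_loop` and `gradient_vectorized` and the small random inputs are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_loop(X, y, w):
    """Average of (a - y) * x_j over all rows, computed with explicit loops."""
    n, d = X.shape
    grad = np.zeros(d)
    for i in range(n):
        a = sigmoid(np.dot(X[i], w))  # activation for row i
        for j in range(d):
            grad[j] += (a - y[i]) * X[i, j]
    return grad / n

def gradient_vectorized(X, y, w):
    """Same quantity as one matrix-vector product."""
    a = sigmoid(X @ w)  # activations for every row at once
    return X.T @ (a - y) / X.shape[0]

# Hypothetical toy data, used only to compare the two versions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = rng.integers(0, 2, size=50)
w_demo = rng.normal(size=4)

same = np.allclose(gradient_loop(X_demo, y_demo, w_demo),
                   gradient_vectorized(X_demo, y_demo, w_demo))
print("loop and vectorized gradients agree:", same)
```

Both functions return one value per feature, so either can feed the update $\vec{w'} = \vec{w} - \eta \vec{\nabla}$; the vectorized form avoids the Python-level loops over rows and features.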

In [4]:
class BinaryLogisticRegression:
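    def __init__(self, lr=0.1, iters=1000, random_state=2024, weights=None):
        self.lr = lr
        self.iters = iters
        self.random_state = random_state
        # Avoid a mutable default argument; fit() replaces this with a NumPy array.
        self.weights = weights
        # Seed NumPy with the value passed in rather than a hard-coded constant.
        np.random.seed(random_state)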


    def sigmoid(self, z):
        return 1.0/(1.0 + np.exp(-z))


    def predict_proba(self, X):
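        # Sigmoid of the linear score, one probability per row of X.
        # Use the class's own sigmoid instead of relying on a global function.
        return self.sigmoid(np.dot(X, self.weights))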


    def predict(self, X):
        # Return 1 only when the probability is strictly greater than 0.5, else 0.
        return (self.predict_proba(X) > 0.5).astype(int)
        


    def fit(self, X, y, verbose=False):
        n_samples, n_features = X.shape
        # One weight per feature (no bias term), initialized from a standard normal.
        self.weights = np.random.normal(size=n_features)

        for i in range(self.iters):
            # Predicted probabilities for every training row.
            activations = self.predict_proba(X)
            # Average of (a - y) * x_j over all rows, for every feature j at once.
            gradient = X.T @ (activations - y) / n_samples
            self.weights -= self.lr * gradient
            if verbose:
                # Optional progress log to help debug training.
                print(f'iter {i}: gradient norm {np.linalg.norm(gradient):.4f}')
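
One way to debug a `fit` method like the one above is to check that the gradient formula $(a - y)x_j$, averaged over the data, matches a numerical derivative of the mean negative log-likelihood. The following is a minimal sketch of such a check under stated assumptions: the helper names `mean_log_loss`, `analytic_gradient`, and `numeric_gradient`, the step size `eps`, and the toy data are all hypothetical and not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_log_loss(X, y, w):
    """Mean negative log-likelihood of labels y under weights w."""
    a = sigmoid(X @ w)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def analytic_gradient(X, y, w):
    """Gradient from the assignment notes: average of (a - y) * x_j."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def numeric_gradient(X, y, w, eps=1e-6):
    """Central finite differences of mean_log_loss, one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        loss_plus = mean_log_loss(X, y, w + step)
        loss_minus = mean_log_loss(X, y, w - step)
        grad[j] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Hypothetical toy data, used only for the check.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = rng.integers(0, 2, size=200)
w_demo = rng.normal(scale=0.5, size=5)

max_gap = np.max(np.abs(analytic_gradient(X_demo, y_demo, w_demo)
                        - numeric_gradient(X_demo, y_demo, w_demo)))
print(f"largest difference between gradients: {max_gap:.2e}")
```

A very small difference suggests the update direction is implemented consistently with the loss; a large one usually points to a sign error or a missing average.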

## Task 3

Use your DIY `BinaryLogisticRegression` class from task 2 to fit a logistic regression model on the train set as you did for the Scikit-Learn implementation in task 1. Use the default parameters (`lr=0.1, iters=1000, random_state=2024`). Evaluate and report the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) of your DIY model on both the train set and the test set. You should achieve similar performance compared to the Scikit-Learn implementation.

In [9]:
# Default parameters: lr=0.1, iters=1000, random_state=2024.
model = BinaryLogisticRegression()
model.fit(X_train, y_train)

# predict() returns 0/1 class labels, which is what accuracy_score expects.
train_preds = model.predict(X_train)
test_preds = model.predict(X_test)
print(f'train accuracy:{acc(y_train, train_preds)}')
print(f'test accuracy:{acc(y_test, test_preds)}')

train accuracy:0.8671428571428571
test accuracy:0.8766666666666667


## Task 4

Perform an *ablation* study on the learning rate hyperparameter `lr`. Specifically, try fitting 20 different models using your DIY `BinaryLogisticRegression` implementation, trying every combination of parameter settings `[10, 1, 0.1, 0.01, 0.001]` for the learning rate `lr` and `[1, 5, 20, 100]` for the number of gradient descent iterations `iters`. For each combination, evaluate the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) of the model on the train set only (note that hyperparameters should never be selected using the test data).

Report all of your accuracies in a clearly labeled [Markdown table](https://www.markdownguide.org/extended-syntax/#tables). For readability, only report three digits per entry; for example, if the accuracy of a run is `0.8747340`, write `0.874` or `87.4` in the table. Explain which setting of `lr` you would use in this case and why.

In [16]:
lrs = [10, 1, 0.1, 0.01, 0.001]
iters_list = [1, 5, 20, 100]  # renamed to avoid shadowing the built-in iter()
out = np.zeros((len(lrs), len(iters_list)))
for i, lr in enumerate(lrs):
    for j, n_iters in enumerate(iters_list):
        model = BinaryLogisticRegression(lr=lr, iters=n_iters)
        model.fit(X_train, y_train)
        # Hyperparameters are compared on the train set only.
        out[i, j] = acc(y_train, model.predict(X_train))

# Print one Markdown table row per (lr, iters) combination.
print("| Learning Rate | Iterations | Accuracy |\n|---------------|------------|----------|")
for i, lr in enumerate(lrs):
    for j, n_iters in enumerate(iters_list):
        print(f"| {lr:<13} | {n_iters:<10} | {out[i, j]:.3f} |")

| Learning Rate | Iterations | Accuracy |
|---------------|------------|----------|
| 10            | 1          | 0.841 |
| 10            | 5          | 0.870 |
| 10            | 20         | 0.869 |
| 10            | 100        | 0.867 |
| 1             | 1          | 0.629 |
| 1             | 5          | 0.716 |
| 1             | 20         | 0.850 |
| 1             | 100        | 0.866 |
| 0.1           | 1          | 0.617 |
| 0.1           | 5          | 0.620 |
| 0.1           | 20         | 0.654 |
| 0.1           | 100        | 0.786 |
| 0.01          | 1          | 0.614 |
| 0.01          | 5          | 0.616 |
| 0.01          | 20         | 0.617 |
| 0.01          | 100        | 0.627 |
| 0.001         | 1          | 0.614 |
| 0.001         | 5          | 0.614 |
| 0.001         | 20         | 0.614 |
| 0.001         | 100        | 0.617 |
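
The same 20 accuracies may be easier to compare in a grid with one row per learning rate and one column per iteration count. The sketch below builds such a Markdown table. The helper `format_grid` is hypothetical, and the values are copied from the table above rather than recomputed.

```python
def format_grid(row_labels, col_labels, values):
    """Build a Markdown table: one row per lr, one column per iteration count."""
    header_cells = ["lr"] + [f"iters={c}" for c in col_labels]
    header = "| " + " | ".join(header_cells) + " |"
    divider = "|" + "---|" * len(header_cells)
    lines = [header, divider]
    for label, row in zip(row_labels, values):
        # Three decimals per entry, as the task requests.
        cells = [str(label)] + [f"{v:.3f}" for v in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)

# Train accuracies copied from the table above (rows: lr, columns: iters).
lrs = [10, 1, 0.1, 0.01, 0.001]
iters_list = [1, 5, 20, 100]
accuracies = [
    [0.841, 0.870, 0.869, 0.867],
    [0.629, 0.716, 0.850, 0.866],
    [0.617, 0.620, 0.654, 0.786],
    [0.614, 0.616, 0.617, 0.627],
    [0.614, 0.614, 0.614, 0.617],
]

table = format_grid(lrs, iters_list, accuracies)
print(table)
```

Reading across a row shows how quickly a given learning rate improves with more iterations; reading down a column compares learning rates at a fixed iteration budget.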


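Both `lr=10` and `lr=1` are feasible learning rates because they show the largest gains in train accuracy as you increase the number of iterations: `lr=10` reaches 0.870 after 5 iterations and `lr=1` reaches 0.866 after 100, both close to the Scikit-Learn baseline of 0.871 from task 1. With `lr=0.1` or smaller, train accuracy is at most 0.786 after 100 iterations, so those settings need many more iterations to converge (task 3 needed 1000 iterations at `lr=0.1` to reach 0.867).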
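
Accuracy is a coarse signal, so it can also help to watch the mean negative log-likelihood, which is the quantity gradient descent actually minimizes. The sketch below is a rough illustration on hypothetical toy data generated inside the block, not the assignment's data. It starts from zero weights (an assumption made for reproducibility, unlike the random initialization recommended in task 2), and the helper names `mean_log_loss` and `loss_after` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_log_loss(X, y, w):
    """Mean negative log-likelihood, with probabilities clipped to avoid log(0)."""
    a = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def loss_after(X, y, lr, iters):
    """Run gradient descent from zero weights and return the final mean log loss."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        gradient = X.T @ (sigmoid(X @ w) - y) / X.shape[0]
        w = w - lr * gradient
    return mean_log_loss(X, y, w)

# Hypothetical toy data, generated the same way as the assignment's but smaller.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
true_w = rng.normal(size=5)
y_demo = rng.binomial(n=1, p=sigmoid(X_demo @ true_w))

for lr in [1, 0.1, 0.01]:
    losses = [round(loss_after(X_demo, y_demo, lr, iters), 3)
              for iters in [1, 5, 20, 100]]
    print(f"lr={lr}: {losses}")
```

With zero initial weights every run starts at a loss of $\ln 2 \approx 0.693$, so each printed value shows how far a given learning rate has moved from that starting point within its iteration budget.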