# ECE 285 Assignment 1: Logistic Regression

For this part of the assignment, you will implement a logistic regression algorithm for multiclass classification and test it on the CIFAR10 dataset.

You should run the whole notebook and answer the questions in it.

TO SUBMIT: PDF of this notebook with all the required outputs and answers.


In [1]:
# Prepare Packages
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

from ece285.utils.data_processing import get_cifar10_data

# Use a subset of CIFAR10 for this assignment
dataset = get_cifar10_data(
    subset_train=5000,
    subset_val=250,
    subset_test=500,
)

print(dataset.keys())
print("Training Set Data  Shape: ", dataset["x_train"].shape)
print("Training Set Label Shape: ", dataset["y_train"].shape)
print("Validation Set Data  Shape: ", dataset["x_val"].shape)
print("Validation Set Label Shape: ", dataset["y_val"].shape)
print("Test Set Data  Shape: ", dataset["x_test"].shape)
print("Test Set Label Shape: ", dataset["y_test"].shape)


dict_keys(['x_train', 'y_train', 'x_val', 'y_val', 'x_test', 'y_test'])
Training Set Data  Shape:  (5000, 3072)
Training Set Label Shape:  (5000,)
Validation Set Data  Shape:  (250, 3072)
Validation Set Label Shape:  (250,)
Test Set Data  Shape:  (500, 3072)
Test Set Label Shape:  (500,)


# Logistic Regression for multi-class classification


A Logistic Regression Algorithm has 3 hyperparameters that you can experiment with:

- **Learning rate** - controls how much we change the current weights of the classifier during each update. The code below sets a default value of 0.01, and later you are asked to experiment with different values. We recommend looking at the graphs and observing how the performance of the classifier changes with different learning rates.
- **Number of Epochs** - An epoch is one complete pass over all of the data in the training set. During an epoch, for each sample in the training set, we predict a label using the classifier and then update the classifier's weights according to the linear classifier update rule. We evaluate the model after every 10 epochs and save the accuracies, which are later used to plot the training, validation, and test accuracy vs. epoch curves.
- **Weight Decay** - Regularization constrains the weights of the classifier and prevents their values from blowing up, which helps combat overfitting. You will use the `weight_decay` term to introduce regularization in the classifier.
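
To make the roles of the learning rate and weight decay concrete, here is a minimal sketch of a single gradient-descent step. It assumes weight decay is implemented as an L2 penalty `(weight_decay / 2) * ||W||^2`, whose gradient `weight_decay * W` is added to the data gradient; the helper name `sgd_step` is hypothetical, and the update you write in `algorithms/logistic_regression.py` may scale the penalty differently.

```python
import numpy as np

def sgd_step(W, data_grad, learning_rate, weight_decay):
    """One gradient-descent step with an L2 penalty (hypothetical helper).

    Assumes the penalty (weight_decay / 2) * ||W||^2, which adds
    weight_decay * W to the gradient and pulls every weight toward zero.
    """
    return W - learning_rate * (data_grad + weight_decay * W)

W = np.ones((10, 3073))        # 10 classes, 3072 pixel weights + 1 bias column
zero_grad = np.zeros_like(W)   # isolate the effect of weight decay alone
W_next = sgd_step(W, zero_grad, learning_rate=0.5, weight_decay=0.01)
# With no data gradient, each weight shrinks by a factor of (1 - 0.5 * 0.01)
print(W_next[0, 0])
```

With `weight_decay = 0` the step reduces to plain gradient descent; larger values shrink the weights more aggressively on every update.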

The only difference between a logistic regression classifier and a linear regression classifier is that in the former we additionally pass the classifier outputs through a sigmoid function, which squashes each output into the (0, 1) range. Each of these values can then be interpreted as the probability that the sample belongs to the corresponding class. Because the sigmoid is applied to each class score independently, these probabilities need not sum to 1.
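
The sketch below illustrates the squashing and suggests why the training log further down starts near 6.93. It assumes the loss is a per-class (one-vs-rest) binary cross-entropy summed over the 10 classes; that assumption is consistent with the initial loss in the log but is not stated in the assignment, and `summed_bce` is a hypothetical helper.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def summed_bce(scores, label):
    # Binary cross-entropy for each class (one-vs-rest), summed over classes
    p = sigmoid(scores)
    target = np.zeros_like(p)
    target[label] = 1.0
    return -np.sum(target * np.log(p) + (1 - target) * np.log(1 - p))

scores = np.zeros(10)              # near-zero initial weights give near-zero scores
print(sigmoid(scores)[:3])         # every class gets probability 0.5
print(sigmoid(scores).sum())       # 5.0: per-class probabilities do not sum to 1
print(summed_bce(scores, label=3)) # 10 * ln 2, about 6.93
```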


### Implementation (40%)

You need to implement the Logistic Regression method in `algorithms/logistic_regression.py`. You need to fill in the sigmoid function, the training function, and the prediction function.
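
For orientation, here is a minimal sketch of the three pieces, not the reference solution. The class name `LogisticSketch` is hypothetical; its constructor arguments mirror how the notebook calls `Logistic(num_classes, learning_rate_, epochs_per_evaluation, weight_decay_)`, and `train(X, y, weights)` returns the updated weights as the notebook expects. It assumes a one-vs-rest binary cross-entropy loss, full-batch gradient descent (the Number of Epochs bullet above describes per-sample updates instead), and an L2 penalty that is also applied to the bias column.

```python
import numpy as np

class LogisticSketch:
    """Hypothetical one-vs-rest logistic regression (full-batch gradient descent)."""

    def __init__(self, n_class, lr, epochs, weight_decay):
        self.n_class = n_class
        self.lr = lr
        self.epochs = epochs
        self.weight_decay = weight_decay
        self.w = None

    @staticmethod
    def sigmoid(z):
        # Numerically stable sigmoid: never calls exp on a large positive number
        out = np.empty_like(z, dtype=float)
        pos = z >= 0
        out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
        ez = np.exp(z[~pos])
        out[~pos] = ez / (1.0 + ez)
        return out

    def train(self, X, y, weights):
        # X: (N, D+1) with a bias column of ones; y: (N,) integer labels; weights: (C, D+1)
        self.w = weights.copy()
        N = X.shape[0]
        Y = np.zeros((N, self.n_class))
        Y[np.arange(N), y] = 1.0  # one-hot targets: one binary problem per class
        for _ in range(self.epochs):
            P = self.sigmoid(X @ self.w.T)       # (N, C) per-class probabilities
            grad = (P - Y).T @ X / N             # gradient of the mean summed BCE
            grad += self.weight_decay * self.w   # gradient of (weight_decay / 2) * ||W||^2
            self.w -= self.lr * grad
        return self.w

    def predict(self, X):
        # The sigmoid is monotonic, so the largest raw score is also the largest probability
        return np.argmax(X @ self.w.T, axis=1)
```

Because `train` starts from the weights it is given, repeated calls continue training, which matches how the notebook's `train()` loop evaluates the model between calls.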


In [2]:
# Import the algorithm implementation (TODO: Complete the Logistic Regression in algorithms/logistic_regression.py)
from ece285.algorithms import Logistic
from ece285.utils.evaluation import get_classification_accuracy

num_classes = 10  # CIFAR10 has 10 different classes

# Initialize hyper-parameters
learning_rate = 0.01  # You will later be asked to experiment with different learning rates and report results
num_epochs_total = 1000  # Total number of epochs to train the classifier
epochs_per_evaluation = 10  # Epochs per evaluation step; we evaluate the model regularly during training
N, D = dataset["x_train"].shape  # N: number of examples, D: dimensionality of the data
weight_decay = 0.00002

x_train = dataset["x_train"].copy()
y_train = dataset["y_train"].copy()
x_val = dataset["x_val"].copy()
y_val = dataset["y_val"].copy()
x_test = dataset["x_test"].copy()
y_test = dataset["y_test"].copy()

# Insert additional scalar term 1 in the samples to account for the bias as discussed in class
x_train = np.insert(x_train, D, values=1, axis=1)
x_val = np.insert(x_val, D, values=1, axis=1)
x_test = np.insert(x_test, D, values=1, axis=1)
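
The bias trick above can be seen on a toy array: appending a constant 1 to every sample lets the last column of the weight matrix act as the bias, so a single matrix product computes `W x + b`. The arrays below are made up for illustration.

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])                  # 2 samples, D = 2 features
D = x.shape[1]
x_aug = np.insert(x, D, values=1, axis=1)   # append a column of ones at index D
print(x_aug)
# [[1. 2. 1.]
#  [3. 4. 1.]]

W = np.array([[0.5, -1.0, 2.0]])            # the last column plays the role of the bias b
# One product with the augmented input equals feature weights plus bias
assert np.allclose(x_aug @ W.T, x @ W[:, :D].T + W[:, D])
```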

In [3]:
# Training and evaluation function -> Outputs accuracy data
def train(learning_rate_, weight_decay_):
    # Create a logistic regression object
    logistic_regression = Logistic(
        num_classes, learning_rate_, epochs_per_evaluation, weight_decay_
    )

    # Randomly initialize the weights and biases
    weights = np.random.randn(num_classes, D + 1) * 0.0001

    train_accuracies, val_accuracies, test_accuracies = [], [], []

    # Train the classifier: each call to train() runs epochs_per_evaluation epochs,
    # and the model is evaluated after every call
    for _ in tqdm(range(num_epochs_total // epochs_per_evaluation)):
        # Train the classifier on the training data
        weights = logistic_regression.train(x_train, y_train, weights)

        # Evaluate the trained classifier on the training dataset
        y_pred_train = logistic_regression.predict(x_train)
        train_accuracies.append(get_classification_accuracy(y_pred_train, y_train))

        # Evaluate the trained classifier on the validation dataset
        y_pred_val = logistic_regression.predict(x_val)
        val_accuracies.append(get_classification_accuracy(y_pred_val, y_val))

        # Evaluate the trained classifier on the test dataset
        y_pred_test = logistic_regression.predict(x_test)
        test_accuracies.append(get_classification_accuracy(y_pred_test, y_test))

    return train_accuracies, val_accuracies, test_accuracies, weights
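
`get_classification_accuracy` comes from the course's `ece285.utils.evaluation` module, whose source is not shown here. A plausible stand-in, assuming it returns the fraction of matching labels (the real helper may return a percentage instead), is:

```python
import numpy as np

def classification_accuracy(y_pred, y_true):
    """Fraction of predictions that match the labels (hypothetical stand-in)."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    return float(np.mean(y_pred == y_true))

print(classification_accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
```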


In [4]:
def plot_accuracies(train_acc, val_acc, test_acc):
    # Plot accuracy vs. epoch for the training, validation, and test sets.
    # Accuracies are recorded once every epochs_per_evaluation epochs.
    epochs = (np.arange(len(train_acc)) + 1) * epochs_per_evaluation
    plt.ylabel("Accuracy")
    plt.xlabel("Epoch")
    plt.plot(epochs, train_acc, epochs, val_acc, epochs, test_acc)
    plt.legend(["Training", "Validation", "Testing"])
    # No plt.show() here, so the function can also draw into a subplot;
    # Jupyter displays the figure at the end of the cell.


In [None]:
# Run training and plotting for default parameter values as mentioned above
t_ac, v_ac, te_ac, weights = train(learning_rate, weight_decay)


  0%|          | 0/100 [00:00<?, ?it/s]

6.932540087676567
Epoch 0, Loss 6.932540087677183
3.5165692053317223
3.418425409793438
3.379101394425952
...

 40%|████      | 40/100 [02:04<03:34,  3.57s/it]

2.860394791206347
2.8601069673532904
Epoch 0, Loss 2.8601069703253024
2.859819903861931
2.8595335964591717
2.8592480409067145
2.858963233000705
2.8586791685713764


In [None]:
plot_accuracies(t_ac, v_ac, te_ac)


### Try different learning rates and plot graphs for all (20%)


In [None]:
# Initialize the best values
best_weights = weights
best_learning_rate = learning_rate
best_weight_decay = weight_decay
best_v_ac = 0.0

# TODO
# Repeat the above training and evaluation steps for the following learning rates and plot graphs
# You need to try 3 learning rates and submit all 3 graphs along with this notebook pdf to show your learning rate experiments
learning_rates = []
weight_decay = 0.0  # No regularization for now

# FEEL FREE TO EXPERIMENT WITH OTHER VALUES. REPORT OTHER VALUES IF THEY ACHIEVE A BETTER PERFORMANCE

# for lr in learning_rates: Train the classifier and plot data
# Step 1. train_accu, val_accu, test_accu, weights = train(lr, weight_decay)
# Step 2. plot_accuracies(train_accu, val_accu, test_accu)

learning_rate_info = {}

for learning_rate in learning_rates:
    # Train the classifier with the current learning rate and store its results
    t_ac, v_ac, te_ac, weights = train(learning_rate, weight_decay)
    learning_rate_info[learning_rate] = [t_ac, v_ac, te_ac, weights]

    # Keep the run with the highest validation accuracy
    if best_v_ac < max(v_ac):
        best_weights = weights
        best_learning_rate = learning_rate
        best_weight_decay = weight_decay
        best_v_ac = max(v_ac)


In [None]:
plt.figure(figsize=(18, 4))
for i, learning_rate in enumerate(learning_rates):
    t_ac, v_ac, te_ac, _ = learning_rate_info[learning_rate]
    plt.subplot(1, len(learning_rates), i + 1)
    plt.title("Learning rate: {}".format(learning_rate))
    plot_accuracies(t_ac, v_ac, te_ac)

plt.show()


#### Inline Question 1.

Which one of these learning rates (`best_learning_rate`) would you pick to train your model? Please explain why.


#### Your Answer:


### Regularization: Try different weight decays and plot graphs for all (20%)


In [None]:
# Initialize a non-zero weight_decay (regularization constant) term and repeat the training and evaluation
# Use the best learning rate obtained from the above exercise, best_learning_rate

# You need to try 3 weight decay values and submit all 3 graphs along with this notebook pdf to show your weight decay experiments
weight_decays = [0.0001, 0.001, 0.01]

# FEEL FREE TO EXPERIMENT WITH OTHER VALUES. REPORT OTHER VALUES IF THEY ACHIEVE A BETTER PERFORMANCE

# for weight_decay in weight_decays: Train the classifier and plot data
# Step 1. train_accu, val_accu, test_accu, weights = train(best_learning_rate, weight_decay)
# Step 2. plot_accuracies(train_accu, val_accu, test_accu)

weight_decay_info = {}

for weight_decay in weight_decays:
    # Train the classifier with the current weight decay and store its results
    t_ac, v_ac, te_ac, weights = train(best_learning_rate, weight_decay)
    weight_decay_info[weight_decay] = [t_ac, v_ac, te_ac, weights]

    # Keep the run with the highest validation accuracy.
    # best_learning_rate stays unchanged: every run in this loop uses it.
    if best_v_ac < max(v_ac):
        best_weights = weights
        best_weight_decay = weight_decay
        best_v_ac = max(v_ac)

In [None]:
plt.figure(figsize=(18, 4))
for i, weight_decay in enumerate(weight_decays):
    t_ac, v_ac, te_ac, _ = weight_decay_info[weight_decay]
    plt.subplot(1, len(weight_decays), i + 1)
    plt.title("Weight decay: {}".format(weight_decay))
    plot_accuracies(t_ac, v_ac, te_ac)

plt.show()

#### Inline Question 2.

Discuss underfitting and overfitting as observed in the graphs obtained by changing the regularization.
Which weight_decay term gave you the best classifier performance?
HINT: Do not just think in terms of the best training set performance; keep in mind that the real utility of a machine learning model is how well it performs on data it has never seen before.


#### Your Answer:


### Visualize the filters (10%)


In [None]:
# These visualizations will only somewhat make sense if your learning rate and weight_decay parameters were
# properly chosen in the model. Do your best.

# TODO: Run this cell and Show filter visualizations for the best set of weights you obtain.
# Report the 2 hyperparameters you used to obtain the best model.

# NOTE: You need to set `best_learning_rate` and `best_weight_decay` to the values that gave the highest accuracy
print("Best LR:", best_learning_rate)
print("Best Weight Decay:", best_weight_decay)

# NOTE: You need to set `best_weights` to the weights with the highest accuracy
w = best_weights[:, :-1]
w = w.reshape(10, 3, 32, 32).transpose(0, 2, 3, 1)

w_min, w_max = np.min(w), np.max(w)

fig = plt.figure(figsize=(16, 16))
classes = [
    "plane",
    "car",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]
for i in range(10):
    fig.add_subplot(2, 5, i + 1)

    # Rescale the weights to the range [0, 255] and convert to 8-bit RGB for imshow
    wimg = 255.0 * (w[i] - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype(np.uint8))
    plt.axis("off")
    plt.title(classes[i])
plt.show()
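
The `reshape(10, 3, 32, 32).transpose(0, 2, 3, 1)` step above assumes each 3072-dimensional row stores the image channel by channel (all 1024 red values, then green, then blue). A quick check on a single row filled with index values shows where each pixel lands; the row here is a stand-in, not real weights.

```python
import numpy as np

row = np.arange(3072)                            # stand-in for one flattened weight row
img = row.reshape(3, 32, 32).transpose(1, 2, 0)  # (channel, H, W) -> (H, W, channel)

r, c, ch = 5, 7, 2                               # arbitrary pixel and channel
# In channel-first order, pixel (r, c) of channel ch sits at ch * 1024 + r * 32 + c
assert img[r, c, ch] == ch * 1024 + r * 32 + c
print(img.shape)                                 # (32, 32, 3), the layout plt.imshow expects for RGB
```

The notebook's batched version does the same thing for all 10 classes at once, with the class axis kept first.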


#### Inline Question 3. (10%)

a. Compare and contrast the performance of the two classifiers, i.e., Linear Regression and Logistic Regression.

b. Which classifier would you deploy for your multiclass classification project, and why?


#### Your Answer:
