## Appendix-2: Logistic Regression

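Logistic regression is a binary classification algorithm that extends to multiclass problems through either a one-vs-rest scheme or a softmax (multinomial) formulation. It is a linear model: it learns a weight vector and a bias term, and passes their linear combination with the input features through a sigmoid (binary case) or softmax (multiclass case) to produce a probability score.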
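
As a minimal sketch of the multiclass (softmax) case, the snippet below maps features to class probabilities with plain NumPy. The names `softmax`, `predict_proba`, `X_demo`, `W_demo`, and `b_demo` are illustrative assumptions and are not part of the notebook; the weights are random rather than learned.

```python
import numpy as np

def softmax(z):
    # subtract the row-wise max before exponentiating, for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict_proba(X, W, b):
    # X: (n_samples, n_features), W: (n_features, n_classes), b: (n_classes,)
    return softmax(X @ W + b)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 5))   # 4 samples, 5 features (made-up data)
W_demo = rng.normal(size=(5, 3))   # one weight column per class
b_demo = np.zeros(3)               # one bias per class

probs = predict_proba(X_demo, W_demo, b_demo)
print(probs.sum(axis=1))           # each row is a probability distribution
```

The predicted class for each sample would be `probs.argmax(axis=1)`.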

For our implementation, we used scikit-learn's `LogisticRegression` classifier with its default L2 regularization, trained with the `saga` solver in multinomial mode on the CIFAR-10 image batches.
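
In scikit-learn the strength of the L2 penalty is controlled by `C`, the inverse regularization strength (default `1.0`). The sketch below, on a synthetic dataset that is purely an assumption for illustration, shows that a smaller `C` shrinks the learned weights more.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# small synthetic binary problem (not the CIFAR-10 data)
X_toy, y_toy = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# smaller C = stronger L2 penalty
strong = LogisticRegression(C=0.01, max_iter=1000).fit(X_toy, y_toy)
weak = LogisticRegression(C=100.0, max_iter=1000).fit(X_toy, y_toy)

norm_strong = np.linalg.norm(strong.coef_)
norm_weak = np.linalg.norm(weak.coef_)
print(norm_strong, norm_weak)   # the strongly regularized norm is the smaller one
```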

In [1]:
# import library dependencies
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt
import joblib

#### Import Data

In [2]:
ROOT_PATH='../'

In [3]:
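# function to open a pickle file and return its contents
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        # encoding='bytes' keeps the keys of the Python 2 pickle as bytes (e.g. b'data')
        batch = pickle.load(fo, encoding='bytes')
    return batch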
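
The later cells index the loaded batch with byte-string keys (`b'labels'`, `b'data'`). The sketch below round-trips a tiny stand-in batch through `pickle` to show that access pattern; `fake_batch` and its contents are assumptions made for illustration, not real CIFAR-10 data.

```python
import os
import pickle
import tempfile

import numpy as np

# hypothetical stand-in for a batch: two "images" of 3072 bytes each
fake_batch = {b'labels': [3, 7], b'data': np.zeros((2, 3072), dtype=np.uint8)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fake_batch')
    with open(path, 'wb') as fo:
        pickle.dump(fake_batch, fo)
    with open(path, 'rb') as fo:
        loaded = pickle.load(fo, encoding='bytes')

print(sorted(loaded.keys()))
print(loaded[b'data'].shape)
```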

In [4]:
# load each pickle file into its own batch
batch1 = unpickle(ROOT_PATH+"cifar-10-batches-py/data_batch_1")
batch2 = unpickle(ROOT_PATH+"cifar-10-batches-py/data_batch_2")
batch3 = unpickle(ROOT_PATH+"cifar-10-batches-py/data_batch_3")
batch4 = unpickle(ROOT_PATH+"cifar-10-batches-py/data_batch_4")
batch5 = unpickle(ROOT_PATH+"cifar-10-batches-py/data_batch_5")
test_batch = unpickle(ROOT_PATH+"cifar-10-batches-py/test_batch")

In [5]:
# function to extract labels and channel-last images from a batch
def load_data0(btch):
    labels = btch[b'labels']
    imgs = btch[b'data'].reshape((-1, 32, 32, 3))
    
    res = []
    for ii in range(imgs.shape[0]):
        img = imgs[ii].copy()
        # the raw 3072 values are channel-major (R plane, G plane, B plane):
        # reinterpret them as (3, 32, 32), then rearrange the axes to (32, 32, 3)
        img = np.fliplr(np.rot90(np.transpose(img.flatten().reshape(3,32,32)), k=-1))
        res.append(img)
    imgs = np.stack(res)
    return labels, imgs
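
The transpose, rotate, and flip chain in `load_data0` amounts to moving the channel axis last. As a sketch, the same result can be obtained with one vectorized reshape and transpose; `to_channel_last` and `fake_data` are hypothetical names, and the random bytes stand in for real image rows.

```python
import numpy as np

def to_channel_last(data):
    # data: (N, 3072) uint8 rows laid out as R plane, G plane, B plane
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

rng = np.random.default_rng(0)
fake_data = rng.integers(0, 256, size=(5, 3072), dtype=np.uint8)

# the per-image transformation used in load_data0, applied row by row
loop_imgs = np.stack([
    np.fliplr(np.rot90(np.transpose(row.reshape(3, 32, 32)), k=-1))
    for row in fake_data
])

vec_imgs = to_channel_last(fake_data)
print(np.array_equal(loop_imgs, vec_imgs))
```

The vectorized form avoids a Python loop over 10,000 images per batch, which may matter for load time.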

In [6]:
# function to load data into training and test set
def load_data():
    x_train_l = []
    y_train_l = []
    for ibatch in [batch1, batch2, batch3, batch4, batch5]:
        labels, imgs = load_data0(ibatch)
        x_train_l.append(imgs)
        y_train_l.extend(labels)
    x_train = np.vstack(x_train_l)
    y_train = np.vstack(y_train_l)
    
    x_test_l = []
    y_test_l = []
    labels, imgs = load_data0(test_batch)
    x_test_l.append(imgs)
    y_test_l.extend(labels)
    x_test = np.vstack(x_test_l)
    y_test = np.vstack(y_test_l)
    return (x_train, y_train), (x_test, y_test)

#### Preprocess Data

In [7]:
# create training and test set
(x_train, y_train), (x_test, y_test) = load_data()

In [8]:
print('x_train shape:', x_train.shape)
print('y_train shape:', y_train.shape)
print('x_test shape:', x_test.shape)
print('y_test shape:', y_test.shape)

x_train shape: (50000, 32, 32, 3)
y_train shape: (50000, 1)
x_test shape: (10000, 32, 32, 3)
y_test shape: (10000, 1)


In [9]:
print(x_train.shape[0], 'train samples (x)')
print(y_train.shape[0], 'train samples (y)')

50000 train samples (x)
50000 train samples (y)


In [10]:
print(x_test.shape[0], 'test samples (x)')
print(y_test.shape[0], 'test samples (y)')

10000 test samples (x)
10000 test samples (y)


In [11]:
# Flatten the images
X_train = x_train.reshape(x_train.shape[0], -1)
X_test = x_test.reshape(x_test.shape[0], -1)

In [12]:
# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
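
A small worked example of the flatten and scale steps, on a made-up array (`imgs_demo` is an assumption, not the notebook's data): each 32x32x3 image becomes a row of 3072 features with values in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
imgs_demo = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# flatten: (4, 32, 32, 3) -> (4, 3072)
flat_demo = imgs_demo.reshape(imgs_demo.shape[0], -1)

# scale: uint8 in [0, 255] -> float32 in [0, 1]
scaled_demo = flat_demo.astype('float32') / 255

print(flat_demo.shape, scaled_demo.dtype, scaled_demo.min(), scaled_demo.max())
```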

In [13]:
# Reshape y_train and y_test to 1d arrays
y_train = y_train.ravel()
y_test = y_test.ravel()
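
The `(50000, 1)` label shape printed earlier comes from calling `np.vstack` on a flat list of integers, which turns each label into its own row. The sketch below, with a made-up `labels_list`, shows why `ravel()` is needed and that `np.asarray` would give a 1-D array directly.

```python
import numpy as np

labels_list = [6, 9, 9, 4]            # illustrative labels only

stacked = np.vstack(labels_list)      # each scalar becomes a row: shape (4, 1)
flat = stacked.ravel()                # back to shape (4,)
direct = np.asarray(labels_list)      # 1-D from the start

print(stacked.shape, flat.shape, direct.shape)
```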

In [14]:
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)
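
The split above uses no fixed seed, so rerunning the cell produces a different validation set. As a hedged sketch on toy data, passing `random_state` makes the split repeatable and `stratify` keeps the class proportions equal across both parts; the value `42` and the `*_demo` arrays are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(200, dtype='float32').reshape(100, 2)
y_demo = np.repeat(np.arange(10), 10)   # 10 classes, 10 samples each

split_a = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
split_b = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

X_tr, X_va, y_tr, y_va = split_a
print(X_tr.shape, X_va.shape, np.bincount(y_va))
```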

#### Define model and train

In [15]:
# Define the model
lr = LogisticRegression(solver='saga', multi_class='multinomial', verbose=1, max_iter=1000, n_jobs=-1)

In [16]:
# Train the model
lr.fit(X_train, y_train)

[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.


Epoch 1, change: 1.00000000
Epoch 2, change: 0.31112364
Epoch 3, change: 0.17446972
Epoch 4, change: 0.13395590
Epoch 5, change: 0.10465156
Epoch 6, change: 0.08240297
Epoch 7, change: 0.07107364
Epoch 8, change: 0.06299022
Epoch 9, change: 0.05496485
Epoch 10, change: 0.04876496
Epoch 11, change: 0.04347889
Epoch 12, change: 0.03975073
Epoch 13, change: 0.03597004
Epoch 14, change: 0.03368282
Epoch 15, change: 0.03095332
Epoch 16, change: 0.02869766
Epoch 17, change: 0.02641392
Epoch 18, change: 0.02535186
Epoch 19, change: 0.02388932
Epoch 20, change: 0.02200129
Epoch 21, change: 0.02163129
Epoch 22, change: 0.02128161
Epoch 23, change: 0.02059028
Epoch 24, change: 0.01865203
Epoch 25, change: 0.01795270
Epoch 26, change: 0.01607023
Epoch 27, change: 0.01583829
Epoch 28, change: 0.01565392
Epoch 29, change: 0.01546345
Epoch 30, change: 0.01513787
Epoch 31, change: 0.01497226
Epoch 32, change: 0.01479146
Epoch 33, change: 0.01458597
Epoch 34, change: 0.01440972
Epoch 35, change: 0.014
...
Epoch 550, change: 0.0
...
max_iter reached after 3547 seconds


[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed: 59.1min finished


#### Save and load model

In [17]:
# Save the model to a file
joblib.dump(lr, 'logistic_regression_model_saga.sav')

['logistic_regression_model_saga.sav']

In [18]:
# Load the saved model from a file
loaded_model = joblib.load('logistic_regression_model_saga.sav')
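
A quick way to confirm that a saved model survives the round trip is to compare predictions before and after loading. The sketch below does this on a toy model written to a temporary directory; the file name `toy_model.sav` and the synthetic data are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_toy, y_toy = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_model.sav')   # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# the restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
print(same)
```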

#### Evaluate the model

In [19]:
# Evaluate the model
train_acc = accuracy_score(y_train, loaded_model.predict(X_train))
val_acc = accuracy_score(y_val, loaded_model.predict(X_val))
test_acc = accuracy_score(y_test, loaded_model.predict(X_test))

print(f"Train Accuracy: {train_acc:.4f}")
print(f"Val Accuracy: {val_acc:.4f}")
print(f"Test Accuracy: {test_acc:.4f}")

Train Accuracy: 0.5047
Val Accuracy: 0.3848
Test Accuracy: 0.3864


In [20]:
# Calculate precision, recall, and F1 score on test set
test_pred = loaded_model.predict(X_test)
precision = precision_score(y_test, test_pred, average='weighted')
recall = recall_score(y_test, test_pred, average='weighted')
f1 = f1_score(y_test, test_pred, average='weighted')

print(f"Test Precision: {precision:.4f}")
print(f"Test Recall: {recall:.4f}")
print(f"Test F1 score: {f1:.4f}")

Test Precision: 0.3833
Test Recall: 0.3864
Test F1 score: 0.3843
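
The reported test recall (0.3864) equals the test accuracy (0.3864). This is expected: with `average='weighted'`, each class's recall is weighted by its support, which reduces to overall accuracy. The sketch below checks this on made-up labels and contrasts it with the macro average, which weights classes equally.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# made-up, imbalanced labels for illustration
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 2, 1])

acc = accuracy_score(y_true, y_pred)
rec_w = recall_score(y_true, y_pred, average='weighted')
rec_macro = recall_score(y_true, y_pred, average='macro')

print(acc, rec_w, rec_macro)
```

On a class-balanced test set the macro and weighted averages coincide, so a per-class breakdown may be more informative than either summary.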
