# Model Benchmarking
---

## `SlowLogisticRegression` Class

The snippet below shows typical usage of this project's `SlowLogisticRegression` class on the [breast cancer dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html).

In [6]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score
from regression.SlowLogisticRegression import SlowLogisticRegression
from utilities.Stopwatch import Stopwatch

# Loading the breast cancer dataset.
data = load_breast_cancer()
X, y = data.data, data.target

# Starting the stopwatch; the reported training time therefore includes splitting and scaling.
training_stopwatch = Stopwatch()
training_stopwatch.start()

# Splitting the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling the features.
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Initializing and training the model.
logistic_regression_model = SlowLogisticRegression(learning_rate=0.1, epochs=5000)
logistic_regression_model.fit(X_train, y_train, False)

training_time = training_stopwatch.stop()
prediction_stopwatch = Stopwatch()
prediction_stopwatch.start()

# Making predictions on the test set.
predictions = logistic_regression_model.predict(X_test)

prediction_time = prediction_stopwatch.stop()

# Evaluating the model.
accuracy = accuracy_score(y_test, predictions)

print(f"\nSlow Logistic Regression Accuracy: {accuracy * 100:.2f}%")
print(f"Training Time: {training_time:.5f} seconds")
print(f"Prediction Time: {prediction_time:.5f} seconds")


Slow Logistic Regression Accuracy: 97.37%
Training Time: 1.24048 seconds
Prediction Time: 0.00028 seconds
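
The implementation of `SlowLogisticRegression` is not shown in this benchmark. As a rough mental model for what the `learning_rate` and `epochs` arguments control, the sketch below trains logistic regression with plain batch gradient descent. The class name `GradientDescentLogisticRegression`, its internals, and the toy data are assumptions made for illustration; the project's class may differ (for example, its `fit` takes a third positional argument that is not modeled here).

```python
import numpy as np


def sigmoid(z):
    """Logistic function, with inputs clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


class GradientDescentLogisticRegression:
    """Hypothetical stand-in, not the project's SlowLogisticRegression."""

    def __init__(self, learning_rate=0.1, epochs=5000):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = 0.0

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for _ in range(self.epochs):
            # Predicted probabilities for the current parameters.
            probabilities = sigmoid(X @ self.weights + self.bias)
            # Gradient of the mean log-loss with respect to weights and bias.
            error = probabilities - y
            self.weights -= self.learning_rate * (X.T @ error) / n_samples
            self.bias -= self.learning_rate * error.mean()
        return self

    def predict(self, X):
        # Threshold the predicted probability at 0.5.
        return (sigmoid(X @ self.weights + self.bias) >= 0.5).astype(int)


# Toy separable problem: the label is 1 when the single feature is large.
X_toy = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
toy_model = GradientDescentLogisticRegression(learning_rate=0.1, epochs=5000).fit(X_toy, y_toy)
toy_predictions = toy_model.predict(X_toy)
print(toy_predictions)
```

Each epoch is one full pass over the training rows, so the cost of this kind of training scales with both `epochs` and the number of rows.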


## SciKit-Learn's `LogisticRegression` Class

The snippet below evaluates SciKit-Learn's `LogisticRegression` class on the same dataset, using the same split and scaling, for comparison.

In [13]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from utilities.Stopwatch import Stopwatch

data = load_breast_cancer()
X = data.data
y = data.target

training_stopwatch = Stopwatch()
training_stopwatch.start()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)

training_time = training_stopwatch.stop()
prediction_stopwatch = Stopwatch()
prediction_stopwatch.start()

predictions = model.predict(X_test)

prediction_time = prediction_stopwatch.stop()
accuracy = accuracy_score(y_test, predictions)

print(f"SciKit-Learn's Accuracy: {accuracy * 100:.2f}%")
print(f"Training Time: {training_time:.5f} seconds")
print(f"Prediction Time: {prediction_time:.5f} seconds")

SciKit-Learn's Accuracy: 97.37%
Training Time: 0.00778 seconds
Prediction Time: 0.00082 seconds
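
Both snippets above rely on the project's `Stopwatch` helper, whose source is not shown. Judging only from how it is called, `start()` begins timing and `stop()` returns the elapsed seconds. A minimal stand-in, assuming it wraps `time.perf_counter` (the class name `PerfCounterStopwatch` is hypothetical), might look like this:

```python
import time


class PerfCounterStopwatch:
    """Hypothetical stand-in for utilities.Stopwatch, not the project's code."""

    def __init__(self):
        self._started_at = None

    def start(self):
        # perf_counter is a monotonic, high-resolution clock suited to short intervals.
        self._started_at = time.perf_counter()

    def stop(self):
        # Returns the seconds elapsed since start() and resets the stopwatch.
        if self._started_at is None:
            raise RuntimeError("stop() called before start()")
        elapsed = time.perf_counter() - self._started_at
        self._started_at = None
        return elapsed


stopwatch = PerfCounterStopwatch()
stopwatch.start()
total = sum(range(100_000))  # Stand-in workload.
elapsed_seconds = stopwatch.stop()
print(f"Elapsed: {elapsed_seconds:.5f} seconds")
```

Note that in both snippets the training stopwatch starts before `train_test_split`, so the reported training time also includes splitting and scaling, not only `fit`.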


## `SlowLogisticRegression` Class with `BasicLeverageScoresSampler`

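The snippet below samples 20% of the dataset with `BasicLeverageScoresSampler`, a simplified version of the paper's proposed sampling algorithm, before splitting the sample and training `SlowLogisticRegression`.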

In [28]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score
from regression.SlowLogisticRegression import SlowLogisticRegression
from optimized_sampling.BasicLeverageScoresSampler import BasicLeverageScoresSampler
from utilities.Stopwatch import Stopwatch

# Loading the breast cancer dataset.
data = load_breast_cancer()
X, y = data.data, data.target

# Starting the stopwatch; the reported training time therefore includes sampling, splitting, and scaling.
training_stopwatch = Stopwatch()
training_stopwatch.start()

# Sampling the data using BasicLeverageScoresSampler.
sampler = BasicLeverageScoresSampler()
X_sampled, y_sampled = sampler.sample(X, y, 0.2) # Sampling 20% of the data.

# Splitting the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X_sampled, y_sampled, test_size=0.2, random_state=42)

# Scaling the features.
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Initializing and training the model.
logistic_regression_model = SlowLogisticRegression(learning_rate=0.1, epochs=5000)
logistic_regression_model.fit(X_train, y_train, False)

training_time = training_stopwatch.stop()
prediction_stopwatch = Stopwatch()
prediction_stopwatch.start()

# Making predictions on the test set.
predictions = logistic_regression_model.predict(X_test)

prediction_time = prediction_stopwatch.stop()

# Evaluating the model.
accuracy = accuracy_score(y_test, predictions)

print(f"\nSlow Logistic Regression with Sampled Data Accuracy: {accuracy * 100:.2f}%")
print(f"Training Time: {training_time:.5f} seconds")
print(f"Prediction Time: {prediction_time:.5f} seconds")


Slow Logistic Regression with Sampled Data Accuracy: 95.65%
Training Time: 0.20064 seconds
Prediction Time: 0.00008 seconds
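
`BasicLeverageScoresSampler` is not shown here, and the paper's algorithm is not reproduced in this section. For orientation only, the sketch below computes statistical leverage scores from a thin SVD and draws rows with probability proportional to them. The function names, the `seed` parameter, sampling without replacement, and the absence of reweighting are all assumptions of this sketch rather than properties of the project's sampler.

```python
import numpy as np


def leverage_scores(X):
    """Row leverage scores: squared row norms of an orthonormal basis of X's column space."""
    # Thin SVD; the columns of U are orthonormal and span the column space of X.
    U, singular_values, _ = np.linalg.svd(X, full_matrices=False)
    # Keep only directions with non-negligible singular values (handles rank-deficient X).
    rank = int(np.sum(singular_values > singular_values.max() * 1e-12))
    return np.sum(U[:, :rank] ** 2, axis=1)


def sample_by_leverage(X, y, fraction, seed=0):
    """Draw round(fraction * n) distinct rows with probability proportional to leverage."""
    rng = np.random.default_rng(seed)
    scores = leverage_scores(X)
    probabilities = scores / scores.sum()
    n_rows = max(1, int(round(fraction * X.shape[0])))
    indices = rng.choice(X.shape[0], size=n_rows, replace=False, p=probabilities)
    return X[indices], y[indices]


# Synthetic data standing in for a real feature matrix.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(50, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

X_small, y_small = sample_by_leverage(X_demo, y_demo, 0.2)
print(X_small.shape, y_small.shape)
```

Leverage scores lie between 0 and 1 and sum to the rank of the matrix, so rows that are unusual relative to the rest of the data are more likely to be kept.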


## SciKit-Learn's `LogisticRegression` Class with `BasicLeverageScoresSampler`

The snippet below evaluates SciKit-Learn's `LogisticRegression` class on data sampled with `BasicLeverageScoresSampler`; the sample is drawn before the train/test split.

In [20]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from optimized_sampling.BasicLeverageScoresSampler import BasicLeverageScoresSampler
from sklearn.metrics import accuracy_score
from utilities.Stopwatch import Stopwatch

data = load_breast_cancer()
X = data.data
y = data.target

training_stopwatch = Stopwatch()
training_stopwatch.start()

sampler = BasicLeverageScoresSampler()
X_sampled, y_sampled = sampler.sample(X, y, 0.2) # Sampling 20% of the data.

X_train, X_test, y_train, y_test = train_test_split(X_sampled, y_sampled, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)

training_time = training_stopwatch.stop()
prediction_stopwatch = Stopwatch()
prediction_stopwatch.start()

predictions = model.predict(X_test)

prediction_time = prediction_stopwatch.stop()
accuracy = accuracy_score(y_test, predictions)

print(f"SciKit-Learn's accuracy: {accuracy * 100:.2f}%")
print(f"Training Time: {training_time:.5f} seconds")
print(f"Prediction Time: {prediction_time:.5f} seconds")

SciKit-Learn's accuracy: 86.96%
Training Time: 0.00592 seconds
Prediction Time: 0.00034 seconds
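
When comparing the accuracies on sampled data with those on the full dataset, it helps to check how many rows each test set contains. The arithmetic below assumes the sampler returns about 20% of the 569 rows (the exact rounding is an assumption, so both 113 and 114 are checked) and uses the fact that `train_test_split` rounds a fractional `test_size` up. Under that assumption the reported percentages are consistent with 111/114, 22/23, and 20/23 correct predictions.

```python
import math

N_ROWS = 569         # Rows in the breast cancer dataset.
TEST_FRACTION = 0.2  # test_size passed to train_test_split.


def held_out_rows(n_rows, test_fraction=TEST_FRACTION):
    # train_test_split rounds a fractional test_size up to a whole number of rows.
    return math.ceil(n_rows * test_fraction)


full_test = held_out_rows(N_ROWS)
# Assumption: the sampler keeps either floor or round of 20% of the rows.
sampled_test_sizes = {held_out_rows(n) for n in (113, 114)}
sampled_test = sampled_test_sizes.pop()

runs = [
    ("Full data, both models", 111, full_test),
    ("Sampled, SlowLogisticRegression", 22, sampled_test),
    ("Sampled, LogisticRegression", 20, sampled_test),
]
for label, correct, total in runs:
    print(f"{label}: {correct}/{total} = {correct / total * 100:.2f}%")
```

With only 23 test rows, a single misclassification moves accuracy by about 4.35 percentage points, so differences between the sampled runs should be read with caution.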
