# Model Benchmarking
---

## Our `LogisticRegression` Class

The snippet below shows typical usage of this project's `LogisticRegression` class on the [breast cancer dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html).

In [34]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score
from regression.LogisticRegression import LogisticRegression
from utilities.Stopwatch import Stopwatch

# Loading the breast cancer dataset.
data = load_breast_cancer()
X, y = data.data, data.target

# Measuring time performance.
stopwatch = Stopwatch()
stopwatch.start()

# Splitting the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling the features.
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Initializing and training the model.
logistic_regression_model = LogisticRegression(learning_rate=0.1, epochs=5000)
logistic_regression_model.fit(X_train, y_train, False)

# Making predictions on the test set.
predictions = logistic_regression_model.predict(X_test)

# Evaluating model.
time = stopwatch.stop()
accuracy = accuracy_score(y_test, predictions)

print(f"\nOur accuracy: {accuracy * 100:.2f}%")
print(f"Time taken: {time:.2f} seconds")


Our accuracy: 97.37%
Time taken: 0.97 seconds
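
The `Stopwatch` helper imported from `utilities.Stopwatch` is not shown in this notebook. The sketch below is one way such a helper might look, assuming only what the cells above rely on: `start()` records a timestamp and `stop()` returns the elapsed time in seconds as a float. The class body is an assumption for illustration, not the project's actual implementation.

```python
import time


class Stopwatch:
    """Hypothetical stand-in for utilities.Stopwatch (assumed interface)."""

    def __init__(self):
        self._started_at = None

    def start(self):
        # perf_counter is a monotonic, high-resolution clock suited to timing.
        self._started_at = time.perf_counter()

    def stop(self):
        # Return the seconds elapsed since start(); fail loudly if never started.
        if self._started_at is None:
            raise RuntimeError("Stopwatch.stop() called before start()")
        elapsed = time.perf_counter() - self._started_at
        self._started_at = None
        return elapsed
```

Because the stopwatch is started before `train_test_split`, the reported time covers splitting, scaling, training, and prediction, not training alone.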


## SciKit-Learn's `LogisticRegression` Class

The snippet below evaluates SciKit-Learn's `LogisticRegression` class (with the `liblinear` solver) on the same dataset, split, and scaling.

In [6]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from utilities.Stopwatch import Stopwatch

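# Loading the breast cancer dataset.
data = load_breast_cancer()
X = data.data
y = data.target

# Measuring time performance.
stopwatch = Stopwatch()
stopwatch.start()

# Splitting the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling the features.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initializing and training the model.
model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)

# Making predictions on the test set.
predictions = model.predict(X_test)

# Evaluating model.
time = stopwatch.stop()
accuracy = accuracy_score(y_test, predictions)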

print(f"SciKit-Learn's accuracy: {accuracy * 100:.2f}%")
print(f"Time taken: {time:.8f} seconds")

SciKit-Learn's accuracy: 97.37%
Time taken: 0.00800089 seconds
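
The two timings above each come from a single run, and a measurement of a few milliseconds is sensitive to noise. One way to make the comparison more stable is to repeat the measurement and report the median. The sketch below uses only the standard library; `time_repeated` and the stand-in workload are hypothetical names introduced for illustration.

```python
import statistics
import time


def time_repeated(workload, repeats=5):
    """Run `workload` several times; return (median, best) wall time in seconds."""
    durations = []
    for _ in range(repeats):
        started_at = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - started_at)
    return statistics.median(durations), min(durations)


# Stand-in workload; in the notebook this would wrap the
# split / scale / fit / predict steps of one benchmark cell.
median_s, best_s = time_repeated(lambda: sum(i * i for i in range(100_000)))
print(f"median: {median_s:.6f} s, best: {best_s:.6f} s")
```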


## Our `LogisticRegression` Class with `BasicLeverageScoresSampler`

The snippet below samples 20% of the data with `BasicLeverageScoresSampler`, a simplified version of the paper's proposed sampling algorithm, before training our `LogisticRegression` class.

In [40]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score
from regression.LogisticRegression import LogisticRegression
from optimized_sampling.BasicLeverageScoresSampler import BasicLeverageScoresSampler
from utilities.Stopwatch import Stopwatch

# Loading the breast cancer dataset.
data = load_breast_cancer()
X, y = data.data, data.target

# Measuring time performance.
stopwatch = Stopwatch()
stopwatch.start()

# Sampling the data using BasicLeverageScoresSampler.
sampler = BasicLeverageScoresSampler()
X_sampled, y_sampled = sampler.sample(X, y, 0.2)  # Sampling 20% of the data.

# Splitting the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X_sampled, y_sampled, test_size=0.2, random_state=42)

# Scaling the features.
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Initializing and training the model.
logistic_regression_model = LogisticRegression(learning_rate=0.1, epochs=5000)
logistic_regression_model.fit(X_train, y_train, False)

# Making predictions on the test set.
predictions = logistic_regression_model.predict(X_test)

# Evaluating model.
time = stopwatch.stop()
accuracy = accuracy_score(y_test, predictions)

print(f"\nOur accuracy: {accuracy * 100:.2f}%")
print(f"Time taken: {time:.2f} seconds")


Our accuracy: 95.65%
Time taken: 0.21 seconds
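
The internals of `BasicLeverageScoresSampler` are not shown in this notebook. As a rough illustration of the general idea behind leverage-score sampling, the sketch below computes row leverage scores from a thin SVD and draws rows with probability proportional to them. The function names, the `seed` parameter, and the choice to sample without replacement are assumptions made for this example; this is not the project's sampler or the paper's algorithm, and it omits the sample reweighting that such methods often include.

```python
import numpy as np


def leverage_scores(X):
    """Row leverage scores: squared row norms of an orthonormal basis of X's column space."""
    X = np.asarray(X, dtype=float)
    U, singular_values, _ = np.linalg.svd(X, full_matrices=False)
    # Drop directions whose singular values are numerically zero.
    tol = singular_values.max() * max(X.shape) * np.finfo(float).eps
    keep = singular_values > tol
    return np.sum(U[:, keep] ** 2, axis=1)


def sample_by_leverage(X, y, fraction, seed=0):
    """Draw round(fraction * n) distinct rows with probability proportional to leverage."""
    rng = np.random.default_rng(seed)
    scores = leverage_scores(X)
    probabilities = scores / scores.sum()
    n_samples = max(1, int(round(fraction * X.shape[0])))
    chosen = rng.choice(X.shape[0], size=n_samples, replace=False, p=probabilities)
    return X[chosen], y[chosen]


# Synthetic stand-in data (not the breast cancer dataset).
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(200, 5))
y_demo = demo_rng.integers(0, 2, size=200)
X_small, y_small = sample_by_leverage(X_demo, y_demo, 0.2)
print(X_small.shape, y_small.shape)  # (40, 5) (40,)
```

The leverage scores of a full-rank matrix sum to its number of columns, which gives a quick sanity check on the computation.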


## SciKit-Learn's `LogisticRegression` Class with `BasicLeverageScoresSampler`

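The snippet below evaluates SciKit-Learn's `LogisticRegression` class on data sampled with `BasicLeverageScoresSampler`. As in the previous snippet, sampling happens before the train/test split, so both the training and test sets come from the 20% sample.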

In [26]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from optimized_sampling.BasicLeverageScoresSampler import BasicLeverageScoresSampler
from sklearn.metrics import accuracy_score
from utilities.Stopwatch import Stopwatch

# Loading the breast cancer dataset.
data = load_breast_cancer()
X = data.data
y = data.target

# Measuring time performance.
stopwatch = Stopwatch()
stopwatch.start()

# Sampling the data using BasicLeverageScoresSampler.
sampler = BasicLeverageScoresSampler()
X_sampled, y_sampled = sampler.sample(X, y, 0.2)  # Sampling 20% of the data.

# Splitting the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X_sampled, y_sampled, test_size=0.2, random_state=42)

# Scaling the features.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initializing and training the model.
model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)

# Making predictions on the test set.
predictions = model.predict(X_test)

# Evaluating model.
time = stopwatch.stop()
accuracy = accuracy_score(y_test, predictions)

print(f"SciKit-Learn's accuracy: {accuracy * 100:.2f}%")
print(f"Time taken: {time:.8f} seconds")

SciKit-Learn's accuracy: 95.65%
Time taken: 0.00628617 seconds
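
When comparing the accuracies with and without sampling, it helps to keep the test-set sizes in mind, because sampling happens before the split. The back-of-the-envelope check below assumes the sampler keeps about 20% of the 569 rows (its exact rounding is not shown here) and uses the fact that `train_test_split` rounds a fractional test size up.

```python
import math

n_total = 569                      # rows in load_breast_cancer()
n_sampled = round(0.2 * n_total)   # assumed sampler output size (about 114 rows)

for label, n_rows in [("full data", n_total), ("sampled data", n_sampled)]:
    n_test = math.ceil(0.2 * n_rows)   # train_test_split rounds the test share up
    step = 100 / n_test                # accuracy change caused by one test sample
    print(f"{label}: {n_test} test rows, one error = {step:.2f} points")

# full data: 114 test rows, one error = 0.88 points
# sampled data: 23 test rows, one error = 4.35 points
```

Under that assumption, the reported figures are consistent with 3 errors out of 114 (97.37%) without sampling and 1 error out of 23 (95.65%) with sampling, so the gap between them corresponds to very few test samples.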
