# Performance Playground

* In this notebook, we experiment with sample code to see whether we can speed up training and hyperparameter tuning for our classification task.
* The motivation is that the free version of `Google Colab` takes hours for hyperparameter tuning and cross-validation, and even after waiting 4-5 hours the runtime sometimes disconnects, losing all the progress we made.
* Paid `Google Colab` instances are an option, but since this is a learning experience we want to stick to free resources for now.

## Imports

In [1]:
import json
import os
from pathlib import Path
from time import time

import gdown
import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
from scipy.stats import loguniform
from sklearn.dummy import DummyClassifier
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, f1_score, roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_predict, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer, FunctionTransformer, MinMaxScaler, OneHotEncoder, StandardScaler

## Check Number of Cores

In [2]:
os.cpu_count()

24

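* `os.cpu_count()` counts logical CPUs, so the 24 above is consistent with the 12 cores this notebook refers to if each core exposes two hardware threads. Either way, it is far more than the 2 cores on the default Google Colab machine.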
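* `os.cpu_count()` reports what the machine has, not necessarily what this process is allowed to use. The sketch below is one way to check the usable count with the standard library only; the helper name `usable_cpu_count` is made up for illustration, and `os.sched_getaffinity` exists only on some platforms (for example Linux), so the code falls back to `os.cpu_count()` elsewhere.

```python
import os


def usable_cpu_count():
    """Return the number of CPUs this process may run on (at least 1)."""
    # sched_getaffinity is not available on every platform (e.g. macOS),
    # so guard the call and fall back to the machine-wide logical count.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


print(f"Logical CPUs reported: {os.cpu_count()}")
print(f"CPUs usable by this process: {usable_cpu_count()}")
```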

## Download Data

In [8]:
FILE_ID = "1e706WL5BRpjdb7EMZremIFLtA0lSuN8O"
file_url = f"https://drive.google.com/uc?id={FILE_ID}"
data_dir = str(Path("..", "data", "mnist_train_set.csv"))   

## download training data to data directory
gdown.download(file_url, data_dir, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1e706WL5BRpjdb7EMZremIFLtA0lSuN8O
To: /Users/gaurangdave/workspace/mnist_digits_recognition/data/mnist_train_set.csv
100%|██████████| 102M/102M [00:01<00:00, 62.9MB/s] 


'../data/mnist_train_set.csv'

## Read Data

In [9]:
mnist_train_set = pd.read_csv(data_dir)

In [10]:
## split training data into features and target
train_X = mnist_train_set.drop("class", axis=1)
train_Y = mnist_train_set["class"]


In [11]:
### Preprocessing Function

def preprocess_data(data, method="none", threshold=128):
    """
    Preprocess MNIST data based on the specified method.

    Args:
        data (pd.DataFrame): Input dataset with only features.
        method (str): Preprocessing method - "normalize", "binarize", or "none".
        threshold (int): Pixel values above this become 1 when method is "binarize".

    Returns:
        pd.DataFrame: Preprocessed dataset.
    """
    if method == "normalize":
        scaler = MinMaxScaler()
        transformed_data = scaler.fit_transform(data)
        return pd.DataFrame(transformed_data)
    elif method == "binarize":
        binarizer = Binarizer(threshold=threshold)
        transformed_data = binarizer.fit_transform(data)
        return pd.DataFrame(transformed_data)

    # any other method: return the features unchanged
    return pd.DataFrame(data)

## helper function to print aggregated description of features
def print_aggregated_description(data):
    # Check the range of pixel values across all feature columns
    print(f"Min of mins in data is {data.iloc[:, :].min().min()} and max of mins in data is {data.iloc[:, :].min().max()}")
    print(f"Min of max in data is {data.iloc[:, :].max().min()} and max of max in data is {data.iloc[:, :].max().max()}")

    # Check the mean and standard deviation
    print(f"Aggregated mean of data is {data.iloc[:, :].mean().mean()}")
    print(f"Aggregated standard deviation of data is {data.iloc[:, :].std().mean()}")

In [12]:
preprocess_transformer = FunctionTransformer(preprocess_data, feature_names_out="one-to-one")


## this should output the dataframe as it is without any changes.
preprocessed_data = pd.DataFrame(preprocess_transformer.fit_transform(train_X, y=None), columns=preprocess_transformer.get_feature_names_out())
print_aggregated_description(preprocessed_data)

Min of mins in data is 0 and max of mins in data is 0
Min of max in data is 0 and max of max in data is 255
Aggregated mean of data is 33.40283570973032
Aggregated standard deviation of data is 49.25044784975305
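
* To make the "normalize" and "binarize" modes concrete, here is a small sketch on a made-up 3x3 array of pixel values (the array is illustrative, not taken from the dataset). It shows two details that are easy to miss: `MinMaxScaler` rescales each column independently, and `Binarizer` maps only values strictly greater than the threshold to 1.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler

# Hypothetical pixel values: rows are samples, columns are pixels.
pixels = np.array(
    [
        [0.0, 64.0, 255.0],
        [0.0, 192.0, 0.0],
        [0.0, 128.0, 51.0],
    ]
)

# Each column is scaled to [0, 1] using that column's own min and max.
# A constant column (the first one) simply stays at 0.
normalized = MinMaxScaler().fit_transform(pixels)

# Values strictly above the threshold become 1, everything else 0,
# so a pixel equal to 128 maps to 0.
binarized = Binarizer(threshold=128).fit_transform(pixels)

print(normalized)
print(binarized)
```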


In [13]:
## helper function to calculate per class f1 scores
def per_class_f1_score(actual_classes, prediction_classes):
    # Compute F1 scores for each class directly
    f1_scores = f1_score(actual_classes, prediction_classes, average=None)
    # Create a list of dictionaries for output
    per_class_f1_scores = [{"class": i, "f1_score": score} for i, score in enumerate(f1_scores)]

    return per_class_f1_scores

def update_model_comparison(probabilities, true_labels, algorithm, method, comparison_df=None):
    """
    Updates the model comparison DataFrame with metrics for a given model.

    Args:
        probabilities (ndarray): Probabilities or predicted values for the dataset.
        true_labels (Series or ndarray): True labels for the dataset.
        algorithm (str): Name of the algorithm (e.g., 'Logistic Regression').
        method (str): Method used (e.g., 'Default Params', 'Grid Search').
        comparison_df (DataFrame or None): Existing comparison DataFrame. If None, a new one is created.

    Returns:
        DataFrame: Updated comparison DataFrame with metrics for the given model.
    """

    # Get predicted classes (argmax for probabilities)
    predicted_classes = probabilities.argmax(axis=1)

    # Compute metrics
    accuracy = accuracy_score(true_labels, predicted_classes)
    weighted_f1 = f1_score(true_labels, predicted_classes, average='weighted')
    roc_score = roc_auc_score(true_labels, probabilities, multi_class="ovr")
    # Compute per-class F1 scores
    per_class_f1_scores = f1_score(true_labels, predicted_classes, average=None)
    per_class_f1_dict = {f"Class_{i}": score for i, score in enumerate(per_class_f1_scores)}

    # Create a new row with metrics
    new_row = {
        "Algorithm": algorithm,
        "Method": method,
        "Accuracy": accuracy,
        "Weighted F1 Score": weighted_f1,
        "ROC AUC Score": roc_score,
        **per_class_f1_dict,  # Unpack per-class F1 scores
    }

    # Initialize or update the DataFrame
    if comparison_df is None:
        return pd.DataFrame([new_row])

    # Append the new row
    comparison_df = pd.concat([comparison_df, pd.DataFrame([new_row])], ignore_index=True)

    return comparison_df

## helper function to save the model metrics to google drive
# def save_comparison_df(comparison_df):
#   comparison_df.to_csv(f"{shared_folder_path}/mnist_models_metrics.csv", index=False)
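
* The metric helpers above turn a probability matrix into class predictions with `argmax` and then score them. The sketch below walks through the same steps on a tiny made-up example with three classes. Note the assumption it relies on: the column index equals the class label, which holds when labels are 0..n-1 as with MNIST digits.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical predicted probabilities for 4 samples and 3 classes.
probabilities = np.array(
    [
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.3, 0.3, 0.4],
        [0.6, 0.3, 0.1],
    ]
)
true_labels = np.array([0, 1, 2, 1])

# Column index of the largest probability is used as the predicted class.
predicted_classes = probabilities.argmax(axis=1)

accuracy = accuracy_score(true_labels, predicted_classes)
weighted_f1 = f1_score(true_labels, predicted_classes, average="weighted")
per_class_f1 = f1_score(true_labels, predicted_classes, average=None)
# ROC AUC is computed from the probabilities, not the hard predictions.
roc_auc = roc_auc_score(true_labels, probabilities, multi_class="ovr")

print(predicted_classes, accuracy, weighted_f1, per_class_f1, roc_auc)
```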


## Logistic Regression - Cross Validation

In [14]:
from sklearn.linear_model import LogisticRegression

## initialize LogisticRegression
logistic_regression = LogisticRegression(max_iter=10000)

## create pipeline
pipeline = Pipeline([
    ("preprocessing", FunctionTransformer(preprocess_data, kw_args={"method": "normalize"})),
    ("training", logistic_regression)
])

* As a learning experience, let's first run cross-validation with the default settings (a single job) and then rerun it on all cores.

In [15]:
start = time()
probabilities = cross_val_predict(pipeline, train_X, train_Y, cv=3, method="predict_proba" )
end = time()
## calculate time taken
time_taken = end - start
print(f"Time taken to train the model is {time_taken}")

Time taken to train the model is 32.639790296554565


* It took ~33 seconds to run `cross_val_predict` on a single core. Next we'll run it on all available cores (`n_jobs=-1`) and see if there is a significant improvement.
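
* The start/end timing pattern repeats in every experiment in this notebook. One possible way to avoid repeating it is a small context manager; this is only a sketch, and the name `timed` and the `results` dictionary are invented for illustration. It uses `time.perf_counter`, which is intended for measuring elapsed intervals.

```python
from contextlib import contextmanager
from time import perf_counter, sleep


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took and optionally record it."""
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


timings = {}
with timed("demo sleep", timings):
    sleep(0.05)  # stand-in for cross_val_predict(...) or grid_search.fit(...)
```

Collecting the timings in one dictionary makes it easy to compare runs at the end of the notebook.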

In [25]:
start = time()
probabilities = cross_val_predict(pipeline, train_X, train_Y, cv=3, method="predict_proba", n_jobs=-1)
end = time()
## calculate time taken
time_taken = end - start
print(f"Time taken to train the model is {time_taken}")

Time taken to train the model is 20.01595401763916


* The time dropped by ~13 seconds (from ~33 s to ~20 s), which is a nice optimization. But the main culprit we saw was `GridSearchCV`, which took almost an hour to train, so we test that next.
* This will be a bit time-consuming because I'll run it both with and without `n_jobs`, but it helps to compare apples to apples (no pun intended, since I am testing on a Mac).
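
* The comparison between the two `cross_val_predict` runs is simple arithmetic on the two timings printed above. The helper name `speedup` is just for illustration.

```python
def speedup(baseline_seconds, improved_seconds):
    """Return how many times faster the improved run is than the baseline."""
    return baseline_seconds / improved_seconds


# Timings printed by the two cross_val_predict runs in this notebook.
single_job_seconds = 32.639790296554565
all_cores_seconds = 20.01595401763916

saved_seconds = single_job_seconds - all_cores_seconds
ratio = speedup(single_job_seconds, all_cores_seconds)

print(f"Saved {saved_seconds:.1f} s, speedup {ratio:.2f}x")
```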

## LogisticRegression Grid Search

In [26]:
param_grid = {
    "logisticregression__solver": ['saga', 'lbfgs'],
    "preprocessing__kw_args": [{"method": "normalize"}, {"method": "binarize"}],
}

## initialize LogisticRegression
logistic_regression = LogisticRegression(max_iter=10000, verbose=1)

## create pipeline
pipeline = Pipeline([
    ("preprocessing", FunctionTransformer(preprocess_data, kw_args={"method": "normalize"})),
    ("logisticregression", logistic_regression)
])
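
* The keys in `param_grid` follow the `<step name>__<parameter>` convention, so they must match the step names given to the `Pipeline`. A quick way to see which names are valid is to list the pipeline's parameters. The sketch below rebuilds a pipeline with the same step names but default settings, purely to inspect the names.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Same step names as the notebook's pipeline, default settings otherwise.
demo_pipeline = Pipeline(
    [
        ("preprocessing", FunctionTransformer()),
        ("logisticregression", LogisticRegression()),
    ]
)

# Nested parameters are exposed as "<step name>__<parameter>".
tunable_names = sorted(name for name in demo_pipeline.get_params() if "__" in name)

for name in tunable_names:
    print(name)
```

A key that does not appear in this list makes the search fail when it tries to set the parameter.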

In [27]:
start = time()
grid_search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_weighted")
grid_search.fit(train_X, train_Y)
end = time()
## calculate time taken
time_taken = end - start
print(f"Time taken to train the model is {time_taken}")

[Verbose solver output omitted: SAGA epoch logs and L-BFGS-B iteration summaries for each fit, truncated in the notebook export.]

Time taken to train the model is 2503.677103996277


* So it took ~2504 seconds, which is ~42 minutes, running on a single core.
* That covers 12 cross-validation fits (4 parameter combinations × 3 folds) plus the final refit on the full training set. Next we'll run it on all available cores.
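
* The number of fits can be checked before starting a long search. The sketch below counts the candidates with `ParameterGrid` for the same grid used above; the extra fit accounts for `GridSearchCV`'s default `refit=True`, which retrains the best candidate on the full training set.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "logisticregression__solver": ["saga", "lbfgs"],
    "preprocessing__kw_args": [{"method": "normalize"}, {"method": "binarize"}],
}
cv_folds = 3

# Every combination of the listed values is one candidate.
n_candidates = len(ParameterGrid(param_grid))
cv_fits = n_candidates * cv_folds
# refit=True (the default) adds one more fit on the full training set.
total_fits = cv_fits + 1

print(f"{n_candidates} candidates x {cv_folds} folds = {cv_fits} CV fits, {total_fits} fits in total")
```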

In [28]:
start = time()
grid_search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_weighted", n_jobs=-1)
grid_search.fit(train_X, train_Y)
end = time()
## calculate time taken
time_taken = end - start
print(f"Time taken to train the model is {time_taken}")

[Verbose solver output omitted: interleaved SAGA and L-BFGS-B logs from the parallel workers. The export is truncated before the final timing line.]

* Wow! This time it took just 1083 seconds, which is ~18 minutes: a little more than a 2x performance gain (about 2.3x).
* Let's try adding `Memory` to enable caching and see if it improves performance.
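
* Before wiring `Memory` into the pipeline, it helps to see what it does on its own. The sketch below caches a toy function on disk with `joblib.Memory`: a repeated call with the same argument is served from the cache instead of running the function again. The function `expensive_square` and the temporary cache directory are made up for illustration. When a `Memory` object is passed to a `Pipeline`, the same mechanism is applied to fitting the transformer steps.

```python
import tempfile

from joblib import Memory

call_log = []


def expensive_square(x):
    """Toy stand-in for an expensive computation."""
    call_log.append(x)  # only real executions are recorded here
    return x * x


with tempfile.TemporaryDirectory() as cache_dir:
    memory = Memory(location=cache_dir, verbose=0)
    cached_square = memory.cache(expensive_square)

    first = cached_square(12)   # computed and written to the cache
    second = cached_square(12)  # same argument: loaded from the cache
    third = cached_square(5)    # new argument: computed again

print(first, second, third)
print(f"Function actually ran for inputs: {call_log}")
```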

In [14]:
from sklearn.base import BaseEstimator, TransformerMixin

## create a custom preprocessing transformer class, which shouldn't require serialization
class CustomPreprocessor(BaseEstimator, TransformerMixin):
    def __init__(self, method="none", threshold=128):
        self.method = method
        self.threshold = threshold
    
    def transform(self, data):
        if self.method == "normalize":
            scaler = MinMaxScaler()
            transformed_data = scaler.fit_transform(data)
            return pd.DataFrame(transformed_data)
        elif self.method == "binarize":
            binarizer = Binarizer(threshold=self.threshold)
            transformed_data = binarizer.fit_transform(data)
            return pd.DataFrame(transformed_data)
        return pd.DataFrame(data)
    
    def fit(self, X, y=None):
        return self
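
* One thing to be aware of: the `transform` method above calls `fit_transform` on whatever data it receives, so a validation fold is scaled by its own min and max rather than the training fold's. The sketch below is a possible variant, not the notebook's method, that learns the scaling in `fit` and only applies it in `transform`. The class name `FittedPreprocessor` is hypothetical, and the mixin is listed before `BaseEstimator`, which is the order scikit-learn recommends.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import Binarizer, MinMaxScaler


class FittedPreprocessor(TransformerMixin, BaseEstimator):
    """Hypothetical variant: statistics are learned in fit, reused in transform."""

    def __init__(self, method="none", threshold=128):
        self.method = method
        self.threshold = threshold

    def fit(self, X, y=None):
        # Learn any statistics from the training data only.
        if self.method == "normalize":
            self.transformer_ = MinMaxScaler().fit(X)
        elif self.method == "binarize":
            self.transformer_ = Binarizer(threshold=self.threshold).fit(X)
        else:
            self.transformer_ = None
        return self

    def transform(self, X):
        # Apply what was learned in fit; never refit here.
        if self.transformer_ is None:
            return np.asarray(X)
        return self.transformer_.transform(X)


train = np.array([[0.0], [10.0]])
held_out = np.array([[5.0], [20.0]])

scaled = FittedPreprocessor(method="normalize").fit(train).transform(held_out)
print(scaled)  # scaled with the training min (0) and max (10)
```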

In [15]:
from joblib import Memory
from sklearn.linear_model import LogisticRegression

## initialize LogisticRegression
logistic_regression = LogisticRegression(max_iter=10000, verbose=1)
param_grid = {
    "logisticregression__solver": ['saga', 'lbfgs'],
    "preprocessing__method": ["normalize", "binarize"],
}

memory = Memory(location="./cachedir", verbose=0)

## create pipeline
pipeline = Pipeline([
    ("preprocessing", CustomPreprocessor(method="normalize")),
    ("logisticregression", logistic_regression)
], memory=memory)


start = time()
grid_search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_weighted")
grid_search.fit(train_X, train_Y)
end = time()
## calculate time taken
time_taken = end - start
print(f"Time taken to train the model is {time_taken}")

[Verbose solver output omitted: SAGA epoch logs and L-BFGS-B iteration summaries for each fit, truncated in the notebook export.]

Time taken to train the model is 2524.5359501838684


In [16]:
time_taken = end - start
print(f"Time taken to train the model is {time_taken}")

Time taken to train the model is 2524.5359501838684


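* Adding the cache did not improve performance: ~2525 seconds with caching versus ~2504 seconds without it, both on a single core. This is not surprising, because `Pipeline`'s `memory` caches only the fitted transformers and never the final estimator, and here the time presumably goes into fitting `LogisticRegression` rather than into the cheap preprocessing step.
* For now, using multiple cores is the way to go for faster cross-validation and hyperparameter tuning.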
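
* As a closing sketch, here is the multi-core setup on a small synthetic dataset, so it runs in seconds rather than minutes. The dataset, the `C` values, and `n_jobs=2` are assumptions chosen for the demo and are not the notebook's MNIST configuration. After fitting, `cv_results_["mean_fit_time"]` gives the average fit time per candidate, which is one way to see where a search spends its time.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Small synthetic stand-in for the real training data.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)

demo_pipeline = Pipeline(
    [
        ("preprocessing", MinMaxScaler()),
        ("logisticregression", LogisticRegression(max_iter=1000)),
    ]
)
demo_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}

# n_jobs=2 keeps the demo light; n_jobs=-1 would use all available cores.
search = GridSearchCV(demo_pipeline, demo_grid, cv=3, scoring="f1_weighted", n_jobs=2)
search.fit(X, y)

print(f"Best params: {search.best_params_}")
for params, fit_time in zip(search.cv_results_["params"], search.cv_results_["mean_fit_time"]):
    print(f"{params}: mean fit time {fit_time:.4f} s")
```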