# The ML Modeling Process Basics 
In this notebook, we will go through some of the basic techniques for modeling data. This is a companion workbook for the 365 Data Science course on ML Process. This notebook only focuses on implementation. Check out the course or the documentation for the in-depth explanations of each step.

In this case, we will be trying to predict whether a customer will purchase an auto insurance policy, using the dataset loaded below. 

We will cover:
- Baseline creation
- Model selection
- Parameter tuning
     - manual
     - gridsearch
     - random search
     - Bayesian optimization
- Ensemble models


### On the Data 
This dataset is a good representation of real-world data that can have valuable impact when analyzed. We will be exploring the accuracy of different models for predicting if someone will purchase an auto insurance policy or not. We will first lightly explore the data, create our train and test / validation sets, then create a baseline model. To get the best results, we will compare other algorithms to our baseline and use various parameter tuning techniques to see which model performs best. At the end we will explore some ensemble models to see whether they improve on the individual models. 

The focus of this notebook is the modeling process. If you're interested in the specifics of different machine learning algorithms, check out our other course specifically on that. 

## Load Data

Next, we'll need to load our cross-sell prediction dataset. 

In [1]:
import pandas as pd 
import numpy as np

https://www.kaggle.com/datasets/sasivirat18/machine-learning-datasets

In [2]:
df = pd.read_csv(r"C:\Desktop\Cross_sell_prediction.csv")

In [3]:
#look at basic data for continuous variables 
df.describe()

Unnamed: 0,id,Age,Driving_License,Region_Code,Previously_Insured,Annual_Premium,Policy_Sales_Channel,Vintage,Response
count,381109.0,381109.0,381109.0,381109.0,381109.0,381109.0,381109.0,381109.0,381109.0
mean,190555.0,38.822584,0.997869,26.388807,0.45821,30564.389581,112.034295,154.347397,0.122563
std,110016.836208,15.511611,0.04611,13.229888,0.498251,17213.155057,54.203995,83.671304,0.327936
min,1.0,20.0,0.0,0.0,0.0,2630.0,1.0,10.0,0.0
25%,95278.0,25.0,1.0,15.0,0.0,24405.0,29.0,82.0,0.0
50%,190555.0,36.0,1.0,28.0,0.0,31669.0,133.0,154.0,0.0
75%,285832.0,49.0,1.0,35.0,1.0,39400.0,152.0,227.0,0.0
max,381109.0,85.0,1.0,52.0,1.0,540165.0,163.0,299.0,1.0


In [4]:
#np.object was removed from recent NumPy versions; the builtin object works the same way here
df.describe(include=object)

Unnamed: 0,Gender,Vehicle_Age,Vehicle_Damage
count,381109,381109,381109
unique,2,3,2
top,Male,1-2 Year,Yes
freq,206089,200316,192413


In [5]:
#check for null values; there are none here, so the dropna() used later is only a safeguard
df.isnull().sum()

id                      0
Gender                  0
Age                     0
Driving_License         0
Region_Code             0
Previously_Insured      0
Vehicle_Age             0
Vehicle_Damage          0
Annual_Premium          0
Policy_Sales_Channel    0
Vintage                 0
Response                0
dtype: int64

In [6]:
# check for possible nulls in categoricals / non answers 
for i in df.select_dtypes(include=['object']).columns:
    print(df[i].value_counts())

Male      206089
Female    175020
Name: Gender, dtype: int64
1-2 Year     200316
< 1 Year     164786
> 2 Years     16007
Name: Vehicle_Age, dtype: int64
Yes    192413
No     188696
Name: Vehicle_Damage, dtype: int64


In [7]:
#dropping variable for now, but could likely be used to improve our models with some engineering! 
df.Policy_Sales_Channel.value_counts()

152.0    134784
26.0      79700
124.0     73995
160.0     21779
156.0     10661
          ...  
144.0         1
149.0         1
84.0          1
143.0         1
43.0          1
Name: Policy_Sales_Channel, Length: 155, dtype: int64

In [8]:
df_trimmed = df.loc[:,['Gender','Age','Driving_License','Previously_Insured','Vehicle_Age','Vehicle_Damage','Annual_Premium','Vintage','Response']]

In [9]:
#drop null values and create dummy variables
df_final = pd.get_dummies(df_trimmed).dropna()

In [10]:
df_final.columns

Index(['Age', 'Driving_License', 'Previously_Insured', 'Annual_Premium',
       'Vintage', 'Response', 'Gender_Female', 'Gender_Male',
       'Vehicle_Age_1-2 Year', 'Vehicle_Age_< 1 Year', 'Vehicle_Age_> 2 Years',
       'Vehicle_Damage_No', 'Vehicle_Damage_Yes'],
      dtype='object')

In [11]:
df_final.Response.value_counts()

0    334399
1     46710
Name: Response, dtype: int64

In [12]:
#Create train test split 

from sklearn.model_selection import train_test_split
X = df_final.drop('Response', axis =1)
y = df_final.loc[:,['Response']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [13]:
#balance the data (SMOTE) Try this if interested: https://www.kaggle.com/code/kenjee/dealing-with-imbalanced-data-section-10
"""from imblearn.over_sampling import SMOTE 
smote = SMOTE(sampling_strategy =1)

X_train, y_train = smote.fit_resample(X_train,y_train)"""

'from imblearn.over_sampling import SMOTE \nsmote = SMOTE(sampling_strategy =1)\n\nX_train, y_train = smote.fit_resample(X_train,y_train)'

# Creating a Baseline Model
How can we tell if our machine learning models are any good? To evaluate performance, we need to benchmark against something. In this case, we will create two baselines for our model. First, we can simply look at the average of our data for a numeric value. If we were going to predict the age, we could simply guess the average age for every candidate. 

On the other hand, for a categorical variable, we could simply guess 50/50 or the ratio of the categories in the data. In this case, the response data is imbalanced, with 334,399 of 381,109 samples in the non-purchase category and only 46,710 in the purchase category. That means that if we guessed that no one in the sample purchased a policy, we would have an 87.7% success rate. Because the data is imbalanced, this would not be a good baseline for our model.
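
The majority-class figure can be checked directly from the class counts printed by `df_final.Response.value_counts()` earlier. This is only a small arithmetic sketch; the variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the Response value_counts() output above
n_negative = 334399  # Response == 0 (did not purchase)
n_positive = 46710   # Response == 1 (purchased)

n_total = n_negative + n_positive

# Accuracy of a "model" that always predicts the majority class (0)
majority_accuracy = n_negative / n_total

# Share of positive samples, i.e. how imbalanced the target is
positive_rate = n_positive / n_total

print(f"always-0 accuracy: {majority_accuracy:.1%}")
print(f"positive rate:     {positive_rate:.1%}")
```

Any model we train has to be judged against this kind of trivial guess, which is why accuracy alone can look deceptively good on this data.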

One of the most important steps that we need to take is choosing a good evaluation metric. The notebook that covers specific evaluation metrics can be located here: 

Accuracy does not make sense because of the imbalanced nature of the data. For this example we will use F1 score as our model evaluation metric.

- F1 is calculated by 2*((precision*recall)/(precision+recall))

- Instead of a simple accuracy calculation, which would give us a baseline of 87.7%, F1 score gives us an undefined number (typically reported as 0), since a model that only predicted negatives would have a recall of 0 and an undefined precision because it makes no positive predictions. 

- In this case, we want to use a simple baseline model like Naive Bayes to set our baseline based on F1 score. You can use most models to create a baseline, but I like Naive Bayes because it is quick and doesn't require much parameter tuning. (Full breakdown of Naive Bayes in our Algorithms Course)
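
To make the F1 formula above concrete, here is a minimal sketch that computes precision, recall, and F1 from confusion-matrix counts. The helper `f1_from_counts` and the second set of counts are hypothetical, and treating a zero denominator as 0.0 is an assumption of this sketch (one common convention), not something the notebook defines.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts.

    A ratio with a zero denominator is reported as 0.0 here, which is
    one common convention (an assumption of this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# A model that never predicts the positive class: no true or false positives,
# so every one of the 46,710 positive samples is a false negative.
always_negative = f1_from_counts(tp=0, fp=0, fn=46710)
print(always_negative)

# A hypothetical model: 30 true positives, 70 false positives, 20 false negatives.
hypothetical = f1_from_counts(tp=30, fp=70, fn=20)
print(hypothetical)
```

The always-negative model scores high on accuracy but gets nothing from F1, which is the behavior we want from a metric on imbalanced data.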

In [14]:
#import cross validation score
from sklearn.model_selection import cross_val_score

#import Naive Bayes Classifier 
from sklearn.naive_bayes import GaussianNB

#create classifier object
nb = GaussianNB()

#run cv for NB classifier
nb_accuracy = cross_val_score(nb,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
#scoring ='f1' is the F1 of the positive class (not the macro average)
nb_f1 = cross_val_score(nb,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

print('nb_accuracy: ' +str(nb_accuracy))
print('nb F1 Score: '+str(nb_f1))
print('nb_accuracy_avg: ' + str(nb_accuracy.mean()) +'  |  nb_f1_avg: '+str(nb_f1.mean()))


#With these F1 scores, we can begin evaluating our model. While the accuracy is lower than if we only predicted 0 every time,
# our f1 score suggests we are doing a far better job of predicting purchase outcomes. 

nb_accuracy: [0.74467535 0.74676413 0.7414113 ]
nb F1 Score: [0.40969243 0.41583439 0.41445341]
nb_accuracy_avg: 0.7442835922580899  |  nb_f1_avg: 0.4133267459130525

# Model Comparison & Selection 
After we have a baseline model to compare against, we want to evaluate how other models might perform on the same data. I like to experiment with other basic models with very little parameter tuning to see what performs well. This isn't an exact science and many people may do this step differently. After we set up the models, we can begin experimenting with parameter tuning. I find that model selection and parameter tuning is often an iterative process. For an analysis like this, trying different models, changing parameters, and experimenting with new engineered features is where I find myself spending most of my time. 

In this section we will try:
- Logistic regression
- Decision Tree
- K Nearest Neighbors (KNN)

In [None]:
#Let's now experiment with a few different basic models 

## Decision Tree
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state =32)

dt_accuracy = cross_val_score(dt,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
dt_f1 = cross_val_score(dt,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

print('dt_accuracy: ' +str(dt_accuracy))
print('dt F1 Score: '+str(dt_f1))
print('dt_accuracy_avg: ' + str(dt_accuracy.mean()) +'  |  dt_f1_avg: '+str(dt_f1.mean())+'\n')


## Logistic Regression
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=32, max_iter = 2000, class_weight = 'balanced')

lr_accuracy = cross_val_score(lr,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
lr_f1 = cross_val_score(lr,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

print('lr_accuracy: ' +str(lr_accuracy))
print('lr F1 Score: '+str(lr_f1))
print('lr_accuracy_avg: ' + str(lr_accuracy.mean()) +'  |  lr_f1_avg: '+str(lr_f1.mean())+'\n')


## KNN 
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline, Pipeline

#KNN is distance based, so we scale the features before fitting
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_accuracy = cross_val_score(knn,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
knn_f1 = cross_val_score(knn,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

print('knn_accuracy: ' +str(knn_accuracy))
print('knn F1 Score: '+str(knn_f1))
print('knn_accuracy_avg: ' + str(knn_accuracy.mean()) +'  |  knn_f1_avg: '+str(knn_f1.mean()))


dt_accuracy: [0.82060365 0.82309812 0.82119764]
dt F1 Score: [0.27935131 0.27789764 0.27746978]
dt_accuracy_avg: 0.8216331342755206  |  dt_f1_avg: 0.2782395764131652

lr_accuracy: [0.63564087 0.68287883 0.4755693 ]
lr F1 Score: [0.39506357 0.41276915 0.24206472]
lr_accuracy_avg: 0.5980296667946023  |  lr_f1_avg: 0.34996581243326824



# Model Comparison 
It looks like we chose a pretty good baseline. While it trails the decision tree in accuracy, it outperforms all of our new models in F1 score, which is what we care about most for this analysis. Let's look at how everything stacks up. 

|Model          | F1 Score      |
| :------------ | :-----------: |
| **Baseline Naive Bayes**  | **41.3%**     |
| Logistic Regression  | **35.0%**     |
| Decision Tree  | **27.6%**     |
| K Nearest Neighbors | **24.6%**     |



None of our new models outperformed our baseline on F1 score, but we can still do better. We can now parameter tune! That means that we make adjustments to the model parameter inputs to better fit our specific data. One of the drawbacks of Naive Bayes is that it has virtually no parameters that we can tune, so our initial results are about the best we will get with it without making changes to our data. 

# Manual Parameter Tuning
Let's try to do some parameter tuning with a few of these models:

Let's start with K Nearest Neighbors, which has a few parameters we can adjust, one of them being the value of k. K is how many other datapoints it uses to make its classification. If k = 3, it looks at the sample's 3 closest neighbors and classifies the sample as the most common class among them. If k = 5, it uses its 5 closest datapoints. Let's change the value of k and see if that changes our results. 
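
To illustrate how k changes the outcome, here is a tiny from-scratch sketch of the nearest-neighbor vote on made-up points. The function `knn_predict`, the toy points, and the query are all hypothetical; the notebook itself uses scikit-learn's `KNeighborsClassifier`.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, imbalanced data: one positive point surrounded by negatives
train = [
    ((0.0, 0.0), 1),
    ((1.0, 0.0), 0),
    ((0.0, 1.0), 0),
    ((1.0, 1.0), 0),
    ((2.0, 2.0), 0),
]
query = (0.1, 0.1)

for k in (1, 3, 5):
    print("k =", k, "->", knn_predict(train, query, k))
```

With k = 1 the query takes the label of the single positive point next to it, while larger k values pull in more majority-class neighbors and flip the prediction to 0.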

In [None]:
#Knn Model Comparison 

#here we will loop through and see which value of k performs the best. 

for i in range(1,7):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=i))
    knn_f1 = cross_val_score(knn,X_train,y_train.values.ravel(), cv=3, scoring ='f1')
    print('K ='+(str(i)) + (': ') + str(knn_f1.mean()))

#What we find is that k=1 is the best estimator for this specific model. We go from 24.6% to 27.6%, a decent improvement! 
#We also realize that KNN may not be the best approach here because of the imbalanced data. 
#The larger the K is, the more of the majority class will automatically be included.

# Randomized Parameter Tuning
Since KNN may not be the best choice, let's explore the decision tree. Decision trees have a lot more parameters we can tune. We can tweak the following:
- criterion {gini, entropy, log_loss}
- splitter {best, random}
- max_depth {int, None}
- min_samples_split {int, float}
- min_samples_leaf {int, float}
- min_weight_fraction_leaf {float}
- max_features {int, float, sqrt, log2, None}
- max_leaf_nodes {int, None}
- min_impurity_decrease {float}
- class_weight {dict, balanced, None}
- ccp_alpha {float}

There are a lot of parameters to tune! If there are just 2 options for each one, that would be 2^11, which is 2048 total configurations. In theory, there are an infinite number of parameter configurations. How do we even get close to finding the best one? 

The answer here is randomized search. We throw in all the parameters that we are interested in searching, and the search randomly samples a subset of the possible combinations and returns the one that produces the best results. 
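
The idea can be sketched with the standard library alone: build every combination of a small grid, then draw a random subset to evaluate. The grid values and the sample size of 20 are illustrative assumptions, and `param_grid`, `all_combinations`, and `sampled` are hypothetical names. In the notebook, `RandomizedSearchCV` does the sampling and also scores each sampled combination with cross-validation, which this sketch leaves out.

```python
import itertools
import random

# Illustrative grid of decision tree parameters (values are assumptions)
param_grid = {
    "criterion": ["gini", "entropy"],
    "splitter": ["best", "random"],
    "max_depth": [2, 5, 10, 20, 40, None],
    "min_samples_split": [2, 5, 10, 15],
    "max_features": ["sqrt", "log2", None],
}

# Every possible combination: 2 * 2 * 6 * 4 * 3 of them
names = list(param_grid)
all_combinations = [
    dict(zip(names, values))
    for values in itertools.product(*param_grid.values())
]
print("total combinations:", len(all_combinations))

# Randomized search only evaluates a random subset (here 20), drawn without replacement
rng = random.Random(42)
sampled = rng.sample(all_combinations, k=20)
print("combinations to evaluate:", len(sampled))
print("first sampled candidate:", sampled[0])
```

The trade-off is coverage for speed: we evaluate a fraction of the grid and accept that the very best combination may not be among the samples.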

Still, let's manually select a few parameters we want to evaluate and then use randomized search:
- criterion
- split strategy
- max depth
- min_samples_split
- max features

In [None]:
from sklearn.model_selection import RandomizedSearchCV

dt = DecisionTreeClassifier(random_state = 42)

#'auto' is no longer a valid max_features option in recent scikit-learn versions, so it is left out
features = {'criterion': ['gini','entropy'],
            'splitter': ['best','random'],
           'max_depth': [2,5,10,20,40,None],
           'min_samples_split': [2,5,10,15],
           'max_features': ['sqrt','log2',None]}

rs_dt = RandomizedSearchCV(estimator = dt, param_distributions =features, n_iter =100, cv = 3, random_state = 42, scoring ='f1')

#pass y as a 1d array, as in the earlier cells
rs_dt.fit(X_train,y_train.values.ravel())

In [None]:
print('best score = ' + str(rs_dt.best_score_))
print('best params = ' + str(rs_dt.best_params_))

# GridsearchCV (Exhaustive Parameter Tuning)
With this we have improved our model's F1 score from **27.6% to 27.9%**. This is a decent increase! We also narrowed down some of the parameter values that produced good results. We may want to try a more exhaustive search this time. Grid search goes through all of the possible combinations within a range and returns the best outcome. 

This time, let's do an exhaustive search of a smaller number of features and see if we can improve our results even more. 
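
Because grid search tries every combination, it helps to count the work before running it. The sketch below mirrors the grid used in the next cell (with `range` standing in for `np.arange`) and multiplies the number of combinations by the number of cross-validation folds. The names `grid`, `n_combinations`, and `n_fits` are illustrative.

```python
import math

# Same shape as the grid searched in the next cell
grid = {
    "criterion": ["entropy"],
    "splitter": ["random"],
    "max_depth": list(range(30, 50)),          # 20 values: 30, 31, ..., 49
    "min_samples_split": [2, 3, 4, 5, 6, 7, 8, 9],
    "max_features": [None],
}

# Grid search evaluates the full Cartesian product of the value lists
n_combinations = math.prod(len(values) for values in grid.values())

# Each combination is fit once per cross-validation fold
cv_folds = 3
n_fits = n_combinations * cv_folds

print("combinations:", n_combinations)
print("model fits:  ", n_fits)
```

Fixing three of the five parameters to a single value is what keeps this exhaustive search affordable.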

In [None]:
from sklearn.model_selection import GridSearchCV


features_gs = {'criterion': ['entropy'],
            'splitter': ['random'],
           'max_depth': np.arange(30,50,1), #getting more precise within range
           'min_samples_split': [2,3,4,5,6,7,8,9],
           'max_features': [None]}

gs_dt = GridSearchCV(estimator = dt, param_grid =features_gs, cv = 3, scoring ='f1') #we don't need random state because there isn't randomization like before

gs_dt.fit(X_train,y_train.values.ravel()) #pass y as a 1d array, as in the earlier cells

In [None]:
print('best score = ' + str(gs_dt.best_score_))
print('best params = ' + str(gs_dt.best_params_))

#looks like we can do a little better with this gridsearch! 

# Bayesian Optimization
I wonder if we can do better than the funnel approach that we took with random search and grid search. What if we used a slightly smarter algorithm to help evaluate our parameters? Maybe we could explore all of the variables from the previous examples and see if our model missed something. This is where Bayesian Optimization comes in. This is an iterative process where the search improves its understanding of the parameter inputs as it goes. (Full breakdown in the video portion of the course)

Now let's try to use this with a larger parameter set on the same classifier. This won't guarantee a better result, as it still is not an exhaustive search, but in theory it lets us cover ground in a more efficient way. 

In [None]:
from skopt import BayesSearchCV
from sklearn.model_selection import StratifiedKFold

In [None]:

from skopt.space import Real, Categorical, Integer

# Choose cross validation method 
cv = StratifiedKFold(n_splits = 3)


# Bayesian search over the decision tree's parameters
# ('auto' is no longer a valid max_features option in recent scikit-learn versions, so it is left out)
bs_dt = BayesSearchCV(
    dt,
    {'criterion': Categorical(['gini','entropy']),
            'splitter': Categorical(['best','random']),
           'max_depth': Integer(10,50),
           'min_samples_split': Integer(2,15),
           'max_features': Categorical(['sqrt','log2',None])},
    random_state=42,
    n_iter= 100,
    cv= cv,
    scoring ='f1')
 
bs_dt.fit(X_train,y_train.values.ravel())

In [None]:
print('best score = ' + str(bs_dt.best_score_))
print('best params = ' + str(bs_dt.best_params_))

#while this didn't outperform our gridsearch, it is still a good approach to try when dealing with many different parameter options. 
#it still did outperform our original random search.

# Selecting a Model
We still haven't been able to do better than our baseline. In most cases, we need to tune multiple different models until we reach one that performs the best based on our evaluation criteria. We also want to use other considerations like training time, prediction time, or interpretability to select the best model for our use case. 

Since we have one tuned model, let's see if we can improve it by combining it with a few of the other models we have used. This process is called ensembling. In the case of classification, we often use a majority (popular) vote across the models to select the predicted class. 

Let's see if an ensemble model of these three classifiers outperforms our baseline model. 
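
Before running the ensemble, here is a minimal sketch of the two voting rules used in the cells below (`voting = 'hard'` and `voting = 'soft'`), written from scratch for a single sample. The probabilities are made up, and `hard_vote` and `soft_vote` are hypothetical helpers rather than scikit-learn functions.

```python
from collections import Counter


def hard_vote(labels):
    """Return the class predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(positive_probabilities):
    """Average each model's P(class 1) and predict 1 if the mean is at least 0.5."""
    mean_probability = sum(positive_probabilities) / len(positive_probabilities)
    return int(mean_probability >= 0.5)


# Hypothetical P(class 1) from three models for one sample
probabilities = [0.45, 0.40, 0.95]

# Each model's own label at a 0.5 threshold: [0, 0, 1]
labels = [int(p >= 0.5) for p in probabilities]

print("hard vote:", hard_vote(labels))
print("soft vote:", soft_vote(probabilities))
```

Here two models lean weakly toward 0 and one is very confident in 1, so the hard vote returns 0 while the soft vote returns 1. That difference is why it is worth trying both settings.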

In [None]:
from sklearn.ensemble import VotingClassifier

dt_voting = DecisionTreeClassifier(**{'criterion': 'entropy', 'max_depth': 44, 'max_features': None, 'min_samples_split': 2, 'splitter': 'random'}) # ** allows you to pass in parameters as dict
knn_voting = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
lr_voting = LogisticRegression(random_state=32, max_iter = 2000, class_weight = 'balanced')

ens = VotingClassifier(estimators = [('dt', dt_voting), ('knn', knn_voting), ('lr',lr_voting)], voting = 'hard')


In [None]:
voting_accuracy = cross_val_score(ens,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
voting_f1 = cross_val_score(ens,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

print('voting_accuracy: ' +str(voting_accuracy))
print('voting F1 Score: '+str(voting_f1))
print('voting_accuracy_avg: ' + str(voting_accuracy.mean()) +'  |  voting_f1_avg: '+str(voting_f1.mean()))

In [None]:
ens = VotingClassifier(estimators = [('dt', dt_voting), ('knn', knn_voting), ('lr',lr_voting)], voting = 'soft')
voting_accuracy = cross_val_score(ens,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
voting_f1 = cross_val_score(ens,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

print('voting_accuracy: ' +str(voting_accuracy))
print('voting F1 Score: '+str(voting_f1))
print('voting_accuracy_avg: ' + str(voting_accuracy.mean()) +'  |  voting_f1_avg: '+str(voting_f1.mean()))

# Stacked classifier 
In the case of the voting classifier, we didn't get better performance than our baseline model. Let's now try another type of ensembling called stacking. With stacking, we use the outputs of each of our individual models as features for a new model. In this case, where we have a decision tree, a logistic regression, and a naive Bayes classifier, their outputs will be the three features that a new model predicts on. 

Let's try running these three through a Naive Bayes Classifier and see what the results look like. 
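
To show what "outputs as features" means, the sketch below builds the stacked feature matrix by hand on a small synthetic dataset. The data from `make_classification`, the model settings, and the names `X_demo`, `y_demo`, `meta_features`, and `final_model` are all assumptions for illustration. `StackingClassifier` in the next cell handles these steps internally, so this is a simplified picture of the idea rather than its exact implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Small synthetic, imbalanced dataset standing in for the real training data
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=0
)

base_models = {
    "dt": DecisionTreeClassifier(max_depth=5, random_state=0),
    "lr": LogisticRegression(max_iter=1000),
    "nb": GaussianNB(),
}

# Out-of-fold P(class 1) from each base model becomes one column of new features
meta_columns = [
    cross_val_predict(model, X_demo, y_demo, cv=3, method="predict_proba")[:, 1]
    for model in base_models.values()
]
meta_features = np.column_stack(meta_columns)
print(meta_features.shape)  # one row per sample, one column per base model

# The final estimator is trained on those three columns instead of the raw features
final_model = GaussianNB().fit(meta_features, y_demo)
print(final_model.predict(meta_features[:5]))
```

Using out-of-fold predictions keeps the final model from learning on outputs the base models produced for samples they were trained on.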

In [None]:
from sklearn.ensemble import StackingClassifier

ens_stack = StackingClassifier(estimators = [('dt', dt_voting), ('lr',lr_voting), ('nb',GaussianNB())], final_estimator = GaussianNB())

In [None]:
stack_accuracy = cross_val_score(ens_stack,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
stack_f1 = cross_val_score(ens_stack,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

print('stacking_accuracy: ' +str(stack_accuracy))
print('stacking F1 Score: '+str(stack_f1))
print('stacking_accuracy_avg: ' + str(stack_accuracy.mean()) +'  |  stack_f1_avg: '+str(stack_f1.mean()))

#in this case it didn't outperform, but it often does.

# Ensemble Models
The last main type of ensemble approach that we see is one that is designed that way algorithmically. Typically, random forest or gradient boosted models have ensembling built into their implementation. Let's explore random forest and see how this approach works for our data. (We have a breakdown of the main ensembling techniques in our full course on algorithms). These algorithms leverage multiple decision trees to either vote or pass information on to subsequent models. 

In [None]:
from sklearn.ensemble import RandomForestClassifier

#first let's try a non-tuned implementation 
rf = RandomForestClassifier(random_state=42)

rf_accuracy = cross_val_score(rf,X_train,y_train.values.ravel(), cv=3, scoring ='accuracy')
rf_f1 = cross_val_score(rf,X_train,y_train.values.ravel(), cv=3, scoring ='f1')

In [None]:
print('rf_accuracy: ' +str(rf_accuracy))
print('rf F1 Score: '+str(rf_f1))
print('rf_accuracy_avg: ' + str(rf_accuracy.mean()) +'  |  rf_f1_avg: '+str(rf_f1.mean()))

#of course, you can tune this model like the others! 

In [None]:
#preview the max_depth values used in the search below
print([int(x) for x in np.linspace(10, 110, num = 11)]+[None])

In [None]:
random_grid = {'n_estimators': [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)],
               'max_features': ['sqrt', 'log2'], #'auto' is no longer valid in recent scikit-learn versions
               'max_depth': [int(x) for x in np.linspace(10, 110, num = 11)]+[None],
               'min_samples_split': [2, 5, 10],
               'min_samples_leaf': [1, 2, 4],
               'bootstrap': [True, False]}
rs_rf = RandomizedSearchCV(estimator = rf, param_distributions =random_grid, n_iter =100, cv = 3, random_state = 42, scoring ='f1')

rs_rf.fit(X_train,y_train.values.ravel())

In [None]:
from sklearn.metrics import f1_score

nb.fit(X_train,y_train.values.ravel())
ens.fit(X_train,y_train.values.ravel())
dt_voting.fit(X_train,y_train.values.ravel())
ens_stack.fit(X_train,y_train.values.ravel())
rf_est = RandomForestClassifier()
rf_est.fit(X_train,y_train.values.ravel())

nb_pred = nb.predict(X_test)
ens_pred = ens.predict(X_test)
dt_pred = dt_voting.predict(X_test)
ens_stack_pred = ens_stack.predict(X_test)
rf_pred = rf_est.predict(X_test)

print('baseline score ' + str(f1_score(y_test,nb_pred)))
print('dt score ' + str(f1_score(y_test,dt_pred)))
print('voting score ' + str(f1_score(y_test,ens_pred)))
print('Stacking score ' + str(f1_score(y_test,ens_stack_pred)))
print('rf score ' + str(f1_score(y_test,rf_pred)))

# Summary
In this notebook we covered the following:
- Baseline creation
- Model selection
- Parameter tuning
     - manual
     - gridsearch
     - random search
     - Bayesian optimization
- Ensemble models


## Additional Resources
### Hyperparameter Tuning
- [What does "baseline" mean in the context of machine learning?](https://datascience.stackexchange.com/questions/30912/what-does-baseline-mean-in-the-context-of-machine-learning)
- [Sklearn's Dummy Estimators](https://scikit-learn.org/stable/modules/model_evaluation.html#dummy-estimators)
- [7 Hyperparameter Optimization Techniques Every Data Scientist Should Know](https://towardsdatascience.com/7-hyperparameter-optimization-techniques-every-data-scientist-should-know-12cdebe713da)
- [A Comprehensive Guide on Hyperparameter Tuning and its Techniques](https://www.analyticsvidhya.com/blog/2022/02/a-comprehensive-guide-on-hyperparameter-tuning-and-its-techniques/)
- [Hyperparameter tuning in Python by Tooba Jamal](https://towardsdatascience.com/hyperparameter-tuning-in-python-21a76794a1f7)
- [Random Search for Hyper-Parameter Optimization by James Bergstra and Yoshua Bengio](https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf)
- [A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning by Will Koehrsen](https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f)
- [Bayesian Optimization Primer by SigOpt](https://static.sigopt.com/b/20a144d208ef255d3b981ce419667ec25d8412e2/static/pdf/SigOpt_Bayesian_Optimization_Primer.pdf)
- [Genetic Algorithms by Marcos Del Cueto](https://towardsdatascience.com/genetic-algorithm-to-optimize-machine-learning-hyperparameters-72bd6e2596fc)
- [Simulated Annealing From Scratch in Python by Jason Brownlee](https://machinelearningmastery.com/simulated-annealing-from-scratch-in-python/#:~:text=Simulated%20Annealing-,Simulated%20Annealing%20is%20a%20stochastic%20global%20search%20optimization%20algorithm.,it%20easier%20to%20work%20with.)
- [Optimization Techniques — Simulated Annealing by Frank Liang](https://towardsdatascience.com/optimization-techniques-simulated-annealing-d6a4785a1de7)
- [Hyperparameter optimization for Neural Networks](http://neupy.com/2016/12/17/hyperparameter_optimization_for_neural_networks.html#id13)

### Ensembling
- [Ensemble Methods: Elegant Techniques to Produce Improved Machine Learning Results](https://www.toptal.com/machine-learning/ensemble-methods-machine-learning#:~:text=Ensemble%20methods%20are%20techniques%20that,than%20a%20single%20model%20would)
- [A Gentle Introduction to Ensemble Learning Algorithms by Jason Brownlee](https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/#:~:text=The%20three%20main%20classes%20of,on%20your%20predictive%20modeling%20project.)
- [Types of Ensemble methods in Machine learning by Anju Tajbangshi](https://towardsdatascience.com/types-of-ensemble-methods-in-machine-learning-4ddaf73879db)
- [Introduction to Ensembling/Stacking in Python by Anisotropic](https://www.kaggle.com/code/arthurtok/introduction-to-ensembling-stacking-in-python)
- [Ensembles and Model Stacking by Eshaan Kirpal](https://www.kaggle.com/code/eshaan90/ensembles-and-model-stacking/notebook)