In [1]:
from sklearn.datasets import make_circles, make_moons
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import warnings

warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)


def prepare_data(func):
    X, Y = func(n_samples=1000, noise=0.2, random_state=0)
    tl = int(len(X) * 0.8)
    return X[:tl], X[tl:], Y[:tl], Y[tl:]


def plot_data(X_test, Y_test, Y_pred):
    fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(7, 3))
    sns.scatterplot(x=X_test[:, 0], y=X_test[:, 1], hue=Y_test, ax=ax1)
    ax1.set_title("Actual labels")
    sns.scatterplot(x=X_test[:, 0], y=X_test[:, 1], hue=Y_pred, ax=ax2)
    ax2.set_title("Predicted labels")
    plt.tight_layout()
    plt.show()


def confusion_matrix(Y_test, Y_pred, which):
    y_actu = pd.Series(Y_test, name="Actual")
    y_pred = pd.Series(Y_pred, name="Predicted")
    df_confusion = pd.crosstab(y_actu, y_pred)
    print(f"Confusion matrix for {which} data:")
    display(df_confusion)
    return df_confusion

# Decision Trees

## What are the advantages and disadvantages of decision trees?

Advantages:
- inexpensive to construct
- extremely fast at classifying unknown records
- easy to interpret for small-sized trees
- can easily handle redundant or irrelevant attributes (unless the attributes are interacting)

Disadvantages:
- space of possible decision trees is exponentially large
- greedy approaches are often unable to find the best tree
- does not take into account interactions between attributes
- each decision boundary involves only a single attribute

## How do you build a decision tree model?

To build a decision tree model:
1. Choose a splitting criterion (e.g., information gain, Gini impurity).
2. Select the best attribute to split on based on the splitting criterion.
3. Recursively apply steps 1 and 2 to each child node until a stopping criterion is met.
4. Assign a class label to each leaf node based on the majority class of the training instances in that node.

In [2]:
from sklearn.tree import DecisionTreeClassifier


def decision_boundary(func, which):
    X, Y = func(n_samples=1000, noise=0.2, random_state=0)
    tl = int(len(X) * 0.8)
    X_train, X_test, Y_train, Y_test = X[:tl], X[tl:], Y[:tl], Y[tl:]
    clf = DecisionTreeClassifier(
        criterion="gini",
        splitter="best",
        min_samples_split=2,
        min_samples_leaf=1,
        min_weight_fraction_leaf=0,
        min_impurity_decrease=0,
    )

    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    return confusion_matrix(Y_test, Y_pred, which)


dtc = decision_boundary(make_circles, "circles")
dtm = decision_boundary(make_moons, "moons")

Confusion matrix for circles data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,59,42
1,44,55


Confusion matrix for moons data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,95,6
1,6,93


## What are the differences between classification and regression trees?

Classification trees are used when the target variable is categorical, while regression trees are used when the target variable is continuous. The splitting criterion for classification trees is typically based on impurity measures such as entropy or Gini index, while regression trees use variance reduction.

## What are impurity measures in decision trees?

Impurity measures are used to determine the best split at each node in a decision tree. The impurity of a node is a measure of how mixed the classes are in the node. Common impurity measures include entropy, Gini index, and misclassification error.

$$ \text{Entropy}(t) = -\sum_{c=1}^{C} p(c|t) \log_2(p(c|t)) $$

$$0 \leq \text{Entropy}(t) \leq 1$$

$$ \text{Gini}(t) = 1 - \sum_{c=1}^{C} [p(c|t)]^2 $$

$$0 \leq \text{Gini}(t) \leq 0.5$$

$$ \text{Misclassification error}(t) = 1 - \max_c[p(c|t)] $$

$$0 \leq \text{Misclassification error}(t) \leq 1$$

Where $t$ is the current node, $C$ is the number of classes, and $p(c|t)$ is the proportion of the samples that belong to class $c$ at node $t$. The impurity measure is used to calculate the information gain or Gini gain for each split.

### How do you use information gain to decide split?
1. Calculate the entropy of the target. $$ \text{Entropy}(S) = - \sum_{i=1}^n p_i \log_2 p_i $$
2. Calculate the entropy of the target for each feature. $$ \text{Entropy}(S, A) = \sum_{i=1}^n \frac{|S_i|}{|S|} \text{Entropy}(S_i) $$ where $S_i$ is the subset of $S$ for which feature $A$ has value $i$. $$ \text{Entropy}(S_i) = - \sum_{i=1}^n p_i \log_2 p_i $$
3. Calculate the information gain for each feature. $$ \text{Information Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(S, A) $$
4. Choose the feature with the highest information gain.

### How do you use Gini index to decide split?
1. Calculate the Gini index of the target. $$ \text{Gini}(S) = 1 - \sum_{i=1}^n p_i^2 $$
2. Calculate the Gini index of the target for each feature. $$ \text{Gini}(S, A) = \sum_{i=1}^n \frac{|S_i|}{|S|} \text{Gini}(S_i) $$ where $S_i$ is the subset of $S$ for which feature $A$ has value $i$. $$ \text{Gini}(S_i) = 1 - \sum_{i=1}^n p_i^2 $$
3. Calculate the information gain for each feature. $$ \text{Gini Gain}(S, A) = \text{Gini}(S) - \text{Gini}(S, A) $$
4. Choose the feature with the highest gini gain.

# Random Forests

## How do you build a random forest model?

To build a random forest model:
1. Randomly sample the training data with replacement to create multiple bootstrap samples.
2. For each bootstrap sample, grow a decision tree with the following modifications:
    - At each node, randomly select a subset of features to consider for splitting.
    - Split the node using the feature that provides the best split among the randomly selected features.
3. Repeat steps 1 and 2 to create a forest of decision trees.
4. To make predictions, aggregate the predictions of all trees in the forest (e.g., majority vote for classification, average for regression).
    - Hard voting: Each tree votes for a class, and the majority class is chosen.
    - Soft voting: Each tree predicts a probability, and the average probability is taken.

In [3]:
from sklearn.ensemble import RandomForestClassifier


def random_forest(func, which):
    X, Y = func(n_samples=1000, noise=0.2, random_state=0)
    tl = int(len(X) * 0.8)
    X_train, X_test, Y_train, Y_test = X[:tl], X[tl:], Y[:tl], Y[tl:]
    clf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    return confusion_matrix(Y_test, Y_pred, which)


rfc = random_forest(make_circles, "circles")
rfm = random_forest(make_moons, "moons")

Confusion matrix for circles data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,59,42
1,38,61


Confusion matrix for moons data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,96,5
1,6,93


## What is pruning in a decision tree algorithm?

Pruning is a technique used to prevent overfitting in decision trees. There are two main types of pruning:
- Pre-pruning: Stop growing the tree when a certain condition is met (e.g., maximum depth, minimum samples per leaf).
- Post-pruning: Grow the tree to its maximum size and then remove nodes that do not provide significant improvement in impurity measures.

## What is the difference between the bagging and boosting model?

Bagging (Bootstrap Aggregating) and boosting are ensemble learning techniques that combine multiple models to improve performance.

Bagging:
- Train multiple models independently on different bootstrap samples of the training data (sampling with replacement).
- Combine the predictions of the models through averaging or voting.
- Examples: Random Forest, Bagged Decision Trees.

Pasting: Similar to bagging, but samples are drawn without replacement.

Boosting:
- Train models sequentially, where each model corrects the errors of its predecessor.
- Weight the training instances based on their performance in previous iterations.
- Examples: AdaBoost, Gradient Boosting.

## Describe random forests and their advantages over single-decision trees.

Random forests are an ensemble learning method that combines multiple decision trees to improve performance and reduce overfitting.
Advantages of random forests over single-decision trees:
- Reduced overfitting: By aggregating the predictions of multiple trees, random forests are less prone to overfitting.
- Improved accuracy: Random forests tend to have higher accuracy than single-decision trees.
- Feature importance: Random forests can provide information on the importance of features in the classification process.
- Robustness: Random forests are robust to noise and outliers in the data.
- Parallelization: Random forests can be easily parallelized to speed up training.

# Gradient Boosting

## What is gradient boosting and how does it work?

Gradient boosting is an ensemble learning technique that builds a model in a stage-wise fashion. It combines multiple weak learners (typically decision trees) to create a strong learner. The key idea behind gradient boosting is to fit a new model to the residual errors made by the previous model.

The algorithm works as follows:
1. Fit an initial model (e.g., decision tree) to the data.
2. Calculate the residuals (errors) between the predicted values and the actual values.
3. Fit a new model to the residuals, with the goal of reducing the residuals.

This process is repeated iteratively, with each new model fitting the residuals of the previous model. The final prediction is the sum of the predictions from all models.

## What are the advantages and disadvantages of gradient boosting?

Advantages:
- High predictive accuracy: Gradient boosting often achieves high accuracy on a wide range of problems.
- Handles different types of data: Gradient boosting can handle both numerical and categorical data.
- Feature importance: Gradient boosting provides information on the importance of features in the classification process.
- Robustness: Gradient boosting is robust to noise and outliers in the data.

Disadvantages:
- Computationally expensive: Gradient boosting can be computationally expensive, especially for large datasets.
- Hyperparameter tuning: Gradient boosting requires tuning of hyperparameters to achieve optimal performance.
- Overfitting: Gradient boosting can overfit if the number of trees is too large or the learning rate is too high.

## Which are the hyperparameters in gradient boosting?

- Number of trees (n_estimators): The number of boosting stages to perform. 
- Learning rate (eta): The step size shrinkage used to prevent overfitting. This parameter scales the contribution of each tree.
- Tree depth (max_depth): The maximum depth of each tree. Deeper trees can capture more complex patterns but are more prone to overfitting.
- Minimum leaf size (min_child_weight): The minimum sum of instance weight needed in a child. The instance weight is the number of times a sample appears in the dataset.
- Row sampling (subsample): The fraction of samples used to fit each tree. This parameter can help prevent overfitting.
- Column sampling (colsample_bytree): The fraction of features used to fit each tree. This parameter can help prevent overfitting.
- L1 regularization (reg_alpha): L1 regularization term on weights. This parameter can help prevent overfitting by encouraging sparsity in the model.
- L2 regularization (reg_lambda): L2 regularization term on weights. This parameter can help prevent overfitting by penalizing large weights.
- Split improvement (gamma): The minimum loss reduction required to make a further partition on a leaf node. This parameter can help prevent overfitting by controlling the complexity of the tree.
- Leaf nodes (num_leaves): The maximum number of leaves in each tree. More leaves can make the model more complex and prone to overfitting.
- Minimum samples per leaf (min_child_samples): The minimum number of samples required to form a leaf node. This parameter can help prevent overfitting by controlling the complexity of the tree.

## What are the parameter ranges, and how do they affect the model in gradient boosting?

| Parameter Purpose | XGBoost | LightGBM | Practical Tuning Range | Effect on increasing | Effect on decreasing |
|------------------|----------|-----------|----------------------|---------------------|---------------------|
| Number of trees | `n_estimators` | `n_estimators` | [100, 2000] | More trees -> better performance, longer training time | Fewer trees -> faster training, potential underfitting |
| Learning rate | `learning_rate` (eta) | `learning_rate` | [0.01, 0.3] | Smaller learning rate -> more trees needed, better generalization | Larger learning rate -> faster training, risk of overfitting |
| Tree depth | **`max_depth`** | `max_depth` | [3, 10] | Deeper trees -> more complex model, risk of overfitting | Shallower trees -> simpler model, underfitting |
| Minimum leaf size | `min_child_weight` | `min_child_weight` | [1, 10] | Larger value -> more conservative model, less overfitting | Smaller value -> more aggressive model, risk of overfitting |
| Row sampling | `subsample` | `subsample` | [0.5, 1.0] | Lower subsample -> more robust model, longer training | Higher subsample -> potentially overfit model, faster training |
| Column sampling | `colsample_bytree` | `colsample_bytree` | [0.5, 1.0] | Lower column sampling -> more robust model | Higher column sampling -> potentially overfit model |
| L1 regularization | `reg_alpha` | `reg_alpha` | [0, 1.0] | Stronger L1 -> sparser model, more feature selection | Weaker L1 -> denser model, less feature selection |
| L2 regularization | `reg_lambda` | `reg_lambda` | [0, 5.0] | Stronger L2 -> more conservative model | Weaker L2 -> more aggressive model |
| Split improvement | `gamma` | `min_split_gain` | [0, 1.0] | Higher threshold -> more conservative splitting | Lower threshold -> more aggressive splitting |
| Leaf nodes | `max_leaves` | **`num_leaves`** | [31, 255] | More leaves -> more complex model, risk of overfitting | Fewer leaves -> simpler model, underfitting |
| Minimum samples per leaf | N/A | `min_child_samples` | [5, 30] | More samples -> more stable splits, less overfitting | Fewer samples -> more splits, risk of overfitting |

## Make a table comparing XGBoost and LightGBM.

| Feature | XGBoost | LightGBM |
|---------|---------|----------|
| Tree Growth Strategy | Level-wise, less optimal, more time-consuming | Leaf-wise, best-first, more optimized, deeper tree |
| Speed and Performance | Optimized, parallel processing, cache optimization | Faster, memory-efficient, histogram-based algorithms |
| Memory Usage | More memory consumption | More memory-efficient, histogram-based algorithms |
| Handling of Categorical Features | Latest version supports categorical features | Direct handling without encoding |
| Parallelism and GPU Support | Level-wise parallelism, GPU acceleration | Data and feature parallelism, efficient CPU utilization, GPU support |
| Overfitting Control | Regularization parameters (reg_alpha, reg_lambda) | Leaf-wise growth, num_leaves, min_data_in_leaf, regularization terms |

In [4]:
import optuna
from sklearn.model_selection import train_test_split
import lightgbm as lgb
import xgboost as xgb

xgb.set_config(verbosity=0)

  from .autonotebook import tqdm as notebook_tqdm


In [5]:
def tune_and_plot_lightgbm(func, which):
    X, Y = func(n_samples=1000, noise=0.2, random_state=0)
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, random_state=0
    )
    X_train, X_val, Y_train, Y_val = train_test_split(
        X_train, Y_train, test_size=0.2, random_state=0
    )

    def objective(trial):
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 10, 50),
            "max_depth": trial.suggest_int("max_depth", 2, 6),
            "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.1, log=True),
        }
        model = lgb.LGBMClassifier(**params, verbose=-1)
        model.fit(
            X_train,
            Y_train,
            eval_set=[(X_val, Y_val)],
            callbacks=[lgb.early_stopping(10, first_metric_only=True, verbose=False)],
        )
        return model.score(X_test, Y_test)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)

    print("Best params:", dict(**study.best_params))

    best_model = lgb.LGBMClassifier(**study.best_params)
    best_model.fit(
        X_train,
        Y_train,
        eval_set=[(X_val, Y_val)],
        callbacks=[lgb.early_stopping(10, first_metric_only=True, verbose=False)],
    )
    Y_pred = best_model.predict(X_test)
    return confusion_matrix(Y_test, Y_pred, which)


lgbc = tune_and_plot_lightgbm(make_circles, "circles")
lgbm = tune_and_plot_lightgbm(make_moons, "moons")

[I 2025-02-09 00:52:34,318] A new study created in memory with name: no-name-2a035ad9-0dc0-40b4-8437-cbbcb37794cc
[I 2025-02-09 00:52:34,462] Trial 0 finished with value: 0.705 and parameters: {'n_estimators': 19, 'max_depth': 5, 'learning_rate': 0.01731471280092051}. Best is trial 0 with value: 0.705.
[I 2025-02-09 00:52:34,473] Trial 1 finished with value: 0.7 and parameters: {'n_estimators': 21, 'max_depth': 4, 'learning_rate': 0.003586280242852283}. Best is trial 0 with value: 0.705.
[I 2025-02-09 00:52:34,487] Trial 2 finished with value: 0.695 and parameters: {'n_estimators': 48, 'max_depth': 4, 'learning_rate': 0.0028074369408103844}. Best is trial 0 with value: 0.705.
[I 2025-02-09 00:52:34,500] Trial 3 finished with value: 0.715 and parameters: {'n_estimators': 32, 'max_depth': 6, 'learning_rate': 0.00300671647278092}. Best is trial 3 with value: 0.715.
[I 2025-02-09 00:52:34,509] Trial 4 finished with value: 0.725 and parameters: {'n_estimators': 29, 'max_depth': 3, 'learning

Best params: {'n_estimators': 40, 'max_depth': 3, 'learning_rate': 0.03513972786503007}
Confusion matrix for circles data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,76,16
1,37,71


[I 2025-02-09 00:52:34,798] A new study created in memory with name: no-name-3917f2db-b425-41d8-be5d-cd06948735e7
[I 2025-02-09 00:52:34,813] Trial 0 finished with value: 0.93 and parameters: {'n_estimators': 32, 'max_depth': 5, 'learning_rate': 0.0012519542051542256}. Best is trial 0 with value: 0.93.
[I 2025-02-09 00:52:34,834] Trial 1 finished with value: 0.94 and parameters: {'n_estimators': 43, 'max_depth': 6, 'learning_rate': 0.028513528576031913}. Best is trial 1 with value: 0.94.
[I 2025-02-09 00:52:34,843] Trial 2 finished with value: 0.94 and parameters: {'n_estimators': 15, 'max_depth': 6, 'learning_rate': 0.03789959488288846}. Best is trial 1 with value: 0.94.
[I 2025-02-09 00:52:34,853] Trial 3 finished with value: 0.91 and parameters: {'n_estimators': 29, 'max_depth': 2, 'learning_rate': 0.02111530033943251}. Best is trial 1 with value: 0.94.
[I 2025-02-09 00:52:34,868] Trial 4 finished with value: 0.94 and parameters: {'n_estimators': 23, 'max_depth': 6, 'learning_rate':

Best params: {'n_estimators': 39, 'max_depth': 5, 'learning_rate': 0.09319860773677201}
Confusion matrix for moons data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,90,2
1,5,103


In [6]:
def tune_and_plot_xgboost(func, which):
    X, Y = func(n_samples=1000, noise=0.2, random_state=0)
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, random_state=0
    )
    X_train, X_val, Y_train, Y_val = train_test_split(
        X_train, Y_train, test_size=0.2, random_state=0
    )

    def objective(trial):
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 10, 50),
            "max_depth": trial.suggest_int("max_depth", 2, 6),
            "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.1, log=True),
            "use_label_encoder": False,
            "early_stopping_rounds": 10,
            "eval_metric": "logloss",
        }
        model = xgb.XGBClassifier(**params)
        model.fit(X_train, Y_train, eval_set=[(X_val, Y_val)], verbose=False)
        return model.score(X_test, Y_test)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)

    print("Best params:", dict(**study.best_params))

    best_model = xgb.XGBClassifier(
        **study.best_params, use_label_encoder=False, eval_metric="logloss"
    )
    best_model.fit(X_train, Y_train, eval_set=[(X_val, Y_val)], verbose=False)
    Y_pred = best_model.predict(X_test)
    return confusion_matrix(Y_test, Y_pred, which)


xgbc = tune_and_plot_xgboost(make_circles, "circles")
xgbm = tune_and_plot_xgboost(make_moons, "moons")

[I 2025-02-09 00:52:35,212] A new study created in memory with name: no-name-a6e6b109-173e-4798-8c9f-01f0ce9ee889
[I 2025-02-09 00:52:35,244] Trial 0 finished with value: 0.69 and parameters: {'n_estimators': 35, 'max_depth': 5, 'learning_rate': 0.012979053739306476}. Best is trial 0 with value: 0.69.
[I 2025-02-09 00:52:35,262] Trial 1 finished with value: 0.66 and parameters: {'n_estimators': 17, 'max_depth': 6, 'learning_rate': 0.024290234154806}. Best is trial 0 with value: 0.69.
[I 2025-02-09 00:52:35,278] Trial 2 finished with value: 0.46 and parameters: {'n_estimators': 26, 'max_depth': 3, 'learning_rate': 0.0014916995636006984}. Best is trial 0 with value: 0.69.
[I 2025-02-09 00:52:35,297] Trial 3 finished with value: 0.71 and parameters: {'n_estimators': 35, 'max_depth': 4, 'learning_rate': 0.00704965194077008}. Best is trial 3 with value: 0.71.
[I 2025-02-09 00:52:35,316] Trial 4 finished with value: 0.675 and parameters: {'n_estimators': 19, 'max_depth': 6, 'learning_rate': 

Best params: {'n_estimators': 35, 'max_depth': 4, 'learning_rate': 0.00704965194077008}
Confusion matrix for circles data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,76,16
1,42,66


[I 2025-02-09 00:52:35,788] A new study created in memory with name: no-name-0f4ce425-92c2-4d02-b20b-3cddfaf9df32
[I 2025-02-09 00:52:35,801] Trial 0 finished with value: 0.96 and parameters: {'n_estimators': 14, 'max_depth': 6, 'learning_rate': 0.08033287790230817}. Best is trial 0 with value: 0.96.
[I 2025-02-09 00:52:35,824] Trial 1 finished with value: 0.915 and parameters: {'n_estimators': 48, 'max_depth': 2, 'learning_rate': 0.06323707870272523}. Best is trial 0 with value: 0.96.
[I 2025-02-09 00:52:35,845] Trial 2 finished with value: 0.915 and parameters: {'n_estimators': 46, 'max_depth': 3, 'learning_rate': 0.017026624624172475}. Best is trial 0 with value: 0.96.
[I 2025-02-09 00:52:35,858] Trial 3 finished with value: 0.915 and parameters: {'n_estimators': 18, 'max_depth': 3, 'learning_rate': 0.015672139258121505}. Best is trial 0 with value: 0.96.
[I 2025-02-09 00:52:35,878] Trial 4 finished with value: 0.845 and parameters: {'n_estimators': 36, 'max_depth': 4, 'learning_rat

Best params: {'n_estimators': 29, 'max_depth': 4, 'learning_rate': 0.09605257525152569}
Confusion matrix for moons data:


Predicted,0,1
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,90,2
1,4,104


In [15]:
def from_confusion_matrix(df, model, which):
    tn, fp, fn, tp = df.values.ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return {
        "model": model,
        "which": which,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "tp": tp,
        "fp": fp,
        "tn": tn,
        "fn": fn,
    }


results = pd.DataFrame(
    [
        from_confusion_matrix(dtc, "DecisionTree", "circles"),
        from_confusion_matrix(rfc, "RandomForest", "circles"),
        from_confusion_matrix(lgbc, "LightGBM", "circles"),
        from_confusion_matrix(xgbc, "XGBoost", "circles"),
        from_confusion_matrix(dtm, "DecisionTree", "moons"),
        from_confusion_matrix(rfm, "RandomForest", "moons"),
        from_confusion_matrix(lgbm, "LightGBM", "moons"),
        from_confusion_matrix(xgbm, "XGBoost", "moons"),
    ]
)
results

Unnamed: 0,model,which,precision,recall,f1,tp,fp,tn,fn
0,DecisionTree,circles,0.56701,0.555556,0.561224,55,42,59,44
1,RandomForest,circles,0.592233,0.616162,0.60396,61,42,59,38
2,LightGBM,circles,0.816092,0.657407,0.728205,71,16,76,37
3,XGBoost,circles,0.804878,0.611111,0.694737,66,16,76,42
4,DecisionTree,moons,0.939394,0.939394,0.939394,93,6,95,6
5,RandomForest,moons,0.94898,0.939394,0.944162,93,5,96,6
6,LightGBM,moons,0.980952,0.953704,0.967136,103,2,90,5
7,XGBoost,moons,0.981132,0.962963,0.971963,104,2,90,4
