# Machine Learning with Tree-Based Models

## Classification with Decision Trees

Decision trees are a type of supervised learning algorithm used for classification tasks. They are called decision trees because they consist of a series of decisions or questions that lead to a final decision or prediction.

In a decision tree, each internal node represents a feature or attribute, each branch represents a decision rule, and each leaf node represents the outcome or class label. The decision rules are based on the values of the features, and the outcome is determined by following the path from the root node to a leaf node.
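As a minimal sketch of how a prediction follows a path from the root to a leaf, the tree below is written by hand as nested dictionaries. The thresholds and the `predict_one` helper are illustrative assumptions, not values learned from the data and not part of scikit-learn.

```python
# Hypothetical hand-built tree: thresholds are made up for illustration.
tree = {
    "feature": "concave points_mean", "threshold": 0.05,
    "left": {"label": 0},                      # leaf: benign
    "right": {
        "feature": "radius_mean", "threshold": 15.0,
        "left": {"label": 0},                  # leaf: benign
        "right": {"label": 1},                 # leaf: malignant
    },
}

def predict_one(node, sample):
    """Apply one decision rule per internal node until a leaf is reached."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

sample = {"radius_mean": 17.99, "concave points_mean": 0.1471}
prediction = predict_one(tree, sample)
print(prediction)  # 1
```

Each internal node tests a single feature against a threshold, which is why the resulting decision regions are axis-aligned rectangles.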

The process of building a decision tree involves selecting the best feature to split the data at each internal node. The goal is to create a tree that can accurately classify new instances by minimizing the impurity or uncertainty of the predictions.

Decision trees have several advantages, including:

- Easy to understand and interpret
- Can handle both categorical and numerical features
- Can handle missing values in some implementations (support varies by library and version)
- Can handle irrelevant features

However, decision trees can also suffer from overfitting, especially when the tree becomes too complex or when the training data is noisy. To mitigate overfitting, techniques such as pruning and setting a maximum depth or minimum number of samples per leaf can be used.

Overall, decision trees are a powerful and versatile algorithm for classification tasks, and they form the basis for more advanced tree-based models such as random forests and gradient boosting.


In [13]:
import pandas as pd
import numpy as np
wbc = pd.read_csv('datasets/wbc.csv')
wbc['diagnosis'] = np.where(wbc['diagnosis']=='M',1,0)

X=wbc[['radius_mean', 'concave points_mean']]
y=wbc['diagnosis']

wbc[['radius_mean', 'concave points_mean', 'diagnosis']].head()

|   | radius_mean | concave points_mean | diagnosis |
|---|---|---|---|
| 0 | 17.99 | 0.1471 | 1 |
| 1 | 20.57 | 0.07017 | 1 |
| 2 | 19.69 | 0.1279 | 1 |
| 3 | 11.42 | 0.1052 | 1 |
| 4 | 20.29 | 0.1043 | 1 |


In [12]:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Split the dataset into 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    stratify=y,
                                                    random_state=1)

SEED = 1

# Instantiate a DecisionTreeClassifier 'dt' with a maximum depth of 6
dt = DecisionTreeClassifier(max_depth=6, random_state=SEED)

# Fit dt to the training set
dt.fit(X_train, y_train)

# Predict test set labels
y_pred = dt.predict(X_test)
print(y_pred[0:5])


[0 0 0 1 0]


In [14]:
# Import accuracy_score
from sklearn.metrics import accuracy_score

# Predict test set labels
y_pred = dt.predict(X_test)

# Compute test set accuracy  
acc = accuracy_score(y_test, y_pred)
print("Test set accuracy: {:.2f}".format(acc))

Test set accuracy: 0.89


### Logistic regression vs classification tree
A classification tree divides the feature space into rectangular regions. In contrast, a linear model such as logistic regression produces only a single linear decision boundary dividing the feature space into two decision regions.

```
# Import LogisticRegression from sklearn.linear_model
from sklearn.linear_model import  LogisticRegression

# Instantiate logreg
logreg = LogisticRegression(random_state=1)

# Fit logreg to the training set
logreg.fit(X_train, y_train)

# Define a list called clfs containing the two classifiers logreg and dt
clfs = [logreg, dt]

# Review the decision regions of the two classifiers
# (plot_labeled_decision_regions is a plotting helper that is not defined in these notes)
plot_labeled_decision_regions(X_test, y_test, clfs)
```



## Decision Tree Learning

The decision tree is a flowchart-like structure where each internal node represents a feature or attribute, each branch represents a decision rule, and each leaf node represents the outcome or class label.

The process of building a decision tree involves recursively partitioning the training data based on the values of the features. The goal is to create partitions that are as pure as possible, meaning that they contain mostly samples of a single class.

To determine the best feature to split on at each internal node, various criteria can be used, such as Gini impurity or information gain. The splitting process continues until a stopping criterion is met, such as reaching a maximum depth or having a minimum number of samples in a node.
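The two criteria named above can be computed by hand. The following is a small sketch in plain Python; the function names and the toy labels are assumptions made for illustration, and the gain is computed for a single candidate split rather than searched over all features.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

# Toy node with 3 positives and 5 negatives, and one candidate split
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]

print(gini(parent))                                  # 0.46875
print(information_gain(parent, left, right, gini))   # 0.28125
print(round(information_gain(parent, left, right), 3))
```

A tree learner evaluates many candidate splits this way and keeps the one with the largest gain; a split that produces a pure child, like `right` here, is rewarded.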

Once the decision tree is built, it can be used to make predictions on new, unseen data by traversing the tree from the root node to a leaf node based on the values of the features.


In [19]:
# Import DecisionTreeClassifier from sklearn.tree
from sklearn.tree import DecisionTreeClassifier

# Instantiate dt_entropy, set 'entropy' as the information criterion
dt_entropy = DecisionTreeClassifier(max_depth=8, criterion='entropy', random_state=1)

# Fit dt_entropy to the training set
dt_entropy.fit(X_train, y_train)

# Instantiate dt_gini, set 'gini' as the information criterion
dt_gini = DecisionTreeClassifier(max_depth=8, criterion='gini', random_state=1)

# Fit dt_gini to the training set
dt_gini.fit(X_train, y_train)

In [20]:
# Import accuracy_score from sklearn.metrics
from sklearn.metrics import accuracy_score

# Use dt_entropy to predict test set labels
y_pred = dt_entropy.predict(X_test)

# Evaluate accuracy_entropy
accuracy_entropy = accuracy_score(y_test, y_pred)

# Use dt_gini to predict test set labels
y_pred = dt_gini.predict(X_test)

# Evaluate accuracy_gini
accuracy_gini = accuracy_score(y_test, y_pred)

# Print accuracy_entropy
print(f'Accuracy achieved by using entropy: {accuracy_entropy:.3f}')

# Print accuracy_gini
print(f'Accuracy achieved by using the gini index: {accuracy_gini:.3f}')

Accuracy achieved by using entropy: 0.886
Accuracy achieved by using the gini index: 0.921


## Decision Trees for Regression

Decision trees can also be used for regression tasks. Instead of predicting classes, decision trees for regression predict continuous values.
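To make the regression case concrete, here is a sketch of how one split could be chosen for a single numeric feature: every midpoint between consecutive feature values is tried, and the threshold with the lowest size-weighted mean squared error in the two children wins. Each leaf then predicts the mean target of its samples. The helper names and the toy data are assumptions, not part of the dataset used in these notes.

```python
def mse(values):
    """Mean squared error of predicting the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(x, y):
    """Return (threshold, weighted_mse) minimizing the weighted child MSE."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up toy data: heavier cars tend to have lower mpg
weight = [1800, 2100, 2200, 3100, 3500, 4700]
mpg = [36.0, 33.0, 34.0, 20.0, 18.0, 12.0]

threshold, score = best_split(weight, mpg)
print(threshold)  # 2650.0
```

With this split, the left leaf would predict the mean of the three light cars and the right leaf the mean of the three heavy cars.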


In [3]:
import pandas as pd
import numpy as np
df = pd.read_csv('datasets/auto.csv')

X=df.drop('mpg', axis=1)
y=df['mpg']

X.head()

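|   | displ | hp | weight | accel | origin | size |
|---|---|---|---|---|---|---|
| 0 | 250.0 | 88 | 3139 | 14.5 | US | 15.0 |
| 1 | 304.0 | 193 | 4732 | 18.5 | US | 20.0 |
| 2 | 91.0 | 60 | 1800 | 16.4 | Asia | 10.0 |
| 3 | 250.0 | 98 | 3525 | 19.0 | US | 15.0 |
| 4 | 97.0 | 78 | 2188 | 15.8 | Europe | 10.0 |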


In [4]:
# Import LabelEncoder from sklearn.preprocessing
from sklearn.preprocessing import LabelEncoder

# Create an instance of LabelEncoder
le = LabelEncoder()

# Fit the encoder on the 'origin' column and return the integer-encoded labels
label = le.fit_transform(X['origin'])

# Remove the original string-valued 'origin' column from X
X.drop("origin", axis=1, inplace=True)

# Append the encoded values to X as the new 'origin' column
X["origin"] = label

# Display the DataFrame
X

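|   | displ | hp | weight | accel | size | origin |
|---|---|---|---|---|---|---|
| 0 | 250.0 | 88 | 3139 | 14.5 | 15.0 | 2 |
| 1 | 304.0 | 193 | 4732 | 18.5 | 20.0 | 2 |
| 2 | 91.0 | 60 | 1800 | 16.4 | 10.0 | 0 |
| 3 | 250.0 | 98 | 3525 | 19.0 | 15.0 | 2 |
| 4 | 97.0 | 78 | 2188 | 15.8 | 10.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 387 | 250.0 | 88 | 3021 | 16.5 | 15.0 | 2 |
| 388 | 151.0 | 90 | 2950 | 17.3 | 10.0 | 2 |
| 389 | 98.0 | 68 | 2135 | 16.6 | 10.0 | 0 |
| 390 | 250.0 | 110 | 3520 | 16.4 | 15.0 | 2 |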


In [6]:
# Import DecisionTreeRegressor
from sklearn.tree import DecisionTreeRegressor
# Import train_test_split 
from sklearn.model_selection import train_test_split
# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE
# Split data into 80% train and 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=3)


In [39]:
# Import DecisionTreeRegressor from sklearn.tree
from sklearn.tree import DecisionTreeRegressor

# Instantiate dt
dt = DecisionTreeRegressor(max_depth=8,
                           min_samples_leaf=0.13,
                           random_state=3)

# Fit dt to the training set
dt.fit(X_train, y_train)

In [40]:
# Import mean_squared_error from sklearn.metrics as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute y_pred
y_pred = dt.predict(X_test)

# Compute mse_dt
mse_dt = MSE(y_test, y_pred)

# Compute rmse_dt
rmse_dt = np.sqrt(mse_dt)

# Print rmse_dt
print("Test set RMSE of dt: {:.2f}".format(rmse_dt))

Test set RMSE of dt: 4.37


## Bias-Variance Trade-off

The bias-variance trade-off is a fundamental concept in machine learning that deals with the relationship between the bias and variance of a model.

**Bias** refers to the error introduced by approximating a real-world problem with a simplified model. A model with high bias tends to oversimplify the problem and make strong assumptions, leading to underfitting. Underfitting occurs when a model is unable to capture the underlying patterns in the data.

**Variance** refers to the variability of a model's predictions for different training sets. A model with high variance is sensitive to the specific training data and tends to overfit. Overfitting occurs when a model becomes too complex and starts to memorize the training data instead of learning the general patterns.

The goal in machine learning is to find the right balance between bias and variance. A model with high bias may have low accuracy on both the training and test data, while a model with high variance may have high accuracy on the training data but poor generalization to new, unseen data.

To summarize:

- High bias models are simple and make strong assumptions, leading to underfitting.
- High variance models are complex and sensitive to training data, leading to overfitting.
- Lowering bias typically raises variance and vice versa, so both cannot be minimized at once. The goal is the level of complexity that minimizes the total generalization error, resulting in a model that generalizes well to new data.

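Use K-fold cross-validation to estimate how well the model generalizes. If the CV error is clearly greater than the training error, the model has overfitted and suffers from high variance. If the CV error is roughly equal to the training error but both are greater than the desired error, the model underfits and suffers from high bias.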
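The mechanics of K-fold cross-validation can be sketched without any library. In the example below the "model" is deliberately trivial (it always predicts the mean of its training folds), the target values are made up, and the helper names are assumptions. As in the cells that follow, the per-fold MSEs are averaged before taking the square root.

```python
import random

def k_fold_indices(n, k, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rmse(y_true, y_pred):
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def cv_rmse_mean_model(y, k=5):
    """CV RMSE of a model that always predicts the mean of its training folds."""
    fold_mse = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        pred = sum(train) / len(train)  # "fit" on the other k-1 folds
        fold_mse.append(sum((y[i] - pred) ** 2 for i in held_out) / len(held_out))
    return (sum(fold_mse) / k) ** 0.5

# Made-up target values
y = [18.0, 15.0, 36.0, 26.0, 24.0, 31.0, 19.0, 14.0, 22.0, 27.0,
     33.0, 16.0, 29.0, 21.0, 25.0, 17.0, 30.0, 23.0, 20.0, 28.0]

train_rmse = rmse(y, [sum(y) / len(y)] * len(y))
cv_rmse = cv_rmse_mean_model(y, k=5)
print('Train RMSE: {:.2f}, CV RMSE: {:.2f}'.format(train_rmse, cv_rmse))
```

For such a simple model the two errors are close to each other, which is the signature of high bias rather than high variance.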

In [9]:
# Import train_test_split and cross_val_score from sklearn.model_selection
from sklearn.model_selection import train_test_split, cross_val_score

# Set SEED for reproducibility
SEED = 1

# Split the data into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED)

# Instantiate a DecisionTreeRegressor dt
dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=SEED)

# Compute the array containing the 10-folds CV MSEs
MSE_CV_scores = - cross_val_score(dt, X_train, y_train, cv=10, 
                       scoring='neg_mean_squared_error',
                       n_jobs=-1)

# Compute the 10-folds CV RMSE
RMSE_CV = (MSE_CV_scores.mean())**(0.5)

# Print RMSE_CV
print('CV RMSE: {:.2f}'.format(RMSE_CV))

CV RMSE: 5.14


In [10]:
# Import mean_squared_error from sklearn.metrics as MSE
from sklearn.metrics import mean_squared_error as MSE

# Fit dt to the training set
dt.fit(X_train, y_train)

# Predict the labels of the training set
y_pred_train = dt.predict(X_train)

# Evaluate the training set RMSE of dt
RMSE_train = (MSE(y_train, y_pred_train))**(0.5)

# Print RMSE_train
print('Train RMSE: {:.2f}'.format(RMSE_train))

Train RMSE: 5.15


The training error (5.15) is roughly equal to the 10-fold CV error (5.14) obtained in the previous cell, so the model does not suffer from high variance. However, given a baseline error of `baseline_RMSE` = 5.1, both errors are greater than the baseline, which means the model suffers from high bias: it underfits the training set.

## Ensemble Classification

Ensemble classification is a machine learning technique that combines multiple individual classifiers to make predictions. The idea behind ensemble classification is that by combining the predictions of multiple classifiers, the overall performance can be improved compared to using a single classifier.

There are different methods for ensemble classification, such as bagging, boosting, and stacking. These methods vary in how the individual classifiers are trained and combined.

## Hard Voting

Hard voting is a simple and commonly used method for combining the predictions of multiple classifiers in ensemble classification. In hard voting, each classifier in the ensemble makes a prediction, and the final prediction is determined by majority voting. The class label that receives the most votes from the classifiers is selected as the final prediction.

Hard voting can be effective when the individual classifiers in the ensemble are diverse and make independent errors. By combining their predictions, the ensemble can achieve better overall accuracy and robustness.
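Majority voting itself is only a few lines of code. The sketch below assumes three classifiers have already produced label predictions for the same four samples; the prediction lists and the `hard_vote` helper are made up for illustration.

```python
from collections import Counter

def hard_vote(predictions_per_classifier):
    """Return the most frequent label per sample.

    predictions_per_classifier: list of equal-length label lists,
    one list per classifier.
    """
    voted = []
    for votes in zip(*predictions_per_classifier):
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted

# Hypothetical predictions from three classifiers on four samples
lr_pred = [1, 0, 1, 1]
knn_pred = [1, 1, 0, 1]
dt_pred = [0, 0, 1, 1]

ensemble_pred = hard_vote([lr_pred, knn_pred, dt_pred])
print(ensemble_pred)  # [1, 0, 1, 1]
```

Each classifier is wrong on a different sample here, so the vote corrects every individual mistake. Using an odd number of classifiers avoids ties in binary problems.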


In [12]:
# Import functions to compute accuracy and split data
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Import models, including VotingClassifier meta-model
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import VotingClassifier

# Set seed for reproducibility
SEED=1

# Instantiate lr
lr = LogisticRegression(random_state=SEED)

# Instantiate knn
knn = KNN(n_neighbors=27)

# Instantiate dt
dt = DecisionTreeClassifier(min_samples_leaf=0.13, random_state=SEED)

# Define the list classifiers
classifiers = [('Logistic Regression', lr), ('K Nearest Neighbours', knn), ('Classification Tree', dt)]


```
# Iterate over the pre-defined list of classifiers
for clf_name, clf in classifiers:    
 
    # Fit clf to the training set
    clf.fit(X_train, y_train)    
   
    # Predict y_pred
    y_pred = clf.predict(X_test)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_pred, y_test) 
   
    # Evaluate clf's accuracy on the test set
    print('{:s} : {:.3f}'.format(clf_name, accuracy))
    
# Output:
# Logistic Regression : 0.741
# K Nearest Neighbours : 0.701
# Classification Tree : 0.707

# Import VotingClassifier from sklearn.ensemble
from sklearn.ensemble import VotingClassifier

# Instantiate a VotingClassifier vc
vc = VotingClassifier(estimators=classifiers)     

# Fit vc to the training set
vc.fit(X_train, y_train)   

# Evaluate the test set predictions
y_pred = vc.predict(X_test)

# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
print('Voting Classifier: {:.3f}'.format(accuracy))

# Output:
# Voting Classifier: 0.764
```

**Bootstrapping** and **bagging** are techniques in machine learning that involve resampling data to improve the performance and stability of models, especially in ensemble methods like Random Forest. Here's a brief explanation of each:

### Bootstrapping:

Bootstrapping is a statistical resampling technique where multiple datasets of the same size are created by randomly sampling data points with replacement from the original dataset.
This process generates multiple subsets (bootstrapped samples) that may contain duplicate data points and omit others.
Bootstrapping is commonly used for estimating statistical properties of a dataset and can be applied to train multiple models, each on a different bootstrapped sample, to assess model stability and variability.
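A bootstrap sample can be drawn with the standard library alone. This sketch assumes the dataset is a plain list of unique items; the `bootstrap_sample` helper is a made-up name. It returns both the resampled data and the points that were never drawn.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement; also return the items never drawn."""
    n = len(data)
    indices = [rng.randrange(n) for _ in range(n)]
    drawn = set(indices)
    in_bag = [data[i] for i in indices]
    out_of_bag = [data[i] for i in range(n) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(1)
data = list(range(10))
in_bag, out_of_bag = bootstrap_sample(data, rng)

print('In bag     :', sorted(in_bag))
print('Out of bag :', out_of_bag)
```

The sample has the same size as the original dataset, but some items appear more than once and others not at all. For large datasets, roughly 63% of the distinct items end up in a given bootstrap sample on average.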


### Bagging (Bootstrap Aggregating):

Bagging is an ensemble machine learning technique that uses bootstrapping to improve model performance and reduce overfitting.
In bagging, multiple base models (e.g., decision trees) are trained independently on different bootstrapped samples from the training data.
Predictions from these base models are then combined, often by averaging or voting, to make a final prediction.
Bagging helps reduce the variance of the model, making it more robust and less prone to overfitting, which can lead to improved generalization on unseen data.

```
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier

# Import BaggingClassifier
from sklearn.ensemble import BaggingClassifier

# Instantiate dt
dt = DecisionTreeClassifier(random_state=1)

# Instantiate bc (the parameter is named 'estimator' since scikit-learn 1.2;
# older versions use 'base_estimator')
bc = BaggingClassifier(estimator=dt, n_estimators=50, random_state=1)

# Fit bc to the training set
bc.fit(X_train, y_train)

# Predict test set labels
y_pred = bc.predict(X_test)

# Evaluate acc_test
acc_test = accuracy_score(y_pred, y_test)
print('Test set accuracy of bc: {:.2f}'.format(acc_test)) 

# Output:
# Test set accuracy of bc: 0.67
```

**Out-of-Bag (OOB) Evaluation** is used to estimate the performance of an ensemble model, such as Random Forest, without the need for a separate validation dataset. 

- When building an ensemble model through bagging (Bootstrap Aggregating), multiple base models (e.g., decision trees) are trained on different bootstrapped subsets of the training data.
- Not all data points from the original dataset are included in each bootstrap sample. Some data points are left out or remain "out of the bag."
- OOB evaluation takes advantage of these out-of-bag data points. For each base model, the data points that were not included in its respective bootstrap sample are used to evaluate the model's performance. Essentially, these out-of-bag data points serve as a validation set for the model.
- For each training point, the predictions of the base models that did not see it are aggregated, typically through averaging or voting, and compared with the true value to give an overall estimate of the ensemble model's performance.
- OOB evaluation is valuable because it estimates how well the ensemble model is likely to perform on unseen data without setting aside a separate validation set. This makes it a convenient and efficient way to assess bagged models and helps in hyperparameter tuning or model selection.
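The points above can be sketched end to end with a very simple base learner. In this example the base model is a one-feature threshold rule (a "stump"), the data is made up, and all helper names are assumptions. It is not how scikit-learn computes `oob_score_` internally, only an illustration of the idea.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Pick the threshold that misclassifies the fewest samples.
    The stump predicts 1 when x > threshold, else 0."""
    best_t, best_err = None, len(xs) + 1
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def oob_accuracy(xs, ys, n_estimators=25, seed=1):
    rng = random.Random(seed)
    n = len(xs)
    votes = [[] for _ in range(n)]  # OOB votes collected per training sample
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:                          # sample i is out of bag
                votes[i].append(int(xs[i] > t))
    # Majority vote per sample, using only models that did not see it
    scored = [(Counter(v).most_common(1)[0][0], ys[i])
              for i, v in enumerate(votes) if v]
    return sum(pred == true for pred, true in scored) / len(scored)

# Made-up, linearly separable toy data
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

acc_oob = oob_accuracy(xs, ys)
print('OOB accuracy: {:.3f}'.format(acc_oob))
```

No sample is ever scored by a model that was trained on it, which is what lets the OOB score stand in for a validation set.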

```
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier

# Import BaggingClassifier
from sklearn.ensemble import BaggingClassifier

# Instantiate dt
dt = DecisionTreeClassifier(min_samples_leaf=8, random_state=1)

# Instantiate bc (the parameter is named 'estimator' since scikit-learn 1.2;
# older versions use 'base_estimator')
bc = BaggingClassifier(estimator=dt, 
            n_estimators=50,
            oob_score=True,
            random_state=1)
            
# Fit bc to the training set 
bc.fit(X_train, y_train)

# Predict test set labels
y_pred = bc.predict(X_test)

# Evaluate test set accuracy
acc_test = accuracy_score(y_pred, y_test)

# Evaluate OOB accuracy
acc_oob = bc.oob_score_

# Print acc_test and acc_oob
print('Test set accuracy: {:.3f}, OOB accuracy: {:.3f}'.format(acc_test, acc_oob))

# Output:
# Test set accuracy: 0.698, OOB accuracy: 0.702
```


**Random Forests** is based on the concept of bagging (Bootstrap Aggregating) and decision trees.

- Ensemble Technique: Random Forests is an ensemble learning method that combines multiple decision trees to make predictions. Each decision tree is trained on a different subset of the data using a technique called bootstrapping.

- Bootstrapping: Bootstrapping involves creating multiple random subsets of the original dataset, with replacement. Each decision tree is trained on one of these subsets. Some data points are included multiple times in a subset, while others may be left out.

- Random Feature Selection: In addition to bootstrapping, Random Forests introduce randomness by selecting a random subset of features at each node of the decision tree. This helps decorrelate the trees and reduces overfitting.

- Voting or Averaging: Once all the decision trees are trained, they can be used to make predictions on new, unseen data. For classification tasks, the mode (most frequent class) of the individual tree predictions is taken as the final prediction. For regression tasks, the predictions from all trees are averaged.

- Reduced Variance: The key advantage of Random Forests is that they tend to have lower variance compared to a single decision tree. This makes them less prone to overfitting and more robust when dealing with noisy or complex datasets.

- Highly Effective: Random Forests are known for their high predictive accuracy and are widely used in various machine learning applications, including classification, regression, and feature selection.
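The random feature selection step is the main difference from plain bagging. As a rough sketch, assuming the common square-root heuristic for the number of candidate features, each node would consider only a random subset like the one drawn below. The `candidate_features` helper is a made-up name; in scikit-learn the subset size is controlled by `max_features`.

```python
import random
from math import sqrt

def candidate_features(feature_names, rng, max_features=None):
    """Randomly pick the features a single node is allowed to split on."""
    k = max_features or max(1, int(sqrt(len(feature_names))))
    return rng.sample(feature_names, k)

rng = random.Random(2)
features = ['displ', 'hp', 'weight', 'accel', 'origin', 'size']

# Each call mimics one node: a fresh random subset every time
for node in range(3):
    print('Node', node, 'considers', candidate_features(features, rng))

subset = candidate_features(features, rng)
```

Because different nodes and different trees see different features, a single strong predictor cannot dominate every tree, which is what decorrelates the ensemble.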

```
# Import RandomForestRegressor
from sklearn.ensemble import RandomForestRegressor

# Instantiate rf
rf = RandomForestRegressor(n_estimators=25,
            random_state=2)
            
# Fit rf to the training set    
rf.fit(X_train, y_train) 

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Predict the test set labels
y_pred = rf.predict(X_test)

# Evaluate the test set RMSE
rmse_test = MSE(y_test, y_pred)**0.5

# Print rmse_test
print('Test set RMSE of rf: {:.2f}'.format(rmse_test))

# Create a pd.Series of features importances
importances = pd.Series(data=rf.feature_importances_,
                        index= X_train.columns)

# Sort importances
importances_sorted = importances.sort_values()

# Draw a horizontal barplot of importances_sorted
import matplotlib.pyplot as plt
importances_sorted.plot(kind='barh', color='lightgreen')
plt.title('Features Importances')
plt.show()

```

**Boosting** is a machine learning ensemble technique that combines multiple weak learners to create a strong learner. It is a sequential process where each weak learner is trained to correct the mistakes made by the previous weak learners. The final prediction is made by combining the predictions of all the weak learners.

**AdaBoost (Adaptive Boosting)** is a specific implementation of boosting. AdaBoost trains weak learners one after another on the same training set, with a weight attached to each training instance. After each round, the weights of the misclassified instances are increased so that the next weak learner focuses on the difficult-to-classify examples. Each weak learner is also assigned a weight based on its performance, and the final prediction combines the predictions of all the weak learners, with more accurate learners having more influence.
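One round of the reweighting can be written out explicitly. The sketch below follows the classic discrete AdaBoost update for labels in {-1, +1}; it assumes the sample weights sum to 1 and that the weighted error is strictly between 0 and 1. scikit-learn's `AdaBoostClassifier` uses a multi-class variant, so treat this as an illustration of the idea rather than its exact implementation.

```python
from math import exp, log

def adaboost_round(weights, y_true, y_pred):
    """Return the learner weight alpha and the updated, normalized sample weights."""
    # Weighted error: total weight of the misclassified samples
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Learner weight: larger when the weighted error is smaller
    alpha = 0.5 * log((1 - err) / err)
    # t * p is +1 for correct and -1 for wrong predictions,
    # so wrong samples are scaled up and correct ones scaled down
    new_w = [w * exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

# Made-up labels and predictions of one weak learner (one mistake at index 1)
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print('alpha = {:.3f}'.format(alpha))
print([round(w, 3) for w in new_weights])
```

After the update the single misclassified sample carries half of the total weight, so the next weak learner is strongly encouraged to get it right.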


```
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier

# Import AdaBoostClassifier
from sklearn.ensemble import AdaBoostClassifier

# Instantiate dt
dt = DecisionTreeClassifier(max_depth=2, random_state=1)

# Instantiate ada (the parameter is named 'estimator' since scikit-learn 1.2;
# older versions use 'base_estimator')
ada = AdaBoostClassifier(estimator=dt, n_estimators=180, random_state=1)

# Fit ada to the training set
ada.fit(X_train, y_train)

# Compute the probabilities of obtaining the positive class
y_pred_proba = ada.predict_proba(X_test)[:,1]

# Import roc_auc_score
from sklearn.metrics import roc_auc_score

# Evaluate test-set roc_auc_score
ada_roc_auc = roc_auc_score(y_test, y_pred_proba)

# Print roc_auc_score
print('ROC AUC score: {:.2f}'.format(ada_roc_auc))

# Output:
# ROC AUC score: 0.70
```

## Gradient Boosting

Gradient Boosting is a machine learning technique that combines multiple weak models (typically decision trees) to create a strong predictive model. It is a type of ensemble learning method where each weak model is trained to correct the mistakes made by the previous models.

The main idea behind gradient boosting is to iteratively add new models to the ensemble, with each new model focusing on the examples that were poorly predicted by the previous models. This is achieved by fitting the new model to the residuals (the differences between the actual and predicted values) of the previous models.

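The key concept in gradient boosting is the use of a differentiable loss function, which measures the difference between the predicted and actual values. At each iteration, the new model is fitted to the negative gradient of the loss with respect to the current predictions. For squared error loss, this negative gradient is equal, up to a constant factor, to the residual, which is why the procedure is often described as fitting the residuals.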

Gradient boosting has become a popular technique in various machine learning tasks, such as regression, classification, and ranking. It is known for its ability to handle complex relationships and produce accurate predictions.

Some popular implementations of gradient boosting include XGBoost, LightGBM, and CatBoost.
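The residual-fitting loop can be sketched in plain Python. This example assumes squared error loss, a single numeric feature, and depth-1 regression trees (stumps) as weak models. The data, the helper names, and the hyperparameter values are made up, and the sketch omits everything a production library adds.

```python
def fit_stump(x, residuals):
    """Least-squares stump on one feature: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def gradient_boost(x, y, n_estimators=50, learning_rate=0.1):
    base = sum(y) / len(y)          # initial model: a constant
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        # Add a shrunken version of the new stump to the ensemble
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]
    return base, stumps, pred

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up toy data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5.0, 6.0, 7.5, 8.0, 12.0, 13.5, 14.0, 15.0]

base, stumps, pred = gradient_boost(x, y)
mse_before = mse(y, [base] * len(y))
mse_after = mse(y, pred)
print('Training MSE before: {:.3f}, after: {:.3f}'.format(mse_before, mse_after))
```

The learning rate shrinks each stump's contribution, so many small corrections are accumulated instead of a few large ones.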

```
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate gb
gb = GradientBoostingRegressor(max_depth=4, 
            n_estimators=200,
            random_state=2)
            
# Fit gb to the training set
gb.fit(X_train,y_train)

# Predict test set labels
y_pred = gb.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute MSE
mse_test = MSE(y_pred, y_test)

# Compute RMSE
rmse_test = mse_test ** 0.5

# Print RMSE
print('Test set RMSE of gb: {:.3f}'.format(rmse_test))

# Output:
# Test set RMSE of gb: 52.071
```

## Stochastic Gradient Boosting

Stochastic Gradient Boosting is an extension of the Gradient Boosting algorithm that introduces randomness into the training process. Instead of using the entire training set to train each base learner, stochastic gradient boosting randomly selects a subset of the training data for each base learner. This introduces variability and helps to reduce overfitting.

The main steps in stochastic gradient boosting are as follows:

1. Initialize the model with a constant value.
2. For each iteration:
   - Sample a random subset of the training data without replacement.
   - Compute the negative gradient of the loss function with respect to the current model predictions on the sampled data (for squared error, these are the residuals).
   - Fit a base learner to these negative gradients.
   - Update the model by adding the base learner's predictions, scaled by the learning rate.
3. Repeat step 2 until a specified number of iterations or a stopping criterion is reached.

Stochastic gradient boosting can be used for both regression and classification problems. Subsampling makes each iteration cheaper and can improve generalization compared with fitting every base learner on the full training set, although the benefit depends on the dataset and on the sampling fraction.
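The sampling step is the only part that differs from ordinary gradient boosting. The sketch below shows what a single draw could look like for the fractions used in the following cell (`subsample=0.9`, `max_features=0.75`). The `draw_subsample` helper is a made-up name, and the feature draw is shown once for simplicity, whereas scikit-learn draws candidate features at every split.

```python
import random

def draw_subsample(n_rows, n_features, subsample=0.9, max_features=0.75, seed=2):
    """Pick row and feature indices without replacement for one boosting iteration."""
    rng = random.Random(seed)
    n_rows_used = max(1, int(subsample * n_rows))
    n_feats_used = max(1, int(max_features * n_features))
    rows = rng.sample(range(n_rows), n_rows_used)
    feats = rng.sample(range(n_features), n_feats_used)
    return sorted(rows), sorted(feats)

rows, feats = draw_subsample(n_rows=10, n_features=6)
print('Rows used    :', rows)
print('Features used:', feats)
```

Unlike bagging, the rows are drawn without replacement, so each base learner sees a slightly smaller dataset with no duplicates.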


```
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate sgbr
sgbr = GradientBoostingRegressor(max_depth=4, 
            subsample=0.9,
            max_features=0.75,
            n_estimators=200,
            random_state=2)
            
# Fit sgbr to the training set
sgbr.fit(X_train, y_train)

# Predict test set labels
y_pred = sgbr.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute test set MSE
mse_test = MSE(y_test,y_pred)

# Compute test set RMSE
rmse_test = mse_test ** 0.5

# Print rmse_test
print('Test set RMSE of sgbr: {:.3f}'.format(rmse_test))

# Output:
# Test set RMSE of sgbr: 49.621
```

## Tuning Hyperparameters for CART Models

When working with decision tree models, it is important to tune the hyperparameters to optimize the model's performance. In this section, we will explore how to tune the hyperparameters for CART (Classification and Regression Trees) models.

There are several hyperparameters that can be tuned for CART models, including:

- `max_depth`: The maximum depth of the tree. A deeper tree can capture more complex relationships in the data, but it can also lead to overfitting.
- `min_samples_split`: The minimum number of samples required to split an internal node. Increasing this value can prevent overfitting.
- `min_samples_leaf`: The minimum number of samples required to be at a leaf node. Increasing this value can prevent overfitting.
- `max_features`: The number of features to consider when looking for the best split. A smaller value can reduce overfitting.

To tune these hyperparameters, we can use techniques such as grid search or random search. Grid search involves specifying a grid of hyperparameter values and evaluating the model's performance for each combination of values. Random search involves randomly sampling from a distribution of hyperparameter values and evaluating the model's performance.
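To see what "each combination of values" means in practice, the sketch below enumerates a grid with `itertools.product`. The grid mirrors the one used in the next cell, and the fold count of 5 is taken from the same cell; nothing is trained here.

```python
from itertools import product

param_grid = {
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 4, 6],
    'min_samples_leaf': [1, 2, 3],
    'max_features': [None, 'sqrt', 'log2'],
}

# Cartesian product of all value lists: one dict per candidate model
names = list(param_grid)
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5
n_fits = len(combos) * n_folds

print(len(combos))  # 81
print(n_fits)       # 405
print(combos[0])
```

The number of candidates grows multiplicatively with every added hyperparameter, which is the main reason random search becomes attractive for larger grids.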

Let's see an example of how to tune the hyperparameters for a CART model using grid search.


In [0]:
# Import necessary libraries
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Create a decision tree regressor
dt = DecisionTreeRegressor()

# Define the hyperparameter grid
param_grid = {
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 4, 6],
    'min_samples_leaf': [1, 2, 3],
    'max_features': [None, 'sqrt', 'log2']
}

# Perform grid search with 5-fold CV (the default score for a regressor is R^2)
grid_search = GridSearchCV(dt, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_

# Print the best hyperparameters
print('Best Hyperparameters:', best_params)

## Tuning Random Forest Hyperparameters

Tuning the hyperparameters of a Random Forest model is an important step in optimizing its performance. By adjusting the hyperparameters, we can find the best combination of settings that result in the most accurate and robust model.

There are several hyperparameters that can be tuned in a Random Forest model, including:

- `n_estimators`: The number of trees in the forest.
- `max_depth`: The maximum depth of each tree.
- `min_samples_split`: The minimum number of samples required to split an internal node.
- `min_samples_leaf`: The minimum number of samples required to be at a leaf node.
- `max_features`: The number of features to consider when looking for the best split.

To tune the hyperparameters, we can use techniques such as grid search or random search. Grid search involves specifying a grid of possible values for each hyperparameter and evaluating the model's performance for each combination of values. Random search involves randomly sampling from the hyperparameter space and evaluating the model's performance for each sampled combination of values.
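Random search can be sketched as repeated independent draws from the same value lists. The `sample_params` helper and the budget of 20 candidates are assumptions made for illustration; scikit-learn offers this strategy as `RandomizedSearchCV`.

```python
import random

param_distributions = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 4, 6],
    'min_samples_leaf': [1, 2, 3],
    'max_features': [1.0, 'sqrt', 'log2'],
}

def sample_params(distributions, n_iter, seed=1):
    """Draw n_iter random hyperparameter combinations (duplicates are possible)."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in distributions.items()}
            for _ in range(n_iter)]

grid_size = 1
for values in param_distributions.values():
    grid_size *= len(values)

candidates = sample_params(param_distributions, n_iter=20)
print('Full grid size:', grid_size)  # 243
print('Random search candidates:', len(candidates))
print(candidates[0])
```

With a fixed budget of 20 candidates, random search evaluates less than a tenth of the full grid, trading exhaustiveness for speed.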

Let's see an example of how to tune the hyperparameters of a Random Forest model using grid search.

In [0]:
# Import necessary libraries
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Create a Random Forest regressor
rf = RandomForestRegressor()

# Define the hyperparameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 4, 6],
    'min_samples_leaf': [1, 2, 3],
    # 'auto' was removed in scikit-learn 1.3; for regressors it meant all features (1.0)
    'max_features': [1.0, 'sqrt', 'log2']
}

# Perform grid search
grid_search = GridSearchCV(rf, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_

# Print the best hyperparameters
print('Best Hyperparameters:', best_params)