Absolutely! Let's walk through an example where **cross-validation combined with grid search** helps improve the performance of a **Decision Tree classifier** by tuning its hyperparameters.

---

## 🌳 Why This Matters for Decision Trees

Decision trees are **prone to overfitting**, especially when they are allowed to grow deep without constraint. With **cross-validation + hyperparameter tuning**, we can pick the **depth limit and minimum split size** that score best on held-out folds, which often improves **generalization**.
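To make the overfitting claim concrete, here is a minimal sketch that compares training accuracy with cross-validated accuracy at several depth limits. It assumes the same Iris data and `random_state=42` used in the rest of this answer; the list of depths and the `depth_scores` name are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Map each depth limit to (mean training accuracy, mean validation accuracy)
depth_scores = {}
for depth in [1, 2, 3, 4, 5, None]:  # None means no depth limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    cv = cross_validate(tree, X, y, cv=5, return_train_score=True)
    depth_scores[depth] = (cv["train_score"].mean(), cv["test_score"].mean())

for depth, (train_acc, val_acc) in depth_scores.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, validation={val_acc:.3f}")
```

A widening gap between the training and validation columns as depth grows is the usual sign of overfitting. On a dataset as small and clean as Iris the gap may be modest.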

---

### ✅ Full Example: Decision Tree with Grid Search and Cross-Validation

We'll use the **Iris dataset** again to keep things simple, and tune:

* `max_depth`: the maximum depth of the tree (`None` means no limit)
* `min_samples_split`: the minimum number of samples a node needs before it can be split

---

#### 🧪 Code Example

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Load the data
X, y = load_iris(return_X_y=True)

# Base model
dtree = DecisionTreeClassifier(random_state=42)

# Define hyperparameter grid
param_grid = {
    'max_depth': [None, 2, 3, 4, 5],
    'min_samples_split': [2, 5, 10]
}

# Set up grid search with 5-fold cross-validation
grid_search = GridSearchCV(dtree, param_grid, cv=5)

# Fit the model
grid_search.fit(X, y)

# Output results
print("Best parameters:", grid_search.best_params_)
print("Best cross-validated accuracy:", grid_search.best_score_)
```

---

### 📝 Sample Output

```
Best parameters: {'max_depth': 3, 'min_samples_split': 2}
Best cross-validated accuracy: 0.9733333333333334
```
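The best parameters are only one row of what the search computes. As a rough sketch, the full grid can be ranked from `cv_results_`, which shows how close the runner-up combinations are. The `ranked` list and the choice to print the top five rows are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_grid = {
    "max_depth": [None, 2, 3, 4, 5],
    "min_samples_split": [2, 5, 10],
}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid_search.fit(X, y)

# cv_results_ holds one entry per parameter combination
results = grid_search.cv_results_
ranked = sorted(
    zip(
        results["rank_test_score"],
        results["mean_test_score"],
        results["std_test_score"],
        results["params"],
    ),
    key=lambda row: row[0],  # rank 1 is the best mean score
)

for rank, mean, std, params in ranked[:5]:
    print(f"#{rank}: {mean:.4f} +/- {std:.4f}  {params}")
```

When several combinations tie or differ by less than the fold-to-fold standard deviation, preferring the simpler tree is a reasonable choice.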

---

### 🔍 How This Helps

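| Approach             | Description                                                | Accuracy |
| -------------------- | ---------------------------------------------------------- | -------- |
| Default DecisionTree | No depth limit (`max_depth=None`), so the tree can overfit | \~0.93   |
| Tuned via CV         | Better generalization (e.g., `max_depth=3`)                | \~0.97   |

The accuracies are approximate. The default configuration (`max_depth=None`, `min_samples_split=2`) is itself part of the grid, so its exact cross-validated score can be read from `grid_search.cv_results_`.

By evaluating every combination in the grid and averaging its score across folds, we:

* **Reduce overfitting** by favoring a more constrained tree when it scores better
* **Improve cross-validated accuracy**; an unbiased test accuracy still needs data held out from the search
* Get an estimate that is **less sensitive to any single train/test split**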
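Because the grid search above is fit on all of the data, `best_score_` is slightly optimistic: the same folds were used to pick the winner. A minimal sketch of a stricter setup, assuming a stratified 80/20 split with a fixed seed (both are choices made for this illustration), runs the search on the training portion only and scores once on the untouched test portion.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the search never sees it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

param_grid = {
    "max_depth": [None, 2, 3, 4, 5],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)  # cross-validation happens inside the training set

# GridSearchCV refits the best model on all of X_train by default (refit=True)
test_accuracy = search.score(X_test, y_test)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy (train only):", search.best_score_)
print("Held-out test accuracy:", test_accuracy)
```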

---

Would you like to visualize the decision tree or see training-vs-validation accuracy for different depths?


In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [3]:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Base model
dtree = DecisionTreeClassifier(random_state=42)

# Define hyperparameter grid
param_grid = {
    'max_depth': [None, 2, 3, 4, 5],
    'min_samples_split': [2, 5, 10]
}

# Set up grid search with 5-fold cross-validation
grid_search = GridSearchCV(dtree, param_grid, cv=5)

# Fit the model
grid_search.fit(X, y)

# Output results
print("Best parameters:", grid_search.best_params_)
print("Best cross-validated accuracy:", grid_search.best_score_)


Best parameters: {'max_depth': 3, 'min_samples_split': 2}
Best cross-validated accuracy: 0.9733333333333334


Support vector machines

In [5]:
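from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load data
X, y = datasets.load_iris(return_X_y=True)

# Train/test split (no random_state, so the score varies between runs)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
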
# Train SVM classifier
clf = SVC(kernel='linear')  # try 'rbf', 'poly' too
clf.fit(X_train, y_train)

# Predict
print(clf.score(X_test, y_test))


0.9333333333333333
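The comment in the cell above suggests trying 'rbf' and 'poly' too. One way to compare kernels without depending on a single random split is 5-fold cross-validation, sketched below with default `SVC` settings for each kernel. The `kernel_scores` dictionary is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Mean 5-fold accuracy for each kernel, all other SVC settings left at defaults
kernel_scores = {}
for kernel in ["linear", "rbf", "poly"]:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    kernel_scores[kernel] = scores.mean()

for kernel, mean_acc in kernel_scores.items():
    print(f"{kernel}: {mean_acc:.4f}")
```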


kNN

In [6]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train k-NN with k=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict and score
print(knn.score(X_test, y_test))


0.9333333333333333
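The cell above fixes k=3. The same grid-search approach used for the decision tree can choose k instead; the sketch below assumes a small set of odd candidate values, picked only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate neighbor counts (an assumption for this sketch)
k_values = [1, 3, 5, 7, 9, 11]
knn_search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": k_values}, cv=5)
knn_search.fit(X, y)

print("Best k:", knn_search.best_params_["n_neighbors"])
print("Best cross-validated accuracy:", knn_search.best_score_)
```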


Naive Bayes

In [7]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train Naive Bayes
nb = GaussianNB()
nb.fit(X_train, y_train)

# Predict and evaluate
print(nb.score(X_test, y_test))


1.0
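A score of 1.0 on a single 30-sample test split says little about how the model behaves on other splits, and the split above is unseeded, so a rerun can give a different number. A rough sketch of a steadier estimate uses repeated stratified 5-fold cross-validation; the repeat count and seed are choices made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 5 folds repeated 10 times with different shuffles gives 50 scores
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
scores = cross_val_score(GaussianNB(), X, y, cv=cv)

print(f"Mean accuracy: {scores.mean():.4f}")
print(f"Std across folds: {scores.std():.4f}")
print(f"Min / max fold accuracy: {scores.min():.4f} / {scores.max():.4f}")
```

The spread between the minimum and maximum fold scores shows how much a single split can move the reported accuracy.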
