# **Cross-Validation**
# What is Cross-Validation?
Cross-validation is a statistical method used to evaluate how well a machine learning model generalizes to an independent dataset. The data is split into multiple parts, with each part being used as a test set while the remaining data is used for training.

**Process:**
- **Split the data:** Divide the data into k parts (folds) of roughly equal size.
- **Training and testing:** For each fold, hold that fold out as the test set and train the model on the remaining folds.
- **Generalization:** Averaging the per-fold scores gives an estimate of accuracy on unseen data that depends less on any single split. Cross-validation does not reduce overfitting by itself, but it makes overfitting easier to detect.
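The steps above can be written out by hand to show what happens inside each fold. This is a minimal sketch, not the way scikit-learn implements it internally; the choice of `StratifiedKFold` with shuffling and `random_state=0` is an assumption made here for repeatability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# Assumption: 5 stratified, shuffled folds with a fixed seed.
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in splitter.split(X, y):
    # Train a fresh model on four folds, score it on the held-out fold.
    model = LogisticRegression(max_iter=200)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

fold_scores = np.array(fold_scores)
print("Per-fold accuracy:", fold_scores)
print("Mean accuracy:", fold_scores.mean())
```

Every sample lands in a test fold exactly once, so the mean is computed over predictions for the whole dataset.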




In [2]:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Initialize the model
model = LogisticRegression(max_iter=200)

# Apply 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-validation scores: {scores}")

Cross-validation scores: [0.96666667 1.         0.93333333 0.96666667 1.        ]
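The five per-fold scores are usually reported as a single mean with a spread. The sketch below copies the scores from the output above into an array so that it runs on its own; reporting the standard deviation alongside the mean is a common convention, not something the original cell does.

```python
import numpy as np

# Scores copied from the cross-validation output above.
scores = np.array([0.96666667, 1.0, 0.93333333, 0.96666667, 1.0])

mean_score = scores.mean()
std_score = scores.std()
print(f"Accuracy: {mean_score:.3f} +/- {std_score:.3f}")
# prints: Accuracy: 0.973 +/- 0.025
```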


# Hyperparameter Tuning
**What are Hyperparameters?**
Hyperparameters are parameters that are set before the learning process begins, unlike model parameters that are learned from the data. Hyperparameters affect the model’s performance and generalization ability. Examples include:

- Learning rate
- Batch size
- Number of neighbors (in KNN)
- Number of trees (in Random Forest)
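The difference between a hyperparameter and a learned parameter can be seen directly on a scikit-learn estimator. In this sketch, logistic regression and its regularization strength `C` are chosen here for illustration: `C` is set before fitting, while the coefficients in `coef_` exist only after `fit` has run.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters are passed to the constructor, before any data is seen.
model = LogisticRegression(C=1.0, max_iter=200)
hyperparams = model.get_params()
print("C (hyperparameter):", hyperparams["C"])

# Learned parameters do not exist until the model is fitted.
has_coef_before = hasattr(model, "coef_")
print("Has coef_ before fit:", has_coef_before)

model.fit(X, y)
print("coef_ shape after fit:", model.coef_.shape)  # one row per class, one column per feature
```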

# Methods of Hyperparameter Tuning:
- **Manual Tuning:** Adjust hyperparameters one at a time and evaluate performance after each change. This is time-consuming but useful for understanding the effect of each parameter.
- **Grid Search:** Define a grid of candidate values for each hyperparameter and evaluate every combination. It is thorough, but the cost grows multiplicatively with the number of values per hyperparameter.
- **Randomized Search:** Instead of evaluating every combination, evaluate a fixed number of combinations sampled at random from the given ranges or distributions. The cost is controlled by the number of samples, so it is usually cheaper than grid search over the same space.
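Manual tuning can be sketched as a plain loop over candidate values, scoring each with cross-validation. The KNN classifier and the candidate values for `n_neighbors` below are hypothetical choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical candidate values, tried one at a time.
candidate_k = [1, 3, 5, 7, 9]

mean_scores = {}
for k in candidate_k:
    knn = KNeighborsClassifier(n_neighbors=k)
    mean_scores[k] = cross_val_score(knn, X, y, cv=5).mean()
    print(f"n_neighbors={k}: mean accuracy {mean_scores[k]:.3f}")

# Pick the value with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print("Best n_neighbors:", best_k)
```

Grid search and randomized search automate this loop over several hyperparameters at once.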

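In [5]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Define model and parameters
# X and y are the iris features and labels loaded in the cross-validation cell.
# No random_state is set, so the best parameters can differ between runs.
model = RandomForestClassifier()
param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [None, 10, 20]}

# Apply Grid Search: every combination is scored with 5-fold cross-validation
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X, y)
print("Best parameters found:", grid_search.best_params_)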



Best parameters found: {'max_depth': 10, 'n_estimators': 50}
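To see why grid search is described as computationally expensive, it helps to count the model fits. The sketch below uses scikit-learn's `ParameterGrid` to enumerate the same grid as above; the fit count assumes 5-fold cross-validation as in the cell above.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [None, 10, 20]}

# 3 values x 3 values = 9 candidate combinations.
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5
n_fits = n_candidates * n_folds

print("Candidate combinations:", n_candidates)  # 9
print("Cross-validation fits:", n_fits)         # 45
```

With the default `refit=True`, `GridSearchCV` performs one more fit on the full data using the best combination. Adding a third hyperparameter with three values would triple the count.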


In [4]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

# Define model and parameters
# randint(low, high) samples integers from low up to high - 1.
# No random_state is set, so the best parameters can differ between runs.
model = RandomForestClassifier()
param_dist = {'n_estimators': randint(10, 100), 'max_depth': randint(1, 20)}

# Apply Randomized Search: 100 sampled combinations, each scored with 5-fold cross-validation
random_search = RandomizedSearchCV(model, param_dist, n_iter=100, cv=5)
random_search.fit(X, y)
print("Best parameters found:", random_search.best_params_)

Best parameters found: {'max_depth': 11, 'n_estimators': 63}
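To get a feel for what randomized search evaluates, the candidates can be drawn without fitting any model. This sketch uses scikit-learn's `ParameterSampler` with the same distributions as above; `n_iter=5` and `random_state=0` are values assumed here to keep the output short and repeatable.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

param_dist = {'n_estimators': randint(10, 100), 'max_depth': randint(1, 20)}

# Draw 5 candidate combinations; the fixed seed makes the draw repeatable.
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for candidate in samples:
    print(candidate)
```

Passing `random_state` to `RandomizedSearchCV` in the same way makes the sampled candidates repeatable across runs.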


# **Final Thoughts**
Cross-validation and hyperparameter tuning are core techniques in machine learning. Cross-validation gives a more reliable estimate of how the model performs on unseen data, while hyperparameter tuning searches for the settings that improve that estimate. Applied systematically, the two together can improve a model’s accuracy and make its reported performance more trustworthy in real-world applications.

This lecture covered how to implement cross-validation and hyperparameter tuning in machine learning projects, along with practical examples and the challenges associated with these techniques.