<a href="https://colab.research.google.com/github/mithun-martin/MACHINE-LEARNING/blob/main/HYPER_PARAMETER_TUNING.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#GRID SEARCH CV

In [2]:
# GridSearchCV is a brute-force technique for hyperparameter tuning. It trains the model on every combination of the specified
# hyperparameter values and keeps the best-performing one.
# Because the number of combinations multiplies with each added hyperparameter, it is slow and computationally expensive
# on large datasets or large grids.

In [3]:
# It works in the following steps:

# 1. Create a grid of candidate values for each hyperparameter.
# 2. Train the model for every combination in the grid.
# 3. Evaluate each model using cross-validation.
# 4. Select the combination that gives the highest score.
# For example, suppose we want to tune two hyperparameters, C and Alpha, for a Logistic Regression classifier with the following sets of values
# (the names are illustrative: scikit-learn's LogisticRegression exposes C but has no alpha parameter):
# C = [0.1, 0.2, 0.3, 0.4, 0.5]
# Alpha = [0.01, 0.1, 0.5, 1.0]

# Grid search builds one version of the model for every combination of C and Alpha, giving 5 * 4 = 20 different models,
# each of which is scored with cross-validation.
# The best-performing combination is then chosen.
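
The 5 * 4 = 20 count above can be checked by enumerating the grid directly. This is a minimal sketch using only the standard library; the variable names `C_values` and `alpha_values` are chosen here for illustration and do not come from any library.

```python
from itertools import product

# Candidate values from the example above (C and Alpha are illustrative names)
C_values = [0.1, 0.2, 0.3, 0.4, 0.5]
alpha_values = [0.01, 0.1, 0.5, 1.0]

# Every (C, Alpha) pair is one model configuration to train and evaluate
combinations = list(product(C_values, alpha_values))

print(len(combinations))   # 20
print(combinations[0])     # (0.1, 0.01)
```

Adding a third hyperparameter with 3 candidate values would raise the count to 60, which is why grids are usually kept small.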


In [6]:
# Example: hyperparameter tuning for a random forest model

In [8]:
# There is no need to tune every hyperparameter; tune only the important ones.

In [9]:
# 📌 Important Hyperparameters for Common ML Algorithms

# 🔵 Logistic Regression:
# - C: Inverse of regularization strength → typical values: [0.01, 0.1, 1, 10]
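# - penalty: Type of regularization → ['l1', 'l2', 'elasticnet']
# - solver: Optimization algorithm → ['liblinear', 'saga'] (must support the chosen penalty: 'liblinear' handles 'l1' and 'l2'; 'elasticnet' needs 'saga' plus l1_ratio)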
# 👉 Most important: C, penalty

# 🟢 K-Nearest Neighbors (KNN):
# - n_neighbors: Number of neighbors to consider → [3, 5, 7, 9]
# - weights: How to weight neighbors → ['uniform', 'distance']
# - metric: Distance metric → ['euclidean', 'manhattan']
# 👉 Most important: n_neighbors

# 🟣 Support Vector Machine (SVM):
# - C: Regularization parameter → [0.1, 1, 10]
# - kernel: Kernel type → ['linear', 'rbf', 'poly']
# - gamma: Kernel coefficient (for 'rbf', 'poly') → ['scale', 'auto', 0.1, 1, 10]
# 👉 Most important: C, kernel, gamma

# 🟠 Decision Tree:
# - max_depth: Maximum depth of the tree → [None, 5, 10, 20]
# - min_samples_split: Minimum samples to split a node → [2, 5, 10]
# - criterion: Split quality measure → ['gini', 'entropy']
# 👉 Most important: max_depth

# 🟡 Random Forest:
# - n_estimators: Number of trees → [50, 100, 200]
# - max_depth: Max depth of each tree → [None, 5, 10, 20]
# - max_features: Features to consider at each split → ['sqrt', 'log2', None] ('auto' was removed in scikit-learn 1.3)
# - min_samples_split: Minimum samples to split a node → [2, 5, 10]
# 👉 Most important: n_estimators, max_depth

# 🟡 K-Means Clustering:
# - n_clusters: Number of clusters to form → [2, 3, 4, 5, 6]
# - init: Method for initialization → ['k-means++', 'random']
# - max_iter: Maximum number of iterations → [100, 200, 300]
# 👉 Most important: n_clusters
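
As one illustration of how the lists above turn into a `param_grid`, here is a rough sketch for the KNN entry. It assumes the Iris dataset and 3-fold cross-validation purely for demonstration; the names `knn_grid` and `knn_search` are made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Keys must match the estimator's constructor argument names exactly
knn_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}

# 4 * 2 * 2 = 16 combinations, each scored with 3-fold cross-validation
knn_search = GridSearchCV(KNeighborsClassifier(), param_grid=knn_grid, cv=3)
knn_search.fit(X, y)

print(len(knn_search.cv_results_["params"]))  # 16
print(knn_search.best_params_)
```

The same pattern applies to the other supervised models in the list: swap the estimator and use its own parameter names as dictionary keys.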


In [19]:

# 📦 Import libraries
from sklearn.datasets import load_iris        # sample dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score


In [23]:
# 📥 Load sample dataset
data = load_iris()
X = data.data     # features
y = data.target   # target labels

# ✂️ Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [24]:
model = RandomForestClassifier()  # default hyperparameters; no random_state is set, so results can vary between runs

In [25]:
# 📌 Step 1: Define the hyperparameters to tune
params = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10],
}

In [26]:
# 📌 Step 2: Set up grid search with 3-fold cross-validation
grid_search = GridSearchCV(estimator=model, param_grid=params, cv=3)

In [16]:
# 👉 estimator = model → which model to use (Random Forest here)
# 👉 param_grid = params → the hyperparameter grid we created
# 👉 cv=3 → use 3-fold cross-validation (training data is split into 3 parts; each part is used once for validation while the model trains on the other two)
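
To make the `cv=3` idea concrete, the sketch below splits 9 sample indices into 3 folds by hand. It is a simplified illustration, not what scikit-learn runs internally: for classifiers with an integer `cv`, `GridSearchCV` uses stratified folds that preserve class proportions. The helper `k_fold_indices` is a hypothetical name written for this example.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds get one extra sample
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=9, n_folds=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample lands in the validation part exactly once, so every hyperparameter combination is trained and scored 3 times and the scores are averaged.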

In [27]:
# 📌 Step 3: Fit the grid search to the training data
grid_search.fit(X_train,y_train)
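
After fitting, the score of every combination (not only the best one) is available in `cv_results_`. The sketch below repeats the setup from this notebook so it runs on its own; `random_state=0` on the forest is an added assumption to make repeated runs comparable and is not part of the original cells.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

params = {"n_estimators": [50, 100], "max_depth": [5, 10]}
# random_state=0 is an assumption added here for repeatability
grid_search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid=params, cv=3)
grid_search.fit(X_train, y_train)

# One entry per combination: the parameters tried and the mean validation accuracy
results = grid_search.cv_results_
for combo, mean_score in zip(results["params"], results["mean_test_score"]):
    print(combo, round(float(mean_score), 3))
```

On a small dataset such as Iris the scores are often very close to each other, so the reported "best" combination can change from run to run.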

In [28]:
# 📌 Step 4: Print the best hyperparameters
print("BEST PARAS ARE: ", grid_search.best_params_)

BEST PARAS ARE:  {'max_depth': 5, 'n_estimators': 100}
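
The best parameters are chosen using cross-validation on the training data only, so a natural follow-up is to check the tuned model on the held-out test set with `accuracy_score`. This is a possible next step rather than part of the original notebook; it repeats the setup so it is self-contained, and `random_state=0` is again an added assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

params = {"n_estimators": [50, 100], "max_depth": [5, 10]}
grid_search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid=params, cv=3)
grid_search.fit(X_train, y_train)

# best_estimator_ is the winning configuration refit on all of X_train (refit=True is the default)
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print("Best CV accuracy:", grid_search.best_score_)
print("Test accuracy:", test_accuracy)
```

`best_score_` is the mean validation accuracy across the 3 folds, while the test accuracy measures performance on data the search never saw.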
