
# Cross-validation
Cross-validation is a resampling technique used in machine learning and statistics to assess and validate the performance of a predictive model. Its primary purpose is to estimate how well a model will perform on independent data by repeatedly partitioning the available data into training and testing subsets. Because the estimate is averaged over several splits, it does not depend heavily on any single random split of the data.

Here are the key concepts and steps involved in cross-validation:

## Data Splitting
The dataset is divided into two (or more) mutually exclusive subsets:

### Training Set
This subset is used to train the machine learning model. It is the data from which the model learns the underlying patterns and relationships.

### Testing (Validation) Set
This subset is used to evaluate the model's performance, that is, how well the model generalizes to examples it did not see during training. Within cross-validation this held-out portion is usually called the validation set, while "test set" is often reserved for data kept aside for the final evaluation.
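When a separate final test set is needed in addition to the training and validation data, one common approach is to call scikit-learn's `train_test_split` twice. The sketch below assumes the Iris dataset and an arbitrary 60/20/20 split; `stratify` keeps the class proportions equal across the three subsets.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 3 classes

# Step 1: set aside 20% of the data as a final test set.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 2: split the remaining 80% into training (60% overall) and validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=0, stratify=y_temp
)

print(len(X_train), len(X_val), len(X_test))
```

With 150 samples this yields 90 training, 30 validation, and 30 test rows, and no sample appears in more than one subset.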
## K-Fold Cross-Validation
The most common form of cross-validation is K-Fold Cross-Validation, where the dataset is divided into K roughly equal-sized "folds" (subsets). The model is trained and evaluated K times, each time using a different fold as the validation set while the remaining K-1 folds are used for training. Every data point is therefore used for validation exactly once, and the K scores are averaged into a single performance estimate.
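The loop below is a minimal sketch of K-Fold cross-validation written out by hand with scikit-learn's `KFold`, assuming the Iris dataset, a 3-nearest-neighbour classifier, and K=5 (an arbitrary choice). It makes the "train on K-1 folds, validate on the remaining fold" rotation explicit.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

k_folds = 5
# shuffle=True matters here: Iris is stored sorted by class, so unshuffled
# folds would contain very unbalanced mixes of classes.
kf = KFold(n_splits=k_folds, shuffle=True, random_state=42)

fold_scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X[train_idx], y[train_idx])        # train on K-1 folds
    score = model.score(X[val_idx], y[val_idx])  # accuracy on the held-out fold
    fold_scores.append(score)
    print(f"Fold {fold}: train={len(train_idx)}, val={len(val_idx)}, accuracy={score:.3f}")

print(f"Mean accuracy: {np.mean(fold_scores):.3f} (std {np.std(fold_scores):.3f})")
```

In practice, `sklearn.model_selection.cross_val_score` performs the same bookkeeping in a single call; the explicit loop is shown only to make each step visible.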

## Leave-One-Out Cross-Validation (LOOCV)
In LOOCV, K is set to the number of data points n in the dataset. In each iteration, one data point is held out as the validation set and the model is trained on the remaining n-1 points. This is repeated for every data point, and the n results are averaged. Because it requires training n models, LOOCV becomes expensive on large datasets.
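A minimal sketch of LOOCV with scikit-learn's `LeaveOneOut` splitter, assuming the Iris dataset and a 3-nearest-neighbour classifier. Each per-iteration score is either 0 or 1, because the single held-out sample is either classified correctly or not.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

loo = LeaveOneOut()
print("Number of iterations:", loo.get_n_splits(X))  # one per sample

# Trains 150 separate models, each validated on a single held-out point.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=loo)
print("LOOCV accuracy:", scores.mean())
```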

## Stratified Cross-Validation
In classification tasks, it is often important to preserve the class distribution when splitting the data. Stratified cross-validation ensures that each fold has approximately the same class proportions as the full dataset, reducing the risk of a biased evaluation (for example, a fold that contains few or no examples of one class).
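To see the difference stratification makes, the sketch below counts how many samples of each class land in every validation fold, comparing plain `KFold` with `StratifiedKFold` on the Iris dataset (an assumed example; 50 samples per class, stored in class order). The helper `class_counts_per_fold` is a hypothetical name introduced for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)

def class_counts_per_fold(splitter):
    """Hypothetical helper: class counts of each validation fold produced by `splitter`."""
    return [np.bincount(y[val_idx], minlength=3) for _, val_idx in splitter.split(X, y)]

# Plain K-Fold without shuffling: because Iris is ordered by class,
# several folds end up dominated by a single class.
print("KFold:          ", class_counts_per_fold(KFold(n_splits=5)))

# Stratified K-Fold keeps each fold at the dataset's overall 1:1:1 class ratio.
print("StratifiedKFold:", class_counts_per_fold(StratifiedKFold(n_splits=5)))
```

Note that scikit-learn's `cross_val_score` and `cross_validate` already default to stratified folds when given a classifier and an integer `cv`.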

## Performance Metrics
Various performance metrics can be used to evaluate the model during cross-validation, depending on the problem type. Common choices are accuracy, precision, recall, and F1-score for classification, and mean squared error (MSE) for regression.
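Several metrics can be collected in one cross-validation run with scikit-learn's `cross_validate`. The sketch below assumes the Iris dataset, a 3-nearest-neighbour classifier, and 5 folds; the `_macro` scorer variants average the per-class score equally across the three classes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Built-in scikit-learn scorer names.
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
results = cross_validate(KNeighborsClassifier(n_neighbors=3), X, y, cv=5, scoring=scoring)

# cross_validate returns one array of per-fold values per metric, keyed "test_<name>".
for name in scoring:
    fold_values = results[f"test_{name}"]
    print(f"{name:>16}: mean={fold_values.mean():.3f}, std={fold_values.std():.3f}")
```

For regression, the corresponding scorer is `"neg_mean_squared_error"`: scikit-learn negates error metrics so that a higher score is always better.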

## Model Selection and Hyperparameter Tuning
Cross-validation is often used to compare different models or to tune hyperparameters. By comparing each candidate's average performance across the folds, you can select the best-performing model or the best hyperparameter settings.
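One common way to tune hyperparameters with cross-validation is scikit-learn's `GridSearchCV`, which runs K-fold cross-validation for every combination in a parameter grid and keeps the one with the best mean score. The sketch below assumes the Iris dataset, 5 folds, and an arbitrary grid over the KNN parameters `n_neighbors` and `weights`. The scaler sits inside a pipeline so it is fitted only on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# make_pipeline names each step after its class in lowercase.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Pipeline parameters are addressed as "<step name>__<parameter>".
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 11],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

# 12 candidate settings x 5 folds = 60 fits.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
```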

## Final Model Evaluation
Once model selection and hyperparameter tuning are complete, the chosen model is retrained on all of the training data (every fold, not just K-1 of them) and evaluated once on an independent test set. This test set must be set aside before cross-validation begins and never used for tuning. The resulting score estimates how the model is likely to perform in a real-world scenario.
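The end-to-end workflow can be sketched as follows, assuming the Iris dataset, an arbitrary 80/20 split, and a small grid over `n_neighbors`. The test set is split off first. Tuning uses cross-validation on the training portion only, and `GridSearchCV` (with its default `refit=True`) retrains the best configuration on the whole training portion before it is scored once on the test set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# 1. Set the test set aside before any tuning happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 2. Tune with 5-fold cross-validation on the training portion only.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
search = GridSearchCV(pipe, {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)  # refit=True (default) retrains the best model on all of X_train

# 3. Evaluate the refitted best model exactly once on the untouched test set.
test_accuracy = search.score(X_test, y_test)
print("Chosen n_neighbors:", search.best_params_["kneighborsclassifier__n_neighbors"])
print(f"Mean CV accuracy (training data): {search.best_score_:.3f}")
print(f"Test accuracy:                    {test_accuracy:.3f}")
```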

Cross-validation is a crucial technique for assessing a model's performance, especially when data is limited. It helps detect problems such as overfitting (a model that performs well on the training data but poorly on unseen data) and gives a more reliable estimate of a model's generalization performance than a single train/test split. In Python, scikit-learn's `sklearn.model_selection` module provides ready-made tools such as `KFold`, `StratifiedKFold`, `LeaveOneOut`, `cross_val_score`, and `GridSearchCV`.
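One way to use cross-validation to spot overfitting is to compare the average training score with the average validation score: a large gap suggests the model is memorizing its training data. The sketch below assumes the Iris dataset and compares two arbitrary KNN settings. A 1-nearest-neighbour model tends to fit its training data almost perfectly, while k=15 smooths its decisions. On a dataset as easy as Iris the gap may be small, so treat the numbers as illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

gaps = {}
for k in (1, 15):
    # return_train_score=True also scores each model on the folds it was trained on.
    res = cross_validate(KNeighborsClassifier(n_neighbors=k), X, y, cv=5, return_train_score=True)
    train_acc = res["train_score"].mean()
    val_acc = res["test_score"].mean()
    gaps[k] = train_acc - val_acc
    print(f"k={k:>2}: train={train_acc:.3f}, validation={val_acc:.3f}, gap={gaps[k]:.3f}")
```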

```python
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data  # Features
y = iris.target  # Target variable (class labels)

# Split the dataset into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling (optional but recommended for KNN, which is distance-based).
# The scaler is fitted on the training data only, then applied to the test data.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a KNN classifier
k = 3  # Number of neighbors (you can adjust this value)
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Train the classifier on the training data
knn_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_classifier.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 1.0
```
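The accuracy of 1.0 above comes from a single random split with only 45 test samples, which is exactly the kind of single-split estimate that cross-validation is meant to make more reliable. A minimal sketch of a cross-validated version of the same experiment, assuming 5 stratified, shuffled folds (an arbitrary choice), wraps the scaler and the 3-NN classifier in a pipeline. That way the scaler is refitted on each training fold and never sees the corresponding validation fold.

```python
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same model as above (standard scaling + 3 nearest neighbours) as one estimator.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# 5 folds, each with the same class proportions as the full dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Per-fold accuracy:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The mean and standard deviation across folds give both a performance estimate and a sense of how much it varies from split to split.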
