#### Q1. Write a Python code to implement the KNN classifier algorithm on load_iris dataset in sklearn.datasets.

#### Solution
The following Python code implements the K-Nearest Neighbors (KNN) classifier on the Iris dataset using sklearn's load_iris function:

In [1]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Labels

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (important for KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a KNN classifier model (n_neighbors=5, which is also sklearn's default)
knn = KNeighborsClassifier(n_neighbors=5)

# Train the model
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Output the results
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)


Accuracy: 100.00%

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



#### Explanation:

- Loading the Iris dataset: The load_iris() function from sklearn.datasets provides the Iris dataset, which contains 150 samples of iris flowers, each with 4 features and 3 target classes.

- Splitting the dataset: The data is split into training and testing sets using the train_test_split() function. Here, 80% of the data is used for training and 20% for testing.

- Feature scaling: KNN is distance-based, so it's important to scale the features to ensure each feature contributes equally to the distance metric. The StandardScaler() is used to standardize the features.

- KNN model creation: The KNN classifier is created using KNeighborsClassifier() with n_neighbors=5, which is also scikit-learn's default value.

- Model training and prediction: The fit() method trains the KNN model on the training data, and predict() makes predictions on the test data.

- Model evaluation: The model's accuracy, confusion matrix, and classification report are displayed using accuracy_score(), confusion_matrix(), and classification_report().
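To illustrate why the feature-scaling step matters, here is a minimal sketch using only the standard library. The four two-feature points are hypothetical (they are not taken from the Iris data), and the `standardize` helper is an illustrative stand-in for what StandardScaler does, not sklearn's implementation.

```python
import math

# Hypothetical samples: (height in cm, ratio between 0 and 1).
a = (150.0, 0.10)
b = (153.0, 0.90)   # close to a in height, far from a in ratio
c = (158.0, 0.15)   # farther from a in height, close to a in ratio
d = (170.0, 0.50)   # extra point so the feature spreads are realistic

# Unscaled distances: the large-valued height feature dominates.
raw_ab = math.dist(a, b)
raw_ac = math.dist(a, c)

def standardize(points):
    """Rescale every feature to zero mean and unit (population) standard deviation."""
    columns = list(zip(*points))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]

sa, sb, sc, sd = standardize([a, b, c, d])
scaled_ab = math.dist(sa, sb)
scaled_ac = math.dist(sa, sc)

print(f"raw:    d(a, b) = {raw_ab:.2f}, d(a, c) = {raw_ac:.2f}")
print(f"scaled: d(a, b) = {scaled_ab:.2f}, d(a, c) = {scaled_ac:.2f}")
```

Before scaling, b is the nearest neighbor of a because the height difference swamps the ratio difference. After scaling, both features contribute on a comparable scale and c becomes the nearest neighbor.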

#### Q2. Write a Python code to implement the KNN regressor algorithm on load_boston dataset in sklearn.datasets.

#### Solution
The following Python code implements the K-Nearest Neighbors (KNN) regressor on the Boston housing dataset. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2 because of ethical concerns about the dataset, so the code below uses fetch_openml to retrieve the Boston housing data from OpenML instead.

In [2]:
# Import necessary libraries
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Load the Boston housing dataset
boston = fetch_openml(name='boston', version=1)
X = pd.DataFrame(boston.data, columns=boston.feature_names)  # Features
y = pd.Series(boston.target, name='MEDV')  # Target variable (median value of owner-occupied homes in $1000s)

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (important for KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a KNN regressor model (n_neighbors=5, which is also sklearn's default)
knn_regressor = KNeighborsRegressor(n_neighbors=5)

# Train the model
knn_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Output the results
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")


Mean Squared Error: 20.61
R^2 Score: 0.72


Explanation:

- Loading the Boston housing dataset: The dataset is fetched using fetch_openml() instead of load_boston(). The features and target variable are extracted into X and y, respectively.

- Splitting the dataset: The data is split into training and testing sets using the train_test_split() function, where 80% of the data is used for training and 20% for testing.

- Feature scaling: The features are standardized using StandardScaler(), which is important for KNN since it is sensitive to the scale of the data.

- KNN regressor model creation: A KNN regressor model is created using KNeighborsRegressor(), with the number of neighbors set to 5.

- Model training and prediction: The fit() method trains the KNN model on the training data, and the predict() method is used to make predictions on the test data.

- Model evaluation: The model's performance is evaluated using Mean Squared Error (MSE) and R² score. The mean_squared_error() and r2_score() functions calculate these metrics.
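As a rough illustration of what the two evaluation metrics measure, the sketch below computes them by hand on a small made-up set of values. The helper names `mse_by_hand` and `r2_by_hand` and the sample numbers are assumptions for this example only; they are not part of sklearn.

```python
def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_by_hand(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up values, not taken from the housing data.
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.0, 5.0, 8.0, 9.0]

mse_demo = mse_by_hand(y_true_demo, y_pred_demo)
r2_demo = r2_by_hand(y_true_demo, y_pred_demo)
print(f"MSE: {mse_demo:.2f}, R^2: {r2_demo:.2f}")
```

A lower MSE means smaller average squared errors, while an R² closer to 1 means the predictions explain more of the variation in the target.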

#### Q3. Write a Python code snippet to find the optimal value of K for the KNN classifier algorithm using cross-validation on load_iris dataset in sklearn.datasets.

#### Solution
To find the optimal value of K for the KNN classifier using cross-validation, we can use GridSearchCV from sklearn.model_selection. Here's a Python code snippet that implements this approach on the Iris dataset:

In [3]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Labels

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a KNN model
knn = KNeighborsClassifier()

# Create a dictionary of possible values for K
param_grid = {'n_neighbors': range(1, 31)}  # K from 1 to 30

# Use GridSearchCV to find the optimal value of K
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Get the best K value
best_k = grid_search.best_params_['n_neighbors']
print(f"The optimal value of K is: {best_k}")

# Train the KNN model with the optimal K
knn_best = KNeighborsClassifier(n_neighbors=best_k)
knn_best.fit(X_train, y_train)

# Make predictions and evaluate the model
y_pred = knn_best.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy with K={best_k}: {accuracy * 100:.2f}%")


The optimal value of K is: 3
Accuracy with K=3: 100.00%


#### Explanation:
- Data Loading: The load_iris() function loads the Iris dataset.

- Data Splitting: The dataset is split into training (80%) and testing (20%) sets using train_test_split().

- Feature Scaling: The features are standardized using StandardScaler() to ensure that all features contribute equally.

- K Range: The param_grid dictionary contains a range of values for n_neighbors (K) from 1 to 30.

- Cross-Validation: GridSearchCV performs cross-validation with 5 folds (cv=5) to find the best K value by checking accuracy for each K.

- Best K and Final Evaluation: After finding the optimal K, the model is retrained with that value, and the accuracy on the test set is printed.
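In the snippet above, the scaler is fitted on the full training set before GridSearchCV splits it into folds, so each validation fold has already influenced the scaling statistics. One possible refinement, sketched here under the same 80/20 split and K range, is to wrap the scaler and the classifier in a Pipeline so that scaling is refitted inside every fold. The step names "scaler" and "knn" are arbitrary labels chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling and classification are chained, so each CV fold fits its own scaler.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {"knn__n_neighbors": list(range(1, 31))}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)  # uses the refitted best pipeline

print(f"Best K found with the pipeline: {best_k}")
print(f"Test accuracy: {test_accuracy * 100:.2f}%")
```

The selected K may differ from the value reported above, since the cross-validation folds now see slightly different scaled inputs.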

#### Q4. Implement the KNN regressor algorithm with feature scaling on load_boston dataset in sklearn.datasets.

#### Solution
The following Python snippet implements the K-Nearest Neighbors (KNN) regressor on the Boston housing dataset with feature scaling. Since load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2 because of ethical concerns about the dataset, we use fetch_openml to load the data from OpenML.

In [4]:
# Import necessary libraries
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Load the Boston housing dataset
boston = fetch_openml(name='boston', version=1, as_frame=True)
X = boston.data  # Features
y = boston.target  # Target variable (median value of owner-occupied homes in $1000s)

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (important for KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a KNN regressor model (n_neighbors=5, which is also sklearn's default)
knn_regressor = KNeighborsRegressor(n_neighbors=5)

# Train the model
knn_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Output the results
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")


Mean Squared Error: 20.61
R^2 Score: 0.72


Explanation:

- Loading the Dataset: The Boston housing dataset is loaded using fetch_openml(). The as_frame=True parameter ensures that the data is returned as a pandas DataFrame, making it easier to work with.

- Data Splitting: The dataset is split into training (80%) and testing (20%) sets using the train_test_split() function.

- Feature Scaling: The features are standardized using StandardScaler(), which is crucial for KNN since it is sensitive to the scale of the data.

- KNN Regressor Model: A KNN regressor model is created using KNeighborsRegressor(), with the number of neighbors set to 5.

- Training and Prediction: The model is trained using the training data with the fit() method, and predictions are made on the test set with the predict() method.

- Model Evaluation: The model's performance is evaluated using Mean Squared Error (MSE) and R² score to quantify how well the model predicts housing prices.
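To make the prediction step more concrete, here is a simplified from-scratch sketch of what a KNN regressor with uniform weights does: it averages the target values of the k closest training points. The function name `knn_regress` and the tiny one-feature dataset are hypothetical; KNeighborsRegressor uses more efficient neighbor search internally.

```python
import math

def knn_regress(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Pair each training point's distance to the query with its target value.
    distances = sorted(
        (math.dist(x, query), target) for x, target in zip(train_X, train_y)
    )
    nearest_targets = [target for _, target in distances[:k]]
    return sum(nearest_targets) / k

# Hypothetical one-feature training data.
train_X_demo = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y_demo = [10.0, 20.0, 30.0, 100.0]

prediction = knn_regress(train_X_demo, train_y_demo, query=(2.2,), k=3)
print(f"Predicted value: {prediction:.2f}")
```

The far-away point at 10.0 is ignored because it is not among the three nearest neighbors of the query.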

#### Q5. Write a Python code snippet to implement the KNN classifier algorithm with weighted voting on load_iris dataset in sklearn.datasets.

#### Solution
The following Python snippet implements the K-Nearest Neighbors (KNN) classifier with weighted voting on the Iris dataset using sklearn. In weighted voting, each neighbor's contribution to the prediction is weighted by its distance, so closer neighbors have a larger influence on the final prediction.

In [5]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Labels

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a KNN classifier model with weighted voting
knn = KNeighborsClassifier(n_neighbors=5, weights='distance')

# Train the model
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Output the results
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)


Accuracy: 100.00%

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



#### Explanation:

- Loading the Iris Dataset: The Iris dataset is loaded using load_iris(), which provides both the feature data (X) and the target labels (y).

- Data Splitting: The dataset is split into training (80%) and testing (20%) sets using the train_test_split() function.

- Feature Scaling: The features are standardized using StandardScaler(), which is important for KNN since it is sensitive to the scale of the data.

- KNN Classifier with Weighted Voting: A KNN classifier is created using KNeighborsClassifier(), with n_neighbors set to 5 and weights='distance' to enable weighted voting based on the distances of the neighbors.

- Model Training and Prediction: The model is trained on the training data using the fit() method, and predictions are made on the test data using the predict() method.

- Model Evaluation: The model's accuracy, confusion matrix, and classification report are printed to evaluate its performance.
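The difference between uniform and distance-weighted voting can be sketched in a few lines of plain Python. With weights='distance', sklearn weights each neighbor by the inverse of its distance; the helper functions and the neighbor list below are hypothetical, and the zero-distance handling is a simplification for this example.

```python
from collections import Counter, defaultdict

def uniform_vote(neighbors):
    """Every neighbor counts once; the most common label wins."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_vote(neighbors):
    """Each neighbor votes with weight 1 / distance; the highest total wins."""
    scores = defaultdict(float)
    for distance, label in neighbors:
        if distance == 0:
            return label  # simplification: an exact match decides the prediction
        scores[label] += 1.0 / distance
    return max(scores, key=scores.get)

# Hypothetical (distance, label) pairs for the 3 nearest neighbors of a query.
neighbors_demo = [(0.5, "A"), (2.0, "B"), (2.5, "B")]

print("Uniform vote: ", uniform_vote(neighbors_demo))
print("Weighted vote:", weighted_vote(neighbors_demo))
```

Here the single close neighbor of class A (weight 2.0) outweighs the two more distant neighbors of class B (combined weight 0.9), so the two voting schemes disagree.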

#### Q6. Implement a function to standardise the features before applying KNN classifier.

#### Solution
Below is a Python function that standardizes the features of a dataset before applying the KNN classifier. This function uses StandardScaler from sklearn.preprocessing to scale the features. The function takes in the features and the target labels, splits the data into training and testing sets, standardizes the features, and then fits a KNN classifier.

In [6]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def standardize_and_knn(X, y, n_neighbors=5, test_size=0.2, random_state=42):
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)

    # Initialize the StandardScaler
    scaler = StandardScaler()

    # Fit the scaler on the training data and transform both train and test data
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Create the KNN classifier with weighted voting
    knn = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance')

    # Train the model
    knn.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = knn.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    conf_matrix = confusion_matrix(y_test, y_pred)
    class_report = classification_report(y_test, y_pred)

    # Output the results
    print(f"Accuracy: {accuracy * 100:.2f}%")
    print("\nConfusion Matrix:")
    print(conf_matrix)
    print("\nClassification Report:")
    print(class_report)

# Example usage
if __name__ == "__main__":
    # Load the Iris dataset
    iris = load_iris()
    X = iris.data  # Features
    y = iris.target  # Labels

    # Call the function
    standardize_and_knn(X, y, n_neighbors=5)


Accuracy: 100.00%

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



#### Explanation of the Function:

- Function Definition: The function standardize_and_knn() takes five parameters:

X: The features of the dataset.

y: The target labels.

n_neighbors: The number of nearest neighbors used by the classifier (default is 5).

test_size: The proportion of the dataset to include in the test split (default is 0.2).

random_state: The seed that controls the shuffling applied to the data before the split (default is 42).

- Data Splitting: The dataset is split into training and testing sets using train_test_split().

- Standardization: The StandardScaler is used to standardize the training data, and the same transformation is applied to the test data.

- KNN Classifier Creation: A KNN classifier is created with weighted voting enabled.

- Model Training and Prediction: The model is trained on the standardized training data, and predictions are made on the standardized test data.

- Model Evaluation: The function prints the accuracy, confusion matrix, and classification report to evaluate the model's performance.
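For intuition about the standardization step, the sketch below reproduces the idea on a single feature column using only the standard library: the mean and standard deviation are learned from the training values and then reused on the test values. The helper names and sample numbers are assumptions for this example; like StandardScaler, it uses the population standard deviation.

```python
import statistics

def fit_standardizer(column):
    """Learn the mean and population standard deviation from training values."""
    return statistics.fmean(column), statistics.pstdev(column)

def apply_standardizer(column, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(value - mean) / std for value in column]

# Hypothetical values for one feature.
train_column = [2.0, 4.0, 6.0, 8.0]
test_column = [5.0, 10.0]

mean, std = fit_standardizer(train_column)
train_scaled = apply_standardizer(train_column, mean, std)
test_scaled = apply_standardizer(test_column, mean, std)  # reuse training statistics

print("Scaled training values:", [round(v, 3) for v in train_scaled])
print("Scaled test values:    ", [round(v, 3) for v in test_scaled])
```

Reusing the training statistics on the test values mirrors the fit_transform() / transform() split in the function above and keeps information about the test set out of the preprocessing step.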

#### Q7. Write a Python function to calculate the euclidean distance between two points.

#### Solution
The following Python function calculates the Euclidean distance between two points in n-dimensional space. It takes two lists or tuples representing the coordinates of the two points and returns the distance between them.

In [7]:
import math

def euclidean_distance(point1, point2):
    """
    Calculate the Euclidean distance between two points.

    Parameters:
    - point1: A list or tuple representing the coordinates of the first point.
    - point2: A list or tuple representing the coordinates of the second point.

    Returns:
    - Distance: The Euclidean distance between point1 and point2.
    """
    # Ensure both points have the same dimension
    if len(point1) != len(point2):
        raise ValueError("Both points must have the same number of dimensions.")

    # Calculate the square root of the sum of squared differences
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))

    return distance

# Example usage
if __name__ == "__main__":
    point_a = (3, 4)
    point_b = (0, 0)

    distance = euclidean_distance(point_a, point_b)
    print(f"The Euclidean distance between {point_a} and {point_b} is: {distance:.2f}")


The Euclidean distance between (3, 4) and (0, 0) is: 5.00


#### Explanation of the Function:
Function Definition: The function euclidean_distance() takes two parameters, point1 and point2, which should be lists or tuples representing the coordinates of the two points.

Dimension Check: The function checks if both points have the same number of dimensions. If not, it raises a ValueError.

Distance Calculation: The function calculates the Euclidean distance using the formula:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

It uses a generator expression to compute the squared differences between corresponding coordinates of the two points, sums them up, and then takes the square root.

Return Value: The function returns the calculated distance.
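As a cross-check, Python 3.8 and later ship math.dist(), which computes the same quantity. The short sketch below compares it with the manual formula on the example points used above; the variable names are chosen for this example only.

```python
import math

point_a = (3, 4)
point_b = (0, 0)

# Built-in Euclidean distance (available since Python 3.8).
builtin_distance = math.dist(point_a, point_b)

# The same formula written out: square root of the sum of squared differences.
manual_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))

print(f"math.dist: {builtin_distance:.2f}, manual: {manual_distance:.2f}")
```

Like the function above, math.dist() raises a ValueError when the two points have different numbers of dimensions.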

#### Q8. Write a Python function to calculate the manhattan distance between two points.

#### Solution
The following Python function calculates the Manhattan distance between two points in n-dimensional space. The Manhattan distance (also known as the L1 distance or taxicab distance) is the sum of the absolute differences of their coordinates.

In [8]:
def manhattan_distance(point1, point2):
    """
    Calculate the Manhattan distance between two points.

    Parameters:
    - point1: A list or tuple representing the coordinates of the first point.
    - point2: A list or tuple representing the coordinates of the second point.

    Returns:
    - Distance: The Manhattan distance between point1 and point2.
    """
    # Ensure both points have the same dimension
    if len(point1) != len(point2):
        raise ValueError("Both points must have the same number of dimensions.")

    # Calculate the sum of absolute differences
    distance = sum(abs(a - b) for a, b in zip(point1, point2))

    return distance

# Example usage
if __name__ == "__main__":
    point_a = (3, 4)
    point_b = (1, 1)

    distance = manhattan_distance(point_a, point_b)
    print(f"The Manhattan distance between {point_a} and {point_b} is: {distance}")


The Manhattan distance between (3, 4) and (1, 1) is: 5


#### Explanation of the Function:
Function Definition: The function manhattan_distance() takes two parameters, point1 and point2, which should be lists or tuples representing the coordinates of the two points.

Dimension Check: The function checks if both points have the same number of dimensions. If not, it raises a ValueError.

Distance Calculation: The function calculates the Manhattan distance using the formula: