# Q.1

### Main Difference:

* Euclidean Distance: Measures the straight-line distance between two points in Euclidean space. It is sensitive to differences in all dimensions.
* Manhattan Distance: Measures the distance between two points along the axes at right angles. It is the sum of the absolute differences of their coordinates.
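As a minimal sketch of the two definitions above, the following computes both distances for a pair of points using only the standard library. The function names `euclidean` and `manhattan` are illustrative choices, not part of any particular library.

```python
import math

def euclidean(p, q):
    """Straight-line (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block (L1) distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean distance, since it follows the axes instead of the diagonal.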

### Effect on Performance:

* Euclidean Distance: Squares each coordinate difference, so a large difference in a single dimension can dominate the distance. It suits lower-dimensional data whose features are on comparable scales and contribute equally.
* Manhattan Distance: Adds absolute differences, so it grows only linearly with a large difference in one dimension and is less sensitive to outliers. It is often preferred for high-dimensional data or when differences along each individual dimension are meaningful on their own.
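The effect can be illustrated with a small example in which the two metrics disagree about which point is the nearest neighbor. The points `a` and `b` below are made up for illustration: `b` differs from the query in only one dimension, but by a larger amount.

```python
import math

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(p, q))

query = (0, 0)
a = (3, 3)  # moderate difference in both dimensions
b = (0, 5)  # large difference in a single dimension

nearest_euclidean = min([a, b], key=lambda pt: math.dist(query, pt))
nearest_manhattan = min([a, b], key=lambda pt: manhattan(query, pt))

print(nearest_euclidean)  # (3, 3): sqrt(18) is about 4.24, which is less than 5
print(nearest_manhattan)  # (0, 5): 5 is less than 6
```

Because KNN predictions depend only on which points are selected as neighbors, a flip like this is exactly how the metric ends up changing the model's output.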

### Performance Implications:

* KNN Classifier: The distance metric changes which points count as neighbors and therefore where the class boundaries fall. Euclidean distance tends to give smoother boundaries, while Manhattan distance tends to give boundaries that follow the coordinate axes more closely.
* KNN Regressor: Predictions tend to vary more smoothly with Euclidean distance and in a more block-like way with Manhattan distance.

# Q.2

### The techniques to choose the optimal value of K

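1. Cross-Validation: Evaluate each candidate K on held-out folds and keep the value with the best average score.
2. Elbow Method: Plot the validation error against K and pick the point where the error stops improving noticeably.
3. Grid Search: Try every candidate K from a predefined list, usually scoring each with cross-validation.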
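A minimal sketch of choosing K by cross-validation, assuming leave-one-out validation and a tiny made-up dataset with one deliberately mislabeled point. The helpers `knn_predict` and `loo_accuracy` are hypothetical names written for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation: hold out each point once and predict it."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

# Toy data: two clusters, plus one noisy "B" label inside the "A" cluster.
data = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
    ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"), ((7, 7), "B"),
    ((1.5, 1.5), "B"),
]

scores = {k: loo_accuracy(data, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)  # first K with the highest accuracy
print(scores)
print(best_k)  # 3
```

On this toy data K=1 is misled by the noisy point and K=7 lets the larger class outvote the smaller one, so the intermediate values score best. Plotting `scores` against K would give the curve used by the elbow method.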


# Q.3

### Effect on Performance:

The choice of distance metric influences how distances are calculated between points, which in turn affects the neighbors selected for each point.
1. Euclidean Distance: More suitable for continuous features where the distance in every dimension is equally important.
2. Manhattan Distance: Often better for high-dimensional data or when robustness to outliers matters, because a large difference in one feature is not squared. Neither metric corrects for features on different scales, so scale the features first in either case.

### Situations to Choose:

1. Euclidean Distance: Use when the data is normalized, and the features are of similar scale and contribute equally to the outcome.
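2. Manhattan Distance: Use when dealing with high-dimensional data or when robustness to outliers is needed. Features on different scales should still be normalized first, since Manhattan distance is also dominated by the feature with the largest range.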

# Q.4

### Common Hyperparameters:

Number of Neighbors (k): Determines the number of nearest neighbors to consider. A smaller k can lead to a model that is too sensitive (high variance), while a larger k can lead to oversmoothing (high bias).

Distance Metric: Choices include Euclidean, Manhattan, Minkowski, etc. The choice of metric affects how distances are calculated and which points are considered neighbors.

Weights: Determines whether all neighbors are weighted equally or if closer neighbors have more influence (e.g., uniform vs. distance weighting).
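To make the weighting option concrete, here is a small sketch in which uniform and distance-weighted voting give different predictions for the same neighbors. The function `knn_vote` and its `weights` argument are hypothetical; the inverse-distance scheme is one common choice, assumed here for illustration.

```python
import math
from collections import defaultdict

def knn_vote(train, query, k, weights="uniform"):
    """Predict a label using uniform or inverse-distance weighted votes."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for point, label in neighbors:
        d = math.dist(point, query)
        # Uniform: one vote each. Distance: closer neighbors count more (1/d).
        scores[label] += 1.0 if weights == "uniform" else 1.0 / (d + 1e-12)
    return max(scores, key=scores.get)

train = [((1.0, 0.0), "A"), ((3.0, 0.0), "B"), ((0.0, 3.0), "B")]
query = (0.0, 0.0)

print(knn_vote(train, query, k=3, weights="uniform"))   # B (two votes to one)
print(knn_vote(train, query, k=3, weights="distance"))  # A (1/1 > 1/3 + 1/3)
```

Distance weighting makes the result less dependent on the exact value of k, because far neighbors contribute little even when they are included.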

### Tuning Hyperparameters:

Grid Search: Systematically explore combinations of hyperparameters using cross-validation to find the optimal settings.

Random Search: Randomly sample hyperparameter combinations and evaluate performance, which can be more efficient for large hyperparameter spaces.

Bayesian Optimization: Use probabilistic models to guide the search for optimal hyperparameters based on past evaluations.
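As one possible illustration of grid search, the sketch below assumes scikit-learn is installed and uses its bundled Iris dataset as a stand-in for real data. The candidate values in `param_grid` are arbitrary choices for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale inside the pipeline so each CV fold is standardized from its own training part.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after its lowercased class name.
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 11],
    "kneighborsclassifier__metric": ["euclidean", "manhattan"],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Swapping `GridSearchCV` for `RandomizedSearchCV` from the same module would turn this into a random search over the same space.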

# Q.5 

### Effect of Training Set Size:

Larger Training Set: Generally improves the performance of KNN because it provides more data points to determine the nearest neighbors, leading to better generalization.

Smaller Training Set: May lead to overfitting and poorer generalization because the model has fewer data points to rely on, so the nearest neighbors may be far from the query point and unrepresentative of it.

### Techniques to Optimize Training Set Size:

Data Augmentation: Generate additional data points through techniques like rotation, scaling, or noise addition to increase the size of the training set.

Feature Selection: Reduce the dimensionality of the data by selecting the most relevant features, which can make the algorithm more efficient and potentially improve performance.

Sampling Techniques: Use stratified sampling to ensure that the training set is representative of the entire dataset.

Cross-Validation: Use cross-validation to make the most of the available data by training and validating the model on different subsets of the data.

# Q.6

### Potential Drawbacks:

Computational Complexity: KNN is computationally expensive, especially for large datasets, because it requires calculating the distance to every training point during prediction.

Curse of Dimensionality: Performance degrades with high-dimensional data because distances become less meaningful.

Sensitivity to Irrelevant Features: Irrelevant or redundant features can distort distance calculations and degrade performance.

Imbalanced Data: KNN can perform poorly if the dataset is imbalanced, as the majority class can dominate the predictions.
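The curse of dimensionality mentioned above can be seen in a rough simulation, assuming points drawn uniformly at random from the unit hypercube. The function `distance_contrast` is a hypothetical helper that compares the nearest and farthest distances from a random query.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(dists) / max(dists)

# A ratio near 0 means the nearest point is much closer than the farthest;
# a ratio near 1 means all points are almost equally far away.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

As the dimension grows the ratio moves toward 1, so the "nearest" neighbor is barely closer than any other point and carries little information.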

### Overcoming Drawbacks:

Efficient Data Structures: Use data structures like KD-trees or Ball-trees to speed up the nearest neighbor search.
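The idea behind a KD-tree can be sketched in a few lines of plain Python. This is a simplified illustration rather than a production implementation: `build_kdtree` and `nearest` are hypothetical names, the tree is stored as nested dictionaries, and only a single nearest neighbor is returned.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively split points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return the stored point closest to query, pruning subtrees that cannot win."""
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(500)]
tree = build_kdtree(points)
query = (0.5, 0.5)

brute_force = min(points, key=lambda p: math.dist(query, p))
print(nearest(tree, query) == brute_force)
```

The pruning step is what saves work: whole subtrees are skipped when they lie farther away than the best match found so far. This advantage shrinks as the number of dimensions grows.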

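Dimensionality Reduction: Apply techniques like Principal Component Analysis (PCA) to reduce the number of dimensions before computing distances. t-SNE also reduces dimensionality, but it is mainly a visualization tool and common implementations cannot project new, unseen points, which limits its use as a preprocessing step for prediction.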

Feature Selection: Use feature selection methods to remove irrelevant or redundant features.

Normalization: Normalize or standardize the features to ensure they contribute equally to distance calculations.
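A small sketch of why normalization matters, using made-up rows of (age in years, income in dollars) and min-max scaling. The helper `min_max_scale` is a hypothetical name; in practice the scaling parameters would be computed from the training data only.

```python
import math

def min_max_scale(rows):
    """Rescale every column to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical rows: (age in years, income in dollars). Row 0 is the query.
rows = [(25, 50_000), (60, 51_000), (27, 58_000), (45, 90_000)]
candidates = range(1, len(rows))

# Without scaling, income dominates and the 60-year-old looks nearest.
raw_idx = min(candidates, key=lambda i: math.dist(rows[0], rows[i]))

# After scaling, both features contribute and the 27-year-old is nearest.
scaled = min_max_scale(rows)
scaled_idx = min(candidates, key=lambda i: math.dist(scaled[0], scaled[i]))

print(rows[raw_idx])     # (60, 51000)
print(rows[scaled_idx])  # (27, 58000)
```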

Handling Imbalanced Data: Use techniques like SMOTE (Synthetic Minority Over-sampling Technique) to balance the training set, or use distance-weighted voting so that a few close minority-class neighbors are not outvoted by more distant majority-class ones.