## Question 1
### What is K-Nearest Neighbors (KNN) and how does it work in both classification and regression problems?

**Answer:**

K-Nearest Neighbors (KNN) is a supervised, instance-based learning algorithm that makes predictions based on the similarity between data points. It does not build an explicit model during training; instead, it stores the entire training dataset and performs computation only at prediction time.

**Working Principle:**
1. Choose the value of *k* (number of neighbors).
2. Compute the distance between the test point and all training points (commonly Euclidean distance).
3. Select the *k* closest neighbors.
4. Aggregate their outputs to make a prediction.

**Classification:**
- The predicted class is the majority class among the *k* neighbors.
- Example: If 3 out of 5 nearest neighbors belong to class A, the test sample is classified as class A.

**Regression:**
- The predicted value is the average (or weighted average) of the target values of the *k* neighbors.
- Example: Predicting house prices using the mean price of nearby houses.
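
The four steps above can be sketched from scratch with NumPy. This is a minimal illustration, not a production implementation: the function name `knn_predict`, the `task` parameter, and the toy data are all made up for this example, and ties in the vote are not handled specially.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Predict for a single query point using Euclidean distance."""
    # Step 2: distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Step 3: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    # Step 4: aggregate the neighbors' outputs
    if task == "classification":
        # Majority vote among the k neighbors
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    # Regression: unweighted mean of the neighbors' target values
    return float(np.mean(neighbor_targets))

# Toy data, invented for illustration
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 7.0]])
y_class = np.array(["A", "A", "A", "B", "B"])
y_value = np.array([100.0, 110.0, 120.0, 300.0, 320.0])

query = np.array([1.8, 1.4])
label = knn_predict(X_train, y_class, query, k=3, task="classification")
value = knn_predict(X_train, y_value, query, k=3, task="regression")
print(label, value)  # A 110.0
```

The same neighbor search serves both tasks; only the aggregation in step 4 changes.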

**Advantages:** Simple, non-parametric, effective for small datasets.

**Limitations:** Computationally expensive at prediction time (each query is compared against the stored training set), sensitive to noise and feature scaling.

## Question 2
### What is the Curse of Dimensionality and how does it affect KNN performance?

**Answer:**

The Curse of Dimensionality refers to problems that arise when working with high-dimensional data. As the number of features increases, the volume of the feature space grows exponentially, causing data points to become sparse.

**Effect on KNN:**
- Distances between points become less meaningful.
- The difference between nearest and farthest neighbors shrinks.
- KNN struggles to find truly "close" neighbors.
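
The shrinking gap between nearest and farthest neighbors can be observed with a small simulation. The sketch below assumes points drawn uniformly from the unit hypercube; the helper name `relative_contrast`, the sample size, and the chosen dimensions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(n_dims, n_points=1000):
    """(farthest - nearest) / nearest distance from one random query point."""
    points = rng.random((n_points, n_dims))  # uniform in the unit hypercube
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()

contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>5} dims: relative contrast = {c:.2f}")
```

As the dimension grows, the relative contrast drops toward zero: the nearest neighbor is barely closer than the farthest one, so "nearest" carries little information.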

**Consequences:**
- Reduced classification accuracy.
- Increased computational cost.
- Higher risk of overfitting.

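**Mitigation:** Dimensionality reduction techniques such as PCA, or feature selection. Feature scaling does not reduce dimensionality, but it keeps any single feature from dominating the distance.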

## Question 3
### What is Principal Component Analysis (PCA)? How is it different from feature selection?

**Answer:**

Principal Component Analysis (PCA) is an unsupervised dimensionality reduction technique used in data analysis and machine learning to reduce the number of features in a dataset while preserving as much important information (variance) as possible. PCA achieves this by transforming the original correlated features into a new set of uncorrelated variables called principal components.

**Feature Selection Explained:**
Feature selection involves choosing the most relevant features from the original dataset without transforming them. Common techniques include:
- Filter methods (correlation, chi-square).
- Wrapper methods (recursive feature elimination).
- Embedded methods (Lasso, tree-based importance).

**Key Differences:**
- PCA creates new features, while feature selection chooses existing ones.
- PCA focuses on maximizing variance, whereas feature selection focuses on feature relevance.
- PCA may reduce interpretability, but feature selection keeps features understandable.
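
The first difference can be seen directly on the Wine dataset. This sketch uses `SelectKBest` with the ANOVA F-score as one possible filter method and keeps two features or components; both choices are assumptions made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_scaled = StandardScaler().fit_transform(data.data)

# Feature selection: keep 2 of the original 13 columns (uses the labels)
selector = SelectKBest(score_func=f_classif, k=2).fit(X_scaled, data.target)
selected_names = [data.feature_names[i] for i in selector.get_support(indices=True)]
print("Selected original features:", selected_names)

# PCA: build 2 new features, each a weighted mix of all 13 originals (ignores the labels)
pca = PCA(n_components=2).fit(X_scaled)
print("PC1 weights on the original features:", pca.components_[0].round(2))
```

The selected features keep their original names and units, while each principal component is a linear combination of every input column, which is why PCA tends to be harder to interpret.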

## Question 4
### What are eigenvalues and eigenvectors in PCA, and why are they important?

**Answer:**

In Principal Component Analysis (PCA), eigenvalues and eigenvectors are fundamental mathematical concepts used to identify the most important directions in which the data varies.

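PCA is based on the covariance matrix of the dataset, which captures how features vary with respect to each other. Eigenvalues and eigenvectors are computed from this covariance matrix: each eigenvector gives the direction of a principal component, and its eigenvalue is the variance of the data along that direction.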

**Importance:**
1. **Dimensionality Reduction:** Eigenvalues allow us to select only those components that explain most of the variance, reducing dimensionality while preserving information.
2. **Noise Reduction:** Components with very small eigenvalues often represent noise and can be discarded.
3. **Explained Variance:** Eigenvalues are used to compute the explained variance ratio, which helps in selecting components (e.g., 95% variance rule).
4. **Efficient Feature Transformation:** Eigenvectors define the transformation from original features to principal components, ensuring minimal information loss.
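
To make the link between eigenvalues and explained variance concrete, the sketch below does the eigendecomposition by hand with NumPy and checks it against scikit-learn's `PCA`. It assumes standardized Wine features; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine().data)

# Covariance matrix of the standardized features (13 x 13)
cov = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # re-sort by descending variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Explained variance ratio = eigenvalue / sum of all eigenvalues
explained_ratio = eigenvalues / eigenvalues.sum()

# Projecting onto the top 2 eigenvectors gives the first 2 principal components
X_projected = X @ eigenvectors[:, :2]

pca = PCA().fit(X)
print(np.allclose(explained_ratio, pca.explained_variance_ratio_))
```

The manual projection can differ from `pca.transform` by a sign flip per component, since an eigenvector and its negation describe the same direction.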

## Question 5
### How do KNN and PCA complement each other when applied in a single pipeline?

**Answer:**

- PCA is a dimensionality reduction technique that transforms high-dimensional data into fewer uncorrelated principal components.

- KNN is a distance-based algorithm whose performance depends on meaningful distance calculations.

- In high-dimensional data, KNN suffers from the curse of dimensionality, where distances become less reliable.

- PCA reduces the number of dimensions, discards low-variance directions that are often noise, and produces uncorrelated components.

- After PCA, distances between data points are often more meaningful, which can improve KNN accuracy.

- Fewer dimensions lower the cost of each distance computation and can reduce overfitting.

- The combined PCA + KNN pipeline can improve efficiency, generalization, and model performance.

- This pipeline is widely used in image processing, gene expression analysis, and text data.
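
One way to combine the two steps is a scikit-learn pipeline, sketched below on the Wine dataset. The choices of 2 components, 5 neighbors, and 5-fold cross-validation are assumptions for illustration, not tuned values.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scaling -> PCA -> KNN; every step is refit on the training part of each fold
pca_knn = make_pipeline(StandardScaler(), PCA(n_components=2), KNeighborsClassifier(n_neighbors=5))
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

scores = {
    "scaled KNN": cross_val_score(scaled_knn, X, y, cv=5).mean(),
    "scaled PCA(2) + KNN": cross_val_score(pca_knn, X, y, cv=5).mean(),
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Wrapping the scaler and PCA in the pipeline keeps them from seeing the held-out fold, so the cross-validated score is not inflated by information leaking from the test data.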

## Question 6
### Train a KNN Classifier on the Wine dataset with and without feature scaling. Compare model accuracy in both cases.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import accuracy_score

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Without scaling
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    acc_no_scaling = accuracy_score(y_test, knn.predict(X_test))

    # With scaling (fit the scaler on the training split only)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    knn.fit(X_train_scaled, y_train)
    acc_scaling = accuracy_score(y_test, knn.predict(X_test_scaled))

    acc_no_scaling, acc_scaling

Output:

    (0.7222222222222222, 0.9444444444444444)

Scaling raises test accuracy from about 0.72 to about 0.94 on this split. Without scaling, features with large numeric ranges (such as proline) dominate the Euclidean distance.

## Question 7
### Train a PCA model on the Wine dataset and print the explained variance ratio of each principal component

    from sklearn.decomposition import PCA

    pca = PCA()
    pca.fit(X_train_scaled)
    pca.explained_variance_ratio_

Output:

    array([0.35900066, 0.18691934, 0.11606557, 0.07371716, 0.0665386 ,
           0.04854582, 0.04195042, 0.02683922, 0.0234746 , 0.01889734,
           0.01715943, 0.01262928, 0.00826257])
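
A common follow-up is to turn these ratios into a cumulative curve and pick the smallest number of components that reaches a target. The sketch below repeats the same split and scaling so that it runs on its own; the 95% threshold is an assumed target.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)

pca = PCA().fit(X_train_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches the threshold
threshold = 0.95
n_components = int(np.searchsorted(cumulative, threshold) + 1)
print(cumulative.round(3))
print(f"{n_components} components reach {threshold:.0%} of the variance")
```

For the ratios printed above, the first 9 components sum to about 0.943 and the first 10 to about 0.962, so 10 components are needed to reach 95%. Passing a float such as `PCA(n_components=0.95)` asks scikit-learn to make the same selection.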

## Question 8
### Train a KNN Classifier on the PCA-transformed dataset (retain top 2 components). Compare the accuracy with the original dataset.


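    pca2 = PCA(n_components=2)
    X_train_pca = pca2.fit_transform(X_train_scaled)
    X_test_pca = pca2.transform(X_test_scaled)

    knn.fit(X_train_pca, y_train)
    acc_pca = accuracy_score(y_test, knn.predict(X_test_pca))

    acc_pca

Output:

    1.0

On this split, KNN on the top 2 principal components reaches an accuracy of 1.0, compared with about 0.944 on all 13 scaled features and about 0.722 on the unscaled features (Question 6). The test split is small (20% of the data), so the gap should be read with caution.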

## Question 9
### Train a KNN Classifier with different distance metrics (Euclidean, Manhattan) on the scaled Wine dataset and compare the results.

    results = {}
    for metric in ['euclidean', 'manhattan']:
        knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
        knn.fit(X_train_scaled, y_train)
        results[metric] = accuracy_score(y_test, knn.predict(X_test_scaled))

    results

Output:

    {'euclidean': 0.9444444444444444, 'manhattan': 0.9444444444444444}

Both metrics give the same test accuracy (about 0.944) on this split.
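
For reference, the two metrics differ only in how per-feature differences are combined. The sketch below computes both by hand on two made-up vectors; the helper `minkowski` is defined here for illustration and shows that both are special cases of the Minkowski distance.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

manhattan = np.sum(np.abs(a - b))          # L1: 3 + 4 + 0 = 7
euclidean = np.sqrt(np.sum((a - b) ** 2))  # L2: sqrt(9 + 16 + 0) = 5

def minkowski(u, v, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)

print(manhattan, euclidean)
print(minkowski(a, b, 1), minkowski(a, b, 2))
```

Euclidean distance squares each difference, so a few large gaps weigh more heavily than they do under Manhattan distance. `KNeighborsClassifier` defaults to `metric='minkowski'` with `p=2`, which is the Euclidean case.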

## Question 10
### You are working with a high-dimensional gene expression dataset to classify patients with different types of cancer. Due to the large number of features and a small number of samples, traditional models overfit.

**Answer:**

1. **Using PCA to Reduce Dimensionality:**
   - Apply PCA to transform the gene features into principal components.
   - Retain most of the variance while discarding noisy and redundant directions.
   - Fewer input dimensions reduce the risk of overfitting.
2. **Deciding Number of Components:**
   - Use the cumulative explained variance ratio (e.g., 90–95%).
   - Analyze the cumulative variance / elbow plot.
   - Validate using cross-validation performance.
3. **Using KNN After PCA:**
   - Train KNN on the PCA-transformed data.
   - In the lower-dimensional space, distance calculations are more meaningful.
   - Tune the value of *k* using validation.
4. **Model Evaluation:**
   - Cross-validation for reliability.
   - Metrics:
     - Accuracy
     - Precision & Recall (important in cancer diagnosis)
     - F1-score
     - Confusion Matrix
     - ROC-AUC
5. **Justification to Stakeholders:**
   - Reduces overfitting
   - Computationally efficient
   - Robust for small datasets
   - Clinically reliable and scalable
   - Suitable for real-world biomedical applications
- Suitable for real-world biomedical applications