1. What is K-Nearest Neighbors (KNN) and how does it work in both
classification and regression problems?

ANS-

Here’s a detailed explanation of **K-Nearest Neighbors (KNN)** and how it works for **both classification and regression**:

---

## **1. What is K-Nearest Neighbors (KNN)?**

* **KNN** is a **non-parametric, instance-based (lazy) learning algorithm**.
* **Non-parametric:** It does not make assumptions about the underlying data distribution.
* **Instance-based (lazy):** It does not build an explicit model during training; it stores the training data and defers all computation until a prediction is requested.

**Key idea:**

> A new data point is classified or predicted based on the **“K” most similar (nearest) points** in the training dataset.

---

## **2. How KNN Works**

**Step 1: Choose K**

* K = number of nearest neighbors to consider.
* Small K → sensitive to noise (high variance)
* Large K → smoother predictions, may miss local patterns (high bias)

**Step 2: Compute distance**

* Common metrics:

  * Euclidean distance (most common)
  * Manhattan distance
  * Minkowski distance
* Calculate the distance between the new sample and all training points.
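To make the three metrics concrete, here is a small NumPy sketch. The helper names `euclidean`, `manhattan`, and `minkowski` are illustrative, not a library API. Minkowski distance with exponent p generalizes the other two: p = 1 gives Manhattan and p = 2 gives Euclidean.

```python
import numpy as np


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    # Sum of absolute coordinate differences
    return float(np.sum(np.abs(a - b)))


def minkowski(a, b, p):
    # General form: p = 1 -> Manhattan, p = 2 -> Euclidean
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print("Euclidean:", euclidean(a, b))
print("Manhattan:", manhattan(a, b))
print("Minkowski p=1:", minkowski(a, b, 1), "p=2:", minkowski(a, b, 2))
```

For a = (1, 2) and b = (4, 6), the coordinate differences are 3 and 4. The Euclidean distance is therefore 5 and the Manhattan distance is 7.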

**Step 3: Find K nearest neighbors**

* Sort all training points by distance.
* Select the K closest points.

**Step 4: Make a prediction**

* **Classification:**

  * Assign the class that appears **most frequently** among the K neighbors (majority vote).
  * Example: K = 5 neighbors → 3 “Yes”, 2 “No” → predict “Yes”.
* **Regression:**

  * Take the **average** (or a **distance-weighted average**) of the target values of the K neighbors.
  * Example: K = 3 neighbors → target values = [100, 120, 110] → predicted value = 110.
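Steps 2 to 4 can be sketched from scratch in a few lines. This minimal illustration assumes Euclidean distance and NumPy arrays. `knn_predict` is a hypothetical helper; a real project would normally use scikit-learn's `KNeighborsClassifier` / `KNeighborsRegressor`. The toy data are chosen to reproduce the two examples above.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k, task="classification"):
    """Hypothetical from-scratch KNN: Euclidean distance, unweighted vote/mean."""
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    # Step 4: majority vote for classification, mean for regression
    if task == "classification":
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    return float(np.mean(neighbor_targets))


# Classification: the 5 nearest neighbors are 3 "Yes" and 2 "No"
X_cls = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5], [6, 6]], dtype=float)
y_cls = np.array(["Yes", "No", "Yes", "No", "Yes", "No", "No"])
pred_class = knn_predict(X_cls, y_cls, np.array([0.5, 0.4]), k=5)

# Regression: the 3 nearest neighbors have targets 100, 120 and 110
X_reg = np.array([[1.0], [2.0], [3.0], [10.0]])
y_reg = np.array([100.0, 120.0, 110.0, 500.0])
pred_value = knn_predict(X_reg, y_reg, np.array([2.0]), k=3, task="regression")

print("Predicted class:", pred_class)
print("Predicted value:", pred_value)
```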

---

## **3. Advantages of KNN**

* Simple and intuitive.
* Non-parametric → works for complex decision boundaries.
* Naturally handles multi-class classification.

---

## **4. Disadvantages of KNN**

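* Prediction is computationally expensive for large datasets: a brute-force search computes the distance to every training point. Tree-based indexes such as KD-trees or ball trees help mainly in low dimensions.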
* Sensitive to **irrelevant features** and **feature scaling** → usually requires normalization or standardization.
* Choice of K and distance metric greatly affects performance.
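The scaling problem is easy to see with two features on very different scales. This hypothetical example uses income in dollars and age in years, with made-up numbers. The raw Euclidean distance is dominated by income, so the nearest neighbor changes once both features are standardized with scikit-learn's `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: [annual income in dollars, age in years]
X_train = np.array([
    [50500.0, 60.0],
    [52000.0, 31.0],
    [30000.0, 25.0],
    [90000.0, 45.0],
    [70000.0, 50.0],
])
query = np.array([[50000.0, 30.0]])


def nearest_index(X, q):
    # Index of the training row with the smallest Euclidean distance to q
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))


# Without scaling, income differences (thousands) swamp age differences (tens)
nearest_raw = nearest_index(X_train, query)

# With scaling, each feature is standardized using training-set mean and std
scaler = StandardScaler().fit(X_train)
nearest_scaled = nearest_index(scaler.transform(X_train), scaler.transform(query))

print("Nearest neighbor without scaling:", X_train[nearest_raw])
print("Nearest neighbor with scaling:   ", X_train[nearest_scaled])
```

Without scaling, the point with almost the same income but a 30-year age gap wins. After scaling, the winner is the point that is close on both features.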

---

## **5. Summary Table**

| Aspect     | Classification                     | Regression                                   |
| ---------- | ---------------------------------- | -------------------------------------------- |
| Prediction | Majority vote of K neighbors       | Average (or weighted) of K neighbors’ values |
| Output     | Discrete class label               | Continuous numeric value                     |
| Example    | Predict if a customer will default | Predict house price                          |
| Key metric | Accuracy, F1-score                 | Mean Squared Error, R²                       |

---

### **6. One-Line Explanation**

> KNN predicts the outcome of a new data point based on the labels or values of its K nearest neighbors in the training set, using a distance metric to measure similarity.



2. What is the Curse of Dimensionality and how does it affect KNN
performance?

ANS-

Here’s a detailed explanation of the **Curse of Dimensionality** and its impact on **K-Nearest Neighbors (KNN)**:

---

## **1. What is the Curse of Dimensionality?**

The **Curse of Dimensionality** refers to the problems that arise when working with **high-dimensional data** (many features).

Key points:

1. **Data sparsity:**

   * As the number of dimensions increases, the volume of the feature space grows **exponentially**.
   * Data points become **sparse**, meaning neighbors are far apart, and “closeness” loses meaning.

2. **Distance measures become less meaningful:**

   * Most distance metrics (like Euclidean distance) become **less discriminative** in high dimensions.
   * The gap between the nearest and farthest distances becomes small relative to the distances themselves, so all points look roughly equally far away.

3. **Increased computational cost:**

   * More dimensions → more calculations for distance → slower predictions, especially for KNN.
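A quick simulation illustrates the second point. The sketch below assumes points drawn uniformly from a unit hypercube. It measures the relative contrast (max − min) / min of the distances from one random query to all points. `relative_contrast` is a hypothetical helper name.

```python
import numpy as np


def relative_contrast(n_dims, n_points=1000, seed=0):
    # Uniform random points and one random query in the unit hypercube
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    # How much farther the farthest point is than the nearest, relatively
    return (dists.max() - dists.min()) / dists.min()


contrast = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrast.items():
    print(f"{d:>5} dims: (max - min) / min = {c:.3f}")
```

As the dimension grows, the relative contrast shrinks toward zero. The nearest and farthest points end up at similar distances.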

---

## **2. How it Affects KNN Performance**

KNN relies on the assumption that **similar points are close in the feature space**.

**Problems in high dimensions:**

1. **Neighbors are far away:**

   * The “nearest neighbor” may be almost as far as the farthest point → weakens the predictive power.

2. **Noise sensitivity:**

   * High-dimensional data often includes irrelevant or redundant features. Standard KNN weights every dimension equally, so noisy dimensions distort the distances and degrade accuracy.

3. **Need for more data:**

   * To maintain the same density of points in higher dimensions, you need exponentially more training data.
   * Otherwise, KNN struggles to find meaningful neighbors.
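A classic back-of-the-envelope calculation shows why neighbors end up far away. Assume the data are spread uniformly in a unit hypercube. A smaller cube that captures a fraction r of the data must then have edge length r^(1/d) in d dimensions. `edge_length_for_fraction` is an illustrative helper.

```python
def edge_length_for_fraction(fraction, n_dims):
    # Edge of a sub-cube holding `fraction` of uniformly spread data in n_dims dims
    return fraction ** (1.0 / n_dims)


for d in (1, 2, 10, 100):
    edge = edge_length_for_fraction(0.01, d)
    print(f"{d:>3} dims: edge covering 1% of the data = {edge:.3f}")
```

Capturing just 1% of the data needs an edge of 0.01 in 1D, about 0.63 of the full range in 10D, and about 0.95 in 100D. A "local" neighborhood therefore spans almost the whole range of every feature.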

---

## **3. Mitigation Strategies**

1. **Feature selection:**

   * Keep only the most relevant features to reduce dimensionality.

2. **Dimensionality reduction:**

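   * **PCA (Principal Component Analysis)** can reduce dimensions while preserving most of the variance. **t-SNE** and **UMAP** preserve local structure but are mainly used for visualization. t-SNE in particular cannot map new, unseen points, so it is not a drop-in preprocessing step for KNN.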

3. **Distance metric adjustment:**

   * Use **feature-weighted distances** to down-weight irrelevant features, or **Mahalanobis distance** to account for differing feature scales and correlations.

4. **Increase dataset size:**

   * More data points help maintain meaningful neighborhoods in higher dimensions.

---

### **4. Intuition / Analogy**

> Imagine trying to find your closest friend.
> In a small town laid out in 2D, the closest friend is easy to locate.
> In 100 dimensions (hundreds of features), everyone seems about equally far away, making it hard to define “nearest.”

---

### **5. One-Line Exam Answer**

> The Curse of Dimensionality occurs when high-dimensional data makes distance metrics less meaningful and data sparse, causing KNN to struggle to identify truly “near” neighbors and degrading its performance.




3. What is Principal Component Analysis (PCA)? How is it different from
feature selection?

ANS-

Here’s a clear explanation of **Principal Component Analysis (PCA)** and how it differs from **feature selection**:

---

## **1. What is PCA?**

**Principal Component Analysis (PCA)** is a **dimensionality reduction technique** used to transform high-dimensional data into a **lower-dimensional space** while retaining as much **variance (information)** as possible.

**Key points:**

* PCA creates **new features** called **principal components (PCs)**.
* Each principal component is a **linear combination of the original features**.
* The components are **orthogonal (uncorrelated)** to each other.
* The first principal component captures the **most variance**, the second captures the next most, and so on.

**Steps in PCA:**

1. Standardize the data (mean=0, variance=1).
2. Compute the covariance matrix.
3. Compute eigenvectors and eigenvalues of the covariance matrix.
4. Sort eigenvectors by eigenvalues (largest first) → these are the principal components.
5. Project the original data onto the top **k components** to reduce dimensionality.
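The five steps above map almost one-to-one onto NumPy calls. This sketch is for illustration and uses the Wine dataset, which later answers also use. The final lines cross-check it against scikit-learn's `PCA`, whose `explained_variance_` should match the top eigenvalues. Eigenvector signs are arbitrary, so the projections may differ from scikit-learn's by a sign per component.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Step 1: standardize each feature to mean 0, variance 1
X_std = StandardScaler().fit_transform(load_wine().data)

# Step 2: covariance matrix of the features (columns are variables)
cov = np.cov(X_std, rowvar=False)

# Step 3: eigen-decomposition; eigh is used because cov is symmetric
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: eigh returns ascending order, so sort eigenpairs largest-first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: project onto the top k eigenvectors (principal components)
k = 2
X_reduced = X_std @ eigvecs[:, :k]

# Cross-check against scikit-learn's PCA (component signs may differ)
sk_pca = PCA(n_components=k).fit(X_std)
print("Top eigenvalues (manual):     ", np.round(eigvals[:k], 4))
print("explained_variance_ (sklearn):", np.round(sk_pca.explained_variance_, 4))
```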

**Use cases:**

* Reduce dimensionality to speed up training.
* Remove multicollinearity in features.
* Visualize high-dimensional data in 2D or 3D.

---

## **2. How PCA Works vs Feature Selection**

| Aspect                       | PCA (Feature Extraction)                                                | Feature Selection                                            |
| ---------------------------- | ----------------------------------------------------------------------- | ------------------------------------------------------------ |
| **Goal**                     | Create new features (principal components) that summarize original data | Select a subset of original features                         |
| **Output**                   | Linear combinations of original features                                | Original features only                                       |
| **Variance Preservation**    | Maximizes variance in new components                                    | Doesn’t change features, may or may not capture all variance |
| **Dimensionality Reduction** | Reduces dimensions when only the top k components are kept             | Reduces by selecting fewer features                          |
| **Correlation Handling**     | Removes correlation among components                                    | Correlated features may remain                               |
| **Interpretability**         | Harder to interpret (components are combinations)                       | Easier to interpret (original features kept)                 |
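The difference in output is easy to check in code. The sketch below uses scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`) as one example of feature selection. That choice is an assumption; other selection methods also return a subset of the original columns. Note that `SelectKBest` uses the class labels, while `PCA` ignores them.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_std = StandardScaler().fit_transform(data.data)
y = data.target

# Feature selection: keep the 2 original features with the highest ANOVA F-score
selector = SelectKBest(score_func=f_classif, k=2).fit(X_std, y)
X_selected = selector.transform(X_std)
kept = selector.get_support(indices=True)
print("Kept original features:", [data.feature_names[i] for i in kept])

# Feature extraction: build 2 new features, each mixing all 13 originals
pca = PCA(n_components=2).fit(X_std)
X_pca = pca.transform(X_std)
print("PC1 weights on the original features:", np.round(pca.components_[0], 2))
print("Correlation between PC1 and PC2:", np.round(np.corrcoef(X_pca.T)[0, 1], 6))
```

The selected matrix is literally a subset of the original columns. Each principal component, by contrast, assigns a weight to every original feature, and the two components are uncorrelated.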

---

## **3. Intuition / Analogy**

> PCA is like **rotating and projecting a cloud of points** in a high-dimensional space onto directions where the points are most spread out, keeping the “essence” of the data.
> Feature selection is like **choosing the most important existing axes** without creating new ones.

---

### **4. One-Line Exam Answer**

> PCA reduces dimensionality by creating new orthogonal features (principal components) that capture maximum variance, whereas feature selection reduces dimensionality by picking a subset of the original features without transformation.





4. What are eigenvalues and eigenvectors in PCA, and why are they
important?

ANS-

Here’s a clear explanation of **eigenvalues and eigenvectors in PCA** and why they are important:

---

## **1. Eigenvectors and Eigenvalues: The Basics**

In linear algebra, for a square matrix $A$:

$$
A \mathbf{v} = \lambda \mathbf{v}
$$

* **$\mathbf{v}$** → **eigenvector**: a nonzero vector whose direction is unchanged by the transformation $A$ (it is only scaled).
* **$\lambda$** → **eigenvalue**: the scaling factor by which the eigenvector is stretched or shrunk.

Intuitively, in PCA (where $A$ is the covariance matrix of the data):

* Eigenvectors = directions of data variation
* Eigenvalues = amount of variance along those directions
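The defining equation can be checked numerically. The matrix below is a hypothetical covariance matrix for two positively correlated features. `np.linalg.eigh` is used because covariance matrices are symmetric.

```python
import numpy as np

# Hypothetical covariance matrix of two positively correlated features
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order and eigenvectors as columns
eigvals, eigvecs = np.linalg.eigh(C)

for lam, v in zip(eigvals, eigvecs.T):
    # C @ v equals lam * v: the direction is kept, only the length changes
    print(f"lambda = {lam:.1f}, v = {np.round(v, 3)}, C @ v = {np.round(C @ v, 3)}")
```

For this matrix the eigenvalues are 1 and 3. The larger one belongs to the direction (1, 1)/√2, the diagonal along which the two correlated features move together.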

---

## **2. How They Are Used in PCA**

**Step 1: Compute covariance matrix**

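* Let $X$ be your centered data matrix ($n$ samples × $p$ features, each feature with mean 0; standardize as well if features are on different scales).
* The covariance matrix $C = \frac{1}{n-1} X^\top X$ captures how features vary together.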

**Step 2: Find eigenvectors and eigenvalues of the covariance matrix**

* **Eigenvectors** → orthogonal directions of variation (the principal components); the first one points along the direction of maximum variance.
* **Eigenvalues** → variance of the data along each eigenvector.

**Step 3: Sort eigenvectors by eigenvalues**

* Higher eigenvalue → more variance explained → more important component.

**Step 4: Project data onto top eigenvectors**

* Reduces dimensionality while keeping most information.
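The claim that eigenvalues equal the variance explained can be verified directly. Projecting the centered data onto each eigenvector and computing the sample variance reproduces the eigenvalues. The data below are synthetic and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D data with correlated features, then centered (mean 0)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=500)
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)

# Step 1: covariance matrix C = X^T X / (n - 1)
C = X.T @ X / (X.shape[0] - 1)

# Steps 2-3: eigen-decomposition, sorted largest eigenvalue first
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project onto the eigenvectors (all of them kept here)
projected = X @ eigvecs
proj_var = projected.var(axis=0, ddof=1)
explained_ratio = eigvals / eigvals.sum()

print("Eigenvalues:             ", np.round(eigvals, 4))
print("Variance of projections: ", np.round(proj_var, 4))
print("Explained variance ratio:", np.round(explained_ratio, 4))
```

Dividing each eigenvalue by their sum gives the explained variance ratio. This is the same quantity scikit-learn reports as `explained_variance_ratio_`.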

---

## **3. Why They Are Important**

1. **Eigenvectors = principal directions**

   * They define the new axes (principal components) along which the data is spread.

2. **Eigenvalues = measure of importance**

   * Larger eigenvalues mean the component captures more variance.
   * Helps decide how many components to keep (e.g., keep the smallest number of components whose cumulative explained variance reaches 95%).

3. **Dimensionality reduction**

   * Eigenvectors allow us to transform the data into fewer dimensions without losing significant information.
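Choosing components by a cumulative-variance threshold takes only a couple of lines. This sketch assumes a 95% target and the standardized Wine dataset. Passing a float between 0 and 1 as `n_components` (with `svd_solver="full"`) makes scikit-learn's `PCA` apply essentially the same rule for you.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_wine().data)

# Fit PCA with all components and accumulate the explained variance ratios
pca_full = PCA().fit(X_std)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest k whose cumulative explained variance reaches the 95% target
k_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print("Cumulative explained variance:", np.round(cumulative, 3))
print("Components needed for 95%:", k_95)

# scikit-learn selects the component count itself when n_components is a float
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X_std)
print("sklearn n_components_:", pca_95.n_components_)
```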

---

### **4. Intuition / Analogy**

> Imagine a cloud of 2D points shaped like an ellipse:
>
> * The **long axis** is the direction of maximum variance → **first eigenvector**; its eigenvalue is the variance of the data along that axis.
> * The **short axis** is the **second eigenvector**; its eigenvalue is the (smaller) variance along that axis.
>
> PCA projects the data onto these axes; keeping only the long axis reduces 2D to 1D while preserving most of the spread.

---

### **5. One-Line Exam Answer**

> In PCA, eigenvectors define the principal directions of data variance, and eigenvalues indicate how much variance each direction captures, guiding which components to retain for dimensionality reduction.



5. How do KNN and PCA complement each other when applied in a single
pipeline?

ANS-

Here’s a clear explanation of how **KNN and PCA** can complement each other when used together in a single pipeline:

---

## **1. The Problem**

* **K-Nearest Neighbors (KNN)** relies on **distance metrics** (usually Euclidean distance) to find neighbors.
* **Curse of Dimensionality:** In high-dimensional datasets, distances become less meaningful, and KNN performance drops.
* **High-dimensional data** may contain redundant or irrelevant features that confuse KNN.

---

## **2. How PCA Helps**

**Principal Component Analysis (PCA)** reduces dimensionality while retaining the **most variance in the data**.

* **Key idea:** Project high-dimensional data onto a lower-dimensional space that captures the main patterns.
* **Benefits for KNN:**

  1. Discards low-variance directions, which often carry mostly noise or redundancy → distances reflect the dominant structure of the data.
  2. Reduces computational cost → fewer dimensions to calculate distances.
  3. Mitigates the curse of dimensionality → KNN neighbors are more informative.

---

## **3. How They Work Together**

**Pipeline:**

1. **Step 1:** Standardize the features, then fit PCA on the training data and transform the features into the top principal components.
2. **Step 2:** Feed the lower-dimensional data into KNN.
3. **Step 3:** KNN computes distances in the PCA-transformed space and predicts labels (classification) or values (regression).
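With scikit-learn the whole pipeline fits in one object. This minimal sketch runs on the Wine dataset with illustrative settings (2 components, K = 5). Wrapping the steps in a `Pipeline` ensures that the scaler and PCA are refit on the training folds only during cross-validation.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scale -> reduce to 2 principal components -> classify with 5 neighbors
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)

# 5-fold cross-validation; every fold refits the scaler and PCA on its own training part
scores = cross_val_score(pipe, X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", round(scores.mean(), 3))
```

In practice, `n_components` and `n_neighbors` should be tuned rather than fixed, for example with `GridSearchCV`.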

**Intuition:**

> PCA acts as a **preprocessing step** that summarizes the most important information, while KNN uses that information to make more reliable predictions.

---

## **4. Advantages of the Pipeline**

| Benefit             | Explanation                                                                            |
| ------------------- | -------------------------------------------------------------------------------------- |
| Faster computation  | Fewer dimensions → fewer operations per distance calculation                           |
| Better accuracy     | Often improves neighbor selection by dropping low-variance, frequently noisy directions (PCA ignores labels, so this is not guaranteed) |
| Reduced overfitting | KNN is less likely to overfit noise in high-dimensional data                           |
| Visualization       | PCA components can be plotted in 2D/3D to see data clusters, aiding KNN interpretation |

---

### **5. One-Line Summary**

> Using PCA before KNN reduces dimensionality and noise, making distances more meaningful, often improving KNN accuracy, and reducing computation time.



6. Train a KNN Classifier on the Wine dataset with and without feature
scaling. Compare model accuracy in both cases.

ANS-

# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# -------------------------------
# 1. KNN WITHOUT feature scaling
# -------------------------------
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred_no_scaling = knn.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# -------------------------------
# 2. KNN WITH feature scaling
# -------------------------------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = knn_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# -------------------------------
# Print Results
# -------------------------------
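print("KNN Accuracy without feature scaling:", accuracy_no_scaling)
print("KNN Accuracy with feature scaling:", accuracy_scaled)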


7. Train a PCA model on the Wine dataset and print the explained variance
ratio of each principal component.


ANS-

# Import required libraries
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)

# Print explained variance ratio for each principal component
explained_variance = pca.explained_variance_ratio_
for i, ratio in enumerate(explained_variance):
    print(f"Principal Component {i+1}: {ratio:.4f}")


8. Train a KNN Classifier on the PCA-transformed dataset (retain top 2 components). Compare the accuracy with the original dataset.

ANS-

# Import required libraries
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split original dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# -------------------------------
# 1. KNN on the ORIGINAL features (all 13, standardized)
# -------------------------------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_original = KNeighborsClassifier(n_neighbors=5)
knn_original.fit(X_train_scaled, y_train)
y_pred_original = knn_original.predict(X_test_scaled)
accuracy_original = accuracy_score(y_test, y_pred_original)

# -------------------------------
# 2. KNN on PCA-transformed dataset (top 2 components)
# -------------------------------
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

knn_pca = KNeighborsClassifier(n_neighbors=5)
knn_pca.fit(X_train_pca, y_train)
y_pred_pca = knn_pca.predict(X_test_pca)
accuracy_pca = accuracy_score(y_test, y_pred_pca)

# -------------------------------
# Print results
# -------------------------------
print("KNN Accuracy on Original Dataset:", accuracy_original)
print("KNN Accuracy on PCA-transformed Dataset (2 components):", accuracy_pca)


9. Train a KNN Classifier with different distance metrics (euclidean,
manhattan) on the scaled Wine dataset and compare the results.

ANS-
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# -------------------------------
# KNN with Euclidean distance
# -------------------------------
knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_euclidean.fit(X_train_scaled, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test_scaled)
accuracy_euclidean = accuracy_score(y_test, y_pred_euclidean)

# -------------------------------
# KNN with Manhattan distance
# -------------------------------
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
knn_manhattan.fit(X_train_scaled, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test_scaled)
accuracy_manhattan = accuracy_score(y_test, y_pred_manhattan)

# -------------------------------
# Print results
# -------------------------------
print("KNN Accuracy with Euclidean distance:", accuracy_euclidean)
print("KNN Accuracy with Manhattan distance:", accuracy_manhattan)


10. You are working with a high-dimensional gene expression dataset to
classify patients with different types of cancer.
Due to the large number of features and a small number of samples, traditional models
overfit.
Explain how you would:
● Use PCA to reduce dimensionality
● Decide how many components to keep
● Use KNN for classification post-dimensionality reduction
● Evaluate the model
● Justify this pipeline to your stakeholders as a robust solution for real-world
biomedical data

ANS-

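# Import required libraries
# NOTE: scikit-learn ships no gene expression dataset, so the Wine dataset is
# used here as a stand-in; swap in your own (n_samples x n_genes) matrix X and labels y.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load data (stand-in for the gene expression matrix)
data = load_wine()
X = data.data
y = data.target

# Hold out a stratified test set that is never used for tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# -------------------------------
# 1. Use PCA to reduce dimensionality
# -------------------------------
# Scaling and PCA live inside the pipeline, so they are fit only on the
# training folds; this avoids leaking test information into the components.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# -------------------------------
# 2. Decide how many components to keep + 3. tune KNN
# -------------------------------
# Choose the number of components (and K) by cross-validated accuracy; the
# cumulative explained variance printed below is a sanity check alongside it.
param_grid = {
    "pca__n_components": [2, 3, 5, 8, 10],
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

best_pca = search.best_estimator_.named_steps["pca"]
print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", round(search.best_score_, 4))
print("Cumulative explained variance of kept components:",
      round(best_pca.explained_variance_ratio_.sum(), 4))

# -------------------------------
# 4. Evaluate on the untouched test set
# -------------------------------
y_pred = search.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=data.target_names))

# -------------------------------
# 5. Justification for stakeholders
# -------------------------------
# - PCA compresses many correlated features into a few components, which helps
#   reduce overfitting when samples are scarce.
# - KNN is simple and transparent: each prediction can be traced back to the
#   most similar known samples.
# - Every preprocessing step is fit inside cross-validation, and the final score
#   comes from a held-out test set, so the reported performance is not inflated
#   by data leakage.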
