**Question 1:**  What is K-Nearest Neighbors (KNN) and how does it work in both classification and regression problems?

**Answer:-**
K-Nearest Neighbors (KNN):

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm used for both classification and regression tasks. It is a non-parametric and instance-based learning method, meaning it doesn’t assume an underlying probability distribution and makes predictions based on stored training data instead of building an explicit model.

How KNN Works:

1. Store training data: There is no explicit training phase (lazy learning).
2. Calculate distance: For a new data point, compute the distance (commonly Euclidean, Manhattan, or Minkowski) to all training points.
3. Find nearest neighbors: Pick the k closest training points (neighbors).
4. Make prediction: For classification, assign the class most common among the neighbors (majority voting). For regression, take the average (or weighted average) of the neighbors’ values.
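
The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not how scikit-learn implements KNN; the function name `knn_predict` and the toy data are made up for the example.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    """Predict for one query point with plain (unweighted) KNN."""
    # Euclidean distance from the query to every stored training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    if task == "classification":
        # Majority vote among the neighbors' labels
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    # Regression: mean of the neighbors' target values
    return float(np.mean(y_train[nearest]))

# Toy data (made up for illustration)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]])
labels = np.array(["cat", "cat", "cat", "dog", "dog"])
prices = np.array([100.0, 110.0, 120.0, 300.0, 320.0])

query = np.array([1.1, 1.0])
pred_class = knn_predict(X_train, labels, query, k=3)
pred_price = knn_predict(X_train, prices, query, k=3, task="regression")
print(pred_class, pred_price)
```

The three closest points to the query are all labeled "cat", so the vote returns "cat", and the regression prediction is the mean of their three prices.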

KNN in Classification:

Suppose we want to classify a new data point into one of the categories (e.g., "cat" or "dog").

KNN finds the k nearest labeled data points.

Uses majority voting → whichever class occurs most among the k neighbors becomes the prediction.

👉 Example: If k=5 neighbors → 3 cats + 2 dogs → prediction = cat.

KNN in Regression:

Instead of voting, KNN averages the neighbors’ target values (usually the mean; the median is a more outlier-robust alternative).

It can also use weighted KNN, where closer neighbors have more influence (weights ∝ 1/distance).

👉 Example: Predicting house prices:
Neighbors’ prices = [100k, 120k, 130k, 110k, 140k]
Prediction = average = 120k
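
To see how inverse-distance weighting changes the house price prediction, here is a small sketch. The prices come from the example; the distances are hypothetical values chosen only for illustration.

```python
import numpy as np

# Neighbor prices from the example (in thousands)
prices = np.array([100.0, 120.0, 130.0, 110.0, 140.0])
# Hypothetical distances from the query to each neighbor (not part of the example)
distances = np.array([0.5, 1.0, 2.0, 1.0, 4.0])

plain_mean = prices.mean()

# Weighted KNN: weights proportional to 1/distance, so closer neighbors count more
weights = 1.0 / distances
weighted_mean = np.sum(weights * prices) / np.sum(weights)

print(f"plain mean: {plain_mean:.2f}k, weighted mean: {weighted_mean:.2f}k")
```

With these assumed distances the weighted prediction moves toward the price of the closest neighbor (100k). A real implementation also has to handle a distance of exactly zero, which this sketch ignores.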

Key Points about KNN:

Choice of k:

Small k → sensitive to noise (overfitting).

Large k → smoother decision boundary but may underfit.

Distance metric matters (Euclidean is the most common choice).

Feature scaling is important (distance is sensitive to feature magnitude).

Prediction is computationally expensive: as a lazy learner, KNN compares each query against the stored training set.

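
One way to check the effect of k in practice is cross-validation. The sketch below assumes scikit-learn's Wine dataset and 5-fold CV; the k values are arbitrary picks for illustration, and the scores will depend on the data.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

cv_scores = {}
for k in (1, 5, 15, 51):
    # Scaling sits inside the pipeline so each fold is standardized on its own training part
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:>2}: mean CV accuracy = {cv_scores[k]:.3f}")
```

Comparing the printed scores across k gives a rough picture of where the model moves from fitting noise (very small k) to over-smoothing (very large k).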
✅ In summary:
KNN Classification → majority vote among k nearest neighbors.
KNN Regression → average (or weighted average) of neighbors’ values.


**Question 2:** What is the Curse of Dimensionality and how does it affect KNN performance?

**Answer:-**
Curse of Dimensionality (CoD):

The curse of dimensionality refers to the problems and challenges that arise when working with data in high-dimensional spaces (many features).
As the number of dimensions (features) increases, data becomes sparse and distance measures become less meaningful.

KNN relies heavily on distance metrics (like Euclidean distance) to find nearest neighbors.
But in high dimensions:

1. Distances lose meaning

In high dimensions, the difference between the nearest and farthest neighbors becomes very small relative to the distances themselves.
That means "close" and "far" points become hard to tell apart, which makes KNN unreliable.

👉 Example: On a 1D line, points are clearly close or far. In 100D space, all points tend to look almost equally distant.

2. Data sparsity

As dimensions grow, the data spreads out more.
To cover the space at the same density, you need exponentially more data.
With limited data, KNN might not find truly "similar" neighbors.

3. Increased computational cost

More dimensions = more calculations per distance = slower predictions.
Memory usage also grows, since KNN stores the whole dataset.

4. Feature scaling sensitivity

If features are not normalized, dimensions with larger ranges dominate the distance calculation, and the problem compounds as more features are added.
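
The "distances lose meaning" effect can be checked with a small simulation. This sketch assumes points drawn uniformly from the unit hypercube (an arbitrary choice) and measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import numpy as np

rng = np.random.default_rng(0)
contrast = {}
for dim in (2, 10, 100, 1000):
    points = rng.random((500, dim))   # 500 uniform points in the unit hypercube
    query = rng.random(dim)           # one random query point
    d = np.linalg.norm(points - query, axis=1)
    # Relative contrast between the farthest and the nearest neighbor
    contrast[dim] = (d.max() - d.min()) / d.min()
    print(f"dim={dim:>4}: (max - min) / min = {contrast[dim]:.3f}")
```

The relative contrast shrinks sharply as the dimension grows, which is the reason the "nearest" neighbor carries less information in high-dimensional spaces.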

Impact on KNN Performance:

Classification: Decision boundaries become fuzzy → higher misclassification rate.
Regression: Predictions become less accurate since "neighbors" are not truly close.
Overfitting risk: With high dimensions, small variations in data can mislead KNN.

How to Mitigate Curse of Dimensionality in KNN:

Dimensionality Reduction: PCA (Principal Component Analysis); autoencoders (deep learning); t-SNE and UMAP, which are mainly used for visualization.
Feature Selection: Keep only the most important features.
Use distance metrics better suited for high dimensions (e.g., cosine similarity instead of Euclidean).

In summary:

The curse of dimensionality makes KNN struggle because in high dimensions, distance metrics lose their discriminatory power, data becomes sparse, and computation is heavy, leading to poor accuracy and efficiency.

**Question 3:** What is Principal Component Analysis (PCA)? How is it different from feature selection?

**Answer:-**
Principal Component Analysis (PCA):
PCA is an unsupervised dimensionality reduction technique that transforms the data into a new coordinate system.
It finds new axes (principal components) that capture the maximum variance in the data.
These new axes are linear combinations of the original features.
The first principal component captures the most variance, the second captures the next most (orthogonal to the first), and so on.
👉 So instead of working with all features, PCA projects data into fewer dimensions while preserving most of the information.

Steps of PCA (simplified):

1. Standardize the data (mean=0, variance=1).
2. Compute the covariance matrix of the features.
3. Find the eigenvalues and eigenvectors of the covariance matrix.
4. Select the top-k eigenvectors (those with the largest eigenvalues) → these form the new feature space.
5. Project the original data onto these eigenvectors to obtain the lower-dimensional representation.
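
The five steps map almost one-to-one onto NumPy calls. The sketch below is a bare-bones illustration on synthetic data; `pca_fit_transform` is a made-up helper name, and scikit-learn's `PCA` uses an SVD-based route rather than this explicit eigen-decomposition.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """PCA via eigen-decomposition of the covariance matrix."""
    # 1. Standardize each feature (zero mean, unit variance)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(Z, rowvar=False)
    # 3. Eigen-decomposition (eigh handles symmetric matrices, ascending eigenvalues)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, largest first, and keep the top components
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_ratio = eigvals[order] / eigvals.sum()
    # 5. Project the data onto the kept components
    return Z @ components, explained_ratio

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Synthetic data: two features strongly correlated with `base`, one independent
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)),
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

X_reduced, ratio = pca_fit_transform(X, n_components=2)
print(X_reduced.shape, ratio)
```

Because two of the three features carry nearly the same signal, the first component absorbs most of their shared variance and the data compresses to two dimensions with little loss.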

Example:

Suppose we have 10 features.
PCA might find that 90% of the variance is explained by just 3 new components.
We can then reduce from 10 → 3 dimensions while retaining about 90% of the variance.

**PCA vs Feature Selection:**

| **Aspect**           | **PCA (Feature Extraction)**                                                             | **Feature Selection**                                              |
| -------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| **Definition**       | Creates new features (principal components) as linear combinations of original features. | Selects a subset of existing features without changing them.       |
| **Supervision**      | Unsupervised (does not use labels).                                                      | Can be supervised (based on importance to target) or unsupervised. |
| **Interpretability** | Low → principal components are abstract (combinations of many features).                 | High → original features are kept, so they’re meaningful.          |
| **Goal**             | Reduce dimensionality while keeping variance.                                            | Keep only the most relevant features.                              |
| **Example**          | Turn 100 correlated features into 10 uncorrelated components.                            | From 100 features, keep only 15 most important ones.               |
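
The contrast in the table can be made concrete on the Wine dataset. This sketch assumes `SelectKBest` with the ANOVA F-score as the feature selection method, which is just one of many possible choices, and keeps 3 dimensions in both cases for comparison.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)

# Feature extraction: 3 new axes, each a linear mix of all 13 original features
pca = PCA(n_components=3).fit(X_scaled)
X_pca = pca.transform(X_scaled)

# Feature selection: keep 3 of the original columns, ranked by ANOVA F-score (uses labels)
selector = SelectKBest(score_func=f_classif, k=3).fit(X_scaled, wine.target)
X_sel = selector.transform(X_scaled)
kept = [wine.feature_names[i] for i in selector.get_support(indices=True)]

print("PCA loadings shape:", pca.components_.shape)   # (3, 13)
print("Selected original features:", kept)
```

Both outputs have three columns, but the PCA columns are abstract combinations while the selected columns keep their original names and meaning.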


**Analogy:**

Feature selection = Choosing the best players from a team.
PCA = Blending players’ skills into a few "super-players" (combinations) that represent the whole team.

**Question 4:** What are eigenvalues and eigenvectors in PCA, and why are they important?

**Answer:-**
Eigenvalues and Eigenvectors in PCA

When we apply PCA, the core step is computing the eigenvalues and eigenvectors of the covariance matrix of the data.

Eigenvectors: Directions of the new feature space (principal components).

Eigenvalues: Amount of variance (information) captured by each eigenvector.

1. What is an Eigenvector?

An eigenvector of a matrix A is a nonzero vector v whose direction does not change when the matrix is applied to it; it is only scaled by a factor λ, its eigenvalue (Av = λv).

In PCA, eigenvectors represent the axes along which the data varies the most.

Each eigenvector = one principal component.

👉 Think of eigenvectors as directions of maximum spread in the data.

2. What is an Eigenvalue?

The eigenvalue tells us how much variance is captured along its corresponding eigenvector.

Larger eigenvalue = more important principal component.

Smaller eigenvalue = less variance (can be dropped in dimensionality reduction).

👉 Eigenvalues tell us how strong each direction (eigenvector) is.

3. Why Are They Important in PCA?

After finding eigenvectors and eigenvalues of the covariance matrix:

Eigenvectors → define the new feature axes (principal components).

Eigenvalues → tell us how much of the dataset’s variance each axis explains.

We sort eigenvectors by their eigenvalues (from largest to smallest).

Select the top-k eigenvectors → project data onto them → reduced dimensionality.

Example (2D → 1D PCA):

Imagine data spread like an elongated cloud in 2D space:

First eigenvector (with largest eigenvalue) aligns with the long axis of the cloud.

Second eigenvector (with small eigenvalue) aligns with the narrow axis.

👉 If we only keep the first eigenvector, we reduce from 2D → 1D while still keeping most variance.
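
The elongated-cloud picture can be reproduced numerically. The sketch below uses a synthetic 2D cloud (spreads of 3.0 and 0.5, rotated by 30°, all arbitrary choices) and checks both the defining property of an eigenvector and the link between eigenvalue and variance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Elongated 2D cloud: large spread along one axis, small along the other, rotated by 30 degrees
raw = rng.normal(size=(1000, 2)) * np.array([3.0, 0.5])
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
data = raw @ rot.T

cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
v1, lam1 = eigvecs[:, -1], eigvals[-1]      # largest eigenvalue and its eigenvector

# Defining property: applying the matrix only rescales the eigenvector
assert np.allclose(cov @ v1, lam1 * v1)

# The variance of the data projected onto v1 equals the eigenvalue
projected_var = np.var(data @ v1, ddof=1)
explained = lam1 / eigvals.sum()
print(f"long-axis direction: {v1}, share of variance: {explained:.3f}")
```

Keeping only `v1` reduces the cloud from 2D to 1D, and `explained` shows how much of the total variance survives that projection.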

Analogy:

Think of shining a flashlight on a 3D object to cast a 2D shadow:

Eigenvectors = directions of projection (how we orient the flashlight).

Eigenvalues = how much "information" or detail is captured in that projection.

In summary:

Eigenvectors = directions of maximum variance (new axes).

Eigenvalues = magnitude of variance captured (importance of those axes).

PCA keeps the eigenvectors with the largest eigenvalues → reducing dimensions while retaining the most information.

**Question 5:** How do KNN and PCA complement each other when applied in a single pipeline?

**Answer:-**
How KNN and PCA Complement Each Other
1. KNN’s Weakness

KNN relies on distance metrics (Euclidean, Manhattan, etc.) to find neighbors.

In high-dimensional space (curse of dimensionality):

Distances lose meaning.

Computation cost is high.

Performance drops.

2. PCA’s Strength

PCA reduces dimensionality by projecting data into fewer principal components.

It removes redundant/correlated features while preserving variance.

It can make distances more meaningful when the discarded low-variance directions are mostly noise.

3. Pipeline: PCA → KNN

A common workflow:

Standardize the data (important because PCA is variance-based and KNN is distance-based, so both are sensitive to feature scale).

Apply PCA → reduce dimensionality, keep top-k components.

Run KNN → use distances in the reduced space for classification/regression.

Why this works well:

Noise reduction: PCA drops components with very low variance (often just noise).

Efficiency: Fewer dimensions → faster distance computations.

Better generalization: KNN no longer struggles with meaningless high-dimensional distances.
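
A minimal sketch of this workflow as a single scikit-learn `Pipeline`, assuming the Wine dataset and 5-fold cross-validation. The choice of 5 components and 5 neighbors is arbitrary here and would normally be tuned.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

pca_knn = Pipeline([
    ("scaler", StandardScaler()),                   # 1. standardize
    ("pca", PCA(n_components=5)),                   # 2. reduce dimensionality (5 is arbitrary)
    ("knn", KNeighborsClassifier(n_neighbors=5)),   # 3. classify in the reduced space
])

# In each CV fold, the scaler and PCA are refit on that fold's training part only
fold_scores = cross_val_score(pca_knn, X, y, cv=5)
print(f"mean CV accuracy: {fold_scores.mean():.3f}")
```

Wrapping the three steps in one object means the preprocessing never sees the validation fold, which keeps the accuracy estimate honest.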

Example:

Suppose you have 100 features for an image classification task:

Many are correlated (pixel intensities).

Running KNN directly can be slow and less accurate.

Apply PCA → reduce to, say, 50 principal components (capturing, for example, 95% of the variance).

Run KNN in this new space → faster, and often more accurate, classification.

Analogy:

PCA = cleaning and compressing your data into the most informative essence.

KNN = using that cleaned, compressed data to make neighbor-based decisions.

✅ In summary:

PCA mitigates KNN’s curse of dimensionality problem by compressing correlated features and discarding low-variance (often noisy) directions.

Together:

PCA = dimensionality reduction & noise filtering.

KNN = classification/regression in the cleaned, compact feature space.

In [1]:
##Dataset: Use the Wine Dataset from sklearn.datasets.load_wine(). Question 6: Train a KNN Classifier on the Wine dataset with and without feature scaling. Compare model accuracy in both cases. (Include your Python code and output in the code box below.) ##

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
wine = load_wine()
X, y = wine.data, wine.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# KNN without scaling
knn_no_scaling = KNeighborsClassifier(n_neighbors=5)
knn_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = knn_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Apply scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# KNN with scaling
knn_scaling = KNeighborsClassifier(n_neighbors=5)
knn_scaling.fit(X_train_scaled, y_train)
y_pred_scaling = knn_scaling.predict(X_test_scaled)
accuracy_scaling = accuracy_score(y_test, y_pred_scaling)

print("Accuracy without scaling:", accuracy_no_scaling)
print("Accuracy with scaling:", accuracy_scaling)


Accuracy without scaling: 0.7222222222222222
Accuracy with scaling: 0.9444444444444444


**Conclusion:**

Without feature scaling, KNN accuracy = 72.2%

With feature scaling, KNN accuracy = 94.4%

Since KNN is distance-based, scaling the features ensures all attributes contribute comparably to the distance calculation. On this split, that raised test accuracy by about 22 percentage points.
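
To see why scaling matters so much on this dataset, it helps to look at the raw feature spreads. The sketch below only inspects standard deviations; it is a diagnostic, not part of the model.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
stds = wine.data.std(axis=0)

# Three features with the largest raw standard deviation
for name, s in sorted(zip(wine.feature_names, stds), key=lambda t: -t[1])[:3]:
    print(f"{name:<12} std = {s:.2f}")

largest = wine.feature_names[int(np.argmax(stds))]
# Share of the total unscaled variance contributed by the largest-scale feature
share = stds.max() ** 2 / np.sum(stds ** 2)
print(f"{largest} accounts for {share:.1%} of the total raw variance")
```

On the unscaled data, `proline` has by far the largest spread, so Euclidean distances are driven almost entirely by that one feature until the data is standardized.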

In [2]:
##Question 7: Train a PCA model on the Wine dataset and print the explained variance ratio of each principal component. ##

from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load dataset
wine = load_wine()
X, y = wine.data, wine.target

# Standardize before PCA
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA()
pca.fit(X_scaled)

# Explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_
print("Explained Variance Ratio of each Principal Component:")
print(explained_variance_ratio)


Explained Variance Ratio of each Principal Component:
[0.36198848 0.1920749  0.11123631 0.0706903  0.06563294 0.04935823
 0.04238679 0.02680749 0.02222153 0.01930019 0.01736836 0.01298233
 0.00795215]


**Interpretation:**

PC1 explains 36.2% of the variance.

PC2 explains 19.2%.

PC3 explains 11.1%.

Together, the first 3 components explain ~66.5% of the dataset’s variance.

So, instead of 13 original features, we can reduce to 3 PCs and still capture about two-thirds of the variance; whether that is enough depends on the downstream task.
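
The cumulative share can be computed directly from the printed ratios. This sketch copies the values from the output above; `components_for` is a made-up helper that returns how many components are needed to reach a given variance threshold.

```python
import numpy as np

# Explained variance ratios copied from the output above
ratios = np.array([0.36198848, 0.1920749, 0.11123631, 0.0706903, 0.06563294,
                   0.04935823, 0.04238679, 0.02680749, 0.02222153, 0.01930019,
                   0.01736836, 0.01298233, 0.00795215])
cumulative = np.cumsum(ratios)

def components_for(threshold):
    """Smallest number of components whose cumulative ratio reaches `threshold`."""
    return int(np.searchsorted(cumulative, threshold) + 1)

print(f"first 3 PCs: {cumulative[2]:.3f}")
print("PCs for 90%:", components_for(0.90), "| PCs for 95%:", components_for(0.95))
```

For these ratios, reaching 90% of the variance takes 8 components and reaching 95% takes 10, so the common variance thresholds keep far more than 3 PCs.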

In [3]:
##Question 8: Train a KNN Classifier on the PCA-transformed dataset (retain top 2 components). Compare the accuracy with the original dataset. (Include your Python code and output in the code box below.) ##

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score

# Load dataset
wine = load_wine()
X, y = wine.data, wine.target

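# Standardize the data
# Note: the scaler is fit on the full dataset before splitting, so test-set
# statistics leak into preprocessing; for a strict evaluation, fit it on the
# training split only (as in Question 6).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)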

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42, stratify=y
)

# KNN on original scaled dataset
knn_original = KNeighborsClassifier(n_neighbors=5)
knn_original.fit(X_train, y_train)
y_pred_original = knn_original.predict(X_test)
accuracy_original = accuracy_score(y_test, y_pred_original)

# PCA with top 2 components
pca_2 = PCA(n_components=2)
X_train_pca = pca_2.fit_transform(X_train)
X_test_pca = pca_2.transform(X_test)

# KNN on PCA-transformed dataset
knn_pca = KNeighborsClassifier(n_neighbors=5)
knn_pca.fit(X_train_pca, y_train)
y_pred_pca = knn_pca.predict(X_test_pca)
accuracy_pca = accuracy_score(y_test, y_pred_pca)

print("Accuracy on original scaled dataset:", accuracy_original)
print("Accuracy on PCA (2 components):", accuracy_pca)


Accuracy on original scaled dataset: 0.9444444444444444
Accuracy on PCA (2 components): 0.9629629629629629


**Conclusion:**

Original scaled dataset (13 features): 94.4% accuracy

PCA with 2 features (capturing ~55% variance): 96.3% accuracy 🎉

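👉 Even though PCA reduced from 13 → 2 dimensions, KNN performed at least as well. The difference is a single test sample (52 vs. 51 of 54 correct), so this shows that two components retain the class structure well, not that PCA is reliably better; noise reduction and removal of redundant features are plausible reasons.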

In [4]:
#Question 9: Train a KNN Classifier with different distance metrics (euclidean, manhattan) on the scaled Wine dataset and compare the results. (Include your Python code and output in the code box below.) #

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
wine = load_wine()
X, y = wine.data, wine.target

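# Standardize the data
# Note: fitting the scaler before the split leaks test-set statistics;
# fit on the training split only for a strict evaluation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)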

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42, stratify=y
)

# KNN with Euclidean distance (default, p=2)
knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
knn_euclidean.fit(X_train, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test)
accuracy_euclidean = accuracy_score(y_test, y_pred_euclidean)

# KNN with Manhattan distance (p=1)
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=1)
knn_manhattan.fit(X_train, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test)
accuracy_manhattan = accuracy_score(y_test, y_pred_manhattan)

print("Accuracy with Euclidean distance:", accuracy_euclidean)
print("Accuracy with Manhattan distance:", accuracy_manhattan)


Accuracy with Euclidean distance: 0.9444444444444444
Accuracy with Manhattan distance: 0.9814814814814815


**Conclusion**

Manhattan distance (98.1%) performed slightly better than Euclidean distance (94.4%) on the scaled Wine dataset.

👉 The gap is only two test samples (53 vs. 51 of 54 correct), so it should not be read as a general advantage. A plausible reason is that Manhattan distance does not square per-feature differences, which makes it less sensitive to a large gap in a single feature.
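
Both metrics in the cell above are special cases of the Minkowski distance, selected through the `p` parameter. A small sketch of the formula, with a made-up `minkowski` helper and toy vectors:

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

a = np.array([0.0, 0.0, 0.0])
b = np.array([3.0, 4.0, 0.0])

manhattan = minkowski(a, b, p=1)   # |3| + |4| + |0|
euclidean = minkowski(a, b, p=2)   # sqrt(3^2 + 4^2 + 0^2)
print(manhattan, euclidean)
```

Manhattan sums the per-feature gaps, while Euclidean squares them first, so a single large gap weighs more heavily under Euclidean distance.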

**Question 10:** You are working with a high-dimensional gene expression dataset to classify patients with different types of cancer. Due to the large number of features and a small number of samples, traditional models overfit. Explain how you would:
● Use PCA to reduce dimensionality
● Decide how many components to keep
● Use KNN for classification post-dimensionality reduction
● Evaluate the model
● Justify this pipeline to your stakeholders as a robust solution for real-world biomedical data

**Answer:-**
1) Use PCA to reduce dimensionality

Why PCA? Gene expression matrices (thousands of genes, often fewer than a few hundred patients) are extremely high-dimensional with strong collinearity. PCA creates orthogonal components (linear combinations of genes) that capture most of the variance while suppressing noise.

Data hygiene: Always standardize features first (zero mean, unit variance). Without scaling, genes with larger ranges dominate distances and variance.

2) Decide how many components to keep

Use a two-step, leakage-safe strategy:

Heuristic screen (unsupervised): Look at cumulative explained variance to identify a reasonable upper bound (e.g., keep components explaining 90–99% of variance).

Supervised selection via CV: Within cross-validation, grid-search n_components (e.g., [10, 20, 50, 100, n_95]) and KNN hyperparameters together. This picks the subspace that generalizes best in cross-validation (not merely the one with the most variance).

Key point for stakeholders: the number of components is chosen by cross-validated performance, not by eyeballing a plot, which guards against overfitting to the training data.

3) KNN for classification post-reduction

In the reduced PCA space, KNN’s distance metric becomes more meaningful and robust.

Tune:

n_neighbors (e.g., 3–15),

distance (Euclidean/Manhattan),

weights (uniform vs distance).

Keep everything in a single sklearn Pipeline so scaling/PCA are learned only from the training folds (no data leakage).

4) Evaluate the model

Stratified CV for model selection (preserves class ratios).

Final held-out test set for unbiased performance.

Report accuracy + macro F1 (macro F1 is crucial when classes are imbalanced).

Include confusion matrix to show per-class behavior (important for clinical stakeholders).

(Optional, if time allows) Permutation test or repeated CV to show stability; external validation if you have an independent cohort.
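
To illustrate why macro F1 is reported next to accuracy, here is a tiny sketch with made-up labels: an imbalanced two-class problem where the model always predicts the majority class.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Made-up labels: 8 samples of class 0, 2 of class 1; the model always predicts class 0
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)
# zero_division=0 scores the never-predicted class as 0 without emitting a warning
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_true, y_pred)

print(f"accuracy = {acc:.3f}, macro F1 = {macro_f1:.3f}")
print(cm)   # rows = true class, columns = predicted class
```

Accuracy looks acceptable at 0.8, but macro F1 drops to about 0.44 because the minority class is never detected, and the confusion matrix shows exactly which class is being missed.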

5) Justify to stakeholders (robustness & clinical sense)

Overfitting control: PCA reduces the number of input dimensions and the noise they carry; CV chooses the dimensionality that actually generalizes.

Transparency: KNN is simple, and PCA loadings can be inspected to see which gene combinations drive components (you can attach gene sets / pathways).

Reproducibility: One pipeline, with fixed random seeds and documented preprocessing.

Operational practicality: Fast to train and, on cohorts of this size, fast to infer; easy to re-fit when new patients arrive.

In [7]:
##Code + example output (self-contained demo)

##The code below simulates a gene-expression-like setting (thousands of features, few samples) to demonstrate the workflow. Replace the synthetic data block with your real matrix X (samples × genes) and labels y to run on your dataset. Printed results will vary slightly by split.##

# ============================================
# PCA → KNN on high-dimensional gene expression
# ============================================

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, f1_score, classification_report, confusion_matrix
from sklearn.datasets import make_classification

# --- 0) Get data ---
# Replace this synthetic generator with your real X (n_samples x n_genes) and y (labels).
# Example: X = your_matrix; y = your_labels
X, y = make_classification(
    n_samples=180,         # few patients
    n_features=5000,       # many genes
    n_informative=80,      # some truly informative genes
    n_redundant=40,
    n_classes=3,           # cancer subtypes
    n_clusters_per_class=2,
    class_sep=2.0,
    flip_y=0.02,
    random_state=42
)

# --- 1) Train / test split (stratified) ---
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

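# --- 2) Unsupervised heuristic: how many PCs for ~95% variance? ---
# Caveat: n_95 is computed on all of X_tr. PCA cannot return more components
# than min(n_samples, n_features), so if n_95 exceeds the number of samples in
# a CV training fold, every fit with that candidate fails and is scored as nan.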
scaler_tmp = StandardScaler()
X_tr_scaled_tmp = scaler_tmp.fit_transform(X_tr)
pca_tmp = PCA().fit(X_tr_scaled_tmp)
cumvar = np.cumsum(pca_tmp.explained_variance_ratio_)
n_95 = int(np.searchsorted(cumvar, 0.95) + 1)

# --- 3) Build a leakage-safe pipeline: Standardize -> PCA -> KNN ---
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(random_state=42)),
    ("knn", KNeighborsClassifier())
])

# --- 4) Hyperparameter search (select n_components + KNN together) ---
param_grid = {
    "pca__n_components": [10, 20, 40, 60, 80, 100, n_95],
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__metric": ["euclidean", "manhattan"],
    "knn__weights": ["uniform", "distance"],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(
    pipe, param_grid=param_grid, scoring="accuracy",
    cv=cv, n_jobs=-1, refit=True, verbose=0
)
grid.fit(X_tr, y_tr)

best_pca_knn = grid.best_estimator_

# --- 5) Evaluate on held-out test set ---
y_pred = best_pca_knn.predict(X_te)
acc = accuracy_score(y_te, y_pred)
macro_f1 = f1_score(y_te, y_pred, average="macro")
cm = confusion_matrix(y_te, y_pred)
report = classification_report(y_te, y_pred, digits=4)

# --- 6) Baseline: KNN without PCA (same CV protocol) ---
pipe_no_pca = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier())
])
param_grid_no_pca = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__metric": ["euclidean", "manhattan"],
    "knn__weights": ["uniform", "distance"],
}
grid_no_pca = GridSearchCV(
    pipe_no_pca, param_grid=param_grid_no_pca, scoring="accuracy",
    cv=cv, n_jobs=-1, refit=True, verbose=0
)
grid_no_pca.fit(X_tr, y_tr)
y_pred_no_pca = grid_no_pca.best_estimator_.predict(X_te)
acc_no_pca = accuracy_score(y_te, y_pred_no_pca)
macro_f1_no_pca = f1_score(y_te, y_pred_no_pca, average="macro")

# --- 7) Print concise results ---
print("=== Heuristic (unsupervised) ===")
print(f"PCs for ~95% variance: {n_95}\n")

print("=== PCA+KNN (CV-selected) ===")
print(f"Best params: {grid.best_params_}")
print(f"CV mean accuracy: {grid.best_score_:.4f}")
print(f"Test accuracy:    {acc:.4f}")
print(f"Test macro-F1:    {macro_f1:.4f}")
print("Confusion matrix (rows=true, cols=pred):")
print(cm)
print("\nClassification report:")
print(report)

print("=== Baseline: KNN without PCA ===")
print(f"Best params: {grid_no_pca.best_params_}")
print(f"CV mean accuracy: {grid_no_pca.best_score_:.4f}")
print(f"Test accuracy:    {acc_no_pca:.4f}")
print(f"Test macro-F1:    {macro_f1_no_pca:.4f}")




100 fits failed out of a total of 700.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
100 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sklearn/model_selection/_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.11/dist-packages/sklearn/base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/sklearn/pipeline.py", line 654, in fit
    Xt = self._fit(X, y, routed_params, raw_params=params)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/

=== Heuristic (unsupervised) ===
PCs for ~95% variance: 125

=== PCA+KNN (CV-selected) ===
Best params: {'knn__metric': 'manhattan', 'knn__n_neighbors': 11, 'knn__weights': 'uniform', 'pca__n_components': 100}
CV mean accuracy: 0.4444
Test accuracy:    0.3333
Test macro-F1:    0.1667
Confusion matrix (rows=true, cols=pred):
[[15  0  0]
 [15  0  0]
 [15  0  0]]

Classification report:
              precision    recall  f1-score   support

           0     0.3333    1.0000    0.5000        15
           1     0.0000    0.0000    0.0000        15
           2     0.0000    0.0000    0.0000        15

    accuracy                         0.3333        45
   macro avg     0.1111    0.3333    0.1667        45
weighted avg     0.1111    0.3333    0.1667        45

=== Baseline: KNN without PCA ===
Best params: {'knn__metric': 'manhattan', 'knn__n_neighbors': 3, 'knn__weights': 'distance'}
CV mean accuracy: 0.4667
Test accuracy:    0.3778
Test macro-F1:    0.3652

**Conclusion:**

On this synthetic run the pipeline does not beat chance: PCA+KNN reaches 33.3% test accuracy on three balanced classes and assigns every test sample to class 0, while KNN without PCA reaches 37.8%. The demo therefore illustrates the workflow (leakage-safe pipeline, joint tuning, held-out evaluation), not a performance gain; results on real expression data have to be measured the same way.
