# Assignment 2: K-Nearest Neighbors (KNN) & Linear Regression

This assignment focuses on two fundamental supervised learning algorithms, **K-Nearest Neighbors (KNN)** and **Linear Regression**, using simple, interpretable datasets.

You will explore how both models learn from data, visualize their behavior, and reflect on how model parameters, noise, and outliers affect performance.

In [2]:
from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load Iris dataset
iris = load_iris()

# Use only two features: sepal length and sepal width
X = iris.data[:, :2]
y = iris.target

# Convert to DataFrame for easier visualization
df = pd.DataFrame(X, columns=iris.feature_names[:2])
df['target'] = y
df.head()

|   | sepal length (cm) | sepal width (cm) | target |
|---|---|---|---|
| 0 | 5.1 | 3.5 | 0 |
| 1 | 4.9 | 3.0 | 0 |
| 2 | 4.7 | 3.2 | 0 |
| 3 | 4.6 | 3.1 | 0 |
| 4 | 5.0 | 3.6 | 0 |


---
## Question 1 (10 points): Visualize the Dataset

Create a scatter plot to visualize the relationship between the two features. Color each point by its class (target).

✏️ **Task:**
- Plot sepal length vs sepal width.  
- Use different colors for each class.  
- Label the axes and add a title.
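
The task above could be approached as in the following minimal sketch, which is one possible layout rather than the expected solution. It reloads the Iris data so that it runs on its own, and draws one `scatter` call per class so that each class gets its own color and legend entry. The figure size and title text are arbitrary choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Reload the data so this sketch is self-contained
iris = load_iris()
X = iris.data[:, :2]  # sepal length, sepal width
y = iris.target

fig, ax = plt.subplots(figsize=(6, 4))

# One scatter call per class gives each class its own color and legend entry
for class_id, class_name in enumerate(iris.target_names):
    mask = y == class_id
    ax.scatter(X[mask, 0], X[mask, 1], label=class_name)

ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_title("Iris: sepal length vs sepal width")
ax.legend(title="Class")
plt.show()
```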

In [None]:
# Your code here

---
## Question 2 (5 points): Split the Data

Split the dataset into **training (80%)** and **testing (20%)** sets.

✏️ **Task:**
- Use `train_test_split` from `sklearn.model_selection`.  
- Set a fixed `random_state` for reproducibility.  
- Print the number of samples in each set.
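
A minimal sketch of the split described above. The value `random_state=42` is an arbitrary choice, and `stratify=y` is an optional assumption (not required by the task) that keeps the class proportions the same in both sets.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data[:, :2], iris.target

# 80/20 split; random_state fixes the shuffle so results are reproducible.
# stratify=y is an assumption here: it preserves class balance in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples:  {X_test.shape[0]}")
```

With 150 samples, an 80/20 split yields 120 training and 30 testing samples.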

In [None]:
# Your code here

---
## Question 4 (10 points): Train a Baseline KNN Model

Train a KNN classifier using:
- `k = 5`
- The default distance metric (`minkowski` with `p=2`, which is equivalent to Euclidean distance)

✏️ **Task:**
- Fit the model on your training data.  
- Predict on the test data.  
- Print the accuracy, confusion matrix, and classification report.
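
The steps above can be sketched as follows. This is an illustrative outline, not the reference solution: it recreates the data and split so that it runs independently, and the `random_state=42` value is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# n_neighbors is k; the metric is left at its default (minkowski with p=2)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {acc:.3f}")
print("Confusion matrix:")
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```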

In [None]:
# Your code here

---
## Question 5 (20 points): Effect of Different k Values

Experiment with different values of `k` from 1 to 20. Observe how model performance changes.

✏️ **Task:**
- Loop over `k` values from 1 to 20.  
- For each `k`, train and evaluate the model.  
- Record training and testing accuracy.  
- Plot both accuracies versus `k`.  
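
One way to organize this experiment is sketched below, assuming the same 80/20 split as before (recreated here with an arbitrary `random_state=42`). The list names `train_acc` and `test_acc` are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

k_values = list(range(1, 21))
train_acc, test_acc = [], []

for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # score() returns mean accuracy for classifiers
    train_acc.append(model.score(X_train, y_train))
    test_acc.append(model.score(X_test, y_test))

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(k_values, train_acc, marker="o", label="Training accuracy")
ax.plot(k_values, test_acc, marker="s", label="Testing accuracy")
ax.set_xticks(k_values)
ax.set_xlabel("k (number of neighbors)")
ax.set_ylabel("Accuracy")
ax.set_title("KNN accuracy vs k")
ax.legend()
plt.show()
```

Because the test set holds only 30 samples, each misclassified point shifts test accuracy by about 3.3 percentage points, so small differences between neighboring `k` values may not be meaningful.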



In [None]:
# Your code here

💭 **Reflection:**
- Which value of `k` gives the best performance?  
- What happens when `k` is very small or very large?  
- How do these changes relate to **overfitting** and **underfitting**?

---
## Question 6 (10 points): Visualize Decision Boundaries

Now that you have found the best value of `k`, let’s see how KNN separates the classes for different distance metrics.

✏️ **Task:**
- For each distance metric you want to compare (for example, `euclidean` and `manhattan`), plot how KNN divides the feature space.  
- Show the **regions** belonging to each class and the **test points** on top of them.  
- Compare how the boundaries change for each metric.

💡 **Hint:**  
You can use `np.meshgrid` to create a grid of coordinate points, then predict each point’s class using your trained KNN model.  
Visualize the result using `plt.contourf()` for colored regions and `plt.scatter()` to plot the test samples.
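
The hint above can be sketched as follows. Several details are assumptions made for illustration: the three metrics in `metrics`, the placeholder `best_k = 5` (substitute the value you found in Question 5), the grid step of 0.05, and the 0.5 margin around the data.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

metrics = ["euclidean", "manhattan", "chebyshev"]  # assumed set of metrics
best_k = 5  # placeholder: replace with your best k

# Build a grid that covers the feature space with a small margin
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(
    np.arange(x_min, x_max, 0.05), np.arange(y_min, y_max, 0.05)
)
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, len(metrics), figsize=(15, 4), sharey=True)
for ax, metric in zip(axes, metrics):
    model = KNeighborsClassifier(n_neighbors=best_k, metric=metric)
    model.fit(X_train, y_train)

    # Predict the class of every grid point, then reshape back to the grid
    Z = model.predict(grid).reshape(xx.shape)

    # Levels at -0.5, 0.5, 1.5, 2.5 give one filled band per class label
    ax.contourf(xx, yy, Z, levels=np.arange(-0.5, 3, 1), alpha=0.3, cmap="viridis")
    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap="viridis",
               vmin=0, vmax=2, edgecolor="k")
    ax.set_xlabel(iris.feature_names[0])
    ax.set_title(f"k={best_k}, metric={metric}")

axes[0].set_ylabel(iris.feature_names[1])
plt.show()
```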


In [5]:
# Your code here

---
## Question 7 (10 points): Generate Synthetic Data

Explore how **Linear Regression** behaves under different noise levels.

✏️ **Task:**
- Generate two synthetic datasets using `make_regression`:
  - **Scenario 1:** Low noise (`noise=10`)  
  - **Scenario 2:** High noise (`noise=50`)  
- **Visualize** each dataset using a scatter plot:
  - Plot the generated points for both scenarios on separate graphs.  
  - Label the axes and include a title indicating the noise level.

📊 **Hint Example:**
```python
from sklearn.datasets import make_regression

# Same random_state in both calls, so only the noise level differs
print("--- Scenario 1: Low Noise Data ---")
X1, y1 = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

print("\n--- Scenario 2: High Noise Data ---")
X2, y2 = make_regression(n_samples=100, n_features=1, noise=50, random_state=42)
```
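
The visualization step could look like the sketch below. It regenerates both datasets with the parameters from the hint and draws each on its own axes. The `datasets` dictionary and the side-by-side layout are illustrative choices; separate figures would work equally well.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

# Map a readable label to (noise level, generated data)
datasets = {}
for label, noise in [("Low noise", 10), ("High noise", 50)]:
    X, y = make_regression(
        n_samples=100, n_features=1, noise=noise, random_state=42
    )
    datasets[label] = (noise, X, y)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (label, (noise, X, y)) in zip(axes, datasets.items()):
    ax.scatter(X[:, 0], y, alpha=0.7)
    ax.set_xlabel("Feature (X)")
    ax.set_ylabel("Target (y)")
    ax.set_title(f"{label} (noise={noise})")

plt.tight_layout()
plt.show()
```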

In [None]:
# Your code here

---
## Question 8 (20 points): Fit Linear Regression Models

Now, let’s fit a Linear Regression model to both datasets.

✏️ **Task:**
- Fit one model to each dataset.
- Plot the fitted regression lines on top of the scatter plots.
- Compare visually how noise affects model fit.
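
A minimal sketch of the fitting and plotting steps, assuming the two datasets from the previous question (regenerated here so the block runs on its own). The `models` dictionary, keyed by noise level, is an illustrative name.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

models = {}
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

for ax, noise in zip(axes, (10, 50)):
    X, y = make_regression(
        n_samples=100, n_features=1, noise=noise, random_state=42
    )
    model = LinearRegression().fit(X, y)
    models[noise] = model

    # Evaluate the fitted line on an evenly spaced grid for a clean plot
    x_line = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
    ax.scatter(X[:, 0], y, alpha=0.6, label="Data")
    ax.plot(x_line[:, 0], model.predict(x_line), color="red", label="Fitted line")
    ax.set_xlabel("Feature (X)")
    ax.set_title(f"noise={noise}, R^2={model.score(X, y):.2f}")
    ax.legend()

axes[0].set_ylabel("Target (y)")
plt.tight_layout()
plt.show()
```

Sharing the y-axis between the two panels makes the difference in scatter around the line easier to compare by eye.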


In [None]:
# Your code here

---
## Question 9 (15 points): Analyze Model Parameters

Let’s compare the slope (coefficient) and bias (intercept) of both models.

✏️ **Task:**
- Print slope and bias for each model.
- Discuss how higher noise levels affect model stability and parameters.
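
The comparison could be sketched as below. As an optional extra that the task does not require, this sketch passes `coef=True` to `make_regression`, which also returns the slope used to generate the data, so the fitted slope can be compared against it. The true intercept is 0 because the `bias` parameter of `make_regression` defaults to `0.0`. The `results` dictionary is an illustrative name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

results = {}
for label, noise in [("low noise", 10), ("high noise", 50)]:
    # coef=True additionally returns the generating coefficient
    X, y, true_coef = make_regression(
        n_samples=100, n_features=1, noise=noise, coef=True, random_state=42
    )
    model = LinearRegression().fit(X, y)
    results[label] = {
        "true_slope": float(np.ravel(true_coef)[0]),
        "slope": float(model.coef_[0]),
        "intercept": float(model.intercept_),
    }

for label, r in results.items():
    print(
        f"{label:>10}: slope={r['slope']:.2f} "
        f"(true {r['true_slope']:.2f}), intercept={r['intercept']:.2f}"
    )
```

Repeating the experiment with several different `random_state` values is one way to see how much the fitted slope and intercept vary at each noise level.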


In [None]:
# Your code here