
# Introduction to Statistical Learning: Bias, Variance, and the Bias-Variance Tradeoff

## Q1: What is Unsupervised Learning?
Unsupervised learning is a type of machine learning in which the algorithm is trained on data without labeled outputs (there is no target variable). Its goal is to uncover hidden patterns, relationships, or structure in the data, for example by grouping similar points into clusters.

### Example: Clustering in Python using K-Means

```python
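import numpy as np
from sklearn.cluster import KMeans

# Six unlabeled 2-D points
X = np.array([[1, 2], [2, 3], [3, 4], [5, 6], [6, 7], [8, 9]])

# Fit K-Means with 2 clusters; n_init is set explicitly (10 restarts) to avoid
# the FutureWarning some scikit-learn versions emit about its changing default
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_               # cluster index assigned to each point
centroids = kmeans.cluster_centers_   # coordinates of each cluster center

print("Cluster Labels:", labels)
print("Cluster Centroids:", centroids)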
```
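To see roughly what `KMeans` does internally, here is a minimal from-scratch sketch of Lloyd's algorithm, the classic K-Means procedure: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop changing. The function name `kmeans_lloyd` and its initialization (picking k distinct data points at random) are illustrative assumptions; scikit-learn's default uses k-means++ initialization and several restarts instead.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iters=100, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm for K-Means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct, randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 2], [2, 3], [3, 4], [5, 6], [6, 7], [8, 9]])
labels, centroids = kmeans_lloyd(X, k=2)
print("Labels:", labels)
print("Centroids:", centroids)
```

On these six points the sketch should separate the first three points from the last three, although which group is numbered 0 depends on the random initialization.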

## Q2: How to Select Training and Testing Data
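The standard approach is to randomly split the labeled dataset into a training set, used to fit the model, and a held-out test set, used only to estimate how well the model performs on unseen data. Common splits reserve 20-30% of the data for testing, and fixing a random seed makes the split reproducible.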

### Python example:

```python
from sklearn.model_selection import train_test_split
import pandas as pd

data = {'Feature1': [1, 2, 3, 4, 5, 6],
        'Feature2': [5, 6, 7, 8, 9, 10],
        'Target': [0, 1, 0, 1, 0, 1]}
df = pd.DataFrame(data)

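# Separate the features from the target column
X = df[['Feature1', 'Feature2']]
y = df['Target']

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Training features:\n", X_train)
print("Training targets:\n", y_train)
print("Test features:\n", X_test)
print("Test targets:\n", y_test)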
```
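A random split amounts to shuffling the row indices and slicing them. The hypothetical helper `manual_split` below (not a scikit-learn function) sketches this with NumPy on the same six rows, rounding the number of test rows up so that 30% of 6 rows gives 2 test rows. It is not guaranteed to select the same rows as `train_test_split` for a given seed.

```python
import numpy as np

def manual_split(X, y, test_size=0.3, seed=42):
    """Hypothetical helper: shuffle row indices, then slice off a test set."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    n_test = int(np.ceil(n_samples * test_size))  # round the test count up
    perm = rng.permutation(n_samples)             # random order of row indices
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Same six rows as the pandas example above, as NumPy arrays
X_all = np.array([[1, 5], [2, 6], [3, 7], [4, 8], [5, 9], [6, 10]])
y_all = np.array([0, 1, 0, 1, 0, 1])

X_tr, X_te, y_tr, y_te = manual_split(X_all, y_all)
print("Train rows:\n", X_tr, "\nTrain targets:", y_tr)
print("Test rows:\n", X_te, "\nTest targets:", y_te)
```

Indexing features and targets with the same index arrays keeps each row paired with its own label.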

## Q3: Bias, Variance, and Bias-Variance Tradeoff
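Bias is the error introduced by approximating a complex real-world problem with a simplified model; a high-bias model underfits and misses real structure in the data. Variance is the model's sensitivity to small changes in the training data; a high-variance model overfits and treats noise as if it were signal. The tradeoff: making a model more flexible (for example, raising the polynomial degree) usually lowers bias but raises variance, so the lowest test error tends to come from a model of intermediate complexity.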
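For squared-error loss this tradeoff can be written exactly. Assume the data follow \(y = f(x) + \epsilon\), where the noise \(\epsilon\) is independent of the training set with \(\mathbb{E}[\epsilon] = 0\) and \(\operatorname{Var}(\epsilon) = \sigma^2\), and let \(\hat{f}\) be a model fit on a randomly drawn training set. At a fixed input \(x\), with expectations taken over training sets and noise, the expected test error decomposes as:

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
\]

The last term is noise that no model can remove; only the first two terms depend on the choice of model.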

### Example of Bias-Variance Tradeoff

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: a cubic function plus Gaussian noise (standard deviation 3)
np.random.seed(42)
X = np.random.rand(100, 1) * 6 - 3
y = 0.5 * X**3 - X + 2 + np.random.randn(100, 1) * 3

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def plot_model(degree):
    """Fit a polynomial of the given degree, plot it, and return (train MSE, test MSE)."""
    # Expand X into the columns [x, x^2, ..., x^degree]
    poly_features = PolynomialFeatures(degree=degree, include_bias=False)
    X_poly_train = poly_features.fit_transform(X_train)
    X_poly_test = poly_features.transform(X_test)

    model = LinearRegression()
    model.fit(X_poly_train, y_train)

    # Training error never increases with degree; test error reveals under/overfitting
    train_mse = mean_squared_error(y_train, model.predict(X_poly_train))
    test_mse = mean_squared_error(y_test, model.predict(X_poly_test))

    # Evaluate the fitted curve on a dense grid for plotting
    X_plot = np.linspace(-3, 3, 100).reshape(-1, 1)
    y_plot = model.predict(poly_features.transform(X_plot))

    plt.scatter(X_train, y_train, s=10, label="Training data")
    plt.plot(X_plot, y_plot, color="r", label=f"Degree {degree} fit")
    plt.title(f"Degree {degree} Polynomial\nTrain MSE: {train_mse:.2f}, Test MSE: {test_mse:.2f}")
    plt.xlabel("X")
    plt.ylabel("y")
    plt.legend()
    plt.show()
    return train_mse, test_mse

plot_model(1)   # too simple for a cubic: expected to underfit (high bias)
plot_model(3)   # matches the true cubic: typically the best balance
plot_model(15)  # very flexible: typically overfits (high variance)
```
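The plots above show the tradeoff through train and test error on a single split. Bias and variance can also be estimated directly by simulation: draw many fresh training sets from the same known function, refit the model on each, then compare the average prediction with the truth (squared bias) and measure the spread of predictions around that average (variance). This sketch reuses the cubic function and noise level from the example above; the helper name `bias_variance`, the 200 repetitions, the 50-point evaluation grid, and the degrees 1, 3, and 10 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def true_f(x):
    # Same underlying cubic as the example above
    return 0.5 * x**3 - x + 2

def bias_variance(degree, n_reps=200, n_train=80, noise_sd=3.0, seed=0):
    """Hypothetical helper: Monte Carlo estimate of (bias^2, variance), averaged over a grid."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(-3, 3, 50).reshape(-1, 1)
    preds = np.empty((n_reps, len(x_grid)))
    for r in range(n_reps):
        # Fresh training set drawn from the same data-generating process
        X = rng.uniform(-3, 3, size=(n_train, 1))
        y = true_f(X).ravel() + rng.normal(0, noise_sd, size=n_train)
        model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                              LinearRegression())
        model.fit(X, y)
        preds[r] = model.predict(x_grid)
    mean_pred = preds.mean(axis=0)                      # average prediction at each grid point
    bias_sq = np.mean((mean_pred - true_f(x_grid).ravel()) ** 2)
    variance = np.mean(preds.var(axis=0))               # spread across training sets
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 10)}
for d, (b, v) in results.items():
    print(f"degree {d:2d}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

Degree 1 should show by far the largest squared bias, degree 3 should have squared bias near zero because the model class contains the true cubic, and degree 10 should show the largest variance; the exact numbers depend on the seed.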




# Week 2: Linear and Polynomial Regression

## 1. Linear Regression

Linear regression is a supervised learning algorithm that models the relationship between a dependent variable (target) and one or more independent variables (features) as a linear function: a straight line for a single feature, or a hyperplane for several.

The general form of a linear regression model:
\[
y = eta_0 + eta_1 x_1 + eta_2 x_2 + \dots + eta_n x_n + \epsilon
\]
Where:
- \(y\) is the target variable.
- \(eta_0\) is the intercept.
- \(eta_1, \dots, eta_n\) are the coefficients for each feature \(x_1, x_2, \dots, x_n\).
- \(\epsilon\) is the error term (noise).
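Given data, the coefficients \(\beta_0, \dots, \beta_n\) are usually estimated by ordinary least squares, which chooses the values that minimize the sum of squared residuals. The sketch below generates synthetic data from known coefficients (the values in `beta_true` are arbitrary, chosen for illustration) and recovers them with NumPy's `np.linalg.lstsq`, after prepending a column of ones so that \(\beta_0\) acts as the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                  # two features x_1, x_2
beta_true = np.array([1.5, -2.0, 0.5])       # illustrative [beta_0, beta_1, beta_2]
noise = rng.normal(scale=0.1, size=n)        # the error term epsilon
y = beta_true[0] + X @ beta_true[1:] + noise

# Column of ones so the first coefficient plays the role of the intercept
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("Estimated coefficients:", beta_hat)
```

With little noise and 200 samples, `beta_hat` should land close to `beta_true`.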

### Gradient Descent
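Gradient descent is an iterative optimization algorithm that minimizes a cost function (for linear regression, usually the mean squared error, MSE) by repeatedly moving the model parameters a small step, scaled by the learning rate, in the direction of the negative gradient. Note that scikit-learn's `LinearRegression`, used in the example below, does not use gradient descent: it solves the least-squares problem directly.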
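A minimal batch gradient descent sketch for the MSE cost, assuming synthetic data of the same form as the scikit-learn example (\(y \approx 4 + 3x\)). With a column of ones prepended to form \(X_b\), a parameter vector \(\theta\), and \(m\) samples, the gradient of the MSE is \(\frac{2}{m} X_b^\top (X_b \theta - y)\), and each step applies \(\theta \leftarrow \theta - \eta \cdot \text{gradient}\). The learning rate \(\eta = 0.1\) and the 1000 iterations are illustrative choices; the result is checked against the exact least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)

X_b = np.column_stack([np.ones(len(X)), X])  # add a bias column of ones
m = len(X_b)
theta = np.zeros(2)                          # [intercept, slope], start at zero
eta = 0.1                                    # learning rate (illustrative)
n_iterations = 1000

for _ in range(n_iterations):
    # Gradient of MSE = (1/m) * ||X_b @ theta - y||^2 with respect to theta
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients

# Exact least-squares solution for comparison
theta_exact, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print("Gradient descent:", theta)
print("Least squares:   ", theta_exact)
```

If the learning rate is too large the updates diverge, and if it is too small convergence becomes very slow; for this data \(\eta = 0.1\) converges well within 1000 steps.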

### Python Example of Linear Regression using `sklearn`:
```python
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Synthetic data: y = 4 + 3x plus Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Fit ordinary least squares
model = LinearRegression()
model.fit(X, y)

# Predict at both ends of the x-range to draw the fitted line
X_new = np.array([[0], [2]])
y_predict = model.predict(X_new)

plt.plot(X, y, "b.")
plt.plot(X_new, y_predict, "r-", label="Predictions")
plt.title("Linear Regression Example")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Estimates should be close to the true values 4 (intercept) and 3 (slope)
print("Intercept:", model.intercept_)
print("Slope:", model.coef_)
```

## 2. Polynomial Regression

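Polynomial regression extends linear regression by modeling the relationship between a feature and the target as an nth-degree polynomial, \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon\). The model is still linear in the coefficients, so it is fit by adding powers of the feature as new columns and then applying linear regression (optionally with regularization, as in the Ridge example below).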
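The feature-expansion step can be inspected directly. This sketch uses scikit-learn's `PolynomialFeatures` to turn a single column \(x\) into \([x, x^2, x^3]\), then shows that plain `LinearRegression` on expanded features recovers a quadratic exactly; the quadratic \(y = 1 + 2x - 0.5x^2\) is an arbitrary noise-free example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Expand one feature into its powers: each row [x] becomes [x, x^2, x^3]
x = np.array([[2.0], [3.0]])
x_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
print(x_poly)  # [[ 2.  4.  8.]
               #  [ 3.  9. 27.]]

# A noise-free quadratic is fit exactly by linear regression on [x, x^2]
x_grid = np.linspace(-3, 3, 20).reshape(-1, 1)
y_grid = 1 + 2 * x_grid[:, 0] - 0.5 * x_grid[:, 0] ** 2
features = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x_grid)
lin = LinearRegression().fit(features, y_grid)
print("Intercept:", lin.intercept_, "Coefficients:", lin.coef_)
```

`include_bias=False` drops the constant column of ones because `LinearRegression` already fits its own intercept.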

### Python Example of Polynomial Regression using `sklearn`:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

# Synthetic quadratic data with Gaussian noise
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

# Pipeline: expand X into [x, x^2], then fit Ridge (L2-regularized linear regression)
model = Pipeline([
    ("poly_features", PolynomialFeatures(degree=2, include_bias=False)),
    ("ridge_reg", Ridge(alpha=1))
])

model.fit(X, y)

# Predict on a dense grid to draw the fitted curve
X_new = np.linspace(-3, 3, 100).reshape(100, 1)
y_predict = model.predict(X_new)

plt.plot(X, y, "b.")
plt.plot(X_new, y_predict, "r-", label="Polynomial (Degree=2) Fit")
plt.title("Polynomial Regression with Ridge Regularization")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()
```
