### **State University of Campinas - UNICAMP** <br>
**Course**: MC886A <br>
**Professor**: Marcelo da Silva Reis <br>
**TA (PED)**: Marcos Vinicius Souza Freire

---

### **Hands-On: Introduction to Machine Learning and Tensors in PyTorch**
##### Notebook: 03 Linear Model

> Based on Explore mtcars by Krasser (2024)[1] [https://cran.r-project.org/web/packages/explore/vignettes/explore-mtcars.html](https://cran.r-project.org/web/packages/explore/vignettes/explore-mtcars.html)
---

### **Table of Contents**

1. [**Objectives**](#objectives) <br>
2. [**Exploratory Data Analysis (EDA)**](#2-exploratory-data-analysis-eda) <br>
3. [**Mathematical Theory of Linear Regression**](#3-mathematical-theory-of-linear-regression) <br>
4. [**Building the Linear Model in PyTorch**](#4-building-the-linear-model-in-pytorch) <br>
5. [**Extending to a Deep Model**](#5-extending-to-a-deep-model) <br>
6. [**Discussion**](#6-discussion) <br>
7. [**REFERENCES**](#references)

---

#### **Objectives**
- **Understand the Data:** Explore the `mtcars` dataset to identify patterns.
- **Build a Model:** Implement linear regression in PyTorch to predict `mpg`.
- **Learn the Theory:** Connect code to the math behind linear regression.
- **Discuss Results:** Interpret model performance and weights.

- **What is mtcars?**
  - We'll work with the `mtcars` dataset, taken from the 1974 *Motor Trend* magazine. It describes 32 cars with 11 variables, including miles per gallon (`mpg`), weight (`wt`), horsepower (`hp`), cylinders (`cyl`), and gears (`gear`). Our task is to predict `mpg` from the other variables.
  - The first 5 rows (your output should look like this):
    ```
    First 5 rows of mtcars:
                        mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
    Mazda RX4          21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
    Mazda RX4 Wag      21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
    Datsun 710         22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
    Hornet 4 Drive     21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
    Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
    ```

- **Why Linear Regression?**
  - Linear regression is a foundational ML technique. It assumes a linear relationship between inputs (e.g., `wt`, `hp`) and output (`mpg`). We’ll use PyTorch to build it and see how well it works.

- **Plan:**
  - Explore data → Review the math → Build a simple model → Extend to a deep model → Discuss results → Recap the math.
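
The preview above can be reproduced without the CSV file. The following minimal sketch rebuilds only the five rows shown (values copied from that output) and separates the target `mpg` from the ten candidate predictors. The names `preview`, `target`, and `predictors` are ours, not part of the notebook.

```python
import pandas as pd

# The five rows shown in the preview, typed in by hand.
columns = ["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]
rows = {
    "Mazda RX4":         [21.0, 6, 160.0, 110, 3.90, 2.620, 16.46, 0, 1, 4, 4],
    "Mazda RX4 Wag":     [21.0, 6, 160.0, 110, 3.90, 2.875, 17.02, 0, 1, 4, 4],
    "Datsun 710":        [22.8, 4, 108.0,  93, 3.85, 2.320, 18.61, 1, 1, 4, 1],
    "Hornet 4 Drive":    [21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1],
    "Hornet Sportabout": [18.7, 8, 360.0, 175, 3.15, 3.440, 17.02, 0, 0, 3, 2],
}
preview = pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Split into the quantity we want to predict and everything else.
target = preview["mpg"]
predictors = preview.drop(columns="mpg")

print(preview.shape)             # (rows, columns)
print(list(predictors.columns))  # the ten candidate input variables
```

Later sections use only four of these predictors (`wt`, `hp`, `cyl`, `gear`).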

---

#### **2. Exploratory Data Analysis (EDA)**

- **Code Demo:**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Load data
mtcars = pd.read_csv('/content/drive/MyDrive/TA-PED/CLASSROOM/hands-on_00/data/mtcars.csv', index_col=0)

# Histogram of 'gear'
fig = px.histogram(mtcars, x='gear', title='Number of Cars by Gear')
fig.show()

# Scatter plot: Weight vs MPG (trendline='ols' requires the statsmodels package)
fig = px.scatter(mtcars, x='wt', y='mpg', trendline='ols',
                 title='Weight vs MPG', labels={'wt': 'Weight (1000 lbs)', 'mpg': 'Miles per Gallon'},
                 trendline_color_override='red')
fig.show()

# Boxplots
fig = px.box(mtcars, x='cyl', y='mpg', title='MPG Distribution by Number of Cylinders',
             color='cyl', color_discrete_sequence=px.colors.qualitative.Set2)
fig.update_layout(xaxis_title='Cylinders', yaxis_title='Miles Per Gallon (mpg)', showlegend=False)
fig.show()

fig = px.box(mtcars, x='gear', y='wt', title='Weight Distribution by Number of Gears',
             color='gear', color_discrete_sequence=px.colors.qualitative.Set3)
fig.update_layout(xaxis_title='Gears', yaxis_title='Weight (1000 lbs)', showlegend=False)
fig.show()

# Boxplot with Points
fig = go.Figure()
fig.add_trace(go.Box(x=mtcars['am'], y=mtcars['mpg'], boxpoints='all', jitter=0.3, pointpos=-1.8,
                     marker=dict(color='rgb(255, 127, 127)'), line=dict(color='rgb(255, 127, 127)'),
                     name='MPG by Transmission'))
fig.update_layout(title='MPG Distribution by Transmission Type',
                  xaxis_title='Transmission (0 = Automatic, 1 = Manual)',
                  yaxis_title='Miles Per Gallon (mpg)', showlegend=False)
fig.update_traces(boxmean=True)  # Show mean line
fig.show()

# Pair Plot (Scatter Matrix)
fig = px.scatter_matrix(mtcars, dimensions=['mpg', 'wt', 'hp', 'cyl'], color='gear',
                        title='Pairwise Relationships (Colored by Gear)',
                        color_continuous_scale='viridis', height=800)
fig.update_traces(diagonal_visible=False)  # Hide the diagonal panels (each variable plotted against itself)
fig.show()

# Correlation matrix
corr = mtcars.corr()
fig = px.imshow(corr, text_auto=True, title='Feature Correlation Matrix', color_continuous_scale='RdBu', height=600)
fig.show()

# MPG distribution
fig = go.Figure()
fig.add_trace(go.Histogram(x=mtcars['mpg'], nbinsx=10, marker=dict(color='skyblue', line=dict(color='black', width=1))))
fig.update_layout(title='Distribution of MPG', xaxis_title='mpg', yaxis_title='Frequency', bargap=0.1)
fig.show()

# Weight vs MPG with gear coloring
fig = go.Figure()
# The colorbar title is set on the marker: this trace does not use the layout-level coloraxis.
fig.add_trace(go.Scatter(x=mtcars['wt'], y=mtcars['mpg'], mode='markers',
                         marker=dict(size=10, color=mtcars['gear'], colorscale='Viridis', showscale=True,
                                     colorbar=dict(title=dict(text='Number of Gears'))),
                         text=mtcars.index))
fig.update_layout(title='Weight vs MPG (colored by Gear)', xaxis_title='Weight (wt)',
                  yaxis_title='Miles Per Gallon (mpg)')
fig.show()

# Summary stats
print("\nSummary Statistics:")
print(mtcars.describe())

- **Key Points:**
  - **Gear Histogram:** Most cars have 3 or 4 gears, few have 5. Does gear affect `mpg`?
  - **Weight vs MPG Scatter:** Lighter cars (low `wt`) tend to have higher `mpg`. The trendline confirms a negative slope.
  - **Correlation Matrix:** `wt` (-0.87), `cyl` (-0.85), and `hp` (-0.78) have strong negative correlations with `mpg`. `gear` (0.48) is positive but weaker.
  - **MPG Histogram:** `mpg` ranges from about 10 to 34, with a peak around 15–20.
  - **Gear-Colored Scatter:** Cars with 3 gears cluster at high weight and low `mpg`, while cars with 4 or 5 gears (yellow = 5) are mostly lighter and more efficient.
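
To read the correlation matrix more easily, it can help to look only at the `mpg` column and sort it by strength. The sketch below does this with a hypothetical helper, `rank_by_correlation`. To stay self-contained it runs on just the five preview rows, so the printed values will differ from the full-dataset correlations quoted above.

```python
import pandas as pd

def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with `target`, strongest first."""
    corr = df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Five preview rows only (a tiny sample, for illustration).
sample = pd.DataFrame(
    {
        "mpg": [21.0, 21.0, 22.8, 21.4, 18.7],
        "wt": [2.620, 2.875, 2.320, 3.215, 3.440],
        "hp": [110, 110, 93, 110, 175],
        "cyl": [6, 6, 4, 6, 8],
        "gear": [4, 4, 4, 3, 3],
    }
)

ranking = rank_by_correlation(sample, "mpg")
print(ranking.round(2))
```

In the notebook, the same call on the full data would be `rank_by_correlation(mtcars, 'mpg')`.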

---

#### **3. Mathematical Theory of Linear Regression**
**Goal:** Explain the math behind the model.

- **Formulation:**
  - Linear regression predicts `mpg` ($\hat{y}$) as a weighted sum of the features ($x_1, \dots, x_n$) plus a bias.
  - Equation:  
    $
    \hat{y} = w_1 \cdot x_1 + w_2 \cdot x_2 + \dots + w_n \cdot x_n + b
    $
    - $ \hat{y} $: Predicted `mpg`
    - $ x_1, x_2, \dots, x_n $: Features (`wt`, `hp`, `cyl`, `gear`)
    - $ w_1, w_2, \dots, w_n $: Weights (learned)
    - $ b $: Bias (learned)

- **Loss Function:**
  - We minimize the Mean Squared Error (MSE) to find the best $ w $ and $ b $:

    $
    \text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2
    $
    - $ m $: Number of samples
    - $ y_i $: Actual `mpg`
    - $ \hat{y}_i $: Predicted `mpg`

- **Optimization:**
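  - We update the parameters by gradient descent, using PyTorch's `torch.optim.SGD`. Because every step in this notebook uses the whole training set rather than a random mini-batch, this is full-batch gradient descent: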

    $
    w_j \leftarrow w_j - \eta \cdot \frac{\partial \text{MSE}}{\partial w_j}
    $
    
    $
    b \leftarrow b - \eta \cdot \frac{\partial \text{MSE}}{\partial b}
    $
    - $ \eta $: Learning rate (0.01 in our case)
    - Gradients are computed automatically by PyTorch via backpropagation.

- **Note:**
  - Draw the equation on a board: $ \text{mpg} = w_1 \cdot \text{wt} + w_2 \cdot \text{hp} + w_3 \cdot \text{cyl} + w_4 \cdot \text{gear} + b $.
  - Weights tell us how much each feature impacts `mpg`. Negative $ w $ means higher feature values lower `mpg`.
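
To make the update rule concrete, the sketch below performs one gradient descent step by hand in NumPy. Differentiating the MSE gives $\frac{\partial \text{MSE}}{\partial w_j} = \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_{ij}$ and $\frac{\partial \text{MSE}}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)$. The data are made-up numbers, not `mtcars`, and the function names are ours.

```python
import numpy as np

def mse(X, y, w, b):
    """Mean squared error of the linear model X @ w + b."""
    return float(np.mean((y - (X @ w + b)) ** 2))

def gradient_step(X, y, w, b, lr=0.01):
    """One full-batch gradient descent update of w and b."""
    m = len(y)
    residual = (X @ w + b) - y               # y_hat - y, shape (m,)
    grad_w = (2.0 / m) * (X.T @ residual)    # dMSE/dw_j for every j
    grad_b = (2.0 / m) * residual.sum()      # dMSE/db
    return w - lr * grad_w, b - lr * grad_b

# Made-up, already-scaled toy data: 4 samples, 2 features.
X = np.array([[-1.0, -0.5], [0.0, 1.0], [1.0, -1.0], [0.5, 0.5]])
y = np.array([25.0, 20.0, 15.0, 18.0])

w, b = np.zeros(2), 0.0                      # start from all-zero parameters
loss_before = mse(X, y, w, b)
w, b = gradient_step(X, y, w, b, lr=0.01)
loss_after = mse(X, y, w, b)
print(f"loss before: {loss_before:.2f}, after one step: {loss_after:.2f}")
```

PyTorch computes the same gradients automatically when `loss.backward()` is called, so this manual version is only meant to show what happens under the hood.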

---

#### **4. Building the Linear Model in PyTorch**

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Prepare data
features = ['wt', 'hp', 'cyl', 'gear']
X = mtcars[features].values
y = mtcars['mpg'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)  # column vector, shape (n, 1)
y_test = torch.tensor(y_test, dtype=torch.float32).view(-1, 1)

# Define model
class LinearModel(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.linear = nn.Linear(input_size, 1)  # one output: predicted mpg

    def forward(self, x):
        return self.linear(x)

input_size = X_train.shape[1]
model = LinearModel(input_size)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Train
num_epochs = 1000
for epoch in range(num_epochs):
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluate
model.eval()
with torch.no_grad():
    y_pred = model(X_test)
    test_loss = criterion(y_pred, y_test)
    print(f'Test Loss: {test_loss.item():.4f}')

# Results
weights = model.linear.weight.detach().numpy()
bias = model.linear.bias.detach().numpy()
print("\nModel Weights:", weights)
print("Bias:", bias)

- **Explanation:**
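  - **Data Prep:** We standardize the features so that `wt` (1.5–5.4) and `hp` (52–335) end up on the same scale. The scaler is fit on the training set only and then applied to the test set, so no test information leaks into training.
  - **Model:** This is $ \hat{y} = w \cdot x + b $. `nn.Linear` computes it.
  - **Training:** The printed training loss drops from about 12.5 to about 5.0, meaning the model learns. The SGD optimizer adjusts $ w $ and $ b $ to minimize MSE.
  - **Weights:** Negative weights for `wt` (-2.65), `hp` (-1.43), and `cyl` (-1.34) match our EDA: higher values lower `mpg`. The positive weight for `gear` (0.52) suggests that, with the other features held fixed, more gears go with higher `mpg`.
  - **Bias:** 20.17 is the predicted `mpg` when all scaled features are zero, that is, for a car whose features all equal the training-set means.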

- **Graph:**


In [None]:
fig = go.Figure()
# Convert tensors to NumPy arrays / Python floats before handing them to Plotly
fig.add_trace(go.Scatter(x=y_test.flatten().numpy(), y=y_pred.flatten().numpy(), mode='markers', marker=dict(size=10, color='blue'), name='Predictions'))
fig.add_trace(go.Scatter(x=[y_test.min().item(), y_test.max().item()], y=[y_test.min().item(), y_test.max().item()], mode='lines', line=dict(color='red', dash='dash'), name='Perfect Prediction'))
fig.update_layout(title='Predicted vs Actual MPG', xaxis_title='Actual MPG', yaxis_title='Predicted MPG', showlegend=True)
fig.show()

  
- **Reading the plot:** Points close to the red dashed line are accurate predictions. The vertical distance of a point from the line is the prediction error for that car.
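
Because the features were standardized before training, the weights and bias reported in the explanation above are expressed per standard deviation of each feature, not per pound or per horsepower. Substituting $z_j = (x_j - \mu_j) / \sigma_j$ into $\hat{y} = \sum_j w_j z_j + b$ gives weights $w_j / \sigma_j$ and bias $b - \sum_j w_j \mu_j / \sigma_j$ in the original units. The sketch below applies this conversion. The means and standard deviations are placeholder values chosen for illustration. In the notebook they would come from `scaler.mean_` and `scaler.scale_`.

```python
import numpy as np

def to_original_units(w_scaled, b_scaled, mean, std):
    """Convert weights/bias learned on standardized features to raw-feature units."""
    w_orig = w_scaled / std
    b_orig = b_scaled - np.sum(w_scaled * mean / std)
    return w_orig, b_orig

# Weights and bias reported above, in the order wt, hp, cyl, gear.
w_scaled = np.array([-2.65, -1.43, -1.34, 0.52])
b_scaled = 20.17

# Placeholder scaler statistics (assumed, NOT computed from mtcars).
mean = np.array([3.2, 147.0, 6.2, 3.7])
std = np.array([1.0, 68.0, 1.8, 0.7])

w_orig, b_orig = to_original_units(w_scaled, b_scaled, mean, std)

# Both parameterizations must give the same prediction for any car.
x_raw = np.array([2.62, 110.0, 6.0, 4.0])            # wt, hp, cyl, gear of Mazda RX4
pred_scaled = w_scaled @ ((x_raw - mean) / std) + b_scaled
pred_orig = w_orig @ x_raw + b_orig
print(np.isclose(pred_scaled, pred_orig))
```

With real scaler statistics, `w_orig[0]` would read as the change in predicted `mpg` per additional 1000 lbs of weight, with the other features held fixed.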

---

#### **5. Extending to a Deep Model**

In [None]:
class DeepLinearModel(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, 10)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(10, 1)
    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.layer2(x)
        return x

model = DeepLinearModel(input_size)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
losses = []
for epoch in range(num_epochs):
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

model.eval()
with torch.no_grad():
    y_pred = model(X_test)
    test_loss = criterion(y_pred, y_test)
    print(f'Test Loss: {test_loss.item():.4f}')

- **Graph:**

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=np.arange(1, num_epochs + 1), y=losses, mode='lines', line=dict(color='blue'), name='Training Loss'))
fig.update_layout(title='Training Loss Over Epochs', xaxis_title='Epoch', yaxis_title='Loss')
fig.show()

fig = go.Figure()
# Convert tensors to NumPy arrays / Python floats before handing them to Plotly
fig.add_trace(go.Scatter(x=y_test.flatten().numpy(), y=y_pred.flatten().numpy(), mode='markers', marker=dict(size=10, color='blue'), name='Predictions'))
fig.add_trace(go.Scatter(x=[y_test.min().item(), y_test.max().item()], y=[y_test.min().item(), y_test.max().item()], mode='lines', line=dict(color='red', dash='dash'), name='Perfect Prediction'))
fig.update_layout(title='Predicted vs Actual MPG (Deep Model)', xaxis_title='Actual MPG', yaxis_title='Predicted MPG')
fig.show()

- **Explanation:**
  - We add a hidden layer with 10 neurons and a ReLU activation so the model can capture non-linear patterns. Training loss drops to 2.4, lower than the 5.0 reached by the linear model.
  - Test loss (7.06) is well above the training loss, which suggests overfitting. With a test set this small, the estimate is also noisy.
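
One way to see why overfitting is plausible here is to compare the number of trainable parameters with the number of training rows. The sketch below counts both by hand. The helper `linear_layer_params` is ours, and the split sizes assume scikit-learn's behaviour of rounding the test fraction up.

```python
import math

def linear_layer_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

n_features = 4                                  # wt, hp, cyl, gear
n_total, test_size = 32, 0.2
n_test = math.ceil(n_total * test_size)         # test share is rounded up
n_train = n_total - n_test

linear_params = linear_layer_params(n_features, 1)
deep_params = linear_layer_params(n_features, 10) + linear_layer_params(10, 1)

print(f"train rows: {n_train}, test rows: {n_test}")                 # train rows: 25, test rows: 7
print(f"linear model: {linear_params}, deep model: {deep_params}")   # linear model: 5, deep model: 61
```

The deep model has more than twice as many parameters as there are training cars, while the linear model has only five. In PyTorch the same count can be obtained with `sum(p.numel() for p in model.parameters())`.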

---

#### **6. Discussion**

- **Model Comparison:**
  - Linear: Train Loss = 4.9996, Test Loss = 7.7062
  - Deep: Train Loss = 2.4012, Test Loss = 7.0613
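  - The deep model fits the training data better (lower loss), but both models do noticeably worse on the test data. Why? The dataset is tiny (32 cars, of which `test_size=0.2` holds out only 7 for testing), so the test loss is a noisy estimate, and the deep model in particular has enough parameters to overfit.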

- **Weights Insight:**
  - Negative `wt` weight (-2.65) aligns with our scatter plot. Lighter cars have higher `mpg`.

- **Questions:**
  - What features seem most important? How could we improve the model? (Hint: More data, regularization, feature selection.)
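
As one possible starting point for the regularization hint, the sketch below uses the `weight_decay` argument of `torch.optim.SGD`, which adds an L2 penalty on all parameters passed to the optimizer (here including the bias). It trains a linear model on synthetic data shaped like the scaled training set (25 rows, 4 features), not on `mtcars`. The penalty value 0.1 and the helper names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic stand-in for the scaled training data (NOT mtcars).
X = torch.randn(25, 4)
y = X @ torch.tensor([[-2.5], [-1.5], [-1.0], [0.5]]) + 20.0 + 0.5 * torch.randn(25, 1)

def fit_linear(weight_decay: float, epochs: int = 1000) -> nn.Linear:
    """Train a linear model with full-batch SGD and the given L2 penalty."""
    torch.manual_seed(1)                      # identical initial weights for every run
    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=weight_decay)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    return model

def parameter_norm(model: nn.Module) -> float:
    """Euclidean norm of all parameters (weights and bias) together."""
    return sum(p.detach().pow(2).sum().item() for p in model.parameters()) ** 0.5

plain = fit_linear(weight_decay=0.0)
regularized = fit_linear(weight_decay=0.1)
print(f"norm without weight decay: {parameter_norm(plain):.3f}")
print(f"norm with weight decay:    {parameter_norm(regularized):.3f}")
```

The penalty pulls the parameters toward zero. Whether that actually lowers the test loss on `mtcars` has to be checked empirically, for example by trying a few values of `weight_decay`.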

---

### Mathematical Recap Handout
- **Model:** $ \hat{y} = W \cdot X + b $
- **Loss:** $ \text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 $
- **Gradient Update:** $ w_j \leftarrow w_j - \eta \cdot \frac{\partial \text{MSE}}{\partial w_j} $
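
MSE is measured in squared units (here mpg²), which makes the numbers hard to interpret. Taking the square root gives the root mean squared error (RMSE), a typical error size in plain mpg. The short sketch below converts the four losses reported in the Discussion section. The dictionary layout is ours.

```python
import math

# MSE values reported in the Discussion section (units: mpg squared).
reported_mse = {
    ("linear", "train"): 4.9996,
    ("linear", "test"): 7.7062,
    ("deep", "train"): 2.4012,
    ("deep", "test"): 7.0613,
}

# RMSE = sqrt(MSE), expressed in mpg.
rmse = {key: math.sqrt(value) for key, value in reported_mse.items()}

for (model_name, split), value in rmse.items():
    print(f"{model_name:>6} {split:>5}: RMSE = {value:.2f} mpg")
```

Read this way, both models are off by roughly 2.7 to 2.8 mpg on the test cars, so the gap between them is small compared with the error itself.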


---

#### **REFERENCES**


[1] Krasser, R. (2024). Explore mtcars. CRAN. [https://cran.r-project.org/web/packages/explore/vignettes/explore-mtcars.html](https://cran.r-project.org/web/packages/explore/vignettes/explore-mtcars.html)