# COMP3314 - Assignment 4 Question 1: Regression (30 Points)

A small business has been tracking its yearly profits and the number of advertisement spots purchased each year for the past four years. They are interested in understanding how their advertising efforts affect their annual profits using simple linear regression.

The data collected from 2020 to 2023 is as follows:

| Year | Advertisement Cost ($x_i$)  | Annual Profit ($y_i$) |
|------|-----------------------------|-----------------------|
| 2020 | 20                          | 60                    |
| 2021 | 40                          | 65                    |
| 2022 | 60                          | 75                    |
| 2023 | 80                          | 85                    |


## Q1-1 (5 points)

For the regression model, the predicted value $\hat{y}_i$ is calculated as:
$$
\hat{y}_i=w_0+w_1 x_i
$$
where:
- $w_0$ is the intercept of the regression line,
- $w_1$ is the slope of the regression line,
- $x_i$ is the number of advertisement spots in the $i$-th data point.

Write down the formula for computing the sum of squared errors (SSE) between the predicted values and the actual values for the given data points. We denote the actual value as $y_i$ and the predicted value as $\hat{y}_i$, and the SSE is denoted as $S$.

Your solution here:

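$S=\frac{1}{2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{2}\sum_{i=1}^{n}(y_i - w_0 - w_1 x_i)^2$, with $n = 4$ for the given data.

The factor $\frac{1}{2}$ is a convention that cancels when differentiating. Strictly, the SSE is $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$; both expressions are minimized by the same $w_0$ and $w_1$.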
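As a small numerical illustration, the expression for $S$ (including the $\frac{1}{2}$ factor used in this solution) can be evaluated for any candidate pair $(w_0, w_1)$ on the four data points. The function name `half_sse` and the trial values below are illustrative choices, not part of the assignment.

```python
import numpy as np

# Data from the table above: x is the advertisement column, y is the annual profit.
x = np.array([20.0, 40.0, 60.0, 80.0])
y = np.array([60.0, 65.0, 75.0, 85.0])


def half_sse(w0: float, w1: float) -> float:
    """Return S = 1/2 * sum((y_i - (w0 + w1 * x_i))^2) for the table data."""
    residuals = y - (w0 + w1 * x)
    return 0.5 * float(np.sum(residuals ** 2))


# Arbitrary trial values, chosen only to exercise the formula.
# Residuals are 40, 25, 15, 5, so S = 0.5 * 2475 = 1237.5.
print(half_sse(0.0, 1.0))
```

Evaluating `half_sse` at a few different pairs shows how $S$ changes with the line, which is what the following questions minimize.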

## Q1-2 (5 points)

Compute the partial derivatives of $S$ with respect to the intercept $w_0$ and the slope $w_1$, denoted as $\frac{\partial S}{\partial w_0}$ and $\frac{\partial S}{\partial w_1}$, respectively.

Your solution here:

$\frac{\partial{}S}{\partial{}w_0} = \frac{\partial{}}{\partial{}w_0} \frac{1}{2}\sum_{i=1}^{n} (y_i-w_0-w_1x_i)^2 $

$\frac{\partial{}S}{\partial{}w_0} = \sum_{i=1}^{n} (y_i-w_0-w_1x_i) \frac{\partial{}}{\partial{}w_0} (y_i-w_0-w_1x_i) $

$\frac{\partial{}S}{\partial{}w_0} = \sum_{i=1}^{n} (y_i-w_0-w_1x_i) (-1) $

$\frac{\partial{}S}{\partial{}w_0} = \sum_{i=1}^{n} (-y_i+w_0+w_1x_i) $

$\frac{\partial{}S}{\partial{}w_1} = \frac{\partial{}}{\partial{}w_1} \frac{1}{2}\sum_{i=1}^{n} (y_i-w_0-w_1x_i)^2 $

$\frac{\partial{}S}{\partial{}w_1} = \sum_{i=1}^{n} (y_i-w_0-w_1x_i) \frac{\partial{}}{\partial{}w_1} (y_i-w_0-w_1x_i) $

$\frac{\partial{}S}{\partial{}w_1} = \sum_{i=1}^{n} (y_i-w_0-w_1x_i) (-x_i) $

$\frac{\partial{}S}{\partial{}w_1} = \sum_{i=1}^{n} (-x_iy_i+w_0x_i+w_1x_i^2) $
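One way to sanity-check the two derivatives is to compare them against central finite differences of $S$ at an arbitrary point. The sketch below assumes the $\frac{1}{2}$-scaled $S$ used in this solution; the helper names (`S`, `grad_S`, `numeric_grad`), the step size `eps`, and the test point are illustrative assumptions.

```python
import numpy as np

# Data from the table above.
x = np.array([20.0, 40.0, 60.0, 80.0])
y = np.array([60.0, 65.0, 75.0, 85.0])


def S(w0: float, w1: float) -> float:
    """S = 1/2 * sum((y_i - w0 - w1 * x_i)^2)."""
    return 0.5 * float(np.sum((y - w0 - w1 * x) ** 2))


def grad_S(w0: float, w1: float) -> tuple[float, float]:
    """Analytic partial derivatives, term by term as in the final lines above."""
    dS_dw0 = float(np.sum(-y + w0 + w1 * x))
    dS_dw1 = float(np.sum(-x * y + w0 * x + w1 * x ** 2))
    return dS_dw0, dS_dw1


def numeric_grad(w0: float, w1: float, eps: float = 1e-4) -> tuple[float, float]:
    """Central finite-difference approximation of the same two derivatives."""
    d0 = (S(w0 + eps, w1) - S(w0 - eps, w1)) / (2 * eps)
    d1 = (S(w0, w1 + eps) - S(w0, w1 - eps)) / (2 * eps)
    return d0, d1


# Arbitrary test point; any (w0, w1) should give matching pairs.
w0, w1 = 10.0, 0.5
print("analytic:", grad_S(w0, w1))
print("numeric: ", numeric_grad(w0, w1))
```

Because $S$ is quadratic in $w_0$ and $w_1$, the central difference should agree with the analytic result up to floating-point rounding.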

## Q1-3 (10 points)

Given the data points, compute the optimal values of $w_0$ and $w_1$ that minimize the sum of squared errors $S$. You shall first set the partial derivatives to zero, i.e., $\frac{\partial S}{\partial w_0}=0$ and $\frac{\partial S}{\partial w_1}=0$, and then solve the equations to find the optimal values of $w_0$ and $w_1$.

Your solution here:

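Setting both partial derivatives from Q1-2 to zero and collecting terms gives two linear equations in $w_0$ and $w_1$:

$\frac{\partial S}{\partial w_0} = \sum_{i=1}^{n} (-y_i+w_0+w_1x_i) = 0 \;\Rightarrow\; n w_0 + w_1\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

$\frac{\partial S}{\partial w_1} = \sum_{i=1}^{n} (-x_iy_i+w_0x_i+w_1x_i^2) = 0 \;\Rightarrow\; w_0\sum_{i=1}^{n} x_i + w_1\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_iy_i$

For the given data, $n = 4$, $\sum x_i = 200$, $\sum y_i = 285$, $\sum x_i^2 = 12000$, and $\sum x_iy_i = 15100$, so:

$4w_0 + 200w_1 = 285 \quad (1)$

$200w_0 + 12000w_1 = 15100 \quad (2)$

Subtracting $50 \times (1)$ from $(2)$ eliminates $w_0$:

$2000w_1 = 850$

$w_1 = 0.425$

Substituting $w_1$ into $(1)$:

$w_0 = \frac{285 - 200 \times 0.425}{4} = 50$

## Q1-4 (5 points)

The company plans to spend 100 on advertisement spots in year 2024. Using the regression model, what is the predicted profit for the company in 2024?

Your solution here:


$\hat{y}_i=w_0+w_1 x_i$

$\hat{y}_{2024} = 50+0.425 \times 100$

$\hat{y}_{2024} = 92.5$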
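The hand calculation can be cross-checked numerically. The sketch below builds the $2 \times 2$ linear system obtained by setting $\frac{\partial S}{\partial w_0}=0$ and $\frac{\partial S}{\partial w_1}=0$, solves it with `np.linalg.solve`, and evaluates the fitted line at $x = 100$. The variable names `A`, `b`, and `y_2024` are illustrative.

```python
import numpy as np

# Data from the table above.
x = np.array([20.0, 40.0, 60.0, 80.0])
y = np.array([60.0, 65.0, 75.0, 85.0])
n = len(x)

# Zero-gradient conditions, collected into A @ [w0, w1] = b:
#   n * w0      + sum(x) * w1    = sum(y)
#   sum(x) * w0 + sum(x^2) * w1  = sum(x * y)
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

w0, w1 = np.linalg.solve(A, b)

# Prediction for the planned 2024 value x = 100.
y_2024 = w0 + w1 * 100.0
print(f"w0 = {w0:.4f}, w1 = {w1:.4f}, predicted profit (x = 100) = {y_2024:.4f}")
```

Solving the system directly avoids manual elimination mistakes and makes the intermediate sums easy to inspect by printing `A` and `b`.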


## Q1-5 (5 points)

Write Python code using sklearn to perform linear regression on the given data points and predict the profit for the company in 2024. Print the optimal values of $w_0$ and $w_1$ and the predicted profit for 2024, and compare the results with the manual calculation.

Your solution here:

```python
from sklearn.linear_model import LinearRegression
from numpy import ndarray
import numpy as np

# Training data as column vectors: one feature (advertisement) and one target (profit).
X_train: ndarray = np.asarray([20, 40, 60, 80]).reshape((-1, 1))
y_train: ndarray = np.asarray([60, 65, 75, 85]).reshape((-1, 1))

# Fit ordinary least squares; intercept_ holds w0 and coef_ holds w1.
regressor: LinearRegression = LinearRegression().fit(X_train, y_train)

# Query point for 2024.
X_test: ndarray = np.asarray([100]).reshape((-1, 1))

print(f"Predicted profit (2024): {regressor.predict(X_test)[0][0]}, w0: {regressor.intercept_[0]}, w1: {regressor.coef_[0][0]}")
```

```
Predicted profit (2024): 92.5, w0: 50.0, w1: 0.425
```
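For the comparison step, an independent check that does not rely on sklearn may be useful. The sketch below computes the slope and intercept from the centered-sum form of simple linear regression, $w_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}$ and $w_0 = \bar{y} - w_1\bar{x}$, and compares them with a degree-1 `np.polyfit`. The variable names are illustrative, and this check is an addition rather than part of the required solution.

```python
import numpy as np

# Data from the table above.
x = np.array([20.0, 40.0, 60.0, 80.0])
y = np.array([60.0, 65.0, 75.0, 85.0])

# Closed-form simple linear regression written with centered sums.
x_mean, y_mean = x.mean(), y.mean()
w1_closed = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
w0_closed = y_mean - w1_closed * x_mean

# np.polyfit with deg=1 returns coefficients highest power first: [slope, intercept].
w1_fit, w0_fit = np.polyfit(x, y, deg=1)

print("intercepts agree:", np.isclose(w0_closed, w0_fit))
print("slopes agree:    ", np.isclose(w1_closed, w1_fit))
print("prediction at x = 100:", w0_closed + w1_closed * 100.0)
```

The printed prediction can be compared directly with the sklearn output above.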
