<img style="float: right;" src="../../assets/htwlogo.svg">

# Exercise: Derivatives and linear models

In the following Jupyter notebook, we test our knowledge of derivatives and linear models. Ready?

**Author**: _Erik Rodner_<br>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

## Exercise 1: Derive and compute derivatives of a multivariate function

Our goal in this exercise is to compute the partial derivatives of a function $f(x,y)$.
Implement the following functions and their respective gradients (code template follows in the next cell):
1. $f(x,y) = 3x^2 + 2xy + y^2$
2. $f(x,y) = (x-3e^{-xy})^2$
3. $f(x,y) = \log(x^2 - 4x + 4)$ (defined only for $x \neq 2$, since $x^2 - 4x + 4 = (x-2)^2$)
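As a warm-up on a different function (chosen here only for illustration; it is not one of the exercise functions), consider $h(x,y) = \sin(xy) + x^3$. For each partial derivative, the other variable is treated as a constant, and the chain rule takes care of the inner term $xy$:
$$
\frac{\partial h}{\partial x} = y\cos(xy) + 3x^2, \qquad \frac{\partial h}{\partial y} = x\cos(xy)
$$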

In [None]:
def function_to_diff(x, y):
    return 3*x**2 + 2*x*y + y**2

Your task now is to compute the partial derivatives of this function. The following function must return
the gradient vector at location $(x,y)$:

In [None]:
def compute_gradient(x, y):
    df_dx = 0 # modify this code!
    df_dy = 0 # modify this code!
    return np.array([df_dx, df_dy])

Of course, we need to check the correctness of the gradients. One way to test them is to check whether they are close
to a (central) finite-difference approximation:
$$
\frac{\partial f}{\partial x} \approx \frac{f(x + \epsilon,y) - f(x-\epsilon,y)}{2\epsilon}
$$

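Remark: there are other ways to approximate gradients with finite differences, such as simple forward differences $\frac{f(x+\epsilon,y) - f(x,y)}{\epsilon}$, which for smooth functions are less accurate (error of order $\epsilon$ instead of $\epsilon^2$).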
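To see the difference in accuracy, here is a small sketch on a one-dimensional function whose derivative we know exactly, $\sin'(x) = \cos(x)$. The helper names `forward_diff` and `central_diff` are hypothetical and only used for this comparison. Each time $\epsilon$ shrinks by a factor of 10, the forward-difference error should shrink by roughly 10 and the central-difference error by roughly 100.

```python
import numpy as np

def forward_diff(f, x, eps):
    """Forward difference: error shrinks roughly like eps."""
    return (f(x + eps) - f(x)) / eps

def central_diff(f, x, eps):
    """Central difference: error shrinks roughly like eps**2."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x0 = 1.0
exact = np.cos(x0)  # exact derivative of sin at x0
errors = {}
for eps in [1e-1, 1e-2, 1e-3]:
    err_fwd = abs(forward_diff(np.sin, x0, eps) - exact)
    err_cen = abs(central_diff(np.sin, x0, eps) - exact)
    errors[eps] = (err_fwd, err_cen)
    print(f"eps={eps:g}: forward error={err_fwd:.1e}, central error={err_cen:.1e}")
```

For very small $\epsilon$ (far below the values used here), floating-point round-off starts to dominate both approximations, so smaller is not always better.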

In [None]:
# Function to compute numerical gradient using finite differences
def compute_numerical_gradient(f, x, y, epsilon=1e-5):
    """Approximates the gradient (partial derivatives) using finite differences."""
    df_dx = (f(x + epsilon, y) - f(x - epsilon, y)) / (2 * epsilon)
    df_dy = (f(x, y + epsilon) - f(x, y - epsilon)) / (2 * epsilon)
    return np.array([df_dx, df_dy])

# Function to test the analytical gradient against the finite-difference approximation
def test_gradient_func(x_test=1.0, y_test=2.0):
    analytical_gradient = compute_gradient(x_test, y_test)
    numerical_gradient = compute_numerical_gradient(function_to_diff, x_test, y_test)
    
    if np.allclose(analytical_gradient, numerical_gradient, atol=1e-6):
        print("Test passed! The analytical gradient matches the numerical approximation.")
    else:
        print("Test failed. Analytical gradient:", analytical_gradient,
              "Numerical gradient:", numerical_gradient)

Ready to test your code?

In [None]:
test_gradient_func()
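A check at a single point can hide mistakes: a wrong gradient may happen to agree with the correct one at that particular $(x,y)$. The sketch below repeats the finite-difference check at several random points. It uses an illustrative function $g(x,y) = xy + \sin(x)$ (not one of the exercise functions), and the helpers `numerical_gradient` and `check_gradient_at_random_points` are hypothetical names introduced only for this example. The deliberately wrong gradient returns $y$ instead of $x$ for $\partial g / \partial y$, so it passes any single-point test with $x = y$.

```python
import numpy as np

def numerical_gradient(f, x, y, epsilon=1e-5):
    """Central-difference approximation of (df/dx, df/dy)."""
    df_dx = (f(x + epsilon, y) - f(x - epsilon, y)) / (2 * epsilon)
    df_dy = (f(x, y + epsilon) - f(x, y - epsilon)) / (2 * epsilon)
    return np.array([df_dx, df_dy])

def check_gradient_at_random_points(f, grad, n_points=20, seed=0, atol=1e-6):
    """Return True if grad matches the numerical gradient at all sampled points."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(-3.0, 3.0, size=(n_points, 2))
    return all(np.allclose(grad(x, y), numerical_gradient(f, x, y), atol=atol)
               for x, y in points)

# Illustrative function (not one of the exercise functions) and its gradient
def g(x, y):
    return x * y + np.sin(x)

def grad_g(x, y):
    return np.array([y + np.cos(x), x])

def wrong_grad_g(x, y):
    return np.array([y + np.cos(x), y])  # deliberately wrong dg/dy

print(np.allclose(wrong_grad_g(1.0, 1.0), numerical_gradient(g, 1.0, 1.0)))  # True: single point fooled
print(check_gradient_at_random_points(g, grad_g))        # True
print(check_gradient_at_random_points(g, wrong_grad_g))  # False
```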


## Exercise 2: Linear regression

Implement linear regression using the normal equation method. **Fill in the missing code to solve for the weights using matrix operations.**

In the following code, we use a simple but effective trick. Instead of handling the bias term separately, we
append a $1$ to each input, $\tilde{\mathbf{x}} = [\mathbf{x}^T, 1]^T = [x_1, x_2, \ldots, x_D, 1]^T$, and then use a linear model
without a bias term on $\tilde{\mathbf{x}}$:
$$\tilde{\mathbf{w}}^T \tilde{\mathbf{x}} = [w_1, \ldots, w_D, b]\,[x_1, x_2, \ldots, x_D, 1]^T = \mathbf{w}^T \mathbf{x} + b$$

Summary: append a 1 to your inputs and, in most cases, you no longer need to treat the bias term separately.
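A quick numerical sanity check of this identity, with made-up values for $\mathbf{w}$, $b$ and a single input $\mathbf{x}$ with $D = 2$ (all numbers are hypothetical and chosen only for illustration):

```python
import numpy as np

w = np.array([2.0, -1.0])    # hypothetical weights w_1, w_2
b = 0.5                      # hypothetical bias
x = np.array([3.0, 4.0])     # one input with D = 2 features

w_tilde = np.append(w, b)    # [w_1, w_2, b]
x_tilde = np.append(x, 1.0)  # [x_1, x_2, 1]

print(w @ x + b)             # 2.5
print(w_tilde @ x_tilde)     # 2.5, same value without an explicit bias
```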

In [None]:
# Generate synthetic data for regression
X, y = make_regression(n_samples=100, n_features=1, noise=10)

# Append a column of ones to X; its weight acts as the bias term (intercept)
X_b = np.c_[X, np.ones((X.shape[0], 1))]  # each row is now [x_1, ..., x_D, 1]

Now it's your turn: compute the optimal parameter vector $\tilde{\mathbf{w}}$ (its last entry is the bias $b$). It does not have to be a single line of code, but it can be:

In [None]:
w_best = np.zeros(X_b.shape[1])


Let's check your solution visually:

In [None]:
# Display the computed weights
print("Optimal weights (including bias):", w_best)

# Predict values using the derived model
y_pred = X_b.dot(w_best)

# Plotting the results
plt.scatter(X, y, color='blue', label='Data points')
plt.plot(X, y_pred, color='red', label='Linear regression line (Normal Equation)')
plt.title('Linear Regression Optimization using Normal Equation')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.show()
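Besides the visual check, you can compare your weights numerically with scikit-learn's `LinearRegression`, which is already imported at the top of the notebook and solves the same least-squares problem. The helper `check_weights` below is a hypothetical name; it assumes your weights are ordered as $[w_1, \ldots, w_D, b]$, matching the column of ones appended at the end of `X_b`. The data is regenerated with a fixed `random_state` only to keep the sketch self-contained.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

def check_weights(w, X, y, atol=1e-6):
    """Compare augmented weights [w_1, ..., w_D, b] with scikit-learn's fit.

    Returns (matches, reference), where reference = [coef_..., intercept_].
    """
    reg = LinearRegression().fit(X, y)
    reference = np.append(reg.coef_, reg.intercept_)
    return np.allclose(w, reference, atol=atol), reference

X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=0)
X_b = np.c_[X, np.ones((X.shape[0], 1))]

w_template = np.zeros(X_b.shape[1])  # the unmodified template
matches, reference = check_weights(w_template, X, y)
print(matches)  # False: the zero template is not the least-squares solution
```

In the notebook itself, you would call `check_weights(w_best, X, y)` on the existing `X` and `y`; it should return `True` once `w_best` is computed correctly.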

## Exercise 3: Explore the stability of linear regression

What happens if you add more noise to the features? What happens if you add outliers? Can you think of an algorithm that deals with these outliers?
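One possible way to set up the outlier experiment is sketched below. It is an assumption about how to inject outliers, not the only option: the targets of the few points with the largest feature values are shifted down by a large amount, and the slope of an ordinary least-squares fit (via scikit-learn's `LinearRegression`, so it does not depend on your own solution) is compared before and after. The helpers `fitted_slope` and `add_outliers` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

def fitted_slope(X, y):
    """Slope of an ordinary least-squares fit (single feature)."""
    return LinearRegression().fit(X, y).coef_[0]

def add_outliers(X, y, n_outliers=5, magnitude=500.0):
    """Return a copy of y in which the targets of the n_outliers points
    with the largest feature value are shifted down by `magnitude`."""
    y_out = y.copy()
    idx = np.argsort(X[:, 0])[-n_outliers:]  # indices of the largest x values
    y_out[idx] -= magnitude
    return y_out

X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=0)
slope_clean = fitted_slope(X, y)
slope_outliers = fitted_slope(X, add_outliers(X, y))
print(f"slope without outliers: {slope_clean:.2f}")
print(f"slope with outliers:    {slope_outliers:.2f}")
```

Placing the outliers at extreme feature values is a deliberate choice: points far from the mean of $x$ have the most leverage on the fitted slope. Varying `n_outliers`, `magnitude`, or the `noise` argument of `make_regression` is a natural next step.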