In [34]:
import numpy as np
import math
from typing import Tuple
import plotly.express as px
import plotly.graph_objects as go

## 1. Generate data - 3 points
Write a function `gen_exp(N)` that generates toy data. The function takes a parameter $N$ and returns $(N, 1)$-dimensional vectors $\mathbf{x}$ and $\mathbf{y}$, where $\mathbf{x}$ contains evenly spaced values from 0 to 2 (inclusive), and the elements $y_i$ of $\mathbf{y}$ are distributed according to:

$$y_i \sim \mathcal{N}(\mu_i, \sigma^2)$$

where $x_i$ is the $i$-th element of $\mathbf{x}$, the mean is $\mu_i = e^{x_i}$, and the standard deviation is $\sigma = 0.25$.

In [35]:
# EXERCISE
def gen_exp(N: int) -> Tuple[np.ndarray, np.ndarray]:
    # Evenly spaced inputs on [0, 2] (endpoint included), as a column vector.
    x = np.linspace(0, 2, num=N).reshape(N, 1)
    # Gaussian samples with mean exp(x_i) and standard deviation 0.25.
    y = np.random.normal(np.exp(x), 0.25)

    return x, y

In [36]:
### Test your function
N = 100
x, y = gen_exp(N)

assert x.shape == (N, 1), "the shape of x is incorrect (should be (N,1))"
assert y.shape == (N, 1), "the shape of y is incorrect (should be (N,1))"
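
The shape checks above do not verify the distribution itself. A minimal sketch of a statistical sanity check follows, assuming a seeded generator and a large sample; the helper `gen_exp_check` and its `sigma` and `seed` parameters are hypothetical and not part of the exercise.

```python
import numpy as np

def gen_exp_check(N, sigma=0.25, seed=0):
    """Hypothetical, seeded stand-in for gen_exp, used only for this check."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 2, num=N).reshape(N, 1)
    y = rng.normal(np.exp(x), sigma)  # mean exp(x_i), standard deviation sigma
    return x, y

x_big, y_big = gen_exp_check(10_000)

# Residuals around the true mean should look like zero-mean noise
# with a standard deviation close to 0.25.
residuals = y_big - np.exp(x_big)
print(residuals.mean(), residuals.std())
```

With enough samples, the printed mean should be close to 0 and the standard deviation close to 0.25.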

## 2. Linear Regression - 4 points


The linear regression model is defined below in matrix form. Note that we do not model the intercept explicitly (with another parameter):

$$\mathbf{y} = \mathbf{x} \mathbf{\alpha} + \mathbf{\epsilon}$$

As discussed in class, minimising the squared error gives us a closed-form solution for $\alpha$:

$$ \mathbf{\alpha} = \left(\mathbf{x}^{T}\mathbf{x}\right)^{-1}\mathbf{x}^{T}\mathbf{y} $$
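
For reference, here is a short sketch of where this closed form comes from, assuming the usual squared-error loss $L(\mathbf{\alpha})$ (the loss is not written out in this notebook, so the notation is an assumption):

$$L(\mathbf{\alpha}) = \lVert \mathbf{y} - \mathbf{x}\mathbf{\alpha} \rVert^{2} = (\mathbf{y} - \mathbf{x}\mathbf{\alpha})^{T}(\mathbf{y} - \mathbf{x}\mathbf{\alpha})$$

$$\nabla_{\mathbf{\alpha}} L = -2\,\mathbf{x}^{T}(\mathbf{y} - \mathbf{x}\mathbf{\alpha}) = \mathbf{0} \;\Longrightarrow\; \mathbf{x}^{T}\mathbf{x}\,\mathbf{\alpha} = \mathbf{x}^{T}\mathbf{y}$$

Multiplying both sides by $\left(\mathbf{x}^{T}\mathbf{x}\right)^{-1}$, which exists when the columns of $\mathbf{x}$ are linearly independent, gives the formula above.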

You can use `np.linalg.inv` to compute matrix inverse.
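
As a cross-check, the explicit inverse can be compared with NumPy's least-squares solver. The sketch below uses its own seeded toy data (`x_demo` and `y_demo` are illustrative names, not part of the exercise) and is not a replacement for the requested implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x_demo = np.linspace(0, 2, num=50).reshape(-1, 1)
y_demo = np.exp(x_demo) + rng.normal(0.0, 0.25, size=x_demo.shape)

# Closed-form solution via the explicit inverse, as in the formula above.
alpha_inv = np.linalg.inv(x_demo.T @ x_demo) @ x_demo.T @ y_demo

# Least-squares solver; it avoids forming the inverse explicitly.
alpha_lstsq, *_ = np.linalg.lstsq(x_demo, y_demo, rcond=None)

print(alpha_inv.shape, alpha_lstsq.shape)
print(np.allclose(alpha_inv, alpha_lstsq))
```

Both approaches should agree on well-conditioned data like this; `np.linalg.lstsq` is generally the more numerically robust choice when $\mathbf{x}^{T}\mathbf{x}$ is close to singular.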

In [37]:
# EXERCISE
def fit_alpha(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return np.linalg.inv(x.T @ x) @ x.T @ y

Write a function `predict(x, alpha)` that uses our linear model to predict values for unseen datapoints.

In [38]:
# EXERCISE
def predict(x: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    return x @ alpha

In [39]:
### Test your functions
alpha = fit_alpha(x, y)

assert alpha.shape == (1, 1), "the shape of alpha is incorrect (should be (1,1))"

## 3. Feature Engineering - 3 points

Write a function `augment_data(x)` that augments our feature set (which for now consists only of $\mathbf{x}$). We want two augmentations of the model:

1. Intercept ($\mathbf{\beta}$)
2. Quadratic term ($\mathbf{x}^{2}$)

Note that `fit_alpha(x, y)` returns $\alpha$ as a single array. As such, if $\mathbf{x}$ consists of more than one column, $\alpha$ will have multiple values as well. Structure your dataset so that the entries of $\alpha$ appear in the following order: [intercept, $x$, $x^{2}$].

In [40]:
# EXERCISE
def augment_data(x: np.ndarray) -> np.ndarray:
    # Column order [1, x, x^2] so that alpha comes out as [intercept, x, x^2].
    augmented_x = np.c_[np.ones(len(x)), x, x ** 2]

    return augmented_x

### Test your function
alpha_aug = fit_alpha(augment_data(x), y)

assert alpha_aug.shape == (3, 1), "the shape of alpha_aug is incorrect (should be (3,1))"
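
The shape assertion does not catch a wrong column order. One way to check the order, sketched here on separate seeded toy data (all names in the block are illustrative), is to compare against `np.polyfit`, which returns coefficients from the highest power down.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
x_demo = np.linspace(0, 2, num=100).reshape(-1, 1)
y_demo = np.exp(x_demo) + rng.normal(0.0, 0.25, size=x_demo.shape)

# Design matrix with columns [1, x, x^2], matching the required order.
design = np.c_[np.ones(len(x_demo)), x_demo, x_demo ** 2]
alpha_demo = np.linalg.inv(design.T @ design) @ design.T @ y_demo

# np.polyfit lists the x^2 coefficient first, so reverse it to get
# [intercept, x, x^2] before comparing.
poly_coeffs = np.polyfit(x_demo.ravel(), y_demo.ravel(), deg=2)[::-1]

print(alpha_demo.ravel())
print(poly_coeffs)
```

If the two printed vectors agree, the coefficients are in the order [intercept, $x$, $x^{2}$].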

## 4. Visualization - 5 points
Visualize the data and your results. Create a plot that contains:
* Training datapoints as dots
* The function being approximated (the exponential function) as a line
* Your linear approximation (from Ex. 2) as a line
* Your augmented approximation (with feature engineering; from Ex. 3) as a line

To draw the lines, use at least 100 evenly spaced $x$ values from the range $x \in [0, 2]$. Add a corresponding label for each entity in the plot, and make sure they have different colors.

In [41]:
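# Evenly spaced grid (100 points) used to draw every line.
x_line = np.linspace(0, 2, num=100).reshape(-1, 1)
alpha_lin = fit_alpha(x, y)
alpha_quad = fit_alpha(augment_data(x), y)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x.reshape(-1), y=y.reshape(-1), name='training data', mode='markers'))
# The target is the noise-free exponential, not the noisy samples.
fig.add_trace(go.Scatter(x=x_line.reshape(-1), y=np.exp(x_line).reshape(-1), name='exponential function', mode='lines'))
fig.add_trace(go.Scatter(x=x_line.reshape(-1), y=predict(x_line, alpha_lin).reshape(-1), name='linear fit', mode='lines'))
fig.add_trace(go.Scatter(x=x_line.reshape(-1), y=predict(augment_data(x_line), alpha_quad).reshape(-1), name='augmented linear fit', mode='lines'))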

fig.show()
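
To complement the plot with a number, the two fits can be compared by their mean squared error against the noise-free exponential. This is a minimal sketch on its own seeded data; `design_matrix` is a hypothetical helper, not one of the exercise functions.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for reproducibility
x_train = np.linspace(0, 2, num=100).reshape(-1, 1)
y_train = np.exp(x_train) + rng.normal(0.0, 0.25, size=x_train.shape)

def design_matrix(x, quadratic):
    """Hypothetical helper: [x] for the plain model, [1, x, x^2] otherwise."""
    return np.c_[np.ones(len(x)), x, x ** 2] if quadratic else x

# Dense grid on [0, 2] and the noise-free target evaluated on it.
x_grid = np.linspace(0, 2, num=200).reshape(-1, 1)
target = np.exp(x_grid)

mse = {}
for label, quadratic in [("linear", False), ("augmented", True)]:
    X = design_matrix(x_train, quadratic)
    alpha, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    prediction = design_matrix(x_grid, quadratic) @ alpha
    mse[label] = float(np.mean((prediction - target) ** 2))

print(mse)
```

The augmented model should show a clearly lower error, since the plain model has no intercept and is forced through the origin while $e^{0} = 1$.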