# Machine Learning

## Linear regression (part 1)

### Submission

<u>Submission:</u>

Compress all files into a **single zip** archive and submit it via Wikamp. The archive should have the content shown below (replace `name` and `surname` with your own values):
```
📂 name.surname.zip
+-- 📜 02-Linear regression (part 1).ipynb
```

<u>Grades</u>

| Percentage of all points | Mark |
| :----                    | ---: |
| [0-50)   | 2   |
| [50-60)  | 3   |
| [60-70)  | 3.5 |
| [70-80)  | 4   |
| [80-90)  | 4.5 |
| [90-100] | 5   |

<u>Penalties</u>

* `mark - 0.5` if tasks are submitted after the laboratory (less than 7 days late);
* `mark - 1` if tasks are submitted one week late or more (>= 7 days but < 14 days);
* `mark - 1.5` if tasks are submitted two weeks late or more (>= 14 days).

<u>Warning:</u>

Sharing your .ipynb file with other students is NOT allowed. All students should download the exercise files directly from WIKAMP. Group work is considered plagiarism.

<u>Plagiarism Disclaimer:</u>

I hereby declare that this exercise is my own, independent work. I am aware of the consequences and I am familiar with the university regulations.


In [1]:
import numpy as np
import matplotlib.pyplot as plt

### Task 1. Create training dataset.
Create a univariate training dataset. The dataset consists of input data `X` and output data `Y`.

- **Step 1:** Create input data `X` of 20 elements in the range `1..100` using the `np.linspace` function (described below).
- **Step 2:** Calculate output data `Y` using the formula below:
\begin{equation}
y = \frac{x}{4} + 25
\end{equation}
- **Step 3:** Add noise to the data. Use the `np.random.normal` function with the mean `loc=0` and set `scale` to the standard deviation of `Y`.

The `numpy` functions mentioned above are described below:

#### [np.linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)


```python
numpy.linspace(start, stop, num)
```

Returns `num` evenly spaced samples, calculated over the interval `[start, stop]`.

**Parameters**
* `start` - The starting value of the sequence.
* `stop` - The end value of the sequence.
* `num` - Number of samples to generate. Default is 50. Must be non-negative.

**Example:**
```python
np.linspace(2.0, 3.0, num=5)
array([2.  , 2.25, 2.5 , 2.75, 3.  ])
```


#### [np.random.normal](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html)

```python
numpy.random.normal(loc=0.0, scale=1.0, size=None)
```

Returns random samples from a normal (Gaussian) distribution.

**Parameters**
* `loc` - Mean ("centre") of the distribution.
* `scale` - Standard deviation (spread or “width”) of the distribution. Must be non-negative.
* `size` - Output shape.

**Example:**
```python
np.random.normal(3, 2.5, size=(2, 4))
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
```

In [2]:
# >>> WRITE YOUR CODE IN THIS CELL <<<
X = np.linspace(1, 100, num=20)
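Y = 0.25 * X + 25                                     # Step 2: y = x/4 + 25
Y = Y + np.random.normal(0, np.std(Y), size=Y.shape)  # Step 3: add Gaussian noise
X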

array([  1.        ,   6.21052632,  11.42105263,  16.63157895,
        21.84210526,  27.05263158,  32.26315789,  37.47368421,
        42.68421053,  47.89473684,  53.10526316,  58.31578947,
        63.52631579,  68.73684211,  73.94736842,  79.15789474,
        84.36842105,  89.57894737,  94.78947368, 100.        ])

### Task 2. Display the samples.
Use the `plt.scatter` function to display the samples.

Example:
```python
plt.figure()
plt.scatter(X, Y)
plt.title('Samples')
plt.xlabel('x')
plt.ylabel('y')
```

In [None]:
# >>> WRITE YOUR CODE IN THIS CELL <<<


### Task 3. Write a function.
You already know the exact function (it is the same one you used to generate the dataset), but for this task imagine that you do NOT know its parameters. Assume you only know that the function has the form:
$$
y = ax + b
$$
Your task is to implement this function in Python.
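
As a rough illustration of what such a function could look like, here is a minimal sketch. The name `fun` follows the example in Task 4; the argument order `(x, a, b)` is an assumption, not a requirement of the exercise.

```python
import numpy as np


def fun(x, a, b):
    """Evaluate the linear model y = a * x + b element-wise.

    The signature (x, a, b) is an assumption made for this sketch.
    """
    return a * np.asarray(x) + b


# Quick check on three points with a = 0.5 and b = 1
print(fun(np.array([0.0, 2.0, 4.0]), 0.5, 1.0))  # prints [1. 2. 3.]
```

Because the arithmetic is vectorised, the same function works for a single number and for a whole NumPy array such as `X`.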


In [None]:
# >>> WRITE YOUR CODE IN THIS CELL <<<


### Task 4. Display the output of the function.
Set the parameters `a=-0.5`, `b=0` and display the function output on the same chart as the samples from Task 2.

Example:
```python
plt.figure()
plt.title('Function')
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(X, Y)
pred = fun(X, -0.5, 0)  # fun is the function from the previous task, here called with a=-0.5, b=0
plt.plot(X, pred, color='r')
```

In [None]:
# >>> WRITE YOUR CODE IN THIS CELL <<<


### Task 5. Cost function.

Write a cost function that measures how much the predictions differ from the expected values. The cost function takes two parameters: `Yt` (the correct values, i.e. the ground truth) and `Yp` (the predictions). Use the Root Mean Squared Error (RMSE) as the metric:
$$
\textit{RMSE}(\textit{Yt}, \textit{Yp}) = \sqrt{\frac{1}{n}\sum_{i=0}^{n-1}{(\textit{Yp}_i - \textit{Yt}_i)^2}}
$$
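
The formula above maps almost directly onto NumPy operations. The following is one possible sketch, not the required solution; the function name `rmse` is an assumption chosen for illustration.

```python
import numpy as np


def rmse(Yt, Yp):
    """Root Mean Squared Error between ground truth Yt and prediction Yp.

    The name `rmse` is hypothetical; inputs are converted to float arrays.
    """
    Yt = np.asarray(Yt, dtype=float)
    Yp = np.asarray(Yp, dtype=float)
    # Square the element-wise differences, average them, take the root
    return np.sqrt(np.mean((Yp - Yt) ** 2))


# A perfect prediction has zero cost
print(rmse([1, 2, 3], [1, 2, 3]))  # prints 0.0
```

Since the differences are squared, the order of subtraction does not affect the result, and large errors are penalised more heavily than small ones.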

In [None]:
# >>> WRITE YOUR CODE IN THIS CELL <<<


### Task 6. Display the cost function for different values of parameter `a`.
Plot the cost function for different values of parameter `a` (with the parameter `b = 0`). Evaluate 20 values of `a` in the range from -2 to 2.

_You will notice that the cost decreases until a particular value of `a` and then increases. The point where the cost reaches its minimum is the solution._

NOTE: Use `plt.plot` to draw the chart.
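
The sweep over `a` could be computed along the following lines. This is only a sketch under stated assumptions: the data is generated without noise so the result is reproducible, and the names `a_values`, `costs` and `best_a` are hypothetical.

```python
import numpy as np

# Dataset from Task 1, generated WITHOUT noise here (assumption, for reproducibility)
X = np.linspace(1, 100, num=20)
Y = 0.25 * X + 25

# 20 candidate slopes between -2 and 2, with the intercept fixed at b = 0
a_values = np.linspace(-2, 2, num=20)

# RMSE of the prediction a * X + 0 for every candidate slope
costs = np.array([np.sqrt(np.mean((a * X - Y) ** 2)) for a in a_values])

# Candidate slope with the lowest cost on this grid
best_a = a_values[np.argmin(costs)]
print(best_a)
```

The resulting arrays can be drawn with `plt.plot(a_values, costs)`. Note that with `b` fixed at 0 the best slope on the grid is larger than the true slope 0.25, because the slope has to compensate for the missing intercept.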

In [None]:
# >>> WRITE YOUR CODE IN THIS CELL <<<


### Task 7. Display the cost function for different values of parameters `a` and `b`.
Plot the cost function for different values of parameters `a` and `b`. Use the ranges below:

| Variable | Range    |
| :------- | -------: |
| `a`      | -2..2    |
| `b`      | -50..100 |

Use a 3D projection to display this chart. The `np.meshgrid` function can be used to generate the coordinates. Example:

```python
# ...
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # newer Matplotlib releases no longer accept fig.gca(projection='3d')
ax.set_xlabel('a')
ax.set_ylabel('b')
ax.set_zlabel('cost')
plt.title("Cost function")
A, B = np.meshgrid(A, B)
ax.plot_surface(A, B, cost, cmap="gist_ncar")
```

_NOTE: `%matplotlib notebook` makes the chart interactive._
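
The `cost` array used in the plotting example has to contain one cost value per `(a, b)` pair of the grid. One way to compute it is sketched below. The grid sizes (20 values of `a`, 30 values of `b`), the noise-free data and the variable names are assumptions made for illustration.

```python
import numpy as np

# Dataset from Task 1, generated WITHOUT noise here (assumption, for reproducibility)
X = np.linspace(1, 100, num=20)
Y = 0.25 * X + 25

# Parameter grid; the number of points per axis is an arbitrary choice
a_values = np.linspace(-2, 2, num=20)
b_values = np.linspace(-50, 100, num=30)
A, B = np.meshgrid(a_values, b_values)  # both have shape (30, 20)

# Broadcast over the samples: pred[j, i, :] is the prediction for A[j, i], B[j, i]
pred = A[..., np.newaxis] * X + B[..., np.newaxis]  # shape (30, 20, 20)

# RMSE over the sample axis gives one cost value per grid point
cost = np.sqrt(np.mean((pred - Y) ** 2, axis=-1))  # shape (30, 20)
```

Because `cost` has the same shape as `A` and `B`, it can be passed directly to `ax.plot_surface(A, B, cost)`. A double loop over `a_values` and `b_values` would give the same values and may be easier to read.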


In [None]:
%matplotlib notebook
from mpl_toolkits.mplot3d import Axes3D
# >>> WRITE YOUR CODE BELOW <<<


### Write conclusions
Write your conclusions.