# Introduction to Data Science and Machine Learning

<p align="center">
    <img width="699" alt="image" src="https://user-images.githubusercontent.com/49638680/159042792-8510fbd1-c4ac-4a48-8320-bc6c1a49cdae.png">
</p>

---

## KNN and Linear regression - Homework

Here are a couple of exercises to help you consolidate how the KNN and Linear Regression algorithms work.

### KNN

#### Exercise 1

> Use the $k$NN you just implemented to solve a classification problem (_e.g._ the well-known Iris classification problem).

##### Import data

```python
# I imported the data for you. You are welcome! 🙂
import pandas as pd
import numpy as np

from sklearn.datasets import load_iris

data = load_iris()

X = data['data']
y = data['target']

print(f"Dataset is made by {len(X)} data, whose first 5 lines are \n {X[:5]} \n ")
print(f"Target vector is {len(y)}-long, and targets names are \n {data['target_names']}")
```

```
Dataset is made by 150 data, whose first 5 lines are 
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]] 
 
Target vector is 150-long, and targets names are 
 ['setosa' 'versicolor' 'virginica']
```
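
As a starting point, here is a minimal sketch of how the classification could be set up. The function `knn_predict` is a hypothetical stand-in for your own implementation (it is not the course's reference code), and the 70/30 stratified split and `k=5` are arbitrary assumptions you may want to change.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical stand-in for your own kNN: majority vote among the k nearest points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for each query point
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote over the neighbours' labels (ties go to the smallest label)
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])


X, y = load_iris(return_X_y=True)

# Assumption: a 70/30 stratified split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

y_pred = knn_predict(X_train, y_train, X_test, k=5)
accuracy = np.mean(y_pred == y_test)
print(f"Test accuracy with k=5: {accuracy:.2f}")
```

Evaluating on held-out data rather than on the training set matters here: with `k=1`, every training point is its own nearest neighbour, so the training accuracy would be trivially perfect.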


#### Exercise 2

> Apply $k$NN to the [wave energy outputs regression problem](https://archive.ics.uci.edu/ml/datasets/Wave+Energy+Converters#), which comes with a large dataset. Use different metrics and compare their numerical performance.
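
One possible way to organise the comparison is sketched below. It reads "metrics" as distance metrics, which is an assumption, and it uses synthetic data as a placeholder because the real dataset has to be downloaded first. `KNeighborsRegressor` from `sklearn` stands in for your own $k$NN regressor, and `n_neighbors=5` is an arbitrary choice.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic placeholder (assumption): replace X and y with the wave energy data
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one regressor per distance metric and score each on the same test set
results = {}
for metric in ("euclidean", "manhattan", "chebyshev"):
    model = KNeighborsRegressor(n_neighbors=5, metric=metric)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[metric] = {
        "MAE": mean_absolute_error(y_test, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, y_pred)),
    }

for metric, scores in results.items():
    print(f"{metric:>10}: MAE={scores['MAE']:.3f}  RMSE={scores['RMSE']:.3f}")
```

On a large dataset it may also be worth timing `fit` and `predict` for each metric, since prediction cost grows with the number of training points.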

### Linear Regression

#### Exercise 1

Is there a relationship between water salinity & water temperature? Can you predict the water temperature based on salinity? 

1. Using the data contained in this [csv](https://www.kaggle.com/sohier/calcofi#bottle.csv), try to answer this question.

2. We have to find the _minimum_ of the cost function $J(\beta)$ with respect to $\beta$, and $\partial_\beta J(\beta) = 0$ is an equation in $\beta$. Use linear algebra to solve it for the coefficients $\beta$ without any loop calculation.

3. Use the equation found above to (re-)calculate $\beta$ and compare with the gradient descent and `sklearn` results.

_Hint for point 2._ Recall that one may use matrix notation to write
$$ \partial_\beta J(\beta) = X^t(X\beta - y) $$
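
Setting the gradient in the hint to zero gives the linear system $X^t X \beta = X^t y$. The sketch below solves it numerically on synthetic data (an assumption, used only so the result can be checked against known coefficients); for the exercise, `X` would be built from the salinity column plus a column of ones.

```python
import numpy as np

# Synthetic data (assumption): y = 2 + 3 x + small Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=200)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Solve X^t X beta = X^t y; solving the system avoids forming an explicit inverse
beta = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient from the hint should vanish (up to rounding) at the solution
gradient = X.T @ (X @ beta - y)

print("beta:", beta)
print("max |gradient|:", np.abs(gradient).max())
```

Using `np.linalg.solve` instead of `np.linalg.inv` is a design choice: it is generally more stable numerically, and it still involves no explicit loop.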

#### Exercise 2

Suppose we want to study the trend of fuel consumption as a function of engine capacity. We can collect our measurements in a table like the following.

| Engine capacity (cm$^3$) | Average Consumption (l/100km) |
|---|----|
| $800$  |  $6$    | 
| $1000$ |  $7.5$  | 
| $1100$ |  $8$    | 
| $1200$ |  $8.7$  | 
| $1600$ |  $12.4$ | 
| $2000$ |  $16$   | 
| $3000$ |  $20$   | 
| $4500$ |  $28$   | 

Apply linear regression to find the average consumption of an engine with `test_capacity = 1800`.

Use both the `sklearn` library and your own functions, and compare the results.

_Hint for data conversion._ Recall that one may use pandas and Python dictionaries to create dataframes.

```python
measures = pd.DataFrame({'Consumption_avg': [6, 7.5, 8, 8.7, 12.4, 16, 20, 28], 
                         'Capacity': [800, 1000, 1100, 1200, 1600, 2000, 3000, 4500]})
```
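
A minimal sketch of the `sklearn` side of the comparison follows. The `np.polyfit` line is only a placeholder for your own functions (an assumption, not part of the exercise), so that the two predictions can be checked against each other.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

measures = pd.DataFrame({'Consumption_avg': [6, 7.5, 8, 8.7, 12.4, 16, 20, 28],
                         'Capacity': [800, 1000, 1100, 1200, 1600, 2000, 3000, 4500]})
test_capacity = 1800

# sklearn expects a 2D feature matrix, hence the double brackets
model = LinearRegression()
model.fit(measures[['Capacity']], measures['Consumption_avg'])
sk_pred = model.predict(pd.DataFrame({'Capacity': [test_capacity]}))[0]

# Placeholder for your own implementation: a degree-1 least-squares fit
slope, intercept = np.polyfit(measures['Capacity'], measures['Consumption_avg'], deg=1)
np_pred = slope * test_capacity + intercept

print(f"sklearn prediction: {sk_pred:.2f} l/100km")
print(f"polyfit prediction: {np_pred:.2f} l/100km")
```

Both approaches minimise the same squared error, so their predictions should agree up to rounding. A gradient descent implementation may differ slightly depending on the learning rate and the number of iterations.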