# Find the best model

## Setup

```python
import pandas as pd

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
```

## Data

### Create data

```python
df = pd.DataFrame(
    {'sales': [2500, 4500, 6500, 8500, 10500, 12500, 14500, 16500, 18500, 20500],
     'ads':   [900, 1400, 3600, 3800, 6200, 5200, 6800, 8300, 9800, 10100]}
)
```

### Variable lists

We need to prepare our data for the scikit-learn model. scikit-learn accepts pandas objects, but it expects the features and the label as separate objects, and the features must be two-dimensional:

- Create the variable `y_label` that holds the name of our dependent variable (also called the label): choose "`sales`" (this step will be helpful in later notebooks)

- Select our features (independent variables) and save them as `X`: choose "`ads`". Use double brackets so that `X` stays a two-dimensional DataFrame

- Select the dependent variable and save it as `y`

Hint:

```python
y_label = "___"

X = df[["___"]]
y = df[___]
```

```python
### BEGIN SOLUTION
y_label = "sales"

X = df[["ads"]]
y = df[y_label]
### END SOLUTION
```

```python
X
```

|   | ads |
|--:|----:|
| 0 | 900 |
| 1 | 1400 |
| 2 | 3600 |
| 3 | 3800 |
| 4 | 6200 |
| 5 | 5200 |
| 6 | 6800 |
| 7 | 8300 |
| 8 | 9800 |
| 9 | 10100 |


```python
y
```

```
0     2500
1     4500
2     6500
3     8500
4    10500
5    12500
6    14500
7    16500
8    18500
9    20500
Name: sales, dtype: int64
```

```python
### Check your code (note that there are some hidden checks)
assert len(X) == 10

### BEGIN HIDDEN TESTS
assert y_label == "sales"
assert len(y) == 10
### END HIDDEN TESTS
```
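A minimal sketch of why the brackets matter, assuming the same data as above: selecting a list of columns returns a two-dimensional DataFrame, while selecting a single column name returns a one-dimensional Series. scikit-learn expects `X` to have the shape (samples, features).

```python
import pandas as pd

df = pd.DataFrame({
    "sales": [2500, 4500, 6500, 8500, 10500, 12500, 14500, 16500, 18500, 20500],
    "ads": [900, 1400, 3600, 3800, 6200, 5200, 6800, 8300, 9800, 10100],
})

# Double brackets select a list of columns -> DataFrame (2-D)
X = df[["ads"]]
# Single brackets select one column -> Series (1-D)
y = df["sales"]

print(type(X).__name__, X.shape)  # DataFrame (10, 1)
print(type(y).__name__, y.shape)  # Series (10,)
```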

## Model

### Select model

Hint:

Save the `LinearRegression()` estimator as an object called `reg`.

---

```python
___ = ___()
```

---

```python
### BEGIN SOLUTION
reg = LinearRegression()
### END SOLUTION
```

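```python
### Check your code
# Check only the settings this exercise relies on; the full parameter set
# differs between scikit-learn versions (`normalize` was removed in 1.2).
params = reg.get_params()
assert isinstance(reg, LinearRegression)
assert params["fit_intercept"] is True
assert params["positive"] is False
```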

### Fit model


Fit the model (your `reg` object) to the data with `.fit()`. Pass the features first and the dependent variable second.

Hint:


---

```python
___.___(___, ___)
```

---



```python
### BEGIN SOLUTION
reg.fit(X, y)
### END SOLUTION
```

```python
### Check your code
# `coef_` only exists after the model has been fitted
assert hasattr(reg, "coef_")
```
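For a single feature, `LinearRegression` fits ordinary least squares: it chooses the intercept $\hat{\beta}_0$ and slope $\hat{\beta}_1$ that minimize the sum of squared residuals. As a sketch, with $x_i$ the `ads` values and $y_i$ the `sales` values, the closed-form solution is:

$$
\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2
\quad\Longrightarrow\quad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x}
$$

With this data, $n = 10$, $\bar{x} = 5610$ and $\bar{y} = 11500$, so $\hat{\beta}_1 = 173{,}100{,}000 / 94{,}109{,}000 \approx 1.8394$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x} \approx 1181.21$.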

### Coefficients

```python
# Intercept
print(f"Intercept: {reg.intercept_:.2f}")
```

```
Intercept: 1181.21
```

```python
# Slope (one coefficient per feature)
print(f"Slope: {reg.coef_[0]:.4f}")
```

```
Slope: 1.8394
```
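As a cross-check, the coefficients can be reproduced without scikit-learn by applying the ordinary least squares formulas for one feature. This is a minimal pure-Python sketch; the variable names are our own, not part of any library.

```python
# Data from the notebook: ads (feature x) and sales (label y)
ads = [900, 1400, 3600, 3800, 6200, 5200, 6800, 8300, 9800, 10100]
sales = [2500, 4500, 6500, 8500, 10500, 12500, 14500, 16500, 18500, 20500]

n = len(ads)
x_mean = sum(ads) / n
y_mean = sum(sales) / n

# Sums of cross-deviations and squared deviations used by closed-form OLS
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(ads, sales))
s_xx = sum((x - x_mean) ** 2 for x in ads)

slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

print(f"Intercept: {intercept:.2f}")  # Intercept: 1181.21
print(f"Slope: {slope:.4f}")          # Slope: 1.8394
```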

### Make predictions


Use your model to:

- Predict sales from your ads values (`X`) with the method `.predict()`

- Save the result as `y_pred`

Hint:

---

```python
___ = ___.___(___)
```

---



```python
### BEGIN SOLUTION
y_pred = reg.predict(X)
### END SOLUTION
```

```python
y_pred[0]
```

```
2836.6309279665047
```

```python
### Check your code
assert 2836 < y_pred[0] < 2840
```

### Evaluation

#### Mean squared error

Use the scikit-learn function `mean_squared_error()` to calculate the mean squared error and save the result as `mse`. Pass the true values first and the predictions second.

Hint:

---

```python
___ = ___(___, ___)
```

---

```python
### BEGIN SOLUTION
mse = mean_squared_error(y, y_pred)
### END SOLUTION
```

```python
print(f'The mean squared error equals {mse:.0f}')
```

```
The mean squared error equals 1160739
```


```python
### Check your code
assert 1160738 < mse < 1160740
```

#### Root mean squared error

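Calculate the root mean squared error and save the result as `rmse`. In scikit-learn 1.4 and later, use the function `root_mean_squared_error()`. (Older material uses `mean_squared_error()` with the option `squared=False`; that option was deprecated in 1.4 and removed in 1.6.)

Hint:

---

```python
___ = ___(___, ___)
```

- import `root_mean_squared_error` from `sklearn.metrics` first

---

```python
### BEGIN SOLUTION
from sklearn.metrics import root_mean_squared_error

rmse = root_mean_squared_error(y, y_pred)
### END SOLUTION
```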

```python
print(f'The root mean squared error equals {rmse:.0f}')
```

```
The root mean squared error equals 1077
```


```python
### Check your code
assert 1076 < rmse < 1078
```

#### Mean absolute error

Use the scikit-learn function `mean_absolute_error()` to calculate the mean absolute error and save the result as `mae`.

Hint:

---

```python
___ = ___(___, ___)
```


---

```python
### BEGIN SOLUTION
mae = mean_absolute_error(y, y_pred)
### END SOLUTION
```

```python
print(f'The mean absolute error equals {mae:.0f}')
```

```
The mean absolute error equals 886
```


```python
### Check your code
assert 885 < mae < 887
```