# Introduction to linear regression

## Setup

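```python
import warnings

import altair as alt
import pandas as pd

# Hide FutureWarning messages emitted by library internals
warnings.simplefilter(action='ignore', category=FutureWarning)

# Lift Altair's default 5,000-row limit on data embedded in a chart
alt.data_transformers.disable_max_rows()
```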


## Data

We create a small synthetic dataset of ad spending and sales:

```python
df = pd.DataFrame(
    {'sales': [2500, 4500, 6500, 8500, 10500, 12500, 14500, 16500, 18500, 20500],
     'ads': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]}
)
```

```python
df
```

|   | sales | ads |
|---|---|---|
| 0 | 2500 | 1000 |
| 1 | 4500 | 2000 |
| 2 | 6500 | 3000 |
| 3 | 8500 | 4000 |
| 4 | 10500 | 5000 |
| 5 | 12500 | 6000 |
| 6 | 14500 | 7000 |
| 7 | 16500 | 8000 |
| 8 | 18500 | 9000 |
| 9 | 20500 | 10000 |
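Because the dataset is synthetic, we can easily check how regular it is. The sketch below recomputes the slope between each pair of neighboring points in plain Python. To keep the snippet self-contained, the values from the table above are copied into lists. Every slope comes out the same, which is exactly what "the points lie on one straight line" means.

```python
# Values copied from the DataFrame above
ads = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
sales = [2500, 4500, 6500, 8500, 10500, 12500, 14500, 16500, 18500, 20500]

# Change in sales divided by change in ads for each pair of neighboring rows
slopes = [
    (s2 - s1) / (a2 - a1)
    for a1, a2, s1, s2 in zip(ads, ads[1:], sales, sales[1:])
]

print(slopes)  # nine identical values of 2.0
```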


## Analysis

Plot sales against ad spending to show the relationship between the two variables:

```python
chart = alt.Chart(df).mark_point().encode(
    alt.X('ads', axis=alt.Axis(title='Ads (in $)')),
    alt.Y('sales', axis=alt.Axis(title='Sales (in units)'))
)

chart
```

Let's take a closer look at ad spending of 2000. What sales would you expect?

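```python
# Select the second row (ads = 2000) as a one-row DataFrame and
# draw it as a large red point; hovering shows its ads and sales values
callout = alt.Chart(df.iloc[1:2]).mark_point(
    color='red', size=300
).encode(
    x='ads',
    y='sales',
    tooltip=['ads', 'sales']
)

chart + callout
```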

## Custom model

What is your sales prediction for TV ad spending of 2000?

- calculate a prediction for sales using ad spending of 2000
- in your code, use a `constant` and a `number` that multiplies the ad spending
- save the result as `sales_prediction`
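In standard regression notation, the `constant` is the intercept and the `number` is the slope. Writing $x$ for ad spending and $\hat{y}$ for predicted sales, the model you are building is

$$
\hat{y} = \beta_0 + \beta_1 x
$$

where $\beta_0$ corresponds to `constant` and $\beta_1$ to `number`. For example, with the purely illustrative values $\beta_0 = 100$ and $\beta_1 = 3$, an ad spend of 2000 would give $\hat{y} = 100 + 3 \cdot 2000 = 6100$.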

Hint:

---

```python

constant = ___
number = ___
ad_spendings = 2000

sales_prediction = constant + number * ad_spendings

```

---

```python
### BEGIN SOLUTION
constant = 500
number = 2
ad_spendings = 2000

sales_prediction = constant + number * ad_spendings
### END SOLUTION
```

```python
# check your code
assert 4000 <= sales_prediction <= 5000
```
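You can also wrap the calculation in a small function so that any ad spend can be plugged in. `predict_sales` is a hypothetical helper, not part of pandas or Altair. Its default arguments are the `constant` and `number` from the solution above.

```python
def predict_sales(ad_spendings, constant=500, number=2):
    """Hypothetical helper: linear prediction constant + number * ad_spendings."""
    return constant + number * ad_spendings


print(predict_sales(2000))  # 4500
print(predict_sales(2500))  # 5500, an ad spend that does not appear in the data
```

The second call shows why a model is useful: it produces a prediction for an ad spend that was never observed.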

Next, use your constant and number to calculate a prediction for every row of the DataFrame with pandas:

Hint:

---

```python
df['___'] = ___ + ___ * df['___'] 
```

---

- name the new column `sales_prediction`


```python
### BEGIN SOLUTION
df['sales_prediction'] = 500 + 2 * df['ads']
### END SOLUTION
```

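```python
# Check your code: the first prediction (ads = 1000) should lie between 2000 and 3000.
# Selecting by label is robust to changes in column order.
assert 2000 <= df.loc[0, 'sales_prediction'] <= 3000
```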
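The pandas expression `500 + 2 * df['ads']` is vectorized: it applies the formula to every row at once and returns a new column. The plain-Python equivalent below uses a list in place of the DataFrame column (an assumption for illustration) and makes the element-wise behavior explicit:

```python
# The 'ads' column as a plain list
ads = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]

# Same formula as the pandas solution, applied one element at a time
sales_prediction = [500 + 2 * a for a in ads]

print(sales_prediction[:5])  # [2500, 4500, 6500, 8500, 10500]
```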

```python
df.head()
```

|   | sales | ads | sales_prediction |
|---|---|---|---|
| 0 | 2500 | 1000 | 2500 |
| 1 | 4500 | 2000 | 4500 |
| 2 | 6500 | 3000 | 6500 |
| 3 | 8500 | 4000 | 8500 |
| 4 | 10500 | 5000 | 10500 |
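The predictions match the observed sales exactly. A common way to quantify fit is the residual for each row: actual sales minus predicted sales. A perfect fit leaves every residual at zero. The minimal sketch below works in plain Python with the columns copied into lists; in pandas, the same residuals would be `df['sales'] - df['sales_prediction']`.

```python
ads = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
sales = [2500, 4500, 6500, 8500, 10500, 12500, 14500, 16500, 18500, 20500]

# Predictions from the model with constant=500 and number=2
sales_prediction = [500 + 2 * a for a in ads]

# Residual = actual minus predicted; zero everywhere means a perfect fit
residuals = [y - y_hat for y, y_hat in zip(sales, sales_prediction)]
sum_squared_residuals = sum(r ** 2 for r in residuals)

print(residuals)              # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(sum_squared_residuals)  # 0
```

With real, noisy data the residuals are not all zero. Linear regression then chooses the constant and number that make the sum of squared residuals as small as possible.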


Visualize the predictions as a line:

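```python
# Observed data as points, using the same axis titles as the line layer
chart = alt.Chart(df).mark_point().encode(
    alt.X('ads', axis=alt.Axis(title='Ads (in $)')),
    alt.Y('sales', axis=alt.Axis(title='Sales (in units)'))
)

# Model predictions drawn as a blue line
line = alt.Chart(df).mark_line().encode(
    alt.X('ads', axis=alt.Axis(title='Ads (in $)')),
    alt.Y('sales_prediction', axis=alt.Axis(title='Sales (in units)')),
    color=alt.value('#0001F5')
)

chart + line
```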
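Here the constant and number were chosen by hand. Ordinary least squares, the standard way to fit a linear regression, computes them from the data in closed form. The slope is $\sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, and the intercept is $\bar{y} - \text{slope} \cdot \bar{x}$. The minimal plain-Python sketch below recovers the values used above; the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Hypothetical helper: ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Numerator: co-variation of x and y; denominator: variation of x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


ads = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
sales = [2500, 4500, 6500, 8500, 10500, 12500, 14500, 16500, 18500, 20500]

constant, number = fit_line(ads, sales)
print(constant, number)  # 500.0 2.0
```

On Python 3.10 and later, `statistics.linear_regression(ads, sales)` from the standard library performs the same fit. It returns a named tuple with `slope` and `intercept` fields.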