## Continuous supervised learning
So far we have worked with **discrete** outputs (1 or 0, fast or slow, binary outputs); now let's look at continuous outputs.

Making an output continuous implies that there is some sort of ordering to it.

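One way to write the result of supervised learning with a continuous output is the linear regression equation. For example, take a line that starts at zero and reaches a net worth of 500 at age 80:

y = 500 (net worth)
x = 80 (age)
net worth = 500/80 * age + 0 (when age is 0, the net worth is zero)
net worth = 6.25 * age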

In general:
target = slope * input + intercept
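
The arithmetic above can be checked with a short sketch. The helper name `line_through` is hypothetical, and the two points (age 0, net worth 0) and (age 80, net worth 500) are the ones assumed in the example.

```python
def line_through(x0, y0, x1, y1):
    """Return (slope, intercept) of the straight line through two points."""
    slope = (y1 - y0) / (x1 - x0)      # rise over run
    intercept = y0 - slope * x0        # value of y when x is zero
    return slope, intercept


# Points taken from the net worth example: (age, net worth).
slope, intercept = line_through(0, 0, 80, 500)
print(slope, intercept)  # 6.25 0.0
```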

##  Slope and intercept
$y=m*x+\beta$

$m$ is the slope, the change in $y$ per unit change in $x$ ($\Delta y/\Delta x$)
$\beta$ is the intercept, the value of $y$ when $x$ is zero

In [2]:
x=36
y = (6.25*x) + 0
y

225.0

In [3]:
y = (6.25*x) + 30
y

255.0

## Using sklearn linear regression

* [Generalized Linear Models](https://scikit-learn.org/stable/modules/linear_model.html)
* [sklearn.linear_model.LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)
* The `.predict()` function expects a 2-D array-like of inputs (a list of lists such as `[[27]]`);
* We can see the slope (`.coef_`) and intercept (`.intercept_`);
* The R-squared score can be calculated with the `.score()` function on the test and training data to measure performance;
* The maximum R-squared value is 1, so the closer to 1, the better.

In [None]:
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(ages_train, net_worths_train)

### Get Katie's net worth (she's 27).
### predict() expects a 2-D array, so pass a list of lists ([[27]]).
### Predictions come back as an array; with 2-D targets (assumed here) the
### result is 2-D, so index with [0][0] to get the number.
km_net_worth = reg.predict([[27]])[0][0]

### Get the slope; with 2-D targets coef_ is also 2-D, hence [0][0].
slope = reg.coef_[0][0]

### Get the intercept; here intercept_ is 1-D, so [0] accesses the value.
intercept = reg.intercept_[0]

### Get the score (R-squared) on the test data.
test_score = reg.score(ages_test, net_worths_test)

### Get the score (R-squared) on the training data.
training_score = reg.score(ages_train, net_worths_train)
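
The cell above depends on `ages_train`, `net_worths_train` and the test arrays being loaded elsewhere. As a self-contained sketch, the version below makes up a synthetic data set (net worth = 6.25 * age plus random noise, an assumption and not the lesson's data) and checks the array shapes that the `[0][0]` and `[0]` indexing relies on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lesson's data (assumption): 6.25 * age + noise.
rng = np.random.default_rng(42)
ages = rng.uniform(20, 65, size=(100, 1))                 # (n_samples, n_features)
net_worths = 6.25 * ages + rng.normal(0, 20, size=(100, 1))  # 2-D targets

ages_train, ages_test, net_worths_train, net_worths_test = train_test_split(
    ages, net_worths, test_size=0.1, random_state=42
)

reg = LinearRegression()
reg.fit(ages_train, net_worths_train)

# With 2-D targets, predictions and coef_ are 2-D and intercept_ is 1-D.
print(reg.predict([[27]]).shape)  # (1, 1)
print(reg.coef_.shape)            # (1, 1)
print(reg.intercept_.shape)       # (1,)

# R-squared on the training and test data.
print(reg.score(ages_train, net_worths_train))
print(reg.score(ages_test, net_worths_test))
```

If the targets were passed as a 1-D array instead, `predict()` would return a 1-D array, `coef_` would be 1-D and `intercept_` would be a plain number, so the indexing would need to change accordingly.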


## Linear regression errors

error = actual net worth - predicted net worth

In [4]:
age = 35
p_nw = 218.75
a_nw = 200
err = a_nw - p_nw
err

-18.75

## Minimizing the sum of squared errors

The best regression is the one that minimizes the sum of squared errors over all training points:
$\sum(actual-predicted)^2$

Here, actual is the target value of each training point and predicted is the regression's prediction for that point ($y=m*x+\beta$).

Two popular algorithms for minimizing the sum of squared errors:

* **ordinary least squares (OLS)**, used by sklearn `LinearRegression`
* **gradient descent**
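
Both approaches can be sketched with NumPy on a tiny made-up data set (the five points below are an assumption for illustration). The first part uses the closed-form OLS solution for a single input; the second runs plain gradient descent on the mean squared error, which has the same minimizer as the sum of squared errors. The input is standardized before gradient descent so that a simple fixed learning rate converges; this is one possible setup, not necessarily how any library implements it.

```python
import numpy as np

# Made-up training points (assumption): ages and net worths.
ages = np.array([20.0, 30.0, 40.0, 50.0, 60.0])
net_worths = np.array([130.0, 180.0, 260.0, 310.0, 370.0])

# --- Ordinary least squares, closed form for one input ---
x_mean, y_mean = ages.mean(), net_worths.mean()
m_ols = np.sum((ages - x_mean) * (net_worths - y_mean)) / np.sum((ages - x_mean) ** 2)
b_ols = y_mean - m_ols * x_mean

# --- Gradient descent on the mean squared error ---
x_std = ages.std()
z = (ages - x_mean) / x_std          # standardized input
a, c = 0.0, 0.0                      # slope and intercept in z-space
learning_rate = 0.1
for _ in range(500):
    residual = (a * z + c) - net_worths
    a -= learning_rate * 2 * np.mean(residual * z)   # d(MSE)/da
    c -= learning_rate * 2 * np.mean(residual)       # d(MSE)/dc

# Convert back to the original units: y = (a / x_std) * x + (c - a * x_mean / x_std)
m_gd = a / x_std
b_gd = c - m_gd * x_mean

print(m_ols, b_ols)
print(m_gd, b_gd)   # agrees with the OLS values up to numerical precision
```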

## What regression line looks "best"?

* Look for the margins;
* There can be multiple lines that minimize $\sum|error|$, but only one line will minimize $\sum{error}^2$;
* Using the SSE (sum of squared errors) also makes the implementation much easier;
* SSE isn't perfect as an evaluation metric. As you add more data, the sum of squared errors will almost certainly go up, but that doesn't necessarily mean that your fit is doing a worse job.
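
The last point can be illustrated with a deliberately artificial sketch: repeating every point of a made-up data set (an assumption for illustration) leaves the best-fit line unchanged, yet the SSE doubles. The helper name `fit_and_sse` is hypothetical.

```python
import numpy as np


def fit_and_sse(x, y):
    """Fit a straight line by least squares and return its sum of squared errors."""
    m, b = np.polyfit(x, y, deg=1)   # coefficients, highest degree first
    return np.sum((y - (m * x + b)) ** 2)


# Made-up data (assumption).
x = np.array([20.0, 30.0, 40.0, 50.0, 60.0])
y = np.array([130.0, 180.0, 260.0, 310.0, 370.0])

sse_once = fit_and_sse(x, y)
sse_twice = fit_and_sse(np.tile(x, 2), np.tile(y, 2))  # every point repeated

print(sse_once, sse_twice)  # the second value is twice the first; the fit is equally good
```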

## R-squared ($r^2$) of a regression

R-squared is a very popular evaluation metric for describing the goodness of fit of a linear regression.

It answers the question: how much of the change in the output ($y$) is explained by the change in the input ($x$)?

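$0 \le r^2 \le 1$ on the training data (on test data, sklearn's `.score()` can return a negative value when the fit is worse than always predicting the mean)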

If the number is very small (near 0), it generally means that the regression line isn't doing a good job of capturing the trend in the data. If it is near 1, the regression line describes the relationship between the input ($x$) and the output ($y$) well.

It is independent of the number of training points, which makes it more reliable than the sum of squared errors.

Adding more features can push up the r-squared value on the training data, even when the new features are not very informative.
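
As a rough sketch, r-squared can be computed by hand as $1 - \sum(actual-predicted)^2 / \sum(actual-mean)^2$ and compared with `.score()`. The data below is synthetic (an assumption), and `noise_feature` is a made-up second input unrelated to net worth, added only to show that the training r-squared does not go down when a feature is added.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): net worth = 6.25 * age + noise.
rng = np.random.default_rng(0)
n = 50
age = rng.uniform(20, 65, size=n)
net_worth = 6.25 * age + rng.normal(0, 40, size=n)
noise_feature = rng.normal(size=n)   # unrelated to net worth

X_one = age.reshape(-1, 1)
X_two = np.column_stack([age, noise_feature])

reg_one = LinearRegression().fit(X_one, net_worth)
reg_two = LinearRegression().fit(X_two, net_worth)

# r-squared by hand: 1 - (residual sum of squares) / (total sum of squares)
pred = reg_one.predict(X_one)
ss_res = np.sum((net_worth - pred) ** 2)
ss_tot = np.sum((net_worth - net_worth.mean()) ** 2)
r2_by_hand = 1 - ss_res / ss_tot

r2_one = reg_one.score(X_one, net_worth)
r2_two = reg_two.score(X_two, net_worth)

print(r2_by_hand, r2_one)  # the same value
print(r2_two)              # not lower than r2_one on the training data
```

This is why a higher training r-squared after adding features is not, on its own, evidence of a better model; comparing scores on test data is a more honest check.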

## Visualize the regression

Use a scatter plot of the data points and draw the regression line over it.
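
A minimal matplotlib sketch of such a plot, again on synthetic data (an assumption, not the lesson's data set):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): net worth = 6.25 * age + noise.
rng = np.random.default_rng(1)
ages = rng.uniform(20, 65, size=(60, 1))
net_worths = 6.25 * ages + rng.normal(0, 30, size=(60, 1))

reg = LinearRegression().fit(ages, net_worths)

fig, ax = plt.subplots()
ax.scatter(ages.ravel(), net_worths.ravel(), color="tab:blue", label="data")

# Draw the fitted line across the observed age range.
line_x = np.array([[ages.min()], [ages.max()]])
ax.plot(line_x.ravel(), reg.predict(line_x).ravel(), color="tab:red",
        label="regression line")

ax.set_xlabel("age")
ax.set_ylabel("net worth")
ax.legend()
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() to display it.
```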

## Comparing classification and regression

| property                     | supervised classification | regression                          |
|------------------------------|---------------------------|-------------------------------------|
| output type                  | discrete (class labels)   | continuous (number)                 |
| what are you trying to find? | decision boundary         | "best fit line"                     |
| evaluation                   | accuracy                  | "sum of squared error" or r-squared |

## Multi-variate regression

## Running script

Use salary as the input to predict the bonus
