Solution to: [Day 8: Least Square Regression Line](https://www.hackerrank.com/challenges/s10-least-square-regression-line/problem)

<h1 id="tocheading">Table of Contents</h1>
<div id="toc"></div>

- Table of Contents
- Notes
    - Regression Line
    - Finding the Value of b
    - Finding the Value of a
    - Sums of Squares
    - Coefficient of Determination (R-squared)
- Example
    - Data
    - Important stats
    - Compute b
    - Compute a
    - Regression
- Solution
    - Imports
    - Input
    - Mean
    - Calculate b
    - Calculate a
    - Predict
    - Scale
    - Main


# Notes

## Regression Line
If our data shows a linear relationship between X and Y, the straight line that best describes that relationship, in the least-squares sense, is the regression line. The regression line is given by

\begin{equation}
\large
Y = a + bX.
\end{equation}

## Finding the Value of b
The value of b can be calculated using either of the following formulae:

\begin{equation}
\large
b = \frac
{n \sum (x_{i} y_{i}) - (\sum x_{i}) (\sum y_{i})}
{n \sum x_{i} ^{2} - (\sum x_{i}) ^{2}}
\end{equation}

**OR**

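\begin{equation}
\large
b = \rho \frac
{\sigma_{Y}}
{\sigma_{X}}
\end{equation}

- where $\rho$ is the Pearson correlation coefficient between X and Y, and $\sigma_{X}$ and $\sigma_{Y}$ are the standard deviations of X and Y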
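
As a quick check that the two formulae agree, here is a minimal sketch using the data from the Example section. The helper names `slope_from_sums` and `slope_from_correlation` are illustrative, and population standard deviations are assumed (the sample versions give the same ratio, as long as the same convention is used throughout).

```python
import math


def slope_from_sums(x, y):
    """Slope b from the raw-sums formula."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x_sq = sum(xi ** 2 for xi in x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_sq - sum(x) ** 2)


def slope_from_correlation(x, y):
    """Slope b as Pearson correlation times sd(Y) / sd(X)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Population standard deviations (an assumption of this sketch)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    pearson = covariance / (sd_x * sd_y)
    return pearson * sd_y / sd_x


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# Both values should agree up to floating-point rounding
print(slope_from_sums(x, y))
print(slope_from_correlation(x, y))
```

The raw-sums version needs only one pass over the data, which is presumably why the solution further down uses it.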

## Finding the Value of a
\begin{equation}
\large
a = \bar{y} - b \bar{x}
\end{equation}
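
As a brief sketch of where this formula comes from (a standard least-squares argument, not something stated in the challenge itself): the intercept is chosen to minimize the sum of squared residuals $\sum (y_{i} - a - b x_{i})^{2}$. Setting the derivative with respect to a to zero gives

\begin{equation}
\large
-2 \sum (y_{i} - a - b x_{i}) = 0
\;\Rightarrow\;
\sum y_{i} = n a + b \sum x_{i}
\;\Rightarrow\;
a = \bar{y} - b \bar{x}
\end{equation}

so the fitted line always passes through the point $(\bar{x}, \bar{y})$.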

## Sums of Squares
- **Total Sum of Squares** (SST) = $\sum (y_{i} - \bar{y})^{2}$
- **Regression Sum of Squares** (SSR) = $\sum (\hat{y}_{i} - \bar{y})^{2}$
- **Error Sum of Squares** (SSE) = $\sum (y_{i} - \hat{y}_{i})^{2}$


*Notes*:
- $\hat{y}_{i}$ = predicted value for the i-th observation
- $\bar{y}$ = mean of the observed y values
- $y_{i}$ = i-th observed value of y


A small SSE relative to SST suggests that the line fits the data well.
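
To make the three sums concrete, the sketch below evaluates them for the data and coefficients from the Example section (a = 0.6, b = 0.8). The variable names are illustrative only.

```python
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a, b = 0.6, 0.8  # coefficients taken from the Example section

mean_y = sum(y) / len(y)
y_hat = [a + b * xi for xi in x]  # predicted value for each x

sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation in y
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)          # variation captured by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # variation left in the residuals

print(round(sst, 3), round(ssr, 3), round(sse, 3))  # 10.0 6.4 3.6
```

For a least-squares fit with an intercept, SST = SSR + SSE, which the numbers above satisfy (10 = 6.4 + 3.6).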

## Coefficient of Determination (R-squared)

\begin{equation}
\large
R ^{2} = \frac
{SSR}{SST} = 1 - \frac{SSE}{SST}
\end{equation}

$R^{2}$ multiplied by 100 gives the percentage of the variation in Y that is explained by the linear regression of Y on X.
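
A minimal sketch of this calculation, using the data and coefficients from the Example section. `r_squared` is a hypothetical helper name, not part of the solution further down.

```python
def r_squared(x, y, a, b):
    """Coefficient of determination for the line y = a + b * x."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r2 = r_squared(x, y, a=0.6, b=0.8)
print(f"{r2:.2f}")  # 0.64
```

Here about 64% of the variation in y is explained by the line. In simple linear regression $R^{2}$ equals the square of the Pearson correlation, and for this data the correlation is 0.8.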

# Example

## Data

    x = [1,2,3,4,5]
    y = [2,1,4,3,5]

## Important stats
- n = 5
- $\sum x$ = 15
- $\bar{x}$ = 3
- $\sum y$ = 15
- $\bar{y}$ = 3
- $\sum x^{2}$ = 55
- $\sum xy$ = 53


## Compute b

\begin{equation}
\large
b = \frac
{n \sum (x_{i} * y_{i}) - \sum x_{i} * \sum y_{i}}
{n \sum x_{i}^{2} - (\sum x_{i})^{2}}
\end{equation}

\begin{equation}
\large
b = \frac
{5 * 53 - 15 * 15}
{5 * 55 - 15^{2}}
\end{equation}

\begin{equation}
\large
b = \frac{40}{50} = 0.8
\end{equation}

## Compute a
\begin{equation}
\large
a = \bar{y} - b * \bar{x}
\end{equation}

\begin{equation}
\large
= 3 - 0.8 * 3 = 0.6
\end{equation}


## Regression 
\begin{equation}
\large
\hat{y} = 0.6 + 0.8 * x
\end{equation}
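
The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch for checking the hand calculation; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# Summary statistics from the "Important stats" list
n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x_sq = sum(xi ** 2 for xi in x)

# Slope and intercept from the formulae in the Notes
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x_sq - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(n, sum_x, sum_y, sum_xy, sum_x_sq)  # 5 15 15 53 55
print(round(b, 3), round(a, 3))           # 0.8 0.6
```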

# Solution

## Imports

In [2]:
from typing import Tuple

## Input

In [3]:
def get_input() -> Tuple[list, list]:
    """Returns tuple representing data points

    Returns:
        Tuple[list, list]: X and Y, respectively
    """
    num_inputs = 5
    x, y = [], []
    for _ in range(num_inputs):
        x_val, y_val = [float(val) for val in input().split()]
        x.append(x_val)
        y.append(y_val)

    return (x, y)

## Mean

In [4]:
def calc_mean(x: list) -> float:
    """Returns mean of list

    Args:
        x (list): Input list

    Returns:
        float: mean of list
    """
    return sum(x) / len(x)


## Calculate b

In [5]:
def calc_b(x: list, y: list) -> float:
    """Returns b coefficient (slope) for x in simple linear regression

    Args:
        x (list): predictor variable
        y (list): outcome variable

    Returns:
        float: b coefficient for x
    """
    n = len(x)
    # Sum of the pairwise products x_i * y_i
    sum_xy = sum(x_val * y_val for x_val, y_val in zip(x, y))
    # Sum of the squared x values
    sum_x_sq = sum(val ** 2 for val in x)

    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_sq - sum(x) ** 2)

## Calculate a

In [6]:
def calc_a(x: list, y: list, b: float) -> float:
    """Returns `a` coefficient for regression equation.

    Args:
        x (list): predictor variable
        y (list): Outcome variable
        b (float): coefficient for predictor variable

    Returns:
        float: slope intercept
    """
    return calc_mean(y) - b * calc_mean(x)

## Predict 

In [7]:
def predict_val(a: float, b: float, x: int = 80) -> float:
    """Predicts value for simple linear regression equation

    Args:
        a (float): Intercept of the regression line
        b (float): Coefficient (slope) for x
        x (int): value of X to predict for (defaults to 80)

    Returns:
        float: predicted value of y
    """
    return a + b * x

## Scale

In [8]:
def print_to_scale(num: float) -> None:
    """Prints number rounded to 3 decimal places

    Args:
        num (float): Number to print
    """
    print(f"{num:.3f}")

## Main

In [9]:
def main():
    x, y = get_input()

    b = calc_b(x, y)
    a = calc_a(x, y, b)

    prediction = predict_val(a, b)
    print_to_scale(prediction)


if __name__ == "__main__":
    main()

95 85
85 95
80 70
70 65
60 70
78.288
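
As a cross-check that does not rely on stdin, the sketch below refits the sample input above using the deviation form of the slope, $b = \sum (x_{i} - \bar{x})(y_{i} - \bar{y}) / \sum (x_{i} - \bar{x})^{2}$, which is algebraically equivalent to the raw-sums formula in `calc_b`. The names `fit_line`, `x_sample`, and `y_sample` are hypothetical and not part of the solution.

```python
def fit_line(x, y):
    """Return (a, b) using the deviation form of the least-squares slope."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    b = numerator / denominator
    a = mean_y - b * mean_x
    return a, b


# The five input lines shown above, split into their two columns
x_sample = [95, 85, 80, 70, 60]
y_sample = [85, 95, 70, 65, 70]

a, b = fit_line(x_sample, y_sample)
prediction = a + b * 80
print(f"{prediction:.3f}")  # 78.288
```

Getting the same 78.288 from a different formula gives some confidence that the implementation above is correct.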
