# Day 8: Least Square Regression Line

https://www.hackerrank.com/challenges/s10-least-square-regression-line/tutorial

$$y = a + b \cdot x$$

### Finding the value of b, in 2 ways:

$$b = \frac {n\sum(x_i y_i) - (\sum{x_i})(\sum{y_i})} {n\sum(x_i^2) - (\sum{x_i})^2}$$

$$b = \rho \cdot \frac {\sigma_Y} {\sigma_X}$$

where $\rho$ is the Pearson correlation coefficient and $\sigma_X$, $\sigma_Y$ are the standard deviations of $x$ and $y$.

### Finding the value of a

$$a = \bar {y} - b \cdot \bar {x}$$
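As a quick check, both formulas for $b$ and the formula for $a$ can be evaluated directly in plain Python. The sketch below is illustrative only: the helper names (`slope_from_sums`, `slope_from_correlation`, `intercept`) are made up for this example, and it assumes population standard deviations (dividing by $n$). The sample points are the same ones used in the sklearn cell in this notebook.

```python
import math


def slope_from_sums(x, y):
    """Slope b from the raw-sums formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def slope_from_correlation(x, y):
    """Slope b as rho * sigma_Y / sigma_X (population standard deviations)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    std_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    std_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    rho = cov / (std_x * std_y)  # Pearson correlation coefficient
    return rho * std_y / std_x


def intercept(x, y, b):
    """Intercept a = mean(y) - b * mean(x)."""
    return sum(y) / len(y) - b * sum(x) / len(x)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

b_sums = slope_from_sums(x, y)
b_corr = slope_from_correlation(x, y)
a = intercept(x, y, b_sums)

print(round(b_sums, 3), round(b_corr, 3), round(a, 3))
```

Both slope formulas should agree up to floating-point rounding, which is why the comparison is done on rounded values.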

In [12]:
# Another way to find a, b by using sklearn

from sklearn import linear_model
import numpy as np
xl = [1, 2, 3, 4, 5]
x = np.asarray(xl).reshape(-1, 1)
y = [2, 1, 4, 3, 5]
lm = linear_model.LinearRegression()
lm.fit(x, y)
print(lm.intercept_)
print(lm.coef_[0])

0.6
0.8


### Exercise

A group of five students enrolls in Statistics immediately after taking a Math aptitude test. Each student's Math aptitude test score, x, and Statistics course grade, y, can be expressed as the following list of points:


In [8]:
l0 = '95 85'
l1 = '85 95'
l2 = '80 70'
l3 = '70 65'
l4 = '60 70'


In [14]:
from sklearn import linear_model
import numpy as np
xl = [95, 85, 80, 70, 60]
x = np.asarray(xl).reshape(-1, 1)
y = [85, 95, 70, 65, 70]
lm = linear_model.LinearRegression()
lm.fit(x, y)
a = lm.intercept_
b = lm.coef_[0]

# If we know math score of student (x = 80) --> stat score (y = a + b*x) == 78.288
math_score = 80
stat_score = a + b * math_score
print(round(stat_score,3))

78.288


In [13]:
# Create a list x, y where math and stat scores of each student are stored

x = []  # math scores of all students
y = []  # stat scores of all students

for _ in range(5):
    # Each input line holds one student's "math stat" score pair
    scores = [int(v) for v in input().split()]
    x.append(scores[0])
    y.append(scores[1])

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
x_square = sum([i ** 2 for i in x])
sum_xy = sum([x[i] * y[i] for i in range(n)])

# find a and b
math_score = 80
b = (n * sum_xy - sum(x) * sum(y)) / (n * x_square - sum(x) ** 2)
a = mean_y - mean_x * b

# If we know math score of student (x = 80) --> stat score (y = a + b*x)
stat_score = a + b * math_score
print(round(stat_score, 3))

95 85
85 95
80 70
70 65
60 70
78.288


# Day 8: Pearson Correlation Coefficient II

The regression line of y on x is $3x + 4y + 8 = 0$, and the regression line of x on y is $4x + 3y + 7 = 0$. What is the value of the Pearson correlation coefficient?

Solving each equation for its dependent variable gives:

$$y = -2 - \frac{3}{4} x$$

$$x = -\frac{7}{4} - \frac{3}{4} y$$

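The slope of each regression line is related to $\rho$, the Pearson correlation coefficient, as follows:

$$b_1 = \rho \cdot \frac {\sigma_Y} {\sigma_X}$$

$$b_2 = \rho \cdot \frac {\sigma_X} {\sigma_Y}$$

Here $b_1$ is the slope of the regression line of y on x and $b_2$ is the slope of the regression line of x on y. Multiplying the two slopes cancels the standard deviations, so $\rho^2 = b_1 \cdot b_2$.

In this problem $b_1 = b_2 = -\frac{3}{4}$, so $\rho^2 = \left(-\frac{3}{4}\right)^2 = \frac{9}{16}$. Because the standard deviations are positive, $\rho$ has the same sign as the slopes, which gives $\rho = -\frac{3}{4}$.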
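The same reasoning can be sketched in a few lines of Python. The helper `pearson_from_regression_lines` is a hypothetical name introduced for this example. It assumes each line is passed as the coefficients `(p, q, r)` of `p*x + q*y + r = 0`, and that the two slopes share a sign, as they must for a valid pair of regression lines.

```python
import math


def pearson_from_regression_lines(y_on_x, x_on_y):
    """Recover rho from the two regression lines, each given as (p, q, r)
    for the equation p*x + q*y + r = 0."""
    p1, q1, _ = y_on_x  # y = -(p1/q1)*x - r1/q1, so the slope b1 is -p1/q1
    p2, q2, _ = x_on_y  # x = -(q2/p2)*y - r2/p2, so the slope b2 is -q2/p2
    b1 = -p1 / q1
    b2 = -q2 / p2
    product = b1 * b2  # equals rho squared
    if not 0 <= product <= 1:
        raise ValueError("slopes are not consistent with a pair of regression lines")
    # rho takes the sign of the slopes because standard deviations are positive
    return math.copysign(math.sqrt(product), b1)


rho = pearson_from_regression_lines((3, 4, 8), (4, 3, 7))
print(rho)  # -0.75
```

Note that the intercepts are ignored: only the slopes carry information about the correlation.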