Solution to: [Day 7: Pearson Correlation Coefficient I](https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/problem)

<h1 id="tocheading">Table of Contents</h1>
<div id="toc"></div>

In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')


# Notes

## Covariance
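Covariance measures how two random variables vary together. A positive value means they tend to move in the same direction, and a negative value means they tend to move in opposite directions. Its magnitude depends on the units of the variables, so covariance alone does not tell you how strong the relationship is.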

Consider two random variables, $X$ and $Y$, each with $n$ values (i.e., $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$).
The covariance of $X$ and $Y$ can be found using the following formula:

\begin{equation}
\large
cov(X, Y) = \frac
{1}{n}
\sum_{i=1}^{n} (x_i - \bar{x}) (y_i - \bar{y})
\end{equation}

Here $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$.
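As a small worked example of the covariance formula, the sketch below computes it by hand for three paired values. The data is made up for illustration and is not part of the challenge.

```python
# Toy data chosen for this illustration (not from the challenge).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

n = len(x)
mean_x = sum(x) / n  # 2.0
mean_y = sum(y) / n  # 4.0

# Products of paired deviations from the means: (-1)(-2), (0)(0), (1)(2)
products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]

# Population covariance: average of the products (divide by n).
cov_xy = sum(products) / n

print(products)          # [2.0, 0.0, 2.0]
print(round(cov_xy, 3))  # 1.333
```

Both series rise together, so every non-zero product is positive and the covariance is positive.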



## Pearson Correlation Coefficient
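The Pearson correlation coefficient, $\rho_{XY}$, normalizes the covariance by the standard deviations of $X$ and $Y$:

\begin{equation}
\large
\rho_{XY} = \frac
{cov(X, Y)}
{\sigma_X \sigma_Y}
\end{equation}

Its value always lies between -1 and 1. You may also see $\rho_{XY}$ written as $r_{xy}$.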
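Because the covariance and the product of the two standard deviations each carry a factor of $\frac{1}{n}$, that factor cancels in the ratio. The sketch below uses the cancelled form to compute the coefficient for the challenge's sample data, and shows that rescaling one series leaves the result unchanged. The helper name `pearson_r` is hypothetical and is not used elsewhere in this notebook.

```python
from math import sqrt


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r with the 1/n factors cancelled."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of paired deviation products
    s_xx = sum(a * a for a in dx)              # sum of squared deviations of x
    s_yy = sum(b * b for b in dy)              # sum of squared deviations of y
    return s_xy / sqrt(s_xx * s_yy)


# Sample input from the challenge.
x = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

r = pearson_r(x, y)
r_scaled = pearson_r(x, [10 * yi for yi in y])  # same data, y in different units

print(f"{r:.3f}")         # 0.612
print(f"{r_scaled:.3f}")  # 0.612
```

Multiplying `y` by 10 would multiply the covariance by 10, but the coefficient stays the same, which is why it is the more useful measure of strength.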


# Solution

## Imports

In [2]:
from typing import Tuple

## Input

In [3]:
def get_input() -> Tuple[int, list, list]:
	"""Returns input for Pearson Correlation Coefficient I

	Returns:
		Tuple[int, list, list]: num items, X, Y
	"""
	num_items = int(input())
	x = [float(val) for val in input().split()]
	y = [float(val) for val in input().split()]
	return num_items, x, y

## Helper Functions

In [4]:
def calc_mean(x: list) -> float:
	"""Returns mean of list

	Args:
		x (list): List to calculate mean

	Returns:
		float: Mean of x
	"""
	return sum(x) / len(x)

In [5]:
def calc_sd(x: list) -> float:
	"""Returns standard deviation of list.

	Args:
		x (list): List to calculate sd

	Returns:
		float: Sd of list
	"""
	x_mean = calc_mean(x)
	
	sd_num = 0
	for item in x:
		sd_num += (item - x_mean) ** 2

	return (sd_num / len(x)) ** (1/2)
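
`calc_sd` divides by `len(x)`, which makes it the population standard deviation, consistent with the $\frac{1}{n}$ in the covariance formula. As a quick sanity check, the sketch below compares that calculation with `statistics.pstdev` and `statistics.stdev` from the standard library on toy data chosen for illustration.

```python
from statistics import pstdev, stdev

data = [1.0, 2.0, 3.0]  # toy data for illustration

mean = sum(data) / len(data)
# Same calculation as calc_sd: divide the squared deviations by n.
population_sd = (sum((v - mean) ** 2 for v in data) / len(data)) ** 0.5

print(round(population_sd, 4))  # 0.8165
print(round(pstdev(data), 4))   # 0.8165, also divides by n
print(round(stdev(data), 4))    # 1.0, sample version, divides by n - 1
```

Either convention yields the same Pearson coefficient as long as the covariance and both standard deviations use the same divisor.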

In [6]:
def calc_cov(num_items: int, x: list, y: list) -> float:
	"""Returns covariance for x and y

	Args:
		num_items (int): length of x and y
		x (list): series 1
		y (list): series 2

	Returns:
		float: covariance between x and y
	"""
	assert len(x) == len(y)
	mean_x, mean_y = calc_mean(x), calc_mean(y)

	cov_total = 0
	for i in range(num_items):
		cov_total += (x[i] - mean_x) * (y[i] - mean_y)
	return cov_total / num_items

In [7]:
def calc_pearson_coef(num_items: int, x: list, y: list) -> float:
	"""Returns pearson's coefficient between two lists."""
	cov = calc_cov(num_items, x, y)
	sd_x, sd_y = calc_sd(x), calc_sd(y)

	return cov / (sd_x * sd_y)
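
One edge case worth noting: if either series is constant, its standard deviation is zero and the division above raises `ZeroDivisionError`. The challenge input is not expected to trigger this, but a guarded variant might look like the sketch below. The name `safe_pearson` and the choice to return `None` are assumptions made for illustration.

```python
from math import sqrt
from typing import Optional


def safe_pearson(x: list, y: list) -> Optional[float]:
    """Hypothetical variant: returns None when the coefficient is undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        # A constant series has zero variance, so r is undefined.
        return None
    return s_xy / sqrt(s_xx * s_yy)


print(safe_pearson([1, 1, 1], [1, 2, 3]))  # None
print(safe_pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```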

In [8]:
def print_to_scale(num: float) -> None:
	"""Prints number to 3 decimal places.

	Args:
		num (float): Number to print
	"""
	print(f"{num:.3f}")

## Main

In [9]:
def main():
	num_items, x, y = get_input()
	pearson_coef = calc_pearson_coef(num_items, x, y)
	print_to_scale(pearson_coef)

In [10]:
if __name__ == "__main__":
	main()

10
10 9.8 8 7.8 7.7 7 6 5 4 2 
200 44 32 24 22 17 15 12 8 4
0.612
