# What is Correlation?
Correlation is a statistical term describing the degree to which two variables move in coordination with one another. If the two variables move in the same direction, they are said to have a positive correlation. If they move in opposite directions, they have a negative correlation.
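
A toy example makes the sign concrete. The sketch below is illustrative only: the series `x`, `y_up` and `y_down` are made-up values, not taken from `CorrelationData.csv`. It uses pandas' built-in `Series.corr`, which computes the Pearson coefficient by default.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the sign of the correlation
x = pd.Series([1, 2, 3, 4, 5])
y_up = pd.Series([2, 4, 5, 4, 6])    # tends to rise as x rises
y_down = pd.Series([9, 7, 8, 4, 3])  # tends to fall as x rises

print(x.corr(y_up))    # positive correlation
print(x.corr(y_down))  # negative correlation
```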

In [1]:
import pandas as pd
df=pd.read_csv("CorrelationData.csv")

## Formula to find Correlation

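The Pearson correlation coefficient r for n paired observations (x, y), here x = Feature1 and y = Feature2, is:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}} $$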
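
The formula translates directly into plain Python. The sketch below is a minimal, dependency-free version; the function name `pearson_r` is hypothetical (it is not part of pandas or of this notebook), and it assumes both inputs have the same length and non-zero spread, since otherwise the denominator is 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from raw sums, following the formula above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of row-wise products
    sum_x2 = sum(x * x for x in xs)              # sum of squares of x
    sum_y2 = sum(y * y for y in ys)              # sum of squares of y

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Feature1 / Feature2 values as they appear in the notebook's table
feature1 = [34, 56, 43, 55, 23, 45, 41, 52, 35]
feature2 = [78, 109, 90, 76, 52, 95, 87, 103, 80]
print(pearson_r(feature1, feature2))
```

On the notebook's data it should print a value of about 0.78245, the same coefficient computed step by step in the cells that follow.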

In [2]:
# Helper columns for the formula:
# xy is the row-wise product of Feature1 and Feature2
df["xy"]=df["Feature1"]*df["Feature2"]

In [3]:
# x^2 is the square of each Feature1 value
# y^2 is the square of each Feature2 value
df["x^2"]=df["Feature1"]**2
df["y^2"]=df["Feature2"]**2
df

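| Unnamed: 0 | Feature1 | Feature2 | xy | x^2 | y^2 |
|---|---|---|---|---|---|
| 0 | 34 | 78 | 2652 | 1156 | 6084 |
| 1 | 56 | 109 | 6104 | 3136 | 11881 |
| 2 | 43 | 90 | 3870 | 1849 | 8100 |
| 3 | 55 | 76 | 4180 | 3025 | 5776 |
| 4 | 23 | 52 | 1196 | 529 | 2704 |
| 5 | 45 | 95 | 4275 | 2025 | 9025 |
| 6 | 41 | 87 | 3567 | 1681 | 7569 |
| 7 | 52 | 103 | 5356 | 2704 | 10609 |
| 8 | 35 | 80 | 2800 | 1225 | 6400 |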
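
`CorrelationData.csv` itself is not shown here, so a reader who wants to reproduce the cells can rebuild the same two columns from the table above. This is a sketch under that assumption; the helper columns use the same names as in the notebook, and the last line prints the column totals that feed the formula.

```python
import pandas as pd

# Values copied from the table above (the CSV file is not included)
df = pd.DataFrame({
    "Feature1": [34, 56, 43, 55, 23, 45, 41, 52, 35],
    "Feature2": [78, 109, 90, 76, 52, 95, 87, 103, 80],
})

# Same helper columns as the cells above
df["xy"] = df["Feature1"] * df["Feature2"]
df["x^2"] = df["Feature1"] ** 2
df["y^2"] = df["Feature2"] ** 2

# Column totals used in the numerator and denominator
print(df.sum())
```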


In [4]:
# Now we can compute the correlation from the DataFrame built above
# Start with the numerator: n * sum(xy) - sum(x) * sum(y)
NoofObservations = len(df)
Feature1Sum = df["Feature1"].sum()
Feature2Sum = df["Feature2"].sum()
Sumofxy = df["xy"].sum()
Numerator = NoofObservations * Sumofxy - Feature1Sum * Feature2Sum
Numerator

10320
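
As a hand check, the column totals are n = 9, Σx = 384, Σy = 770 and Σxy = 34,000 (x = Feature1, y = Feature2), which give the same numerator:

$$ n\sum xy - \sum x \sum y = 9 \times 34000 - 384 \times 770 = 306000 - 295680 = 10320 $$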

In [5]:
# Calculating the denominator:
# sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
Feature1SquareSum = df["x^2"].sum()
Feature2SquareSum = df["y^2"].sum()
Denominator = (NoofObservations * Feature1SquareSum - Feature1Sum**2) * (NoofObservations * Feature2SquareSum - Feature2Sum**2)
Denominator = Denominator**0.5

In [6]:
CorrelationCoefficient = Numerator / Denominator
CorrelationCoefficient

0.7824515125385784
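
The denominator and the final division can be checked by hand the same way, using n = 9, Σx = 384, Σy = 770, Σx² = 17,330 and Σy² = 68,148:

$$ \sqrt{(9 \times 17330 - 384^2)(9 \times 68148 - 770^2)} = \sqrt{(155970 - 147456)(613332 - 592900)} = \sqrt{8514 \times 20432} = \sqrt{173958048} \approx 13189.32 $$

$$ r = \frac{10320}{13189.32} \approx 0.7825 $$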

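As a sanity check, pandas and NumPy both ship Pearson implementations: `Series.corr` (Pearson is its default method) and `numpy.corrcoef`, which returns a 2×2 correlation matrix whose off-diagonal entry is r. The sketch rebuilds the two columns from the table so it runs on its own.

```python
import numpy as np
import pandas as pd

# Feature1 / Feature2 values copied from the table above
feature1 = pd.Series([34, 56, 43, 55, 23, 45, 41, 52, 35])
feature2 = pd.Series([78, 109, 90, 76, 52, 95, 87, 103, 80])

r_pandas = feature1.corr(feature2)               # method="pearson" by default
r_numpy = np.corrcoef(feature1, feature2)[0, 1]  # off-diagonal entry of the 2x2 matrix

# Both should agree with the hand-computed coefficient up to floating-point rounding
print(r_pandas, r_numpy)
```
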
In [7]:
# The correlation coefficient r always lies between -1 and 1
# 0 < r <= 1: positive correlation; as Feature1 increases, Feature2 tends to increase (and vice versa)
# -1 <= r < 0: negative correlation; as Feature1 increases, Feature2 tends to decrease (and vice versa)
# r = 0: no linear correlation between the two columns
# The closer |r| is to 1, the stronger the linear relationship; here r is about 0.78, a fairly strong positive one
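
A caveat on r = 0: Pearson's r measures only linear association. In the made-up example below, `y` is completely determined by `x` (y = x²), yet the coefficient is 0 because the relationship is symmetric rather than linear.

```python
import numpy as np

# Hypothetical data: y depends exactly on x, but not linearly
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2  # [4, 1, 0, 1, 4]

r = np.corrcoef(x, y)[0, 1]
print(r)  # no linear correlation despite a perfect (quadratic) relationship
```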