## Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [2]:
import pandas as pd

# create sample dataset

data = {'time': [6, 7, 8, 1, 5, 4, 9, 3, 11, 9],
        'score': [50, 56, 89, 40, 30, 15, 35, 19, 76, 80]}
df = pd.DataFrame(data)

cor = df.corr('pearson')
print(cor)

           time     score
time   1.000000  0.686754
score  0.686754  1.000000


We can conclude:

- The coefficient is positive (about 0.69), so study time and exam score are positively correlated, and the linear relationship is moderate to strong.
- In this sample, students who spent more time studying tended to get higher scores.
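
As a cross-check, the coefficient can be recomputed from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below uses only the standard library and the same sample values. The helper name `pearson_r` is hypothetical and introduced only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from the textbook definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


time = [6, 7, 8, 1, 5, 4, 9, 3, 11, 9]
score = [50, 56, 89, 40, 30, 15, 35, 19, 76, 80]

r = pearson_r(time, score)
print("Pearson correlation coefficient:", r)
```

The printed value should agree with the off-diagonal entry of the correlation matrix above.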

## Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

In [9]:
from scipy.stats import spearmanr

# Example data
sleep = [8, 7, 6, 5, 9, 5, 4]
job_satisfaction = [9,8,7,6,5,4,3]

# Calculate Spearman's rank correlation coefficient and p-value
rho, pval = spearmanr(sleep, job_satisfaction)

print("Spearman's rank correlation coefficient: ", rho)
print("p-value: ", pval)


Spearman's rank correlation coefficient:  0.6126374746329801
p-value:  0.1435885753012859


The Spearman's rank correlation coefficient is about 0.613, which indicates a moderate positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level in this sample: individuals who sleep more tend to report higher job satisfaction. However, the p-value of about 0.144 is above the usual 0.05 threshold, so with only 7 observations the relationship is not statistically significant.
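
Spearman's coefficient is the Pearson coefficient applied to the ranks of the data, with tied values sharing the average of their rank positions. The sketch below reproduces the result above by hand, using only the standard library. The helper names `average_ranks` and `pearson_r` are hypothetical and exist only for this illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


sleep = [8, 7, 6, 5, 9, 5, 4]
job_satisfaction = [9, 8, 7, 6, 5, 4, 3]

sleep_ranks = average_ranks(sleep)
satisfaction_ranks = average_ranks(job_satisfaction)

# Spearman's rho is the Pearson correlation of the two rank vectors
rho_manual = pearson_r(sleep_ranks, satisfaction_ranks)
print("Sleep ranks:", sleep_ranks)
print("Spearman's rho from ranks:", rho_manual)
```

The two sleep values of 5 are tied, so each receives rank 2.5. The resulting coefficient should match the `spearmanr` output above.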

## Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [27]:
import numpy as np
from scipy.stats import spearmanr, pearsonr

# create dummy sample
np.random.seed(123)
arr = np.linspace(15, 30, 50)
bmi = np.round(arr, 1)
hours = np.random.randint(0, 6, 50)

# Calculate the Spearman's rank correlation coefficient
spear, p_val = spearmanr(hours, bmi)

# Calculate the Pearson correlation coefficient
pear, p_value = pearsonr(hours, bmi)

print("Pearson correlation coefficient:", pear)
print("Spearman's rank correlation coefficient:", spear)

Pearson correlation coefficient: 0.09892065761030128
Spearman's rank correlation coefficient: 0.10103330620173835


We can conclude:

- Both the Pearson correlation coefficient (about 0.099) and the Spearman's rank correlation coefficient (about 0.101) are positive but close to zero, which suggests a very weak positive correlation between the number of hours of exercise per week and BMI that can be neglected.

- So we can say there is essentially no correlation between the number of hours of exercise per week and BMI in this sample. This is expected, because the dummy exercise hours were generated at random, independently of BMI.

- The two coefficients differ only slightly: the Spearman's rank correlation coefficient is marginally higher than the Pearson correlation coefficient.
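
The two coefficients can diverge much more when a relationship is monotonic but not linear. The sketch below uses invented values (not the exercise and BMI sample) in which `y` is the cube of `x`, so the ordering is preserved perfectly while the points do not lie on a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Invented illustration data: y increases with x, but not linearly
x = np.arange(1, 11)
y = x ** 3

pear, _ = pearsonr(x, y)
spear, _ = spearmanr(x, y)

print("Pearson correlation coefficient:", pear)
print("Spearman's rank correlation coefficient:", spear)
```

Here Spearman's coefficient reaches 1 because the ranks of `x` and `y` agree exactly, while Pearson's coefficient stays below 1 because it measures only the linear part of the relationship.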

## A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [35]:
import numpy as np
from scipy.stats import pearsonr

# create dummy sample
np.random.seed(50)
activity = np.random.randint(1, 10, 50)
tv_hours = np.random.randint(0, 6, 50)


# Calculate the Pearson correlation coefficient
pear, p_value = pearsonr(tv_hours, activity)

print("Pearson correlation coefficient:", pear)

Pearson correlation coefficient: 0.13635879341904195


We can conclude:

- The Pearson correlation coefficient (about 0.136) is positive but close to zero, which suggests a very weak positive correlation between the number of hours spent watching television per day and the level of physical activity that can be neglected.

- So we can say there is essentially no correlation between the number of hours spent watching television per day and the level of physical activity in this sample.
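
The verbal labels used in these conclusions can be made explicit with a small helper. This is only a sketch: the function name `describe_strength` and its cut-offs (0.2, 0.4, 0.7) are assumptions chosen for illustration, and conventions differ between fields.

```python
def describe_strength(r):
    """Map a correlation coefficient to a rough verbal label.

    The cut-offs are an assumed rule of thumb, not a standard.
    """
    size = abs(r)
    if size < 0.2:
        return "negligible"
    if size < 0.4:
        label = "weak"
    elif size < 0.7:
        label = "moderate"
    else:
        label = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


# Coefficients reported in the cells above
for name, r in [
    ("sleep vs job satisfaction (Spearman)", 0.6126374746329801),
    ("exercise vs BMI (Pearson)", 0.09892065761030128),
    ("TV hours vs activity (Pearson)", 0.13635879341904195),
]:
    print(f"{name}: {describe_strength(r)}")
```

A label like this describes only the size of the coefficient. Whether the relationship is statistically significant still depends on the sample size and the p-value.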

## A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

#### 'Age': [25, 42, 37, 19, 31, 28], 'Soft drink Preference': ['Coke', 'Pepsi', 'Mountain dew', 'Coke', 'Pepsi', 'Coke']

In [58]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

data = {
    'Age': [25, 42, 37, 19, 31, 28],
    'Preferences': ['Coke', 'Pepsi', 'MountainDew', 'Coke', 'Pepsi', 'Coke']}

df = pd.DataFrame(data)
encoder = OneHotEncoder()

# One-hot encode the brands. OneHotEncoder sorts the categories, so the column
# names are taken from the fitted encoder instead of being hard-coded.
val = encoder.fit_transform(df[['Preferences']]).toarray()
df1 = pd.DataFrame(val, columns=encoder.categories_[0])
df2 = pd.concat([df, df1], axis=1)
df2 = df2.drop(columns=['Preferences'])

df2.corr()


                  Age      Coke  MountainDew     Pepsi
Age          1.000000 -0.837240     0.394132  0.576439
Coke        -0.837240  1.000000    -0.447214 -0.707107
MountainDew  0.394132 -0.447214     1.000000 -0.316228
Pepsi        0.576439 -0.707107    -0.316228  1.000000
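
Because brand preference is a nominal variable, a simple complementary view is the mean age of the respondents who prefer each brand. The sketch below reuses the survey data from this cell. With only six respondents, any pattern is tentative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Preferences': ['Coke', 'Pepsi', 'MountainDew', 'Coke', 'Pepsi', 'Coke'],
})

# Average age of the respondents who prefer each brand
mean_age = df.groupby('Preferences')['Age'].mean()
print(mean_age)
```

In this sample the three respondents who prefer Coke average 24 years, compared with 36.5 for the two who prefer Pepsi and 37 for the single respondent who prefers Mountain Dew.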


## A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [61]:
from scipy.stats import pearsonr
import numpy as np

# generate dummy data for sales calls per day and sales per week
np.random.seed(123)
sales_call = np.random.randint(20,60,30)
sales_per_week = np.random.randint(50, 80,30)


# Calculate the Pearson correlation coefficient
corr, pval = pearsonr(sales_call, sales_per_week)

print("Pearson correlation coefficient:", corr)

Pearson correlation coefficient: 0.1903819723067468
