**[Q1] Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.**

To calculate the Pearson correlation coefficient between the amount of time students spend studying and their final exam scores, you can use the following formula:

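Pearson correlation coefficient (r) = Σ((x - x̄)(y - ȳ)) / (n σx σy)

Where:

 - Σ denotes the sum over all data points
 - x and y are the individual data points
 - x̄ and ȳ are the means of x and y
 - n is the number of data points
 - σx and σy are the population standard deviations of x and y (dividing by n); if sample standard deviations (dividing by n - 1) are used instead, replace n with n - 1 in the formula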
 
 The result will be a number between -1 and 1.

 - If r is close to 1, it means there's a strong positive linear relationship between study time and exam scores.
 - If r is close to -1, it means there's a strong negative linear relationship.
 - If r is close to 0, it means there's little to no linear relationship.
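
As a cross-check on the formula above, here is a minimal sketch that computes r directly from the definition using only the standard library. The helper name `pearson_r` is an illustrative assumption, not part of pandas; the data are the study-time and exam-score values used in this question.

```python
import math

# Data from the study-time example in this question.
study_time = [3, 2, 1, 8, 6, 0, 5]
exam_score = [65, 50, 20, 83, 78, 10, 70]

def pearson_r(x, y):
    """Pearson r computed directly from the definition (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    dev_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Population standard deviations (divide by n), matching the n in the denominator.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return dev_sum / (n * sd_x * sd_y)

r = pearson_r(study_time, exam_score)
print(round(r, 6))
```

The printed value should agree with the 0.920006 that `data.corr(method='pearson')` reports for the same data.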

In [1]:
import pandas as pd
data = {
    'study_time':[3,2,1,8,6,0,5],
    'exam_score':[65,50,20,83,78,10,70]
}
data = pd.DataFrame(data)
data.corr(method='pearson')

| | study_time | exam_score |
|---|---|---|
| study_time | 1.0 | 0.920006 |
| exam_score | 0.920006 | 1.0 |


Here, r ≈ 0.92, which indicates a strong positive linear relationship: students who studied longer tended to score higher.

**[Q2] Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.**

To calculate the Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level, you can follow these steps:

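  1) Rank the data for each variable separately, from lowest to highest, giving tied values the average of the ranks they span.
  2) Calculate the difference in ranks for each pair of data points.
  3) Square each of these differences.
  4) Sum all the squared differences.
  5) Use the formula:

 Spearman's rank correlation (ρ) = 1 - (6 Σ(d^2)) / (n (n^2 - 1))

Where:

 - Σ denotes the sum over all data points
 - d is the difference in ranks for each pair of data points
 - n is the number of data points

This shortcut formula is exact only when there are no tied ranks. With ties, ρ is defined as the Pearson correlation of the two rank series, which is what pandas computes with method='spearman'.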
 
The result will be a number between -1 and 1.

  - If ρ is close to 1, it means there's a strong positive monotonic relationship between sleep and job satisfaction.
  - If ρ is close to -1, it means there's a strong negative monotonic relationship.
  - If ρ is close to 0, it means there's little to no monotonic relationship.
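
The steps above can be sketched in plain Python as follows. This is an illustration rather than the pandas implementation: the helper names `average_ranks` and `pearson_r` are assumptions introduced here, and the data are the sleep and job-satisfaction values used in this question. Because both columns contain tied values, the sketch computes ρ two ways so the effect of ties is visible.

```python
import math

sleep_hours = [2, 4, 7, 9, 10, 4, 8, 6]
satisfaction = [1, 3, 6, 8, 7, 3, 9, 8]

def average_ranks(values):
    """Rank from lowest to highest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rank_sleep = average_ranks(sleep_hours)
rank_sat = average_ranks(satisfaction)

# Steps 2-5: shortcut formula based on squared rank differences.
n = len(sleep_hours)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_sleep, rank_sat))
rho_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Tie-aware definition: Pearson correlation of the ranks.
rho_ranks = pearson_r(rank_sleep, rank_sat)

print(round(rho_shortcut, 6), round(rho_ranks, 6))
```

The two values differ slightly (about 0.756 versus 0.752) because of the ties; the rank-based value is the one that matches the 0.751529 reported by `data.corr(method='spearman')`.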

In [2]:
data = {
    'sleep (in hrs)':[2,4,7,9,10,4,8,6],
    'job satisfaction level':[1,3,6,8,7,3,9,8]
}
data = pd.DataFrame(data)
data.corr(method='spearman')

| | sleep (in hrs) | job satisfaction level |
|---|---|---|
| sleep (in hrs) | 1.0 | 0.751529 |
| job satisfaction level | 0.751529 | 1.0 |


Here, ρ ≈ 0.75, so sleep and job satisfaction level have a fairly strong positive monotonic relationship: people who slept more tended to report higher satisfaction.

**[Q3] Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.**

In [3]:
import numpy as np
import random

In [12]:
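def add_people(n):
    # Simulate n participants. The two variables are drawn independently,
    # so no real relationship between them is built into the data.
    no_of_hrs = []
    BMI = []
    for i in range(n):
        hrs = random.randrange(1,10)    # weekly exercise hours, 1 to 9
        no_of_hrs.append(hrs)
        bmi = random.randrange(40,140)  # placeholder values, 40 to 139 (not a realistic BMI range)
        BMI.append(bmi)
    return no_of_hrs,BMI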

In [13]:
hrs,bmi = add_people(50)
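
Because `add_people` draws from Python's unseeded global generator, every run produces different data, so the coefficients shown in this question will not be reproduced exactly. A minimal sketch of a repeatable variant follows; `add_people_seeded` and its `seed` parameter are hypothetical names introduced here, and the value ranges simply copy the ones used in `add_people`.

```python
import random

def add_people_seeded(n, seed):
    """Hypothetical variant of add_people that takes a seed for repeatable output."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    no_of_hrs = [rng.randrange(1, 10) for _ in range(n)]
    bmi = [rng.randrange(40, 140) for _ in range(n)]
    return no_of_hrs, bmi

hrs_a, bmi_a = add_people_seeded(50, seed=42)
hrs_b, bmi_b = add_people_seeded(50, seed=42)

# The same seed yields the same simulated sample on every call.
print(hrs_a == hrs_b and bmi_a == bmi_b)
```

The draw order differs from `add_people` (all hours first, then all BMI values), so this sketch does not reproduce the original notebook's sample; it only shows how to make a sample repeatable.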

In [14]:
data = {
    'No_of_hrs' : hrs,
    'BMI':bmi
}

In [17]:
import pandas as pd
data = pd.DataFrame(data)

In [18]:
data.head()

| | No_of_hrs | BMI |
|---|---|---|
| 0 | 4 | 42 |
| 1 | 9 | 100 |
| 2 | 4 | 84 |
| 3 | 4 | 80 |
| 4 | 8 | 64 |


In [19]:
data.corr(method='pearson')

| | No_of_hrs | BMI |
|---|---|---|
| No_of_hrs | 1.0 | -0.132631 |
| BMI | -0.132631 | 1.0 |


In [20]:
data.corr(method='spearman')

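| | No_of_hrs | BMI |
|---|---|---|
| No_of_hrs | 1.0 | -0.141134 |
| BMI | -0.141134 | 1.0 |

Both coefficients are small and negative (about -0.13 for Pearson and -0.14 for Spearman), so neither indicates a meaningful linear or monotonic relationship. That is expected here, because the exercise hours and BMI values were generated independently at random.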
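
To show when the two coefficients do diverge, here is a small sketch on made-up data (not the study data): y is the cube of x, so the relationship is perfectly monotonic but not linear. The helper names `pearson_r` and `ranks` are assumptions introduced for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n, lowest to highest; assumes the values contain no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

# Illustrative data: strictly increasing but curved relationship.
x = list(range(1, 11))
y = [v ** 3 for v in x]

pearson = pearson_r(x, y)
spearman = pearson_r(ranks(x), ranks(y))  # Spearman = Pearson on the ranks
print(round(pearson, 3), round(spearman, 3))
```

Spearman's coefficient is 1 because the ordering of y follows the ordering of x exactly, while Pearson's stays below 1 (roughly 0.93) because the points do not lie on a straight line.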


**[Q4] A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.**

In [21]:
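def add_participants(n):
    # Simulate n participants. The two variables are drawn independently,
    # so any correlation in the output is due to chance.
    no_of_hrs = []
    physical_activity = []
    for i in range(n):
        hrs = random.randrange(1,13)      # daily TV hours, 1 to 12
        no_of_hrs.append(hrs)
        activity = random.randrange(5)    # activity level, 0 to 4
        physical_activity.append(activity)
    return no_of_hrs,physical_activity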

In [22]:
hrs,activity = add_participants(50)

In [23]:
data = {
    'hrs':hrs,
    'physical activity': activity
}

In [24]:
data = pd.DataFrame(data)

In [25]:
data.head()

| | hrs | physical activity |
|---|---|---|
| 0 | 9 | 1 |
| 1 | 10 | 0 |
| 2 | 12 | 4 |
| 3 | 8 | 0 |
| 4 | 12 | 4 |


In [26]:
data.corr(method='pearson')

| | hrs | physical activity |
|---|---|---|
| hrs | 1.0 | -0.007823 |
| physical activity | -0.007823 | 1.0 |


In [27]:
data.corr(method='spearman')

| | hrs | physical activity |
|---|---|---|
| hrs | 1.0 | 0.001185 |
| physical activity | 0.001185 | 1.0 |


**[Q5] A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below.**

In [28]:
data = {
    'age': [25,42,37,19,31,28],
    'soft drink preferred' : ['Coke','pepsi','Mountain dew','Coke','pepsi','Coke']
}

In [29]:
data = pd.DataFrame(data)

In [31]:
data.head()

| | age | soft drink preferred |
|---|---|---|
| 0 | 25 | Coke |
| 1 | 42 | pepsi |
| 2 | 37 | Mountain dew |
| 3 | 19 | Coke |
| 4 | 31 | pepsi |


In [32]:
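data['age_rank'] = data['age'].rank()
# Caution: rank() on strings sorts them lexicographically ('Coke' < 'Mountain dew' < 'pepsi').
# Brand is a nominal variable with no natural order, so this ranking is arbitrary
# and the coefficient below depends on it rather than on a real age/brand trend.
data['soft_drink_rank'] = data['soft drink preferred'].rank()

spearman_corr = data['age_rank'].corr(data['soft_drink_rank'], method='spearman')
print("Spearman's rank correlation coefficient:", spearman_corr)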

Spearman's rank correlation coefficient: 0.8332380897952965


In [33]:
data.head()

| | age | soft drink preferred | age_rank | soft_drink_rank |
|---|---|---|---|---|
| 0 | 25 | Coke | 2.0 | 2.0 |
| 1 | 42 | pepsi | 6.0 | 5.5 |
| 2 | 37 | Mountain dew | 5.0 | 4.0 |
| 3 | 19 | Coke | 1.0 | 2.0 |
| 4 | 31 | pepsi | 4.0 | 5.5 |
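
The soft drink ranks above come from sorting the brand names as strings, which is an arbitrary order for a nominal variable. The sketch below, using the same six survey rows, recomputes Spearman's coefficient for every possible ordering of the three brands to show how much the result depends on that choice. The helper names (`average_ranks`, `pearson_r`, `spearman_for_ordering`) are assumptions introduced for this illustration.

```python
import math
from itertools import permutations

ages = [25, 42, 37, 19, 31, 28]
brands = ['Coke', 'pepsi', 'Mountain dew', 'Coke', 'pepsi', 'Coke']

def average_ranks(values):
    """Rank from lowest to highest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_for_ordering(ordering):
    """Spearman's rho after coding each brand by its position in `ordering`."""
    codes = [ordering.index(b) for b in brands]
    return pearson_r(average_ranks(ages), average_ranks(codes))

# Try every way of ordering the three brand labels.
results = {ordering: spearman_for_ordering(ordering)
           for ordering in permutations(sorted(set(brands)))}
for ordering, rho in results.items():
    print(ordering, round(rho, 3))
```

The string-sorted ordering ('Coke', 'Mountain dew', 'pepsi') reproduces the 0.833 shown above, while other orderings give different values, including negative ones. For this reason the coefficient should not be read as evidence of an association between age and brand preference.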


**[Q6] A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.**

In [35]:
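# Note: this redefines add_participants from Q4.
def add_participants(n):
    # Simulate n sales representatives. The two variables are drawn independently,
    # so any correlation in the output is due to chance.
    sales_call = []
    sales_made = []
    for i in range(n):
        call = random.randrange(1,20)    # sales calls per day, 1 to 19
        sales_call.append(call)
        sales = random.randrange(1,30)   # sales made per week, 1 to 29
        sales_made.append(sales)
    return sales_call,sales_made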

In [36]:
call,sales = add_participants(30)

In [37]:
data = {
    'sales call' : call,
    'sales made(per week)' : sales
}


In [38]:
data = pd.DataFrame(data)
data.head()

| | sales call | sales made(per week) |
|---|---|---|
| 0 | 10 | 27 |
| 1 | 15 | 12 |
| 2 | 17 | 24 |
| 3 | 18 | 5 |
| 4 | 15 | 27 |


In [39]:
data.corr(method='pearson')

| | sales call | sales made(per week) |
|---|---|---|
| sales call | 1.0 | 0.210549 |
| sales made(per week) | 0.210549 | 1.0 |
