## Question 1: Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [1]:
import numpy as np

In [2]:
from scipy.stats import pearsonr

In [3]:
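np.random.seed(123)  # fix the seed so the simulated data are reproducible

# Simulated data: 0-10 hours of study; score = hours + Gaussian noise (sd = 1)
study_time = np.random.rand(100) * 10
exam_scores = study_time + np.random.normal(0, 1, 100)

correlation_coefficient, p_value = pearsonr(study_time, exam_scores)

print('Pearson Correlation Coefficient :', correlation_coefficient)
print('p-value :', p_value)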

Pearson Correlation Coefficient : 0.9253399134745838
p-value : 4.574998769989444e-43


The Pearson correlation coefficient is about 0.925, indicating a strong positive linear relationship between the amount of time students spend studying and their final exam scores. The p-value (about 4.6e-43) is far below 0.05, so the correlation is statistically significant. In this simulated sample, students who spend more time studying tend to achieve higher exam scores.
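As a cross-check on what `pearsonr` reports, the coefficient can also be computed directly from its definition (covariance divided by the product of the standard deviations). The sketch below is illustrative only: the helper name `pearson_manual`, the variable names, and the seed are assumptions, not part of the original notebook.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_manual(x, y):
    """Pearson r from its definition: sum of cross-deviations over the
    square root of the product of the summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


# Simulated data in the same spirit as the cell above (seed chosen arbitrarily)
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=100)
scores = hours + rng.normal(0, 1, size=100)

r_manual = pearson_manual(hours, scores)
r_scipy, _ = pearsonr(hours, scores)

print('manual:', r_manual)
print('scipy :', r_scipy)
```

The two values should agree up to floating-point rounding.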

## Question 2: Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

In [5]:
import numpy as np 
from scipy.stats import spearmanr

sleep=np.random.randint(1,11,size=100)
job_satisfaction=np.random.randint(1,11,size=100)

correlation , p_value = spearmanr(sleep,job_satisfaction)

print("Spearman's Rank Correlation Coefficient :" , correlation)
print('p-value:',p_value)

Spearman's Rank Correlation Coefficient : -0.007320696450250673
p-value: 0.9423730882647536


The Spearman's rank correlation coefficient between amount of sleep and job satisfaction is about -0.007, which is essentially zero: there is no evidence of a monotonic relationship. The p-value of 0.942 is far above 0.05, so the correlation is not statistically significant. This is the expected outcome here, because the two variables were generated independently of each other.
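Spearman's coefficient is the Pearson coefficient applied to the ranks of the data, with tied values sharing their average rank. A minimal sketch of that equivalence, using illustrative variable names and an arbitrary seed that are not from the original notebook:

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Two independent 1-10 integer variables, as in the cell above
rng = np.random.default_rng(1)
sleep_hours = rng.integers(1, 11, size=100)
satisfaction = rng.integers(1, 11, size=100)

# Spearman directly
rho_scipy, _ = spearmanr(sleep_hours, satisfaction)

# Pearson on average ranks (rankdata uses method='average' by default)
rho_ranks, _ = pearsonr(rankdata(sleep_hours), rankdata(satisfaction))

print('spearmanr       :', rho_scipy)
print('pearsonr(ranks) :', rho_ranks)
```

Both lines should print the same value up to rounding.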

## Question 3: Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [6]:
import numpy as np 
from scipy.stats import pearsonr , spearmanr


hours_of_exercise = np.random.randint(1,10,50)
bmi=np.random.uniform(18,35,50)

pearson_corr, _ = pearsonr(hours_of_exercise,bmi)
spearman_corr,_=spearmanr(hours_of_exercise,bmi)

print('Pearson correlation coefficient:',pearson_corr)
print("Spearman's rank correlation:",spearman_corr)

if abs(pearson_corr)>abs(spearman_corr):
    print('Pearson correlation coefficient is stronger.')
    
else :
    print("Spearman's rank correlation is stronger.")

Pearson correlation coefficient: -0.24550075083803355
Spearman's rank correlation: -0.28010118429840036
Spearman's rank correlation is stronger.


Interpretation: Both coefficients are negative and small in magnitude (Pearson about -0.246, Spearman about -0.280), indicating a weak negative association: in this sample, more hours of exercise per week go with slightly lower BMI. Spearman's coefficient is slightly larger in absolute value, but a gap of about 0.03 is too small to conclude that the relationship is more monotonic than linear. Because the code discards both p-values, the output alone does not show whether either correlation is statistically significant.
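The two coefficients separate clearly only when a relationship is monotonic but not linear. The sketch below uses a made-up exponential relationship (not the exercise/BMI data) to illustrate the contrast; all names in it are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# A strictly increasing but strongly curved relationship
x = np.linspace(0, 5, 50)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

# Spearman sees a perfect monotonic ordering; Pearson is penalised by the curvature
print('Pearson :', pearson_r)
print('Spearman:', spearman_rho)
```

Here Spearman's coefficient equals 1 because the ordering of `y` matches the ordering of `x` exactly, while Pearson's coefficient is noticeably below 1.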

## Question 4: A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [7]:
import numpy as np 
from scipy.stats import pearsonr

hours_of_tv=np.random.randint(1,6,size=50)
physical_activity=np.random.randint(1,11,size=50)

pearson_corr,_=pearsonr(hours_of_tv,physical_activity)

print('Pearson correlation coefficient:',pearson_corr)

Pearson correlation coefficient: -0.11002499138979177


Based on the Pearson correlation coefficient of about -0.110, there is a weak negative correlation between the number of hours individuals spend watching television per day and their level of physical activity. As TV hours increase, physical activity tends to decrease slightly in this sample, but the relationship is weak. The p-value was discarded in the code, so the output does not show whether the correlation differs significantly from zero.
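One way to gauge how much a coefficient of this size means at n = 50 is an approximate confidence interval from the Fisher z-transformation. The sketch below plugs in the printed coefficient and the sample size from the question; the helper `fisher_ci` is a hypothetical name, and the method assumes roughly bivariate normal data, which the integer-valued variables here only loosely satisfy.

```python
import numpy as np
from scipy.stats import norm


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z-transform."""
    z = np.arctanh(r)                   # transform r to the z scale
    se = 1.0 / np.sqrt(n - 3)           # approximate standard error of z
    z_crit = norm.ppf(0.5 + level / 2)  # two-sided critical value
    return float(np.tanh(z - z_crit * se)), float(np.tanh(z + z_crit * se))


# r as printed in the output above, n from the question text
ci_low, ci_high = fisher_ci(-0.11002499138979177, 50)
print(f'95% CI for r: [{ci_low:.3f}, {ci_high:.3f}]')
```

The interval runs from roughly -0.38 to 0.17 and includes zero, which is consistent with treating this correlation as indistinguishable from no relationship at this sample size.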

## Question 5: A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

| Age (years) | Soft drink preference |
|---|---|
| 25 | Coke |
| 42 | Pepsi |
| 37 | Mountain Dew |
| 19 | Coke |
| 31 | Pepsi |
| 28 | Coke |


The task is to calculate the Spearman's rank correlation between age and soft drink preference from the survey results.

Spearman's coefficient needs values that can be ranked, so the brand names must first be converted to numeric codes; the coefficient is then computed on age and those codes.

Here's the solution in Python:

In [8]:
import pandas as pd 

from scipy.stats import spearmanr

data = {
    'Age': [25, 42, 37, 19, 31, 28],
    'Soft Drink Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
}

df=pd.DataFrame(data)

ranked_df=df.copy()

ranked_df['Soft Drink Preference']=ranked_df['Soft Drink Preference'].astype('category').cat.codes

corr,_=spearmanr(ranked_df['Age'],ranked_df['Soft Drink Preference'])

print(f"Spearman's rank correlation coefficient: {corr:.4f}")

Spearman's rank correlation coefficient: 0.8332


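The Spearman's rank correlation coefficient of 0.8332 looks like a strong positive monotonic relationship, but it has to be read with care. Soft drink preference is a nominal variable with no natural order, and `cat.codes` simply numbers the brands alphabetically (Coke = 0, Mountain Dew = 1, Pepsi = 2). The coefficient therefore says only that, in this sample of six, older respondents chose brands that come later in the alphabet; reversing the coding would flip the sign to -0.8332. It does not show that preference "increases" with age. What the data do suggest is that the three Coke drinkers (19, 25, 28) are younger than the Pepsi and Mountain Dew drinkers (31, 37, 42). A method designed for comparing a numeric variable across unordered groups, such as a Kruskal-Wallis test, would be a better fit, and with only six observations any conclusion is tentative. As always, correlation does not imply causation.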
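Because the brand codes are arbitrary, it is worth checking how much the coefficient depends on them. The sketch below recomputes Spearman's coefficient for every possible assignment of the codes 0, 1, 2 to the three brands; the loop and variable names are illustrative additions, not part of the original solution.

```python
from itertools import permutations

from scipy.stats import spearmanr

ages = [25, 42, 37, 19, 31, 28]
brands = ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
labels = sorted(set(brands))  # ['Coke', 'Mountain Dew', 'Pepsi']

results = {}
for codes in permutations(range(len(labels))):
    mapping = dict(zip(labels, codes))  # one arbitrary brand -> integer coding
    rho, _ = spearmanr(ages, [mapping[b] for b in brands])
    results[codes] = rho
    print(mapping, f'rho = {rho:.4f}')
```

The alphabetical coding `(0, 1, 2)` reproduces 0.8332, the reversed coding gives the same magnitude with the opposite sign, and other codings give different values, so the number describes the coding as much as the data.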



## Question 6 : A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [9]:
import numpy as np 

sales_calls=np.random.randint(50,100,size=30)
sales_per_week=np.random.randint(5,20,size=30)

pearson_corr=np.corrcoef(sales_calls , sales_per_week)[0,1]

print("Pearson correlation coefficient:" , pearson_corr)

Pearson correlation coefficient: 0.12710174513711348


The Pearson correlation coefficient of about 0.127 indicates a very weak positive linear relationship between the number of sales calls made per day and the number of sales made per week. In practical terms there is almost no linear trend between the two variables in this sample of 30 representatives. Note that `np.corrcoef` returns no p-value, so statistical significance cannot be judged from this output, and a correlation on its own says nothing about whether making more calls causes more sales.
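If a significance test is wanted alongside the coefficient, `scipy.stats.pearsonr` returns the same statistic as `np.corrcoef` plus a two-sided p-value. A minimal sketch on freshly simulated data (the seed and variable names are assumptions, so the numbers will not match the output above):

```python
import numpy as np
from scipy.stats import pearsonr

# Simulated data with the same ranges as the cell above
rng = np.random.default_rng(42)
calls = rng.integers(50, 100, size=30)
sales = rng.integers(5, 20, size=30)

r_numpy = np.corrcoef(calls, sales)[0, 1]  # coefficient only
r_scipy, p_value = pearsonr(calls, sales)  # coefficient and p-value

print('np.corrcoef:', r_numpy)
print('pearsonr   :', r_scipy, 'p-value:', p_value)
```

The two coefficients agree up to rounding; only `pearsonr` supplies the p-value needed for a statement about significance.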