Q1. Pearson Correlation Coefficient

Let's assume the data for the amount of time students spend studying (in hours) and their corresponding exam scores are as follows:

Study Time (hours)    Exam Score
10                    85
5                     65
8                     75
3                     50
7                     70

In [1]:
from scipy.stats import pearsonr

study_time = [10, 5, 8, 3, 7]
exam_score = [85, 65, 75, 50, 70]

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(study_time, exam_score)

print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")

Pearson Correlation Coefficient: 0.99


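Interpretation: The coefficient of 0.99 is close to +1, which indicates a very strong positive linear relationship between study time and exam scores in this sample. In other words, students who spent more time studying tended to achieve higher exam scores. With only five observations the estimate is imprecise, and a correlation on its own does not show that studying causes the higher scores.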
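
To make the calculation behind the coefficient concrete, the sketch below recomputes it by hand from the same five data points. The helper name `pearson_r` is introduced here for illustration only; it is not part of SciPy. It divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations.

```python
import math

study_time = [10, 5, 8, 3, 7]
exam_score = [85, 65, 75, 50, 70]

def pearson_r(x, y):
    """Illustrative helper: Pearson r from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r_manual = pearson_r(study_time, exam_score)
print(f"Manual Pearson r: {r_manual:.2f}")  # prints 0.99
```

The hand-computed value agrees with the `pearsonr` result to two decimal places.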

Q2. Spearman's Rank Correlation

Let's assume the data for the amount of sleep individuals get (in hours) and their job satisfaction levels are as follows:

Sleep (hours)    Job Satisfaction
7                8
6                6
8                9
5                3
6                5

In [2]:
from scipy.stats import spearmanr

sleep_hours = [7, 6, 8, 5, 6]
job_satisfaction = [8, 6, 9, 3, 5]

# Calculate Spearman's rank correlation
spearman_corr, _ = spearmanr(sleep_hours, job_satisfaction)

print(f"Spearman's Rank Correlation: {spearman_corr:.2f}")

Spearman's Rank Correlation: 0.97


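Interpretation: The coefficient of 0.97 is close to +1, which indicates a very strong monotonic positive relationship between sleep hours and job satisfaction in this sample. In other words, individuals who got more sleep tended to report higher job satisfaction. Because Spearman's coefficient works on ranks, it does not require the relationship to be linear; the two 6-hour sleepers are tied and share an average rank.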
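
Spearman's coefficient can be understood as the Pearson coefficient applied to ranks. The sketch below, using only the standard library, ranks both variables (tied values receive the average of their positions) and then correlates the ranks. The helper names `average_ranks` and `pearson_r` are illustrative assumptions, not SciPy functions.

```python
import math

sleep_hours = [7, 6, 8, 5, 6]
job_satisfaction = [8, 6, 9, 3, 5]

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

sleep_ranks = average_ranks(sleep_hours)            # [4.0, 2.5, 5.0, 1.0, 2.5]
satisfaction_ranks = average_ranks(job_satisfaction)
rho_manual = pearson_r(sleep_ranks, satisfaction_ranks)
print(f"Manual Spearman rho: {rho_manual:.2f}")  # prints 0.97
```

The result agrees with `spearmanr` to two decimal places.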

Q3. Pearson and Spearman Correlation for Exercise and BMI

The question refers to 50 participants. The five pairs of values below (hours of exercise per week and BMI) are an illustrative sample used to show the calculation:

In [3]:
from scipy.stats import pearsonr, spearmanr

exercise_hours = [3, 4, 5, 2, 6]
bmi = [25.2, 22.5, 28.1, 30.0, 24.8]

# Calculate Pearson correlation coefficient
pearson_corr_exercise_bmi, _ = pearsonr(exercise_hours, bmi)

# Calculate Spearman's rank correlation
spearman_corr_exercise_bmi, _ = spearmanr(exercise_hours, bmi)

print(f"Pearson Correlation Coefficient: {pearson_corr_exercise_bmi:.2f}")
print(f"Spearman's Rank Correlation: {spearman_corr_exercise_bmi:.2f}")

Pearson Correlation Coefficient: -0.40
Spearman's Rank Correlation: -0.50


Comparison:

- Pearson correlation measures linear relationships. The negative value of -0.40 suggests a weak-to-moderate tendency for BMI to decrease as exercise hours increase (an inverse relationship).
- Spearman's rank correlation assesses monotonic relationships using ranks. Its value of -0.50 is also negative, meaning participants with more exercise hours tend to rank lower on BMI. The two coefficients agree in direction and differ only modestly in size, but with five observations neither estimate is reliable.
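
The difference between the two measures is easiest to see on data that is monotonic but not linear. The sketch below uses hypothetical values (y is the cube of x, chosen only for illustration and unrelated to the exercise data) and standard-library code. The helper names `pearson_r` and `simple_ranks` are assumptions, and `simple_ranks` only works when there are no ties.

```python
import math

# Hypothetical data: perfectly monotonic, but curved rather than linear
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # [1, 8, 27, 64, 125]

def pearson_r(a, b):
    """Pearson r from deviations about the means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def simple_ranks(values):
    """Ranks 1..n; valid only when all values are distinct (no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

pearson_value = pearson_r(x, y)
spearman_value = pearson_r(simple_ranks(x), simple_ranks(y))

print(f"Pearson:  {pearson_value:.2f}")   # below 1 because the trend is curved
print(f"Spearman: {spearman_value:.2f}")  # 1.00 because the ordering is preserved
```

Spearman reaches exactly 1 because every increase in x is matched by an increase in y, while Pearson falls short of 1 because the points do not lie on a straight line.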

Q4. Pearson Correlation Coefficient for TV Watching and Physical Activity

The question refers to 50 participants. The five pairs of values below (hours of television watching per day and a scaled level of physical activity) are an illustrative sample used to show the calculation:

In [4]:
from scipy.stats import pearsonr

tv_hours = [2, 3, 4, 5, 2]
physical_activity = [7, 6, 5, 3, 7]

# Calculate Pearson correlation coefficient
pearson_corr_tv_physical, _ = pearsonr(tv_hours, physical_activity)

print(f"Pearson Correlation Coefficient: {pearson_corr_tv_physical:.2f}")

Pearson Correlation Coefficient: -0.99


Interpretation: The coefficient of -0.99 is close to -1, which indicates a very strong negative linear relationship in this sample: as the number of hours spent watching television increases, the level of physical activity decreases. This describes an association only; it does not show that watching television causes lower physical activity.

Q5. Relationship Between Age and Soft Drink Preference

Since one variable is categorical (soft drink preference, a nominal variable) and the other is continuous (age), applying the Pearson correlation coefficient to the raw category labels is not meaningful. Instead, you can calculate a point-biserial correlation coefficient or use other methods to analyze the relationship between these variables.

For point-biserial correlation, you need to convert the categorical variable (soft drink preference) into a binary variable (e.g., 1 for Coke, 0 for other) and then calculate the correlation with age. The point-biserial coefficient is mathematically the Pearson coefficient computed with one binary variable.

Assuming you have the following data:

In [5]:
from scipy.stats import pointbiserialr

# Sample data for age and soft drink preference
age = [25, 42, 37, 19, 31, 28]
soft_drink_preference = [1, 0, 0, 0, 1, 1]  # 1 for Coke, 0 for other

# Calculate point-biserial correlation coefficient
correlation_coefficient, p_value = pointbiserialr(soft_drink_preference, age)

print(f"Point-Biserial Correlation Coefficient: {correlation_coefficient:.2f}")
print(f"P-value: {p_value:.4f}")

Point-Biserial Correlation Coefficient: -0.31
P-value: 0.5520


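Interpretation: The point-biserial correlation coefficient takes a value between -1 and 1. A positive value suggests that choosing Coke is associated with higher age, while a negative value suggests the opposite. Here the coefficient of -0.31 indicates a weak negative association: in this sample, respondents who chose Coke were somewhat younger on average than those who chose another drink. The p-value of 0.5520 is well above the conventional 0.05 threshold, so with six observations there is no statistically significant evidence of a relationship between age and soft drink preference.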
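
The point-biserial coefficient can also be written in terms of the two group means, which makes the sign easier to read. The sketch below uses the textbook form r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where M1 and M0 are the mean ages of the Coke and non-Coke groups and s_n is the population standard deviation of age (dividing by n). The variable names are introduced here for illustration.

```python
import math
import statistics

age = [25, 42, 37, 19, 31, 28]
soft_drink_preference = [1, 0, 0, 0, 1, 1]  # 1 for Coke, 0 for other

# Split ages by the binary preference
ages_coke = [a for a, p in zip(age, soft_drink_preference) if p == 1]
ages_other = [a for a, p in zip(age, soft_drink_preference) if p == 0]

n1, n0, n = len(ages_coke), len(ages_other), len(age)
mean_diff = statistics.mean(ages_coke) - statistics.mean(ages_other)
s_n = statistics.pstdev(age)  # population standard deviation (divides by n)

r_pb = (mean_diff / s_n) * math.sqrt(n1 * n0 / n ** 2)
print(f"Mean age (Coke):  {statistics.mean(ages_coke):.2f}")
print(f"Mean age (other): {statistics.mean(ages_other):.2f}")
print(f"Point-biserial r from group means: {r_pb:.2f}")  # prints -0.31
```

The coefficient is negative because the Coke group's mean age (28) is lower than the other group's (about 32.67), and it agrees with the `pointbiserialr` result to two decimal places.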