Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [5]:
import numpy as np
from scipy.stats import pearsonr

study_hours = np.array([5, 3, 4, 2, 6, 1, 7, 8, 9, 10])
exam_scores = np.array([80, 60, 70, 50, 85, 40, 90, 95, 100, 105])

pearson_corr, _ = pearsonr(study_hours, exam_scores)
print(f"Pearson correlation coefficient: {pearson_corr}")

Pearson correlation coefficient: 0.9849548944236924


This indicates a very strong positive linear relationship between the amount of time spent studying and final exam scores.
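
As a cross-check, the coefficient can be reproduced from the definition of Pearson's r: the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. The sketch below is only an illustration on the same data; the helper name `pearson_from_definition` is hypothetical and not part of SciPy.

```python
import numpy as np

def pearson_from_definition(x, y):
    """Pearson r computed directly from deviations about the means (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

study_hours = np.array([5, 3, 4, 2, 6, 1, 7, 8, 9, 10])
exam_scores = np.array([80, 60, 70, 50, 85, 40, 90, 95, 100, 105])

r_manual = pearson_from_definition(study_hours, exam_scores)
print(f"Pearson r from the definition: {r_manual}")
```

The value should agree with the `pearsonr` result above up to floating-point rounding.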

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

In [6]:
from scipy.stats import spearmanr

sleep_hours = np.array([7, 6, 5, 8, 9, 4, 6, 7, 5, 8])
job_satisfaction = np.array([6, 7, 5, 9, 8, 4, 7, 6, 5, 9])

spearman_corr, _ = spearmanr(sleep_hours, job_satisfaction)
print(f"Spearman's rank correlation coefficient: {spearman_corr}")


Spearman's rank correlation coefficient: 0.8633540372670807


This indicates a strong positive monotonic relationship between the amount of sleep and job satisfaction level.
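
Spearman's coefficient is Pearson's coefficient applied to the ranks of the data, with tied values given their average rank. The sketch below illustrates this on the same data using `scipy.stats.rankdata`; the variable names are chosen for this example only.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

sleep_hours = np.array([7, 6, 5, 8, 9, 4, 6, 7, 5, 8])
job_satisfaction = np.array([6, 7, 5, 9, 8, 4, 7, 6, 5, 9])

# rankdata assigns average ranks to ties by default
sleep_ranks = rankdata(sleep_hours)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson on the ranks should match Spearman on the raw values
rho_from_ranks = pearsonr(sleep_ranks, satisfaction_ranks)[0]
rho_direct = spearmanr(sleep_hours, job_satisfaction)[0]

print(f"Pearson on ranks: {rho_from_ranks}")
print(f"spearmanr:        {rho_direct}")
```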

Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [7]:
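# No measurements were provided, so synthetic values are used.
# The two arrays are drawn independently of each other.
np.random.seed(0)
exercise_hours = np.random.randint(1, 10, 50)  # integers 1-9 (upper bound is exclusive)
bmi = np.random.randint(18, 35, 50)  # integers 18-34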

pearson_corr, _ = pearsonr(exercise_hours, bmi)
print(f"Pearson correlation coefficient: {pearson_corr}")

spearman_corr, _ = spearmanr(exercise_hours, bmi)
print(f"Spearman's rank correlation coefficient: {spearman_corr}")


Pearson correlation coefficient: -0.08613416224179103
Spearman's rank correlation coefficient: -0.07980148389523439


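Both coefficients are close to zero (about -0.09 and -0.08), so they indicate essentially no linear or monotonic relationship between exercise hours and BMI in this sample. The two measures agree closely, and neither points to a stronger association than the other. This is expected here, because the two variables were generated independently at random rather than measured.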
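
One way to judge whether a coefficient of this size is distinguishable from zero is the usual t-test for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below applies it to the Pearson value printed above, taking n = 50 from the question. It assumes the standard conditions for this test (roughly bivariate normal data), which is an assumption made here for illustration.

```python
import math
from scipy.stats import t

r = -0.08613416224179103  # Pearson coefficient printed above
n = 50                    # sample size stated in the question

# t statistic for H0: the population correlation is zero
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value = 2 * t.sf(abs(t_stat), df=n - 2)

print(f"t = {t_stat:.3f}, two-sided p = {p_value:.3f}")
```

The same p-value is available as the second return value of `pearsonr`, which the cell above discards with `_`.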

Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [8]:
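# No measurements were provided, so synthetic values are used.
# The two arrays are drawn independently of each other.
np.random.seed(0)
hours_tv = np.random.randint(1, 10, 50)  # integers 1-9 (upper bound is exclusive)
physical_activity = np.random.randint(1, 10, 50)  # integers 1-9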

pearson_corr, _ = pearsonr(hours_tv, physical_activity)
print(f"Pearson correlation coefficient: {pearson_corr}")

Pearson correlation coefficient: -0.02732671155974717


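The coefficient is close to zero (about -0.03), indicating essentially no linear relationship between hours spent watching TV and level of physical activity in this sample. The negative sign is too small to read as a tendency for physical activity to fall as TV hours rise. This is expected, because both variables were generated independently at random.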
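
To keep verbal interpretations consistent with the numbers, the magnitude of a coefficient can be mapped to a label with fixed cut-offs. The sketch below uses one common rule of thumb (|r| below 0.1 negligible, below 0.3 weak, below 0.7 moderate, otherwise strong). These thresholds are a convention assumed for this example, not a standard, and the function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Return a rough verbal label for a correlation coefficient (assumed cut-offs)."""
    size = abs(r)
    if size < 0.1:
        return "negligible"  # direction is not meaningful at this size
    if size < 0.3:
        strength = "weak"
    elif size < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(-0.02732671155974717))  # coefficient printed above
print(describe_correlation(0.9849548944236924))    # Q1 coefficient
```

Under these cut-offs the Q4 coefficient is labelled negligible and the Q1 coefficient strong positive.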

Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:

In [9]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from scipy.stats import spearmanr

ages = [25, 42, 37, 19, 31, 28]
soft_drinks = ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']

df = pd.DataFrame({'Age': ages, 'Soft Drink': soft_drinks})

encoder = LabelEncoder()
df['Soft Drink Encoded'] = encoder.fit_transform(df['Soft Drink'])

spearman_corr, _ = spearmanr(df['Age'], df['Soft Drink Encoded'])
print(f"Spearman's rank correlation coefficient: {spearman_corr}")


Spearman's rank correlation coefficient: 0.8332380897952965


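The coefficient of about 0.83 should not be read as a relationship between age and brand preference. Brand is a nominal variable with no natural order, and LabelEncoder assigns integer codes alphabetically (Coke = 0, Mountain Dew = 1, Pepsi = 2), so the ranks used by Spearman's correlation reflect that arbitrary ordering rather than anything about the drinks. With only six respondents, the data are also too few to support a conclusion about whether preference differs by age, and a correlation would not show that age influences preference in any case.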
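
The dependence on the integer coding can be seen by recomputing the coefficient with a different, equally arbitrary assignment of codes to brands. In the sketch below, `alphabetical_codes` mirrors the order LabelEncoder produces and `alternative_codes` is a hypothetical relabelling chosen only for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

ages = np.array([25, 42, 37, 19, 31, 28])
soft_drinks = ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']

# Same codes as LabelEncoder (alphabetical order of the labels)
alphabetical_codes = {'Coke': 0, 'Mountain Dew': 1, 'Pepsi': 2}
# A different but equally arbitrary coding of the same brands
alternative_codes = {'Mountain Dew': 0, 'Coke': 1, 'Pepsi': 2}

rho_alphabetical = spearmanr(ages, [alphabetical_codes[d] for d in soft_drinks])[0]
rho_alternative = spearmanr(ages, [alternative_codes[d] for d in soft_drinks])[0]

print(f"Alphabetical coding: {rho_alphabetical:.4f}")
print(f"Alternative coding:  {rho_alternative:.4f}")
```

The coefficient drops from about 0.83 to about 0.28 when only the labelling changes, which shows that the value describes the coding and not the survey responses.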

Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [10]:
import numpy as np
from scipy.stats import pearsonr

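# No measurements were provided, so synthetic values are used.
# The two arrays are drawn independently of each other.
np.random.seed(0)
sales_calls_per_day = np.random.randint(1, 20, 30)  # integers 1-19 (upper bound is exclusive)
sales_per_week = np.random.randint(1, 100, 30)  # integers 1-99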

pearson_corr, _ = pearsonr(sales_calls_per_day, sales_per_week)
print(f"Pearson correlation coefficient: {pearson_corr}")

Pearson correlation coefficient: 0.051531601640464164


The coefficient is close to zero (about 0.05), indicating essentially no linear relationship between the number of sales calls made per day and the number of sales made per week in this sample. The small positive value is not evidence that more calls are associated with more sales. This is expected, because both variables were generated independently at random.
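
An approximate confidence interval makes the uncertainty around such a small coefficient explicit. The sketch below uses the Fisher z-transformation with the coefficient printed above and n = 30 from the question. It assumes roughly bivariate normal data, which is an assumption made here for illustration, and the variable names are chosen for this example only.

```python
import math
from scipy.stats import norm

r = 0.051531601640464164  # Pearson coefficient printed above
n = 30                    # sample size stated in the question

z = math.atanh(r)              # Fisher z-transform of r
se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
z_crit = norm.ppf(0.975)       # critical value for a 95% interval

# Transform the interval endpoints back to the correlation scale
ci_low = math.tanh(z - z_crit * se)
ci_high = math.tanh(z + z_crit * se)

print(f"Approximate 95% CI for r: [{ci_low:.3f}, {ci_high:.3f}]")
```

The interval spans zero with room on both sides, so the sample is compatible with no linear relationship at all.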