## Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

Answer:
The Pearson correlation coefficient (r) measures the strength and direction of a linear relationship between two variables. It ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).

In [3]:
import numpy as np
from scipy.stats import pearsonr

study_hours = [2, 4, 6, 8, 10]
exam_scores = [55, 60, 65, 70, 75]

r, p = pearsonr(study_hours, exam_scores)
print("Pearson Correlation Coefficient:", r)
print("P-value:", p)


Pearson Correlation Coefficient: 1.0
P-value: 0.0


Interpretation:
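Here r = 1.0: the sample scores lie exactly on the line score = 50 + 2.5 × hours, so the correlation is perfectly positive and exam scores rise with study time. The reported p-value is below 0.05, so the correlation is statistically significant for this sample, but with only five observations the result should not be generalized.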
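
As a cross-check, r can also be computed directly from its definition: the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below reuses the same five data points; the names `dx`, `dy`, and `r_manual` are illustrative and not part of the original cell.

```python
import numpy as np

# Same sample as in the cell above
study_hours = np.array([2, 4, 6, 8, 10], dtype=float)
exam_scores = np.array([55, 60, 65, 70, 75], dtype=float)

# Deviations of each observation from its variable's mean
dx = study_hours - study_hours.mean()
dy = exam_scores - exam_scores.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
print("Manual Pearson r:", r_manual)
```

The manual value should agree with the `pearsonr` result, which confirms that the library call implements the usual formula.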

## Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

Answer:
Spearman's rank correlation (rho) is computed on the ranks of the data rather than the raw values, so it is appropriate when the relationship is monotonic but not necessarily linear.

In [6]:
from scipy.stats import spearmanr

sleep_hours = [6, 5, 7, 8, 4]
job_satisfaction = [7, 6, 8, 9, 5]

rho, p = spearmanr(sleep_hours, job_satisfaction)
print("Spearman's Rank Correlation:", rho)
print("P-value:", p)


Spearman's Rank Correlation: 0.9999999999999999
P-value: 1.4042654220543672e-24


Interpretation:
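Here rho ≈ 1 (the value 0.9999999999999999 is floating-point rounding): ranking the individuals by sleep gives exactly the same order as ranking them by job satisfaction, which is a perfect increasing monotonic relationship. The p-value is far below 0.05, but SciPy derives it from an approximation suited to larger samples, so with only five observations it should be read with caution.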
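
To make the rank-based nature of the coefficient concrete, the following sketch converts both variables to ranks with `scipy.stats.rankdata` and then takes the ordinary Pearson correlation of those ranks. The names `sleep_ranks`, `satisfaction_ranks`, and `rho_manual` are illustrative and not part of the original cell.

```python
import numpy as np
from scipy.stats import rankdata

# Same sample as in the cell above
sleep_hours = [6, 5, 7, 8, 4]
job_satisfaction = [7, 6, 8, 9, 5]

# Replace each value by its rank (tied values would receive the average rank)
sleep_ranks = rankdata(sleep_hours)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson correlation of the ranks is Spearman's rho
rho_manual = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]
print("Sleep ranks:", sleep_ranks)
print("Satisfaction ranks:", satisfaction_ranks)
print("Spearman via ranks:", rho_manual)
```

Because the two rank vectors are identical for this sample, the rank correlation is 1 up to floating-point rounding.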

## Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [9]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

np.random.seed(0)
exercise_hours = np.random.randint(0, 10, 50)
bmi = np.random.normal(25, 3, 50)

r_pearson, _ = pearsonr(exercise_hours, bmi)
r_spearman, _ = spearmanr(exercise_hours, bmi)

print("Pearson:", r_pearson)
print("Spearman:", r_spearman)


Pearson: -0.036502342462108525
Spearman: -0.02008825560455623


Interpretation:
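Pearson measures the strength of a linear relationship, while Spearman measures the strength of a monotonic (rank-based) relationship.

Here both coefficients are close to zero (about -0.04 and -0.02). That is expected, because the exercise and BMI values were generated independently at random rather than collected from participants. With real data, a Spearman coefficient noticeably larger in magnitude than the Pearson coefficient would suggest a monotonic but non-linear relationship, or the influence of outliers.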
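
A case where the two coefficients clearly differ can be sketched with hypothetical data in which one variable grows exponentially with the other. The arrays `x` and `y` below are made up for illustration and are unrelated to the exercise and BMI sample.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y increases with x at every step, but not along a straight line
x = np.arange(1, 11)
y = np.exp(x)

r_lin, _ = pearsonr(x, y)
r_rank, _ = spearmanr(x, y)
print("Pearson:", r_lin)
print("Spearman:", r_rank)
```

Spearman reaches 1 because the ordering of `y` follows the ordering of `x` exactly, while Pearson falls short of 1 because a straight line fits the exponential curve poorly.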

## Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [12]:
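import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-in data; no seed is set here, so the values depend on the RNG state left by earlier cells
tv_hours = np.random.randint(1, 5, 50)            # integers 1-4 (upper bound is exclusive)
physical_activity = np.random.randint(1, 10, 50)  # integers 1-9

r, p = pearsonr(tv_hours, physical_activity)
print("Pearson Correlation:", r)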


Pearson Correlation: -0.21877569199581479


## Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

In [15]:
import pandas as pd
from scipy.stats import pearsonr
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
})

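# LabelEncoder assigns integer codes in sorted label order (Coke=0, Mountain Dew=1, Pepsi=2).
# Brand is a nominal variable, so this ordering is arbitrary and the r computed below depends on it.
le = LabelEncoder()
data['Encoded'] = le.fit_transform(data['Preference'])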

r, p = pearsonr(data['Age'], data['Encoded'])
print("Pearson Correlation:", r)


Pearson Correlation: 0.7691751415594738
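
Because the brand labels have no natural order, the Pearson value above would change if the brands were numbered differently. One alternative, offered here as a sketch rather than as the intended solution, is the correlation ratio (eta squared): the share of the variance in age that is explained by brand membership. It does not depend on how the categories are coded. The names `ss_between`, `ss_total`, and `eta_squared` are illustrative.

```python
import pandas as pd

# Same survey data as in the cell above
data = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
})

overall_mean = data['Age'].mean()
groups = data.groupby('Preference')['Age']

# Between-group sum of squares: group size times squared distance
# of each group mean from the overall mean
ss_between = (groups.count() * (groups.mean() - overall_mean) ** 2).sum()

# Total sum of squares of age around the overall mean
ss_total = ((data['Age'] - overall_mean) ** 2).sum()

eta_squared = ss_between / ss_total
print("Mean age per brand:")
print(groups.mean())
print("Eta squared:", eta_squared)
```

Eta squared lies between 0 and 1. With only six respondents, and a single respondent for one of the brands, any such estimate is very unstable.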


## Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [18]:
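import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-in data; no seed is set here, so the values depend on the RNG state left by earlier cells
sales_calls = np.random.randint(10, 50, 30)   # integers 10-49 (upper bound is exclusive)
weekly_sales = np.random.randint(5, 25, 30)   # integers 5-24

r, p = pearsonr(sales_calls, weekly_sales)
print("Pearson Correlation:", r)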


Pearson Correlation: -0.19516728015115714


Interpretation:
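Here r is about -0.20, a weak negative correlation, so this sample does not show that more calls go with more sales. Because both variables were generated independently at random, the value reflects sampling noise. With real data, a positive r would indicate that representatives who make more calls tend to close more sales, which could inform performance analysis and strategy planning, but correlation alone would not show that the calls cause the sales.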
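
To judge how much weight a coefficient of this size deserves with 30 observations, an approximate 95% confidence interval can be sketched with the Fisher z-transformation. This is a standard large-sample approximation, not something the original cell computes. The names `z`, `se`, `ci_low`, and `ci_high` are illustrative, and 1.96 is the usual normal critical value.

```python
import math

r = -0.19516728015115714  # Pearson r printed above
n = 30                    # number of sales representatives

# Fisher transformation: atanh(r) is approximately normal with
# standard error 1 / sqrt(n - 3)
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
z_crit = 1.96  # approximate 97.5th percentile of the standard normal

# Transform the interval endpoints back to the correlation scale
ci_low = math.tanh(z - z_crit * se)
ci_high = math.tanh(z + z_crit * se)
print(f"Approximate 95% CI for r: [{ci_low:.3f}, {ci_high:.3f}]")
```

For these inputs the interval spans zero, which is consistent with reading the observed correlation as noise.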