#### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

Ans.

In [1]:
import numpy as np

# Illustrative data (the question supplies no values): Study Time in hours (X) and Exam Scores (Y)
X = np.array([2, 3, 5, 7, 9])
Y = np.array([50, 60, 80, 90, 95])

# Compute Pearson correlation coefficient
r = np.corrcoef(X, Y)[0, 1]
r

0.9692948572674897
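
A coefficient of about 0.97 indicates a very strong positive linear relationship: in this five-point sample, students who studied longer tended to score higher. With so few observations the estimate is imprecise, and it says nothing about causation. As a cross-check, the sketch below recomputes r directly from its definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations). The names `dx`, `dy`, and `r_manual` are introduced here for illustration only.

```python
import numpy as np

# Same values as in the cell above
X = np.array([2, 3, 5, 7, 9])
Y = np.array([50, 60, 80, 90, 95])

# Deviations of each observation from its sample mean
dx = X - X.mean()
dy = Y - Y.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(r_manual)
```

The result should agree with `np.corrcoef(X, Y)[0, 1]` up to floating-point rounding.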

---

#### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

Ans.

In [2]:
from scipy.stats import spearmanr

# Illustrative data (the question supplies no values): Sleep Hours (X) and Job Satisfaction on a 1-10 scale (Y)
X = np.array([6, 5, 8, 4, 7, 3])
Y = np.array([7, 6, 9, 3, 8, 2])

# Compute Spearman's rank correlation coefficient
rho, _ = spearmanr(X, Y)
rho


1.0
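
A value of 1.0 means the two rankings agree exactly: whoever sleeps more in this sample also reports higher job satisfaction, so the relationship is perfectly monotonic and increasing. Spearman's coefficient is the Pearson correlation applied to ranks, which the following sketch makes explicit using `scipy.stats.rankdata`. The names `rank_x`, `rank_y`, and `rho_manual` are illustrative.

```python
import numpy as np
from scipy.stats import rankdata

# Same values as in the cell above
X = np.array([6, 5, 8, 4, 7, 3])
Y = np.array([7, 6, 9, 3, 8, 2])

# Replace each value by its rank (1 = smallest); there are no ties here
rank_x = rankdata(X)  # ranks 4, 3, 6, 2, 5, 1
rank_y = rankdata(Y)  # ranks 4, 3, 6, 2, 5, 1

# Spearman's rho is the Pearson correlation of the two rank vectors
rho_manual = np.corrcoef(rank_x, rank_y)[0, 1]
print(rho_manual)
```

Because the two rank vectors are identical, the correlation of the ranks is 1 even though the raw values are not exactly linearly related.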

---

#### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

Ans.

In [3]:
import numpy as np
from scipy.stats import spearmanr

# Set random seed for reproducibility
np.random.seed(42)

# Generate synthetic data: 50 participants
exercise_hours = np.random.uniform(0, 10, 50)  # Weekly exercise hours (0 to 10 hours)
BMI = 30 - 0.5 * exercise_hours + np.random.normal(0, 1.5, 50)  # BMI inversely related to exercise, with some noise

# Compute Pearson and Spearman correlation coefficients
pearson_r = np.corrcoef(exercise_hours, BMI)[0, 1]
spearman_rho, _ = spearmanr(exercise_hours, BMI)

pearson_r, spearman_rho

(-0.7462834887175546, -0.7503001200480192)
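
Both coefficients are close to -0.75, indicating a fairly strong negative association between exercise and BMI. They nearly coincide because the synthetic BMI values were generated as a linear function of exercise hours plus Gaussian noise, so the linear (Pearson) and monotonic (Spearman) views describe the same pattern. The two measures can differ noticeably when a relationship is monotonic but curved. The sketch below illustrates this with hypothetical data (an exponential curve, not part of the study above).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: strictly increasing but strongly curved
x = np.linspace(0, 5, 50)
y = np.exp(x)

r_curved, _ = pearsonr(x, y)     # sensitive to departure from a straight line
rho_curved, _ = spearmanr(x, y)  # depends only on the ordering of the values
print(r_curved, rho_curved)
```

Here Spearman's coefficient reaches 1 because the ordering is preserved exactly, while Pearson's coefficient falls below 1 because the points do not lie on a straight line.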

---

#### Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

Ans.

In [4]:
# Generate synthetic data for TV hours and physical activity (inverse relationship with some noise)
np.random.seed(42)
tv_hours = np.random.uniform(0, 6, 50)  # TV hours per day (0 to 6 hours)
physical_activity = 10 - 1.2 * tv_hours + np.random.normal(0, 1, 50)  # Physical activity decreases with TV hours

# Compute Pearson correlation coefficient
pearson_tv_activity = np.corrcoef(tv_hours, physical_activity)[0, 1]
pearson_tv_activity

-0.9195529031742303
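
A coefficient of about -0.92 indicates a strong negative linear relationship in this synthetic sample: more daily TV time goes with lower physical activity. `np.corrcoef` returns only the coefficient. If a significance test is also wanted, `scipy.stats.pearsonr` returns the same r together with a two-sided p-value for the null hypothesis of zero correlation. A minimal sketch, regenerating the same synthetic data with the same seed:

```python
import numpy as np
from scipy.stats import pearsonr

# Recreate the synthetic data from the cell above (same seed, same call order)
np.random.seed(42)
tv_hours = np.random.uniform(0, 6, 50)
physical_activity = 10 - 1.2 * tv_hours + np.random.normal(0, 1, 50)

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = pearsonr(tv_hours, physical_activity)
print(r, p_value)
```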

---

#### Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:
![image.png](attachment:image.png)

Ans.

In [5]:
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway

# Creating the dataset from the table
data = pd.DataFrame({
    "Age": [25, 42, 37, 19, 31, 28],
    "Soft Drink": ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]
})

# Defining age groups
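bins = [18, 25, 35, 45, 55]  # Bin edges; intervals are closed on the right, e.g. (25, 35]
labels = ["18-25", "26-35", "36-45", "46-55"]
# include_lowest=True makes the first interval [18, 25] so that an age of exactly 18 matches its label
data["Age Group"] = pd.cut(data["Age"], bins=bins, labels=labels, right=True, include_lowest=True)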

# Creating a contingency table for the Chi-Square Test
contingency_table = pd.crosstab(data["Age Group"], data["Soft Drink"])

# Performing the Chi-Square Test
chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)

# Performing one-way ANOVA to compare mean age across drink preferences
# (note: the Mountain Dew group contains a single observation)
groups = [data[data["Soft Drink"] == drink]["Age"] for drink in data["Soft Drink"].unique()]
anova_stat, anova_p = f_oneway(*groups)

chi2_stat, p_val, anova_stat, anova_p

(5.000000000000001, 0.2872974951836456, 3.524390243902438, 0.1631217391539759)
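
With p ≈ 0.29 for the chi-square test and p ≈ 0.16 for the ANOVA, neither test rejects its null hypothesis at the 0.05 level, so these six responses give no evidence of an association between age and drink preference. The sample is also very small. A common rule of thumb treats the chi-square approximation as reliable only when expected cell counts are around 5 or more, and the sketch below checks that condition. The contingency table is entered by hand from the six survey rows, with columns in the alphabetical order that `pd.crosstab` produces.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts copied by hand from the crosstab of the six survey rows:
# rows = age groups 18-25, 26-35, 36-45; columns = Coke, Mountain Dew, Pepsi
observed = np.array([
    [2, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
])

chi2_stat, p_val, dof, expected = chi2_contingency(observed)
print(expected)
print("largest expected count:", expected.max())
```

Every expected count is well below 5, so the chi-square p-value should be read as a rough indication only.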

---

#### Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

Ans.

In [6]:
# Generate synthetic data for sales calls per day and sales per week
np.random.seed(42)
sales_calls_per_day = np.random.randint(5, 30, 30)  # Sales calls made per day (5 to 29 calls; randint's upper bound is exclusive)
sales_per_week = sales_calls_per_day * 2 + np.random.normal(0, 5, 30)  # Sales per week (dependent with some noise)

# Compute Pearson correlation coefficient
pearson_sales_corr = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]
pearson_sales_corr

0.9405127666250009