## Q(1)

The Pearson correlation coefficient, often denoted by \(r\), is a measure of the strength and direction of a linear relationship between two continuous variables. It ranges from -1 to 1, where:

- \(r = 1\) indicates a perfect positive linear relationship,
- \(r = -1\) indicates a perfect negative linear relationship,
- \(r = 0\) indicates no linear relationship.

To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, you can use the following formula:

\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]

Here's a step-by-step explanation:

1. Calculate the mean \(\bar{X}\) of the time spent studying (variable \(X\)).
2. Calculate the mean \(\bar{Y}\) of the final exam scores (variable \(Y\)).
3. For each pair of data points \((X_i, Y_i)\), calculate the deviations from the mean: \((X_i - \bar{X})\) and \((Y_i - \bar{Y})\).
4. Multiply the two deviations for each pair and sum the products. This sum is the numerator.
5. Sum the squared deviations of \(X\) and of \(Y\) separately, multiply the two sums, and take the square root. This is the denominator.
6. Divide the numerator by the denominator to obtain the Pearson correlation coefficient \(r\). Equivalently, \(r\) is the sample covariance of \(X\) and \(Y\) divided by the product of their sample standard deviations \(s_X s_Y\).
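
The formula above can be computed directly in a few lines of Python. The sketch below is only an illustration: the study hours and exam scores are hypothetical values invented for this example, and `pearson_r` is a helper name chosen here, not a library function.

```python
import math

# Hypothetical data, invented purely for illustration (not from a real dataset).
hours_studied = [2.0, 4.0, 5.0, 7.0, 9.0]
exam_scores = [55.0, 62.0, 70.0, 78.0, 90.0]


def pearson_r(x, y):
    """Pearson's r computed directly from the deviation formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its variable's mean
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Numerator: sum of the products of paired deviations
    numerator = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Denominator: square root of the product of the sums of squared deviations
    denominator = math.sqrt(sum(dx**2 for dx in dev_x) * sum(dy**2 for dy in dev_y))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


r = pearson_r(hours_studied, exam_scores)
print(f"Pearson r: {r:.4f}")
```

For real work, `scipy.stats.pearsonr` returns the same coefficient together with a p-value.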

Interpretation:

- If \(r\) is close to 1, it indicates a strong positive linear relationship. As the amount of time spent studying increases, final exam scores tend to increase.
- If \(r\) is close to -1, it indicates a strong negative linear relationship. As the amount of time spent studying increases, final exam scores tend to decrease.
- If \(r\) is close to 0, it indicates a weak or no linear relationship.

Keep in mind that correlation does not imply causation. Even if there is a strong correlation, it does not necessarily mean that one variable causes the other.

Please provide the actual data or more specific values if you want assistance in calculating the Pearson correlation coefficient for your dataset.

## Q(2)

Spearman's rank correlation coefficient (\(\rho\)) is a non-parametric measure of the strength and direction of the monotonic relationship between two variables. Unlike Pearson correlation, Spearman's correlation does not assume a linear relationship and is based on the ranks of the data points.

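The shortcut formula for Spearman's rank correlation coefficient, which is exact when no values are tied, is as follows:

\[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]

where:
- \( d_i \) is the difference between the two ranks of the \(i\)-th pair of data points.
- \( n \) is the number of data points (pairs).

When ties are present, \(\rho\) is instead computed as the Pearson correlation of the ranks.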

Here's a step-by-step explanation of how to calculate Spearman's rank correlation:

1. Rank the data for both variables independently.
2. Calculate the differences (\( d_i \)) between the ranks of corresponding pairs.
3. Square each difference and sum them up.
4. Use the formula to calculate \(\rho\).
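
The steps above can be sketched in plain Python. The sketch uses the sleep and job-satisfaction values from the `scipy` cell in this section; `rank_without_ties` is a helper name chosen for illustration, and it deliberately rejects tied values because the shortcut formula assumes there are none.

```python
# Data taken from the scipy example in this section.
amount_of_sleep = [8, 6, 7, 5, 9]
job_satisfaction = [7, 4, 6, 3, 8]


def rank_without_ties(values):
    """Rank values from 1 (smallest) to n, rejecting ties."""
    if len(set(values)) != len(values):
        raise ValueError("the shortcut formula assumes no tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Step 1: rank each variable independently
rank_sleep = rank_without_ties(amount_of_sleep)
rank_satisfaction = rank_without_ties(job_satisfaction)

# Steps 2 and 3: rank differences, squared and summed
d_squared_sum = sum((a - b) ** 2 for a, b in zip(rank_sleep, rank_satisfaction))

# Step 4: apply the formula
n = len(amount_of_sleep)
rho = 1 - 6 * d_squared_sum / (n * (n**2 - 1))

print(f"Ranks (sleep):        {rank_sleep}")
print(f"Ranks (satisfaction): {rank_satisfaction}")
print(f"Spearman rho:         {rho}")
```

Both variables receive the ranks 4, 2, 3, 1, 5, so every \(d_i\) is 0 and \(\rho = 1\), which agrees (up to floating-point rounding) with the value reported by `spearmanr`.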

Interpretation of \(\rho\):
- \( \rho = 1 \): Perfect monotonic increasing relationship.
- \( \rho = -1 \): Perfect monotonic decreasing relationship.
- \( \rho = 0 \): No monotonic relationship.

Let's assume you have the following five paired observations:

- Amount of sleep: 8, 6, 7, 5, 9
- Job satisfaction: 7, 4, 6, 3, 8

Interpretation:
- If \(\rho\) is close to 1, it indicates a strong monotonic increasing relationship. As the amount of sleep increases, job satisfaction tends to increase.
- If \(\rho\) is close to -1, it indicates a strong monotonic decreasing relationship. As the amount of sleep increases, job satisfaction tends to decrease.
- If \(\rho\) is close to 0, it suggests no clear monotonic relationship.

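The code below also reports a p-value. A low p-value (typically below a chosen significance level such as 0.05) suggests that the correlation is statistically significant, although with only five observations the p-value should be interpreted with caution.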

In [3]:

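from scipy.stats import spearmanr

# Paired observations: amount of sleep and job satisfaction for the same five cases
amount_of_sleep = [8, 6, 7, 5, 9]
job_satisfaction = [7, 4, 6, 3, 8]

# spearmanr returns the rank correlation and the p-value of a test
# whose null hypothesis is that the two variables are uncorrelated
rho, p_value = spearmanr(amount_of_sleep, job_satisfaction)

print(f"Spearman's Rank Correlation (rho): {rho}")
print(f"P-value: {p_value}")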

Spearman's Rank Correlation (rho): 0.9999999999999999
P-value: 1.4042654220543672e-24


## Q(3)

In [4]:
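import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data, generated independently of each other:
# 50 exercise durations drawn uniformly from [1, 10) and
# 50 BMI values drawn from a normal distribution (mean 25, SD 5).
# No seed is set, so the values and the correlations change on every run.
hours_of_exercise = np.random.uniform(1, 10, 50)
bmi = np.random.normal(25, 5, 50)

# Both functions return (coefficient, p-value); the p-value is discarded here
pearson_corr, _ = pearsonr(hours_of_exercise, bmi)
spearman_corr, _ = spearmanr(hours_of_exercise, bmi)

print(f"Pearson Correlation Coefficient: {pearson_corr}")
print(f"Spearman's Rank Correlation: {spearman_corr}")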


Pearson Correlation Coefficient: -0.2548349841430553
Spearman's Rank Correlation: -0.29546218487394954


## Q(4)

In [5]:
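from scipy.stats import pearsonr
import numpy as np

# Synthetic data, generated independently of each other:
# 50 TV-watching values drawn uniformly from [1, 5) and
# 50 physical-activity values drawn uniformly from [1, 10).
# No seed is set, so the results change on every run.
hours_of_tv = np.random.uniform(1, 5, 50)
physical_activity = np.random.uniform(1, 10, 50)

# pearsonr returns the correlation coefficient and its p-value
pearson_corr, p_value = pearsonr(hours_of_tv, physical_activity)

print(f"Pearson Correlation Coefficient: {pearson_corr}")
print(f"P-value: {p_value}")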


Pearson Correlation Coefficient: 0.025961939031553177
P-value: 0.8579648012175253


## Q(6)

In [6]:
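from scipy.stats import pearsonr
import numpy as np

# Synthetic data for 30 cases, generated independently of each other:
# calls per day are random integers from 20 to 49 and
# sales per week are random integers from 5 to 19 (the upper bound is exclusive).
# No seed is set, so the results change on every run.
sales_calls_per_day = np.random.randint(20, 50, 30)
sales_per_week = np.random.randint(5, 20, 30)

# pearsonr returns the correlation coefficient and its p-value
pearson_corr, p_value = pearsonr(sales_calls_per_day, sales_per_week)

print(f"Pearson Correlation Coefficient: {pearson_corr}")
print(f"P-value: {p_value}")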


Pearson Correlation Coefficient: 0.05508170480619611
P-value: 0.7725087661795392
