Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, you need data on both variables for each student. Once you have the data, you can use the following formula to compute the Pearson correlation coefficient:

\[ r = \frac{n(\sum{XY}) - (\sum{X})(\sum{Y})}{\sqrt{[n\sum{X^2} - (\sum{X})^2][n\sum{Y^2} - (\sum{Y})^2]}} \]

Where:
- \( n \) is the number of data points (students),
- \( X \) is the amount of time spent studying for an exam for each student,
- \( Y \) is the final exam score for each student,
- \( \sum \) denotes summation across all data points,
- \( \sum{XY} \) is the sum of the products of the corresponding values of \( X \) and \( Y \),
- \( \sum{X} \) and \( \sum{Y} \) are the sums of \( X \) and \( Y \) respectively,
- \( \sum{X^2} \) and \( \sum{Y^2} \) are the sums of the squares of \( X \) and \( Y \) respectively.
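
As a minimal sketch of the formula above, the following computes \( r \) from the raw sums and checks it against NumPy. The study-time and score values are hypothetical placeholders rather than collected data, and `pearson_from_sums` is an illustrative helper name, not a library function.

```python
import numpy as np

def pearson_from_sums(x, y):
    """Pearson r computed directly from the raw-sum formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x**2) - np.sum(x) ** 2) * (n * np.sum(y**2) - np.sum(y) ** 2)
    )
    return numerator / denominator

# Hypothetical data: hours studied and exam scores for eight students
study_hours = [2, 4, 5, 7, 8, 10, 11, 13]
exam_scores = [52, 58, 61, 70, 68, 80, 83, 90]

r_manual = pearson_from_sums(study_hours, exam_scores)
r_numpy = np.corrcoef(study_hours, exam_scores)[0, 1]
print("r (formula):", r_manual)
print("r (NumPy):  ", r_numpy)
```

The two printed values should agree up to floating-point rounding, since `np.corrcoef` computes the same quantity.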

Once you have computed the Pearson correlation coefficient \( r \), you can interpret the result as follows:

- \( r > 0 \): Indicates a positive linear relationship between the amount of time spent studying and final exam scores. As the amount of time spent studying increases, final exam scores tend to increase as well.
- \( r < 0 \): Indicates a negative linear relationship between the amount of time spent studying and final exam scores. As the amount of time spent studying increases, final exam scores tend to decrease.
- \( r = 0 \): Indicates no linear relationship between the two variables. The amount of time spent studying does not linearly predict final exam scores, although a nonlinear relationship may still exist.

The Pearson correlation coefficient ranges from -1 to 1, and values closer to either extreme indicate a stronger linear relationship:
- \( r = 1 \): Perfect positive linear correlation,
- \( r = -1 \): Perfect negative linear correlation,
- \( r = 0 \): No linear correlation.
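
A value of \( r = 0 \) rules out only a linear relationship. The sketch below uses made-up values in which \( Y \) is fully determined by \( X \) through a symmetric curve, yet the Pearson coefficient is zero.

```python
import numpy as np

# Hypothetical data: y depends on x exactly, but through a symmetric curve
x = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)
y = x**2

r = np.corrcoef(x, y)[0, 1]
print("Pearson r for y = x^2:", r)  # zero up to floating-point rounding
```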

It's important to note that correlation does not imply causation. A significant correlation between the amount of time spent studying and final exam scores does not necessarily mean that studying more causes higher exam scores; other factors may be involved.

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

To calculate Spearman's rank correlation coefficient between the amount of sleep individuals get each night and their overall job satisfaction level, you need data on both variables for each individual. Once you have the data, you can follow these steps to compute Spearman's rank correlation coefficient:

1. Assign ranks to the values of each variable separately. If there are ties, assign the average rank to the tied values.
2. Calculate the difference in ranks for each pair of corresponding values of the two variables.
3. Square each of these differences.
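4. Compute the Spearman's rank correlation coefficient using the formula below. The formula is exact only when there are no tied ranks; with ties, compute the Pearson correlation coefficient of the two sets of ranks instead.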

\[ \rho = 1 - \frac{6\sum{d^2}}{n(n^2 - 1)} \]

Where:
- \( \rho \) is the Spearman's rank correlation coefficient,
- \( d \) is the difference in ranks for each pair of corresponding values,
- \( n \) is the number of data points (individuals).
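
The four steps above can be sketched as follows. The sleep and satisfaction values are hypothetical and chosen to contain no ties, so the shortcut formula applies exactly; `scipy.stats.rankdata` handles step 1 and `spearmanr` serves as a cross-check.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical data with no tied values in either variable
sleep_hours = np.array([6.5, 7.0, 5.0, 8.0, 7.5, 6.0, 9.0, 5.5])
job_satisfaction = np.array([6, 7, 3, 9, 8, 4, 10, 5])

# Step 1: rank each variable separately (rankdata averages tied ranks by default)
rank_sleep = rankdata(sleep_hours)
rank_satisfaction = rankdata(job_satisfaction)

# Steps 2-3: rank differences and their squares
d = rank_sleep - rank_satisfaction
sum_d_squared = np.sum(d**2)

# Step 4: apply the formula
n = len(sleep_hours)
rho_manual = 1 - (6 * sum_d_squared) / (n * (n**2 - 1))

rho_scipy, _ = spearmanr(sleep_hours, job_satisfaction)
print("rho (formula):", rho_manual)
print("rho (SciPy):  ", rho_scipy)
```

When the data contain ties, the shortcut formula is only approximate, whereas `spearmanr` computes the Pearson correlation of the ranks and remains correct.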

Once you have computed the Spearman's rank correlation coefficient \( \rho \), you can interpret the result:

- \( \rho = 1 \): Perfect monotonic positive relationship. This means that as one variable increases, the other variable consistently increases.
- \( \rho = -1 \): Perfect monotonic negative relationship. This means that as one variable increases, the other variable consistently decreases.
- \( \rho = 0 \): No monotonic relationship. There is no consistent pattern between the two variables.

It's important to note that Spearman's rank correlation coefficient measures the strength and direction of the monotonic relationship between two variables, but it does not provide information about the slope or linearity of the relationship. Additionally, as with Pearson correlation, correlation does not imply causation. A significant correlation between the amount of sleep and job satisfaction does not necessarily mean that one variable causes the other. Other factors may be involved.
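
To illustrate the difference between monotonic and linear, the sketch below uses hypothetical data in which one variable grows exponentially with the other. The ordering is perfectly preserved, so Spearman's coefficient is 1, while the Pearson coefficient falls below 1 because the relationship is not a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y grows exponentially with x (monotonic, not linear)
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
print("Pearson r:   ", r)    # noticeably below 1
print("Spearman rho:", rho)  # 1 up to rounding, because the ordering is preserved
```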

Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [1]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Sample data (replace with your actual data)
hours_of_exercise = [3, 5, 4, 6, 2, 1, 0, 4, 5, 2, 3, 4, 6, 7, 1, 2, 3, 4, 5, 6, 2, 1, 0, 4, 5, 3, 2, 4, 5, 1, 2, 3, 4, 5, 6, 7, 3, 2, 1, 5, 4, 3, 6, 2, 1, 5, 4, 3, 6, 7]
bmi = [22, 25, 24, 26, 20, 19, 18, 23, 25, 21, 22, 24, 26, 28, 19, 21, 23, 24, 25, 26, 22, 20, 18, 24, 25, 23, 22, 24, 25, 20, 21, 22, 23, 24, 26, 28, 22, 21, 20, 25, 24, 23, 26, 21, 20, 25, 24, 23, 26, 27]

# Convert lists to NumPy arrays
hours_of_exercise = np.array(hours_of_exercise)
bmi = np.array(bmi)

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(hours_of_exercise, bmi)

# Print the results
print("Pearson correlation coefficient:", pearson_corr)
print("Spearman's rank correlation coefficient:", spearman_corr)


Pearson correlation coefficient: 0.9826960314326908
Spearman's rank correlation coefficient: 0.9853917740041569


Interpretation of results:

The Pearson correlation coefficient measures the linear relationship between the two variables. In this case, the Pearson correlation coefficient is approximately 0.983, indicating a very strong positive linear relationship between the number of hours of exercise per week and body mass index (BMI) in the sample data.
Spearman's rank correlation coefficient measures the monotonic relationship between the two variables. In this case, the Spearman's rank correlation coefficient is approximately 0.985, also indicating a very strong positive monotonic relationship between the number of hours of exercise per week and BMI.
Both correlation coefficients point to a very strong positive relationship between the number of hours of exercise per week and BMI, with Spearman's coefficient marginally higher than Pearson's. Because the two values nearly coincide, the relationship in this sample is close to linear as well as monotonic. The code discards the p-values, so these figures describe the strength of the relationship rather than its statistical significance.
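
To judge statistical significance, keep the p-values that `pearsonr` and `spearmanr` return as their second value instead of discarding them. The sketch below uses a small hypothetical dataset and an assumed significance level of 0.05; both are illustrative choices, not part of the study.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data, for illustration only
hours = np.array([0, 1, 2, 3, 4, 5, 6, 7, 2, 4])
bmi_values = np.array([18, 19, 21, 22, 24, 25, 26, 28, 20, 23])

# Each function returns the coefficient and a two-sided p-value
pearson_r, pearson_p = pearsonr(hours, bmi_values)
spearman_rho, spearman_p = spearmanr(hours, bmi_values)

alpha = 0.05  # assumed significance level
print(f"Pearson:  r = {pearson_r:.3f}, p = {pearson_p:.3g}, significant: {pearson_p < alpha}")
print(f"Spearman: rho = {spearman_rho:.3f}, p = {spearman_p:.3g}, significant: {spearman_p < alpha}")
```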

Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [2]:
import numpy as np

# Sample data (replace with your actual data).
# Note: each placeholder list below holds 49 values, not the 50 stated in the question.
hours_watching_tv = [2, 3, 4, 2, 5, 1, 3, 2, 4, 3, 1, 2, 4, 5, 2, 3, 1, 4, 2, 3, 5, 1, 2, 3, 4, 2, 5, 3, 4, 1, 2, 3, 4, 2, 5, 3, 1, 2, 4, 3, 2, 5, 1, 3, 2, 4, 1, 2, 3]
physical_activity_level = [3, 4, 5, 3, 2, 4, 3, 2, 1, 4, 5, 3, 2, 1, 4, 3, 5, 2, 4, 3, 1, 4, 3, 2, 1, 3, 2, 4, 5, 3, 2, 1, 4, 5, 3, 2, 4, 3, 2, 1, 4, 5, 3, 2, 1, 4, 5, 3, 2]

# Convert lists to NumPy arrays
hours_watching_tv = np.array(hours_watching_tv)
physical_activity_level = np.array(physical_activity_level)

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(hours_watching_tv, physical_activity_level)[0, 1]

# Print the result
print("Pearson correlation coefficient:", pearson_corr)


Pearson correlation coefficient: -0.36048663258505437


Interpretation of the result:

The Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity is approximately -0.36.
The negative sign indicates a negative correlation, and the magnitude places it in the weak-to-moderate range. This suggests that as the number of hours spent watching television per day increases, the level of physical activity tends to decrease, but television time explains only a small share of the variation in activity (\( r^2 \approx 0.13 \)).
It's important to note that correlation does not imply causation. Other factors may influence both variables, and further research would be needed to establish causal relationships.
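
As a brief note on the call used above, `np.corrcoef` returns the full 2x2 correlation matrix, and the `[0, 1]` index selects the off-diagonal entry, which is the coefficient between the two variables. The sketch below shows this on a small hypothetical dataset and checks that both arrays have the same length, since each participant must contribute one value to each variable.

```python
import numpy as np

# Hypothetical data, for illustration only
tv_hours = np.array([1, 2, 2, 3, 4, 5, 5, 6])
activity = np.array([5, 4, 5, 3, 3, 2, 1, 1])

# Both inputs must have the same length: one pair of values per participant
assert tv_hours.shape == activity.shape

corr_matrix = np.corrcoef(tv_hours, activity)
print(corr_matrix)                # 2x2 matrix with ones on the diagonal
pearson_corr = corr_matrix[0, 1]  # same value as corr_matrix[1, 0]
print("Pearson correlation coefficient:", pearson_corr)
```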



