In [None]:
#question 1

To calculate the Pearson correlation coefficient (r) between two variables, you can use the following formula:

\[ r = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sqrt{\sum{(X_i - \bar{X})^2} \cdot \sum{(Y_i - \bar{Y})^2}}} \]

Where:
- \(X_i\) and \(Y_i\) are individual data points.
- \(\bar{X}\) and \(\bar{Y}\) are the means of the X and Y variables, respectively.
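To check the formula, the sketch below computes \(r\) term by term with NumPy and compares the result with `np.corrcoef`. The function name `pearson_r` and the study-hours and exam-score values are hypothetical and exist only for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's r computed term by term from the definitional formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must be paired observations of equal length")
    dx = x - x.mean()  # X_i - mean(X)
    dy = y - y.mean()  # Y_i - mean(Y)
    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return numerator / denominator


# Hypothetical values, for illustration only
study_hours = [2, 4, 5, 7, 9]
exam_scores = [55, 62, 70, 74, 88]

r_formula = pearson_r(study_hours, exam_scores)
r_numpy = np.corrcoef(study_hours, exam_scores)[0, 1]
print(f"r from the formula: {r_formula:.4f}")
print(f"np.corrcoef check:  {r_numpy:.4f}")
```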

The Pearson correlation coefficient ranges from -1 to 1:

- \(r = 1\) implies a perfect positive linear relationship.
- \(r = -1\) implies a perfect negative linear relationship.
- \(r = 0\) implies no linear relationship.
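A few synthetic series reproduce the three reference cases. The last case matters most: \(y = x^2\) on values symmetric around zero gives \(r = 0\), even though \(y\) is completely determined by \(x\). So \(r = 0\) rules out only a linear relationship, not every relationship. All values here are invented for illustration.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Synthetic y-series for each reference case
cases = {
    "perfect positive (y = 2x + 1)": 2 * x + 1,
    "perfect negative (y = -3x)": -3 * x,
    "curved, not linear (y = x**2)": x ** 2,
}

results = {label: np.corrcoef(x, y)[0, 1] for label, y in cases.items()}
for label, r in results.items():
    print(f"{label:32s} r = {r:+.4f}")
```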

For study time (\(X\)) and final exam scores (\(Y\)), the sign of \(r\) is interpreted as follows:

- If \(r > 0\): There is a positive correlation, meaning that as the amount of time students spend studying increases, their final exam scores tend to increase.
  
- If \(r < 0\): There is a negative correlation, meaning that as the amount of time students spend studying increases, their final exam scores tend to decrease.

- If \(r = 0\): There is no linear correlation between the two variables.

The strength of the correlation is determined by the absolute value of \(r\):

- \(|r| \approx 1\): Strong correlation.
- \(|r| \approx 0\): Weak or no correlation.

It's important to note that correlation does not imply causation. Even if there is a correlation between study time and exam scores, it doesn't necessarily mean that studying more causes higher scores; there could be other factors at play.
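A small simulation illustrates this point. In the sketch, a hypothetical hidden factor (labelled `motivation`) drives both study time and exam score. By construction, study time has no direct effect on the score, yet the two variables still come out strongly correlated (about 0.8 in theory). Subtracting the shared factor removes the correlation. This works only because the simulation knows the factor exactly. With real data, its effect would have to be estimated, for example with regression.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

# Hypothetical hidden factor that drives BOTH variables
motivation = rng.normal(size=n)

# Each variable = shared factor + its own independent noise.
# Study time does NOT feed into the exam score anywhere.
study_time = motivation + rng.normal(scale=0.5, size=n)
exam_score = motivation + rng.normal(scale=0.5, size=n)

r_raw = np.corrcoef(study_time, exam_score)[0, 1]

# Remove the shared factor and correlate what is left over
r_adjusted = np.corrcoef(study_time - motivation, exam_score - motivation)[0, 1]

print(f"corr(study_time, exam_score)     = {r_raw:.3f}")
print(f"after removing the shared factor = {r_adjusted:.3f}")
```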

In [None]:
#question 2

Spearman's rank correlation coefficient (\(\rho\)) is used to assess the strength and direction of the monotonic relationship between two variables. Unlike the Pearson correlation coefficient, Spearman's correlation does not assume a linear relationship; it focuses on whether the variables tend to increase or decrease together, regardless of the exact mathematical relationship.
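Consider a relationship that is perfectly monotonic but strongly curved. In the sketch below, which uses synthetic data, \(y = e^x\) always increases with \(x\), so Spearman's \(\rho\) is 1. Pearson's \(r\) is noticeably lower because the points do not lie on a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11)   # 1, 2, ..., 10
y = np.exp(x)          # always increasing, but far from a straight line

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
print(f"Pearson r    = {r:.4f}")
print(f"Spearman rho = {rho:.4f}")
```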

To calculate Spearman's rank correlation, follow these steps:

1. Rank the data for each variable separately (tied values receive the average of the ranks they span).
2. Calculate the differences in ranks (\(d_i\)) for each pair of corresponding data points.
3. Square each difference (\(d_i^2\)).
4. Sum up the squared differences.
5. Use the formula:

\[ \rho = 1 - \frac{6 \sum{d_i^2}}{n(n^2 - 1)} \]

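Where \(n\) is the number of paired observations. This shortcut formula is exact only when there are no tied ranks. When ties are present, compute the Pearson correlation coefficient of the ranks instead.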
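The five steps translate almost directly into code. The sketch below uses `scipy.stats.rankdata` for step 1, which by default assigns tied values the average of their ranks. It then compares the shortcut formula with the Pearson correlation of the ranks, which is what `scipy.stats.spearmanr` computes. The helper names `spearman_shortcut` and `spearman_via_ranks` and both datasets are hypothetical. Without ties all three results agree; with tied values, the shortcut formula drifts slightly from the other two.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman_shortcut(x, y):
    """Steps 1-5: rank, take differences, square, sum, apply the formula."""
    rank_x = rankdata(x)   # step 1 (ties get the average of their ranks)
    rank_y = rankdata(y)
    d = rank_x - rank_y    # step 2
    n = len(d)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))  # steps 3-5


def spearman_via_ranks(x, y):
    """Pearson correlation of the ranks; stays exact when ties are present."""
    return np.corrcoef(rankdata(x), rankdata(y))[0, 1]


# Hypothetical data with no ties: every method agrees
x_no_ties = [6.0, 7.5, 5.0, 8.0, 6.5, 7.0]
y_no_ties = [5, 8, 4, 7, 6, 9]

# Hypothetical data with ties: the shortcut formula drifts slightly
x_ties = [1, 2, 2, 3, 4]
y_ties = [1, 3, 2, 2, 5]

datasets = {"no ties": (x_no_ties, y_no_ties), "ties": (x_ties, y_ties)}
for name, (x, y) in datasets.items():
    rho_scipy, _ = spearmanr(x, y)
    print(f"{name:8s} shortcut={spearman_shortcut(x, y):.4f} "
          f"rank-Pearson={spearman_via_ranks(x, y):.4f} scipy={rho_scipy:.4f}")
```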

Interpretation:

- If \(\rho = 1\): Perfect positive monotonic relationship (as one variable increases, the other also increases).

- If \(\rho = -1\): Perfect negative monotonic relationship (as one variable increases, the other decreases).

- If \(\rho = 0\): No monotonic relationship.

The strength of the correlation is determined by the absolute value of \(\rho\):

- \(|\rho| \approx 1\): Strong monotonic correlation.
- \(|\rho| \approx 0\): Weak or no monotonic correlation.

In your case, a Spearman's rank correlation coefficient that is significantly different from zero (that is, its p-value falls below the chosen significance level) would suggest a monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction. A positive \(\rho\) would imply that job satisfaction tends to increase as sleep increases, while a negative \(\rho\) would imply the opposite. If \(\rho\) is close to zero, there may be no strong monotonic relationship between the two variables.
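The phrase "significantly different from zero" is usually judged with a p-value. `scipy.stats.spearmanr` returns both \(\rho\) and a two-sided p-value. The sketch below assumes a conventional significance level of 0.05. Its sleep and satisfaction values are invented placeholders, not real survey data. SciPy's p-value relies on an approximation that is rough for very small samples.

```python
from scipy.stats import spearmanr

# Hypothetical placeholder values (replace with real paired observations)
sleep_hours = [5.0, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
job_satisfaction = [3, 4, 5, 5, 6, 7, 8, 8]   # e.g. a 1-10 rating

rho, p_value = spearmanr(sleep_hours, job_satisfaction)
alpha = 0.05  # conventional significance level (an assumption, not a rule)

print(f"rho = {rho:.3f}, p = {p_value:.4g}")
if p_value < alpha:
    direction = "increase" if rho > 0 else "decrease"
    print(f"Significant monotonic relationship: satisfaction tends to {direction} with sleep.")
else:
    print("No significant monotonic relationship detected at this alpha.")
```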

In [3]:
#question 3

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Sample data (replace with your actual data)
hours_of_exercise = np.array([3, 5, 2, 4, 6, 1, 3, 2, 5, 4, 3, 1, 6, 2, 4, 5, 1, 3, 6, 4, 2, 5, 1, 3, 4, 6, 2, 5, 1, 3, 4, 6, 2, 5, 1, 3, 4, 6, 2, 5, 1, 3, 4, 2, 5, 1, 3, 4, 6])

bmi = np.array([22, 25, 20, 24, 26, 18, 22, 19, 25, 23, 21, 17, 27, 20, 24, 26, 18, 22, 28, 24, 20, 27, 17, 23, 25, 28, 19, 26, 17, 23, 25, 28, 20, 27, 17, 23, 25, 28, 19, 26, 17, 23, 25, 20, 27, 18, 23, 25, 28, 19, 26])

# The two sample arrays have different lengths, which makes pearsonr raise
# "ValueError: x and y must have the same length." Correlation needs matched
# pairs (one exercise value and one BMI value per person), so fix real data at
# the source. Here both arrays are truncated to the shorter length only so
# that the example runs.
n = min(len(hours_of_exercise), len(bmi))
hours_of_exercise, bmi = hours_of_exercise[:n], bmi[:n]

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(hours_of_exercise, bmi)

print(f"Pearson correlation coefficient: {pearson_corr:.4f}")
print(f"Spearman's rank correlation coefficient: {spearman_corr:.4f}")

In [2]:
#question 4

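import numpy as np
from scipy.stats import pearsonr

# Sample data (replace with your actual data)
hours_watching_tv = np.array([2, 1, 3, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4, 2, 3, 1, 4])
physical_activity = np.array([5, 8, 3, 2, 6, 4, 7, 2, 5, 3, 8, 2, 5, 3, 7, 2, 6, 4, 8, 2, 5, 3, 7, 2, 6, 4, 8, 2, 5, 3, 7, 2, 6, 4, 8, 2, 5, 3, 7, 2, 6, 4, 8, 2, 5, 3, 7, 2, 6])

# The two sample arrays have different lengths, which makes pearsonr raise
# "ValueError: x and y must have the same length." Each TV-hours value must be
# paired with the physical-activity value for the same person, so fix real
# data at the source. Here both arrays are truncated to the shorter length
# only so that the example runs.
n = min(len(hours_watching_tv), len(physical_activity))
hours_watching_tv, physical_activity = hours_watching_tv[:n], physical_activity[:n]

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(hours_watching_tv, physical_activity)

print(f"Pearson correlation coefficient: {pearson_corr:.4f}")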

In [1]:
#question 5


import numpy as np
from scipy.stats import pearsonr

# Sample data (replace with your actual data)
sales_calls_per_day = np.array([10, 12, 8, 15, 11, 9, 13, 14, 10, 12, 8, 15, 11, 9, 13, 14, 10, 12, 8, 15, 11, 9, 13, 14, 10, 12, 8, 15, 11, 9])
sales_per_week = np.array([30, 35, 25, 40, 33, 28, 37, 39, 31, 36, 27, 42, 34, 29, 38, 41, 32, 37, 26, 43, 35, 30, 39, 42, 33, 38, 28, 44, 36, 31])

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(sales_calls_per_day, sales_per_week)

print(f"Pearson correlation coefficient: {pearson_corr:.4f}")


Pearson correlation coefficient: 0.9722
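One common way to read this result is through \(r^2\): the share of variance in weekly sales that a straight-line fit on daily calls accounts for. Here \(0.9722^2 \approx 0.945\), or about 94.5% in this sample. The sketch below reproduces \(r\) with `scipy.stats.linregress` on the same sample arrays and also reports the fitted slope. The slope is the average change in weekly sales per additional daily call, under the assumption that a linear fit is appropriate.

```python
import numpy as np
from scipy.stats import linregress

# Same sample arrays as in the question 5 cell
sales_calls_per_day = np.array([10, 12, 8, 15, 11, 9, 13, 14, 10, 12, 8, 15, 11, 9, 13, 14, 10, 12, 8, 15, 11, 9, 13, 14, 10, 12, 8, 15, 11, 9])
sales_per_week = np.array([30, 35, 25, 40, 33, 28, 37, 39, 31, 36, 27, 42, 34, 29, 38, 41, 32, 37, 26, 43, 35, 30, 39, 42, 33, 38, 28, 44, 36, 31])

# Least-squares line; rvalue is the same Pearson r as above
fit = linregress(sales_calls_per_day, sales_per_week)
r_squared = fit.rvalue ** 2

print(f"r     = {fit.rvalue:.4f}")
print(f"r^2   = {r_squared:.4f}")
print(f"slope = {fit.slope:.3f} additional weekly sales per extra daily call (on average)")
```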
