# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.


![image.png](attachment:image.png)
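For reference, the sample Pearson correlation coefficient for \( n \) paired observations \( (x_i, y_i) \) is defined below. In this question, \( x_i \) is the hours student \( i \) spent studying, \( y_i \) is that student's exam score, and \( \bar{x}, \bar{y} \) are the sample means. The numerator and denominator correspond to the `numerator` and `denominator` variables in the code cell.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$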

```python
import numpy as np

# Example dataset (replace with actual data)
time_spent_studying = np.array([10, 20, 15, 25, 30])  # Time spent studying in hours
exam_scores = np.array([70, 80, 75, 85, 90])          # Final exam scores

# Deviations of each observation from its variable's mean
dev_time = time_spent_studying - np.mean(time_spent_studying)
dev_scores = exam_scores - np.mean(exam_scores)

# Pearson correlation coefficient: sum of cross-products of the deviations divided by
# the square root of the product of the two sums of squared deviations
numerator = np.sum(dev_time * dev_scores)
denominator = np.sqrt(np.sum(dev_time ** 2) * np.sum(dev_scores ** 2))
correlation_coefficient = numerator / denominator

print("Pearson correlation coefficient:", correlation_coefficient)
```

Output:

```text
Pearson correlation coefficient: 1.0
```


Interpretation of result:
- The Pearson correlation coefficient \( r \) ranges from -1 to 1.
- A value of 1 indicates a perfect positive linear relationship: all points lie exactly on a straight line with positive slope, so higher values of one variable always go with higher values of the other.
- A value of -1 indicates a perfect negative linear relationship: all points lie exactly on a straight line with negative slope, so higher values of one variable always go with lower values of the other.
- A value of 0 indicates no linear relationship between the variables (a strong nonlinear relationship can still produce \( r \) near 0).
- In the context of the example:
  - The computed coefficient is 1.0, a perfect positive linear relationship: in this placeholder data, each exam score is exactly 60 points more than the hours studied. Real data would almost certainly give a value below 1.
  - A coefficient close to 1 would suggest a strong positive linear relationship: students who spend more time studying tend to achieve higher exam scores.
  - A coefficient close to -1 would suggest a strong negative linear relationship: students who spend more time studying tend to achieve lower exam scores.
  - A coefficient close to 0 would suggest no linear relationship between study time and exam score. In every case, correlation alone does not show that studying causes the change in scores.
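As a quick check, the hand-computed value can be compared with library routines. By hand, \( \bar{x} = 20 \) hours and \( \bar{y} = 80 \) points, and each score deviation equals the matching hours deviation (-10, 0, -5, 5, 10). The numerator and both sums of squares therefore equal 250, and \( r = 250 / \sqrt{250 \cdot 250} = 1 \). The sketch below assumes NumPy and SciPy are installed and reproduces that value three ways.

```python
import numpy as np
from scipy.stats import pearsonr

time_spent_studying = np.array([10, 20, 15, 25, 30])
exam_scores = np.array([70, 80, 75, 85, 90])

# Definition-based value: deviations from the means, then the Pearson ratio
dx = time_spent_studying - time_spent_studying.mean()
dy = exam_scores - exam_scores.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# NumPy: off-diagonal entry of the 2 x 2 correlation matrix
r_numpy = np.corrcoef(time_spent_studying, exam_scores)[0, 1]

# SciPy: returns the coefficient and a two-sided p-value
r_scipy, p_value = pearsonr(time_spent_studying, exam_scores)

print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}, scipy: {r_scipy:.4f}")
```

All three agree. `pearsonr` is convenient when a p-value is also wanted.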

# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.


To calculate Spearman's rank correlation coefficient between the amount of sleep individuals get each night and their overall job satisfaction level, follow these steps:

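1. Rank the values of each variable separately, from smallest to largest. Tied values share the average of the ranks they occupy.
2. For each observation, calculate the difference \( d_i \) between its two ranks.
3. Square the differences and sum them to get \( \sum d_i^2 \).
4. Compute \( \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \), where \( n \) is the number of observations. This shortcut is exact only when there are no ties. With ties, compute the Pearson correlation of the ranks instead.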
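The four steps above can be sketched by hand as follows. This is a minimal illustration that assumes SciPy is available for `rankdata`, which by default gives tied values the average of their ranks. `spearman_by_hand` is a hypothetical helper name, not a library function. Two participants both sleep 7 hours, so the data contain a tie. As a result, the shortcut formula (0.975) differs slightly from the tie-aware value, the Pearson correlation of the ranks (about 0.9747), which is what `scipy.stats.spearmanr` reports.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman_by_hand(x, y):
    """Hypothetical helper following the four steps: rank, differ, square-and-sum, formula."""
    x_ranks = rankdata(x)              # Step 1: ties get the average rank
    y_ranks = rankdata(y)
    d = x_ranks - y_ranks              # Step 2: rank differences
    sum_d_squared = np.sum(d ** 2)     # Step 3: sum of squared differences
    n = len(x)
    # Step 4: shortcut formula (exact only when there are no ties)
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


amount_of_sleep = [7, 8, 6, 5, 7]
job_satisfaction = [8, 9, 6, 5, 7]

rho_shortcut = spearman_by_hand(amount_of_sleep, job_satisfaction)
# Tie-safe definition: Pearson correlation of the average ranks
rho_ranks = np.corrcoef(rankdata(amount_of_sleep), rankdata(job_satisfaction))[0, 1]
rho_scipy, _ = spearmanr(amount_of_sleep, job_satisfaction)

print(f"Shortcut formula:      {rho_shortcut:.4f}")
print(f"Pearson on ranks:      {rho_ranks:.4f}")
print(f"scipy.stats.spearmanr: {rho_scipy:.4f}")
```

This prints 0.9750 for the shortcut and 0.9747 for the other two. With no ties, all three would coincide.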

Here's how you can calculate it in Python with SciPy's `spearmanr`:

```python
from scipy.stats import spearmanr

# Example dataset (replace with actual data)
amount_of_sleep = [7, 8, 6, 5, 7]   # Amount of sleep each night
job_satisfaction = [8, 9, 6, 5, 7]  # Job satisfaction level (on a scale of 1 to 10)

# spearmanr ranks each variable internally and gives tied values (here the two
# 7s in amount_of_sleep) the average of their ranks. Pre-ranking the data with
# np.argsort(np.argsort(...)) would break that tie arbitrarily and change the result.
correlation_coefficient, p_value = spearmanr(amount_of_sleep, job_satisfaction)

print(f"Spearman's rank correlation coefficient: {correlation_coefficient:.4f}")
```

Output:

```text
Spearman's rank correlation coefficient: 0.9747
```


Interpretation of result:
- Spearman's rank correlation coefficient \( \rho \) ranges from -1 to 1.
- A value of 1 indicates a perfect positive monotonic relationship: whenever one variable increases, the other also increases, though not necessarily by a constant amount.
- A value of -1 indicates a perfect negative monotonic relationship: whenever one variable increases, the other decreases.
- A value of 0 indicates no monotonic relationship between the variables.
- In the context of the example:
  - The coefficient computed above is close to 1, which indicates a strong positive monotonic relationship: individuals who get more sleep tend to report higher job satisfaction. With only five placeholder observations, this is an illustration rather than evidence.
  - A coefficient close to -1 would indicate a strong negative monotonic relationship: individuals who get less sleep would tend to report higher job satisfaction.
  - A coefficient close to 0 would indicate no consistent monotonic association between sleep and job satisfaction. In either case, correlation alone does not show that sleep affects satisfaction.

# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.


To calculate both the Pearson correlation coefficient and Spearman's rank correlation coefficient between the number of hours of exercise per week and body mass index (BMI) in a sample of adults, you can follow these steps:

1. Collect data for both variables (number of hours of exercise per week and BMI) for 50 participants.
2. Calculate the Pearson correlation coefficient using the formula for Pearson correlation.
3. Calculate the Spearman's rank correlation coefficient by first ranking the values of each variable and then using the formula for Spearman's rank correlation.

Here's how you can calculate both coefficients using Python with NumPy and SciPy:

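```python
from scipy.stats import pearsonr, spearmanr

# Example dataset (replace with actual data): one value per participant (n = 50).
# The original hours list had only 49 entries, which raised
# "ValueError: x and y must have the same length." The final 6 is a placeholder
# that continues the list's repeating pattern.
hours_of_exercise = [3, 4, 5, 2, 3, 6, 7, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3,
                     4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5, 6]  # Hours of exercise per week
bmi = [25, 27, 30, 23, 26, 29, 32, 28, 31, 24, 26, 29, 31, 33, 34, 27, 29, 31, 24, 26, 29, 31, 33, 34, 27,
       29, 31, 24, 26, 29, 31, 33, 34, 27, 29, 31, 24, 26, 29, 31, 33, 34, 27, 29, 31, 24, 26, 29, 31, 33]  # BMI

assert len(hours_of_exercise) == len(bmi), "each participant needs both values"

# Pearson works on the raw values; Spearman works on their (average) ranks
pearson_corr, _ = pearsonr(hours_of_exercise, bmi)
spearman_corr, _ = spearmanr(hours_of_exercise, bmi)

print(f"Pearson correlation coefficient: {pearson_corr:.4f}")
print(f"Spearman's rank correlation coefficient: {spearman_corr:.4f}")
```

Output:

```text
Pearson correlation coefficient: 0.9619
Spearman's rank correlation coefficient: 0.9694
```

Both coefficients are strongly positive and nearly equal. Spearman's value is slightly higher. This is consistent with average BMI rising with each extra hour of exercise in this placeholder data, but by smaller steps at the high end, so the relationship is monotonic without being perfectly linear. The positive sign (more exercise, higher BMI) comes from the made-up numbers and is not a finding.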

Interpretation:
- Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables.
- Spearman's rank correlation coefficient measures the monotonic relationship between two variables, which is based on the ranks of the data points rather than their actual values.
- If both coefficients are close to 1, it indicates a strong positive correlation between the number of hours of exercise per week and BMI.
- If both coefficients are close to -1, it indicates a strong negative correlation.
- If both coefficients are close to 0, there is neither a linear nor a monotonic association between the variables (a non-monotonic pattern, such as a U-shape, can still exist).
- If the Spearman coefficient is noticeably higher than the Pearson coefficient, the relationship is consistently increasing or decreasing but does not follow a straight line. If the Pearson coefficient is noticeably higher, a few extreme values may be driving the linear fit, since Pearson uses the raw values while Spearman uses only their ranks.
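To see why the two coefficients can disagree, here is a minimal sketch with two small hypothetical datasets, invented for illustration and not taken from the study. The first is an exponential curve, which is perfectly monotonic but not linear. The second is a set of weakly related points plus one extreme outlier. It assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Case 1 (hypothetical): y = e^x is strictly increasing but strongly curved
x_curve = np.arange(1, 11)
y_curve = np.exp(x_curve)
pearson_curve, _ = pearsonr(x_curve, y_curve)
spearman_curve, _ = spearmanr(x_curve, y_curve)

# Case 2 (hypothetical): weakly related points plus one extreme outlier at (50, 60)
x_outlier = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
y_outlier = np.array([5, 3, 4, 1, 2, 6, 2, 5, 3, 60])
pearson_outlier, _ = pearsonr(x_outlier, y_outlier)
spearman_outlier, _ = spearmanr(x_outlier, y_outlier)

print(f"Curve:   Pearson = {pearson_curve:.3f}, Spearman = {spearman_curve:.3f}")
print(f"Outlier: Pearson = {pearson_outlier:.3f}, Spearman = {spearman_outlier:.3f}")
```

In the curve case, Spearman is exactly 1 because the ranks match perfectly, while Pearson is well below 1. In the outlier case, the single extreme point pushes Pearson close to 1, while Spearman, which only sees that point as rank 10, stays around 0.25.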

# Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.


To calculate the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity, you'll need to have the data for both variables for the 50 participants. Once you have the data, you can use Python with NumPy to compute the Pearson correlation coefficient.

Here's how you can calculate it:

```python
import numpy as np

# Example dataset (replace with actual data).
# Note: these placeholder lists hold 49 values each, not the 50 described above.
hours_of_tv = [3, 4, 5, 2, 3, 6, 7, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3,
               4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5]  # Hours of TV per day
physical_activity_level = [2, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5,
                          6, 7, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5, 2, 3, 4, 5, 6, 7, 3, 4, 5]  # Physical activity level

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(hours_of_tv, physical_activity_level)[0, 1]

print("Pearson correlation coefficient:", pearson_corr)
```

Output:

```text
Pearson correlation coefficient: -0.3853889943074005
```


Interpretation:
- The Pearson correlation coefficient ranges from -1 to 1.
- A value of 1 indicates a perfect positive linear relationship, where higher values of one variable are associated with higher values of the other variable.
- A value of -1 indicates a perfect negative linear relationship, where higher values of one variable are associated with lower values of the other variable.
- A value of 0 indicates no linear relationship between the variables.
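- In the context of the example, \( r \approx -0.39 \) indicates a moderate negative linear relationship. Participants who watch more television tend to report somewhat lower physical activity, but the points are widely scattered around any straight line: \( r^2 \approx 0.15 \), so only about 15% of the variation in activity level is accounted for by a linear relationship with TV hours. Correlation alone does not show that watching television reduces activity.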
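The code indexes `[0, 1]` because `np.corrcoef` returns a full correlation matrix rather than a single number. The minimal sketch below uses hypothetical toy values, invented only to show the structure of the result. The third variable `screen_time` is likewise hypothetical.

```python
import numpy as np

# Hypothetical toy values, used only to show the structure of the result
hours_of_tv = [1, 2, 3, 4, 5]
activity_level = [5, 3, 4, 2, 1]

matrix = np.corrcoef(hours_of_tv, activity_level)

# Row/column 0 is hours_of_tv, row/column 1 is activity_level
print(matrix.shape)  # (2, 2)

# The diagonal holds each variable's correlation with itself (always 1)
print(np.diag(matrix))

# The matrix is symmetric, so [0, 1] and [1, 0] are the same coefficient
r = matrix[0, 1]
print(r)

# With several variables, pass one row per variable (rowvar=True is the default)
screen_time = [2, 2, 4, 5, 6]  # hypothetical third variable
matrix3 = np.corrcoef([hours_of_tv, activity_level, screen_time])
print(matrix3.shape)  # (3, 3)
```

For these toy values, `r` is -0.9. The same `[0, 1]` lookup works for any pair of variables passed to `np.corrcoef`.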

# Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

![image.png](attachment:image.png)

# Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.


To calculate the Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week, you'll need to have the data for both variables for the 30 sales representatives. Once you have the data, you can use Python with NumPy to compute the Pearson correlation coefficient.

Here's how you can calculate it:

```python
import numpy as np

# Example dataset (replace with actual data)
sales_calls_per_day = [10, 12, 15, 8, 11, 14, 9, 13, 16, 10, 12, 15, 8, 11, 14, 9, 13, 16,
                       10, 12, 15, 8, 11, 14, 9, 13, 16, 10, 12, 15]  # Sales calls made per day
sales_per_week = [50, 60, 75, 40, 55, 70, 45, 65, 80, 50, 60, 75, 40, 55, 70, 45, 65, 80,
                  50, 60, 75, 40, 55, 70, 45, 65, 80, 50, 60, 75]  # Sales made per week

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]

print("Pearson correlation coefficient:", pearson_corr)
```

Output:

```text
Pearson correlation coefficient: 1.0
```


Interpretation:
- The Pearson correlation coefficient ranges from -1 to 1.
- A value of 1 indicates a perfect positive linear relationship, where higher values of one variable are associated with higher values of the other variable.
- A value of -1 indicates a perfect negative linear relationship, where higher values of one variable are associated with lower values of the other variable.
- A value of 0 indicates no linear relationship between the variables.
- In the context of the example, \( r = 1.0 \) indicates a perfect positive linear relationship, because every placeholder weekly sales figure is exactly five times the corresponding daily call count. Real data from sales representatives would almost certainly be noisier and give a value below 1.
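Because calls are counted per day and sales per week, it is worth noting that Pearson's \( r \) does not depend on the units. Multiplying either variable by a positive constant, or shifting it by a constant, leaves \( r \) unchanged. Multiplying one variable by a negative constant flips the sign of \( r \). The minimal sketch below demonstrates this under stated assumptions: the five data points are hypothetical noisy values invented for illustration, the five-day working week used for the unit conversion is an assumption, and `pearson` is a hypothetical helper.

```python
import numpy as np

# Hypothetical noisy values for five representatives (invented for illustration)
calls_per_day = np.array([10, 12, 15, 8, 11])
sales_per_week = np.array([52, 58, 77, 41, 50])

# Assumption: five working days per week, used only for the unit conversion below
WORKING_DAYS_PER_WEEK = 5


def pearson(x, y):
    """Hypothetical helper: Pearson r as the off-diagonal entry of np.corrcoef."""
    return np.corrcoef(x, y)[0, 1]


r_original = pearson(calls_per_day, sales_per_week)

# Same data in other units: calls per week and sales per working day
r_rescaled = pearson(calls_per_day * WORKING_DAYS_PER_WEEK,
                     sales_per_week / WORKING_DAYS_PER_WEEK)

# Adding or subtracting a constant does not change r either
r_shifted = pearson(calls_per_day + 100, sales_per_week - 20)

# Multiplying one variable by a negative constant flips the sign of r
r_flipped = pearson(-calls_per_day, sales_per_week)

print(f"original: {r_original:.4f}")
print(f"rescaled: {r_rescaled:.4f}")
print(f"shifted:  {r_shifted:.4f}")
print(f"flipped:  {r_flipped:.4f}")
```

The first three values match (about 0.97 for these toy numbers), and the last has the same magnitude with the opposite sign.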