## Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

To calculate the Pearson correlation coefficient between two variables, in this case, the amount of time students spend studying for an exam and their final exam scores, follow these steps:

Step 1: Organize the data
First, you need to have pairs of data points for each student, with one value representing the time spent studying and the other representing the corresponding final exam score.

Step 2: Compute the means
Calculate the mean (average) of both the time spent studying and the final exam scores.

Step 3: Calculate the covariance
Compute the covariance between the time spent studying (X) and the final exam scores (Y) using the formula:

cov(X, Y) = Σ[(X_i - X̄)(Y_i - Ȳ)] / (n - 1)

Where:
- X_i is each individual value of time spent studying,
- X̄ is the mean of time spent studying,
- Y_i is each individual value of final exam scores,
- Ȳ is the mean of final exam scores, and
- n is the number of data points (students).

Step 4: Calculate the standard deviations
Compute the sample standard deviation of time spent studying (S_x) and of final exam scores (S_y), using the same n - 1 denominator as the covariance: S_x = √[Σ(X_i - X̄)^2 / (n - 1)], and likewise for S_y.

Step 5: Calculate the Pearson correlation coefficient
Now, use the formula for the Pearson correlation coefficient (r):

r = cov(X, Y) / (S_x * S_y)
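
Steps 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the five data pairs and the function name `pearson_r` are made up for the example, since the question supplies no actual data.

```python
import math

# Hypothetical data (not from the question): hours studied and exam scores.
hours = [2.0, 4.0, 5.0, 7.0, 9.0]
scores = [55.0, 62.0, 70.0, 78.0, 90.0]

def pearson_r(x, y):
    """Pearson r from the sample covariance and sample standard deviations."""
    n = len(x)
    # Step 2: means
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Step 3: sample covariance (n - 1 denominator)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Step 4: sample standard deviations (same n - 1 denominator)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Step 5: r = cov(X, Y) / (S_x * S_y)
    return cov_xy / (s_x * s_y)

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

For this made-up sample the result is close to 1, which would be read as a strong positive linear relationship.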

Interpretation of the result:
The Pearson correlation coefficient (r) ranges from -1 to 1. Here's how to interpret the value of r:

- r ≈ 1: There is a strong positive linear relationship between time spent studying and final exam scores. As students spend more time studying, their exam scores tend to increase.

- r ≈ -1: There is a strong negative linear relationship between time spent studying and final exam scores. As students spend more time studying, their exam scores tend to decrease.

- r ≈ 0: There is little to no linear relationship between time spent studying and final exam scores. Exam scores do not tend to rise or fall in a straight-line fashion with study time, although a non-linear relationship could still exist.

Keep in mind that correlation does not imply causation. Even if there is a strong correlation between the two variables, it doesn't necessarily mean that studying causes higher exam scores or vice versa. There may be other factors at play that need to be considered before making any causal claims.

## Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

To calculate Spearman's rank correlation coefficient between two variables, in this case, the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10, follow these steps:

Step 1: Organize the data
First, you need to have pairs of data points for each individual, with one value representing the amount of sleep and the other representing the corresponding job satisfaction level.

Step 2: Rank the data
For each variable (amount of sleep and job satisfaction level), rank the data from the lowest value to the highest value. If there are ties (i.e., two or more individuals with the same value), assign them the average rank.
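
The average-rank rule for ties can be sketched as follows. The function name `average_ranks` and the sleep values are hypothetical, chosen only to show how two tied observations share a rank.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

# Hypothetical hours of sleep; the two 7.0 values tie for ranks 3 and 4.
sleep_hours = [6.0, 7.0, 7.0, 8.0, 5.5]
print(average_ranks(sleep_hours))  # [2.0, 3.5, 3.5, 5.0, 1.0]
```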

Step 3: Calculate the differences in ranks
For each individual, calculate the difference between their ranks in both variables (amount of sleep and job satisfaction).

Step 4: Square the rank differences
Square each of the rank differences calculated in the previous step.

Step 5: Calculate the Spearman's rank correlation coefficient
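Use the following formula to calculate Spearman's rank correlation coefficient (ρ). It is exact when there are no tied ranks; if there are ties, compute the Pearson correlation of the two rank columns instead: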

ρ = 1 - (6 * Σd^2) / (n * (n^2 - 1))

Where:
- Σd^2 is the sum of squared rank differences,
- n is the number of data points (individuals).
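
Steps 2 to 5 can be sketched as below. The data are hypothetical and deliberately contain no tied values, so the Σd^2 formula applies as written; the helper names `simple_ranks` and `spearman_rho_no_ties` are illustrative.

```python
# Hypothetical data with no tied values in either variable.
sleep_hours = [5.0, 6.0, 6.5, 7.0, 8.0, 7.5]
satisfaction = [3, 4, 6, 5, 9, 8]

def simple_ranks(values):
    """Ranks 1..n from lowest to highest; assumes all values are distinct."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho_no_ties(x, y):
    n = len(x)
    rank_x = simple_ranks(x)  # Step 2: rank each variable
    rank_y = simple_ranks(y)
    # Steps 3 and 4: rank differences, squared and summed
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    # Step 5: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

rho = spearman_rho_no_ties(sleep_hours, satisfaction)
print(f"rho = {rho:.3f}")
```

In this made-up sample only two individuals swap adjacent ranks, so Σd^2 = 2 and ρ stays close to 1.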

Interpretation of the result:
The Spearman's rank correlation coefficient (ρ) ranges from -1 to 1. Here's how to interpret the value of ρ:

- ρ ≈ 1: There is a strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level. As the amount of sleep increases, job satisfaction tends to increase.

- ρ ≈ -1: There is a strong negative monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level. As the amount of sleep increases, job satisfaction tends to decrease.

- ρ ≈ 0: There is little to no monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level. Job satisfaction shows no consistent tendency to rise or fall as sleep duration increases.

Spearman's rank correlation measures the monotonic relationship: it assesses whether one variable tends to consistently increase (or consistently decrease) as the other increases, without assuming that the relationship is linear. As with Pearson correlation, a strong ρ does not imply causation, so interpret the result with caution and consider other factors that may influence both sleep and job satisfaction.

## Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

To calculate the Pearson correlation coefficient and Spearman's rank correlation coefficient between the number of hours of exercise per week and body mass index (BMI) in a sample of 50 participants, follow these steps:

Step 1: Organize the data
You should have pairs of data points for each participant, with one value representing the number of hours of exercise per week and the other representing the corresponding BMI.

Step 2: Compute the means
Calculate the mean (average) of both the number of hours of exercise per week and the BMI; the Pearson formula in Step 3 uses them.

Step 3: Compute the Pearson correlation coefficient (r)
Use the formula for the Pearson correlation coefficient:

r = Σ[(X_i - X̄)(Y_i - Ȳ)] / √[Σ(X_i - X̄)^2 * Σ(Y_i - Ȳ)^2]

Where:
- X_i is each individual value of hours of exercise,
- X̄ is the mean of hours of exercise,
- Y_i is each individual value of BMI,
- Ȳ is the mean of BMI.

Step 4: Compute the Spearman's rank correlation coefficient (ρ)
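First, rank the data for both variables, giving tied values their average rank. Then, use the formula for Spearman's rank correlation coefficient, which is exact when there are no ties (with ties, compute the Pearson correlation of the two rank columns instead):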

ρ = 1 - (6 * Σd^2) / (n * (n^2 - 1))

Where:
- Σd^2 is the sum of squared rank differences,
- n is the number of data points (participants).

Compare the results:
Now that you have both the Pearson correlation coefficient (r) and the Spearman's rank correlation coefficient (ρ), compare the two results.

- If both r and ρ are close to 1, it indicates a strong positive relationship between exercise hours and BMI that is both linear and monotonic. It means that as the number of hours of exercise per week increases, BMI tends to increase.

- If both r and ρ are close to -1, it indicates a strong negative relationship that is both linear and monotonic. It means that as the number of hours of exercise per week increases, BMI tends to decrease.

- If the Pearson correlation coefficient (r) is close to 1 or -1, while the Spearman's rank correlation coefficient (ρ) is noticeably closer to 0, be cautious. A truly strong linear relationship is also monotonic, so ρ would be large as well. A gap in this direction usually means that a few extreme observations (outliers) are inflating r, since ρ uses only the ranks and is less sensitive to them.

- If the Pearson correlation coefficient (r) is close to 0, or clearly smaller in magnitude than ρ, while the Spearman's rank correlation coefficient (ρ) is well away from 0, the relationship is likely monotonic but non-linear, for example BMI dropping steeply over the first few hours of exercise and then levelling off. In this case, the Pearson correlation understates the trend because it measures only the straight-line component.

Comparing both correlation coefficients can provide insights into the nature of the relationship between the two variables, whether it is linear or non-linear and monotonic.
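
The monotonic but non-linear case can be illustrated with a small sketch. The data below are invented for the example (a BMI-like value that falls steeply and then levels off), and the helper names are hypothetical; the sketch relies on the fact that Spearman's ρ equals Pearson's r computed on the ranks.

```python
import math

def pearson_r(x, y):
    """Pearson r in its sum-of-deviations form."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n from lowest to highest; assumes distinct values (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    # Spearman's rho is Pearson's r applied to the ranks.
    return pearson_r(ranks(x), ranks(y))

# Hypothetical, deliberately curved data: strictly decreasing but not linear.
exercise_hours = [1, 2, 3, 4, 5, 6, 7, 8]
bmi_like = [20 + 16 / h for h in exercise_hours]

r = pearson_r(exercise_hours, bmi_like)
rho = spearman_rho(exercise_hours, bmi_like)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

Because the made-up values decrease at every step, ρ is exactly -1, while r is weaker in magnitude since a straight line fits the curve only approximately.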

## Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity in a sample of 50 participants, follow these steps:

Step 1: Organize the data
You should have pairs of data points for each participant, with one value representing the number of hours spent watching television per day and the other representing the level of physical activity.

Step 2: Compute the means
Calculate the mean (average) of both the number of hours spent watching television per day and the level of physical activity.

Step 3: Calculate the covariance
Compute the covariance between the number of hours spent watching television per day (X) and the level of physical activity (Y) using the formula:

cov(X, Y) = Σ[(X_i - X̄)(Y_i - Ȳ)] / (n - 1)

Where:
- X_i is each individual value of hours spent watching television,
- X̄ is the mean of hours spent watching television,
- Y_i is each individual value of physical activity level,
- Ȳ is the mean of the physical activity level, and
- n is the number of data points (participants).

Step 4: Calculate the standard deviations
Compute the standard deviation of hours spent watching television (S_x) and the standard deviation of the physical activity level (S_y).

Step 5: Calculate the Pearson correlation coefficient (r)
Now, use the formula for the Pearson correlation coefficient:

r = cov(X, Y) / (S_x * S_y)

This will give you the Pearson correlation coefficient, which measures the linear relationship between the number of hours spent watching television per day and the level of physical activity.
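
The covariance-based formula gives the same value as the equivalent form that uses only sums of deviations, r = Σ[(X_i - X̄)(Y_i - Ȳ)] / √[Σ(X_i - X̄)^2 * Σ(Y_i - Ȳ)^2], because the (n - 1) factors cancel. A quick check, using hypothetical TV hours and activity scores (the question gives no data):

```python
import math
import statistics

# Hypothetical data: TV hours per day and a physical activity score.
tv_hours = [1.0, 2.5, 3.0, 4.0, 5.5, 6.0]
activity = [8.0, 7.0, 6.5, 5.0, 3.0, 3.5]

n = len(tv_hours)
mean_x = statistics.mean(tv_hours)
mean_y = statistics.mean(activity)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tv_hours, activity))

# Form 1: sample covariance over the product of sample standard deviations.
cov_xy = sum_xy / (n - 1)
r_cov = cov_xy / (statistics.stdev(tv_hours) * statistics.stdev(activity))

# Form 2: sums of products and squares only; the (n - 1) factors cancel.
sum_xx = sum((x - mean_x) ** 2 for x in tv_hours)
sum_yy = sum((y - mean_y) ** 2 for y in activity)
r_sums = sum_xy / math.sqrt(sum_xx * sum_yy)

print(f"r (covariance form) = {r_cov:.4f}")
print(f"r (sum form)        = {r_sums:.4f}")
```

`statistics.stdev` is the sample standard deviation (n - 1 denominator), which is what the covariance formula above requires.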

Interpretation of the result:
The Pearson correlation coefficient (r) ranges from -1 to 1. Here's how to interpret the value of r:

- r ≈ 1: There is a strong positive linear relationship between the number of hours spent watching television and the level of physical activity. As individuals spend more time watching TV, their level of physical activity tends to increase.

- r ≈ -1: There is a strong negative linear relationship between the number of hours spent watching television and the level of physical activity. As individuals spend more time watching TV, their level of physical activity tends to decrease.

- r ≈ 0: There is little to no linear relationship between the number of hours spent watching television and the level of physical activity. Physical activity does not tend to rise or fall in a straight-line fashion with TV time, although a non-linear relationship could still exist.

Keep in mind that correlation does not imply causation, and there may be other factors at play that affect the relationship between television watching and physical activity.

## Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

| Age (Years) | Soft Drink Preference |
|------------|----------------------|
| 25         | Coke                 |
| 42         | Pepsi                |
| 37         | Mountain Dew         |
| 19         | Coke                 |
| 31         | Pepsi                |
| 28         | Coke                 |

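Soft drink preference is a nominal (categorical) variable with three unordered categories, while age is continuous. The Pearson correlation coefficient requires two numeric variables, and its point-biserial variant applies only when the categorical variable has exactly two categories, so strictly speaking neither fits this data. A more suitable analysis would compare ages across the three brand groups (for example, with a one-way ANOVA). The calculation below assigns numeric codes to the brands purely for illustration; because the codes are arbitrary, the resulting coefficient depends on the coding and should not be over-interpreted.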

Let's proceed with the calculations:

Step 1: Assign numerical values to the soft drink preference:
Coke = 1
Pepsi = 2
Mountain Dew = 3

| Age (Years) | Soft Drink Preference |
|------------|----------------------|
| 25         | 1                    |
| 42         | 2                    |
| 37         | 3                    |
| 19         | 1                    |
| 31         | 2                    |
| 28         | 1                    |

Step 2: Calculate the means of age and soft drink preference:
Mean age = (25 + 42 + 37 + 19 + 31 + 28) / 6 = 182 / 6 = 30.33
Mean soft drink preference = (1 + 2 + 3 + 1 + 2 + 1) / 6 = 10 / 6 = 1.67 (rounded to 2 decimal places)

Step 3: Calculate the sum of products of the differences from the means:
Sum of products = Σ((Age - Mean age) * (Preference - Mean preference))
               = (25 - 30.33) * (1 - 1.67) + (42 - 30.33) * (2 - 1.67) + (37 - 30.33) * (3 - 1.67) + (19 - 30.33) * (1 - 1.67) + (31 - 30.33) * (2 - 1.67) + (28 - 30.33) * (1 - 1.67)
               = (-5.33) * (-0.67) + (11.67) * (0.33) + (6.67) * (1.33) + (-11.33) * (-0.67) + (0.67) * (0.33) + (-2.33) * (-0.67)
               ≈ 3.56 + 3.89 + 8.89 + 7.56 + 0.22 + 1.56
               ≈ 25.67 (computed with unrounded means)

Step 4: Calculate the standard deviation of age and soft drink preference:
Standard deviation of age = √[Σ(Age - Mean age)^2 / (n - 1)]
                          = √[((-5.33)^2 + 11.67^2 + 6.67^2 + (-11.33)^2 + 0.67^2 + (-2.33)^2) / 5]
                          = √[343.33 / 5]
                          = √[68.67]
                          ≈ 8.29 (rounded to 2 decimal places)

Standard deviation of soft drink preference = √[Σ(Preference - Mean preference)^2 / (n - 1)]
                                           = √[((-0.67)^2 + 0.33^2 + 1.33^2 + (-0.67)^2 + 0.33^2 + (-0.67)^2) / 5]
                                           = √[3.33 / 5]
                                           = √[0.67]
                                           ≈ 0.82 (rounded to 2 decimal places)

Step 5: Calculate the correlation coefficient (Pearson's r on the coded values):

r = [Sum of products / (n - 1)] / (Standard deviation of age * Standard deviation of soft drink preference)
  = (25.67 / 5) / (8.29 * 0.82)
  ≈ 5.13 / 6.77 (carrying unrounded intermediate values)
  ≈ 0.76 (rounded to 2 decimal places)

Interpretation of the result:
The coefficient is approximately 0.76, which lies within the required range of -1 to 1. Taken at face value, it would indicate a fairly strong positive association between age and the coded preference: the older respondents in this sample more often chose the brands with higher codes. However, soft drink preference is a nominal variable, and the codes 1, 2, and 3 impose an arbitrary order on it. Relabeling the brands changes the coefficient, so it should not be read as a genuine correlation, and the point-biserial label strictly applies only to a variable with two categories. Additionally, the small sample size of 6 participants makes the estimate very imprecise and limits the generalizability of the results. Larger sample sizes and a method suited to nominal data would provide more reliable insights into the relationship between age and soft drink preference.
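
As a check on the arithmetic, the six rows of the survey table can be run through a short Python sketch. The helper `pearson_r` and the second coding (`alt_coding`) are illustrative assumptions, not part of the survey; the second coding is included only to show that relabeling the brands changes the coefficient.

```python
import math

# The six survey rows from the table above.
ages = [25, 42, 37, 19, 31, 28]
drinks = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

def pearson_r(x, y):
    """Pearson r; the (n - 1) factors cancel, so plain sums are enough."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Coding used in Step 1.
coding = {"Coke": 1, "Pepsi": 2, "Mountain Dew": 3}
r = pearson_r(ages, [coding[d] for d in drinks])

# A different, equally arbitrary coding (hypothetical, for comparison only).
alt_coding = {"Coke": 2, "Pepsi": 3, "Mountain Dew": 1}
r_alt = pearson_r(ages, [alt_coding[d] for d in drinks])

print(f"r with Step 1 coding   = {r:.2f}")
print(f"r with alternative one = {r_alt:.2f}")
```

With the Step 1 coding the coefficient comes out at about 0.76; with the alternative coding it is much smaller, which is why a coefficient built on arbitrary codes for a nominal variable should not be over-interpreted.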

## Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week in a sample of 30 sales representatives, follow these steps:

Step 1: Organize the data
You should have pairs of data points for each sales representative, with one value representing the number of sales calls made per day and the other representing the corresponding number of sales made per week.

Step 2: Compute the means
Calculate the mean (average) of both the number of sales calls made per day and the number of sales made per week.

Step 3: Calculate the covariance
Compute the covariance between the number of sales calls made per day (X) and the number of sales made per week (Y) using the formula:

cov(X, Y) = Σ[(X_i - X̄)(Y_i - Ȳ)] / (n - 1)

Where:
- X_i is each individual value of sales calls made per day,
- X̄ is the mean of sales calls made per day,
- Y_i is each individual value of sales made per week,
- Ȳ is the mean of sales made per week, and
- n is the number of data points (sales representatives).

Step 4: Calculate the standard deviations
Compute the standard deviation of sales calls made per day (S_x) and the standard deviation of sales made per week (S_y).

Step 5: Calculate the Pearson correlation coefficient (r)
Now, use the formula for the Pearson correlation coefficient:

r = cov(X, Y) / (S_x * S_y)

This will give you the Pearson correlation coefficient, which measures the linear relationship between the number of sales calls made per day and the number of sales made per week in the sample of 30 sales representatives.
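
The two variables here are measured on different time bases (calls per day, sales per week). This does not need correcting before computing r, because the Pearson coefficient is unitless and unchanged when a variable is multiplied by a positive constant. A small sketch with hypothetical figures for six representatives, assuming a five-day work week for the conversion:

```python
import math

def pearson_r(x, y):
    """Pearson r in its sum-of-deviations form."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (the question supplies none).
calls_per_day = [20, 25, 30, 35, 40, 45]
sales_per_week = [3, 5, 4, 7, 8, 9]

# Assumption: five working days per week.
calls_per_week = [5 * c for c in calls_per_day]

r_daily = pearson_r(calls_per_day, sales_per_week)
r_weekly = pearson_r(calls_per_week, sales_per_week)

print(f"r using calls per day  = {r_daily:.4f}")
print(f"r using calls per week = {r_weekly:.4f}")
```

Both lines print the same value, so the choice of time unit affects only the scale of the data, not the strength of the linear relationship.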

Interpretation of the result:
The Pearson correlation coefficient (r) ranges from -1 to 1. Here's how to interpret the value of r:

- r ≈ 1: There is a strong positive linear relationship between the number of sales calls made per day and the number of sales made per week. As the number of sales calls increases, the number of sales made tends to increase as well.

- r ≈ -1: There is a strong negative linear relationship between the number of sales calls made per day and the number of sales made per week. As the number of sales calls increases, the number of sales made tends to decrease.

- r ≈ 0: There is little to no linear relationship between the number of sales calls made per day and the number of sales made per week. The two variables do not appear to be strongly related in a linear manner.

Keep in mind that correlation does not imply causation, and there may be other factors at play that affect the relationship between the number of sales calls and the number of sales made. Additionally, with a larger sample size, the correlation estimate is likely to be more reliable and representative of the population of sales representatives.