### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

Ans. To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, you can use the following formula:

\[ r = \frac{{\sum (x_i - \bar{x})(y_i - \bar{y})}}{{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}} \]

Where:
- \( r \) is the Pearson correlation coefficient.
- \( x_i \) and \( y_i \) are the individual data points for the two variables.
- \( \bar{x} \) and \( \bar{y} \) are the means of the two variables.

The following Python code uses NumPy to calculate the Pearson correlation coefficient on sample data:

```python
import numpy as np

# Sample data for demonstration purposes
study_time = np.array([5, 8, 10, 3, 7])  # Hours spent studying
exam_scores = np.array([70, 85, 90, 60, 80])  # Final exam scores

# Deviations of each observation from its variable's mean
study_dev = study_time - study_time.mean()
score_dev = exam_scores - exam_scores.mean()

# Numerator: sum of the products of paired deviations
numerator = np.sum(study_dev * score_dev)

# Denominator: square root of the product of the sums of squared deviations
denominator = np.sqrt(np.sum(study_dev ** 2) * np.sum(score_dev ** 2))

# Calculate the correlation coefficient
pearson_corr_coefficient = numerator / denominator

print("Pearson Correlation Coefficient:", pearson_corr_coefficient)
```
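As a sanity check, the hand-computed value can be compared against library routines. The sketch below assumes SciPy is installed and repeats the same sample data; `np.corrcoef` and `scipy.stats.pearsonr` should agree with the manual formula up to floating-point error.

```python
import numpy as np
from scipy.stats import pearsonr

# Same sample data as in the manual calculation
study_time = np.array([5, 8, 10, 3, 7])
exam_scores = np.array([70, 85, 90, 60, 80])

# Manual calculation from the definition of r
study_dev = study_time - study_time.mean()
score_dev = exam_scores - exam_scores.mean()
manual_r = np.sum(study_dev * score_dev) / np.sqrt(
    np.sum(study_dev ** 2) * np.sum(score_dev ** 2)
)

# Library equivalents
numpy_r = np.corrcoef(study_time, exam_scores)[0, 1]  # off-diagonal entry of the 2x2 matrix
scipy_r, p_value = pearsonr(study_time, exam_scores)  # also returns a two-sided p-value

print("Manual:", manual_r)
print("np.corrcoef:", numpy_r)
print("scipy.stats.pearsonr:", scipy_r, "p-value:", p_value)
```

With only five observations the p-value is a rough guide at best, so it is printed here for completeness rather than as evidence.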

**Interpretation of Results:**

- The Pearson correlation coefficient \( r \) ranges from -1 to 1.
- A correlation coefficient of 1 indicates a perfect positive linear relationship: the data points lie exactly on a straight line with positive slope.
- A correlation coefficient of -1 indicates a perfect negative linear relationship: the data points lie exactly on a straight line with negative slope.
- A correlation coefficient of 0 indicates no linear relationship between the variables, although a nonlinear relationship may still exist.
- For the sample data above, \( r \approx 0.99 \), which indicates a very strong positive linear relationship: students who spent more time studying tended to score higher on the final exam. A value close to -1 would instead indicate a strong negative linear relationship, and a value close to 0 would indicate little to no linear relationship between the two variables.

### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

Ans. To calculate Spearman's rank correlation coefficient between the amount of sleep individuals get each night and their overall job satisfaction level, you can use the following steps:

1. Rank the values of each variable separately, from lowest to highest: the lowest value receives rank 1, the next lowest rank 2, and so on. Tied values share the average of the ranks they would otherwise occupy.
2. Calculate the difference between the two ranks in each pair of corresponding data points.
3. Square each of these differences.
4. Calculate Spearman's rank correlation coefficient using the formula below, which is exact when there are no tied ranks:

\[ \rho = 1 - \frac{{6 \sum d_i^2}}{{n(n^2 - 1)}} \]

Where:
- \( \rho \) is the Spearman's rank correlation coefficient.
- \( d_i \) is the difference between the ranks of corresponding data points.
- \( n \) is the number of data points.

Here's how you can calculate Spearman's rank correlation coefficient using Python:

```python
import numpy as np
from scipy.stats import rankdata

# Sample data for demonstration purposes
sleep_hours = [7, 6, 8, 5, 6]  # Hours of sleep per night
job_satisfaction = [8, 6, 9, 4, 7]  # Job satisfaction on a 1-10 scale

# Rank each variable; tied values (the two 6s in sleep_hours) share the average rank
sleep_ranks = rankdata(sleep_hours)
job_satisfaction_ranks = rankdata(job_satisfaction)

# Differences between paired ranks and their squares
rank_diff = sleep_ranks - job_satisfaction_ranks
squared_diff = rank_diff ** 2

# Shortcut formula: exact only when there are no ties, so approximate for this data
n = len(sleep_hours)
spearman_shortcut = 1 - (6 * np.sum(squared_diff)) / (n * (n**2 - 1))

# General definition: Pearson correlation of the ranks, valid with ties
spearman_corr_coefficient = np.corrcoef(sleep_ranks, job_satisfaction_ranks)[0, 1]

print("Spearman's Rank Correlation Coefficient:", spearman_corr_coefficient)
print("Shortcut formula (assumes no ties):", spearman_shortcut)
```
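Because `sleep_hours` contains a tie (two values of 6), it is worth cross-checking against `scipy.stats.spearmanr`, which assigns average ranks to tied values. The sketch below assumes SciPy is installed and repeats the sample data; it shows the ranks that `rankdata` produces and confirms that Spearman's coefficient equals the Pearson correlation of those ranks.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same sample data as above
sleep_hours = [7, 6, 8, 5, 6]
job_satisfaction = [8, 6, 9, 4, 7]

# rankdata uses method="average" by default, so the two 6s both get rank 2.5
sleep_ranks = rankdata(sleep_hours)
job_ranks = rankdata(job_satisfaction)
print("Sleep ranks:", sleep_ranks)
print("Job satisfaction ranks:", job_ranks)

# Spearman's rho is the Pearson correlation computed on the ranks
rho_from_ranks = np.corrcoef(sleep_ranks, job_ranks)[0, 1]

# Library routine, which handles ties the same way
rho_scipy, p_value = spearmanr(sleep_hours, job_satisfaction)

print("Pearson correlation of ranks:", rho_from_ranks)
print("scipy.stats.spearmanr:", rho_scipy)
```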

**Interpretation of Results:**

- Spearman's rank correlation coefficient \( \rho \) ranges from -1 to 1.
- A correlation coefficient of 1 indicates a perfect increasing monotonic relationship: the two variables rank the observations in exactly the same order.
- A correlation coefficient of -1 indicates a perfect decreasing monotonic relationship: the two variables rank the observations in exactly opposite orders.
- A correlation coefficient of 0 indicates no monotonic relationship between the variables.
- In this example, a coefficient that is positive and close to 1 suggests a strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level. A coefficient that is negative and close to -1 suggests a strong negative monotonic relationship, and a coefficient close to 0 suggests little to no monotonic relationship between the two variables.

### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

```python
from scipy.stats import pearsonr, spearmanr

# Sample data for demonstration purposes
exercise_hours = [3, 5, 2, 4, 6, 1, 5, 2, 3, 4,
                  2, 5, 3, 6, 4, 2, 5, 3, 4, 1,
                  3, 5, 2, 4, 6, 1, 5, 2, 3, 4,
                  2, 5, 3, 6, 4, 2, 5, 3, 4, 1,
                  3, 5, 2, 4, 6, 1, 5, 2, 3, 4]

bmi = [24, 26, 22, 28, 30, 20, 25, 21, 23, 27,
       22, 25, 23, 29, 26, 21, 24, 23, 27, 19,
       24, 26, 22, 28, 30, 20, 25, 21, 23, 27,
       22, 25, 23, 29, 26, 21, 24, 23, 27, 19,
       24, 26, 22, 28, 30, 20, 25, 21, 23, 27]

# Calculate Pearson correlation coefficient (the p-value is discarded)
pearson_corr_coefficient, _ = pearsonr(exercise_hours, bmi)

# Calculate Spearman's rank correlation coefficient (the p-value is discarded)
spearman_corr_coefficient, _ = spearmanr(exercise_hours, bmi)

print("Pearson Correlation Coefficient:", pearson_corr_coefficient)
print("Spearman's Rank Correlation Coefficient:", spearman_corr_coefficient)
```

Output:

```
Pearson Correlation Coefficient: 0.8890552073277832
Spearman's Rank Correlation Coefficient: 0.8864776381536708
```

**Comparison:** Both coefficients are about 0.89, so for this sample data the two measures agree: exercise hours and BMI show a strong positive relationship that is close to linear as well as monotonic.
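The two coefficients need not agree in general. The sketch below uses hypothetical data (not from the study) in which `y` grows exponentially with `x`: the relationship is perfectly monotonic, so Spearman's coefficient is 1, but it is far from a straight line, so Pearson's coefficient is noticeably lower.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y increases with x, but exponentially rather than linearly
x = np.arange(1, 11)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print("Pearson:", pearson_r)      # below 1: the points do not lie on a line
print("Spearman:", spearman_rho)  # 1: the rank orderings are identical
```

A large gap between the two coefficients is therefore a hint that the relationship is monotonic but nonlinear, or that outliers are influencing Pearson's coefficient.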


### Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

```python
from scipy.stats import pearsonr

# Sample data for demonstration purposes
tv_hours_per_day = [3, 2, 4, 1, 5, 3, 2, 4, 1, 5,
                    3, 2, 4, 1, 5, 3, 2, 4, 1, 5,
                    3, 2, 4, 1, 5, 3, 2, 4, 1, 5,
                    3, 2, 4, 1, 5, 3, 2, 4, 1, 5,
                    3, 2, 4, 1, 5, 3, 2, 4, 1, 5]

physical_activity_level = [1, 3, 2, 4, 1, 3, 2, 4, 1, 3,
                           2, 4, 1, 3, 2, 4, 1, 3, 2, 4,
                           1, 3, 2, 4, 1, 3, 2, 4, 1, 3,
                           2, 4, 1, 3, 2, 4, 1, 3, 2, 4,
                           1, 3, 2, 4, 1, 3, 2, 4, 1, 3]

# Calculate Pearson correlation coefficient
pearson_corr_coefficient, _ = pearsonr(tv_hours_per_day, physical_activity_level)

print("Pearson Correlation Coefficient:", pearson_corr_coefficient)
```

Output:

```
Pearson Correlation Coefficient: -0.012651134984231444
```


### Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Create a DataFrame with the provided data
data = {
    'Age (Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
}

df = pd.DataFrame(data)

# Map soft drink preferences to numerical values.
# Note: the brands have no natural order, so this 0/1/2 coding is arbitrary.
preference_mapping = {'Coke': 0, 'Pepsi': 1, 'Mountain Dew': 2}
df['Preference_Numeric'] = df['Soft drink Preference'].map(preference_mapping)

# Calculate Pearson correlation coefficient
pearson_corr_coefficient, _ = pearsonr(df['Age (Years)'], df['Preference_Numeric'])

# Calculate Spearman's rank correlation coefficient
spearman_corr_coefficient, _ = spearmanr(df['Age (Years)'], df['Preference_Numeric'])

print("Pearson Correlation Coefficient:", pearson_corr_coefficient)
print("Spearman's Rank Correlation Coefficient:", spearman_corr_coefficient)
```

Output:

```
Pearson Correlation Coefficient: 0.7587035441865055
Spearman's Rank Correlation Coefficient: 0.8332380897952965
```

Because soft drink preference is a nominal variable, the integer codes impose an order that the brands do not have. Both coefficients depend on that choice of coding, so they should be read with caution rather than as a genuine measure of association between age and brand preference.
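To see how much the result depends on the numeric coding, the sketch below recomputes Pearson's coefficient on the same survey data under a second, equally arbitrary mapping. The `alternative_mapping` and the helper `pearson_for` are hypothetical names introduced only for this illustration.

```python
from scipy.stats import pearsonr

ages = [25, 42, 37, 19, 31, 28]
preferences = ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']

# The coding used above, and a hypothetical relabelling that swaps Coke and Pepsi
original_mapping = {'Coke': 0, 'Pepsi': 1, 'Mountain Dew': 2}
alternative_mapping = {'Coke': 1, 'Pepsi': 0, 'Mountain Dew': 2}

def pearson_for(mapping):
    """Pearson correlation between age and the brand codes given by `mapping`."""
    codes = [mapping[brand] for brand in preferences]
    r, _ = pearsonr(ages, codes)
    return r

r_original = pearson_for(original_mapping)
r_alternative = pearson_for(alternative_mapping)

print("Original coding:", r_original)
print("Alternative coding:", r_alternative)
```

The sign of the coefficient flips between the two codings even though the survey responses are identical, which is why correlation coefficients are not well suited to unordered categories.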


### Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

```python
from scipy.stats import pearsonr

# Sample data for demonstration purposes
sales_calls_per_day = [20, 25, 30, 15, 18, 22, 28, 17, 19, 21,
                       23, 26, 29, 16, 24, 27, 31, 14, 18, 20,
                       25, 30, 15, 18, 22, 28, 17, 19, 21, 23]

sales_per_week = [100, 120, 140, 80, 90, 110, 130, 85, 95, 105,
                  115, 125, 135, 75, 105, 125, 145, 70, 90, 100,
                  120, 140, 80, 90, 110, 130, 85, 95, 105, 115]

# Calculate Pearson correlation coefficient
pearson_corr_coefficient, _ = pearsonr(sales_calls_per_day, sales_per_week)

print("Pearson Correlation Coefficient:", pearson_corr_coefficient)
```

Output:

```
Pearson Correlation Coefficient: 0.9896157270271363
```
