# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

>The question does not include the data, so no numeric value can be reported here. In practice, the coefficient between study time and final exam score is computed with statistical software or a calculator. The steps and the interpretation of the result are outlined below.

- Collect the data on the amount of time students spend studying for an exam and their final exam scores.

- Calculate the mean and the sample standard deviation (dividing by n - 1) for each variable.

- Calculate the covariance between the two variables using the following formula: Cov(X,Y) = Σ(Xi - X̄)(Yi - Ȳ) / (n - 1), where Xi and Yi are the individual values, X̄ and Ȳ are the means, and n is the sample size.

- Calculate the correlation coefficient using the following formula: r = Cov(X,Y) / (SD(X) * SD(Y)), where SD(X) and SD(Y) are the standard deviations of X and Y, respectively.
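The steps above can be sketched in Python as follows. The study-hours and exam-score values are hypothetical and were made up for illustration. SciPy's `pearsonr` appears only as a cross-check on the hand computation.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data for illustration only: hours studied and exam scores for 8 students
hours = np.array([2.0, 3.0, 5.0, 1.0, 4.0, 6.0, 3.5, 7.0])
scores = np.array([65.0, 70.0, 82.0, 55.0, 74.0, 88.0, 72.0, 91.0])

n = len(hours)

# Means and sample standard deviations (ddof=1 divides by n - 1)
mean_x, mean_y = hours.mean(), scores.mean()
sd_x, sd_y = hours.std(ddof=1), scores.std(ddof=1)

# Sample covariance, also divided by n - 1 to match the sample SDs
cov_xy = np.sum((hours - mean_x) * (scores - mean_y)) / (n - 1)

# Pearson's r = Cov(X, Y) / (SD(X) * SD(Y))
r_manual = cov_xy / (sd_x * sd_y)

# Cross-check against SciPy
r_scipy, p_value = pearsonr(hours, scores)
print(f"manual r = {r_manual:.4f}, scipy r = {r_scipy:.4f}, p = {p_value:.4g}")
```

Both results should agree. The covariance and the standard deviations must use the same divisor (n - 1 here) for this to hold.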

> The resulting value of r will range from -1 to +1. A value of -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation.

> Interpreting the Pearson correlation coefficient value:

- If r is close to 1, there is a strong positive correlation between the two variables, indicating that as the amount of time spent studying increases, the final exam scores also increase.
- If r is close to -1, there is a strong negative correlation between the two variables, indicating that as the amount of time spent studying increases, the final exam scores decrease.
- If r is close to 0, there is little or no linear relationship between the two variables (a strong nonlinear relationship is still possible).

>It is important to note that correlation does not imply causation, and other factors may contribute to final exam scores.


# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

>The question does not include the data, so no numeric value can be reported here. In practice, Spearman's rank correlation between nightly sleep and job satisfaction (on a scale of 1 to 10) is computed with statistical software or a calculator. The steps and the interpretation of the result are outlined below.

- Collect the data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10.

- Rank each variable separately from lowest to highest. The lowest value receives rank 1 and the highest receives rank n, where n is the sample size. Tied values each receive the average of the ranks they would otherwise occupy. Ties are common on a 1-10 satisfaction scale.

- Calculate the difference between the ranks for each pair of observations.

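- Calculate Spearman's rank correlation coefficient using the following formula: r_s = 1 - (6Σd^2)/(n(n^2-1)), where d is the difference in ranks for each pair of observations and n is the sample size. This shortcut is exact only when there are no tied ranks. With ties, compute the Pearson correlation coefficient of the ranks instead.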
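Here is a minimal sketch of these steps in Python. The sleep and satisfaction values are hypothetical and were invented for illustration. A 1-10 scale produces ties, so the sketch ranks with `scipy.stats.rankdata` (average ranks for ties by default). It then compares the shortcut formula with Pearson's r computed on the ranks, which is the quantity `scipy.stats.spearmanr` reports.

```python
import numpy as np
from scipy.stats import rankdata, pearsonr, spearmanr

# Hypothetical data for illustration only: nightly sleep (hours) and job satisfaction (1-10)
sleep = np.array([6.0, 7.5, 5.0, 8.0, 6.5, 7.0, 4.5, 9.0, 7.0, 6.0])
satisfaction = np.array([5, 7, 4, 8, 6, 7, 3, 9, 6, 5])

n = len(sleep)

# Rank each variable separately; tied values receive the average of their ranks
rank_sleep = rankdata(sleep)
rank_sat = rankdata(satisfaction)

# Shortcut formula: exact only when there are no ties
d = rank_sleep - rank_sat
rs_shortcut = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# General definition: Pearson's r computed on the ranks (valid with ties)
rs_ranks, _ = pearsonr(rank_sleep, rank_sat)

# SciPy's implementation, for comparison
rs_scipy, p_value = spearmanr(sleep, satisfaction)

print(f"shortcut = {rs_shortcut:.4f}, Pearson on ranks = {rs_ranks:.4f}, scipy = {rs_scipy:.4f}")
```

Because this data contains ties, the shortcut value differs slightly from the other two. Without ties, all three agree.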

>The resulting value of r_s will range from -1 to +1. A value of -1 indicates a perfectly decreasing monotonic relationship, 0 indicates no monotonic relationship, and +1 indicates a perfectly increasing monotonic relationship.

>Interpreting the Spearman's rank correlation coefficient value:

- If r_s is close to 1, there is a strong positive monotonic correlation between the two variables, indicating that as the amount of sleep individuals get each night increases, their overall job satisfaction level also increases.
- If r_s is close to -1, there is a strong negative monotonic correlation between the two variables, indicating that as the amount of sleep individuals get each night increases, their overall job satisfaction level decreases.
- If r_s is close to 0, there is little or no monotonic relationship between the two variables.

>It is important to note that while Spearman's rank correlation coefficient measures the monotonic relationship between two variables, it does not imply causation, and other factors may contribute to an individual's job satisfaction level.


# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

To calculate the Pearson correlation coefficient and Spearman's rank correlation coefficient, follow these steps:

Step 1: Organize the data

Organize the data into two columns: one for the number of hours of exercise per week and one for the corresponding BMI values for each participant.

Step 2: Calculate the Pearson correlation coefficient

To calculate the Pearson correlation coefficient, you will need to use the following formula:

r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]

where:

- x̄ is the mean number of hours of exercise per week
- ȳ is the mean BMI value
- x is the number of hours of exercise per week for each participant
- y is the BMI value for each participant

Using this formula, you can calculate the Pearson correlation coefficient for your data set. The Pearson correlation coefficient ranges from -1 to 1. Values closer to -1 indicate a strong negative linear correlation, values closer to 1 indicate a strong positive linear correlation, and values closer to 0 indicate little or no linear correlation.
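This formula is the covariance-based definition from Q1 with everything written out. Here s_X and s_Y denote the sample standard deviations SD(X) and SD(Y). Each sample quantity carries a factor of 1/(n - 1), and these factors cancel:

$$
r = \frac{\operatorname{Cov}(X,Y)}{s_X\, s_Y}
  = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}
$$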

Step 3: Calculate the Spearman's rank correlation coefficient

To calculate the Spearman's rank correlation coefficient, you will need to use the following formula:

ρ = 1 - (6∑d²) / (n(n² - 1))

where:

- d is the difference between the ranks of x and y for each participant
- n is the number of participants

Using this formula, you can calculate Spearman's rank correlation coefficient for your data set. Spearman's rank correlation coefficient ranges from -1 to 1. Values closer to -1 indicate a strong negative monotonic relationship, values closer to 1 indicate a strong positive monotonic relationship, and values closer to 0 indicate little or no monotonic relationship.

Step 4: Compare the results

Once you have calculated both coefficients, compare them:

- If both are close to 1 (or both close to -1), there is a strong relationship that is both linear and monotonic.
- If both are close to 0, there is little or no relationship of either kind.
- If Spearman's coefficient is noticeably larger in magnitude than Pearson's, one of two things is likely. The relationship may be monotonic but nonlinear, or a few outliers may be weakening the Pearson coefficient.
- If Pearson's coefficient is noticeably larger in magnitude than Spearman's, a few extreme points may be inflating the Pearson coefficient.

Note that Spearman's rank correlation coefficient is less sensitive to outliers than the Pearson correlation coefficient. It also captures monotonic nonlinear relationships. It may therefore be a better choice if you suspect that your data contain outliers or follow a curved but consistently increasing or decreasing trend.
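The following sketch shows how the two coefficients can disagree, using hypothetical data rather than the exercise/BMI data. The first series is perfectly monotonic but strongly curved. The second is roughly linear, except that its last value is replaced by an extreme outlier.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data for illustration only
x = np.arange(1, 11, dtype=float)

# Case 1: monotonic but strongly nonlinear (exponential growth)
y_curved = np.exp(x)

# Case 2: roughly linear increasing data, with the last point replaced by an extreme outlier
y_outlier = x + np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2, 0.0])
y_outlier[-1] = -50.0

results = {}
for label, y in [("monotonic, nonlinear", y_curved), ("one outlier", y_outlier)]:
    r_p, _ = pearsonr(x, y)
    r_s, _ = spearmanr(x, y)
    results[label] = (r_p, r_s)
    print(f"{label:>22}: Pearson = {r_p:.3f}, Spearman = {r_s:.3f}")
```

In the first case, Spearman's coefficient is 1 because the ranks agree perfectly, while Pearson's is noticeably lower because the relationship is not linear. In the second case, the single outlier pulls Pearson's coefficient below zero. Spearman's coefficient stays positive because the outlier can move only one rank.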


```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Create sample data (51 paired observations)
hours_of_exercise = np.array([4, 6, 3, 5, 2, 1, 7, 5, 4, 6, 2, 1, 3, 5, 7, 2, 1, 6, 4, 3, 
                              5, 2, 1, 6, 4, 3, 5, 2, 1, 7, 5, 4, 6, 2, 1, 3, 5, 7, 2, 1, 
                              6, 4, 3, 5, 2, 1, 6, 4, 3, 5, 2])
bmi = np.array([23.3, 24.1, 22.8, 23.9, 21.7, 20.5, 25.4, 24.1, 23.1, 24.6, 21.2, 20.4, 
                22.9, 24.5, 26.1, 21.6, 20.5, 24.7, 23.6, 22.4, 24.3, 21.8, 20.6, 24.8, 
                23.7, 22.5, 24.2, 21.9, 20.7, 25.7, 24.3, 23.3, 25.1, 21.3, 20.5, 22.8, 
                24.4, 26.2, 21.7, 20.6, 24.9, 23.8, 22.6, 24.4, 22.0, 20.8, 25.5, 24.1, 
                23.2, 24.9, 22.1])

# Calculate Pearson correlation coefficient (pearsonr also returns a p-value)
pearson_coef, pearson_p = pearsonr(hours_of_exercise, bmi)
print(f"Pearson correlation coefficient: {pearson_coef:.3f}")

# Calculate Spearman's rank correlation coefficient (spearmanr also returns a p-value)
spearman_coef, spearman_p = spearmanr(hours_of_exercise, bmi)
print(f"Spearman's rank correlation coefficient: {spearman_coef:.3f}")
```


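```
Pearson correlation coefficient: 0.979
Spearman's rank correlation coefficient: 0.978
```

Both coefficients are about 0.98 and almost identical. This indicates a strong positive relationship in this sample: participants who exercise more hours per week tend to have higher BMI values. The close agreement suggests the relationship is close to linear as well as monotonic, and that outliers are not distorting it.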


# Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

```python
import numpy as np
from scipy.stats import pearsonr

# Create sample data (48 paired observations)
tv_hours = np.array([2, 3, 1, 4, 2, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
                     1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
                     1, 2, 3, 4, 1, 2, 3, 4])
physical_activity = np.array([3, 4, 2, 4, 3, 2, 4, 3, 2, 3, 4, 3, 2, 3, 4, 3, 2, 3, 4, 3, 
                              2, 3, 4, 3, 2, 3, 4, 3, 2, 3, 4, 3, 2, 3, 4, 3, 2, 3, 4, 3, 
                              2, 3, 4, 3, 2, 3, 4, 3])

# Calculate Pearson correlation coefficient (the p-value is returned but not used here)
pearson_coef, p_value = pearsonr(tv_hours, physical_activity)
print(f"Pearson correlation coefficient: {pearson_coef:.3f}")
```



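```
Pearson correlation coefficient: 0.643
```

A coefficient of about 0.64 indicates a moderate positive linear relationship in this sample. Participants who report more hours of television per day also tend to report higher physical-activity values.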


# Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

```python
import pandas as pd

data = {
    'Age (Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
}

df = pd.DataFrame(data)
df
```



|   | Age (Years) | Soft drink Preference |
|---|---|---|
| 0 | 25 | Coke |
| 1 | 42 | Pepsi |
| 2 | 37 | Mountain Dew |
| 3 | 19 | Coke |
| 4 | 31 | Pepsi |
| 5 | 28 | Coke |


```python
# Cross-tabulate age against brand: one row per age, a 0/1 count per brand
df_new = pd.crosstab(df['Age (Years)'], df['Soft drink Preference'])
df_new
```

| Age (Years) | Coke | Mountain Dew | Pepsi |
|---|---|---|---|
| 19 | 1 | 0 | 0 |
| 25 | 1 | 0 | 0 |
| 28 | 1 | 0 | 0 |
| 31 | 0 | 0 | 1 |
| 37 | 0 | 1 | 0 |
| 42 | 0 | 0 | 1 |


```python
df_new.corr(method='pearson')
```

|   | Coke | Mountain Dew | Pepsi |
|---|---|---|---|
| Coke | 1.0 | -0.447214 | -0.707107 |
| Mountain Dew | -0.447214 | 1.0 | -0.316228 |
| Pepsi | -0.707107 | -0.316228 | 1.0 |


```python
df_new.corr(method='spearman')
```


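|   | Coke | Mountain Dew | Pepsi |
|---|---|---|---|
| Coke | 1.0 | -0.447214 | -0.707107 |
| Mountain Dew | -0.447214 | 1.0 | -0.316228 |
| Pepsi | -0.707107 | -0.316228 | 1.0 |

Note that both matrices correlate the brand indicator columns with one another across the six ages, not age with soft drink preference. Each respondent chooses exactly one brand, so the negative values mainly reflect that the choices are mutually exclusive. The two matrices are identical because ranking a 0/1 column only rescales it. Soft drink preference is a nominal variable, so neither coefficient measures its relationship with age directly.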
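One way to relate age to each brand is the point-biserial correlation, offered here as an alternative rather than as the notebook's method. It is Pearson's r between age and a 0/1 indicator for each brand, computed with `scipy.stats.pointbiserialr`. With only six respondents, the values are illustrative and the p-values carry little weight.

```python
import pandas as pd
from scipy.stats import pointbiserialr

# Survey data from the table above
df = pd.DataFrame({
    'Age (Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke'],
})

# One 0/1 indicator column per brand
indicators = pd.get_dummies(df['Soft drink Preference'], dtype=int)
ages = df['Age (Years)'].to_numpy(dtype=float)

# Point-biserial correlation: Pearson's r between a binary indicator and age
results = {}
for brand in indicators.columns:
    r, p = pointbiserialr(indicators[brand].to_numpy(dtype=float), ages)
    results[brand] = r
    print(f"{brand:>12}: r_pb = {r:.3f} (p = {p:.3f})")
```

A negative value means that respondents who prefer that brand tend to be younger than the other respondents. A positive value means they tend to be older.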


# Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

```python
import pandas as pd

data = {
    'Sales Calls per Day': [20, 25, 30, 15, 18, 23, 28, 12, 17, 22, 27, 10, 14, 19, 24, 29, 13, 16, 21, 26, 9, 11, 31, 32, 33, 34, 35, 36, 37, 38],
    'Sales per Week': [5, 7, 10, 3, 4, 6, 9, 2, 3, 5, 8, 1, 2, 4, 6, 8, 1, 2, 4, 7, 1, 1, 11, 12, 13, 14, 15, 16, 17, 18]
}

df = pd.DataFrame(data)

# Series.corr uses the Pearson method by default; it is stated explicitly here
pearson_corr = df['Sales Calls per Day'].corr(df['Sales per Week'], method='pearson')

print("Pearson correlation coefficient:", pearson_corr)
```


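```
Pearson correlation coefficient: 0.9696873717477733
```

A coefficient of about 0.97 indicates a very strong positive linear relationship. Sales representatives who make more calls per day also tend to make more sales per week.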
