In [None]:
Q1. The Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

Ans. To calculate the Pearson correlation coefficient, we need to have pairs of observations for both variables.
Assuming we have collected data on the amount of time students spend studying for an exam (X) and their final exam scores
(Y) for n students, we can calculate the Pearson correlation coefficient as follows:

Calculate the mean of X and Y:

$ \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i $

$ \bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i $

Calculate the standard deviation of X and Y:

$ S_X = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2} $

$ S_Y = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2} $

Calculate the covariance between X and Y:

$ Cov(X,Y) = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}) (Y_i - \bar{Y}) $

Calculate the Pearson correlation coefficient:

$ r = \frac{Cov(X,Y)}{S_X S_Y} $

The Pearson correlation coefficient (r) ranges from -1 to 1, where 1 indicates a perfect positive linear relationship,
-1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. Values close to 0 indicate
a weak linear relationship, although the two variables may still be related in a nonlinear way.

For example, if we calculate the Pearson correlation coefficient between the amount of time students spend studying for 
an exam and their final exam scores and obtain a value of 0.7, we can interpret this as a strong positive linear relationship
between the two variables. This means that as the amount of time students spend studying increases, their final exam scores
tend to increase as well.
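
The question does not supply any data, so the steps above can only be sketched on made-up numbers. The snippet below is a minimal illustration: the `hours` and `scores` arrays are hypothetical values chosen for this example, and the result is cross-checked against `np.corrcoef`.

```python
import numpy as np

# Hypothetical data for illustration only (not from the exercise)
hours = np.array([2, 4, 5, 7, 8, 10, 11, 13], dtype=float)     # study time X
scores = np.array([55, 60, 58, 70, 72, 80, 78, 90], dtype=float)  # exam score Y

n = len(hours)

# Step 1: means of X and Y
mean_x, mean_y = hours.mean(), scores.mean()

# Step 2: sample standard deviations (divide by n - 1)
s_x = np.sqrt(((hours - mean_x) ** 2).sum() / (n - 1))
s_y = np.sqrt(((scores - mean_y) ** 2).sum() / (n - 1))

# Step 3: sample covariance
cov_xy = ((hours - mean_x) * (scores - mean_y)).sum() / (n - 1)

# Step 4: Pearson correlation coefficient
r = cov_xy / (s_x * s_y)

# Cross-check against NumPy's built-in correlation matrix
r_numpy = np.corrcoef(hours, scores)[0, 1]

print("r from the formulas:", r)
print("r from np.corrcoef: ", r_numpy)
```

The two printed values should agree up to floating-point rounding, since the n - 1 factors cancel in the ratio.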

In [None]:
Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

Ans. Spearman's rank correlation coefficient is the Pearson correlation coefficient computed on the ranks of the data.
When no values are tied, it reduces to a shortcut formula based on the rank differences; when ties are present, the
shortcut is only an approximation. Here's how we can do it:

Rank the amount of sleep and the job satisfaction level separately, from lowest to highest. Tied values share the average of the ranks they occupy.
Calculate the difference between the two ranks for each individual.
Square the difference for each individual and sum the values.
Use the shortcut formula for Spearman's rank correlation coefficient:
rho = 1 - (6 * sum of squared differences) / (n * (n^2 - 1))

where n is the number of data points.

Here's an example calculation using a sample dataset:

Amount of Sleep    Job Satisfaction
7                  5
6                  2
8                  7
7                  6
5                  3
6                  4

Ranking the amount of sleep (tied values share the average of the ranks they occupy, so the two 6s each get 2.5 and the two 7s each get 4.5):

Amount of Sleep    Rank
5                  1
6                  2.5
7                  4.5
8                  6

Ranking the job satisfaction level:

Job Satisfaction    Rank
2                   1
3                   2
4                   3
5                   4
6                   5
7                   6

Calculating the difference between the ranks (sleep rank minus satisfaction rank):

Amount of Sleep    Rank    Job Satisfaction    Rank    Difference
7                  4.5     5                   4       0.5
6                  2.5     2                   1       1.5
8                  6       7                   6       0
7                  4.5     6                   5       -0.5
5                  1       3                   2       -1
6                  2.5     4                   3       -0.5

Summing the squared differences:

0.5^2 + 1.5^2 + 0^2 + (-0.5)^2 + (-1)^2 + (-0.5)^2 = 4

Using the formula for Spearman's rank correlation coefficient:

rho = 1 - (6 * sum of squared differences) / (n * (n^2 - 1))
= 1 - (6 * 4) / (6 * 35)
= 1 - 24 / 210
≈ 0.89

Because the sleep values contain ties, this shortcut formula is only approximate here; computing the Pearson correlation
on the ranks directly gives about 0.88.

Interpreting the result: The Spearman's rank correlation coefficient between the amount of sleep individuals
get each night and their overall job satisfaction level is about 0.89, which indicates a strong positive monotonic
relationship between the two variables. This means that individuals who sleep more tend to report higher job
satisfaction in this sample.
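
The hand calculation can be checked in code. The sketch below uses the six data points from the sample dataset, ranks them with `scipy.stats.rankdata` (which averages tied ranks by default), applies the shortcut formula, and compares it with `scipy.stats.spearmanr`, which handles ties by correlating the ranks directly. The variable names are illustrative choices.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Sample dataset from the worked example
sleep = [7, 6, 8, 7, 5, 6]
satisfaction = [5, 2, 7, 6, 3, 4]

# Rank each variable; tied values receive the average of their ranks
sleep_ranks = rankdata(sleep)
satisfaction_ranks = rankdata(satisfaction)

# Rank differences and the shortcut formula
d = sleep_ranks - satisfaction_ranks
n = len(sleep)
sum_d_squared = np.sum(d ** 2)
rho_shortcut = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Tie-aware value: Pearson correlation computed on the ranks
rho_scipy, p_value = spearmanr(sleep, satisfaction)

print("Sleep ranks:        ", sleep_ranks)
print("Satisfaction ranks: ", satisfaction_ranks)
print("Sum of squared differences:", sum_d_squared)
print("rho (shortcut formula):", rho_shortcut)
print("rho (scipy.stats.spearmanr):", rho_scipy)
```

The two rho values differ slightly because the sleep data contain ties; with no ties they would coincide.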

In [None]:
Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [1]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Example data (illustrative values; note that each list holds 51 observations, not 50)
hours_exercise = [4, 6, 2, 3, 5, 2, 1, 3, 7, 5, 6, 4, 2, 3, 4, 5, 2, 1, 6, 7, 3, 4, 5, 6, 7, 2, 3, 4, 1, 6, 5, 4, 3, 2, 1, 7, 5, 6, 4, 2, 3, 5, 6, 7, 1, 3, 5, 2, 4, 6, 5]
bmi = [22.5, 26.2, 28.1, 29.3, 24.9, 31.2, 33.5, 27.6, 23.8, 25.7, 27.9, 30.1, 26.5, 28.2, 29.8, 24.6, 31.5, 33.7, 27.1, 23.4, 25.9, 27.3, 29.4, 24.7, 31.3, 33.9, 26.7, 28.3, 29.1, 24.2, 31.1, 34.0, 27.5, 23.6, 25.5, 27.8, 29.5, 24.8, 31.4, 33.2, 27.2, 23.9, 25.8, 27.4, 29.0, 24.4, 31.6, 33.4, 27.7, 23.2, 25.6]

# Pearson correlation coefficient
corr_coef, p_value = pearsonr(hours_exercise, bmi)
print("Pearson correlation coefficient:", corr_coef)
print("p-value:", p_value)

# Spearman's rank correlation coefficient
spearman_coef, p_value = spearmanr(hours_exercise, bmi)
print("Spearman's rank correlation coefficient:", spearman_coef)
print("p-value:", p_value)

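Pearson correlation coefficient: -0.42344699237406
p-value: 0.0019606070692678845
Spearman's rank correlation coefficient: -0.40818114798909094
p-value: 0.0029449608559565564

Comparing the results: both coefficients are negative and similar in size (about -0.42 and -0.41), and both p-values
are below 0.01. In this sample, more hours of exercise per week are moderately associated with a lower BMI. The close
agreement between the two coefficients suggests that the association is roughly linear and is not driven by a few
extreme values.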
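
The two coefficients do not always agree this closely. As a minimal sketch of when they diverge, the snippet below uses a made-up monotonic but strongly nonlinear relationship (y = exp(x)); the data are hypothetical and unrelated to the exercise and BMI values.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y increases with x, but exponentially rather than linearly
x = np.arange(1, 11)
y = np.exp(x)

r, _ = pearsonr(x, y)       # measures linear association
rho, _ = spearmanr(x, y)    # measures monotonic association (uses ranks)

print("Pearson r:   ", r)
print("Spearman rho:", rho)
```

Because the ranks of x and y are identical, Spearman's coefficient is 1, while Pearson's r is noticeably smaller since the points do not fall on a straight line.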


In [None]:
Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

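Ans. The exercise provides no data, so the values below are illustrative.

from scipy.stats import pearsonr

# Hours of television per day (x) and physical activity level (y)
x = [3, 2, 4, 1, 5, 6, 7, 4, 2, 3, 5, 1, 2, 6, 7, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1,
     2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 2, 4, 6, 1, 3]

y = [4, 5, 3, 6, 2, 1, 1, 2, 5, 4, 2, 6, 5, 1, 1, 2, 3, 4, 5, 6, 3, 2, 1, 5, 4, 3, 6, 5, 4, 3, 2, 1, 7, 6,
     5, 4, 3, 2, 1, 7, 6, 5, 4, 3, 2, 7, 6, 5, 4, 3, 2, 7, 4, 1, 5, 6]

# pearsonr needs paired observations, but the lists above have different
# lengths (52 and 56 values). As a stopgap, keep the first 50 pairs to match
# the stated sample size of 50 participants.
n = 50
x, y = x[:n], y[:n]

corr, p_value = pearsonr(x, y)
print("Pearson correlation coefficient:", corr)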

In [None]:
Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:
    
Age (Years)    Soft Drink Preference
25             Coke
42             Pepsi
37             Mountain Dew
19             Coke
31             Pepsi
28             Coke

In [3]:
import numpy as np

age = [25, 42, 37, 19, 31, 28]
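# Encode the brands as integers: Coke = 0, Pepsi = 1, Mountain Dew = 2
# (the numbering is arbitrary because brand is a nominal variable)
preferences = [0, 1, 2, 0, 1, 0]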

correlation = np.corrcoef(age, preferences)[0, 1]
print("Correlation between age and soft drink preferences:", correlation)

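Correlation between age and soft drink preferences: 0.7587035441865058

Note that soft drink preference is a nominal variable, so this value depends on the arbitrary integer codes assigned
to the brands and should be interpreted with caution. With only six respondents, it is also not a reliable estimate.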
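
To illustrate how much the result depends on the integer codes, the sketch below recomputes the correlation under a second, equally arbitrary coding. The survey values come from the table in the question; the second coding and the helper `corr_for` are hypothetical choices made for this example.

```python
import numpy as np

# Survey data from the question
age = [25, 42, 37, 19, 31, 28]
brands = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Two equally arbitrary ways of turning brand names into numbers
coding_a = {"Coke": 0, "Pepsi": 1, "Mountain Dew": 2}  # coding used above
coding_b = {"Coke": 1, "Pepsi": 0, "Mountain Dew": 2}  # hypothetical alternative

def corr_for(coding):
    """Pearson correlation between age and the brand codes under a given coding."""
    codes = [coding[b] for b in brands]
    return np.corrcoef(age, codes)[0, 1]

r_a = corr_for(coding_a)
r_b = corr_for(coding_b)

print("Correlation with coding A:", r_a)
print("Correlation with coding B:", r_b)
```

Swapping the codes for Coke and Pepsi changes both the size and the sign of the coefficient, which is why a Pearson correlation on label-encoded categories is hard to interpret.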


In [None]:
Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

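Ans. The exercise does not provide the data, so the code below assumes that sales_calls and sales_made already hold
the 30 paired observations.

from scipy.stats import pearsonr

# sales_calls: number of sales calls made per day, one value per representative
# sales_made: number of sales made per week, in the same order
corr_coef, p_value = pearsonr(sales_calls, sales_made)
print("Pearson correlation coefficient:", corr_coef)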
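
Since no data accompany the question, the following is only a sketch of how the calculation would run end to end. The values are simulated with a fixed random seed purely as a stand-in for real records; the ranges, slope, and noise level are assumptions with no basis in the exercise.

```python
import numpy as np
from scipy.stats import pearsonr

# Simulated stand-in data (hypothetical, not real company records)
rng = np.random.default_rng(0)
n_reps = 30

# Calls per day: random integers from 10 to 40 inclusive
sales_calls = rng.integers(10, 41, size=n_reps)

# Sales per week: loosely proportional to calls, plus random noise
sales_made = 0.3 * sales_calls + rng.normal(0, 2, size=n_reps)

corr_coef, p_value = pearsonr(sales_calls, sales_made)
print("Pearson correlation coefficient:", corr_coef)
print("p-value:", p_value)
```

With real data, replacing the two simulated arrays with the recorded values for the 30 representatives is the only change needed.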