## Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.


### Ans:- 

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. Here, the variables are the amount of time students spend studying for an exam and their final exam scores. The coefficient ranges from -1 to +1:

* A coefficient close to +1 indicates a strong positive linear relationship: as one variable increases, the other tends to increase as well.
* A coefficient close to -1 indicates a strong negative linear relationship: as one variable increases, the other tends to decrease.
* A coefficient close to 0 indicates a weak or no linear relationship between the variables.

Here's how you can calculate the Pearson correlation coefficient in Python and interpret the result:




```python
import numpy as np

# Sample data: time spent studying and exam scores
time_spent = [5, 10, 15, 20, 25]
exam_scores = [70, 85, 90, 95, 80]

# Calculate the Pearson correlation coefficient
correlation_coefficient = np.corrcoef(time_spent, exam_scores)[0, 1]

print("Pearson Correlation Coefficient:", correlation_coefficient)
```

```
Pearson Correlation Coefficient: 0.49319696191607193
```
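`np.corrcoef` returns the full 2x2 correlation matrix, and `[0, 1]` picks the entry that pairs the two variables. As a cross-check, here is a minimal sketch that computes the same value directly from the definition $r = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum (x_i-\bar x)^2 \sum (y_i-\bar y)^2}}$ using only the standard library (the helper name `pearson_r` is illustrative, not part of any library):

```python
import math

# Same sample data as above
time_spent = [5, 10, 15, 20, 25]
exam_scores = [70, 85, 90, 95, 80]

def pearson_r(x, y):
    """Pearson's r from its definition: co-variation divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(time_spent, exam_scores)
print(round(r, 4))
```

For these data the intermediate sums are 150, 250 and 370, so $r = 150/\sqrt{250 \cdot 370} \approx 0.493$, matching `np.corrcoef`.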


Interpretation:
The calculated Pearson correlation coefficient is approximately 0.49. This suggests a moderate positive linear relationship between the amount of time students spend studying and their final exam scores: students who study longer tend to score higher, but the relationship is far from perfect (for example, the student who studied 25 hours scored 80, below the students who studied 15 and 20 hours). With only five observations, this estimate is also highly uncertain.

Keep in mind that the Pearson correlation coefficient only captures linear relationships. Non-linear relationships might not be well represented by this coefficient. Additionally, correlation does not imply causation, so even if a correlation is observed, it's important to consider other factors that might influence the relationship between the variables.
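To illustrate the first caveat with hypothetical data: below, `y` is completely determined by `x` (`y = x**2`), yet Pearson's coefficient is 0 because the relationship is not linear.

```python
import numpy as np

# Hypothetical data: y depends perfectly on x, but along a parabola
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2

# The fall on the left half cancels the rise on the right half, so r is 0
r = np.corrcoef(x, y)[0, 1]
print(r)
```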

---

## Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.


### Ans:- 


Spearman's rank correlation coefficient assesses the strength and direction of the monotonic relationship (whether increasing or decreasing) between two variables. Unlike Pearson correlation, Spearman's correlation does not assume a linear relationship between the variables. Instead, it evaluates the relationship based on the ranks of the data points.
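In symbols, Spearman's coefficient is Pearson's correlation applied to the ranks $R(x_i)$ and $R(y_i)$. When there are no tied values, it reduces to a shortcut in terms of the rank differences $d_i = R(x_i) - R(y_i)$ for $n$ observations:

$$
\rho_s = \operatorname{corr}\big(R(x),\, R(y)\big), \qquad
\rho_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)} \quad \text{(no ties)}
$$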

Spearman's correlation coefficient ranges from -1 to +1:

* A coefficient close to +1 indicates a strong positive monotonic relationship: higher values of one variable consistently go with higher values of the other, though not necessarily at a constant rate.
* A coefficient close to -1 indicates a strong negative monotonic relationship: higher values of one variable consistently go with lower values of the other.
* A coefficient close to 0 indicates a weak or no monotonic relationship between the variables.

Here's how you can calculate Spearman's rank correlation coefficient in Python and interpret the result:

```python
import numpy as np
from scipy.stats import spearmanr

# Sample data: amount of sleep and job satisfaction
amount_of_sleep = [7, 6, 8, 5, 9]
job_satisfaction = [8, 6, 9, 5, 7]

# Calculate the Spearman's rank correlation coefficient
correlation_coefficient, p_value = spearmanr(amount_of_sleep, job_satisfaction)

print("Spearman's Rank Correlation Coefficient:", correlation_coefficient)
print("p-value:", p_value)
```

```
Spearman's Rank Correlation Coefficient: 0.7
p-value: 0.1881204043741873
```
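The same value can be reproduced by hand, which shows what `spearmanr` does: it replaces each value by its rank and then correlates the ranks. A minimal sketch using `scipy.stats.rankdata`, computing both the rank-correlation route and the no-ties shortcut:

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

amount_of_sleep = [7, 6, 8, 5, 9]
job_satisfaction = [8, 6, 9, 5, 7]

# Rank each variable (1 = smallest); tied values would share their average rank
sleep_ranks = rankdata(amount_of_sleep)          # 3, 2, 4, 1, 5
satisfaction_ranks = rankdata(job_satisfaction)  # 4, 2, 5, 1, 3

# Route 1: Spearman's rho is Pearson's r computed on the ranks
rho_from_ranks, _ = pearsonr(sleep_ranks, satisfaction_ranks)

# Route 2: shortcut formula, exact only when there are no ties
d = sleep_ranks - satisfaction_ranks             # -1, 0, -1, 0, 2
n = len(d)
rho_shortcut = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Reference value from SciPy
rho_scipy, _ = spearmanr(amount_of_sleep, job_satisfaction)

print(rho_from_ranks, rho_shortcut, rho_scipy)
```

Here the squared rank differences sum to 6 with n = 5, so the shortcut gives 1 - 36/120 = 0.7, the same as `spearmanr`. With tied values, only the rank-correlation route remains exact.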


Interpretation:
The calculated Spearman's rank correlation coefficient is 0.7. This positive value suggests a fairly strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level: individuals who sleep more tend to report higher job satisfaction, and vice versa. However, the p-value (about 0.188) is greater than the commonly used significance level of 0.05, so with only five individuals the correlation is not statistically significant at that threshold.

It's important to note that Spearman's correlation assesses monotonic relationships, but it doesn't imply causation; other factors could contribute to the observed relationship. Additionally, a coefficient further from 0 (in either direction) indicates a stronger monotonic relationship, while values closer to 0 indicate a weaker or no relationship.

---

## Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.


### Ans:- 

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Sample data: hours of exercise and BMI
hours_of_exercise = [3, 4, 5, 2, 6, 3, 4, 1, 5, 2,
                     4, 3, 2, 1, 6, 5, 3, 4, 2, 5,
                     4, 3, 6, 2, 5, 4, 3, 1, 6, 2,
                     3, 4, 5, 2, 6, 3, 4, 1, 5, 2,
                     4, 3, 2, 1, 6, 5, 3, 4, 2, 5]

bmi = [24, 26, 28, 22, 30, 25, 26, 21, 29, 23,
       27, 25, 23, 20, 31, 28, 25, 26, 22, 30,
       28, 24, 33, 22, 31, 27, 24, 20, 34, 21,
       25, 26, 29, 22, 30, 25, 26, 21, 29, 23,
       27, 25, 23, 20, 31, 28, 25, 26, 22, 30]

# Calculate Pearson correlation coefficient
pearson_corr, pearson_p_value = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation coefficient
spearman_corr, spearman_p_value = spearmanr(hours_of_exercise, bmi)

print("Pearson Correlation Coefficient:", pearson_corr)
print("Spearman's Rank Correlation Coefficient:", spearman_corr)
```

```
Pearson Correlation Coefficient: 0.9700711532239042
Spearman's Rank Correlation Coefficient: 0.9818955706015843
```


Interpretation:

The Pearson correlation coefficient is approximately 0.970, indicating a very strong positive linear relationship between the number of hours of exercise per week and BMI in this sample: participants who exercise more tend to have higher BMI values.

The Spearman's rank correlation coefficient is approximately 0.982, indicating an almost perfectly monotonic positive relationship: ranking participants by hours of exercise gives nearly the same order as ranking them by BMI.

Comparing the results, both coefficients point to a very strong positive association, and Spearman's is slightly higher than Pearson's. A gap in that direction is what one would expect if the relationship is consistently increasing but not perfectly linear, since Spearman's correlation only requires the ranks to agree.

Remember that correlation doesn't imply causation, and other factors could influence the relationship between exercise and BMI. Additionally, the p-values returned by `pearsonr` and `spearmanr` (stored in `pearson_p_value` and `spearman_p_value` above, but not printed) indicate whether each correlation is statistically significant.
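The point about monotonic but non-linear relationships can be seen with hypothetical data in which `y` always increases with `x` but along a curve (`y = x**3`). The ranks agree perfectly, so Spearman's coefficient is 1, while Pearson's coefficient falls short of 1:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: strictly increasing, but curved rather than straight
x = np.array([1, 2, 3, 4, 5])
y = x ** 3

pearson_r, _ = pearsonr(x, y)       # below 1: the points do not lie on a line
spearman_rho, _ = spearmanr(x, y)   # 1: the two orderings are identical

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_rho:.3f}")
```

Here Pearson's r is about 0.943 while Spearman's rho is 1, the same ordering (Spearman above Pearson) as in the exercise and BMI data, where the gap is much smaller.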

---

## Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.


### Ans:- 

```python
import numpy as np
from scipy.stats import pearsonr

# Sample data: hours of TV and physical activity level
hours_of_tv = [2, 3, 4, 5, 1, 2, 3, 4, 2, 1,
               3, 4, 5, 2, 1, 2, 3, 4, 2, 1,
               3, 4, 5, 2, 1, 2, 3, 4, 2, 1,
               3, 4, 5, 2, 1, 2, 3, 4, 2, 1,
               3, 4, 5, 2, 1, 2, 3, 4, 2, 1]

physical_activity = [5, 4, 3, 2, 1, 5, 4, 3, 2, 1,
                     5, 4, 3, 2, 1, 5, 4, 3, 2, 1,
                     5, 4, 3, 2, 1, 5, 4, 3, 2, 1,
                     5, 4, 3, 2, 1, 5, 4, 3, 2, 1,
                     5, 4, 3, 2, 1, 5, 4, 3, 2, 1]

# Calculate Pearson correlation coefficient
pearson_corr, p_value = pearsonr(hours_of_tv, physical_activity)

print("Pearson Correlation Coefficient:", pearson_corr)
```

```
Pearson Correlation Coefficient: 0.43467700580877516
```
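The coefficient summarizes the data with a single straight-line trend. As a sketch, the same 50 observations can be rebuilt compactly (the first ten TV values differ from the four identical blocks that follow, and the activity values repeat every ten) and the mean activity computed for each number of TV hours:

```python
import numpy as np
from scipy.stats import pearsonr

# The same 50 observations as above: one distinct block of ten TV values,
# then four identical blocks; activity repeats one block of ten five times
hours_of_tv = np.array([2, 3, 4, 5, 1, 2, 3, 4, 2, 1]
                       + [3, 4, 5, 2, 1, 2, 3, 4, 2, 1] * 4)
physical_activity = np.array([5, 4, 3, 2, 1, 5, 4, 3, 2, 1] * 5)

r, _ = pearsonr(hours_of_tv, physical_activity)
print(f"Pearson r = {r:.4f}")

# Mean activity level for each observed number of TV hours
means = {
    int(h): physical_activity[hours_of_tv == h].mean()
    for h in np.unique(hours_of_tv)
}
for h, m in means.items():
    print(f"{h} h of TV per day: mean activity {m:.1f}")
```

For these sample values the means are 1.0, 3.2, 4.4, 3.4 and 2.8 for 1 to 5 hours of TV, so activity rises up to 3 hours and then falls; the moderate r of about 0.43 averages over this non-linear pattern.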


Interpretation:
The calculated Pearson correlation coefficient is approximately 0.43. This positive value suggests a moderate positive linear relationship between the number of hours individuals spend watching television per day and their level of physical activity. In other words, in this sample, participants who watch more TV tend to report somewhat higher physical activity, although the linear relationship is far from strong.

Keep in mind that correlation does not imply causation, and there could be other factors influencing the relationship between TV watching and physical activity. The positive sign shows that the two variables tend to increase together, while the magnitude (about 0.43) shows that many observations depart from that linear trend; it should be judged in the context of the study.

---

## Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:



|Age (years)|Soft Drink Preference|
|:-:|:-:|
|25|Coke|
|42|Pepsi|
|37|Mountain Dew|
|19|Coke|
|31|Pepsi|
|28|Coke|


### Ans:- 
From the table, 3 participants preferred Coke, 2 preferred Pepsi, and 1 preferred Mountain Dew; each participant reported exactly one preference.

Age is a numerical variable, whereas soft drink preference is nominal (categorical), so Pearson's and Spearman's correlation coefficients do not apply directly. One option is a chi-square test of independence after grouping ages into categories (for example, under 30 and 30 or older). The null hypothesis is that there is no association between age group and soft drink preference, while the alternative hypothesis is that there is an association.

With only six participants, however, every expected cell count in such a table falls well below 5, so the chi-square approximation is unreliable, and the result would also depend on where the age cutoffs are drawn. No p-value from a sample this small should be taken at face value.

In summary, based on this survey data, we cannot conclude that there is a significant association between age and preference for a particular brand of soft drink; a larger sample would be needed to draw a conclusion.
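Because age is numerical, another option (a swapped-in technique, not the chi-square approach described above) is to compare the age distributions across preference groups with the Kruskal-Wallis H test from `scipy.stats`. A minimal sketch using the six survey responses; with groups of one to three people, the chi-square approximation behind its p-value is itself unreliable, so treat the output as descriptive only:

```python
from scipy.stats import kruskal

# Ages from the survey table, grouped by stated preference
ages_by_preference = {
    "Coke": [25, 19, 28],
    "Pepsi": [42, 31],
    "Mountain Dew": [37],
}

for brand, ages in ages_by_preference.items():
    print(f"{brand}: n = {len(ages)}, mean age = {sum(ages) / len(ages):.1f}")

# Kruskal-Wallis H test: do the age distributions differ between the groups?
h_stat, p_value = kruskal(*ages_by_preference.values())
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

For these responses the Coke drinkers are younger on average (24.0 versus 36.5 and 37.0), but H is about 3.857 with p about 0.145, consistent with the conclusion that no association can be established from this sample.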

---

## Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

### Ans:- 

```python
import numpy as np
from scipy.stats import pearsonr

# Sample data: number of sales calls per day and number of sales per week
sales_calls_per_day = [10, 15, 20, 25, 30, 12, 18, 22, 28, 8,
                       11, 16, 21, 26, 29, 14, 19, 24, 27, 9,
                       13, 17, 23, 31, 32, 33, 34, 35, 36, 37]

sales_per_week = [5, 7, 8, 9, 10, 6, 7, 8, 10, 4,
                   5, 6, 8, 9, 10, 6, 7, 8, 9, 4,
                   6, 7, 9, 11, 12, 13, 14, 15, 16, 17]

# Calculate Pearson correlation coefficient
pearson_corr, p_value = pearsonr(sales_calls_per_day, sales_per_week)

print("Pearson Correlation Coefficient:", pearson_corr)
```

```
Pearson Correlation Coefficient: 0.9515116928809901
```


Interpretation:
The calculated Pearson correlation coefficient is approximately 0.95. This value indicates a very strong positive linear relationship between the number of sales calls made per day and the number of sales made per week. In other words, as the number of sales calls increases, the number of sales made tends to increase as well.

Keep in mind that correlation does not imply causation, and other factors could influence the relationship between sales calls and sales. The positive sign shows the direction of the relationship, while the magnitude (about 0.95) shows how closely the data follow a straight line; both should be read in the context of the study.
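Two quantities can help judge that magnitude: $r^2$, the share of the variance in weekly sales accounted for by a straight-line fit on daily calls, and the slope of that least-squares line (which equals $r \cdot s_y / s_x$). A minimal sketch using `np.polyfit`; reading the slope as the effect of extra calls would still be a causal claim that correlation alone does not support:

```python
import numpy as np
from scipy.stats import pearsonr

# Same 30 sales representatives as above
sales_calls_per_day = [10, 15, 20, 25, 30, 12, 18, 22, 28, 8,
                       11, 16, 21, 26, 29, 14, 19, 24, 27, 9,
                       13, 17, 23, 31, 32, 33, 34, 35, 36, 37]
sales_per_week = [5, 7, 8, 9, 10, 6, 7, 8, 10, 4,
                  5, 6, 8, 9, 10, 6, 7, 8, 9, 4,
                  6, 7, 9, 11, 12, 13, 14, 15, 16, 17]

r, _ = pearsonr(sales_calls_per_day, sales_per_week)

# Share of variance in weekly sales explained by a linear fit on daily calls
r_squared = r ** 2

# Least-squares line: sales_per_week ~ slope * sales_calls_per_day + intercept
slope, intercept = np.polyfit(sales_calls_per_day, sales_per_week, 1)

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"slope = {slope:.3f} weekly sales per additional daily call")
```

For these data r is about 0.952, so r² is about 0.905, and the fitted slope is about 0.37 additional sales per week for each additional daily call.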

----