# 1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

- **The Pearson correlation coefficient** measures the strength and direction of the linear relationship between two continuous variables.
- In our case, we want to calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores.
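By definition, r is the sum of the products of each variable's deviations from its mean, divided by the square root of the product of the two sums of squared deviations. The minimal sketch below computes that definition directly with NumPy on the same sample data and compares the result with `scipy.stats.pearsonr`. The helper name `pearson_r` is illustrative and is not part of SciPy.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_r(x, y):
    """Pearson r computed directly from its definition (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    # r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2))
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))


study_time = [10, 15, 8, 12, 20]
exam_scores = [85, 90, 75, 80, 95]

r_manual = pearson_r(study_time, exam_scores)
r_scipy, _ = pearsonr(study_time, exam_scores)
print(f'Manual r: {r_manual:.4f}, SciPy r: {r_scipy:.4f}')
```

Both values should round to 0.91, the coefficient reported by the SciPy cell.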

In [3]:
import numpy as np
from scipy.stats import pearsonr

# Sample data: Study time (in hours) and Exam scores
study_time = [10, 15, 8, 12, 20]
exam_scores = [85, 90, 75, 80, 95]

# Calculate the Pearson correlation coefficient and p-value
corr_coefficient, p_value = pearsonr(study_time, exam_scores)

print(f'Pearson Correlation Coefficient: {corr_coefficient:.2f}')
print(f'p-value: {p_value:.4f}')

Pearson Correlation Coefficient: 0.91
p-value: 0.0319


- # **Pearson Correlation Coefficient (r):**
> - > - The Pearson correlation coefficient ranges from -1 to 1.
> - > - A positive value (e.g., 0.80) indicates a positive linear relationship, meaning that as students spend more time studying, their exam scores tend to increase. 
> - > - A negative value (e.g., -0.80) would indicate a negative linear relationship, implying that more study time is associated with lower exam scores. 
> - > - In this case, a value of 0.91 suggests a strong positive linear relationship, meaning that students who study more tend to have higher exam scores.
- # **p-value:**
> - > - The p-value (0.0319) is the probability of observing a correlation at least as strong as this one if study time and exam scores were actually unrelated (a true correlation of zero).
> - > - If the p-value is less than a chosen significance level (e.g., 0.05), it suggests that the correlation is statistically significant.
> - > - In this case, the p-value is less than 0.05, suggesting that the correlation between study time and exam scores is statistically significant.
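For Pearson's r, the p-value is usually obtained from a t-statistic with n - 2 degrees of freedom, assuming the two variables are roughly bivariate normal. The sketch below reproduces SciPy's p-value from that formula; the variable names are our own.

```python
import numpy as np
from scipy import stats

study_time = [10, 15, 8, 12, 20]
exam_scores = [85, 90, 75, 80, 95]
n = len(study_time)

r, p_scipy = stats.pearsonr(study_time, exam_scores)

# Under H0 (true correlation is zero, data roughly bivariate normal),
# t = r * sqrt(n - 2) / sqrt(1 - r**2) follows a Student t distribution
# with n - 2 degrees of freedom.
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

# Two-sided p-value: chance of a |t| at least as large as the observed one.
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f't = {t_stat:.3f}, df = {n - 2}')
print(f'manual p-value: {p_manual:.4f}, SciPy p-value: {p_scipy:.4f}')
```

The manual and SciPy values should agree to floating-point precision. With only five students the test has just three degrees of freedom, so the result should be read with caution.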

- > - Therefore, within this small sample of five students, there is a statistically significant and strong positive linear relationship between the amount of time students spend studying for an exam and their final exam scores.
- > - In simple terms, students who study more tend to perform better on their exams.

# 2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

- **Spearman's rank correlation** is a measure of the monotonic relationship between two variables, which means it assesses whether there's a consistent trend in the data, even if it's not strictly linear.
- In our case, we want to calculate the Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. 
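Spearman's ρ is Pearson's r applied to the ranks of the data, with tied values receiving the average of the ranks they span. A minimal sketch that checks this on the same sleep and satisfaction data, using `scipy.stats.rankdata` (whose default tie method is averaging):

```python
from scipy.stats import pearsonr, rankdata, spearmanr

amount_of_sleep = [7, 6, 8, 5, 7, 8, 6]
job_satisfaction = [8, 6, 9, 4, 7, 9, 5]

# Replace each value by its rank; tied values share the average rank
# (e.g. the two 6-hour sleepers occupy ranks 2 and 3, so both get 2.5).
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Spearman's rho is the Pearson correlation of the ranks.
rho_manual, _ = pearsonr(sleep_ranks, satisfaction_ranks)
rho_scipy, _ = spearmanr(amount_of_sleep, job_satisfaction)

print('sleep ranks:', sleep_ranks)
print('satisfaction ranks:', satisfaction_ranks)
print(f'Manual rho: {rho_manual:.4f}, SciPy rho: {rho_scipy:.4f}')
```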

In [4]:
import numpy as np
from scipy.stats import spearmanr

# Sample data: amount of sleep per night and job satisfaction (on a scale of 1 to 10)
amount_of_sleep = [7, 6, 8, 5, 7, 8, 6]
job_satisfaction = [8, 6, 9, 4, 7, 9, 5]

# Calculating the Spearman's rank correlation coefficient and p-value
corr_coefficient, p_value = spearmanr(amount_of_sleep, job_satisfaction)

print(f"Spearman's Rank Correlation Coefficient: {corr_coefficient:.2f}")
print(f'p-value: {p_value:.4f}')

Spearman's Rank Correlation Coefficient: 0.98
p-value: 0.0001


- # **Spearman's Rank Correlation Coefficient (ρ):**
- > - The Spearman's rank correlation coefficient ranges from -1 to 1.
- > - A positive value (e.g., 0.86) indicates a positive monotonic relationship, meaning that as individuals get more sleep, their job satisfaction tends to increase. 
- > - A negative value (e.g., -0.86) would indicate a negative monotonic relationship, suggesting that more sleep is associated with lower job satisfaction. 
- > - In this case, a value of 0.98 suggests a very strong positive monotonic relationship, meaning that individuals who get more sleep tend to report higher job satisfaction.
- # **p-value:**
- > - The p-value (0.0001) is the probability of observing a rank correlation at least as strong as this one if sleep and job satisfaction were actually unrelated.
- > - If the p-value is less than a chosen significance level (e.g., 0.05), it suggests that the correlation is statistically significant.
- > - In this case, the p-value is less than 0.05, indicating that the correlation between the amount of sleep and job satisfaction is statistically significant.
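SciPy's p-value for Spearman's ρ relies on a t-distribution approximation, which is rough for only seven people with tied values. Because the p-value is the probability of a correlation at least as extreme as the observed one when there is no association, it can also be computed exactly by trying every possible re-pairing of the data. The sketch below is our own exact permutation test, not a SciPy function.

```python
from itertools import permutations

import numpy as np
from scipy.stats import rankdata, spearmanr

amount_of_sleep = [7, 6, 8, 5, 7, 8, 6]
job_satisfaction = [8, 6, 9, 4, 7, 9, 5]

observed_rho, _ = spearmanr(amount_of_sleep, job_satisfaction)

# Spearman's rho is Pearson's r on the ranks, so rank once up front.
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Under "no association", every re-pairing of satisfaction scores with
# sleep values is equally likely. With n = 7 there are only 7! = 5040
# orderings, so we enumerate all of them instead of sampling.
count_extreme = 0
total = 0
for perm in permutations(satisfaction_ranks):
    rho = np.corrcoef(sleep_ranks, perm)[0, 1]
    # Two-sided test; the tiny tolerance keeps re-pairings that tie the
    # observed value from being lost to floating-point rounding.
    if abs(rho) >= abs(observed_rho) - 1e-12:
        count_extreme += 1
    total += 1

p_exact = count_extreme / total
print(f'observed rho = {observed_rho:.4f}')
print(f'exact permutation p-value = {p_exact:.4f} ({count_extreme}/{total})')
```

For these data only the 8 re-pairings that shuffle tied sleep values reproduce a ρ this extreme, so the exact p-value is 8/5040, about 0.0016. That is larger than SciPy's approximate value but still well below 0.05.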

> - => Hence, we can conclude that there is a statistically significant and strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction.
> - => In simple terms, individuals who get more sleep tend to report higher job satisfaction, and this relationship is not necessarily linear but follows a consistent trend.

# 3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [1]:
import numpy as np 
from scipy.stats import pearsonr, spearmanr 

# Sample data: hours of exercise per week and BMI (note: these lists hold 49 participants, not 50)
hours_of_exercise = [3, 4, 2, 5, 6, 3, 4, 5, 1, 2, 6, 3, 4, 5, 2, 1, 3, 4, 5, 2, 4, 6, 1, 3, 5, 4, 2, 6, 4, 3, 2, 5, 4, 3, 6, 2, 1, 4, 5, 2, 3, 6, 4, 1, 5, 2, 3, 4, 6]
bmi = [27, 29, 31, 25, 24, 28, 26, 23, 30, 29, 24, 27, 26, 23, 30, 32, 28, 26, 24, 30, 29, 24, 31, 28, 25, 26, 30, 24, 28, 29, 25, 26, 27, 23, 28, 31, 26, 25, 30, 27, 29, 25, 32, 27, 28, 23, 29, 26, 31]

# Calculate the Pearson correlation coefficient and p-value
pearson_corr, pearson_p_value = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation coefficient and p-value
spearman_corr, spearman_p_value = spearmanr(hours_of_exercise, bmi)

print(f"Pearson Correlation Coefficient : {pearson_corr:.2f}")
print(f'Pearson p-value: {pearson_p_value:.4f}')

print(f"Spearman's Rank Correlation Coefficient: {spearman_corr:.2f}")
print(f'Spearman p-value: {spearman_p_value:.4f}')


Pearson Correlation Coefficient : -0.46
Pearson p-value: 0.0009
Spearman's Rank Correlation Coefficient: -0.47
Spearman p-value: 0.0007


- # **Pearson Correlation Coefficient:**
> - > - The Pearson correlation coefficient is -0.46. 
> - > - This negative value indicates a negative linear relationship between the number of hours of exercise per week and BMI.
> - > - In simpler terms, as the number of hours of exercise per week increases, BMI tends to decrease.
> - > - The strength of this negative linear relationship is moderate.
- # **Spearman's Rank Correlation Coefficient:**
> - > - The Spearman's rank correlation coefficient is -0.47. 
> - > - Like the Pearson coefficient, this negative value suggests a negative relationship between the number of hours of exercise per week and BMI.
> - > - It's a consistent trend, indicating that as exercise hours increase, BMI tends to decrease. This correlation is also of moderate strength.
- # **p-values:**
> - > - Both the Pearson and Spearman correlations have associated p-values. 
> - > - Both p-values are very low (Pearson p-value: 0.0009 and Spearman p-value: 0.0007). These low p-values suggest that the correlations are statistically significant, meaning the observed relationships between exercise hours and BMI are unlikely to have occurred by random chance.

- > - Hence, both the Pearson and Spearman correlations indicate a statistically significant, moderate negative relationship between the number of hours of exercise per week and BMI in this sample of adults.
- > - Because the two coefficients are almost identical (-0.46 vs. -0.47), the relationship appears to be roughly linear; a large gap between them would instead point to curvature or influential outliers.
- > - This means that, in this dataset, people who exercise more hours per week tend to have a lower BMI, and this association is unlikely to be due to random chance alone.
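Pearson's r and Spearman's ρ diverge when a relationship is monotonic but curved, or when a single outlier dominates. The sketch below illustrates both cases on hypothetical data made up for the purpose, not taken from this study.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y grows exponentially with x, so the relationship is
# perfectly monotonic but clearly non-linear.
x = np.arange(1, 11)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)
print(f'Monotonic, non-linear: Pearson r = {pearson_r:.2f}, Spearman rho = {spearman_rho:.2f}')

# Hypothetical data: a perfectly linear relationship plus one extreme outlier.
x_out = np.arange(1, 11, dtype=float)
y_out = 2 * x_out
y_out[-1] = -100.0  # replace the last point with a single outlier

pearson_out, _ = pearsonr(x_out, y_out)
spearman_out, _ = spearmanr(x_out, y_out)
print(f'With one outlier: Pearson r = {pearson_out:.2f}, Spearman rho = {spearman_out:.2f}')
```

In the curved case Spearman's ρ is 1 while Pearson's r is well below 1. With the outlier, Pearson's r turns negative (about -0.39) while Spearman's ρ stays positive (about 0.45), because ranks limit how much one extreme value can pull the result.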

# 4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [16]:
import numpy as np
from scipy.stats import pearsonr

# Sample data: hours of TV per day and level of physical activity (on a scale of 1 to 10); these lists hold 48 participants, not 50
hours_of_tv = [2, 3, 4, 5, 1, 3, 2, 4, 6, 5, 2, 1, 3, 4, 5, 6, 2, 3, 1, 4, 5, 6, 2, 3, 4, 5, 6, 1, 3, 2, 4, 1, 5, 2, 6, 4, 3, 5, 1, 2, 3, 6, 4, 5, 1, 2, 3, 5]
physical_activity = [6, 7, 5, 4, 8, 6, 7, 5, 3, 4, 7, 8, 6, 5, 4, 3, 7, 6, 8, 5, 4, 3, 7, 6, 5, 4, 3, 8, 6, 7, 5, 8, 4, 7, 3, 5, 6, 4, 8, 7, 6, 3, 5, 8, 7, 6, 4, 1]

# Calculate the Pearson correlation coefficient and p-value
corr_coefficient, p_value = pearsonr(hours_of_tv, physical_activity)

print(f'Pearson Correlation Coefficient: {corr_coefficient:.2f}')
print(f'p-value: {p_value:.4f}')


Pearson Correlation Coefficient: -0.88
p-value: 0.0000


- # **Pearson Correlation Coefficient (r):**
> - > - The Pearson Correlation Coefficient ranges from -1 to 1. 
> - > - In this case, a value of -0.88 suggests a very strong negative linear relationship between hours of TV watched per day and level of physical activity.
> - > - This means that participants who watch more TV per day tend to report lower physical activity, and vice versa. It's a strong and consistent negative relationship.
- # **p-value:**
> - > - The p-value is printed as 0.0000 only because of the four-decimal format; it is not exactly zero, but it is below 0.00005, so the correlation is highly statistically significant.
> - > - In other words, a correlation this strong would be extremely unlikely if TV time and physical activity were actually unrelated.
> - > - The p-value measures statistical significance, not strength; the strength of the relationship comes from the size of r (-0.88).

- > - => In simple terms, the results indicate a very strong and highly statistically significant negative linear relationship between TV time and physical activity.
- > - => Participants who watch more TV tend to be less physically active, and this association is very unlikely to be due to chance. As a correlation, however, it does not show that watching TV causes lower activity.
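The same negative association can be expressed as a least-squares line of physical activity on TV hours. The sketch below reuses this section's data with `scipy.stats.linregress`: the slope gives the expected change in activity score per extra hour of TV, `rvalue` is the same Pearson coefficient, and r² is the share of the variance in activity explained by the straight-line fit. Printing the p-value in scientific notation avoids the misleading 0.0000.

```python
from scipy.stats import linregress, pearsonr

hours_of_tv = [2, 3, 4, 5, 1, 3, 2, 4, 6, 5, 2, 1, 3, 4, 5, 6, 2, 3, 1, 4, 5, 6, 2, 3, 4, 5, 6, 1, 3, 2, 4, 1, 5, 2, 6, 4, 3, 5, 1, 2, 3, 6, 4, 5, 1, 2, 3, 5]
physical_activity = [6, 7, 5, 4, 8, 6, 7, 5, 3, 4, 7, 8, 6, 5, 4, 3, 7, 6, 8, 5, 4, 3, 7, 6, 5, 4, 3, 8, 6, 7, 5, 8, 4, 7, 3, 5, 6, 4, 8, 7, 6, 3, 5, 8, 7, 6, 4, 1]

# Least-squares line: physical_activity ~ intercept + slope * hours_of_tv
fit = linregress(hours_of_tv, physical_activity)

# Cross-check: the regression's r is the same Pearson coefficient.
r_check, _ = pearsonr(hours_of_tv, physical_activity)

print(f'slope = {fit.slope:.2f} activity points per extra hour of TV')
print(f'intercept = {fit.intercept:.2f}')
print(f'r = {fit.rvalue:.2f} (pearsonr: {r_check:.2f}), r^2 = {fit.rvalue**2:.2f}')
# Scientific notation shows how small p really is instead of printing 0.0000.
print(f'p-value = {fit.pvalue:.2e}')
```

Like the correlation, the fitted slope describes an association in this sample, not the effect of changing anyone's TV habits.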

# 5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

| Age (years) | Soft drink preference |
|---|---|
| 25 | Coke |
| 42 | Pepsi |
| 37 | Mountain Dew |
| 19 | Coke |
| 31 | Pepsi |
| 28 | Coke |

In [17]:
# Age and Soft Drink Preference data
ages = [25, 42, 37, 19, 31, 28]
preferences = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Create a dictionary to store preferences for each age group
age_preferences = {}

# Iterate through the data and group preferences by age
for age, preference in zip(ages, preferences):
    if age in age_preferences:
        age_preferences[age].append(preference)
    else:
        age_preferences[age] = [preference]

# Calculate the most preferred soft drink for each age group
most_preferred = {}
for age, preference_list in age_preferences.items():
    most_preferred[age] = max(set(preference_list), key=preference_list.count)

# Print the results
for age, preference in most_preferred.items():
    print(f"For respondents aged {age} years, the most preferred soft drink is {preference}.")


For respondents aged 25 years, the most preferred soft drink is Coke.
For respondents aged 42 years, the most preferred soft drink is Pepsi.
For respondents aged 37 years, the most preferred soft drink is Mountain Dew.
For respondents aged 19 years, the most preferred soft drink is Coke.
For respondents aged 31 years, the most preferred soft drink is Pepsi.
For respondents aged 28 years, the most preferred soft drink is Coke.


- > - Because every age in this sample appears only once, each "age group" contains a single respondent, so the output simply restates each person's choice rather than summarizing a group preference.
- > - In the raw data, the Coke drinkers (ages 19, 25 and 28) are younger than the Pepsi (31 and 42) and Mountain Dew (37) drinkers, but with only six respondents this describes the sample rather than a general trend, and it does not indicate a causal relationship between age and preference.
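Because soft drink preference is a categorical (nominal) variable, Pearson's and Spearman's coefficients do not apply to it directly. One simple alternative, suggested here rather than taken from the analysis above, groups ages by brand and computes the correlation ratio η² (eta squared): the share of the variance in age explained by brand choice. A standard-library sketch:

```python
from collections import defaultdict

ages = [25, 42, 37, 19, 31, 28]
preferences = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Group respondents' ages by the brand they prefer.
ages_by_brand = defaultdict(list)
for age, brand in zip(ages, preferences):
    ages_by_brand[brand].append(age)

mean_age_by_brand = {brand: sum(group) / len(group) for brand, group in ages_by_brand.items()}

# Correlation ratio (eta squared): between-brand sum of squares divided by
# the total sum of squares of age (0 = brand explains nothing, 1 = everything).
grand_mean = sum(ages) / len(ages)
ss_total = sum((age - grand_mean) ** 2 for age in ages)
ss_between = sum(
    len(group) * (mean_age_by_brand[brand] - grand_mean) ** 2
    for brand, group in ages_by_brand.items()
)
eta_squared = ss_between / ss_total

for brand, mean_age in mean_age_by_brand.items():
    print(f'{brand}: mean age {mean_age:.1f} (n = {len(ages_by_brand[brand])})')
print(f'eta squared = {eta_squared:.2f}')
```

Coke drinkers average 24 years versus 36.5 for Pepsi and 37 for Mountain Dew, with η² around 0.70. With six respondents, and only one Mountain Dew drinker, this describes the sample rather than establishing a population trend.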

# 6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [21]:
import numpy as np
from scipy.stats import pearsonr

# Sample data: number of sales calls made per day and number of sales made per week (these lists hold 31 representatives, not 30)
sales_calls_per_day = [20, 25, 15, 30, 10, 18, 22, 28, 12, 24, 19, 21, 17, 14, 26, 23, 16, 29, 27, 13, 31, 32, 11, 33, 34, 35, 36, 37, 38, 39, 40]
sales_per_week = [3, 5, 2, 6, 1, 3, 4, 5, 2, 4, 3, 4, 2, 1, 5, 4, 2, 6, 5, 1, 7, 8, 1, 9, 10, 11, 12, 13, 14, 15, 17]

# Calculate the Pearson correlation coefficient and p-value
corr_coefficient, p_value = pearsonr(sales_calls_per_day, sales_per_week)

print(f'Pearson Correlation Coefficient: {corr_coefficient:.2f}')
print(f'p-value: {p_value:.4f}')


Pearson Correlation Coefficient: 0.94
p-value: 0.0000


- # **Pearson Correlation Coefficient (r):**
> - > - The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. 
> - > - In this context, the Pearson correlation coefficient of 0.94 indicates a very strong positive linear relationship.
> - > - This means that sales representatives who make more sales calls per day tend to make considerably more sales per week.
- # **p-value:**
> - > - The p-value indicates the statistical significance of the observed correlation. It is printed as 0.0000 only because of the four-decimal format; it is not exactly zero, but it is below 0.00005.
> - > - In this case, the extremely low p-value (close to zero) suggests that the correlation between sales calls per day and sales per week is highly statistically significant.
> - > - Essentially, this means that the observed strong positive correlation is unlikely to be due to random chance.
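A small p-value says the correlation is unlikely to be zero, but not how precisely r = 0.94 is estimated. A common way to express that precision is an approximate 95% confidence interval based on the Fisher z-transformation. The sketch below reuses this section's data; recent SciPy versions also offer a `confidence_interval()` method on the result returned by `pearsonr`, which should give essentially the same limits.

```python
import numpy as np
from scipy.stats import norm, pearsonr

sales_calls_per_day = [20, 25, 15, 30, 10, 18, 22, 28, 12, 24, 19, 21, 17, 14, 26, 23, 16, 29, 27, 13, 31, 32, 11, 33, 34, 35, 36, 37, 38, 39, 40]
sales_per_week = [3, 5, 2, 6, 1, 3, 4, 5, 2, 4, 3, 4, 2, 1, 5, 4, 2, 6, 5, 1, 7, 8, 1, 9, 10, 11, 12, 13, 14, 15, 17]
n = len(sales_calls_per_day)  # 31 representatives in these lists

r, _ = pearsonr(sales_calls_per_day, sales_per_week)

# Fisher z-transform: arctanh(r) is approximately normal with
# standard error 1 / sqrt(n - 3), which makes a symmetric interval possible.
z = np.arctanh(r)
se = 1 / np.sqrt(n - 3)
z_crit = norm.ppf(0.975)  # about 1.96 for a two-sided 95% interval

# Build the interval on the z scale, then map it back to the r scale.
ci_low, ci_high = np.tanh([z - z_crit * se, z + z_crit * se])
print(f'r = {r:.2f}, approximate 95% CI: [{ci_low:.2f}, {ci_high:.2f}]')
```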

- > - Hence, in simple terms, the Pearson correlation coefficient of 0.94 indicates a very strong and statistically significant positive relationship.
- > - Sales representatives who make more sales calls per day tend to make more sales per week.
- > - This is valuable information for the company's sales strategy, but because the data are observational, the correlation alone does not prove that making more calls causes higher sales; other factors, such as a representative's experience, could influence both.