# Feature Engineering-6

#### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [1]:
# Example data: study time and final exam score for 10 students
import numpy as np
from scipy.stats import pearsonr
study_time = np.array([10, 8, 6, 9, 7, 5, 4, 7, 9, 8])
exam_scores = np.array([85, 78, 60, 90, 75, 70, 65, 80, 88, 82])
pcorr, _ = pearsonr(study_time, exam_scores)  # second return value is the p-value
if pcorr > 0:
    result = "Positive Correlation"
elif pcorr < 0:
    result = "Negative Correlation"
else:
    result = "No Correlation"
print(f"Pearson Correlation Coefficient: {pcorr:.2f}")
print(f"Interpretation: There is a {result} between study time and exam scores.")

Pearson Correlation Coefficient: 0.86
Interpretation: There is a Positive Correlation between study time and exam scores.
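
To make the formula behind `pearsonr` concrete, the sketch below computes r directly from its definition, r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²), on the Q1 data. The helper `pearson_from_definition` is a hypothetical name used only for illustration; its result should agree with SciPy up to floating-point rounding.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_from_definition(x, y):
    """Pearson r from centered sums (hypothetical helper, for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    # Covariance-like numerator over the product of the two spreads
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

study_time = np.array([10, 8, 6, 9, 7, 5, 4, 7, 9, 8])
exam_scores = np.array([85, 78, 60, 90, 75, 70, 65, 80, 88, 82])

r_manual = pearson_from_definition(study_time, exam_scores)
r_scipy, _ = pearsonr(study_time, exam_scores)
print(f"manual: {r_manual:.2f}, scipy: {r_scipy:.2f}")  # → manual: 0.86, scipy: 0.86
```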


#### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

In [2]:
import numpy as np
from scipy.stats import spearmanr
sleep_hours = np.array([7, 6, 8, 5, 6, 7, 8, 6, 5, 7])
job_satisfaction = np.array([8, 7, 9, 6, 7, 6, 5, 7, 6, 8])
# Correlate the sleep/satisfaction data defined in this cell
scorr, _ = spearmanr(sleep_hours, job_satisfaction)
if scorr > 0:
    result = "Positive Correlation"
elif scorr < 0:
    result = "Negative Correlation"
else:
    result = "No Correlation"
print(f"Spearman's Rank Correlation Coefficient: {scorr:.2f}")
print(f"Interpretation: There is a {result} between sleep hours and job satisfaction.")

Spearman's Rank Correlation Coefficient: 0.30
Interpretation: There is a Positive Correlation between sleep hours and job satisfaction.
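
Spearman's ρ is Pearson's r applied to the ranks of the data, where tied values receive the average of the ranks they span. The textbook shortcut 1 − 6Σd²/(n(n² − 1)) is exact only when there are no ties, and the sleep data has many, so this sketch takes the rank-then-Pearson route instead. It assumes the Q2 arrays and uses `scipy.stats.rankdata`, whose default `method='average'` handles ties this way.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

sleep_hours = np.array([7, 6, 8, 5, 6, 7, 8, 6, 5, 7])
job_satisfaction = np.array([8, 7, 9, 6, 7, 6, 5, 7, 6, 8])

# Replace each value by its rank; ties get the mean of the ranks they occupy
sleep_ranks = rankdata(sleep_hours)
satisfaction_ranks = rankdata(job_satisfaction)

rho_from_ranks, _ = pearsonr(sleep_ranks, satisfaction_ranks)
rho_scipy, _ = spearmanr(sleep_hours, job_satisfaction)
print(f"Pearson on ranks: {rho_from_ranks:.2f}, spearmanr: {rho_scipy:.2f}")  # → Pearson on ranks: 0.30, spearmanr: 0.30
```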


#### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [3]:
import numpy as np
from scipy.stats import pearsonr, spearmanr
# Weekly exercise hours and BMI for 50 participants
ex_hrs = np.array([3, 5, 2, 4, 6, 1, 7, 2, 3, 4, 6, 5, 2, 4, 5, 3, 6, 1, 4, 2,
                   5, 3, 2, 6, 4, 3, 7, 1, 2, 4, 5, 6, 3, 2, 4, 5, 1, 6, 2, 3,
                   4, 5, 2, 3, 6, 1, 4, 7, 5, 2])
bmi = np.array([23, 25, 30, 28, 22, 26, 21, 29, 24, 27, 20, 25, 31, 28, 26, 29, 23,
                30, 27, 29, 25, 28, 30, 22, 26, 29, 21, 31, 28, 26, 25, 23, 30, 29,
                28, 24, 27, 30, 25, 29, 26, 23, 22, 30, 27, 28, 25, 29, 26, 21])
pcorr, _ = pearsonr(ex_hrs, bmi)
scorr, _ = spearmanr(ex_hrs, bmi)
# Compare magnitudes: Pearson captures linear association, Spearman any monotonic one
if abs(pcorr) > abs(scorr):
    result = "a stronger linear (Pearson) than monotonic (Spearman) association"
elif abs(pcorr) < abs(scorr):
    result = "a stronger monotonic (Spearman) than linear (Pearson) association"
else:
    result = "equally strong linear and monotonic associations"
print(f"Pearson Correlation Coefficient: {pcorr:.2f}")
print(f"Spearman's Rank Correlation Coefficient: {scorr:.2f}")
print(f"Interpretation: The data show {result}.")

Pearson Correlation Coefficient: -0.52
Spearman's Rank Correlation Coefficient: -0.51
Interpretation: The data show a stronger linear (Pearson) than monotonic (Spearman) association.
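
The two coefficients diverge noticeably only when a relationship is monotonic but not linear. A small synthetic example (not part of the study data) makes this visible: y = x³ always increases with x, so Spearman's ρ is exactly 1, while Pearson's r falls below 1 because the points do not lie on a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data: strictly increasing but strongly curved
x = np.arange(1, 11)
y = x ** 3

r, _ = pearsonr(x, y)      # penalized by the curvature
rho, _ = spearmanr(x, y)   # only the ordering matters, so this is 1
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

For Q3, the two magnitudes (0.52 and 0.51) are nearly identical, which suggests the exercise/BMI relationship is close to linear; the small gap is better read as "similar" than as evidence that one kind of association is stronger.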


#### Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [4]:
import numpy as np
from scipy.stats import pearsonr
# Daily TV hours and level of physical activity for 50 participants
tv_hrs = np.array([2, 3, 4, 2, 5, 1, 3, 2, 4, 3, 1, 4, 6, 2, 3, 5, 4, 1, 3, 2,
                   5, 2, 4, 3, 1, 2, 5, 3, 4, 2, 1, 5, 3, 2, 4, 6, 2, 1, 3, 5,
                   4, 2, 3, 1, 4, 5, 2, 3, 1, 6])
physical_activity = np.array([3, 4, 2, 5, 1, 3, 2, 4, 3, 1, 4, 2, 5, 3, 4, 2, 1,
                              5, 3, 4, 2, 1, 3, 2, 4, 6, 3, 2, 4, 5, 1, 3, 2, 4,
                              3, 1, 5, 4, 2, 3, 1, 6, 4, 2, 3, 5, 1, 4, 2, 3])
pcorr, _ = pearsonr(tv_hrs, physical_activity)
print(f"Pearson Correlation: {pcorr:.2f}")

Pearson Correlation: -0.22
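
A bare coefficient is easier to read alongside a strength label and a significance check. The sketch below maps |r| to labels using Cohen's commonly cited benchmarks (0.1 small, 0.3 medium, 0.5 large). Those cut-offs, the 0.05 significance level, and the helper name `describe_correlation` are assumptions for illustration, not part of the original exercise.

```python
def describe_correlation(r, p_value=None, alpha=0.05):
    """Plain-language label for a correlation coefficient (hypothetical helper)."""
    magnitude = abs(r)
    # Cut-offs follow Cohen's rule of thumb (0.1 / 0.3 / 0.5); they are conventions only
    if magnitude < 0.1:
        label = "negligible"
    else:
        if magnitude < 0.3:
            strength = "weak"
        elif magnitude < 0.5:
            strength = "moderate"
        else:
            strength = "strong"
        direction = "positive" if r > 0 else "negative"
        label = f"{strength} {direction}"
    # Optionally append a significance verdict at level alpha
    if p_value is not None:
        label += " (significant)" if p_value < alpha else " (not significant)"
    return label

print(describe_correlation(0.86))   # → strong positive
print(describe_correlation(-0.22))  # → weak negative
```

For Q4 you would pass both values returned by `pearsonr` for the TV/activity arrays, so the label also reports whether the weak negative association is distinguishable from zero at the chosen level.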


#### Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:
| Age (years) | Soft drink preference |
|---|---|
| 25 | Coke |
| 42 | Pepsi |
| 37 | Mountain Dew |
| 19 | Coke |
| 31 | Pepsi |
| 28 | Coke |

In [5]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from scipy.stats import spearmanr, pearsonr
df = pd.DataFrame({'Age': [25, 42, 37, 19, 31, 28],
                   'Soft Drink Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']})
# LabelEncoder assigns codes alphabetically (Coke=0, Mountain Dew=1, Pepsi=2).
# The brands have no natural order, so both coefficients depend on this arbitrary coding.
le = LabelEncoder()
df['Soft_Drink_Preference_Encoded'] = le.fit_transform(df['Soft Drink Preference'])
pcorr, p_pval = pearsonr(df['Age'], df['Soft_Drink_Preference_Encoded'])
scorr, s_pval = spearmanr(df['Age'], df['Soft_Drink_Preference_Encoded'])
print(f"Spearman's Rank Correlation Coefficient: {scorr:.2f}")
print(f"P-value (Spearman): {s_pval:.2f}")
print(f"Pearson Correlation Coefficient: {pcorr:.2f}")
print(f"P-value (Pearson): {p_pval:.2f}")

Spearman's Rank Correlation Coefficient: 0.83
P-value (Spearman): 0.04
Pearson Correlation Coefficient: 0.77
P-value (Pearson): 0.07
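
Because soft-drink preference is nominal, the alphabetical codes from `LabelEncoder` impose an order the brands do not have, and relabeling them would change both coefficients above. One order-free alternative, swapped in here as a suggestion rather than as the original method, is the correlation ratio η (eta): the square root of the share of variance in age explained by which drink a respondent prefers. The helper name `correlation_ratio` is hypothetical.

```python
import numpy as np
import pandas as pd

def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a nominal and a numeric variable (hypothetical helper)."""
    data = pd.DataFrame({"group": categories, "value": values})
    grand_mean = data["value"].mean()
    group_stats = data.groupby("group")["value"].agg(["count", "mean"])
    # Between-group sum of squares: size-weighted squared distance of each group mean
    ss_between = (group_stats["count"] * (group_stats["mean"] - grand_mean) ** 2).sum()
    # Total sum of squares around the grand mean
    ss_total = ((data["value"] - grand_mean) ** 2).sum()
    return float(np.sqrt(ss_between / ss_total)) if ss_total > 0 else 0.0

ages = [25, 42, 37, 19, 31, 28]
drinks = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]
eta = correlation_ratio(drinks, ages)
print(f"Correlation ratio (eta): {eta:.2f}")  # → Correlation ratio (eta): 0.84
```

η lies between 0 and 1 and carries no sign, because the groups have no order. With only six respondents, and a single Mountain Dew drinker, this value (like the coefficients above) is illustrative rather than conclusive.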


#### Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [6]:
import numpy as np
from scipy.stats import pearsonr
# Sales calls per day and sales per week for 30 representatives
calls_per_day = np.array([10, 8, 12, 9, 11, 7, 10, 9, 11, 8, 10, 12, 8, 9, 7, 11, 10, 9, 12, 8,
                          11, 7, 10, 9, 12, 8, 9, 7, 11, 10])
sales_per_week = np.array([70, 52, 82, 68, 75, 45, 68, 72, 78, 55, 68, 80, 50, 62, 45, 74, 68, 60,
                           85, 48, 73, 44, 70, 60, 88, 50, 58, 42, 70, 64])
pcorr, _ = pearsonr(calls_per_day, sales_per_week)
print(f"Pearson Correlation: {pcorr:.2f}")

Pearson Correlation: 0.96
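
With only 30 representatives, a point estimate says little about precision. A common way to attach an approximate confidence interval to Pearson's r is the Fisher z-transformation: z = arctanh(r) is roughly normal with standard error 1/√(n − 3), so an interval built on the z scale can be mapped back with tanh. The sketch recomputes r from the Q6 data; the helper name `fisher_ci` is hypothetical.

```python
import numpy as np
from scipy.stats import norm, pearsonr

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for Pearson's r via Fisher's z (hypothetical helper)."""
    z = np.arctanh(r)                          # move r to an approximately normal scale
    se = 1.0 / np.sqrt(n - 3)                  # standard error of z
    z_crit = norm.ppf(0.5 + confidence / 2)    # about 1.96 for 95%
    lo, hi = z - z_crit * se, z + z_crit * se
    return float(np.tanh(lo)), float(np.tanh(hi))  # back-transform to the r scale

calls_per_day = np.array([10, 8, 12, 9, 11, 7, 10, 9, 11, 8, 10, 12, 8, 9, 7, 11, 10, 9, 12, 8,
                          11, 7, 10, 9, 12, 8, 9, 7, 11, 10])
sales_per_week = np.array([70, 52, 82, 68, 75, 45, 68, 72, 78, 55, 68, 80, 50, 62, 45, 74, 68, 60,
                           85, 48, 73, 44, 70, 60, 88, 50, 58, 42, 70, 64])

r, _ = pearsonr(calls_per_day, sales_per_week)
low, high = fisher_ci(r, len(calls_per_day))
print(f"r = {r:.2f}, approximate 95% CI: ({low:.2f}, {high:.2f})")
```

Fisher's approximation assumes roughly bivariate-normal data; for small or skewed samples, a bootstrap interval is a common alternative.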
