### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.


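```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hours studied and final exam score for 10 students
data = {
    'Hours_Studied': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
    'Exam_Score': [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
}
df = pd.DataFrame(data)

# pearsonr returns (coefficient, two-sided p-value); only the coefficient is kept here
corr, _ = pearsonr(df['Hours_Studied'], df['Exam_Score'])
corr
```

Output:

```
1.0
```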

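The Pearson correlation coefficient between hours studied and exam scores is r = 1.0, a perfect positive linear relationship: in this data every student's score is exactly `Hours_Studied + 45`, so each additional 5 hours of study corresponds to exactly 5 more points. In general, values close to +1 indicate a strong positive linear relationship, values close to -1 a strong negative one, and values near 0 little or no linear relationship. A perfect r of 1.0 comes from the constructed dataset; real study-time data would be noisier, and even a strong correlation would not by itself show that studying causes higher scores.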
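To make the coefficient less of a black box, the sketch below computes Pearson's r directly from its definition (the sum of co-deviations divided by the square root of the product of the summed squared deviations) and cross-checks it against `np.corrcoef`. The helper name `pearson_r` is illustrative, not part of any library.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from its definition: sum of co-deviations divided by
    the square root of the product of summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

hours = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

r_manual = pearson_r(hours, scores)
# Off-diagonal entry of the 2x2 correlation matrix is r between the two inputs
r_numpy = np.corrcoef(hours, scores)[0, 1]
print(r_manual, r_numpy)
```

Because every score is the hours value shifted by a constant, the deviations `dx` and `dy` are identical and both calculations give 1.0.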


### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.


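```python
from scipy.stats import spearmanr

# Nightly hours of sleep and job satisfaction (1-10 scale) for 10 individuals
data = {
    'Hours_Sleep': [6, 7, 8, 5, 6, 7, 9, 4, 5, 6],
    'Job_Satisfaction': [5, 6, 7, 4, 5, 6, 8, 3, 4, 5]
}
df = pd.DataFrame(data)

# spearmanr returns (rho, two-sided p-value); only rho is kept here
corr, _ = spearmanr(df['Hours_Sleep'], df['Job_Satisfaction'])
corr
```

Output:

```
1.0
```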

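The Spearman's rank correlation coefficient between hours of sleep and job satisfaction is ρ = 1.0, a perfect positive monotonic relationship: in this sample `Job_Satisfaction` is always `Hours_Sleep - 1`, so ranking individuals by sleep gives exactly the same order as ranking them by satisfaction. Repeated values (for example, the three people who sleep 6 hours) receive averaged ranks, which does not change the result here. As in Q1, the dataset is constructed, so the perfect score should not be read as evidence about real sleep and job satisfaction.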
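Spearman's ρ is simply Pearson's r computed on the ranks of the data. The sketch below makes that explicit with `scipy.stats.rankdata`, whose default `method='average'` gives tied values the mean of the ranks they span, and compares the result with `spearmanr`.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

sleep = [6, 7, 8, 5, 6, 7, 9, 4, 5, 6]
satisfaction = [5, 6, 7, 4, 5, 6, 8, 3, 4, 5]

# Convert raw values to ranks; ties get the average of the ranks they occupy
sleep_ranks = rankdata(sleep)
satisfaction_ranks = rankdata(satisfaction)

# Spearman's rho = Pearson's r on the ranks
rho_manual = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]
rho_scipy, _ = spearmanr(sleep, satisfaction)

print(sleep_ranks)  # the three 6-hour sleepers share rank 5.0
print(rho_manual, rho_scipy)
```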


### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.


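```python
np.random.seed(0)
# Simulated data for 50 adults: weekly exercise hours (integers 0-9) and BMI (18-30),
# drawn independently of each other
hours_exercise = np.random.randint(0, 10, 50)
bmi = np.random.uniform(18, 30, 50)

df = pd.DataFrame({'Hours_Exercise': hours_exercise, 'BMI': bmi})

# Linear association
pearson_corr, _ = pearsonr(df['Hours_Exercise'], df['BMI'])

# Monotonic association (Pearson on ranks)
spearman_corr, _ = spearmanr(df['Hours_Exercise'], df['BMI'])

pearson_corr, spearman_corr
```

Output:

```
(0.1258122441364603, 0.13016221523048604)
```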

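Both coefficients are weak, positive, and close to each other (Pearson r ≈ 0.126, Spearman ρ ≈ 0.130). Pearson measures linear association, while Spearman measures monotonic association on ranks and is less sensitive to outliers; when the two agree this closely, there is no sign of a monotonic but nonlinear pattern that one captures and the other misses. Because `hours_exercise` and `bmi` are generated independently of each other, the small nonzero values reflect sampling noise rather than a real relationship, and with n = 50 a correlation of this size is not statistically significant at the 5% level.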
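Since the two coefficients agree in Q3, it can help to see a case where they do not. The example below uses made-up data (not from the study) with a relationship that is perfectly monotonic but strongly curved: Spearman's ρ is 1 because the ranks agree perfectly, while Pearson's r is noticeably lower because the points do not lie on a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y grows exponentially with x (monotonic but nonlinear)
x = np.arange(1, 11)
y = np.exp(x)

r, _ = pearsonr(x, y)      # penalized by the curvature
rho, _ = spearmanr(x, y)   # ranks of x and y are identical
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

A large gap between the two coefficients is a hint to plot the data and consider a transformation or a rank-based measure.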


### Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.


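```python
np.random.seed(1)
# Simulated data for 50 participants: daily TV hours (integers 0-4) and
# physical activity level (integers 0-9), drawn independently
hours_tv = np.random.randint(0, 5, 50)
physical_activity = np.random.randint(0, 10, 50)

df = pd.DataFrame({'Hours_TV': hours_tv, 'Physical_Activity': physical_activity})

corr, _ = pearsonr(df['Hours_TV'], df['Physical_Activity'])
corr
```

Output:

```
0.1640933977448282
```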

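The Pearson correlation coefficient between daily television hours and physical activity is r ≈ 0.164, a weak positive value. A negative value would have indicated that more television watching goes with less physical activity; this sample does not show that pattern. Since `hours_tv` and `physical_activity` are drawn independently at random, the weak positive r is consistent with no real relationship, and with n = 50 it is not statistically significant at the 5% level.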
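The code above discards the second value returned by `pearsonr`, which is the two-sided p-value for the null hypothesis of zero correlation. As a sketch of where that p-value comes from: under the test's assumptions (approximately bivariate normal data, which these discrete simulated values only roughly resemble), the statistic t = r·sqrt(n - 2) / sqrt(1 - r²) follows a t distribution with n - 2 degrees of freedom. The block recreates the Q4 data and checks the hand-computed p-value against SciPy's.

```python
import numpy as np
from scipy.stats import pearsonr, t

# Recreate the simulated Q4 data (same seed and calls as above)
np.random.seed(1)
hours_tv = np.random.randint(0, 5, 50)
physical_activity = np.random.randint(0, 10, 50)

r, p_scipy = pearsonr(hours_tv, physical_activity)

# t-test of H0: population correlation = 0, with n - 2 degrees of freedom
n = len(hours_tv)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_manual = 2 * t.sf(abs(t_stat), df=n - 2)  # two-sided p-value

print(f"r = {r:.4f}, t = {t_stat:.3f}, "
      f"p (manual) = {p_manual:.4f}, p (scipy) = {p_scipy:.4f}")
```

A p-value well above 0.05 here means the data give no evidence against a zero correlation.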


### Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:
- Age (Years): 25, 42, 37, 19, 31, 28
- Preference: Coke, Pepsi, Mountain Dew, Coke, Pepsi, Coke


```python
from sklearn.preprocessing import LabelEncoder

data = {
    'Age': [25, 42, 37, 19, 31, 28],
    'Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
}
df = pd.DataFrame(data)

# Map each brand to an integer code (classes are sorted alphabetically)
label_encoder = LabelEncoder()
df['Preference_Encoded'] = label_encoder.fit_transform(df['Preference'])

df
```

Output:

|   | Age | Preference   | Preference_Encoded |
|---|-----|--------------|--------------------|
| 0 | 25  | Coke         | 0                  |
| 1 | 42  | Pepsi        | 2                  |
| 2 | 37  | Mountain Dew | 1                  |
| 3 | 19  | Coke         | 0                  |
| 4 | 31  | Pepsi        | 2                  |
| 5 | 28  | Coke         | 0                  |


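The DataFrame shows each respondent's age and encoded soft-drink preference. `LabelEncoder` assigns integers in sorted (alphabetical) order of the class labels, so Coke = 0, Mountain Dew = 1 and Pepsi = 2. These integers are arbitrary labels, not a meaningful scale: Pepsi is not "more" than Coke. A Pearson or Spearman correlation computed on `Preference_Encoded` would change if the labels were reordered, so it is not a valid measure of the relationship between age and preference; a measure designed for a numeric variable against a nominal one is more appropriate. With only six respondents, any conclusion is tentative.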
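One measure that treats Preference as nominal (swapped in here; the original notebook does not compute it) is the correlation ratio η: the square root of the between-group sum of squares of Age divided by its total sum of squares. It ranges from 0 (all preference groups have the same mean age) to 1 (all variation in age lies between groups) and does not depend on how the categories are labelled. The helper `correlation_ratio` is hypothetical, written for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke'],
})

def correlation_ratio(df, category_col, value_col):
    """Correlation ratio (eta) of a numeric column against a nominal column."""
    values = df[value_col].astype(float)
    grand_mean = values.mean()
    groups = df.groupby(category_col)[value_col]
    # Between-group sum of squares: group size * (group mean - grand mean)^2
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    # Total sum of squares around the grand mean
    ss_total = ((values - grand_mean) ** 2).sum()
    return (ss_between / ss_total) ** 0.5

eta = correlation_ratio(df, 'Preference', 'Age')
print(f"eta = {eta:.3f}, eta^2 = {eta ** 2:.3f}")
```

On these six responses η² ≈ 0.70, so about 70% of the variance in age lies between the preference groups. With a single Mountain Dew respondent and six people in total, this is descriptive only, not evidence of a real age effect.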


### Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.


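```python
np.random.seed(2)
# Simulated data for 30 sales representatives: calls per day (integers 5-19) and
# sales per week (integers 1-9), drawn independently
sales_calls = np.random.randint(5, 20, 30)
sales = np.random.randint(1, 10, 30)

df = pd.DataFrame({'Sales_Calls': sales_calls, 'Sales': sales})

corr, _ = pearsonr(df['Sales_Calls'], df['Sales'])
corr
```

Output:

```
-0.053909403413974044
```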

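The Pearson correlation coefficient between sales calls made per day and sales made per week is r ≈ -0.054, essentially zero. A positive value would have indicated that representatives who make more calls also close more sales; this sample shows no linear relationship in either direction, which is expected because `sales_calls` and `sales` are generated independently. The different time periods (per day versus per week) do not affect r itself, since Pearson's coefficient is unchanged when either variable is multiplied by a positive constant.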
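A single r from 30 people is itself uncertain. A common way to express that uncertainty (an addition here, not part of the original exercise) is an approximate confidence interval from the Fisher z-transformation: z = arctanh(r) is roughly normal with standard error 1/sqrt(n - 3). The helper `pearson_ci` is illustrative.

```python
import numpy as np
from scipy.stats import norm

def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation."""
    z = np.arctanh(r)                        # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / np.sqrt(n - 3)                  # approximate standard error of z
    z_crit = norm.ppf(0.5 + confidence / 2)  # about 1.96 for 95%
    lo, hi = z - z_crit * se, z + z_crit * se
    return np.tanh(lo), np.tanh(hi)          # back-transform to the r scale

# r reported for Q6 above, from 30 sales representatives
lo, hi = pearson_ci(-0.053909403413974044, 30)
print(f"95% CI for r: ({lo:.3f}, {hi:.3f})")
```

The interval runs from roughly -0.41 to 0.31, so the data are consistent with anything from a moderate negative to a moderate positive relationship, including none. Recent SciPy versions (1.9 and later, to our knowledge) also provide a `confidence_interval()` method on the result object returned by `pearsonr`.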
