## Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

The Pearson correlation coefficient, denoted "r", measures the strength and direction of the linear relationship between two continuous variables. It ranges from -1 to 1.
![1_nIYaRBXdshKZZLLc_tJ5dw.webp](attachment:e14e5e68-0a7f-41aa-90c2-700251d749b6.webp)
OR
$$r(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}$$
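A minimal NumPy sketch of this formula, using the study-time data from this question: the sample covariance is divided by the product of the sample standard deviations. Both must use the same `ddof`, otherwise r is rescaled incorrectly. `np.corrcoef` is printed alongside only as a cross-check.

```python
import numpy as np

x = np.array([5, 8, 10, 4, 7], dtype=float)       # studying time (hours)
y = np.array([85, 92, 88, 78, 90], dtype=float)   # exam scores

# Sample covariance and sample standard deviations (same ddof everywhere)
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(round(r, 4))                         # 0.7404
print(round(np.corrcoef(x, y)[0, 1], 4))   # 0.7404 (cross-check)
```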
 
 **Properties:**
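- r always lies in the interval [-1, 1].
- r is unaffected by a change of origin (adding a constant to either variable) or by a change of scale with a positive factor (for example, hours to minutes); multiplying one variable by a negative constant reverses the sign of r.
- r = 1 indicates a perfect positive linear relationship; r = -1 indicates a perfect negative linear relationship.
- r = 0 indicates no *linear* correlation; the variables may still be related in a nonlinear way.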
![1_fF-cRPB7OWdKEBR8aUQw_Q.webp](attachment:00fc55e1-82c5-4f6d-849f-06a8706668c3.webp)
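The properties listed above can be checked numerically. This sketch uses NumPy with the Q1 data, plus a small made-up array `t` for the zero-correlation case: it shifts and rescales a variable, builds exact linear relationships, and shows a relationship that is fully determined by x yet has r = 0.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson r taken from NumPy's correlation matrix."""
    return np.corrcoef(a, b)[0, 1]

x = np.array([5, 8, 10, 4, 7], dtype=float)
y = np.array([85, 92, 88, 78, 90], dtype=float)
r = pearson_r(x, y)

# Change of origin and positive change of scale leave r unchanged
r_shifted = pearson_r(60 * x + 30, y - 50)   # hours -> minutes, plus offsets
# A negative scale factor reverses the sign of r
r_flipped = pearson_r(-x, y)

# Exact linear relationships give r = +1 and r = -1
r_pos = pearson_r(x, 2 * x + 1)
r_neg = pearson_r(x, -3 * x + 4)

# y = t**2 on a symmetric range depends completely on t,
# but the linear correlation is zero
t = np.array([-2, -1, 0, 1, 2], dtype=float)
r_quad = pearson_r(t, t ** 2)

print(r, r_shifted, r_flipped, r_pos, r_neg, r_quad)
```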

```python
import pandas as pd

# Sample data for studying time (in hours) and exam scores
data = {
    'Studying Time (hours)': [5, 8, 10, 4, 7],
    'Exam Scores': [85, 92, 88, 78, 90]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Option 1: Pearson correlation matrix for all numeric columns
corr_matrix = df.corr(method='pearson')

# Option 2: Pearson correlation between the two columns only
correlation_coefficient = df['Studying Time (hours)'].corr(df['Exam Scores'])

# Display the results
print("Pearson Correlation Coefficient (r):", correlation_coefficient)
print(corr_matrix)
```

```text
Pearson Correlation Coefficient (r): 0.740426158392602
                       Studying Time (hours)  Exam Scores
Studying Time (hours)               1.000000     0.740426
Exam Scores                         0.740426     1.000000
```


**Interpretation**
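The calculated Pearson correlation coefficient (r) is approximately 0.740, a fairly strong positive value. It suggests a positive linear relationship between studying time and exam scores: students who spend more time studying tend to achieve higher scores. Two caveats apply. With only five students, the estimate is imprecise and is not statistically significant at the 5% level, and correlation alone does not show that extra study time causes the higher scores.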
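To put the sample size in perspective, `scipy.stats.pearsonr` also returns a two-sided p-value for the null hypothesis of zero correlation. A sketch with the same five data points:

```python
from scipy.stats import pearsonr

study_hours = [5, 8, 10, 4, 7]
exam_scores = [85, 92, 88, 78, 90]

# Coefficient and two-sided p-value (H0: no linear correlation)
r, p_value = pearsonr(study_hours, exam_scores)
print(f"r = {r:.4f}, p-value = {p_value:.3f}")
```

With n = 5 the p-value comes out well above 0.05, so the result is best treated as suggestive rather than conclusive.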

## Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

Spearman's rank correlation coefficient (ρ) is used to measure the strength and direction of the monotonic relationship between two variables. Unlike the Pearson correlation coefficient, Spearman's correlation is suitable for both continuous and ordinal categorical data and does not assume a linear relationship.

![1_sIf_Ddn_9yO7bmZfgHBavw.webp](attachment:f936c98f-0114-4ed2-a408-2164ae250393.webp)
**General Formula**
![1_LYITuZuuz_sjWqSgVweVCQ.webp](attachment:7ba8549e-392b-4970-aa62-5ca4790028a4.webp)
**Simplified Formula**
![1_eorfwGB_LtnWC31drM6Dqg.webp](attachment:4938f58c-5d26-4203-aebd-f4458f57c2d0.webp)
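The two formulas can be compared on the sleep data from this question. This sketch ranks each variable with `scipy.stats.rankdata` (tied values receive average ranks), applies Pearson's r to the ranks (the general formula), and then evaluates the simplified formula 1 - 6Σd²/(n(n² - 1)). Because these data contain ties, the simplified formula gives about 0.9727 rather than the 0.9720 returned by `spearmanr`; the two agree exactly only when there are no ties.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

sleep = [7, 6, 5, 8, 7, 6, 5, 9, 8, 7]
satisfaction = [8, 7, 5, 9, 8, 6, 4, 10, 9, 7]

# Rank each variable; ties get the average of the ranks they span
rank_sleep = rankdata(sleep)
rank_sat = rankdata(satisfaction)

# General formula: Pearson's r computed on the ranks
rho_general, _ = pearsonr(rank_sleep, rank_sat)

# Simplified formula: exact only when there are no ties
d = rank_sleep - rank_sat
n = len(sleep)
rho_simplified = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

rho_scipy, _ = spearmanr(sleep, satisfaction)
print(rho_general, rho_scipy, rho_simplified)
```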


```python
from scipy.stats import spearmanr

# Sample data for amount of sleep (in hours) and job satisfaction (on a scale of 1 to 10)
amount_of_sleep = [7, 6, 5, 8, 7, 6, 5, 9, 8, 7]
job_satisfaction = [8, 7, 5, 9, 8, 6, 4, 10, 9, 7]

# Calculate Spearman's rank correlation coefficient (the p-value is discarded)
spearman_corr, _ = spearmanr(amount_of_sleep, job_satisfaction)

# Display Spearman's rank correlation coefficient
print("Spearman's Rank Correlation (ρ):", spearman_corr)
```

```text
Spearman's Rank Correlation (ρ): 0.9719509366333152
```


**Interpretation**
The calculated Spearman's rank correlation coefficient (ρ) is approximately 0.972. This value indicates a very strong positive monotonic relationship between the amount of sleep and job satisfaction: individuals who get more sleep tend to report higher job satisfaction levels.

## Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

```python
from scipy.stats import pearsonr, spearmanr

# Sample data for exercise hours per week and BMI (50 participants)
exercise_hours = [3, 5, 2, 4, 6, 1, 2, 3, 5, 4, 3, 6, 1, 2, 4, 5, 3, 2, 4, 6, 1, 3, 2, 5, 4,
                  3, 6, 1, 2, 4, 5, 3, 2, 4, 6, 1, 3, 2, 5, 4, 3, 6, 1, 2, 4, 5, 3, 2, 4, 6]
bmi = [25.5, 27.0, 28.1, 26.5, 24.8, 29.7, 30.2, 27.8, 26.0, 25.3, 27.2, 25.1, 30.5, 29.9, 26.7,
       25.8, 28.0, 29.5, 26.9, 24.6, 30.8, 26.4, 28.3, 27.6, 25.7, 27.1, 24.9, 30.0, 29.4, 26.2,
       25.6, 27.4, 28.7, 26.8, 30.3, 29.3, 27.7, 25.9, 24.7, 30.7, 28.2, 27.3, 26.1, 29.8, 30.1,
       28.5, 25.0, 24.5, 29.5, 26.9]

# Calculate Pearson correlation coefficient (the p-value is discarded)
pearson_corr, _ = pearsonr(exercise_hours, bmi)

# Calculate Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(exercise_hours, bmi)

# Display both coefficients
print("Pearson Correlation Coefficient:", pearson_corr)
print("Spearman's Rank Correlation:", spearman_corr)
```

```text
Pearson Correlation Coefficient: -0.4920349936913371
Spearman's Rank Correlation: -0.4803540755552919
```


**Interpretation**
The calculated Pearson correlation coefficient is approximately -0.492, indicating a moderate negative linear relationship between exercise hours and BMI: on average, participants who exercise more hours per week tend to have a lower BMI.

The calculated Spearman's rank correlation coefficient is approximately -0.480, indicating a moderate negative monotonic relationship between the same two variables.

**Comparison**
The two coefficients are very close in sign and magnitude. Agreement like this suggests that the association is roughly linear and is not being driven by a few extreme values; a large gap between them would point to a nonlinear (but monotonic) pattern or to outliers. Note also that exercise hours takes only six distinct values (1 to 6), so many observations share tied ranks in the Spearman calculation.
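Pearson and Spearman do not always agree this closely. A small hypothetical illustration (synthetic data, not the BMI sample): when y grows exponentially with x, the relationship is perfectly monotonic, so Spearman's ρ is 1, while Pearson's r is noticeably lower because the points do not lie on a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: perfectly monotonic but strongly nonlinear
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

print(f"Pearson:  {pearson_corr:.3f}")
print(f"Spearman: {spearman_corr:.3f}")
```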

## Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

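```python
from scipy.stats import pearsonr

# Sample data for hours of TV watched per day and physical activity level
tv_hours = [2, 3, 4, 2, 5, 1, 3, 2, 4, 3, 2, 5, 1, 3, 4, 2, 5, 1, 3, 4,
            2, 5, 1, 3, 2, 4, 3, 2, 5, 1, 3, 4, 2, 5, 1, 3, 4, 2, 5, 1,
            3, 4, 2, 5, 1, 3, 4, 2, 5, 1]
physical_activity = [3, 4, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 5, 4, 3, 4, 2, 3, 5, 4,
                     3, 2, 4, 3, 5, 4, 3, 4, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 5, 4,
                     3, 4, 2, 3, 4, 5, 4, 3, 2, 3]

# Calculate Pearson correlation coefficient using SciPy (the p-value is discarded)
pearson_corr, _ = pearsonr(tv_hours, physical_activity)

# Display the Pearson correlation coefficient
print("Pearson Correlation Coefficient:", pearson_corr)
```

```text
Pearson Correlation Coefficient: -0.20986450263172346
```

**Interpretation**
The Pearson correlation coefficient is approximately -0.210, indicating a weak negative linear relationship: participants who watch more television per day tend to report slightly lower physical activity levels. With 50 participants, a coefficient of this size is not statistically significant at the 5% level.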
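One way to show how uncertain a weak correlation is, sketched here as a swapped-in technique not used in the original, is an approximate 95% confidence interval via the Fisher z-transformation: atanh(r) is treated as approximately normal with standard error 1/√(n - 3). For this data the interval includes zero.

```python
import math

import numpy as np

tv_hours = [2, 3, 4, 2, 5, 1, 3, 2, 4, 3, 2, 5, 1, 3, 4, 2, 5, 1, 3, 4,
            2, 5, 1, 3, 2, 4, 3, 2, 5, 1, 3, 4, 2, 5, 1, 3, 4, 2, 5, 1,
            3, 4, 2, 5, 1, 3, 4, 2, 5, 1]
physical_activity = [3, 4, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 5, 4, 3, 4, 2, 3, 5, 4,
                     3, 2, 4, 3, 5, 4, 3, 4, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 5, 4,
                     3, 4, 2, 3, 4, 5, 4, 3, 2, 3]

r = np.corrcoef(tv_hours, physical_activity)[0, 1]
n = len(tv_hours)

# Fisher z-transformation and its approximate standard error
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
z_crit = 1.96  # two-sided 95% quantile of the standard normal

# Back-transform the interval limits to the correlation scale
lower = math.tanh(z - z_crit * se)
upper = math.tanh(z + z_crit * se)

print(f"r = {r:.3f}, approximate 95% CI = ({lower:.3f}, {upper:.3f})")
```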


## Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:
![image.png](attachment:4fbe5204-27bf-416f-b849-bf59338d632a.png)

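```python
import pandas as pd
from scipy.stats import spearmanr

# Survey data: age of each respondent and their preferred soft-drink brand
df = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Soft Drink': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke'],
})

# Brand is nominal, so it needs numeric codes before ranking.
# Alphabetical order is an arbitrary choice; it matches how the brand
# names are ranked when passed to spearmanr as raw strings.
brand_codes = {'Coke': 1, 'Mountain Dew': 2, 'Pepsi': 3}
df['Brand Code'] = df['Soft Drink'].map(brand_codes)

# Calculate Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(df['Age'], df['Brand Code'])

# Display Spearman's rank correlation coefficient
print("Spearman's Rank Correlation:", spearman_corr)
```

```text
Spearman's Rank Correlation: 0.8332380897952965
```

**Interpretation**
A value of ρ ≈ 0.833 should not be read as evidence that older respondents prefer a particular brand. Soft-drink brand is a nominal variable with no natural order, so ρ depends entirely on how the brands are coded, and a different ordering of the same three brands gives a different value. With only six respondents, the survey is also too small to support firm conclusions.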
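Because soft-drink brand is nominal, any numeric coding of it is arbitrary, and Spearman's ρ changes with that coding. This sketch (an illustration, not part of the original exercise) computes ρ under two equally arbitrary codings and then summarizes the same data without any coding, as the mean age of the respondents preferring each brand. The alternative coding gives a negative ρ of about -0.28; with six respondents, the group means are only descriptive.

```python
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Soft Drink': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke'],
})

# Two equally arbitrary orderings of the same nominal categories
alphabetical = {'Coke': 1, 'Mountain Dew': 2, 'Pepsi': 3}
alternative = {'Pepsi': 1, 'Coke': 2, 'Mountain Dew': 3}

rho_alpha, _ = spearmanr(df['Age'], df['Soft Drink'].map(alphabetical))
rho_alt, _ = spearmanr(df['Age'], df['Soft Drink'].map(alternative))
print(f"alphabetical coding: {rho_alpha:.3f}")
print(f"alternative coding:  {rho_alt:.3f}")

# Coding-free summary: mean age of respondents preferring each brand
mean_age_by_brand = df.groupby('Soft Drink')['Age'].mean()
print(mean_age_by_brand)
```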


## Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

```python
from scipy.stats import pearsonr

# Sample data for sales calls per day and sales per week (30 representatives)
sales_calls_per_day = [20, 15, 18, 25, 30, 22, 19, 16, 24, 28, 21, 17, 23, 27, 29,
                       20, 15, 18, 25, 30, 22, 19, 16, 24, 28, 21, 17, 23, 27, 29]
sales_per_week = [100, 85, 90, 110, 120, 105, 95, 88, 115, 125, 98, 87, 112, 118, 130,
                  102, 82, 88, 108, 122, 110, 100, 86, 120, 130, 95, 80, 105, 115, 128]

# Calculate Pearson correlation coefficient using SciPy (the p-value is discarded)
pearson_corr, _ = pearsonr(sales_calls_per_day, sales_per_week)

# Display the Pearson correlation coefficient
print("Pearson Correlation Coefficient:", pearson_corr)
```

```text
Pearson Correlation Coefficient: 0.9505698867985949
```


**Interpretation**
The calculated Pearson correlation coefficient is approximately 0.951. This indicates a strong positive linear relationship between the number of sales calls made per day and the number of sales made per week: representatives who make more calls per day tend to make more sales per week.
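To quantify how quickly sales rise with calls, a least-squares line can be fitted with `scipy.stats.linregress` (a swapped-in technique, not part of the original question). Its `rvalue` is the same Pearson coefficient, and its square is the share of the variance in weekly sales that the straight line accounts for.

```python
from scipy.stats import linregress, pearsonr

sales_calls_per_day = [20, 15, 18, 25, 30, 22, 19, 16, 24, 28, 21, 17, 23, 27, 29,
                       20, 15, 18, 25, 30, 22, 19, 16, 24, 28, 21, 17, 23, 27, 29]
sales_per_week = [100, 85, 90, 110, 120, 105, 95, 88, 115, 125, 98, 87, 112, 118, 130,
                  102, 82, 88, 108, 122, 110, 100, 86, 120, 130, 95, 80, 105, 115, 128]

# Least-squares line: sales_per_week ~ intercept + slope * sales_calls_per_day
fit = linregress(sales_calls_per_day, sales_per_week)

print(f"slope     = {fit.slope:.2f} extra weekly sales per additional daily call")
print(f"r         = {fit.rvalue:.4f}")   # same value as pearsonr
print(f"r squared = {fit.rvalue ** 2:.3f}")
```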