#Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

# **Pearson Correlation Coefficient for Study Time and Exam Scores**

## **Step 1: Understanding Pearson Correlation**
The **Pearson correlation coefficient (r)** measures the strength and direction of the **linear** relationship between two variables:
- **r = 1** → Perfect positive correlation (as one increases, the other increases).
- **r = -1** → Perfect negative correlation (as one increases, the other decreases).
- **r = 0** → No linear relationship.

---

## **Step 2: Formula for Pearson Correlation**
The formula for Pearson’s correlation coefficient is:

\[
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}
\]

Where:
- \(X_i\) and \(Y_i\) are individual data points for **study time** and **exam scores**.
- \(\bar{X}\) and \(\bar{Y}\) are the means of study time and exam scores.
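- The numerator is the sum of cross-products of deviations, which is proportional to the **covariance** between the two variables.
- The denominator is proportional to the product of the two **standard deviations**; dividing by it makes \( r \) unitless and keeps it between -1 and 1.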
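The formula can be translated almost term by term into code. The following is a minimal sketch using only the standard library; the function name `pearson_r` and the six data points are hypothetical, since the question does not supply actual measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the deviation-score formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the question): hours studied and exam scores.
study_hours = [2, 4, 5, 7, 8, 10]
exam_scores = [55, 62, 66, 74, 79, 88]

r = pearson_r(study_hours, exam_scores)
print(f"r = {r:.3f}")
```

With real data, the two lists would simply be replaced by the recorded study times and scores, one pair per student.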

---

## **Step 3: Interpretation of Pearson Correlation**
| Pearson \( r \) Value | Interpretation |
|-----------------|----------------|
| \( r \geq 0.7 \) | Strong positive correlation (more study time is associated with higher scores). |
| \( 0.3 \leq r < 0.7 \) | Moderate positive correlation. |
| \( 0 < r < 0.3 \) | Weak positive correlation. |
| \( r = 0 \) | No linear correlation. |
| \( -0.3 < r < 0 \) | Weak negative correlation. |
| \( -0.7 < r \leq -0.3 \) | Moderate negative correlation. |
| \( r \leq -0.7 \) | Strong negative correlation (more study time is associated with lower scores, which is unlikely in this scenario). |

These cut-offs are common rules of thumb rather than fixed standards, and a correlation by itself does not show that study time causes the change in scores.
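The table can be applied mechanically with a small helper. This is only a sketch: the function name `describe_r` is hypothetical, and placing the boundary values 0.3 and 0.7 in the stronger band is an assumption made here for illustration.

```python
def describe_r(r):
    """Label r using the rule-of-thumb cut-offs 0.3 and 0.7 from the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    # Assumption: exact boundary values fall into the stronger category.
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

print(describe_r(0.85))   # strong positive correlation
print(describe_r(-0.42))  # moderate negative correlation
```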

---

## **Conclusion**
- If \( r \) is **close to 1**, students who study more tend to score higher.
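- If \( r \) is **close to 0**, there is little or no linear association between study time and scores (a non-linear relationship could still exist).
- If \( r \) is **negative**, more study time is associated with lower scores, which would be unexpected and might point to ineffective study strategies or a confounding factor.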

Pearson correlation helps in understanding whether study time is **linearly** related to exam performance.


#Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

# **Spearman’s Rank Correlation for Sleep Duration and Job Satisfaction**

## **Step 1: Understanding Spearman's Rank Correlation**
The **Spearman’s rank correlation coefficient (ρ or rₛ)** measures the strength and direction of the **monotonic** relationship between two variables:
- Unlike Pearson correlation, which measures **linear** relationships, Spearman’s correlation detects whether one variable **increases or decreases consistently** as the other changes.
- It is useful for **ordinal data** and for **non-linear** relationships, provided they are monotonic. Like Pearson's \( r \), it ranges from -1 to 1.

---

## **Step 2: Formula for Spearman’s Rank Correlation**
The formula for Spearman’s rank correlation coefficient is:

\[
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
\]

Where:
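- \( d_i \) is the **difference** between the ranks of corresponding values in two variables.
- \( n \) is the number of observations.
- This shortcut is exact only when there are **no tied ranks**. With ties (likely on a 1 to 10 satisfaction scale), assign each tied value the average of the ranks it occupies and compute Pearson's \( r \) on the ranks instead.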

### **Steps to Calculate Spearman’s Rank Correlation**
1. **Rank** both variables (sleep duration and job satisfaction).
2. Calculate the **differences** (\( d_i \)) between the ranks.
3. Square the differences (\( d_i^2 \)).
4. Use the formula to compute \( r_s \).
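The four steps above can be sketched in a few lines. The function names and the six observations are hypothetical, and the data were chosen to contain no ties because the shortcut formula assumes distinct ranks.

```python
def ranks_no_ties(values):
    """Rank values from 1 (smallest) to n; all values must be distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the 6*sum(d^2) shortcut does not apply")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    """Spearman's r_s via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rank_x = ranks_no_ties(xs)                      # step 1: rank both variables
    rank_y = ranks_no_ties(ys)
    diffs = [a - b for a, b in zip(rank_x, rank_y)]  # step 2: rank differences
    d_squared = sum(d ** 2 for d in diffs)           # step 3: squared differences
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))    # step 4: apply the formula

# Hypothetical data (not from the question).
sleep_hours = [5.0, 6.5, 7.0, 8.0, 6.0, 7.5]
satisfaction = [3, 6, 5, 9, 4, 8]

r_s = spearman_no_ties(sleep_hours, satisfaction)
print(f"r_s = {r_s:.3f}")
```

For this toy sample only two pairs differ in rank, each by one position, so the sum of squared differences is 2 and the coefficient is close to 1.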

---

## **Step 3: Interpretation of Spearman’s Rank Correlation**
| Spearman \( r_s \) Value | Interpretation |
|-----------------|----------------|
| \( r_s \geq 0.7 \) | Strong positive correlation (more sleep is associated with higher job satisfaction). |
| \( 0.3 \leq r_s < 0.7 \) | Moderate positive correlation. |
| \( 0 < r_s < 0.3 \) | Weak positive correlation. |
| \( r_s = 0 \) | No monotonic correlation. |
| \( -0.3 < r_s < 0 \) | Weak negative correlation. |
| \( -0.7 < r_s \leq -0.3 \) | Moderate negative correlation. |
| \( r_s \leq -0.7 \) | Strong negative correlation (more sleep is associated with lower job satisfaction, which is unlikely in this scenario). |

---

## **Conclusion**
- If \( r_s \) is **close to 1**, people who get more sleep tend to have higher job satisfaction.
- If \( r_s \) is **close to 0**, there is little or no monotonic association between sleep duration and job satisfaction.
- If \( r_s \) is **negative**, it suggests an **inverse** relationship (unlikely in this case).

Spearman’s rank correlation helps in understanding the **general trend** rather than assuming a strict linear relationship.


#Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

# **Comparison of Pearson and Spearman Correlation for Exercise Hours and BMI**

## **Step 1: Understanding the Correlation Measures**
To analyze the relationship between **exercise hours per week** and **BMI**, we compute:
1. **Pearson Correlation Coefficient (r)** – Measures the **linear relationship** between two continuous variables.
2. **Spearman's Rank Correlation Coefficient (rₛ)** – Measures the **monotonic relationship** (whether an increase in one variable corresponds to an increase or decrease in another, not necessarily in a linear manner).

---

## **Step 2: Formula for Pearson Correlation**
Pearson’s correlation is given by:

\[
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}
\]

Where:
- \( X \) represents **exercise hours per week**.
- \( Y \) represents **BMI**.
- \( \bar{X} \) and \( \bar{Y} \) are the respective means.
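- Pearson’s correlation captures only the **linear** part of a relationship. Computing \( r \) does not require **normally distributed** data, but the standard significance test for \( r \) assumes approximately bivariate normal data, and \( r \) is sensitive to outliers.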

---

## **Step 3: Formula for Spearman’s Rank Correlation**
Spearman’s rank correlation is computed as:

\[
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
\]

Where:
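- \( d_i \) is the difference between the ranks of corresponding \( X \) and \( Y \) values.
- \( n \) is the number of observations (here, 50 participants).
- It is **non-parametric** and detects **monotonic** relationships.
- The formula is exact only without tied ranks; when ties occur, use average ranks and compute Pearson’s \( r \) on the ranks.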
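Repeated values are common in data such as whole hours of exercise, and the shortcut formula is not exact when ranks are tied. A common alternative, sketched below, is to give tied values the average of the ranks they span and then apply Pearson's formula to the ranks. The function names and the eight data points are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's coefficient as Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data with ties (not from the study).
exercise_hours = [0, 1, 2, 2, 3, 5, 5, 7]
bmi = [31.0, 29.5, 28.0, 27.0, 27.0, 24.5, 25.0, 22.0]

rho = spearman_rho(exercise_hours, bmi)
print(f"rho = {rho:.3f}")
```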

---

## **Step 4: Interpreting and Comparing Results**
| Correlation Measure | Interpretation |
|---------------------|----------------|
| **Pearson’s \( r \)** | Measures **linear correlation** between exercise and BMI. A strong negative Pearson correlation suggests that **more exercise is associated with lower BMI** in a roughly linear manner. |
| **Spearman’s \( r_s \)** | Measures the **general trend** (monotonicity). If Spearman’s correlation is strong but Pearson’s is weak, the relationship might be **non-linear**. |

### **Comparison Cases:**
- If **both Pearson and Spearman correlations are negative and similar**, it suggests that **BMI tends to decrease roughly linearly as exercise increases**.
- If **Pearson’s correlation is weak but Spearman’s is strong**, the relationship is likely **non-linear but monotonic** (e.g., BMI drops initially with more exercise but plateaus after a certain point).
- If **Spearman’s correlation is close to zero**, there is **no consistent monotonic trend**.
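The plateau case can be illustrated with synthetic numbers. In this sketch BMI falls quickly at first and then levels off; the exponential curve, the variable names, and the helper functions are all assumptions made for illustration, not measurements from the study.

```python
import math

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks 1..n for distinct values (no tie handling needed here)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    n = len(xs)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(simple_ranks(xs), simple_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Synthetic, strictly decreasing curve that flattens out (hypothetical).
exercise_hours = list(range(10))
bmi = [22 + 10 * math.exp(-h) for h in exercise_hours]

r = pearson_r(exercise_hours, bmi)
rho = spearman_no_ties(exercise_hours, bmi)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman r_s = {rho:.3f}")
```

Because the curve never increases, the ranks are perfectly reversed and Spearman's coefficient is exactly -1, while Pearson's \( r \) is noticeably smaller in magnitude because a straight line fits the curve poorly.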

### **Conclusion**
Comparing Pearson and Spearman’s correlation helps determine whether the relationship is **strictly linear** or **monotonic but non-linear**, influencing how we model the relationship between exercise and BMI.


#Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

# **Pearson Correlation Coefficient for TV Watching Hours and Physical Activity**

## **Step 1: Understanding Pearson Correlation**
The **Pearson correlation coefficient (r)** measures the **linear relationship** between two continuous variables. In this case:
- **X** = Number of hours spent watching TV per day.
- **Y** = Level of physical activity (e.g., hours of exercise per day).
- Pearson's \( r \) is symmetric, so swapping which variable is called X does not change its value.

The correlation coefficient (\( r \)) is computed as:

\[
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}
\]

Where:
- \( X_i \) and \( Y_i \) are the individual data points.
- \( \bar{X} \) and \( \bar{Y} \) are the means of the respective variables.
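An equivalent way to read the formula is as the sample covariance divided by the product of the two sample standard deviations. The sketch below uses that form with the standard library's `statistics.stdev`; the function name `pearson_via_cov` and the six data points are hypothetical, since the researcher's 50 observations are not given.

```python
import statistics

def pearson_via_cov(xs, ys):
    """Pearson's r as sample covariance / (sample SD of x * sample SD of y)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sample covariance divides by n - 1.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.stdev also divides by n - 1, so the factors cancel in r.
    return cov_xy / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical data (not from the study): TV hours and exercise hours per day.
tv_hours = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
activity_hours = [2.0, 1.5, 1.5, 1.0, 0.5, 0.5]

r = pearson_via_cov(tv_hours, activity_hours)
print(f"r = {r:.3f}")
```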

---

## **Step 2: Expected Interpretation**
- **\( r \approx -1 \)** → Strong negative correlation: More TV watching is strongly associated with **less** physical activity.
- **\( r \approx 0 \)** → No linear correlation: TV watching has **no linear relationship** with physical activity.
- **\( r \approx 1 \)** → Strong positive correlation: More TV watching is associated with **more** physical activity (unlikely in this case).

---

## **Step 3: Potential Findings**
A **negative correlation** is plausible here because:
- More time spent watching TV likely means **less time for physical activity**.
- A strong negative \( r \) value (e.g., \( r < -0.5 \)) would indicate that participants who watch more TV tend to have **noticeably lower** exercise levels.

---

## **Step 4: Conclusion**
Calculating Pearson’s correlation shows whether TV time and physical activity move together in the sample. Correlation alone does not establish that reducing screen time would **increase** physical activity, but a strong negative correlation would make **limiting TV time** a reasonable intervention to test.






#Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:


| Age (Years) | Soft Drink Preference |
|-------------|-----------------------|
| 25          | Coke                  |
| 42          | Pepsi                 |
| 37          | Mountain Dew          |
| 19          | Coke                  |
| 31          | Pepsi                 |
| 28          | Coke                  |



# **Relationship Between Age and Soft Drink Preference**

### **Step 1: Understanding the Variables**
The dataset consists of two variables:
- **Age (Years)**: Numerical variable representing the age of the survey participants.
- **Soft Drink Preference**: Categorical variable representing the preference for a particular brand of soft drink.

The values for the Soft Drink Preference are categorical (Coke, Pepsi, Mountain Dew) and **nominal**: the brands have no natural order. Pearson's correlation needs numbers for both variables, so the categorical variable has to be converted into a numerical form before the formula can be applied.

---

### **Step 2: Encoding the Soft Drink Preference**
To apply the formula, we need to **encode** the soft drink preferences. One option is **label encoding**, which assigns an integer to each brand. The integers are arbitrary, so any correlation computed from them depends on the chosen order and should be read with caution. Here's how we can encode the preferences:

| Soft Drink Preference | Encoded Value |
|-----------------------|---------------|
| Coke                  | 1             |
| Pepsi                 | 2             |
| Mountain Dew          | 3             |

After encoding, the dataset looks like this:

| Age (Years) | Soft Drink Preference (Encoded) |
|-------------|----------------------------------|
| 25          | 1 (Coke)                        |
| 42          | 2 (Pepsi)                       |
| 37          | 3 (Mountain Dew)                |
| 19          | 1 (Coke)                        |
| 31          | 2 (Pepsi)                       |
| 28          | 1 (Coke)                        |

---

### **Step 3: Calculating Pearson's Correlation**
Now that we have numerical representations for both variables, we can calculate **Pearson’s correlation coefficient** to measure the **linear relationship** between **Age** and **Soft Drink Preference**.

The Pearson correlation coefficient is given by:

\[
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}
\]

Where:
- \( X \) = Age (numerical values).
- \( Y \) = Encoded Soft Drink Preference (numerical values).
- \( \bar{X} \) and \( \bar{Y} \) are the means of **Age** and **Encoded Soft Drink Preference** respectively.
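The calculation for the six survey responses can be sketched as follows. The data and the first encoding come from the tables above; the function name `pearson_r` and the second encoding (`alt_encoding`) are introduced here only to show how much the result depends on the arbitrary label order.

```python
import math

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Survey data from the question.
ages = [25, 42, 37, 19, 31, 28]
drinks = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Label encoding from Step 2.
encoding = {"Coke": 1, "Pepsi": 2, "Mountain Dew": 3}
r = pearson_r(ages, [encoding[d] for d in drinks])
print(f"r with the Step 2 encoding: {r:.3f}")  # 0.759

# Hypothetical alternative: the same brands with a different, equally
# arbitrary numbering. The coefficient changes, here even in sign.
alt_encoding = {"Coke": 2, "Pepsi": 1, "Mountain Dew": 3}
r_alt = pearson_r(ages, [alt_encoding[d] for d in drinks])
print(f"r with the alternative encoding: {r_alt:.3f}")
```

The change in sign under relabelling is the main reason a label-encoded Pearson coefficient for a nominal variable should be treated as a rough description rather than a reliable measure of association.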

---

### **Step 4: Interpreting the Result**
- A **positive Pearson correlation** would indicate that **older individuals tend to have higher encoded values** (with this encoding, Pepsi or Mountain Dew rather than Coke).
- A **negative Pearson correlation** would indicate that **older individuals tend to have lower encoded values** (with this encoding, Coke).
- A **zero or near-zero correlation** would imply there is **no linear relationship** between age and the encoded preference.
- Because the encoded values are arbitrary labels, the sign and size of \( r \) change if the brands are numbered differently.

---

### **Step 5: Conclusion**
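For the six responses above and the encoding Coke = 1, Pepsi = 2, Mountain Dew = 3, the formula gives \( r \approx 0.76 \): in this sample the Coke drinkers are the youngest respondents, and the Pepsi and Mountain Dew drinkers are older. This value depends on the arbitrary encoding and on only six observations, so it is not a reliable measure of how age relates to soft drink preference. A larger dataset would be more effective for drawing meaningful conclusions.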


#Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

## Pearson Correlation Coefficient Calculation

To calculate the **Pearson correlation coefficient** between the number of sales calls made per day and the number of sales made per week, follow these steps:

1. **Obtain the Data**: Collect data on the number of sales calls made per day and the number of sales made per week for 30 sales representatives.

2. **Use the Pearson Correlation Formula**:

   \[
   r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}
   \]

   Where:
   - \( X_i \) = Sales calls per day for the \(i\)-th representative.
   - \( Y_i \) = Sales made per week for the \(i\)-th representative.
   - \( \bar{X} \) = Mean of sales calls per day.
   - \( \bar{Y} \) = Mean of sales made per week.

3. **Interpret the Result**:
   - **r = 1**: Perfect positive correlation (as sales calls increase, sales also increase).
   - **r = -1**: Perfect negative correlation (as sales calls increase, sales decrease).
   - **r = 0**: No linear relationship between the two variables.
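The two variables are recorded over different periods (calls per day, sales per week). This does not affect the calculation, because Pearson's \( r \) is unchanged when a variable is multiplied by a positive constant. The sketch below checks this on hypothetical data; the six data points and the 5-day working week used for the conversion are assumptions, not values from the company's sample of 30 representatives.

```python
import math

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the company's sample).
calls_per_day = [20, 25, 30, 35, 40, 45]
sales_per_week = [3, 4, 4, 6, 7, 7]

# Assumption: a 5-day working week, used only to rescale the calls.
calls_per_week = [5 * c for c in calls_per_day]

r_daily = pearson_r(calls_per_day, sales_per_week)
r_weekly = pearson_r(calls_per_week, sales_per_week)

print(f"r using calls per day:  {r_daily:.3f}")
print(f"r using calls per week: {r_weekly:.3f}")
print("same value:", math.isclose(r_daily, r_weekly))
```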

  
