## Q1. Difference between Ordinal Encoding and Label Encoding

🔹 **Ordinal Encoding**: Used when categorical features have a clear **order** or **ranking**.

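🔹 **Label Encoding**: Assigns each category an arbitrary integer (scikit-learn's `LabelEncoder` numbers the classes in sorted order) and is intended mainly for target labels. When applied to **nominal (unordered)** features, the integers **imply an order** that does not exist, which can mislead ML models.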

### Example:
```python
# Ordinal Example
education_levels = ['High School', 'Bachelors', 'Masters', 'PhD']
ordinal_mapping = {'High School': 0, 'Bachelors': 1, 'Masters': 2, 'PhD': 3}
```

- Use **Ordinal Encoding** when dealing with ordered features like `Education Level`.
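- **Label Encoding** is acceptable for a binary nominal feature like `Gender` (Male/Female), where a 0/1 code carries no meaningful order. For nominal features with more than two categories, One-Hot Encoding is usually safer.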
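
A minimal sketch of the difference, assuming pandas and scikit-learn are installed. The `Education` column and its sample values are illustrative. `map` applies the hand-written order, while `LabelEncoder` numbers the classes in sorted (alphabetical) order, which need not match the real ranking.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative data (not from a real dataset)
df = pd.DataFrame({'Education': ['Masters', 'High School', 'PhD', 'Bachelors']})

# Ordinal encoding: the order is stated explicitly in the mapping
ordinal_mapping = {'High School': 0, 'Bachelors': 1, 'Masters': 2, 'PhD': 3}
df['Education_ordinal'] = df['Education'].map(ordinal_mapping)

# Label encoding: codes follow the sorted class names, not the ranking
le = LabelEncoder()
df['Education_label'] = le.fit_transform(df['Education'])

print(df)
print(list(le.classes_))  # classes in the order used for numbering
```

Here `Bachelors` sorts before `High School`, so the label-encoded codes for those two categories are swapped relative to the true ranking.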

---

## Q2. Target Guided Ordinal Encoding

**Definition**: Categories are ranked by the **mean of the target variable** within each category, and each category is then replaced with its integer rank.

### Use case:
In churn prediction, we might encode `Contract` types based on average churn rate.

### Example:
```python
import pandas as pd

df = pd.DataFrame({
    'Contract': ['Month-to-month', 'One year', 'Two year', 'Month-to-month', 'One year'],
    'Churn': [1, 0, 0, 1, 0]
})

# Mean churn per contract type, sorted from lowest to highest
mean_churn = df.groupby('Contract')['Churn'].mean().sort_values()
# Rank 0 = lowest mean churn; categories with equal means get arbitrary adjacent ranks
encoding = {key: idx for idx, key in enumerate(mean_churn.index)}
df['Contract_encoded'] = df['Contract'].map(encoding)

print(df)
```

---

## Q3. Covariance – Definition & Formula

**Covariance** measures how two variables change together.

- **Positive covariance**: variables increase together
- **Negative covariance**: one increases, the other decreases

### Formula (sample covariance):
\[
\text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})
\]
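
The formula can be checked numerically. The sketch below uses illustrative values and assumes NumPy is installed; it computes the sum by hand and compares it with `np.cov`, which also divides by `n - 1` by default.

```python
import numpy as np

# Illustrative values
x = np.array([30.0, 25.0, 28.0, 35.0, 22.0])
y = np.array([80.0, 70.0, 75.0, 85.0, 65.0])

n = len(x)
# Sum of products of deviations from the means, divided by n - 1
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns a 2x2 matrix; the off-diagonal entry is Cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)
```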

---

## Q4. Label Encoding of: Color, Size, Material

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'Color': ['Red', 'Green', 'Blue', 'Green'],
    'Size': ['Small', 'Medium', 'Large', 'Medium'],
    'Material': ['Wood', 'Metal', 'Plastic', 'Wood']
})

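# Use one encoder per column so each fitted mapping stays available;
# iterate over a fixed list because the loop adds columns to df
encoders = {}
for col in ['Color', 'Size', 'Material']:
    encoders[col] = LabelEncoder()
    df[col + '_encoded'] = encoders[col].fit_transform(df[col])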

print(df)
```

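### Output Explanation:
Each categorical value is mapped to an integer. `LabelEncoder` sorts the classes alphabetically and numbers them from 0:
- 'Blue' → 0, 'Green' → 1, 'Red' → 2
- 'Large' → 0, 'Medium' → 1, 'Small' → 2
- 'Metal' → 0, 'Plastic' → 1, 'Wood' → 2

Note that `Size` is really an ordinal feature, and the alphabetical codes reverse its natural order (Small < Medium < Large).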
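
To confirm which integer each category received, the fitted encoder's `classes_` attribute can be inspected. This is a small sketch assuming scikit-learn is installed; it refits an encoder on the `Material` values only.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['Wood', 'Metal', 'Plastic', 'Wood'])

# classes_ is sorted; the position of each class is its integer code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)
```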

---

## Q5. Covariance Matrix for Age, Income, Education

```python
import pandas as pd

data = {
    'Age': [25, 45, 35, 50, 23],
    'Income': [50000, 80000, 60000, 90000, 48000],
    'Education': [12, 16, 14, 18, 12]  # years
}
df = pd.DataFrame(data)

# Covariance matrix
cov_matrix = df.cov()
print(cov_matrix)
```

### Interpretation:
- Positive values → variables increase together
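- The covariance between `Income` and `Education` is positive: in this sample, people with more years of education tend to have higher income. Covariance shows association, not causation, and its magnitude depends on the units of each variable.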
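
Because covariance is expressed in the product of the two variables' units, its size changes when a variable is rescaled. The sketch below, assuming pandas is installed, re-expresses `Income` in thousands (an illustrative choice) and compares covariance with correlation, which is unit-free.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23],
    'Income': [50000, 80000, 60000, 90000, 48000],
    'Education': [12, 16, 14, 18, 12],  # years
})

# Same data, with Income expressed in thousands
df_k = df.assign(Income=df['Income'] / 1000)

cov_raw = df.cov().loc['Income', 'Education']
cov_k = df_k.cov().loc['Income', 'Education']
corr_raw = df.corr().loc['Income', 'Education']
corr_k = df_k.corr().loc['Income', 'Education']

print(cov_raw, cov_k)    # covariance shrinks by a factor of 1000
print(corr_raw, corr_k)  # correlation is unchanged
```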

---

## Q6. Encoding Method for Gender, Education, Employment

| Variable          | Type        | Recommended Encoding         | Reason                                          |
|------------------|-------------|------------------------------|-------------------------------------------------|
| Gender           | Nominal     | One-Hot Encoding              | Only 2 categories, no order                     |
| Education Level  | Ordinal     | Ordinal Encoding              | Has meaningful order (High School < PhD)        |
| Employment Status| Nominal     | One-Hot Encoding              | No order; multiple classes                      |
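
The choices in the table could be applied as follows. This is a sketch assuming pandas is installed; the sample rows and the `Employment Status` category names are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female'],
    'Education Level': ['Bachelors', 'PhD', 'High School'],
    'Employment Status': ['Employed', 'Unemployed', 'Self-employed'],
})

# Ordinal encoding: explicit order for Education Level
education_order = {'High School': 0, 'Bachelors': 1, 'Masters': 2, 'PhD': 3}
df['Education Level'] = df['Education Level'].map(education_order)

# One-hot encoding: one 0/1 column per category for the nominal features
encoded = pd.get_dummies(df, columns=['Gender', 'Employment Status'], dtype=int)

print(encoded)
```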

---

## Q7. Covariance Between Mixed Variable Types

Only **numerical variables** (e.g., Temperature & Humidity) can be used directly for covariance.

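For `Weather Condition` and `Wind Direction`, you need to **encode** them first, then compute covariance. One-Hot Encoding is the safer choice; Label Encoding assigns arbitrary integers, so the resulting covariance values depend on that arbitrary numbering.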

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({
    'Temperature': [30, 25, 28, 35, 22],
    'Humidity': [80, 70, 75, 85, 65],
    'Weather': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Rainy'],
    'Wind': ['North', 'South', 'East', 'West', 'North']
})

# Label Encoding for simplicity
data['Weather_enc'] = LabelEncoder().fit_transform(data['Weather'])
data['Wind_enc'] = LabelEncoder().fit_transform(data['Wind'])

# Covariance matrix
cov_matrix = data[['Temperature', 'Humidity', 'Weather_enc', 'Wind_enc']].cov()
print(cov_matrix)
```

### Interpretation:
- The positive covariance between `Temperature` and `Humidity` means the two rise and fall together in this sample
- Caution: covariance involving label-encoded categories can be misleading, because the integer codes imply an order that does not exist
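
One way to avoid the false-order problem is to one-hot encode the categorical column before computing covariance. A sketch assuming pandas is installed, using only the `Weather` column for brevity:

```python
import pandas as pd

data = pd.DataFrame({
    'Temperature': [30, 25, 28, 35, 22],
    'Humidity': [80, 70, 75, 85, 65],
    'Weather': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Rainy'],
})

# One 0/1 column per weather category, so no order is implied
onehot = pd.get_dummies(data, columns=['Weather'], dtype=float)
cov_matrix = onehot.cov()

# Covariance of Temperature with each weather indicator
weather_cols = ['Weather_Cloudy', 'Weather_Rainy', 'Weather_Sunny']
print(cov_matrix.loc['Temperature', weather_cols])
```

Each entry now has a direct reading: a positive covariance between `Temperature` and `Weather_Sunny` means sunny rows tend to have above-average temperature in this sample.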
