Q1. What is the difference between Ordinal Encoding and Label Encoding? Provide an example of when you
might choose one over the other.
Q2. Explain how Target Guided Ordinal Encoding works and provide an example of when you might use it in
a machine learning project.
Q3. Define covariance and explain why it is important in statistical analysis. How is covariance calculated?
Q4. For a dataset with the following categorical variables: Color (red, green, blue), Size (small, medium,
large), and Material (wood, metal, plastic), perform label encoding using Python's scikit-learn library.
Show your code and explain the output.
Q5. Calculate the covariance matrix for the following variables in a dataset: Age, Income, and Education
level. Interpret the results.
Q6. You are working on a machine learning project with a dataset containing several categorical
variables, including "Gender" (Male/Female), "Education Level" (High School/Bachelor's/Master's/PhD),
and "Employment Status" (Unemployed/Part-Time/Full-Time). Which encoding method would you use for
each variable, and why?
Q7. You are analyzing a dataset with two continuous variables, "Temperature" and "Humidity", and two
categorical variables, "Weather Condition" (Sunny/Cloudy/Rainy) and "Wind Direction" (North/South/
East/West). Calculate the covariance between each pair of variables and interpret the results.

Q1. The difference between Ordinal Encoding and Label Encoding lies in the nature of the categorical variable being encoded. 

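- Ordinal Encoding: Assigns integers to categories according to their order or rank, so the numeric order reflects a meaningful order among the categories.
- Label Encoding: Assigns a unique integer to each category without regard to any meaningful order. In scikit-learn, LabelEncoder numbers the categories in sorted (alphabetical) order and is intended for encoding the target variable (y) rather than input features. The integers it produces still look ordered to a model, so for nominal input features they can be misleading.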

Example:
Suppose you have a categorical variable "Education Level" with categories "High School," "Bachelor's," "Master's," and "PhD." Because these have a clear order ("High School" < "Bachelor's" < "Master's" < "PhD"), you would use Ordinal Encoding so that the codes preserve that order. For a variable with no such order, such as the class labels of a classification target (e.g., "cat," "dog," "bird"), you would use Label Encoding, since only distinct codes are needed.
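A minimal sketch contrasting the two with scikit-learn's `OrdinalEncoder` and `LabelEncoder`, on four sample values made up for illustration. `OrdinalEncoder` follows the category order we pass in, while `LabelEncoder` numbers the categories alphabetically.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Illustrative values (not from a real dataset)
levels = ["High School", "Bachelor's", "Master's", "PhD"]   # natural order
data = [["Master's"], ["High School"], ["PhD"], ["Bachelor's"]]

# Ordinal Encoding: the order is supplied explicitly, one list per feature
ordinal = OrdinalEncoder(categories=[levels])
ordinal_codes = ordinal.fit_transform(data).ravel()

# Label Encoding: integers follow the sorted (alphabetical) order of the labels
label = LabelEncoder()
label_codes = label.fit_transform([row[0] for row in data])

print(ordinal_codes)
print(label.classes_.tolist())
print(label_codes)
```

Here `OrdinalEncoder` gives Master's = 2, High School = 0, PhD = 3, Bachelor's = 1, matching the education ladder. `LabelEncoder` sorts the labels as Bachelor's, High School, Master's, PhD, so it puts "Bachelor's" (0) below "High School" (1), an order with no real meaning.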

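Q2. Target Guided Ordinal Encoding ranks the categories of a variable by the mean (or median) of the target variable within each category, then replaces each category with its rank (0, 1, 2, ...). The resulting numbers are therefore ordered by how strongly each category is associated with the target. It is useful when a categorical variable has high cardinality (many unique categories), where one-hot encoding would add too many columns, and when you want the encoding to capture the relationship between the categorical variable and the target. Because it uses the target, the per-category statistics should be computed on the training data only; otherwise target information leaks into the validation or test data.

Example:
In a machine learning project predicting customer churn, you might use Target Guided Ordinal Encoding to encode customer segments by their average churn rate. Segments with higher churn rates receive higher numerical labels, so the encoding captures the relationship between customer segment and churn.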
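A minimal pandas sketch of these steps, assuming a hypothetical training set with a `segment` column and a binary `churn` target (both names and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical training data: customer segment and binary churn target
df = pd.DataFrame({
    "segment": ["A", "A", "B", "B", "B", "C", "C", "D"],
    "churn":   [1,   0,   0,   0,   1,   1,   1,   0],
})

# 1. Mean of the target within each category (training data only)
churn_rate = df.groupby("segment")["churn"].mean()

# 2. Sort categories by that mean (lowest churn first) and assign ranks 0..k-1
ordered = churn_rate.sort_values().index
mapping = {cat: rank for rank, cat in enumerate(ordered)}

# 3. Replace each category with its rank
df["segment_encoded"] = df["segment"].map(mapping)
print(mapping)
print(df)
```

The churn rates are D = 0.0, B ≈ 0.33, A = 0.5, C = 1.0, so the mapping is D → 0, B → 1, A → 2, C → 3. At prediction time the same `mapping` would be applied to new data; categories unseen during training map to NaN and need a fallback value.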

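Q3. Covariance measures the degree to which two variables change together. It is important in statistical analysis because it describes the direction of the linear relationship between variables and is the building block of the correlation coefficient and of methods such as principal component analysis. A positive covariance indicates that the variables tend to move in the same direction, a negative covariance that they move in opposite directions, and a covariance near zero that there is no linear relationship. Because covariance is expressed in the product of the two variables' units, its magnitude depends on their scale; the correlation coefficient divides it by both standard deviations to give a unit-free value between -1 and 1. The sample covariance of n paired observations is calculated as:

$$\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the sample means (the population covariance divides by n instead of n - 1). NumPy's `np.cov` applies this formula to every pair of columns to produce a covariance matrix: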

```python
import numpy as np

# Sample dataset: one row per person; columns are Age, Income, Education level
data = np.array([
    [30, 50000, 12],
    [35, 60000, 16],
    [40, 70000, 18],
    [45, 80000, 20],
    [50, 90000, 22]
])

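# Calculate the covariance matrix; rowvar=False treats each column as a variable
# and each row as an observation, giving a 3x3 matrix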
covariance_matrix = np.cov(data, rowvar=False)

print("Covariance Matrix:")
print(covariance_matrix)
```
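To make the formula concrete, a minimal sketch that computes it by hand for the Age and Education level columns of the sample dataset above and checks the result against `np.cov`. The helper name `covariance` is hypothetical, introduced only for this example.

```python
import numpy as np

def covariance(x, y):
    """Sample covariance: sum of products of deviations from the means, over n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

age = [30, 35, 40, 45, 50]
education = [12, 16, 18, 20, 22]

manual = covariance(age, education)
# With two 1-D inputs np.cov returns a 2x2 matrix; entry [0, 1] is cov(age, education)
library = np.cov(age, education)[0, 1]
print(manual, library)
```

Both values come out at 30 (up to floating-point rounding): the deviations of Age are -10, -5, 0, 5, 10, those of Education level are -5.6, -1.6, 0.4, 2.4, 4.4, their products sum to 120, and 120 / (5 - 1) = 30.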

Q4. Performing label encoding using scikit-learn:

```python
from sklearn.preprocessing import LabelEncoder

# Sample data: each column lists its three categories
data = {
    'Color': ['red', 'green', 'blue'],
    'Size': ['small', 'medium', 'large'],
    'Material': ['wood', 'metal', 'plastic']
}

# Fit a separate LabelEncoder per column so each column keeps its own mapping
# (reusing one encoder would leave it fitted to the last column only)
encoders = {}
encoded_data = {}
for column, values in data.items():
    encoders[column] = LabelEncoder()
    encoded_data[column] = encoders[column].fit_transform(values)

print(encoded_data)
```

Output:
```
{'Color': array([2, 1, 0]), 'Size': array([2, 1, 0]), 'Material': array([2, 0, 1])}
```

Explanation:
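LabelEncoder assigns each category an integer from 0 to k - 1, where k is the number of distinct categories in the column, numbering the categories in sorted (alphabetical) order. In the "Color" column, "blue" is encoded as 0, "green" as 1, and "red" as 2, so the column ["red", "green", "blue"] becomes [2, 1, 0]. The same rule gives large = 0, medium = 1, small = 2 for "Size" (the reverse of the natural order) and metal = 0, plastic = 1, wood = 2 for "Material", which is why that column becomes [2, 0, 1]. Because the codes follow the alphabet rather than any meaningful order, an ordinal encoding with an explicit category order (small < medium < large) would be more appropriate for "Size".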
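A minimal sketch that prints the mapping each fitted `LabelEncoder` learned, using its `classes_` attribute (the labels in sorted order; a label's code is its position in that array):

```python
from sklearn.preprocessing import LabelEncoder

data = {
    'Color': ['red', 'green', 'blue'],
    'Size': ['small', 'medium', 'large'],
    'Material': ['wood', 'metal', 'plastic'],
}

# Fit one encoder per column and record the label -> code mapping it learned
mappings = {}
for column, values in data.items():
    encoder = LabelEncoder().fit(values)
    # transform(classes_) returns 0, 1, ..., k-1 because classes_ is already sorted
    codes = encoder.transform(encoder.classes_)
    mappings[column] = dict(zip(encoder.classes_.tolist(), codes.tolist()))

print(mappings)
```

This prints `{'Color': {'blue': 0, 'green': 1, 'red': 2}, 'Size': {'large': 0, 'medium': 1, 'small': 2}, 'Material': {'metal': 0, 'plastic': 1, 'wood': 2}}`, making the alphabetical numbering visible. The fitted encoders' `inverse_transform` method maps codes back to labels.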

Q5. The covariance matrix for Age, Income, and Education level is a 3x3 symmetric matrix built from the pairwise covariances: each diagonal element (i, i) is the variance of variable i, and each off-diagonal element (i, j) is the covariance between variables i and j, which equals element (j, i). The np.cov call in Q3 computes exactly this matrix for its sample data.

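Interpretation:
- A positive covariance between Age and Income indicates that, in the sample, older people tend to have higher incomes; a negative value would indicate the opposite.
- A positive covariance between Age and Education level indicates that older people in the sample tend to have more education; a negative value would indicate that younger people do. The same reading applies to Income and Education level.
- Only the signs can be compared directly across pairs. Income is measured in much larger units than Age or Education level, so covariances involving Income are much larger in magnitude; to compare the strength of the relationships, convert the covariances to correlations.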
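A minimal sketch with pandas, reusing the illustrative five-row sample dataset from Q3 (not real data). `DataFrame.cov` labels the matrix by variable name, and `DataFrame.corr` gives the unit-free version for comparing strengths.

```python
import pandas as pd

# The illustrative five-row sample dataset from Q3 (not real data)
df = pd.DataFrame({
    'Age': [30, 35, 40, 45, 50],
    'Income': [50000, 60000, 70000, 80000, 90000],
    'Education': [12, 16, 18, 20, 22],
})

# Sample covariance matrix (divisor n - 1), labelled by variable name
cov = df.cov()
print(cov)

# Pearson correlation: each covariance divided by the two standard deviations,
# giving unit-free values in [-1, 1] that can be compared across pairs
corr = df.corr()
print(corr.round(3))
```

For this sample all three covariances are positive (125,000 for Age and Income, 30 for Age and Education, 60,000 for Income and Education), so all three variables rise together. The correlations (essentially 1 for Age and Income, about 0.986 for both pairs involving Education) show that the large gap between 125,000 and 30 comes from the units, not from a difference in strength.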

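Q6. Encoding method selection for each variable:
- "Gender": Binary categorical variable (two categories), suitable for Label Encoding; mapping it to a single 0/1 column is equivalent to one-hot encoding with one column dropped.
- "Education Level": Ordinal variable with an inherent order (High School < Bachelor's < Master's < PhD), suitable for Ordinal Encoding with that order specified explicitly.
- "Employment Status": If treated as nominal (no natural order), One-Hot Encoding is the safer choice, because Label Encoding would impose an arbitrary order that linear or distance-based models would read as meaningful; Label Encoding is generally acceptable only for tree-based models. If the amount of employment matters for the problem (Unemployed < Part-Time < Full-Time), the variable can instead be ordinally encoded.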
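A minimal sketch of these choices with pandas and scikit-learn, on a few hypothetical rows made up for illustration; Employment Status is one-hot encoded here, i.e. treated as nominal.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows, for illustration only
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Education Level': ["Bachelor's", 'PhD', 'High School', "Master's"],
    'Employment Status': ['Full-Time', 'Unemployed', 'Part-Time', 'Full-Time'],
})

# Binary variable: one 0/1 column (Female = 1, Male = 0; the choice is arbitrary)
df['Gender'] = (df['Gender'] == 'Female').astype(int)

# Ordinal variable: explicit order High School < Bachelor's < Master's < PhD
edu_order = ['High School', "Bachelor's", "Master's", 'PhD']
edu_encoder = OrdinalEncoder(categories=[edu_order])
df['Education Level'] = edu_encoder.fit_transform(df[['Education Level']]).ravel()

# Nominal variable: one indicator column per employment status
df = pd.get_dummies(df, columns=['Employment Status'], dtype=int)
print(df)
```

`get_dummies` replaces "Employment Status" with the columns `Employment Status_Full-Time`, `Employment Status_Part-Time`, and `Employment Status_Unemployed`, so no artificial order between the statuses is introduced.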

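Q7. Covariance is defined only for numeric variables, so it applies directly to the continuous pair, Temperature and Humidity, using the sample covariance formula. Weather Condition and Wind Direction are nominal: any integer codes assigned to them are arbitrary, so a covariance computed on those codes would be meaningless. Their association is instead measured with Cramér's V, which is derived from the chi-squared statistic of their contingency table (it is an association measure, not a covariance). For a continuous variable paired with a categorical one (e.g., Temperature and Weather Condition), compare the mean of the continuous variable across the categories, or use a test such as ANOVA.

Interpretation:
- A positive covariance between Temperature and Humidity would indicate that humidity tends to rise as temperature rises; a negative one, that it tends to fall. Its magnitude depends on the units in which the two variables are measured, so use the correlation coefficient to judge the strength of the relationship.
- Cramér's V between Weather Condition and Wind Direction ranges from 0 (no association) to 1 (perfect association). It measures how strongly the two are associated but has no sign, because nominal categories have no direction.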
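A minimal sketch of the three kinds of calculation on eight hypothetical observations made up for illustration. It uses SciPy's `chi2_contingency`; the helper `cramers_v` is a hypothetical name written here, not a library function, and implements V = sqrt(chi2 / (n * (min(r, c) - 1))) for an r x c table.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical observations, for illustration only
df = pd.DataFrame({
    'Temperature': [30, 25, 22, 28, 18, 20, 26, 19],
    'Humidity': [40, 55, 70, 45, 85, 80, 50, 75],
    'Weather Condition': ['Sunny', 'Cloudy', 'Rainy', 'Sunny',
                          'Rainy', 'Rainy', 'Cloudy', 'Cloudy'],
    'Wind Direction': ['North', 'South', 'West', 'North',
                       'West', 'East', 'South', 'East'],
})

# Continuous pair: ordinary sample covariance (divisor n - 1)
cov_th = df['Temperature'].cov(df['Humidity'])

# Nominal pair: Cramér's V from the chi-squared statistic of the contingency table
def cramers_v(x, y):
    table = pd.crosstab(x, y)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, c = table.shape
    return np.sqrt(chi2 / (n * (min(r, c) - 1)))

v = cramers_v(df['Weather Condition'], df['Wind Direction'])

# Continuous vs nominal: compare group means instead of computing a covariance
temp_by_weather = df.groupby('Weather Condition')['Temperature'].mean()

print(cov_th, v)
print(temp_by_weather)
```

With these invented numbers the covariance is about -74.3 (humidity falls as temperature rises) and Cramér's V is about 0.91; both describe the made-up data, not real weather. `correction=False` turns off Yates' continuity correction, which SciPy otherwise applies only when the table has one degree of freedom (a 2x2 table).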