Q1. What is the difference between Ordinal Encoding and Label Encoding? Provide an example of when you
might choose one over the other.
Q2. Explain how Target Guided Ordinal Encoding works and provide an example of when you might use it in
a machine learning project.
Q3. Define covariance and explain why it is important in statistical analysis. How is covariance calculated?
Q4. For a dataset with the following categorical variables: Color (red, green, blue), Size (small, medium,
large), and Material (wood, metal, plastic), perform label encoding using Python's scikit-learn library.
Show your code and explain the output.
Q5. Calculate the covariance matrix for the following variables in a dataset: Age, Income, and Education
level. Interpret the results.
Q6. You are working on a machine learning project with a dataset containing several categorical
variables, including "Gender" (Male/Female), "Education Level" (High School/Bachelor's/Master's/PhD),
and "Employment Status" (Unemployed/Part-Time/Full-Time). Which encoding method would you use for
each variable, and why?
Q7. You are analyzing a dataset with two continuous variables, "Temperature" and "Humidity", and two
categorical variables, "Weather Condition" (Sunny/Cloudy/Rainy) and "Wind Direction" (North/South/
East/West). Calculate the covariance between each pair of variables and interpret the results.

**Q1:**  
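Ordinal Encoding maps each category to an integer that reflects a meaningful order, whereas Label Encoding assigns integers without regard to order (scikit-learn's `LabelEncoder`, for instance, numbers the classes alphabetically). For example, when encoding **education levels** ("High School," "Bachelor’s," "Master’s," "PhD"), **Ordinal Encoding (0, 1, 2, 3)** is preferred because education follows a natural ranking. Label Encoding suits unordered categories like "Car Brand" (Toyota, Ford, BMW) when the integers serve only as identifiers, such as for a target label or a tree-based model; for models that treat inputs as magnitudes, the arbitrary integers imply a ranking that does not exist.  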
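
The contrast can be sketched with scikit-learn. This is a minimal illustration, assuming a small invented list of education levels; `OrdinalEncoder` is given the ranking explicitly, while `LabelEncoder` falls back to alphabetical order.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical sample values, deliberately listed out of rank order.
education = ["Bachelor's", "PhD", "High School", "Master's"]

# OrdinalEncoder with an explicit category list preserves the ranking.
order = ["High School", "Bachelor's", "Master's", "PhD"]
ordinal = OrdinalEncoder(categories=[order])
ordinal_codes = ordinal.fit_transform([[level] for level in education]).ravel()

# LabelEncoder sorts the classes alphabetically, so its codes ignore the ranking.
label = LabelEncoder()
label_codes = label.fit_transform(education)

print(ordinal_codes)  # [1. 3. 0. 2.]
print(label_codes)    # [0 3 1 2]
```

Here "High School" receives 0 from the ordinal encoder but 1 from the label encoder, because "Bachelor's" comes first alphabetically.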

**Q2:**  
Target Guided Ordinal Encoding ranks the categories by a summary of the target variable (typically its mean or median within each category) and assigns integers in that order. For example, in predicting **house prices**, if "Neighborhood" is categorical, the neighborhood with the lowest average price can be coded 0, the next lowest 1, and so on. This lets the codes capture the relationship between the categories and the target variable.  
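
A minimal sketch of the idea with pandas, assuming a toy table whose neighborhood names and prices are invented purely for illustration:

```python
import pandas as pd

# Hypothetical data: names and prices are made up for this example.
df = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "A", "B", "C"],
    "Price": [300, 150, 220, 320, 170, 200],
})

# Mean target per category, sorted from lowest to highest.
mean_price = df.groupby("Neighborhood")["Price"].mean().sort_values()

# The position of each category in that ordering becomes its code.
mapping = {name: rank for rank, name in enumerate(mean_price.index)}
df["Neighborhood_encoded"] = df["Neighborhood"].map(mapping)

print(mapping)  # {'B': 0, 'C': 1, 'A': 2}
```

In a real project the mapping would normally be computed on the training split only and then applied to validation and test data, so that target information does not leak into evaluation.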

**Q3:**  
Covariance measures the direction of the linear relationship between two variables. A **positive covariance** indicates both variables increase together, while a **negative covariance** means one increases while the other decreases. It is calculated as:  
\[
\text{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
\]  
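Here the bars denote sample means and n is the number of observations. Covariance can help in feature selection, but its magnitude depends on the units of the variables, so it does not measure the strength of the relationship; the correlation coefficient, which divides the covariance by both standard deviations, does.  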
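
The formula can be checked numerically. The sketch below uses invented sample values and compares a direct implementation of the sum with NumPy's `np.cov`, which also divides by n - 1 by default.

```python
import numpy as np

# Hypothetical sample values, chosen only to illustrate the arithmetic.
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])

# Sum of products of deviations from the means, divided by n - 1.
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is Cov(x, y).
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)
```

Both values equal 14/3 for this data: the deviation products are 6, 0, -1, and 9, which sum to 14, and n - 1 is 3.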

**Q4:**  


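```python
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# One row per sample, one column per categorical variable.
data = pd.DataFrame({'Color': ['Red', 'Green', 'Blue'],
                     'Size': ['Small', 'Medium', 'Large'],
                     'Material': ['Wood', 'Metal', 'Plastic']})

# apply() calls fit_transform on each column in turn, so the encoder is
# refit per column and every column gets its own codes starting at 0.
encoder = LabelEncoder()
data_encoded = data.apply(encoder.fit_transform)
print(data_encoded)
```

```
   Color  Size  Material
0      2     2         2
1      1     1         0
2      0     0         1
```

`LabelEncoder` sorts the classes of each column alphabetically and numbers them from 0. In Color, Blue = 0, Green = 1, Red = 2; in Size, Large = 0, Medium = 1, Small = 2; in Material, Metal = 0, Plastic = 1, Wood = 2. The integers are identifiers only: the Size codes, for instance, run opposite to the natural small-to-large order.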

**Q5:**  
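The question supplies no data, so the entries cannot be computed here; the procedure is as follows. To calculate the **covariance matrix** for **Age, Income, and Education Level**, let X be the n × 3 data matrix whose columns are the three variables and let \bar{X} hold the column means:  
\[
\text{Covariance Matrix} = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X})
\]  
The result is a symmetric 3 × 3 matrix. Each diagonal entry is the variance of one variable, and each off-diagonal entry is the covariance of a pair. A **positive value** means the two variables tend to increase together (as Age and Income often do), while a **negative value** means an inverse relationship (as Education and Age might show in a sample where younger people have more schooling). The actual signs depend on the data.  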
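
Because no dataset is given, the following sketch uses invented rows (an assumption, as is measuring education in years of schooling) to show how the matrix could be obtained with pandas and checked against the matrix formula.

```python
import numpy as np
import pandas as pd

# Hypothetical values: these rows are made up for illustration only.
df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 38],
    "Income": [30000, 45000, 80000, 90000, 60000],
    "Education": [12, 14, 16, 18, 16],  # assumed to be years of schooling
})

# DataFrame.cov() returns the sample covariance matrix (divides by n - 1).
cov_matrix = df.cov()

# The same matrix from the formula: centre the columns, then X^T X / (n - 1).
X = df.to_numpy(dtype=float)
centered = X - X.mean(axis=0)
manual = centered.T @ centered / (len(df) - 1)

print(cov_matrix)
```

With these invented rows every off-diagonal entry is positive, since age, income, and education all rise together in the sample; different data could give different signs.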

**Q6:**  
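- **Gender (Binary Category):** Use **binary encoding**, a single 0/1 column (0 for Male, 1 for Female), since there are only two categories and no order.  
- **Education Level (Ordered Category):** Use **Ordinal Encoding** ("High School" = 0, "Bachelor’s" = 1, "Master’s" = 2, "PhD" = 3), which preserves the ranking.  
- **Employment Status (Nominal Category):** Use **One-Hot Encoding** ("Unemployed" → [1,0,0], "Part-Time" → [0,1,0], "Full-Time" → [0,0,1]) so that no ranking is imposed. If the project treats the levels as ordered by hours worked, Ordinal Encoding is a reasonable alternative.  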
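
The three choices above can be sketched together. The rows below are invented, and the column names simply mirror the question; `pd.get_dummies` is one of several ways to one-hot encode.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Education Level": ["PhD", "High School", "Master's", "Bachelor's"],
    "Employment Status": ["Full-Time", "Unemployed", "Part-Time", "Full-Time"],
})

# Gender: a single 0/1 indicator column.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Education Level: integers that follow the explicit ranking.
order = ["High School", "Bachelor's", "Master's", "PhD"]
education_encoder = OrdinalEncoder(categories=[order])
df["Education Level"] = education_encoder.fit_transform(df[["Education Level"]]).ravel()

# Employment Status: one indicator column per category.
encoded = pd.get_dummies(df, columns=["Employment Status"], dtype=int)

print(encoded)
```

The resulting frame has five columns: the two encoded columns plus one indicator each for Full-Time, Part-Time, and Unemployed.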

**Q7:**  
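The question provides no values, so only the method can be given. To compute covariance between **Temperature & Humidity** (continuous variables), use:  
\[
\text{Cov}(\text{Temp}, \text{Humidity}) = \frac{\sum (\text{Temp}_i - \overline{\text{Temp}})(\text{Humidity}_i - \overline{\text{Humidity}})}{n-1}
\]  
If this covariance is **positive**, higher temperatures tend to go with higher humidity. If **negative**, one increases while the other decreases. The categorical variables **Weather Condition & Wind Direction** must be converted to numbers before any covariance can be computed. **Label encoding** makes the calculation possible, but both variables are nominal, so the integers are arbitrary and the sign and size of the resulting covariance change with the coding. One-hot encoding gives a value that can be interpreted: the covariance between Temperature and a "Sunny" indicator, for example, is positive when sunny days are warmer than average.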
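
A sketch of the full pairwise computation, assuming invented readings and using one-hot indicators for the two categorical columns (a choice made here because integer codes for nominal categories are arbitrary):

```python
import pandas as pd

# Hypothetical observations, made up for illustration.
df = pd.DataFrame({
    "Temperature": [30.0, 22.0, 18.0, 27.0, 20.0, 16.0],
    "Humidity": [40.0, 65.0, 85.0, 45.0, 70.0, 90.0],
    "Weather Condition": ["Sunny", "Cloudy", "Rainy", "Sunny", "Cloudy", "Rainy"],
    "Wind Direction": ["North", "South", "East", "West", "North", "East"],
})

# Replace each categorical column with 0/1 indicator columns.
numeric = pd.get_dummies(
    df, columns=["Weather Condition", "Wind Direction"], dtype=float
)

# Sample covariance between every pair of columns.
cov_matrix = numeric.cov()

temp_humidity = cov_matrix.loc["Temperature", "Humidity"]
temp_sunny = cov_matrix.loc["Temperature", "Weather Condition_Sunny"]

print(cov_matrix.round(2))
```

In this invented sample the Temperature and Humidity covariance is negative (warmer readings are drier), and the covariance between Temperature and the Sunny indicator is positive because the sunny readings are the warmest.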