## Q(1)

Ordinal Encoding and Label Encoding are both techniques used to convert categorical data into numerical format, but they are applied in different contexts.

Use Ordinal Encoding:

When the categories have a natural order that carries information relevant to your model.
For example, when predicting student performance, encoding education levels in their natural order (High School < Bachelor's < Master's < PhD) lets the model use that ranking, whereas arbitrary codes would hide it.

Use Label Encoding:

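When the order of categories doesn't matter for your model, typically for the target labels of a classifier (scikit-learn's LabelEncoder is intended for target values rather than input features).
For example, a fruit image classifier is not affected by whether "apple" is encoded as 1 and "banana" as 2 or vice versa.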

Imagine we're building a model to predict customer churn based on their satisfaction levels (High, Medium, Low).

Ordinal Encoding: High = 1, Medium = 2, Low = 3. This encoding preserves the order of satisfaction and allows your model to learn that customers closer to "Low" are more likely to churn.
Label Encoding: codes are assigned without regard to order. scikit-learn's LabelEncoder, for instance, sorts the class names alphabetically, giving High = 0, Low = 1, Medium = 2. The codes no longer follow the order of satisfaction, so your model might not capture the relationship between satisfaction and churn effectively.

Choosing the right encoding technique depends on the specific characteristics of your data and the goal of your machine learning model.
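
The churn example can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `OrdinalEncoder` and `LabelEncoder`; the sample values are made up, and both encoders number categories from 0 rather than 1.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Made-up satisfaction levels for five customers (illustration only)
satisfaction = np.array(["High", "Medium", "Low", "Medium", "High"])

# Ordinal encoding: the category order is passed explicitly (High, Medium, Low)
ordinal_encoder = OrdinalEncoder(categories=[["High", "Medium", "Low"]])
ordinal_codes = ordinal_encoder.fit_transform(satisfaction.reshape(-1, 1)).ravel()

# Label encoding: classes are sorted alphabetically (High, Low, Medium)
label_encoder = LabelEncoder()
label_codes = label_encoder.fit_transform(satisfaction)

print(ordinal_codes.tolist())  # [0.0, 1.0, 2.0, 1.0, 0.0]
print(label_codes.tolist())    # [0, 2, 1, 2, 0]
```

With the explicit order, a larger code always means lower satisfaction. With alphabetical codes, "Medium" receives the largest value, so the numbers no longer track satisfaction.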

## Q(2)

Target Guided Ordinal Encoding is a technique used to encode categorical variables based on the relationship between the categories and the target variable in a supervised machine learning setting. This method assigns ordinal ranks to categories based on the mean or median of the target variable for each category. Target Guided Ordinal Encoding is particularly useful when dealing with categorical features with high cardinality and when the categories have a meaningful relationship with the target variable.

Example:

Predicting customer credit risk based on city of residence

Categorical feature: City (London, Paris, New York, Berlin)
Target variable: Credit risk, expressed as a numeric score between 0 (low risk) and 1 (high risk) so that it can be averaged per city
TGE steps:

Calculate average credit risk for each city:
London: 0.4 (average risk)
Paris: 0.2 (low risk)
New York: 0.8 (high risk)
Berlin: 0.6 (medium risk)
Sort cities based on average risk: Paris, London, Berlin, New York
Assign ordinal values: Paris = 1, London = 2, Berlin = 3, New York = 4
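
The steps above can be sketched with pandas. This is a minimal sketch under stated assumptions: the column names `city` and `risk_score` and the individual scores are hypothetical, chosen only so that the per-city means match the figures in the example.

```python
import pandas as pd

# Hypothetical data: two customers per city, with made-up numeric risk scores
df = pd.DataFrame({
    "city": ["London", "Paris", "New York", "Berlin",
             "London", "Paris", "New York", "Berlin"],
    "risk_score": [0.3, 0.1, 0.7, 0.5, 0.5, 0.3, 0.9, 0.7],
})

# Step 1: average target value for each city
city_means = df.groupby("city")["risk_score"].mean()

# Step 2: sort cities from lowest to highest average risk
ordered_cities = city_means.sort_values().index

# Step 3: assign ordinal ranks starting at 1
city_rank = {city: rank for rank, city in enumerate(ordered_cities, start=1)}

df["city_encoded"] = df["city"].map(city_rank)
print(city_rank)  # {'Paris': 1, 'London': 2, 'Berlin': 3, 'New York': 4}
```

In practice the per-category means should be computed on the training data only and then applied to validation and test data. Otherwise the encoding leaks information from the target.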

## Q(3)

Covariance, in the realm of statistics, is a powerful tool for understanding the relationship between two variables. It quantifies how much, and in what direction, two variables tend to move together.

Think of it like this: Imagine variables as dancing partners. If they both sashay to the right when the music picks up, their covariance is positive. If one twirls left while the other dips right, their covariance is negative. And if they just shuffle independently, their covariance is zero, suggesting no coordinated movement.

Here's why covariance is so important:

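Unveiling hidden relationships: It can reveal connections between variables that are not obvious from looking at each variable on its own. For example, you might find that income and education have a positive covariance, meaning they tend to rise together. Keep in mind that the size of a covariance depends on the units of both variables, so it is hard to interpret directly; correlation (a related measure) rescales it to the range -1 to 1. Both measures capture only linear relationships.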
Building better models: Understanding how variables covary can inform the design of statistical models. In finance, knowing the covariance between stocks helps build portfolios that minimize risk.
Data exploration: Covariance can be a valuable tool for initial data exploration, helping you identify potential patterns and relationships to investigate further.

Calculating Covariance:

The formula looks involved, but the basic idea is straightforward:

Take the difference between each variable's value and its mean. This centers the data, so the focus is on how each pair of points deviates from the average.
Multiply those differences together for each data point. The product is positive when both values lie on the same side of their means and negative when they lie on opposite sides.
Average the products across all data points. Dividing by n gives the population covariance; dividing by n - 1 gives the sample covariance, which is what pandas' DataFrame.cov() computes by default.
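
The three steps can be written out in plain Python. This is a minimal sketch: the helper name `sample_covariance` is introduced here for illustration, it divides by n - 1 (the sample covariance), and the input values are the Age and Income columns used in Q(5).

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Steps 1 and 2: deviations from each mean, multiplied pair by pair
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    # Step 3: average the products, using n - 1 for the sample estimate
    return sum(products) / (n - 1)


age = [25, 30, 35, 40, 45]
income = [50000, 60000, 75000, 90000, 80000]

print(sample_covariance(age, income))  # 112500.0
print(sample_covariance(age, age))     # 62.5 (covariance with itself is the variance)
```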

## Q(4)

In [1]:
from sklearn.preprocessing import LabelEncoder
import pandas as pd

In [2]:
data = {'Color': ['red', 'green', 'blue', 'red', 'blue'],
        'Size': ['small', 'medium', 'large', 'medium', 'small'],
        'Material': ['wood', 'metal', 'plastic', 'metal', 'wood']}

In [5]:
df = pd.DataFrame(data)

In [6]:
label_encoder = LabelEncoder()

In [13]:
df['Color_Encoded'] = label_encoder.fit_transform(df['Color'])
df['Size_Encoded'] = label_encoder.fit_transform(df['Size'])
df['Material_Encoded'] = label_encoder.fit_transform(df['Material'])

In [16]:
print(df[['Color', 'Color_Encoded', 'Size', 'Size_Encoded', 'Material', 'Material_Encoded']])

   Color  Color_Encoded    Size  Size_Encoded Material  Material_Encoded
0    red              2   small             2     wood                 2
1  green              1  medium             1    metal                 0
2   blue              0   large             0  plastic                 1
3    red              2  medium             1    metal                 0
4   blue              0   small             2     wood                 2


## Q(5)

In [17]:
import pandas as pd

In [18]:
data = {'Age': [25, 30, 35, 40, 45],
        'Income': [50000, 60000, 75000, 90000, 80000],
        'Education': [12, 16, 18, 20, 14]}


In [20]:
pd.DataFrame(data)

   Age  Income  Education
0   25   50000         12
1   30   60000         16
2   35   75000         18
3   40   90000         20
4   45   80000         14


In [25]:
df = pd.DataFrame(data)

In [26]:
covariance_matrix = df.cov()

In [27]:
print("Covariance Matrix:")
print(covariance_matrix)

Covariance Matrix:
                Age       Income  Education
Age            62.5     112500.0       10.0
Income     112500.0  255000000.0    37500.0
Education      10.0      37500.0       10.0
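
All entries in the matrix are positive, so in this sample Age, Income and Education tend to increase together. The magnitudes are not directly comparable, because each covariance carries the units of both variables (Income is in the tens of thousands, Education in years). One way to put them on a common scale, sketched here as an optional extra step and not part of the original answer, is to divide each covariance by the product of the two standard deviations, which gives the correlation matrix:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Income": [50000, 60000, 75000, 90000, 80000],
    "Education": [12, 16, 18, 20, 14],
})

cov = df.cov()

# The diagonal of the covariance matrix holds the variances
std = np.sqrt(np.diag(cov))

# Divide each covariance by the product of the two standard deviations
corr = cov / np.outer(std, std)

print(corr.round(3))
```

The result should agree with `df.corr()`, which computes the same Pearson correlation directly.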


## Q(6)

In [31]:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, LabelEncoder

data = {'Gender': ['Male', 'Female', 'Male', 'Female'],
        'Education Level': ['High School', 'Bachelor\'s', 'Master\'s', 'PhD'],
        'Employment Status': ['Unemployed', 'Part-Time', 'Full-Time', 'Part-Time']}

df = pd.DataFrame(data)

label_encoder_gender = LabelEncoder()
df['Gender_Encoded'] = label_encoder_gender.fit_transform(df['Gender'])

ordinal_encoder_education = OrdinalEncoder(categories=[['High School', 'Bachelor\'s', 'Master\'s', 'PhD']])
df['Education Level_Encoded'] = ordinal_encoder_education.fit_transform(df[['Education Level']])

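# .toarray() converts the default sparse result to a dense array on any scikit-learn version
onehot_encoder_employment = OneHotEncoder(drop='first')
employment_encoded = onehot_encoder_employment.fit_transform(df[['Employment Status']]).toarray()
# drop='first' removes the alphabetically first category ('Full-Time'),
# so the remaining columns are 'Part-Time' and 'Unemployed'
employment_status_encoded = pd.DataFrame(employment_encoded,
                                         columns=onehot_encoder_employment.categories_[0][1:])
df = pd.concat([df, employment_status_encoded], axis=1)

print(df)


   Gender Education Level Employment Status  Gender_Encoded  \
0    Male     High School        Unemployed               1   
1  Female      Bachelor's         Part-Time               0   
2    Male        Master's         Full-Time               1   
3  Female             PhD         Part-Time               0   

   Education Level_Encoded  Part-Time  Unemployed
0                      0.0        0.0         1.0
1                      1.0        1.0         0.0
2                      2.0        0.0         0.0
3                      3.0        1.0         0.0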




## Q(7)

In [32]:
import pandas as pd


data = {'Temperature': [25, 20, 22, 28, 30],
        'Humidity': [60, 70, 65, 75, 80],
        'Weather Condition': ['Sunny', 'Cloudy', 'Rainy', 'Cloudy', 'Rainy'],
        'Wind Direction': ['North', 'South', 'East', 'West', 'North']}

df = pd.DataFrame(data)

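# Covariance is only defined for numeric columns, so select them explicitly;
# recent pandas versions raise an error if string columns are passed to cov()
covariance_matrix = df.select_dtypes(include='number').cov()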

print("Covariance Matrix:")
print(covariance_matrix)


Covariance Matrix:
             Temperature  Humidity
Temperature         17.0      20.0
Humidity            20.0      62.5


