#QNO.1 ANS
Ordinal Encoding and Label Encoding both convert categorical values into integers; they differ in how those integers are assigned.

Label Encoding assigns a unique integer to each category of a categorical variable without considering any order or hierarchy between the categories. scikit-learn's `LabelEncoder`, for example, numbers categories in sorted (alphabetical) order, so "blue", "green", and "red" become 0, 1, and 2. Because the integers still imply a ranking that does not exist, label encoding is mainly suited to encoding target labels (the use `LabelEncoder` is designed for) or to nominal features used by tree-based models.

Ordinal Encoding, on the other hand, assigns numerical values to categories based on their relative order or ranking, so each value represents the category's position in that order. For example, "small" might be encoded as 0, "medium" as 1, and "large" as 2. This encoding is appropriate when there is a meaningful order among the categories. In scikit-learn, `OrdinalEncoder` follows such an order only when it is passed explicitly through the `categories` parameter; otherwise it also sorts the categories alphabetically.

The choice between ordinal encoding and label encoding depends on whether the categories have a meaningful order. If they do, ordinal encoding with the order stated explicitly should be used. If they do not, label encoding can be applied, but for models that treat the codes as magnitudes (such as linear models), one-hot encoding is usually the safer choice because it introduces no artificial ranking.
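A minimal sketch of the difference, assuming scikit-learn is installed; the toy values are hypothetical. `LabelEncoder` sorts the colors alphabetically, while `OrdinalEncoder` uses the size order supplied in `categories`:

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
import numpy as np

# Hypothetical toy values
colors = np.array(["red", "green", "blue"])
sizes = np.array([["small"], ["large"], ["medium"]])  # OrdinalEncoder expects a 2-D array

# LabelEncoder numbers categories alphabetically: blue=0, green=1, red=2
label_codes = LabelEncoder().fit_transform(colors)
print(label_codes)  # [2 1 0]

# OrdinalEncoder follows the order passed in `categories`: small=0, medium=1, large=2
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = ordinal.fit_transform(sizes)
print(size_codes.ravel())  # [0. 2. 1.]
```

Without the explicit `categories` list, `OrdinalEncoder` would also sort the sizes alphabetically ("large", "medium", "small"), losing the natural order.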

#QNO.2 ANS
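Target Guided Ordinal Encoding orders the categories of a categorical variable by a statistic of the target variable (usually the mean, sometimes the median) computed within each category, and then replaces each category with its rank in that order. It is useful when there is a strong relationship between the categorical variable and the target, because the encoded values then rise or fall consistently with the target. The per-category statistics should be computed on the training data only, so that target information does not leak into validation or test data.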

For example, suppose you are working on a credit risk prediction project, and one of the categorical variables is "Education Level" with categories "High School," "Bachelor's," "Master's," and "PhD." You observe that the default rate decreases as the level of education increases, with PhD holders having the lowest default rate. In this case, you can apply Target Guided Ordinal Encoding to assign higher values to categories with lower default rates and lower values to categories with higher default rates. This way, the encoded variable captures the relationship between education level and default risk.
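A minimal sketch of these steps with pandas; the `education` and `default` columns and their values are hypothetical toy data chosen so the default rate falls with education level:

```python
import pandas as pd

# Hypothetical toy data: 1 = the borrower defaulted, 0 = repaid
df = pd.DataFrame({
    "education": ["High School"] * 3 + ["Bachelor's"] * 3 + ["Master's"] * 4 + ["PhD"] * 2,
    "default": [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# 1. Mean of the target (the default rate) within each category
default_rate = df.groupby("education")["default"].mean()

# 2. Sort the categories from highest to lowest default rate
ordered = default_rate.sort_values(ascending=False).index

# 3. Assign ranks, so categories with lower default rates get higher codes
mapping = {category: rank for rank, category in enumerate(ordered)}
df["education_encoded"] = df["education"].map(mapping)
print(mapping)
```

With these toy rows the rates are 2/3, 1/3, 1/4, and 0, so "High School" becomes 0 and "PhD" becomes 3. In a real project the mapping would be learned on the training split and then applied unchanged to new data; categories missing from the mapping come out of `.map` as `NaN` and need separate handling.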

#QNO.3 ANS
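Covariance measures how two variables vary together, that is, whether above-average values of one tend to occur with above-average (or below-average) values of the other. It is important in statistical analysis because its sign shows the direction of the linear relationship between variables, and it is the basis of correlation and of covariance matrices. Its magnitude, however, depends on the units of the variables, so it does not by itself measure the strength of the relationship; the correlation coefficient, which divides the covariance by the product of the two standard deviations, is used for that.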

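The sample covariance is calculated using the following formula:

$$\operatorname{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where $x_i$ and $y_i$ are the $i$-th observations of $X$ and $Y$, $\bar{x}$ and $\bar{y}$ are their sample means, and $n$ is the number of observations. Dividing by $n - 1$ gives the unbiased sample estimate; the population covariance instead divides by $n$ and uses the population means $\mu_X$ and $\mu_Y$.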

The resulting covariance value can be positive, indicating a positive relationship (both variables increase or decrease together), negative, indicating a negative relationship (one variable increases while the other decreases), or zero, indicating no linear relationship.
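A minimal sketch of the formula in NumPy, checked against `np.cov`; the function name `sample_covariance` and the data values are illustrative only:

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance: sum of (x_i - x_mean)(y_i - y_mean), divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must have the same length, with at least 2 observations")
    return np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1)

# Hypothetical data: y tends to increase with x, so the covariance is positive
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]

cov_xy = sample_covariance(x, y)
print(cov_xy)               # 11 / 3, about 3.667
print(np.cov(x, y)[0, 1])   # np.cov also divides by n - 1 by default
```

Here the means are 2.5 and 5, the products of deviations sum to 11, and dividing by n - 1 = 3 gives about 3.667.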

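#QNO.4 ANS
from sklearn.preprocessing import LabelEncoder
import numpy as np

data = np.array([['red', 'small', 'wood'],
                ['blue', 'medium', 'plastic'],
                ['green', 'large', 'metal'],
                ['red', 'medium', 'plastic']])

# Store the codes in an integer array: writing them back into the string
# array would silently keep them as strings ('2' instead of 2).
encoded = np.empty(data.shape, dtype=int)
encoders = {}  # one fitted encoder per column, so each mapping can be inverted later

# Apply label encoding to each column; categories are numbered in sorted order
for column in range(data.shape[1]):
    encoders[column] = LabelEncoder()
    encoded[:, column] = encoders[column].fit_transform(data[:, column])

# Print the encoded data
print(encoded)


[[2 2 2]
 [0 1 1]
 [1 0 0]
 [2 1 1]]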


#QNO.5 ANS


#QNO.6 ANS
For the categorical variables "Gender," "Education Level," and "Employment Status," the choice of encoding method depends on the nature of the variables and the machine learning algorithm you intend to use. Here are some suggestions:

Gender: Since "Gender" has only two categories ("Male" and "Female"), a single binary column is enough, for example 0 for "Male" and 1 for "Female". For a binary variable this label encoding is equivalent to one-hot encoding with one of the two columns dropped, so it introduces no artificial ordering.

Education Level: This variable has a natural order ("High School" < "Bachelor's" < "Master's" < "PhD"), so ordinal encoding is suitable, for example 0, 1, 2, and 3. This keeps the ranking in a single feature.

Employment Status: If the categories ("Unemployed," "Part-Time," "Full-Time") are treated as having no ordinal relationship, one-hot encoding is appropriate: each category is represented by a separate binary feature.
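A minimal sketch of these three choices with pandas; the four sample rows are hypothetical, and "Female" = 1 is an arbitrary choice of which category becomes 1:

```python
import pandas as pd

# Hypothetical sample rows using the categories named above
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Education Level": ["High School", "PhD", "Bachelor's", "Master's"],
    "Employment Status": ["Unemployed", "Full-Time", "Part-Time", "Full-Time"],
})

encoded = pd.DataFrame(index=df.index)

# Binary variable: a single 0/1 column
encoded["Gender"] = (df["Gender"] == "Female").astype(int)

# Ordered variable: map each level to its rank in the stated order
education_order = ["High School", "Bachelor's", "Master's", "PhD"]
encoded["Education Level"] = df["Education Level"].map(
    {level: rank for rank, level in enumerate(education_order)}
)

# Unordered variable: one binary column per category
encoded = encoded.join(
    pd.get_dummies(df["Employment Status"], prefix="Employment", dtype=int)
)
print(encoded)
```

In a scikit-learn pipeline the same choices could be expressed with `OrdinalEncoder` (with an explicit `categories` list) and `OneHotEncoder`, which remember the mappings fitted on the training data.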

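#QNO.7 ANS
import numpy as np

# Example dataset with Temperature and Humidity (continuous) and Weather Condition
# and Wind Direction (categorical). Because the array mixes numbers and strings,
# NumPy stores every value as a string.
dataset = np.array([[25.5, 60, 'Sunny', 'North'],
                    [28.2, 55, 'Cloudy', 'South'],
                    [30.1, 70, 'Rainy', 'East'],
                    [27.8, 65, 'Cloudy', 'West']])

# Covariance is defined only for numeric variables, so convert the two continuous
# columns back to float; rowvar=False treats each column as a variable.
covariance_matrix = np.cov(dataset[:, :2].astype(float), rowvar=False)

# Print the covariance matrix (rows and columns: Temperature, Humidity)
print(covariance_matrix)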


[[ 3.56666667  6.66666667]
 [ 6.66666667 41.66666667]]
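The diagonal of this matrix holds the variances of Temperature (about 3.57) and Humidity (about 41.67). The off-diagonal value of about 6.67 is positive, so in this sample humidity tends to be higher when temperature is higher. Because covariance depends on units, a minimal sketch that rescales it into a Pearson correlation coefficient (using the same four observations) makes the strength easier to judge:

```python
import numpy as np

# Temperature and Humidity values from the example dataset above
temperature = np.array([25.5, 28.2, 30.1, 27.8])
humidity = np.array([60, 55, 70, 65], dtype=float)

cov = np.cov(temperature, humidity)  # 2x2 sample covariance matrix

# Pearson correlation: covariance divided by the product of the standard deviations
r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(round(r, 3))  # 0.547
print(np.isclose(r, np.corrcoef(temperature, humidity)[0, 1]))  # True
```

A correlation of about 0.55 suggests a moderate positive linear relationship, although four observations are far too few to draw firm conclusions. Weather Condition and Wind Direction are nominal, so covariance is not defined for them directly; any covariance computed after encoding them would depend on the encoding chosen.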
