1:Data encoding refers to the process of converting data from its original
format (often categorical data or non-numeric data) into a format that can be
effectively used by machine learning algorithms, which typically require
numerical input.

2:Nominal encoding refers to the encoding technique used for categorical data
where the categories have no inherent order or ranking. The categories are
nominal (i.e., they are labels or names): they can be identified and told
apart, but there is no meaningful order or distance between them.

For example, consider a feature like "City," where the categories could be New
York, Los Angeles, Chicago, etc. These cities are just labels with no specific
order. Nominal encoding is used to transform these categorical labels into a
format suitable for machine learning algorithms.
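
As an illustration of the "City" example, the short sketch below one-hot encodes the column with pandas. The sample rows are made up for the illustration, and `pd.get_dummies` is only one of several ways to do this.

```python
import pandas as pd

# Hypothetical sample rows; the city names come from the example above.
df = pd.DataFrame({"City": ["New York", "Los Angeles", "Chicago", "New York"]})

# One indicator column per distinct city; no order is implied between them.
city_encoded = pd.get_dummies(df["City"], prefix="City", dtype=int)
print(city_encoded)
```

Each row has exactly one indicator set to 1, so no city is treated as larger or smaller than another.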

3:Label encoding, which maps each category to an integer, is generally
preferred over one-hot encoding when the categorical variable has a natural
order or when there are practical concerns about memory and computational
efficiency. It is most useful where one-hot encoding would cause excessive
feature expansion, leading to high dimensionality and sparse matrices. For
purely nominal features, keep in mind that the integer codes imply an order
that does not exist in the data.
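
To make the feature-expansion concern concrete, here is a minimal sketch that compares the two encodings on a high-cardinality column. The `product_id` column and its 1,000 values are hypothetical, and `pd.factorize` stands in for label encoding.

```python
import pandas as pd

# Hypothetical high-cardinality column: 1,000 distinct product IDs.
products = pd.Series([f"P{i:04d}" for i in range(1000)], name="product_id")

# Label encoding: a single integer column, whatever the number of categories.
codes, uniques = pd.factorize(products)

# One-hot encoding: one column per distinct category.
one_hot = pd.get_dummies(products)

print(codes.shape)    # one value per row
print(one_hot.shape)  # rows x number of distinct categories
```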

4:In this scenario, there are several encoding techniques you can use to
transform categorical data into a numerical format suitable for machine
learning algorithms. The choice of technique depends on the nature of the
categorical data, its relationship with the target variable, and the type
of machine learning algorithm you're using.

5:To solve this, we'll assume that nominal encoding refers to one-hot encoding,
which is the most common form of encoding for nominal categorical data.
The goal is to determine how many new columns will be created when the
categorical variables are transformed.
In this example, 7 new columns would be created by nominal (one-hot)
encoding the two categorical columns.
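
The count can be checked with a small sketch. The data below is hypothetical: it assumes one column with 3 unique categories and one with 4, which is one combination that gives 7, and it assumes no category is dropped.

```python
import pandas as pd

# Hypothetical data: 'color' has 3 categories and 'size' has 4.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "red"],
    "size": ["S", "M", "L", "XL"],
})

# Without dropping a category, one-hot encoding adds one column per unique value.
expected_new_columns = int(df[["color", "size"]].nunique().sum())

encoded = pd.get_dummies(df, columns=["color", "size"])
print(expected_new_columns, encoded.shape[1])
```

If one category per column were dropped (for example with `drop_first=True`), the count would fall by one per encoded column.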

6:In this scenario, the dataset contains categorical features such as species,
habitat, and diet, and the goal is to transform these features into a format
 that machine learning algorithms can understand. Let's analyze the best
 encoding techniques for each of these categorical columns.

7:In this scenario, we have a dataset containing the following features:

Gender: Categorical (Nominal)

Age: Numerical (Continuous)

Contract Type: Categorical (Nominal/Ordinal)

Monthly Charges: Numerical (Continuous)

Tenure: Numerical (Continuous)

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Sample dataset
data = {'Customer': [1, 2, 3],
        'Gender': ['Male', 'Female', 'Male'],
        'Age': [34, 45, 23],
        'Contract Type': ['Month-to-Month', 'One-Year', 'Two-Year'],
        'Monthly Charges': [50, 75, 60],
        'Tenure': [12, 24, 36]}

df = pd.DataFrame(data)

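# 1. One-Hot Encoding for Gender
# sparse_output replaces the older `sparse` argument (scikit-learn >= 1.2)
gender_encoder = OneHotEncoder(drop='first', sparse_output=False)
gender_encoded = gender_encoder.fit_transform(df[['Gender']])
gender_encoded_df = pd.DataFrame(gender_encoded, columns=gender_encoder.get_feature_names_out(['Gender']), index=df.index)

# 2. Ordinal Encoding for Contract Type (categories listed from shortest to longest commitment)
contract_encoder = OrdinalEncoder(categories=[['Month-to-Month', 'One-Year', 'Two-Year']])
# fit_transform returns a 2-D array; flatten it before assigning to a single column
df['Contract Type'] = contract_encoder.fit_transform(df[['Contract Type']]).ravel()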

# 3. Concatenate the encoded columns with the original DataFrame
df_encoded = pd.concat([df, gender_encoded_df], axis=1)
df_encoded = df_encoded.drop('Gender', axis=1)  # Drop the original 'Gender' column

# Show the result
print(df_encoded)
In this case, One-Hot Encoding was used for the Gender column and Ordinal
Encoding was applied to the Contract Type column, while the numerical columns
(Age, Monthly Charges, and Tenure) did not require any encoding. After this
preprocessing step, every column is numeric and can be passed to a machine
learning algorithm.