# Q1. What is Data Encoding?
Data Encoding: The process of converting categorical data into a numerical format so that machine learning algorithms can process it. Categorical data, which includes nominal (no order) and ordinal (ordered) data, must be transformed into numerical values because most algorithms require numerical input.

Usefulness in Data Science:

Compatibility: Ensures that data is in a numerical format that machine learning algorithms can accept.
Improves Model Performance: An encoding that matches the data type (nominal or ordinal) avoids feeding the model misleading numeric relationships, which can improve accuracy.
Handles Categorical Data: Converts categories into numbers while preserving their meaning, such as the order of ordinal categories.
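
To make the nominal/ordinal distinction concrete, here is a minimal pandas sketch. The column names and values are made up for illustration and are not part of the original exercise; the point is only that ordinal codes should follow a stated order, while nominal codes are arbitrary labels.

```python
import pandas as pd

# Hypothetical example columns: 'color' is nominal, 'size' is ordinal.
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'green'],
    'size': ['small', 'large', 'medium', 'small'],
})

# Ordinal data: state the order explicitly so the integer codes respect it.
size_order = ['small', 'medium', 'large']
df['size_code'] = pd.Categorical(df['size'], categories=size_order, ordered=True).codes

# Nominal data: the codes are only labels (pandas assigns them in sorted order).
df['color_code'] = pd.Categorical(df['color']).codes

print(df)
```

Here `size_code` grows with the size, so comparing codes is meaningful, whereas comparing `color_code` values says nothing about the colors themselves.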

# Q2. What is Nominal Encoding?
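Nominal Encoding: A method of converting categorical data with no inherent order (nominal data) into numerical values. Each category is assigned a unique integer. The integers are arbitrary labels, so their size carries no meaning: a category encoded as 2 is not "greater" than one encoded as 0.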

In [1]:
# Example: assign an integer to each car brand
# (LabelEncoder numbers the categories in sorted order: BMW=0, Ford=1, Toyota=2)

import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({'Car Brand': ['Toyota', 'Ford', 'BMW', 'Toyota', 'BMW', 'Ford']})
encoder = LabelEncoder()
data['Car Brand Encoded'] = encoder.fit_transform(data['Car Brand'])
print(data)


  Car Brand  Car Brand Encoded
0    Toyota                  2
1      Ford                  1
2       BMW                  0
3    Toyota                  2
4       BMW                  0
5      Ford                  1
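
The mapping behind the output above can be inspected directly. The following is a small sketch that reuses the same car brands; `classes_` and `inverse_transform` are standard `LabelEncoder` members, while the variable names (`brands`, `mapping`, `decoded`) are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

brands = pd.Series(['Toyota', 'Ford', 'BMW', 'Toyota', 'BMW', 'Ford'])
encoder = LabelEncoder()
codes = encoder.fit_transform(brands)

# classes_ holds the categories in sorted order; position i maps to integer i.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the integer codes.
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Keeping the fitted encoder around matters in practice, since the same mapping has to be applied to any new data.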


# Q3. When is Nominal Encoding Preferred Over One-Hot Encoding?
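Situations:

When the Categorical Variable has Many Unique Values: Nominal (integer) encoding keeps a single column, whereas one-hot encoding adds one column per category, so nominal encoding is far more space-efficient for high-cardinality features.
When the Model is Insensitive to the Arbitrary Integer Order: Tree-based models split on thresholds and can often work with integer codes. Linear and distance-based models may read the codes as a ranking, so one-hot encoding is usually the safer choice for them.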
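
The space argument can be checked with a quick sketch. The feature below is synthetic (hypothetical city IDs generated at random), so the exact column count depends on the generated data; only the contrast between one column and one column per category is the point.

```python
import numpy as np
import pandas as pd

# Hypothetical high-cardinality feature: 1000 rows, up to 200 distinct city IDs.
rng = np.random.default_rng(0)
cities = pd.Series(rng.integers(0, 200, size=1000)).map(lambda i: f'city_{i}')

# Integer (nominal) encoding: the feature remains a single column.
int_encoded = cities.astype('category').cat.codes

# One-hot encoding: one indicator column per distinct category.
one_hot = pd.get_dummies(cities)

print(int_encoded.shape)  # one value per row
print(one_hot.shape)      # rows x number of distinct cities
```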

# Q4. Encoding Categorical Data with 5 Unique Values
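Choice of Encoding Technique: One-Hot Encoding. With only 5 unique values, one-hot encoding adds just 5 indicator columns and, assuming the categories are nominal, avoids implying an order between them.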

In [2]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data
data = pd.DataFrame({'Category': ['A', 'B', 'C', 'D', 'E']})

# One-Hot Encoding
# (sparse_output replaces the `sparse` argument, which was removed in scikit-learn 1.4)
encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(data[['Category']])
print(encoded_data)


[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]




# Q5. Nominal Encoding with 1000 Rows and 5 Columns (2 Categorical)
Columns and Calculations:

Assume 2 categorical columns: Cat1 with 4 unique values and Cat2 with 3 unique values.
Encoding:

Nominal Encoding: Each unique value in a categorical column is assigned a unique integer.
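Number of New Columns: 0. Nominal encoding replaces each categorical column in place, so the dataset stays at 1000 rows and 5 columns. For comparison, one-hot encoding would replace the 2 categorical columns with 4 + 3 = 7 indicator columns, giving 3 + 7 = 10 columns in total.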
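
The column counts can be verified on a synthetic dataset. This is a sketch under the assumptions stated above (Cat1 with 4 values, Cat2 with 3); the numerical column names `Num1` to `Num3` and all the values are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows = 1000

# Hypothetical data: 3 numerical columns plus Cat1 (4 values) and Cat2 (3 values).
df = pd.DataFrame({
    'Num1': rng.normal(size=n_rows),
    'Num2': rng.normal(size=n_rows),
    'Num3': rng.normal(size=n_rows),
    'Cat1': np.resize(['a', 'b', 'c', 'd'], n_rows),  # values repeated cyclically
    'Cat2': np.resize(['x', 'y', 'z'], n_rows),
})

# Nominal (integer) encoding: each categorical column is replaced in place.
nominal = df.copy()
for col in ['Cat1', 'Cat2']:
    nominal[col] = nominal[col].astype('category').cat.codes

# One-hot encoding for comparison: one indicator column per category.
one_hot = pd.get_dummies(df, columns=['Cat1', 'Cat2'])

print(nominal.shape)  # (1000, 5)
print(one_hot.shape)  # (1000, 10)
```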

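# Q6. Encoding Technique for Animal Dataset
Choice of Encoding Technique: One-Hot Encoding. Species, habitat, and diet are nominal features with no inherent order, so each category gets its own indicator column.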

In [3]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data
data = pd.DataFrame({
    'Species': ['Lion', 'Tiger', 'Elephant', 'Tiger'],
    'Habitat': ['Savanna', 'Forest', 'Savanna', 'Forest'],
    'Diet': ['Carnivore', 'Carnivore', 'Herbivore', 'Carnivore']
})

# One-Hot Encoding
# (sparse_output replaces the `sparse` argument, which was removed in scikit-learn 1.4)
encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(data)
print(encoded_data)


[[0. 1. 0. 0. 1. 1. 0.]
 [0. 0. 1. 1. 0. 1. 0.]
 [1. 0. 0. 0. 1. 0. 1.]
 [0. 0. 1. 1. 0. 1. 0.]]




# Q7. Encoding Categorical Data for Customer Churn Prediction
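Features: gender, age, contract type, monthly charges, tenure. Gender and contract type are categorical and are one-hot encoded; age, monthly charges, and tenure are numerical and are passed through unchanged.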

In [5]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data
data = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female', 'Male'],
    'age': [23, 45, 31, 22],
    'contract_type': ['Month-to-month', 'One year', 'Two year', 'Month-to-month'],
    'monthly_charges': [70.5, 88.3, 52.0, 60.1],
    'tenure': [5, 20, 15, 10]
})

# One-Hot Encoding for categorical features
# (sparse_output replaces the `sparse` argument, which was removed in scikit-learn 1.4)
encoder = OneHotEncoder(sparse_output=False)
categorical_data = data[['gender', 'contract_type']]
encoded_categorical_data = encoder.fit_transform(categorical_data)

# Combine encoded categorical data with numerical data
numerical_data = data[['age', 'monthly_charges', 'tenure']].values
processed_data = np.hstack((encoded_categorical_data, numerical_data))

print(processed_data)


[[ 0.   1.   1.   0.   0.  23.  70.5  5. ]
 [ 1.   0.   0.   1.   0.  45.  88.3 20. ]
 [ 1.   0.   0.   0.   1.  31.  52.  15. ]
 [ 0.   1.   1.   0.   0.  22.  60.1 10. ]]
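
The manual split-and-stack above can also be expressed with scikit-learn's `ColumnTransformer`, which keeps the encoding step and the column selection in one object. This is an alternative sketch, not the approach used in the cell above; `handle_unknown='ignore'` is an added assumption so that categories unseen during fitting do not raise an error at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

data = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female', 'Male'],
    'age': [23, 45, 31, 22],
    'contract_type': ['Month-to-month', 'One year', 'Two year', 'Month-to-month'],
    'monthly_charges': [70.5, 88.3, 52.0, 60.1],
    'tenure': [5, 20, 15, 10]
})

categorical_cols = ['gender', 'contract_type']

# One-hot encode the categorical columns and pass the numerical ones through.
# sparse_threshold=0 makes the transformer always return a dense array.
preprocessor = ColumnTransformer(
    transformers=[('onehot', OneHotEncoder(handle_unknown='ignore'), categorical_cols)],
    remainder='passthrough',
    sparse_threshold=0,
)

processed = preprocessor.fit_transform(data)
print(processed.shape)
print(processed)
```

The encoded columns come first, followed by the passed-through numerical columns, which matches the layout of the output above.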


