### **Decoding/Inverse Encoding**

1. **Definition**:  
   Decoding (or inverse encoding) is the process of converting encoded categorical or numerical values back to their original labels or categories. This is often used to make machine learning models' outputs interpretable.

2. **Purpose**:  
   - To **interpret model predictions** by converting encoded values back to their original labels.  
   - To **reverse preprocessing steps** for better visualization or analysis of results.  
   - To **maintain consistency** between original data and predicted outputs.

3. **Use Cases**:  
   - Converting encoded categorical values (e.g., `0`, `1`, `2`) back to their original labels (e.g., `Male`, `Female`, `Other`).  
   - Translating predicted numerical classes into human-readable categories.  
   - Converting predicted probabilities into a class (e.g., the most probable one) and then into that class's original label.

4. **Methods to Decode/Inverse Encode**:  
   - **Using LabelEncoder**:  
     - Use the `inverse_transform()` method to map encoded values back to their original labels.  
   - **Using Mapping Dictionaries**:  
     - Predefine dictionaries that map encoded values to original categories for decoding.  
   - **One-Hot Encoding Decoding**:  
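     - Convert one-hot encoded vectors back to their original categories by taking the index of the maximum value in each row and looking it up in the ordered list of categories. scikit-learn's `OneHotEncoder` also provides `inverse_transform()` for this.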

5. **Best Practices**:  
   - Always store the mapping (e.g., `LabelEncoder` instance or dictionary) used during encoding for consistent decoding.  
   - Handle cases where encoded values may not match the original mapping (e.g., unseen categories).  
   - Verify decoded outputs by comparing them with the original data to ensure accuracy.

6. **Common Challenges**:  
   - Missing or mismatched mappings can lead to errors during decoding.  
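   - One-hot output can only be decoded if the column-to-category order is known. A fitted `OneHotEncoder` keeps it in `categories_` and exposes `inverse_transform()`, but hand-built one-hot arrays need that order stored separately.  
   - If a model outputs a code that was never seen during encoding, there is no label to map it to; `LabelEncoder.inverse_transform()` raises a `ValueError` in that case.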
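
The dictionary and one-hot methods listed above can be sketched as follows. This is a minimal illustration rather than the notebook's own code: the names `code_to_label` and `categories`, and the fallback string `"unknown"`, are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical mapping that would have been stored at encoding time.
code_to_label = {0: "Male", 1: "Female", 2: "Other"}

encoded = [0, 2, 1, 5]  # 5 was never assigned during encoding

# Dictionary decoding; .get() supplies a fallback for codes with no mapping.
decoded = [code_to_label.get(code, "unknown") for code in encoded]
print(decoded)

# One-hot decoding: the position of the 1 in each row indexes the category list.
categories = np.array(["Male", "Female", "Other"])
one_hot = np.array([[1, 0, 0],
                    [0, 0, 1],
                    [0, 1, 0]])
decoded_one_hot = categories[one_hot.argmax(axis=1)]
print(decoded_one_hot.tolist())
```

Using a fallback label is only one way to treat unmapped codes; raising an error instead may be preferable when a silent `"unknown"` would hide a data problem.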

---

### **Conclusion**  
Decoding or inverse encoding is a crucial step in machine learning workflows to ensure predictions and outputs are interpretable. By translating encoded values back to their original form, we can effectively communicate results, validate predictions, and make the insights actionable for stakeholders.

In [2]:
# import libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder


In [3]:
# load titanic dataset from sns
df = sns.load_dataset('titanic')
df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [4]:
# impute missing values and remove deck column
df = df.drop(columns=['deck'])
df['age'] = df['age'].fillna(df['age'].median())
df['fare'] = df['fare'].fillna(df['fare'].median())
df['embark_town'] = df['embark_town'].fillna(df['embark_town'].mode()[0])
df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0])

### Manual Encoding with LabelEncoder

In [5]:
# encode each categorical/object column with its own LabelEncoder so it can be decoded later
le_sex = LabelEncoder()
df['sex'] = le_sex.fit_transform(df['sex'])
le_embarked = LabelEncoder()
df['embarked'] = le_embarked.fit_transform(df['embarked'])
le_who = LabelEncoder()
df['who'] = le_who.fit_transform(df['who'])
le_class = LabelEncoder()
df['class'] = le_class.fit_transform(df['class'])
le_aliv = LabelEncoder()
df['alive'] = le_aliv.fit_transform(df['alive'])
le_embark_town = LabelEncoder()
df['embark_town'] = le_embark_town.fit_transform(df['embark_town'])
df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,embark_town,alive,alone
0,0,3,1,22.0,1,0,7.25,2,2,1,True,2,0,False
1,1,1,0,38.0,1,0,71.2833,0,0,2,False,0,1,False
2,1,3,0,26.0,0,0,7.925,2,2,2,False,2,1,True
3,1,1,0,35.0,1,0,53.1,2,0,2,False,2,1,False
4,0,3,1,35.0,0,0,8.05,2,2,1,True,2,0,True


In [6]:
# decode the data
df['sex'] = le_sex.inverse_transform(df['sex'])
df['embarked'] = le_embarked.inverse_transform(df['embarked'])
df['who'] = le_who.inverse_transform(df['who'])
df['class'] = le_class.inverse_transform(df['class'])
df['alive'] = le_aliv.inverse_transform(df['alive'])    
df['embark_town'] = le_embark_town.inverse_transform(df['embark_town'])
df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,Southampton,no,True
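
The best practice of verifying decoded output against the original data can be checked with a simple round trip. The sketch below uses a small stand-in column rather than the Titanic data, and the variable names (`original`, `round_trip_ok`, `out_of_range_rejected`) are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small stand-in for a categorical column such as `sex`.
original = pd.Series(["male", "female", "female", "male"], name="sex")

le = LabelEncoder()
encoded = le.fit_transform(original)  # classes are sorted: female=0, male=1
decoded = pd.Series(le.inverse_transform(encoded), name="sex")

# Round-trip check: decoding should reproduce the original values exactly.
round_trip_ok = decoded.tolist() == original.tolist()
print(round_trip_ok)

# A code outside the fitted range cannot be decoded; LabelEncoder raises ValueError.
try:
    le.inverse_transform([2])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
print(out_of_range_rejected)
```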


### Using a Loop to Encode the Data with LabelEncoder

In [13]:
df = sns.load_dataset('titanic')
df.head(3)

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True


In [14]:
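# encode the object-dtype columns with label encoding in a loop,
# keeping each fitted encoder so the column can be decoded later
# note: the category-dtype columns (`class`, `deck`) are not selected here;
# use include=['object', 'category'] to encode them as well
label_encoders = {}
for col in df.select_dtypes(include=['object']).columns: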
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,1,22.0,1,0,7.25,2,Third,1,True,,2,0,False
1,1,1,0,38.0,1,0,71.2833,0,First,2,False,C,0,1,False
2,1,3,0,26.0,0,0,7.925,2,Third,2,False,,2,1,True
3,1,1,0,35.0,1,0,53.1,2,First,2,False,C,2,1,False
4,0,3,1,35.0,0,0,8.05,2,Third,1,True,,2,0,True
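
A dictionary of fitted encoders like `label_encoders` exists only in memory, so decoding in a later session needs the mapping stored somewhere. One possible approach, sketched here on toy labels, is to save each encoder's `classes_` array (the labels in code order) as JSON. The names `saved` and `restored` are assumptions for illustration, and other persistence methods would work as well.

```python
import json

import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy encoders standing in for the ones fitted in the loop above.
label_encoders = {
    "sex": LabelEncoder().fit(["male", "female"]),
    "alive": LabelEncoder().fit(["no", "yes"]),
}

# classes_ lists the labels in code order, so it captures the whole mapping.
saved = json.dumps({col: le.classes_.tolist() for col, le in label_encoders.items()})

# Later (for example, in another session): rebuild the lookup and decode by indexing.
restored = {col: np.array(classes) for col, classes in json.loads(saved).items()}
decoded_sex = restored["sex"][[1, 0, 1]]
print(decoded_sex.tolist())
```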


In [15]:
# decode each column with the encoder that was fitted on it
for col, le in label_encoders.items():
    df[col] = le.inverse_transform(df[col])
df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
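
The same stored encoders can be used to decode model predictions, which is the main purpose named at the start of this section. The following is a minimal sketch on a toy frame, not the notebook's own workflow: the data values, the `toy` and `encoders` names, and the choice of features are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for a few Titanic columns (values are illustrative).
toy = pd.DataFrame({
    "sex":   ["male", "female", "female", "male", "female", "male"],
    "fare":  [7.25, 71.28, 7.92, 8.05, 53.10, 8.46],
    "alive": ["no", "yes", "yes", "no", "yes", "no"],
})

# Keep one fitted encoder per encoded column.
encoders = {col: LabelEncoder().fit(toy[col]) for col in ["sex", "alive"]}
X = pd.DataFrame({
    "sex": encoders["sex"].transform(toy["sex"]),
    "fare": toy["fare"],
})
y = encoders["alive"].transform(toy["alive"])

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# The model predicts integer codes; the target's encoder maps them back to labels.
pred_codes = model.predict(X)
pred_labels = encoders["alive"].inverse_transform(pred_codes)
print(pred_labels.tolist())
```

Predictions are made on the training rows here only to keep the example short; in practice the encoders would be fitted on the training split and reused unchanged on new data.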
