# Encoding data with labels

Prompt: Write code to encode all the columns in the dataset called train.csv. Use the label encoding method.

To perform label encoding on all the categorical columns in a dataset, you can use the LabelEncoder from the sklearn.preprocessing module. Here's a code snippet that demonstrates how to do this for a CSV file named train.csv:

In [1]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the dataset
df = pd.read_csv('train.csv', sep=";")

# Initialize a label encoder for each categorical column
label_encoders = {}
for column in df.columns:
    if df[column].dtype == 'object':  # If the column is categorical
        label_encoders[column] = LabelEncoder()
        df[column] = label_encoders[column].fit_transform(df[column])

# Save the label encoded DataFrame to a new CSV file
df.to_csv('train_label_encoded.csv', index=False)

print("All categorical columns have been label encoded and saved to 'train_label_encoded.csv'.")



All categorical columns have been label encoded and saved to 'train_label_encoded.csv'.


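This script reads your dataset from train.csv (parsed here with a semicolon separator), iterates over every column, treats columns of object dtype as categorical, and replaces each of them with its label-encoded integers. The fitted encoders are kept in the label_encoders dictionary, keyed by column name, so the mapping can be inspected or reversed later. The encoded DataFrame is then written to a new CSV file called train_label_encoded.csv; the original train.csv is left unchanged.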
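To make the integer assignment concrete, here is a minimal sketch on a toy list of values. The data is hypothetical and only stands in for one categorical column of train.csv. It shows that a fitted LabelEncoder exposes its categories through `classes_` and can map codes back with `inverse_transform`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values standing in for one categorical column
colors = ["red", "green", "blue", "green"]

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)

# classes_ holds the categories in sorted order; the position is the code
mapping = {label: code for code, label in enumerate(encoder.classes_)}

# inverse_transform recovers the original labels from the integer codes
decoded = encoder.inverse_transform(codes)

print(mapping)        # {'blue': 0, 'green': 1, 'red': 2}
print(list(codes))    # [2, 1, 0, 1]
print(list(decoded))  # ['red', 'green', 'blue', 'green']
```

The same two calls should work on any entry of the label_encoders dictionary built by the script, for example to decode model outputs or to document which integer stands for which category.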

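Remember that label encoding assigns a unique integer to each category within a column. Models that treat inputs as numeric magnitudes, such as linear models or distance-based methods, will read an order into those integers, which is only meaningful when the categorical data is ordinal. Note also that LabelEncoder numbers categories in sorted order, so even for ordinal data the integers may not follow the intended ranking. For nominal data, you might want to use one-hot encoding instead.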
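As a rough illustration of the one-hot alternative, the sketch below uses `pandas.get_dummies` on a small made-up DataFrame. The column names `color` and `price` are assumptions for the example, not columns known to exist in train.csv.

```python
import pandas as pd

# Hypothetical toy data; "color" is nominal, "price" is already numeric
toy = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "price": [10, 12, 9, 11],
})

# Replace "color" with one indicator column per category
one_hot = pd.get_dummies(toy, columns=["color"])

print(list(one_hot.columns))
# ['price', 'color_blue', 'color_green', 'color_red']
```

Each category gets its own indicator column, so no artificial order is introduced. The trade-off is width: a column with many distinct values produces many new columns.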