# Day 7: Data Preprocessing - Encoding Categorical Variables

## Task: 
- Encode categorical variables into a numerical format.

## Description: 
- Use techniques like one-hot encoding or label encoding to convert categorical variables into a format suitable for machine learning models.

## What is Categorical Variable Encoding?

- Most machine learning models expect numerical input, so encoding categorical variables is a crucial data preprocessing step. Common techniques for converting categories into numbers include one-hot encoding and label encoding.

## What is Label Encoding?

- Label encoding involves assigning a unique integer to each category in a categorical variable.

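- It's suitable for ordinal categorical variables where there's an inherent order among categories. For nominal variables such as country names, the integers imply an order that does not exist, and some models may treat that order as meaningful.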
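
As a minimal sketch of the ordinal case: scikit-learn's `LabelEncoder` assigns codes in sorted order, so for strings the codes follow the alphabet rather than the intended ranking. One way to keep the ranking is to declare it explicitly with a pandas ordered `Categorical`. The column values and their order below are made up for illustration.

```python
import pandas as pd

# Hypothetical ordinal column; the level names and their order are assumptions.
levels = pd.Series(["low", "high", "medium", "low"])

# Declare the order explicitly so the integer codes follow it.
ordered = pd.Categorical(levels, categories=["low", "medium", "high"], ordered=True)

# One integer code per row; a value missing from `categories` would get -1.
codes = ordered.codes

print(codes.tolist())
```

Here "low" maps to 0, "medium" to 1, and "high" to 2, matching the declared order instead of the alphabetical one.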

In [20]:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
import numpy as np

## Label Encoding:
- The cell below label-encodes the 'CountryName' column, replacing each country name with its integer code.

In [21]:
# Load dataset
suicide_data_le = pd.read_csv("suicide_rates_1990-2022.csv")

# Assuming 'CountryName' is the column with country names
country_labels = suicide_data_le['CountryName']

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform country labels
encoded_countries = label_encoder.fit_transform(country_labels)

# Replace the original 'CountryName' column with the encoded values
suicide_data_le['CountryName'] = encoded_countries
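
After the column is overwritten, the integer codes are only interpretable through the fitted encoder. A small sketch on toy data (the country values are illustrative, not taken from the dataset) shows how `classes_` and `inverse_transform` recover the mapping:

```python
from sklearn.preprocessing import LabelEncoder

# Toy country names standing in for the 'CountryName' column (illustrative values).
countries = ["Norway", "Brazil", "Japan", "Brazil"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(countries)

# classes_ holds the sorted categories; each one's position is its integer code.
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}

# inverse_transform maps the integer codes back to the original labels.
decoded = encoder.inverse_transform(encoded)

print(mapping)
print(encoded.tolist())
print(decoded.tolist())
```

Keeping the fitted `label_encoder` object around (or saving the mapping) is what makes it possible to translate model inputs and outputs back to country names later.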


## One-Hot Encoding:
- One-hot encoding creates one binary column for each category in the categorical variable, with a 1 in the column that matches the row's category. I can use pandas' get_dummies() function for this.

In [22]:
# Load dataset
suicide_data = pd.read_csv("suicide_rates_1990-2022.csv")

# Assuming 'CountryName' is the column with country names
country_dummies = pd.get_dummies(suicide_data['CountryName'], prefix='CountryName')

# Concatenate the one-hot encoded columns with the original dataframe
suicide_data = pd.concat([suicide_data, country_dummies], axis=1)

# Drop the original 'CountryName' column, which the dummy columns now replace
suicide_data = suicide_data.drop(columns='CountryName')
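
The three steps above (encode, concatenate, drop) can also be done in one call by passing the whole frame to `get_dummies()` with `columns=`. This is a sketch on a toy frame whose values are illustrative, not taken from the dataset:

```python
import pandas as pd

# Toy frame standing in for the dataset (values are illustrative).
df = pd.DataFrame({
    "CountryName": ["Norway", "Brazil", "Japan", "Brazil"],
    "Year": [1990, 1990, 1991, 1991],
})

# columns= encodes the listed column and removes the original in one step;
# dtype=int produces 0/1 values instead of booleans.
encoded = pd.get_dummies(df, columns=["CountryName"], prefix="CountryName", dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each row has exactly one 1 among the `CountryName_*` columns. Passing `drop_first=True` would omit one of those columns, which can help when a model is sensitive to redundant features.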
