# Label Encoding

Encoding a categorical feature as numeric values before passing the data to a machine learning model is straightforward, and a data scientist has two common ways to do it. You can use the Pandas category datatype to quickly encode your data, or use the scikit-learn LabelEncoder class. I generally prefer the Pandas route, as Pandas is already part of my main workflow.

### Import Preliminaries

In [6]:
# Import modules
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Create an example dataframe
df = pd.DataFrame(data=[['Matt', '24', '99'],
                        ['Owen', '22', '98'],
                        ['Link', '16', '100']],
                  columns=['name', 'age', 'score'])

# View the dataframe
df

| | name | age | score |
|---|---|---|---|
| 0 | Matt | 24 | 99 |
| 1 | Owen | 22 | 98 |
| 2 | Link | 16 | 100 |


In [7]:
# View the schema of the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
name     3 non-null object
age      3 non-null object
score    3 non-null object
dtypes: object(3)
memory usage: 152.0+ bytes


### Label Encoding with Pandas

In [8]:
# Copy the example dataframe
pdf = df.copy()

# Change the name feature from an object to a category
pdf['name'] = pdf['name'].astype('category')

# Use the codes attribute to replace each category with its integer code
pdf['name'] = pdf['name'].cat.codes

# View the new dataframe
pdf

| | name | age | score |
|---|---|---|---|
| 0 | 1 | 24 | 99 |
| 1 | 2 | 22 | 98 |
| 2 | 0 | 16 | 100 |
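
The codes above follow the sorted order of the categories (Link, Matt, Owen), which is why Matt becomes 1 rather than 0. The following is a minimal sketch of how the mapping could be inspected and reversed. The variable names `names`, `code_to_label`, and `decoded` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Rebuild just the name column as a categorical Series
names = pd.Series(['Matt', 'Owen', 'Link'], name='name').astype('category')

# Integer codes index into the sorted category list
codes = names.cat.codes

# Build a lookup from integer code back to the original label
code_to_label = dict(enumerate(names.cat.categories))

# Map the codes back to the original strings
decoded = codes.map(code_to_label)

print(code_to_label)
print(decoded.tolist())
```

Keeping a lookup like this around is useful when you need to report model results in terms of the original labels.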


### Label Encoding with Scikit-Learn

In [9]:
# Copy the example dataframe
skdf = df.copy()

# Create the label encoder
le = LabelEncoder()

# Fit the label encoder on the name column and transform it
skdf['name'] = le.fit_transform(skdf['name'])

# View the new dataframe
skdf

| | name | age | score |
|---|---|---|---|
| 0 | 1 | 24 | 99 |
| 1 | 2 | 22 | 98 |
| 2 | 0 | 16 | 100 |
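
A fitted LabelEncoder keeps the learned labels in its `classes_` attribute and can reverse the encoding with `inverse_transform`. The sketch below assumes the same three names as the example dataframe. The label `'Zelda'` and the flag `unseen_ok` are made up here to illustrate what happens with a value the encoder has not seen.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same names as the example dataframe
names = pd.Series(['Matt', 'Owen', 'Link'])

# Fit the encoder and encode the names in one step
le = LabelEncoder()
encoded = le.fit_transform(names)

# The learned labels, in the order of their integer codes
classes = list(le.classes_)

# Recover the original strings from the integer codes
decoded = list(le.inverse_transform(encoded))

# Transforming a label that was not present during fit raises a ValueError
try:
    le.transform(['Zelda'])  # hypothetical unseen label
    unseen_ok = True
except ValueError:
    unseen_ok = False

print(classes)
print(decoded)
print(unseen_ok)
```

Because the encoder object stores the mapping, the same fitted `le` can be reused to encode new data consistently, as long as the new data contains only labels seen during fitting.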


Author: Kavi Sekhon