<h3>Ordinal Encoding for Categorical Variables</h3>

In [1]:
import pandas as pd
import numpy as np

<p>With ordinal variables, however, <a href="http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.factorize.html" rel="nofollow noreferrer"><code>pandas.factorize</code></a> must be used with caution. The integer codes should preserve the order of the categories, so that a relationship such as <code>a &gt; b &gt; c</code> still holds after encoding.</p>

<p>So if I have a categorical variable where <code>large &gt; medium &gt; small</code>, I need to make sure the codes that <code>pandas.factorize</code> assigns respect that order. By default it numbers values in order of first appearance, so in general they will not.</p>

In [2]:
# create a sample resembling the OP's unique values (random integers 0, 1, 2)
series = pd.Series(np.random.randint(0,3,100))
print(series)

0     1
1     0
2     0
3     1
4     1
     ..
95    2
96    2
97    1
98    0
99    0
Length: 100, dtype: int32


In [3]:
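# turn the integer sample into size labels
mapper = {0:'small',1:'medium',2:'large'}
ordinal_variable = series.replace(mapper)

# factorize numbers the labels by first appearance, not by size
print(pd.factorize(ordinal_variable))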

(array([0, 1, 1, 0, 0, 1, 2, 1, 0, 1, 1, 0, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2,
       2, 0, 2, 2, 1, 2, 0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 2, 2, 1, 1, 0, 0,
       1, 1, 1, 1, 0, 0, 2, 1, 0, 1, 2, 0, 2, 2, 0, 1, 1, 0, 1, 1, 0, 1,
       0, 2, 1, 1, 1, 0, 2, 1, 2, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 2, 2, 2,
       1, 1, 1, 2, 2, 1, 0, 2, 2, 0, 1, 1], dtype=int64), Index(['medium', 'small', 'large'], dtype='object'))
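
<p>In the output above, <code>medium</code> was assigned 0, <code>small</code> 1 and <code>large</code> 2, so the intended order is lost. One possible way to keep it is to declare the order explicitly with an ordered <code>pd.Categorical</code> and read the integer codes from it. The snippet below is a minimal sketch of that idea; the names <code>sizes</code>, <code>size_order</code>, <code>ordered</code> and <code>codes</code> are illustrative only, and a small fixed sample is used instead of the random series so the result is reproducible.</p>

```python
import pandas as pd

# Small fixed sample (illustrative; stands in for the random series above)
sizes = pd.Series(["medium", "small", "large", "small", "medium"])

# Declare the intended order explicitly: small < medium < large
size_order = ["small", "medium", "large"]
ordered = pd.Categorical(sizes, categories=size_order, ordered=True)

# .codes gives each value's position in size_order,
# so the integer codes follow the declared order
codes = pd.Series(ordered.codes, index=sizes.index)
print(codes.tolist())

# For comparison: factorize numbers values by first appearance
print(pd.factorize(sizes)[0].tolist())
```

<p>With this sample the ordered codes should come out as <code>small</code> = 0, <code>medium</code> = 1 and <code>large</code> = 2, whereas <code>pd.factorize</code> gives <code>medium</code> = 0 because it appears first.</p>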
