## Ordinal numbering encoding

**Ordinal categorical variables**

Categorical variables whose categories can be meaningfully ordered are called ordinal. For example:

- A student's grade in an exam (A, B, C or Fail).
- Days of the week, which can be treated as ordinal with Monday = 1 and Sunday = 7.
- Educational level, with the categories Elementary school, High school, College graduate and PhD ranked from 1 to 4.

When a categorical variable is ordinal, the most straightforward approach is to replace each label with an integer that reflects its position in the order.
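As a minimal sketch of this replacement, the educational-level example from the list above can be encoded with a plain dictionary and `Series.map`. The sample values and the names `education` and `education_map` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: education levels for five people
education = pd.Series(
    ["High school", "PhD", "Elementary school", "College graduate", "High school"]
)

# Explicit mapping from each label to its rank (1 = lowest level)
education_map = {
    "Elementary school": 1,
    "High school": 2,
    "College graduate": 3,
    "PhD": 4,
}

# Replace every label with its ordinal number
education_ordinal = education.map(education_map)
print(education_ordinal.tolist())  # [2, 4, 1, 3, 2]
```

Any label missing from the dictionary would come out as NaN, so it is worth checking the result for missing values after mapping.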

### Advantages

- Keeps the semantic information of the variable (human-readable content)
- Straightforward

### Disadvantage

- Does not add information that is valuable for machine learning; it only restates the existing order as numbers

I will simulate some data below to demonstrate the technique.

In [1]:
import pandas as pd
import datetime

In [2]:
# create a variable with dates, and from that extract the weekday
# build a list of 30 consecutive dates, counting back one day at a time from today,
# and then transform it into a dataframe

base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(0, 30)]
df = pd.DataFrame(date_list)
df.columns = ['day']
df

Unnamed: 0,day
0,2021-07-15 14:31:23.478240
1,2021-07-14 14:31:23.478240
2,2021-07-13 14:31:23.478240
3,2021-07-12 14:31:23.478240
4,2021-07-11 14:31:23.478240
5,2021-07-10 14:31:23.478240
6,2021-07-09 14:31:23.478240
7,2021-07-08 14:31:23.478240
8,2021-07-07 14:31:23.478240
9,2021-07-06 14:31:23.478240


In [9]:
# list the attributes and methods available through the .dt accessor
dir(df['day'].dt)

['__annotations__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_accessors',
 '_add_delegate_accessors',
 '_constructor',
 '_delegate_method',
 '_delegate_property_get',
 '_delegate_property_set',
 '_deprecations',
 '_dir_additions',
 '_dir_deletions',
 '_freeze',
 '_get_values',
 '_reset_cache',
 'ceil',
 'date',
 'day',
 'day_name',
 'dayofweek',
 'dayofyear',
 'days_in_month',
 'daysinmonth',
 'floor',
 'freq',
 'hour',
 'is_leap_year',
 'is_month_end',
 'is_month_start',
 'is_quarter_end',
 'is_quarter_start',
 'is_year_end',
 'is_year_start',
 'isocalendar',
 'microsecond',
 'minute',
 'month',
 'month_name',
 'nanosecond',
 'normalize',
 'quarter',
 'round',

In [11]:
# extract the week day name

df['day_of_week'] = df['day'].dt.day_name()
df.head()

Unnamed: 0,day,day_of_week
0,2021-07-15 14:31:23.478240,Thursday
1,2021-07-14 14:31:23.478240,Wednesday
2,2021-07-13 14:31:23.478240,Tuesday
3,2021-07-12 14:31:23.478240,Monday
4,2021-07-11 14:31:23.478240,Sunday


In [12]:
# encode the categorical variable by replacing each label with its ordinal number

weekday_map = {
    'Monday': 1,
    'Tuesday': 2,
    'Wednesday': 3,
    'Thursday': 4,
    'Friday': 5,
    'Saturday': 6,
    'Sunday': 7,
}

df['day_ordinal'] = df['day_of_week'].map(weekday_map)
df.head(10)

Unnamed: 0,day,day_of_week,day_ordinal
0,2021-07-15 14:31:23.478240,Thursday,4
1,2021-07-14 14:31:23.478240,Wednesday,3
2,2021-07-13 14:31:23.478240,Tuesday,2
3,2021-07-12 14:31:23.478240,Monday,1
4,2021-07-11 14:31:23.478240,Sunday,7
5,2021-07-10 14:31:23.478240,Saturday,6
6,2021-07-09 14:31:23.478240,Friday,5
7,2021-07-08 14:31:23.478240,Thursday,4
8,2021-07-07 14:31:23.478240,Wednesday,3
9,2021-07-06 14:31:23.478240,Tuesday,2
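One way to sanity-check the dictionary mapping is to compare it with pandas' own weekday number. `dt.dayofweek` counts from Monday = 0 to Sunday = 6, so adding 1 should reproduce the Monday = 1 to Sunday = 7 scheme used here. The sketch below uses a fixed range of dates (an assumption, chosen so the result is reproducible) instead of today's date.

```python
import pandas as pd

# Fixed dates for reproducibility; 2021-07-09 was a Friday
dates = pd.Series(pd.date_range("2021-07-09", periods=7, freq="D"))

weekday_map = {
    "Monday": 1,
    "Tuesday": 2,
    "Wednesday": 3,
    "Thursday": 4,
    "Friday": 5,
    "Saturday": 6,
    "Sunday": 7,
}

# Encoding via the day name and the dictionary, as in the cell above
mapped = dates.dt.day_name().map(weekday_map)

# Encoding taken directly from the datetime: Monday = 0, so shift by 1
direct = dates.dt.dayofweek + 1

print(mapped.tolist())            # [5, 6, 7, 1, 2, 3, 4]
print((mapped == direct).all())   # True
```

For a variable that starts out as a datetime, the direct route skips the intermediate string column; the dictionary approach remains the general one for labels that have no built-in numeric form.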


We can now use the variable day_ordinal in sklearn to build machine learning models.
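The notebook does not show the sklearn step. As one possible sketch, scikit-learn's `OrdinalEncoder` can perform the same encoding inside a preprocessing pipeline when given the category order explicitly. The small `days` dataframe is made up for illustration, and note that the encoder numbers categories from 0 rather than 1.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample with the same kind of column as day_of_week above
days = pd.DataFrame({"day_of_week": ["Thursday", "Monday", "Sunday"]})

# The order must be passed explicitly; otherwise categories are sorted alphabetically
order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

encoder = OrdinalEncoder(categories=[order])

# fit_transform expects a 2D input and returns a 2D float array
codes = encoder.fit_transform(days[["day_of_week"]])
print(codes.ravel())  # [3. 0. 6.]
```

The codes differ from the dictionary mapping by a constant offset of 1, which does not change the order.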