## Options for encoding an ordinal response

- Naive categorical (one-hot) encoding, ignoring the order
- Integer encoding (really regression, not classification)
- [Ordinal crossentropy loss for Keras](https://github.com/JHart96/keras_ordinal_categorical_crossentropy)
- [Ordinal regression for TF](https://github.com/gspell/TF-OrdinalRegression)
- "Cumulative" encoding [Cheng et al.]
- Split into K-1 binary classification problems [Frank and Hall]
  - not sure whether this is efficient with neural nets, since it means training K-1 separate models
  - Cheng et al.'s encoding does this in some sense, within a single network
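To make the cumulative and K-1 binary options concrete, here is a small NumPy sketch (not code from either paper or from the linked repos) contrasting one-hot and cumulative targets for K = 5 classes. It uses a K-1 column variant in which column k is 1 when the label exceeds k. As I understand it, Cheng et al. use one output per class with the first k set to 1 for class k; that first column is always 1, and dropping it gives this version. Each remaining column is then exactly the target of one of Frank and Hall's K-1 binary problems ("is y > k?"). The helper names are hypothetical.

```python
import numpy as np


def one_hot(y, num_classes):
    """Naive categorical encoding: one indicator column per class, order ignored."""
    y = np.asarray(y).ravel()
    out = np.zeros((y.size, num_classes), dtype=np.float32)
    out[np.arange(y.size), y] = 1.0
    return out


def cumulative(y, num_classes):
    """Cumulative encoding with K-1 columns: column k is 1 when y > k."""
    y = np.asarray(y).ravel()
    thresholds = np.arange(num_classes - 1)
    return (y[:, None] > thresholds[None, :]).astype(np.float32)


def decode_cumulative(probs, cutoff=0.5):
    """Predicted class = number of thresholds the label is judged to exceed."""
    return (np.asarray(probs) > cutoff).sum(axis=1)


y = np.array([0, 2, 4])
print(one_hot(y, 5))
print(cumulative(y, 5))
# [[0. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 1.]]
print(decode_cumulative(cumulative(y, 5)))  # [0 2 4]
```

In a network, cumulative targets would presumably pair with K-1 sigmoid outputs and binary cross-entropy, so one network learns all K-1 binary problems at once; that wiring is an assumption, not something tried in this notebook.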

In [1]:
# set up
# if installed, keras uses tf as backend
import keras
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

# Import the ordinal crossentropy loss function from a local clone of
# https://github.com/JHart96/keras_ordinal_categorical_crossentropy
import sys
sys.path.insert(0, "./keras_ordinal_categorical_crossentropy")
import ordinal_categorical_crossentropy as OCC

Using TensorFlow backend.


Example from the Keras docs, using random dummy data:
https://keras.io/getting-started/sequential-model-guide/#examples

In [2]:
# Generate dummy data
x_train = np.random.random((1000, 20))
y_int_train = np.random.randint(10, size=(1000, 1))
y_train = keras.utils.to_categorical(y_int_train, num_classes=10)
x_test = np.random.random((100, 20))
y_int_test = np.random.randint(10, size=(100, 1))
y_test = keras.utils.to_categorical(y_int_test, num_classes=10)

# define classification model
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

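# set SGD as optimizer; this one instance is shared by every model below
# (newer Keras versions name the `lr` argument `learning_rate`)
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)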

In [3]:
# naive multi-class classification
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=20, batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
score

[2.3121721744537354, 0.07000000029802322]

In [4]:
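# Classification with ordinal crossentropy loss
# note: this recompiles the model already trained in the previous cell,
# so training resumes from those weights rather than from scratch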
model.compile(loss=OCC.loss,
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=20, batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
score

[3.024740695953369, 0.09000000357627869]
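The ordinal loss value here (3.02 vs 2.31 for plain cross-entropy) is not on the same scale as categorical cross-entropy. As far as I can tell from the linked repository, `OCC.loss` multiplies categorical cross-entropy by a per-sample weight 1 + |argmax(y_true) - argmax(y_pred)| / (K - 1), so a prediction whose mode is far from the true class costs up to twice as much. The NumPy sketch below restates that assumed formula; it is not the repository's code, and the function names are hypothetical. Because argmax has no gradient, the weight should act as a constant rescaling of each sample's cross-entropy gradient rather than pulling probability toward neighboring classes.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy for one-hot targets and probability predictions."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred)).sum(axis=1)


def ordinal_weighted_crossentropy(y_true, y_pred):
    """Cross-entropy scaled by 1 + |true class - predicted class| / (K - 1)."""
    num_classes = y_pred.shape[1]
    distance = np.abs(y_true.argmax(axis=1) - y_pred.argmax(axis=1))
    weight = 1.0 + distance / (num_classes - 1)
    return weight * categorical_crossentropy(y_true, y_pred)


# True class 0 of 5; both predictions give it probability 0.2,
# but the first puts its mode on class 1 and the second on class 4.
y_true = np.array([[1, 0, 0, 0, 0], [1, 0, 0, 0, 0]], dtype=float)
y_pred = np.array([[0.2, 0.5, 0.1, 0.1, 0.1],
                   [0.2, 0.1, 0.1, 0.1, 0.5]])
print(categorical_crossentropy(y_true, y_pred))       # both equal -log(0.2)
print(ordinal_weighted_crossentropy(y_true, y_pred))  # weighted by 1.25 and 2.0
```

Plain cross-entropy cannot tell these two mistakes apart; the weighted version penalizes the far one more, which is the whole point of the ordinal loss.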

In [5]:
# Integer encoding
model_int = Sequential()
model_int.add(Dense(64, activation='relu', input_dim=20))
model_int.add(Dropout(0.5))
model_int.add(Dense(64, activation='relu'))
model_int.add(Dropout(0.5))
model_int.add(Dense(1, activation='relu'))

model_int.compile(
    # sparse_categorical_crossentropy also accepts integer targets, but it
    # needs a 10-unit softmax output rather than this single regression unit
    loss='mean_squared_error',
    optimizer=sgd,
    metrics=['accuracy', 'mse'])

model_int.fit(x_train, y_int_train, epochs=20, batch_size=128)
score = model_int.evaluate(x_test, y_int_test, batch_size=128)
score

[8.508237838745117, 0.10000000149011612, 8.508237838745117]
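Two notes on the regression setup, as a NumPy sketch with hypothetical helper names. First, the test MSE of about 8.51 is close to 8.25, the MSE of always predicting the mean label 4.5 when labels are uniform on 0..9, which is roughly what a model should settle on when the features carry no information. Second, getting class predictions out of the single continuous output means rounding it and clipping it to the valid label range.

```python
import numpy as np


def round_to_class(pred, num_classes):
    """Map continuous regression outputs to the nearest valid class label."""
    pred = np.asarray(pred, dtype=float).ravel()
    return np.clip(np.rint(pred), 0, num_classes - 1).astype(int)


# MSE of a constant prediction equal to the mean of labels uniform on 0..9
labels = np.arange(10)
baseline_mse = np.mean((labels - labels.mean()) ** 2)
print(baseline_mse)  # 8.25

print(round_to_class([-0.3, 2.6, 11.2], num_classes=10))  # [0 3 9]
```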

In [7]:
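# let's try it again with integer targets and sparse cross-entropy loss
model_sparse = Sequential()
model_sparse.add(Dense(64, activation='relu', input_dim=20))
model_sparse.add(Dropout(0.5))
model_sparse.add(Dense(64, activation='relu'))
model_sparse.add(Dropout(0.5))
# NOTE: sparse_categorical_crossentropy expects class probabilities, so this
# layer should use activation='softmax'; the unnormalized 'relu' outputs
# likely explain the very large loss reported below
model_sparse.add(Dense(10, activation='relu'))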

model_sparse.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=sgd,
    metrics=['accuracy'])

model_sparse.fit(x_train, y_int_train, epochs=20, batch_size=128)
score = model_sparse.evaluate(x_test, y_int_test, batch_size=128)
score

[10.895971298217773, 0.12999999523162842]
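Sparse categorical cross-entropy is the same loss as categorical cross-entropy, only looked up by integer label instead of multiplied by a one-hot vector, so the large loss here points at the output layer rather than at the encoding. The NumPy sketch below (hypothetical helper names) assumes that non-logit predictions are clipped to [eps, 1 - eps] with eps = 1e-7 before the log, which is my understanding of Keras 2.x. Under that assumption, a ReLU output of exactly 0 on the true class costs -log(1e-7) ≈ 16.1 for that sample, whereas softmax keeps every class probability above 0.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant (the usual Keras epsilon)


def sparse_categorical_crossentropy(y_int, outputs):
    """-log of the clipped output at each sample's integer label."""
    outputs = np.clip(np.asarray(outputs, dtype=float), EPS, 1.0 - EPS)
    return -np.log(outputs[np.arange(len(y_int)), y_int])


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


scores = np.array([[2.0, 0.5, -1.0]])  # pre-activation outputs for one sample
y_int = np.array([2])                   # its true class

relu_loss = sparse_categorical_crossentropy(y_int, np.maximum(scores, 0.0))
softmax_loss = sparse_categorical_crossentropy(y_int, softmax(scores))
print(relu_loss)     # about 16.1 = -log(1e-7): the relu output is 0 on the true class
print(softmax_loss)  # about 3.24

# With probabilities, the sparse loss equals one-hot categorical cross-entropy
one_hot = np.eye(3)[y_int]
print(-(one_hot * np.log(softmax(scores))).sum(axis=1))  # same as softmax_loss
```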

Remarks:

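- Accuracy is low across all four runs (0.07 to 0.13), and it has to be: the dummy features are random and independent of the labels (duh!), so no model can beat chance, which is 10% for 10 classes
- Sparse integer encoding scores highest (0.13), but with only 100 test samples that gap from chance is noise, not evidence that it works well
- Likewise, the categorical cross-entropy loss of 2.31 is about ln(10) ≈ 2.30, the loss of a uniform guess over 10 classes; tuning the architecture (ie, layers) or optimizer can't change that on this data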
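To put a number on "noise": with 10 equally likely classes, a guess is right with probability p = 0.1, and on n = 100 test samples the accuracy's standard error is sqrt(p(1 - p)/n) = 0.03. The four accuracies (0.07 to 0.13) all sit within about one standard error of chance. A quick check, including a simulation of random guesses against labels drawn like the dummy data (the seed is arbitrary):

```python
import numpy as np

p, n = 0.1, 100
standard_error = np.sqrt(p * (1 - p) / n)
print(round(standard_error, 3))  # 0.03

# Simulate many 100-sample test sets scored against random guesses
rng = np.random.default_rng(0)
y_true = rng.integers(10, size=(10_000, n))
y_guess = rng.integers(10, size=(10_000, n))
accuracies = (y_true == y_guess).mean(axis=1)
print(accuracies.mean(), accuracies.std())  # close to 0.10 and 0.03
```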

[TODO] Try the last three options (TF ordinal regression, cumulative encoding, K-1 binary problems)...

## Applying it to our data

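Let's apply the same four setups to a toy subset of our parcel data (`../data/toy-parcels.csv`), whose response `recovery` is encoded as 5 ordered classes.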

In [11]:
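parcels = pd.read_csv('../data/toy-parcels.csv')
print(parcels.shape)  # a bare expression mid-cell would not be displayed

# features: every column except the ordinal response 'recovery'
X = np.array(parcels.drop(['recovery'], axis=1))
Y = parcels['recovery']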

In [15]:
# create train / test split
from sklearn import model_selection
train_x, test_x, train_y, test_y = model_selection.train_test_split(
    X, Y, test_size=0.1, random_state=0)

# one-hot encode the response (to_categorical needs integer labels below num_classes)
train_y_cat = keras.utils.to_categorical(train_y, num_classes=5)
test_y_cat = keras.utils.to_categorical(test_y, num_classes=5)

In [17]:
# cross-entropy loss
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 5-dimensional feature vectors.
model.add(Dense(64, activation='relu', input_dim=5))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(train_x, train_y_cat, epochs=20, batch_size=128)
score = model.evaluate(test_x, test_y_cat, batch_size=128)
score

[1.2746046781539917, 0.47999998927116394]

In [18]:
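# ordinal crossentropy loss
# note: this recompiles the model trained in the previous cell,
# so training resumes from those weights rather than from scratch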
model.compile(loss=OCC.loss,
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(train_x, train_y_cat, epochs=20, batch_size=128)
score = model.evaluate(test_x, test_y_cat, batch_size=128)
score

[1.9748244285583496, 0.30000001192092896]

In [20]:
# integer encoding
model_int = Sequential()
model_int.add(Dense(64, activation='relu', input_dim=5))
model_int.add(Dropout(0.5))
model_int.add(Dense(64, activation='relu'))
model_int.add(Dropout(0.5))
model_int.add(Dense(1, activation='relu'))

model_int.compile(
    # sparse_categorical_crossentropy also accepts integer targets, but it
    # needs a 5-unit softmax output rather than this single regression unit
    loss='mean_squared_error',
    optimizer=sgd,
    metrics=['accuracy', 'mse'])

model_int.fit(train_x, train_y, epochs=20, batch_size=128)
score = model_int.evaluate(test_x, test_y, batch_size=128)
score

[5.900000095367432, 0.14000000059604645, 5.900000095367432]

In [21]:
# sparse encoding
model_sparse = Sequential()
model_sparse.add(Dense(64, activation='relu', input_dim=5))
model_sparse.add(Dropout(0.5))
model_sparse.add(Dense(64, activation='relu'))
model_sparse.add(Dropout(0.5))
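# NOTE: sparse_categorical_crossentropy expects class probabilities, so this
# layer should use activation='softmax'; the unnormalized 'relu' outputs
# likely explain the very large loss reported below
model_sparse.add(Dense(5, activation='relu'))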

model_sparse.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=sgd,
    metrics=['accuracy'])

model_sparse.fit(train_x, train_y, epochs=20, batch_size=128)
score = model_sparse.evaluate(test_x, test_y, batch_size=128)
score

[8.381409645080566, 0.47999998927116394]

## Takeaways

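- Sparse categorical cross-entropy matches categorical cross-entropy on accuracy (0.48 each), though its loss is much higher (8.38 vs 1.27), likely because its output layer uses ReLU rather than softmax
- Ordinal cross-entropy does worse than categorical cross-entropy (0.30 vs 0.48 accuracy), even though it started from weights already trained with categorical cross-entropy
- Integer encoding (MSE regression) does worst (0.14 accuracy)
- These are single 20-epoch runs on one train/test split of a toy subset, so the ranking is tentative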
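Accuracy alone hides two things that matter for an ordinal response: how often a trivial model would already be right, and how far off the wrong predictions are. Since the categorical and sparse models both land at exactly 0.48, that number is worth comparing against the share of the most common `recovery` level. A NumPy sketch of both checks, with hypothetical helper names and made-up labels for illustration only:

```python
import numpy as np


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    values, counts = np.unique(np.asarray(y_train), return_counts=True)
    majority = values[counts.argmax()]
    return float(np.mean(np.asarray(y_test) == majority))


def mean_absolute_class_error(y_true, y_pred):
    """Average distance in ordinal levels between true and predicted class."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Made-up labels on 5 ordered levels, for illustration only
y_train = np.array([0, 1, 1, 1, 2, 3, 4, 1])
y_test = np.array([1, 1, 2, 0, 4])
y_pred_near = np.array([1, 1, 1, 1, 3])  # wrong predictions are one level off
y_pred_far = np.array([1, 1, 4, 4, 0])   # same accuracy, errors far off

print(majority_baseline_accuracy(y_train, y_test))     # 0.4
print(mean_absolute_class_error(y_test, y_pred_near))  # 0.6
print(mean_absolute_class_error(y_test, y_pred_far))   # 2.0
```

On the real data, `train_y` and `test_y` from the split above would replace the made-up arrays, and class predictions for a Keras model could come from `model.predict(test_x).argmax(axis=1)`.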