# Options for encoding ordinal response

- Naive categorical encoding (ignore order)
- Integer encoding (really regression, not classification)
- [Ordinal crossentropy loss for Keras](https://github.com/JHart96/keras_ordinal_categorical_crossentropy)
- [Ordinal regression for TF](https://github.com/gspell/TF-OrdinalRegression)
- "Cumulative" encoding [Cheng et al.]
- Split into K-1 binary classification problems [Frank and Hall] 
     - not sure if it's efficient with neural nets
     - Cheng et al.'s encoding does this in some sense, within a single network
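
To make the last two options concrete, here is a minimal NumPy sketch of one common way to write a cumulative encoding: K-1 binary targets, where target j answers "is y > j?". These are also the K-1 binary problems of the Frank and Hall decomposition. The exact target layout in Cheng et al. may differ (for example in vector length), and the function names below are hypothetical helpers, not part of any library.

```python
import numpy as np

def cumulative_encode(y, num_classes):
    """Encode integer labels 0..K-1 as K-1 binary targets, target j = [y > j]."""
    y = np.asarray(y).reshape(-1, 1)           # shape (N, 1)
    thresholds = np.arange(num_classes - 1)    # 0, 1, ..., K-2
    return (y > thresholds).astype(np.float32) # shape (N, K-1)

def cumulative_decode(probs, threshold=0.5):
    """Simplified decoding: count how many outputs exceed the threshold."""
    return (np.asarray(probs) > threshold).sum(axis=1)

def class_probs_from_cumulative(p_gt):
    """Turn estimates of P(y > j) into per-class probabilities by differencing."""
    p_gt = np.asarray(p_gt, dtype=float)
    n = p_gt.shape[0]
    upper = np.hstack([np.ones((n, 1)), p_gt])   # P(y > j-1), with P(y > -1) = 1
    lower = np.hstack([p_gt, np.zeros((n, 1))])  # P(y > j), with P(y > K-1) = 0
    return upper - lower

print(cumulative_encode([0, 1, 2], num_classes=3))
```

With a network, the K-1 targets would typically be paired with K-1 sigmoid outputs and a binary cross-entropy loss, which is how a single model can cover all the binary problems at once.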

In [1]:
# set up
# if installed, keras uses tf as backend
import keras
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

# Import ordinal crossentropy loss function
import sys
sys.path.insert(0, "./keras_ordinal_categorical_crossentropy")
import ordinal_categorical_crossentropy as OCC

Using TensorFlow backend.
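
For intuition about what an ordinal cross-entropy loss can look like, here is a NumPy sketch that scales the usual categorical cross-entropy by how far the predicted class is from the true class. This is an assumption about the general idea, not a copy of the imported repository's code, so check the repo for the actual definition. The function name is hypothetical.

```python
import numpy as np

def ordinal_weighted_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy scaled by (1 + normalized class distance), per example."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    num_classes = y_pred.shape[1]
    # distance between true and predicted class, scaled to [0, 1]
    distance = np.abs(y_true.argmax(axis=1) - y_pred.argmax(axis=1))
    weights = distance / (num_classes - 1)
    cross_entropy = -(y_true * np.log(y_pred)).sum(axis=1)
    return (1.0 + weights) * cross_entropy

y_true = np.array([[1, 0, 0], [1, 0, 0]])
y_pred = np.array([[0.2, 0.7, 0.1],   # wrong by one class
                   [0.2, 0.1, 0.7]])  # wrong by two classes
print(ordinal_weighted_crossentropy(y_true, y_pred))
```

Both rows put the same probability on the true class, so plain cross-entropy would score them equally; the weighted version penalizes the second row more.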


Example from Keras docs
https://keras.io/getting-started/sequential-model-guide/#examples

In [2]:
# Generate dummy data
np.random.seed(123)
x_train = np.random.random((1000, 20))
y_int_train = np.random.randint(10, size=(1000, 1))
y_train = keras.utils.to_categorical(y_int_train, num_classes=10)
x_test = np.random.random((100, 20))
y_int_test = np.random.randint(10, size=(100, 1))
y_test = keras.utils.to_categorical(y_int_test, num_classes=10)

# define classification model
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# set SGD as optimizer
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)

In [3]:
# naive multi-class classification
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=20, batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
score

[2.2891061305999756, 0.11999999731779099]

In [4]:
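# Classification with ordinal crossentropy loss
# NOTE: re-compiling does not reset the weights, so this continues training
# the model fitted in the previous cell rather than starting from scratch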
model.compile(loss=OCC.loss,
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=20, batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
score

[3.2259721755981445, 0.14000000059604645]

In [5]:
# Integer encoding
model_int = Sequential()
model_int.add(Dense(64, activation='relu', input_dim=20))
model_int.add(Dropout(0.5))
model_int.add(Dense(64, activation='relu'))
model_int.add(Dropout(0.5))
model_int.add(Dense(1, activation='relu'))

model_int.compile(
    ## or try sparse_categorical_crossentropy for integer targets
    loss='mean_squared_error',
    optimizer=sgd,
    metrics=['accuracy', 'mse'])

model_int.fit(x_train, y_int_train, epochs=20, batch_size=128)
score = model_int.evaluate(x_test, y_int_test, batch_size=128)
score

[9.062883377075195, 0.09000000357627869, 9.062883377075195]
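
With integer encoding the network outputs a real number, so a class label has to be recovered afterwards. One simple option, sketched here as an assumption rather than something this notebook does, is to round to the nearest integer and clip to the valid range. `regression_to_class` is a hypothetical helper.

```python
import numpy as np

def regression_to_class(y_pred, num_classes):
    """Map real-valued predictions to integer class labels 0..num_classes-1."""
    rounded = np.rint(np.asarray(y_pred, dtype=float))  # nearest integer
    return np.clip(rounded, 0, num_classes - 1).astype(int).ravel()

# predictions shaped like the (N, 1) output of a Dense(1) layer
print(regression_to_class([[-0.3], [0.4], [1.6], [9.7]], num_classes=10))
```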

In [6]:
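# let's try it again but with sparse cross-entropy loss
# (integer targets, 10 output units)
model_sparse = Sequential()
model_sparse.add(Dense(64, activation='relu', input_dim=20))
model_sparse.add(Dropout(0.5))
model_sparse.add(Dense(64, activation='relu'))
model_sparse.add(Dropout(0.5))
# NOTE: this output layer uses 'relu'; sparse_categorical_crossentropy expects
# class probabilities, for which 'softmax' is the usual choice
model_sparse.add(Dense(10, activation='relu'))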

model_sparse.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=sgd,
    metrics=['accuracy'])

model_sparse.fit(x_train, y_int_train, epochs=20, batch_size=128)
score = model_sparse.evaluate(x_test, y_int_test, batch_size=128)
score

[2.3025853633880615, 0.14000000059604645]

Remarks:

- Accuracy is pretty low across all four models (9% to 14%, against a chance level of about 10% for 10 classes); sparse integer encoding ties for the best at 14%
- Could be that architecture (i.e., layers) and optimizer need tuning
- Oh yeah, also because the labels are drawn independently of the features, so there is nothing to learn (duh!)
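
To put the numbers above in context, the chance-level baselines for 10 uniformly random labels can be computed directly. A short sketch (variable names are illustrative):

```python
import numpy as np

num_classes = 10

# accuracy from guessing one of 10 equally likely classes
chance_accuracy = 1.0 / num_classes              # 0.1

# cross-entropy of a model that predicts 1/10 for every class
uniform_cross_entropy = np.log(num_classes)      # about 2.3026

# MSE of always predicting the mean label (4.5) for labels uniform on 0..9
labels = np.arange(num_classes)
mean_predictor_mse = np.mean((labels - labels.mean()) ** 2)  # 8.25

print(chance_accuracy, uniform_cross_entropy, mean_predictor_mse)
```

The cross-entropy losses reported above (2.29 and 2.30) sit at the uniform-prediction value, and the integer model's MSE of 9.06 is slightly worse than always predicting the mean.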

[TODO] Last three options...

### Takeaways

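- Sparse categorical cross-entropy performs about as well as categorical cross-entropy (loss 2.30 vs 2.29, accuracy 14% vs 12%)
- Ordinal cross-entropy ends with a higher loss than categorical cross-entropy (3.23 vs 2.29)! The two losses are on different scales, though, and its accuracy is no lower (14% vs 12%)
- Integer encoding performs the worst (9% accuracy)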

## Simulated ordinal response

Let's generate toy data for which we actually know the 
data generating process (DGP). This will provide a better
benchmark for the different approaches.

In [104]:
# 1. set parameters
K = 3 # response categories
N = 10000 # number of examples
P = 3 # number of features
# thresholds:
mu0 = 0
mu1 = 3.14
# set DGP parameters
b0 = 1
b1 = 2
b2 = -2
b3 = 1

In [105]:
# 2. generate features
# TODO: auto-generate x_mean and x_cov
x_mean = (1, 2, 0.5) 
x_cov = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
X = np.random.multivariate_normal(x_mean, x_cov, N) # dim: NxP
# X.shape

In [106]:
# 3. generate normal error
u = np.random.normal(size=(N,1)) # dim: Nx1
# u.shape

In [107]:
# 4. generate latent response
B = np.array([[b1], [b2], [b3]]) # dim: Px1
y_latent = b0 + X.dot(B) + u # dim: Nx1
# y_latent.shape

In [108]:
# 5. generate ordinal response
y = np.digitize(y_latent, [mu0, mu1], right=True)
# y.shape
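
Steps 1 to 5 amount to an ordered probit model. Written out, with the symbols $\eta_i$ (the linear predictor) and $\Phi$ (the standard normal CDF) introduced here as notation that does not appear in the code:

```latex
\begin{aligned}
y_i^* &= b_0 + x_i^\top B + u_i, \qquad u_i \sim \mathcal{N}(0, 1), \\
y_i &= \begin{cases}
  0 & \text{if } y_i^* \le \mu_0, \\
  1 & \text{if } \mu_0 < y_i^* \le \mu_1, \\
  2 & \text{if } y_i^* > \mu_1,
\end{cases} \\
P(y_i \le k \mid x_i) &= \Phi(\mu_k - \eta_i), \qquad \eta_i = b_0 + x_i^\top B, \quad k \in \{0, 1\}.
\end{aligned}
```

The closed intervals on the right match `right=True` in `np.digitize`, which assigns a value equal to a threshold to the lower category.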

In [110]:
# create train / test split
from sklearn import model_selection
train_x, test_x, train_y, test_y = model_selection.train_test_split(
    X, y, test_size=0.1, random_state=0)

# encode response as categorical
train_y_cat = keras.utils.to_categorical(train_y, num_classes=K)
test_y_cat = keras.utils.to_categorical(test_y, num_classes=K)
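
Because the DGP is known, the class probabilities can be computed exactly, which gives an accuracy that no model should be expected to beat on average. A minimal sketch, assuming the parameter values from the cells above; `norm_cdf`, `ordered_probit_probs`, and the `*_demo` names are hypothetical, and the data are regenerated with a fixed seed so the block runs on its own.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def ordered_probit_probs(X, b0, B, cutpoints):
    """Exact P(y = k | x) for the ordered probit DGP, shape (N, len(cutpoints) + 1)."""
    eta = b0 + X.dot(B).ravel()                                    # linear predictor
    cdf = np.column_stack([norm_cdf(c - eta) for c in cutpoints])  # P(y <= k | x)
    n = X.shape[0]
    upper = np.column_stack([cdf, np.ones(n)])
    lower = np.column_stack([np.zeros(n), cdf])
    return upper - lower

# regenerate data from the same DGP (seed chosen arbitrarily)
rng = np.random.RandomState(0)
n_demo = 10000
X_demo = rng.multivariate_normal((1, 2, 0.5), np.eye(3), n_demo)
B_demo = np.array([[2.0], [-2.0], [1.0]])
y_latent_demo = 1.0 + X_demo.dot(B_demo) + rng.normal(size=(n_demo, 1))
y_demo = np.digitize(y_latent_demo, [0.0, 3.14], right=True).ravel()

# predict the most probable class under the true model
probs = ordered_probit_probs(X_demo, 1.0, B_demo, [0.0, 3.14])
oracle_acc = np.mean(probs.argmax(axis=1) == y_demo)
print(oracle_acc)
```

Comparing each network's test accuracy against `oracle_acc` shows how much of the remaining error is irreducible noise rather than a weakness of the encoding.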

In [113]:
# cross-entropy loss
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, P-dimensional vectors (P = 3).
model.add(Dense(64, activation='relu', input_dim=P))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(K, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(train_x, train_y_cat, epochs=20, batch_size=128)
score = model.evaluate(test_x, test_y_cat, batch_size=128)
score

[0.3165572345256805, 0.8679999709129333]

In [114]:
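# ordinal crossentropy loss
# NOTE: re-compiling does not reset the weights, so this continues training
# the model fitted in the previous cell rather than starting from scratch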
model.compile(loss=OCC.loss,
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(train_x, train_y_cat, epochs=20, batch_size=128)
score = model.evaluate(test_x, test_y_cat, batch_size=128)
score

[0.41264120388031006, 0.8679999709129333]

In [115]:
# integer encoding
model_int = Sequential()
model_int.add(Dense(64, activation='relu', input_dim=P))
model_int.add(Dropout(0.5))
model_int.add(Dense(64, activation='relu'))
model_int.add(Dropout(0.5))
model_int.add(Dense(1, activation='relu'))

model_int.compile(
    ## or try sparse_categorical_crossentropy for integer targets
    loss='mean_squared_error',
    optimizer=sgd,
    metrics=['accuracy', 'mse'])

model_int.fit(train_x, train_y, epochs=20, batch_size=128)
score = model_int.evaluate(test_x, test_y, batch_size=128)
score

[0.11300325906276702, 0.843999981880188, 0.11300323903560638]

In [116]:
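# sparse encoding (integer targets, K output units)
model_sparse = Sequential()
model_sparse.add(Dense(64, activation='relu', input_dim=P))
model_sparse.add(Dropout(0.5))
model_sparse.add(Dense(64, activation='relu'))
model_sparse.add(Dropout(0.5))
# NOTE: this output layer uses 'relu'; sparse_categorical_crossentropy expects
# class probabilities, for which 'softmax' is the usual choice
model_sparse.add(Dense(K, activation='relu'))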

model_sparse.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=sgd,
    metrics=['accuracy'])

model_sparse.fit(train_x, train_y, epochs=20, batch_size=128)
score = model_sparse.evaluate(test_x, test_y, batch_size=128)
score

[1.098612057685852, 0.5649999976158142]

## Observations

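- Sparse integer encoding is actually the worst (56.5% accuracy, against 84% to 87% for the other three)! Its loss of 1.0986 equals ln(3), the value for uniform predictions over K = 3 classes, so this probably reflects the 'relu' output layer rather than the encoding itself
- Accuracy is identical (86.8%) for the cross-entropy and ordinal cross-entropy losses with categorical encoding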
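
Two quick checks on the sparse result, using only the DGP parameters set earlier. Under this DGP the latent response is normal with mean b0 + B·x_mean and variance B'ΣB + 1, so the class shares have a closed form. A sketch (the helper `norm_cdf` and the variable names are assumptions):

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF for a scalar."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b0, B = 1.0, np.array([2.0, -2.0, 1.0])
x_mean, x_cov = np.array([1.0, 2.0, 0.5]), np.eye(3)
mu0, mu1 = 0.0, 3.14

# marginal distribution of the latent response
latent_mean = b0 + B.dot(x_mean)                   # -0.5
latent_sd = math.sqrt(B.dot(x_cov).dot(B) + 1.0)   # sqrt(10)

# population share of each ordinal class
p0 = norm_cdf((mu0 - latent_mean) / latent_sd)
p1 = norm_cdf((mu1 - latent_mean) / latent_sd) - p0
p2 = 1.0 - p0 - p1
class_shares = np.array([p0, p1, p2])

# cross-entropy of a model that predicts 1/3 for every class
uniform_loss = math.log(3)

print(class_shares, uniform_loss)
```

The share of class 0 comes out at roughly 0.56, close to the sparse model's test accuracy of 0.565, and ln(3) ≈ 1.0986 matches its loss. Both are what a model predicting the same value for every class would produce.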

This is still using 