
OneHot Layer #3680

Closed
nhanitvn opened this issue Sep 3, 2016 · 3 comments
Comments

@nhanitvn

nhanitvn commented Sep 3, 2016

From my current modeling tasks, I find it useful to be able to encode a categorical feature either in one-hot format or in embedding format (via the Embedding layer) right in the model-construction phase, instead of creating dummy columns in advance (in the one-hot case; in the Embedding case, the input is zero-based integers). Although a Lambda layer can serve this purpose, I think a dedicated OneHot layer would be more convenient. I have already written the code for the proposed OneHot layer, which simply calls K.one_hot() internally. Feel free to share your thoughts on whether such a layer should be added to Keras. I am happy to contribute the code via a PR. Thanks.

The pseudo-code would look like this:

models = []
for feature in features:
    if is_categorical(feature):
        model = Sequential()
        if to_encode(feature) == 'one_hot':
            model.add(OneHot())
        else:
            model.add(Embedding())
        models.append(model)
    else:
        model = Sequential()
        model.add(Dense())
        models.append(model)

model = Sequential()
model.add(Merge(models, mode='concat'))
# ...more layers added...

I created a PR #3846
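For readers following along: here is a minimal, backend-free NumPy sketch of the transformation such a OneHot layer would perform internally. The function name and shapes are illustrative assumptions, not the actual PR code.

```python
import numpy as np

def one_hot_encode(indices, nb_classes):
    """Illustrative sketch of what K.one_hot computes.

    indices: integer array of shape (batch, seq_len)
    returns: float array of shape (batch, seq_len, nb_classes)
    """
    out = np.zeros(indices.shape + (nb_classes,), dtype='float32')
    # Write a 1.0 at the position given by each class index.
    np.put_along_axis(out, indices[..., None], 1.0, axis=-1)
    return out

X = np.array([[0, 2, 1]])  # one sequence of length 3
encoded = one_hot_encode(X, nb_classes=3)
# encoded has shape (1, 3, 3), with exactly one 1.0 per time step
```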

@nhanitvn
Author

Using Lambda(K.one_hot) instead, as suggested by @fchollet.

@bzamecnik
Contributor

There are a few catches when using Lambda(K.one_hot), but generally it's possible:

  • the input must be integer (uint8, int32, int64), not float32
  • you have to specify the number of classes explicitly
  • you have to specify the output shape explicitly

from keras import backend as K
from keras.layers import Input, Lambda

input_shape = (10, ) # sequences of length 10
nb_classes = 20
output_shape = (input_shape[0], nb_classes)
input = Input(shape=input_shape, dtype='uint8')
x_ohe = Lambda(K.one_hot, arguments={'nb_classes': nb_classes}, output_shape=output_shape)(input)

Try it like this:

import numpy as np
from keras.models import Model
# 5 sequences of length 10
X_classes = np.random.randint(0, 20, size=(5, 10))
assert Model(input, x_ohe).predict(X_classes).shape == (5, 10, 20)

