
OneHot Layer #3680

Closed
nhanitvn opened this issue Sep 3, 2016 · 3 comments
Comments

@nhanitvn

nhanitvn commented Sep 3, 2016

From my current modeling tasks, I find it useful to be able to encode a categorical feature either in one-hot format or in embedding format (via the Embedding layer) right in the model-construction phase, instead of creating dummy columns in advance (in the one-hot case; in the Embedding case, the input is zero-based integers). Although a Lambda layer can serve this purpose, I think a dedicated OneHot layer would be more convenient. I have already written the code for the proposed OneHot layer, which simply calls K.one_hot() internally. Feel free to share your thoughts on whether such a layer should be added to Keras. I am happy to contribute the code via a PR. Thanks.

The pseudo-code would look like this:

models = []
for feature in features:
    if is_categorical(feature):
        model = Sequential()
        if to_encode(feature) == 'one_hot':
            model.add(OneHot())
        else:
            model.add(Embedding())
        models.append(model)
    else:
        model = Sequential()
        model.add(Dense())
        models.append(model)

model = Sequential()
model.add(Merge(models, mode='concat'))
# ...more layers added...

I created a PR #3846
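For readers following along: here is a minimal, backend-free NumPy sketch of the transformation such a OneHot layer would perform internally. The function name and shapes are illustrative assumptions, not the actual PR code.

```python
import numpy as np

def one_hot_encode(indices, nb_classes):
    """Illustrative sketch of what K.one_hot computes.

    indices: integer array of shape (batch, seq_len)
    returns: float array of shape (batch, seq_len, nb_classes)
    """
    out = np.zeros(indices.shape + (nb_classes,), dtype='float32')
    # Write a 1.0 at the position given by each class index.
    np.put_along_axis(out, indices[..., None], 1.0, axis=-1)
    return out

X = np.array([[0, 2, 1]])  # one sequence of length 3
encoded = one_hot_encode(X, nb_classes=3)
# encoded has shape (1, 3, 3), with exactly one 1.0 per time step
```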

@nhanitvn
Author

Using Lambda(K.one_hot) instead, as suggested by @fchollet.

@bzamecnik
Contributor

There are a few catches when using Lambda(K.one_hot), but generally it's possible:

  • the input must be integer (uint8, int32, int64), not float32
  • you have to specify the number of classes explicitly
  • you have to specify the output shape explicitly

from keras import backend as K
from keras.layers import Input, Lambda

input_shape = (10, ) # sequences of length 10
nb_classes = 20
output_shape = (input_shape[0], nb_classes)
input = Input(shape=input_shape, dtype='uint8')
x_ohe = Lambda(K.one_hot, arguments={'nb_classes': nb_classes}, output_shape=output_shape)(input)

Try it like this:

import numpy as np
from keras.models import Model
# 5 sequences of length 10
X_classes = np.random.randint(0, 20, size=(5, 10))
assert Model(input, x_ohe).predict(X_classes).shape == (5, 10, 20)

