[View in Colaboratory](https://colab.research.google.com/github/gmihaila/deep_learning_toolbox/blob/master/keras_embedding.ipynb)

### Keras embedding layer for input NN


Turns non-negative integers (indexes) into dense vectors of fixed size, e.g. `[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]`.

This layer can only be used as the first layer in a model.
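
Conceptually, the layer behaves like a lookup table: it holds a weight matrix of shape (input_dim, output_dim) and returns one row per input index. The NumPy sketch below only illustrates that idea and is not Keras's implementation; the names `embedding_matrix` and `embed` are made up for this example, and the random values stand in for learned weights.

```python
import numpy as np

# Hypothetical lookup table: 21 indices (0..20), 2-dimensional vectors.
rng = np.random.default_rng(0)
embedding_matrix = rng.uniform(-0.05, 0.05, size=(21, 2))


def embed(indices, matrix):
    """Return the row of `matrix` for every integer in `indices`."""
    indices = np.asarray(indices)
    # Integer-array indexing appends the embedding dimension to the input shape.
    return matrix[indices]


batch = [[4], [20]]            # shape (2, 1): two sequences of length 1
vectors = embed(batch, embedding_matrix)
print(vectors.shape)           # (2, 1, 2)
```

Each input index is replaced by the corresponding row of the matrix, which is why the output gains one trailing axis of size output_dim.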

Arguments

#### input_dim: 
int > 0. Size of the vocabulary, i.e. maximum integer index + 1.

#### output_dim: 
int >= 0. Dimension of the dense embedding.

#### embeddings_initializer: 
Initializer for the embeddings matrix (see initializers).

#### embeddings_regularizer: 
Regularizer function applied to the embeddings matrix (see regularizers).

#### embeddings_constraint: 
Constraint function applied to the embeddings matrix (see constraints).

#### mask_zero: 
Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful with recurrent layers, which may take variable-length input. If this is True, all subsequent layers in the model must support masking or an exception will be raised. As a consequence, index 0 cannot be used in the vocabulary (input_dim should equal size of vocabulary + 1).
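
As a rough illustration of what reserving index 0 for padding means, the NumPy sketch below builds the kind of boolean mask that marks real tokens versus padding. In Keras the layer derives and propagates the mask itself; the toy vocabulary size and the `padded` array here are assumptions made for the example.

```python
import numpy as np

# Hypothetical vocabulary of 5 real tokens, indexed 1..5; 0 is reserved for padding.
vocab_size = 5
input_dim = vocab_size + 1     # one extra slot for the padding index 0

# Two sequences padded with zeros to a common length of 4.
padded = np.array([[3, 1, 5, 0],
                   [2, 0, 0, 0]])

# The mask is True for real tokens and False for padding positions.
mask = padded != 0
lengths = mask.sum(axis=1)     # number of real tokens per sequence

print(mask)
print(lengths)                 # [3 1]
```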

#### input_length: 
Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers downstream (without it, the shape of the dense outputs cannot be computed).
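
The shape arithmetic behind this requirement can be sketched with plain NumPy. The array below is only a stand-in for an embedding output, and `units` is a hypothetical dense layer width chosen for the example.

```python
import numpy as np

batch_size, input_length, output_dim = 32, 10, 64

# Stand-in for the output of an embedding layer: (batch, input_length, output_dim).
embedded = np.zeros((batch_size, input_length, output_dim))

# Flattening keeps the batch axis and merges the remaining axes.
flattened = embedded.reshape(batch_size, -1)
print(flattened.shape)         # (32, 640)

# A dense layer with `units` outputs would need a kernel of this shape,
# which is only defined when input_length is known in advance.
units = 1
kernel_shape = (input_length * output_dim, units)
print(kernel_shape)            # (640, 1)
```

If the sequence length were left unspecified, the flattened width `input_length * output_dim` would be unknown, so the dense weights could not be allocated.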

#### Input shape
2D tensor with shape: (batch_size, sequence_length).

#### Output shape
3D tensor with shape: (batch_size, sequence_length, output_dim).

```python
from keras.layers import Embedding
from keras.models import Sequential

import numpy as np

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# The model takes as input an integer matrix of size (batch, input_length).
# The largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size - 1).
# Now model.output_shape == (None, 10, 64), where None is the batch dimension.

# Random word indices in [0, 1000): 32 sequences of length 10.
input_array = np.random.randint(1000, size=(32, 10))

print(input_array.shape)

model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)

assert output_array.shape == (32, 10, 64)

print(output_array.shape)

print('INPUT\n %s' % input_array)
print('\n------------------------\n')
print('OUTPUT\n %s' % output_array)
```

(32, 10)
(32, 10, 64)
INPUT
 [[200 528 642  16 944 519 608 432 244 332]
 [600 988 600 305  69 632 937 758 329 931]
 [567 868 282 373 939 376 567 775 280 862]
 [229 315 486 496 280 251 289 971 997 795]
 [879 719 399  54 503 360 128 819 540 678]
 [848  91 247 228 526 379 602 419 541 504]
 [560 249 685 744 313 226 837 375 556 104]
 [122 763 751 930 762   8 258   4 934 701]
 [814 995 169 242 852 735 852  84 520 233]
 [359 985 103 308 878 122 519 151  98 569]
 [865 254   3 825 496 199 318  59 603 828]
 [ 32 314 634 805 257  75 864 320 388 800]
 [464 792 132 649 484  91 479 565 585 250]
 [432 576 203 678 241 794 616 219 555 553]
 [591 752 461 136 894 159 582 284 613 824]
 [657 425 884 698 338 966 481 661 818 197]
 [536 828 881 415 115 602 594 364 104 746]
 [982 993 200 104 576 370 772 860 427 941]
 [638 612 491 858 152 772 540 608 956 237]
 [880  89 599 124 857 325 841  51 411  44]
 [320 937  40 630  71 203 200 204 464 597]
 [800 836 545 175 986 223  15 262 732 851]
 [138 679 482 507  98 178

ref: https://keras.io/layers/embeddings/#embedding 