## Using wrappers for Gensim models for working with Keras

This tutorial shows how to use Gensim models as part of your Keras models, using the wrappers in ```gensim.keras_integration```.

The wrappers currently available are:
* Word2Vec (```gensim.keras_integration.keras_wrapper_gensim_word2vec.KerasWrapperWord2VecModel```), which wraps Gensim's ```Word2Vec``` model.

### Word2Vec

To use Word2Vec, we first import the corresponding wrapper:

In [1]:
from gensim.keras_integration.keras_wrapper_gensim_word2vec import KerasWrapperWord2VecModel

Using TensorFlow backend.


Next we create a dummy set of sentences to train the Word2Vec model associated with the wrapper.

In [2]:
sentences = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]

Then, we call the wrapper and pass appropriate parameters.

In [3]:
model = KerasWrapperWord2VecModel(sentences, size=100, min_count=1, hs=1)
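
To make the `min_count` argument concrete: Word2Vec ignores words whose total frequency in the corpus is lower than `min_count`. The following is a minimal sketch of that filtering in plain Python on the dummy sentences. It does not call Gensim, and the helper name `kept_vocabulary` is hypothetical.

```python
from collections import Counter

sentences = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]


def kept_vocabulary(sentences, min_count):
    """Hypothetical helper: words whose total frequency is at least min_count."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return sorted(word for word, count in counts.items() if count >= min_count)


vocab_min1 = kept_vocabulary(sentences, min_count=1)   # keeps every word
vocab_min3 = kept_vocabulary(sentences, min_count=3)   # keeps only frequent words
print(len(vocab_min1))
print(vocab_min3)
```

With `min_count=1`, every word of this tiny corpus stays in the vocabulary, which is why that value suits a toy example like this one.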



We can use the methods and attributes of the Word2Vec model on the model returned by the wrapper.

In [4]:
sims = model.most_similar('graph', topn=10)   # words most similar to 'graph'
print(sims)

[('human', 0.21846070885658264), ('eps', 0.14406149089336395), ('system', 0.12887781858444214), ('time', 0.12749384343624115), ('computer', 0.10715050995349884), ('minors', 0.08211944997310638), ('user', 0.031229227781295776), ('interface', 0.01625414937734604), ('trees', 0.005966886878013611), ('survey', -0.10215148329734802)]


As with any Word2Vec model, the results obtained after training on such a small corpus can be unexpected.

#### Integration with Keras

As an example of using the wrapper with Keras, we build a model for a word similarity task, in which the cosine similarity between the embeddings of two words serves as the measure of their similarity.
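
The cosine measure used for this task is the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python, independently of Keras and Gensim. The function name `cosine_similarity` and the example vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])        # perpendicular vectors
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])         # opposite directions
print(same_direction, orthogonal, opposite)
```

Values close to 1 indicate vectors pointing in the same direction, values near 0 indicate unrelated directions, and values near -1 indicate opposite directions.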

In [5]:
import numpy as np
from keras.engine import Input
from keras.models import Model
from keras.layers import merge

We use the layer returned by the `get_embedding_layer` method in the Keras model.

In [6]:
embedding_layer = model.get_embedding_layer()
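
Conceptually, an embedding layer is a lookup table: it maps each integer word index to one row of a weight matrix. The sketch below imitates that lookup with NumPy only. The vocabulary, the 3-dimensional vectors, and the names `word_to_index`, `weights`, and `embed` are made up for illustration and are not taken from the wrapper. It is an assumption here that the layer returned by `get_embedding_layer` is initialised with the trained Word2Vec vectors.

```python
import numpy as np

# Hypothetical toy vocabulary and weights, not the trained model's values.
word_to_index = {'graph': 0, 'trees': 1, 'minors': 2}
weights = np.array([
    [0.1, 0.2, 0.3],   # row 0: vector for 'graph'
    [0.4, 0.5, 0.6],   # row 1: vector for 'trees'
    [0.7, 0.8, 0.9],   # row 2: vector for 'minors'
])


def embed(indices):
    """Replace every integer index by the corresponding row of `weights`."""
    return weights[np.asarray(indices, dtype='int32')]


batch = np.asarray([[word_to_index['trees']]])   # shape (1, 1): one sample, one word
vectors = embed(batch)                            # shape (1, 1, 3)
print(vectors.shape)
```

This is why the Keras inputs in this tutorial are integer indices rather than the words themselves.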

Next, we construct the Keras model. 

In [7]:
input_a = Input(shape=(1,), dtype='int32', name='input_a')
input_b = Input(shape=(1,), dtype='int32', name='input_b')
embedding_a = embedding_layer(input_a)
embedding_b = embedding_layer(input_b)
similarity = merge([embedding_a, embedding_b], mode='cos', dot_axes=2)

keras_model = Model(input=[input_a, input_b], output=similarity)
keras_model.compile(optimizer='sgd', loss='mse')

Now we pass the vocabulary indices of the two words we wish to compare to the model and take its prediction as their similarity score.

In [8]:
word_a = 'graph'
word_b = 'trees'
# look up the integer vocabulary index of each word
index_a = np.asarray([model.wv.vocab[word_a].index])
index_b = np.asarray([model.wv.vocab[word_b].index])
output = keras_model.predict([index_a, index_b])    # cosine similarity of the two words
print(output)

[[[[ 0.00596689]]]]
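
The prediction comes back as a nested array with several singleton dimensions. A small sketch of how it could be reduced to a plain Python float, using the value printed above rather than a live model:

```python
import numpy as np

# Value copied from the printed prediction; no Keras model is run here.
output = np.array([[[[0.00596689]]]])

# Drop the singleton dimensions and convert the single element to a float.
score = float(np.squeeze(output))
print(score)
```

Up to rounding, this agrees with the score of 0.005966886878013611 that `most_similar('graph')` reported for 'trees' earlier in this tutorial.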
