
Loading Tensor #2

Open

suatfk opened this issue Aug 21, 2018 · 1 comment
suatfk commented Aug 21, 2018

Thanks for the great work; it inspired me a lot.

Here is the story.

I'm currently working with the tensorflow.datasets.imdb dataset. I decided to use GloVe word embeddings (300d) for my toy project, but the imdb dataset contains only word indexes like [1, 3, 515, ...], not the words themselves; it basically comes with its own internal word index.

So I decided to convert these indexes to GloVe word-embedding indexes in order to use the embeddings. Here is the conversion, which I tried to implement entirely in TensorFlow for learning purposes:

imdb_dataset -> imdb_index_to_word_dict -> glove_word_to_index -> glove_word_embedding
[12, 325, 123, ...] -> ["the", "equal", "append"] -> [15, 645, 722, ...] -> [[...], [...]] (shape (n_words, 300))
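The remapping chain above can be sketched with plain Python dicts. The vocabularies and indices below are made up for illustration; they are not the real imdb or GloVe mappings:

```python
# Toy sketch of the index-remapping chain (hypothetical values).
imdb_index_to_word = {12: "the", 325: "equal", 123: "append"}
glove_word_to_index = {"the": 15, "equal": 645, "append": 722}

imdb_indices = [12, 325, 123]
# imdb index -> word
words = [imdb_index_to_word[i] for i in imdb_indices]
# word -> GloVe index
glove_indices = [glove_word_to_index[w] for w in words]
```

The TensorFlow code below does the same two lookups with `tf.contrib.lookup` tables, plus a final `embedding_lookup`.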

Here is my code:


glove_word_dict, tf_embedding = glove_utils.as_tensor.load_embedding_and_dict(data_folder, glove_name,
                                                                              glove_dimension, sess)

# Build an index -> word list from the GloVe dict.
# Note: list.insert(i, x) only lands at position i if the list already has
# at least i elements, so preallocate and assign by index instead.
glove_dict_array = [""] * (max(glove_word_dict.values()) + 1)
for key, value in glove_word_dict.items():
    glove_dict_array[value] = key

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

imdb_word_index_dict = imdb.get_word_index()

# Same index -> word construction for the imdb vocabulary.
imdb_word_index_array = [""] * (max(imdb_word_index_dict.values()) + 1)
for key, value in imdb_word_index_dict.items():
    key = key.replace("\\", "").replace("'", "")
    imdb_word_index_array[value] = key


indices = tf.placeholder(dtype=tf.int64, shape=(None,))

imdb_word_index_tf_table = tf.contrib.lookup.index_to_string_table_from_tensor(tf.constant(imdb_word_index_array),
                                                                               default_value="UNKNOWN")

glove_word_dict_tf_table = tf.contrib.lookup.index_table_from_tensor(mapping=tf.constant(glove_dict_array),
                                                                     default_value=1)

word_index_tf_indices = imdb_word_index_tf_table.lookup(indices)

glove_indices = glove_word_dict_tf_table.lookup(word_index_tf_indices)

result = tf.nn.embedding_lookup(params=tf_embedding, ids=glove_indices)

sess.run(tf.tables_initializer())

sess.run(tf.global_variables_initializer())

embedding, glove_indices_result = sess.run([result, glove_indices], feed_dict={
    indices: [4, 4, 4, 4, 4]
})
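As a sanity check on the last step: `tf.nn.embedding_lookup(params, ids)` is essentially a row gather. A minimal NumPy sketch of the same operation, with a toy matrix and arbitrary values:

```python
import numpy as np

# Toy embedding matrix: 4 words, 3 dimensions (values are arbitrary).
embedding_matrix = np.arange(12, dtype=np.float32).reshape(4, 3)
ids = np.array([2, 0, 2])
# Gather rows 2, 0, 2 of the matrix -> shape (3, 3),
# just like tf.nn.embedding_lookup(embedding_matrix, ids).
rows = embedding_matrix[ids]
```

So if the gathered rows come back as zeros, the embedding matrix itself must be holding zeros at those rows, which points at how `tf_embedding` was initialized.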

Here is the problem

Every time I ran the code block above, glove_indices_result contained the correct values, but embedding somehow returned only the default value: a bunch of zeros.

And here is the solution.

I changed this code block, used when loading tf_embedding (the embedding tensor):

# 1. Define the variable that will hold the embedding:
tf_embedding = tf.Variable(
    tf.constant(1.0, shape=shape),
    trainable=False,
    name="Embedding"
)
# 2. Restore the embedding from disk to TensorFlow, GPU (or CPU if GPU unavailable):

to this:

# 1. Define the variable that will hold the embedding:
tf_embedding = tf.get_variable(
    name='Embedding',
    shape=shape,
    trainable=False)
# 2. Restore the embedding from disk to TensorFlow, GPU (or CPU if GPU unavailable):

And it is working like a charm now. Thanks for the great work and your time. This issue took half a day from me, so I don't want it to take someone else's time.

Thank you.

@guillaume-chevalier
Owner

Hi @suatfk, thanks for sharing! I'll wait for TensorFlow 2.0 to come out before changing this.

Maybe this problem was caused by an update, or perhaps two tensors were declared with the same name, which is why you would need get_variable. Or maybe global_variables_initializer was overriding the values with zeros... interesting. I'll check that when refactoring.
