
How to use TensorFlow's Universal Sentence Encoder #21

Closed
drbh opened this issue Jul 22, 2019 · 10 comments

@drbh

drbh commented Jul 22, 2019

How would I load in the universal-sentence-encoder-large embedding model?

In Python

embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3")
embeddings = embed([
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding"])

print(session.run(embeddings))

In Go I've tried

model, err := tf.LoadSavedModel("universal-sentence-encoder-large", []string{"serve"}, nil)

if err != nil {
    fmt.Printf("Error loading saved model: %s\n", err.Error())
    return
}

but the program panics when trying to load the model 😕

When I use saved_model_cli I get an empty result:

The given SavedModel contains the following tag-sets:

How would I use the model?
The directory looks like:

├── assets
├── saved_model.pb
├── tfhub_module.pb
└── variables

and the data was downloaded and unzipped from https://tfhub.dev/google/universal-sentence-encoder-large/3?tf-hub-format=compressed

@drbh
Author

drbh commented Sep 30, 2019

bump

@galeone
Owner

galeone commented Sep 30, 2019

Hi @drbh!

First of all, sorry for the late reply but I completely missed the issue!

However, I've just tried to execute the Python code, using TensorFlow 1.14 in a Google Colab notebook, and it doesn't work:

import tensorflow as tf
import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3")
init = tf.global_variables_initializer()
embeddings = embed([
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding"])

with tf.Session() as sess:
  sess.run(init)
  print(sess.run(embeddings))

I got an error about an uninitialized table.

So I fixed the Python code in this way:

import tensorflow as tf
import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3")
inits = [tf.global_variables_initializer(), tf.tables_initializer()]
embeddings = embed([
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding"])

with tf.Session() as sess:
  sess.run(inits)
  print(sess.run(embeddings))

Now that I have a working model, I can try to create a SavedModel from Python and then use it from Go.

From what I understand from the hub.Module documentation ( https://www.tensorflow.org/hub/api_docs/python/hub/Module ), a hub.Module is based on SavedModel, therefore we should be able to save the embed variable as a SavedModel to disk, and then load it from Go (hopefully).

The export method, perhaps, does what we need:

with tf.Session() as sess:
  sess.run(inits)
  embed.export("wat", sess)

Produces the following content in the wat folder:

total 4.5M
drwxr-xr-x 4 root root 4.0K Sep 30 17:24 .
drwxr-xr-x 1 root root 4.0K Sep 30 17:24 ..
drwxr-xr-x 2 root root 4.0K Sep 30 17:24 assets
-rw-r--r-- 1 root root 4.4M Sep 30 17:24 saved_model.pb
-rw-r--r-- 1 root root    2 Sep 30 17:24 tfhub_module.pb
drwxr-xr-x 2 root root 4.0K Sep 30 17:24 variables

Ok, it really looks like a SavedModel!

I don't have a PC with a Go installation right now (yeah, I just formatted and still have to set up the development environment), but I guess that after extracting the needed information from the hub.Module (which is a SavedModel) we have everything we need to load the SavedModel from the wat folder and use it.

Here is the info I can get from Python:

print(embed.get_signature_names())
print(embed.get_input_info_dict())
print(embed.get_output_info_dict())

That gives

['default']
{'text': <hub.ParsedTensorInfo shape=(?,) dtype=string is_sparse=False>}
{'default': <hub.ParsedTensorInfo shape=(?, 512) dtype=float32 is_sparse=False>}

So (I haven't tested this, but I guess it should work, or at least it's a good starting point for further investigation), from Go you can load the model from the wat folder this way (using tfgo rather than the raw Go bindings):

package main

import (
        "fmt"

        tg "github.com/galeone/tfgo"
)

func main() {
        model := tg.LoadModel("wat/", []string{"default"}, nil)
        fmt.Println(model)
}

Then, maybe, you can use the text and default keys to get the full names of the input and output tensors, using saved_model_cli on the wat path.

Let me know if it helps and if you are able to load the model (and how!)

Sorry again for the huge delay

@drbh
Author

drbh commented Oct 1, 2019

Wow thanks so much for the detailed and fast response! Also thank you for explaining how to save a model.

Sadly, I've followed those instructions and re-saved the TensorFlow Hub model as a SavedModel. However, this model still comes up tagless.

The Python model import and re-save:

import tensorflow as tf
import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3")
inits = [tf.global_variables_initializer(), tf.tables_initializer()]

print(embed.get_signature_names())
print(embed.get_input_info_dict())
print(embed.get_output_info_dict())

with tf.Session() as sess:
  sess.run(inits)
  embed.export("wat", sess)

# $ python3 model.py

What saved_model_cli says about the tags:

saved_model_cli scan --dir wat/
# The given SavedModel contains the following tag-sets:

When I try to load it in Go:

package main

import (
        "fmt"
        tg "github.com/galeone/tfgo"
        // tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
        model := tg.LoadModel("wat/", []string{"default"}, nil)
        fmt.Println(model)
}



// $ go run main.go
// 2019-10-01 08:22:32.184989: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: wat/
// 2019-10-01 08:22:32.210609: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { default }
// 2019-10-01 08:22:32.218247: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { default }; Status: fail. Took 33265 microseconds.
// panic: Could not find meta graph def matching supplied tags: { default }. To inspect available tag-sets in the SavedModel, please use the SavedModel CLI: `saved_model_cli`

Any idea how to add or update the tag-sets?

Thanks again for pointing me in the right direction 👍

@galeone
Owner

galeone commented Oct 5, 2019

You're welcome!

However, perhaps the way to go is to use saved_model_cli correctly to inspect what's inside the wat folder and understand what the available tags are and what they contain:

saved_model_cli show --all --dir wat

It should give you all the information available for each tag (and so you will have the correct tag name).

Let me know if it helps 👍

@drbh
Author

drbh commented Oct 9, 2019

@galeone

Thanks for the advice - sadly the above command did not yield any tags. However, I just needed to save the model differently (from Python) to get the tags in the right place:

with tf.Session() as sess:
    sess.run(inits)
    builder = tf.saved_model.builder.SavedModelBuilder("wat")
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING],)
    builder.save()

# INFO:tensorflow:No assets to save.
# INFO:tensorflow:No assets to write.
# INFO:tensorflow:SavedModel written to: wat/saved_model.pb
# INFO:tensorflow:No assets to save.
# INFO:tensorflow:No assets to write.
# INFO:tensorflow:SavedModel written to: wat/saved_model.pb

Then from Go I can finally load the model (note: this took a couple of minutes to load into my GPU memory):

func main() {
        model := tg.LoadModel("wat/", []string{"serve"}, nil)
        fmt.Println(model)
}

// 2019-10-09 08:53:42.904583: I tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
// 2019-10-09 08:56:59.418976: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 196777906 microseconds.
// &{0xc42009a220}

profit 🎉

Any thoughts on why this way worked? Thanks again for your advice, it was extremely helpful.

@galeone
Owner

galeone commented Oct 9, 2019

Great! Perhaps your solution worked because I assumed that a hub.Module was a SavedModel and that the export method produced a correctly tagged SavedModel, but I was wrong :-)

Your solution instead falls back on the tf.saved_model.builder.SavedModelBuilder API, which is the correct way to create SavedModel objects.

I guess this makes sense. Let me know if you need any help, or whether I can close this issue.

@drbh
Author

drbh commented Oct 9, 2019

A bit embarrassed to say, but I can't seem to Exec anything from the model 🤦‍♀️

At a high level I want to pass the model a list of strings and get the vector embeddings back...

i.e. in Python:

embeddings = embed([
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding"])


...

sess.run(embeddings)

I've tried passing a []string{} to model.Exec but, not surprisingly, that did not work. I feel I am missing a critical step. Any ideas?

@galeone
Owner

galeone commented Oct 9, 2019

You can use saved_model_cli show --all --dir wat to get the names of the input and output tensors, and then use them as in the "Go code" section of the readme: https://github.com/galeone/tfgo#train-in-python-serve-in-go

Let me know if it helps

@galeone
Owner

galeone commented Feb 13, 2020

Closing for inactivity.

@galeone galeone closed this as completed Feb 13, 2020
@pengrongbo

@drbh

Did you resolve the issue?
