
Slow performance #8

Closed
guptam opened this issue May 13, 2020 · 18 comments

@guptam

guptam commented May 13, 2020

Hi,

Thanks for releasing this as open source. Wonderful concept, and very different from other seq2seq or ln2sql-like approaches.

I am facing performance issues when trying the SQA prediction notebook (using SQA Large). It takes more than 60 seconds on a dual-GPU machine to evaluate the model and return a response to a query. Is this normal? How can we improve the prediction time?

Thanks,
Manish

@eisenjulian
Collaborator

Hello @guptam, thanks for the interest and the question. To better understand your problem, can you confirm whether the GPUs are being used? Is that time consistent with what you get on Google Colab? You can change the verbosity flag and look at the error log file to see if anything is going wrong with accelerator usage.

While the notebook is a good example for seeing some predictions, there's a lot of overhead: every time you run the prediction cell a new Python runtime is spun up, and the full model has to be loaded from the checkpoint. That's probably taking up most of those 60 seconds, so if you want to predict on multiple examples you probably want to do it at once for all of them, which is already supported by the evaluation script.
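
For example, something along these lines writes all converted examples into one file so the model only has to be loaded once for the whole batch (a rough sketch only: convert_to_tf_example is a hypothetical stand-in for the conversion helper used in the notebook, and the file name is arbitrary):

import tensorflow.compat.v1 as tf

queries = [...]  # (question, table) pairs prepared beforehand

# Write every converted example into a single TFRecord file, then run the
# evaluation script once over that file instead of once per question.
with tf.io.TFRecordWriter("all_examples.tfrecord") as writer:
    for question, table in queries:
        example = convert_to_tf_example(question, table)  # hypothetical helper
        writer.write(example.SerializeToString())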

@guptam
Author

guptam commented May 13, 2020

Hi @eisenjulian. Thanks for the quick response.

I am not using Colab. TF is using both GPUs. I will try separating the model loading from the evaluation. I was just doing a quick test and might have missed this part. I will create a new predict function and use a pre-loaded model.

Regards,
Manish

@eldhosemjoy

Hi @guptam, I believe the main overhead for prediction comes from loading the checkpoints, as well as from writing the input out as a tf_example file and then writing the predictions to another file. The current code in the Google Colab actually runs a predict task that generates two files, one of which is empty. I guess moving the file creation and model loading out of the prediction path would give you improved performance. Are you looking for a script that only does prediction?

@guptam
Author

guptam commented May 16, 2020

Hi @eldhosemjoy. Yes, only looking for a script to predict using an already loaded model.

@eldhosemjoy

Hi @guptam, here is a suggestion for implementing the prediction script.

You will need to:

  • Remove the dataset-creation step that writes the protos to a file, and instead keep the feature protos in memory as a collection.
  • Convert those proto features using the existing code in tapas/experiments/prediction_utils.py, removing the tf session on lines 57 to 68.
  • Generate the features as int32 (this also needs a change in the proto-creation file, i.e. tf_example_utils.py).
  • Take the first example and send it to the separated estimator model function (def model_fn) from tapas/models/tapas_classifier_model.py at line 919, which returns the predictions.
  • Use a session and graph to validate the prediction, and adapt write_predictions from tapas/experiments/prediction_utils.py so it returns a JSON.

I hope this helps you build a prediction script; it should definitely improve the performance.

Later on, you could use tf.placeholders for your features and invoke the model_fn using a session and graph.
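
As a very rough sketch of that pattern (not code from the repo: it assumes model_fn follows the standard Estimator signature (features, labels, mode, params), the feature names and shapes are illustrative, and build_feature_dict is a hypothetical helper producing the int32 features described above):

import tensorflow.compat.v1 as tf

MAX_SEQ_LENGTH = 512  # assumption; use the sequence length the checkpoint expects

graph = tf.Graph()
with graph.as_default():
    # Placeholders for an illustrative feature layout; the real TAPAS model
    # needs additional inputs such as row/column ids.
    features = {
        "input_ids": tf.placeholder(tf.int32, [1, MAX_SEQ_LENGTH]),
        "input_mask": tf.placeholder(tf.int32, [1, MAX_SEQ_LENGTH]),
        "segment_ids": tf.placeholder(tf.int32, [1, MAX_SEQ_LENGTH]),
    }
    # Build the prediction graph once; spec.predictions holds the output tensors.
    spec = model_fn(features, labels=None, mode=tf.estimator.ModeKeys.PREDICT, params={})
    saver = tf.train.Saver()

sess = tf.Session(graph=graph)
saver.restore(sess, "/path/to/model.ckpt")  # restore the weights a single time

def predict(question, table):
    feed = build_feature_dict(question, table)  # hypothetical feature builder
    return sess.run(spec.predictions,
                    feed_dict={features[k]: feed[k] for k in features})

This keeps the graph and weights in memory, so each call to predict only pays for the forward pass.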

I believe there are even better ways and would love to hear about them.

Thanks

@eisenjulian
Collaborator

To add a clarification: the current script, as used in the notebook, should work as-is to predict on a large number of examples, with only a minimal change to dump all of your examples into a single tf_example file. Only when running it multiple times with just one example at a time is there a lot of overhead, both because the model is loaded again every time run_task_main.py is run, and because size-1 batches waste GPU/TPU parallelism.

On the other hand, if you want to load the model in the notebook, the easiest way would be to copy the contents of the run_task_main.py file into a notebook as a starting point. The estimator object which is defined here contains the model in memory and can be used to train and/or predict.
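
For example, something along these lines keeps the model in memory between calls (a sketch only: build_tapas_estimator stands in for the estimator construction you would copy out of run_task_main.py, and convert_to_features for the example conversion done in the notebook):

import tensorflow.compat.v1 as tf

# Build the estimator once; this is the expensive step.
estimator = build_tapas_estimator(model_dir="/path/to/checkpoint")  # hypothetical

def predict(question, table):
    def input_fn():
        # Wrap a single converted example in a size-1 batch. Batching several
        # questions together makes much better use of the GPU/TPU.
        features = convert_to_features(question, table)  # hypothetical
        return tf.data.Dataset.from_tensors(features).batch(1)
    return list(estimator.predict(input_fn=input_fn))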

I hope this helps; otherwise, please give us more info to help us understand your use case.

@eldhosemjoy

eldhosemjoy commented May 18, 2020

@eisenjulian Absolutely. I suggested the above option for single-example prediction in a real-time, chat-based UI. To be more specific, I guess @guptam was looking to leverage the repo and the model for single predictions, more towards a service/API-level adaptation of run_task_main.py, if I have understood it right.

@eisenjulian
Collaborator

Thanks for the clarification. If what you are looking for is a service that does predictions in real time, there are a few alternatives:

  • Create the estimator object as I described before and predict when needed from a web server. This is not the recommended approach for production use cases.
  • Export a SavedModel from the estimator (https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators) and load it with a TensorFlow Serving server. That service receives serialized tf_examples, so you will need to create the tf_examples before calling it, just as is done in the notebook; you will likely do this in a Python server that loads the vocabulary file and the tapas lib. A sketch of the export step is below.
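
As a minimal sketch of the export step, assuming the estimator object is already built (the feature spec is illustrative only; the real feature names and shapes must match what tf_example_utils.py puts into the tf_examples):

import tensorflow.compat.v1 as tf

MAX_SEQ_LENGTH = 512  # assumption

# Parse the serialized tf_examples the server receives into model features.
feature_spec = {
    "input_ids": tf.io.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
    "input_mask": tf.io.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
    "segment_ids": tf.io.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
}
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

# Writes a timestamped SavedModel directory that TensorFlow Serving can load.
export_path = estimator.export_saved_model("/path/to/export", serving_input_fn)

The exported directory can then be served with something like tensorflow_model_server --model_name=tapas --model_base_path=/path/to/export --rest_api_port=8501, and queried with serialized tf_examples built the same way as in the notebook.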

@monuminu

@eldhosemjoy

Can you please share your approach, as I am also trying to do the same? It would save a lot of time.

@eldhosemjoy

@monuminu, @guptam - I have implemented the prediction as a service. You can have a look here - TAPAS Service Adaptation

@Akshaysharma29

Hi @eldhosemjoy, thanks for sharing your work, but while trying to use your code on Colab I am facing the issue below:

path of model:/content/temp/tapas/model/tapas_sqa_base.pb
---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
<ipython-input-14-c82d684a8d52> in <module>()
----> 1 tapaspredictor  = TapasPredictor()

2 frames
<ipython-input-13-5e395447c7d6> in load_frozen_graph(self, frozen_graph_filename)
     80     with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
     81         graph_def = tf.GraphDef()
---> 82         graph_def.ParseFromString(f.read())
     83 
     84     # Then, we import the graph_def into a new Graph and returns it

DecodeError: Error parsing message

@eldhosemjoy

@Akshaysharma29 you will need to do a git lfs pull or clone. The model is an LFS object.

@Akshaysharma29

Akshaysharma29 commented Sep 25, 2020

@eldhosemjoy thanks for the quick response.
OK, I will try it.

@eldhosemjoy

git lfs clone https://github.com/eldhosemjoy/tapas.git
This will pull the model into the repository.
@Akshaysharma29 You could take a look at this and run it directly from the directory - https://github.com/eldhosemjoy/tapas/blob/master/test/class_test.py

@Akshaysharma29

Akshaysharma29 commented Sep 28, 2020

Thanks, @eldhosemjoy, it's working. Which SQA model version did you convert?

@eldhosemjoy

@Akshaysharma29 The SQA Base - https://storage.googleapis.com/tapas_models/2020_04_21/tapas_sqa_base.zip

@rahulyadav02

rahulyadav02 commented Dec 24, 2020

Hi @eldhosemjoy ,
I'm also facing an issue with slow performance.
In your repo TAPAS Service Adaptation, you have created a new class for prediction, where you are loading the saved model from the config.json file.
Can you help me with where exactly in the repo you are saving the model?

@TheurgicDuke771

git lfs clone https://github.com/eldhosemjoy/tapas.git
This will have the model pulled in to the repository.
@Akshaysharma29 You could take a look at this and you can directly run it from the directory - https://github.com/eldhosemjoy/tapas/blob/master/test/class_test.py

Hi @eldhosemjoy, while running class_test.py I am getting this error: question_id = example["question_id"][0, 0].decode("utf-8") TypeError: 'Example' object is not subscriptable
