
single example inference seems slow #32

Closed
652994331 opened this issue Jun 3, 2020 · 8 comments

652994331 commented Jun 3, 2020

Hi, my environment is TF 1.13.1. I have already set up FasterTransformer v1 and ran the BERT example. With FasterTransformer inference on an input test file, the prediction time per sample is around 0.0035 s (time spent in estimator.predict divided by the number of samples); the original BERT without FasterTransformer is around 0.007 s.

However, when I used input_fn_builder (not file-based) to run inference on only one sample, the time is 0.009 s, which is the same as a single inference with the original BERT (also 0.009 s). Could you please help with this?

byshiue (Collaborator) commented Jun 3, 2020

Please try the tensorflow_bert demo in v2.

652994331 (Author) commented

@byshiue thank you for the quick reply, but why? Which part is different from v1? Thanks.

byshiue (Collaborator) commented Jun 3, 2020

The encoder in v1 and v2 is the same. v1 also provides the tensorflow_bert demo, but we do not demonstrate how to use it in the README, so I recommend you first run tensorflow_bert following the v2 README.
There are many possible reasons, and I cannot give an answer because we do not have enough information.

652994331 (Author) commented

@byshiue thank you. I went back to check the results again and found that, without FasterTransformer, the original BERT inference time for a single example (not the average over a test file, but one input) is around 15 ms. So FasterTransformer actually reduces the time from 15 ms to 9 ms; it does work.

I was thinking about one thing: for the original BERT without FasterTransformer, we can export the model as a saved .pb and run inference on features, which reduces the time a lot. Can we use an exported model with FasterTransformer? If so, how?

Thanks so much.

byshiue (Collaborator) commented Jun 4, 2020

Yes. There are two ways.
First, you can restore the checkpoint, get the variables with tf.get_default_graph().get_tensor_by_name (or a similar function), and pass them into FasterTransformer. If you pass the variables in tf.Tensor format, the overhead of constructing FasterTransformer is smaller because it does not need to copy the memory.
The other way is to pass the weights in NumPy format. In this case, the construction overhead is larger, but the inference time is unaffected.
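A minimal sketch of these two approaches, assuming TF 1.x. The checkpoint path is a placeholder, the tensor name follows the standard BERT variable naming, and the sketch only shows how the tensors/arrays would be obtained; it is not runnable without a real checkpoint, and how they are then handed to the FasterTransformer op depends on your wrapper.

```python
# Sketch only (TF 1.x): assumes a BERT checkpoint on disk.
# "ckpt_path" is a placeholder; adjust names to your model.
import tensorflow as tf

ckpt_path = "/path/to/bert_model.ckpt"  # hypothetical path

# Way 1: restore the graph and fetch weights as tf.Tensor objects,
# so FasterTransformer can consume them without a memory copy.
saver = tf.train.import_meta_graph(ckpt_path + ".meta")
graph = tf.get_default_graph()
query_kernel = graph.get_tensor_by_name(
    "bert/encoder/layer_0/attention/self/query/kernel:0")

# Way 2: evaluate the variables once to get NumPy arrays. The op
# construction pays an extra copy, but inference time is unaffected.
with tf.Session() as sess:
    saver.restore(sess, ckpt_path)
    weights_np = sess.run(tf.global_variables())
```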

652994331 (Author) commented

@byshiue thank you, but this is a bit overwhelming; I am new to TensorFlow and BERT. Is there any code I can reference?

byshiue (Collaborator) commented Jun 4, 2020

You can start with sample/tensorflow/encoder_sample.py and sample/tensorflow/utils/encoder.py.
They are an easy environment for verifying correctness and inference speed.
For example, you can try replacing "encoder_vars[val_off + 0]" in encoder.py with "tf.get_default_graph().get_tensor_by_name('layer_%d/attention/self/query/kernel:0' % layer_idx)".

Another approach is to use sess.run(all_vars) to get the values of all variables in NumPy format and then pass them into the FasterTransformer op.

Once you understand how to use FasterTransformer, you can modify the tensorflow_bert sample to run the test on BERT.
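For reference, the substitution described above would look roughly like this inside encoder.py (a sketch: it assumes the restored BERT graph is the default graph and that it contains a tensor with exactly the quoted name, which may differ in your checkpoint):

```python
# Sketch (TF 1.x): fetch one attention weight from the restored graph
# instead of reading it from the encoder_vars list in encoder.py.
import tensorflow as tf

layer_idx = 0  # example layer index
# original: q_kernel = encoder_vars[val_off + 0]
q_kernel = tf.get_default_graph().get_tensor_by_name(
    'layer_%d/attention/self/query/kernel:0' % layer_idx)
```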

byshiue (Collaborator) commented Jun 25, 2020

Closing due to inactivity.

byshiue transferred this issue from NVIDIA/DeepLearningExamples on Apr 5, 2021
This issue was closed.