single example inference seems slow #32
Please try the tensorflow_bert demo in v2.
@byshiue thank you for the quick reply, but is there any reason why? Which part is different from v1? Thanks.
The encoders of v1 and v2 are the same, and v1 also provides the tensorflow_bert demo, but we do not demonstrate how to use it in the README. So, I recommend you first run tensorflow_bert by following the README of v2.
@byshiue thank you, I went back to test the results again and found that, if we do not use FasterTransformer, the original BERT inference time for a single example (not the average inference time over a test file, but one input) is around 15ms. So we actually reduce the time from 15ms to 9ms, which means FasterTransformer works. I was thinking about one thing: for the original BERT without FasterTransformer, we can export the model, save it as a pb, and use that for inference, which reduces the time a lot. Can we use an exported model here with FasterTransformer? If so, how? Thanks so much.
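A minimal sketch of the export-model idea mentioned above, assuming a TF 1.13 Estimator-based BERT classifier: `estimator`, `max_seq_len`, and the feature names follow the standard `run_classifier` setup and are assumptions, not this repo's exact code.

```python
# Export the Estimator once as a SavedModel, then reload it with a
# predictor so each request skips graph rebuilding. `estimator` is
# assumed to be the tf.estimator.Estimator built by the BERT sample.
import numpy as np
import tensorflow as tf

max_seq_len = 128  # example value; match your model's sequence length

def serving_input_receiver_fn():
    # Placeholders matching BERT's usual input features.
    features = {
        "input_ids": tf.placeholder(tf.int32, [None, max_seq_len], name="input_ids"),
        "input_mask": tf.placeholder(tf.int32, [None, max_seq_len], name="input_mask"),
        "segment_ids": tf.placeholder(tf.int32, [None, max_seq_len], name="segment_ids"),
    }
    return tf.estimator.export.ServingInputReceiver(features, features)

# Export once; this writes a SavedModel (saved_model.pb plus variables).
export_dir = estimator.export_savedmodel("export", serving_input_receiver_fn)

# Reload once and keep the session alive; later calls skip graph building.
predict_fn = tf.contrib.predictor.from_saved_model(export_dir)
dummy = np.zeros([1, max_seq_len], dtype=np.int32)
outputs = predict_fn({"input_ids": dummy, "input_mask": dummy, "segment_ids": dummy})
```

The speedup from exporting comes from amortizing graph construction and session setup, which `estimator.predict` otherwise pays on every call; the same route should work when the graph contains a custom op, as long as the op library is loaded (e.g. via `tf.load_op_library`) before calling `from_saved_model`.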
Yes. There are two ways.
@byshiue thank you, but it's kind of overwhelming. I am new to TensorFlow and BERT. Is there any code I can reference?
You can first try the simpler samples. Another sample is to use sess.run(all_var) to get the values of all variables in numpy format, and then put them into the FasterTransformer op. After you understand how to use FasterTransformer, you can modify the tensorflow_bert sample to run the test on BERT.
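A minimal sketch of the `sess.run(all_var)` step described above, assuming the BERT graph has already been built in the default graph; `"model.ckpt"` is a hypothetical checkpoint path.

```python
# Read every variable out of a restored session as a numpy array,
# keyed by variable name, so the weights can later be wired into the
# FasterTransformer op's weight inputs.
import tensorflow as tf

all_var = tf.global_variables()
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "model.ckpt")  # hypothetical checkpoint path
    values = sess.run(all_var)         # one numpy array per variable
    weights = {v.name: val for v, val in zip(all_var, values)}

# `weights` now maps names like "bert/encoder/layer_0/.../kernel:0" to
# numpy arrays; the exact op interface for feeding them is documented
# in this repo's samples.
```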
Closing due to inactivity.
Hi, my environment is TF 1.13.1. I already set up FasterTransformer v1 and ran the BERT example. When I used the original BERT inference flow (input test file), the predict time per sample is around 0.0035s (time spent in estimator.predict divided by the number of samples); the original BERT (without FasterTransformer) is around 0.007s.
However, when I used input_fn_builder (not file based) to run inference on only one sample, the time is 0.009s (the same as one inference of the original BERT, which is also 0.009s). Could you please help with this?
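A minimal sketch of how the two numbers above could be measured, and why they can differ; `estimator`, `single_example_input_fn`, and `file_based_input_fn` are hypothetical stand-ins for the setup in this issue.

```python
# Compare single-example latency against the per-sample average of a
# batched run under a TF 1.x Estimator.
import time

# Single example: each estimator.predict() call rebuilds the graph and
# opens a new session, so that per-call overhead is included in the time.
start = time.time()
result = next(estimator.predict(input_fn=single_example_input_fn))
print("single-example latency: %.4fs" % (time.time() - start))

# File-based batch: the same overhead is paid once and amortized over all
# samples, which is why the per-sample average looks much faster.
start = time.time()
results = list(estimator.predict(input_fn=file_based_input_fn))
print("average per-sample latency: %.4fs" % ((time.time() - start) / len(results)))
```

This per-call overhead is also why the export-model route discussed earlier in the thread helps: the SavedModel predictor keeps one session alive across requests.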