
Speed up inference #50

Open
shubhank008 opened this issue Nov 17, 2018 · 1 comment

@shubhank008

On the most basic AWS instance (CPU) it takes anywhere from 8-10 seconds to complete a reply. On a GPU node (K10) it's around 2-3 seconds (honestly not worth it considering the price difference between the two instances).
I was wondering, are there any options to tweak to speed up inference? I tried playing with the beam settings, but the speedup isn't worth the drop in response quality.

I'm still a beginner in ML, but would using a TPU make a difference for inference, compared to a GPU?

@pender
Owner

pender commented Feb 17, 2019

Yes, a TPU would probably be much faster. If you really know what you're doing, you could try replacing my beam search implementation with one that happens entirely on the GPU, which could make it run much faster with no degradation in quality -- but that's beyond my know-how. Otherwise I'd just recommend playing around with the inference options I've included -- for example, a beam width of 1 and topn of 5 might be worth trying. That will degrade quality somewhat, but probably any deviation from the default options will degrade quality, because I picked the default options to maximize quality :)
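For anyone curious what that tradeoff looks like, here is a minimal sketch (not the code from this repo; the function name and toy distribution are made up) of what decoding with beam width 1 plus a topn cutoff amounts to: each step samples from only the few most likely next characters, so cost grows with sequence length alone rather than with beam width.

```python
import numpy as np

def sample_top_n(probs, topn=5, rng=None):
    """Pick the next character from only the `topn` most likely candidates.
    With beam width 1 this replaces beam search entirely: one forward
    pass per generated character instead of one per live beam."""
    rng = rng or np.random.default_rng()
    top_idx = np.argsort(probs)[-topn:]            # indices of the topn candidates
    top_p = probs[top_idx] / probs[top_idx].sum()  # renormalize over the shortlist
    return rng.choice(top_idx, p=top_p)

# Toy next-character distribution over a 10-symbol vocabulary (illustrative only).
probs = np.array([0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.05, 0.03, 0.01, 0.01])
print(sample_top_n(probs, topn=5))
```

The quality loss comes from continuations that a wider beam would have kept alive but a greedy topn step never considers, which is the degradation described above.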
