
Speed up inference #50

Open
shubhank008 opened this issue Nov 17, 2018 · 1 comment

@shubhank008

On the most basic AWS instance (CPU) it takes anywhere from 8-10 seconds to complete a reply. On a GPU node (K10) it's around 2-3 seconds (honestly not worth it considering the price difference between the two instances).
I was wondering, are there any options to tweak to speed up inference? I tried playing with the beam settings, but the speedup isn't worth the drop in response quality.

I'm still a beginner in ML, but would using a TPU make a difference for inference, compared to a GPU?

@pender
Owner

pender commented Feb 17, 2019

Yes, a TPU would probably be much faster. If you really know what you're doing, you could try replacing my beam search implementation with one that happens entirely on the GPU, which could make it run much faster with no degradation in quality -- but that's beyond my know-how. Otherwise I'd just recommend playing around with the inference options I've included -- for example, a beam width of 1 and topn of 5 might be worth trying. That will degrade quality somewhat, but probably any deviation from the default options will degrade quality, because I picked the default options to maximize quality :)
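For anyone curious what that tradeoff looks like, here is a minimal sketch (not the code from this repo; the function name and toy distribution are made up) of what decoding with beam width 1 plus a topn cutoff amounts to: each step samples from only the few most likely next characters, so cost grows with sequence length alone rather than with beam width.

```python
import numpy as np

def sample_top_n(probs, topn=5, rng=None):
    """Pick the next character from only the `topn` most likely candidates.
    With beam width 1 this replaces beam search entirely: one forward
    pass per generated character instead of one per live beam."""
    rng = rng or np.random.default_rng()
    top_idx = np.argsort(probs)[-topn:]            # indices of the topn candidates
    top_p = probs[top_idx] / probs[top_idx].sum()  # renormalize over the shortlist
    return rng.choice(top_idx, p=top_p)

# Toy next-character distribution over a 10-symbol vocabulary (illustrative only).
probs = np.array([0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.05, 0.03, 0.01, 0.01])
print(sample_top_n(probs, topn=5))
```

The quality loss comes from continuations that a wider beam would have kept alive but a greedy topn step never considers, which is the degradation described above.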
