Speed of nboost #68
Comments
@kaykanloo Thank you. I got results close to yours and was wondering what I am doing wrong that causes such a difference from the reported times.
@kaykanloo @vchulski The numbers I posted are from a T4 GPU on Google Cloud. I would think the numbers I see on the AWS p3.2xlarge should be the most comparable to this. The biggest discrepancy there is with pt-tinybert-msmarco, so it seems that running the model on the GPU is not what's slowing it down. I would be curious what happens if you call the model directly, like this:
from nboost.plugins import resolve_plugin
model_dir = 'nboost/pt-bert-base-uncased-msmarco'
model_cls = 'PtTransformersRerankPlugin'
reranker = resolve_plugin(model_cls, model_dir=model_dir)
ranks, scores = reranker.rank(query, question_texts, filter_results=filter_results)
Does it have the same latency? There was an update to the networking code a while ago that may have slowed it down. Sorry if these numbers are not up to date.
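To compare the latency of a direct call with the latency seen through the proxy, a small timing harness can help. The following is only a sketch under stated assumptions: `mean_latency` and `fake_rank` are hypothetical names introduced for illustration, and `fake_rank` is a stand-in that does not load any nboost model.

```python
import statistics
import time


def mean_latency(fn, *args, repeats=10, warmup=1, **kwargs):
    """Return (mean, stdev) of wall-clock seconds for fn(*args, **kwargs)."""
    for _ in range(warmup):
        fn(*args, **kwargs)  # warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), stdev


def fake_rank(query, choices):
    """Hypothetical stand-in for a reranker call: orders choices by length."""
    order = sorted(range(len(choices)), key=lambda i: len(choices[i]))
    return order, [float(len(choices[i])) for i in order]


if __name__ == "__main__":
    mean, stdev = mean_latency(
        fake_rank, "what is nboost", ["a", "bbb", "cc"], repeats=5
    )
    print(f"mean {mean:.6f}s, stdev {stdev:.6f}s")
```

To time the real model, pass the `rank` method of the `reranker` object from the snippet above in place of `fake_rank`, together with its own arguments. The warm-up call keeps one-time costs such as model loading or lazy initialization out of the mean.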
@pertschuk, I did some code profiling a few weeks ago to investigate the issue further, which resulted in my last pull request. The diagram below depicts the total CPU time spent in each function while processing 10 GET requests:
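A per-function breakdown of this kind can be gathered with Python's standard `cProfile` and `pstats` modules. The following is a minimal sketch, not the profiling setup used in this thread: `handle_request` is a hypothetical stand-in for whatever code serves one request, and `profile_calls` is an illustrative helper name.

```python
import cProfile
import io
import pstats


def handle_request(n):
    """Hypothetical stand-in for the work done to serve one request."""
    return sum(i * i for i in range(n))


def profile_calls(fn, calls, *args, top=10):
    """Profile `calls` invocations of fn and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(calls):
        fn(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" sorts by time spent inside each function itself,
    # excluding time spent in the functions it calls.
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Ten calls, mirroring the ten requests mentioned above.
    print(profile_calls(handle_request, 10, 100_000))
```

Sorting by `"cumulative"` instead of `"tottime"` shows which high-level calls account for the most time including their callees, which is often the quicker way to see whether time goes to networking code or to model inference.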
I have a question about query processing speed.
I followed the installation guide and wrote a small script for testing nboost:
The results of this script on an 8th-gen i7 are the following:
the mean time per query using nboost/pt-tinybert-msmarco is 0.54 seconds, while the mean time per query using nboost/pt-bert-base-uncased-msmarco is about 4 seconds. Both of these values are much higher than the ones in the benchmark table.
Could you please share the hardware specs on which you obtained the published results, and any recommendations for how this time could be improved on CPU?