
Speed of nboost #68

Open
vchulski opened this issue May 13, 2020 · 4 comments

Comments

@vchulski

I have a question about query processing speed.
I followed the installation guide and wrote a small script to test nboost:

import timeit
import requests

source_query_text = "vegas"

def make_request():
    # Query Elasticsearch through the nboost proxy and parse the JSON reply.
    requests.get(
        f"http://localhost:8000/travel/_search?pretty&q={source_query_text}&size=2"
    ).json()

time50 = timeit.timeit(make_request, number=50)
print(f"Time for 50 queries: {round(time50, 3)}s, mean time per query: {round(time50 / 50, 3)}s")
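A side note on methodology (my addition, not from the thread): the first request often pays one-off warm-up costs (connection setup, model loading on the server), so measuring with `timeit.repeat` and discarding the first batch usually gives more stable per-query numbers. `make_request` below is a stand-in for the HTTP call in the script above.

```python
import timeit

def make_request():
    # Stand-in for the real requests.get(...) call against the nboost proxy.
    sum(range(1000))

# Run 5 batches of 10 calls each; treat the first batch as warm-up and drop it.
batches = timeit.repeat(make_request, repeat=5, number=10)
steady = batches[1:]
print(f"mean time per query (steady state): {min(steady) / 10:.6f}s")
```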

Results of this script on an 8th-gen i7 are the following:
the mean time per query using nboost/pt-tinybert-msmarco is 0.54 seconds, while the mean time per query using nboost/pt-bert-base-uncased-msmarco is about 4 seconds. Both values are much higher than the ones in the benchmark table.

Could you please share the hardware specs on which you got the published results, and any recommendations for improving this time on CPU?

@kaykanloo
Contributor

For reference, here are the query times I get using the latest code from the repo:

[image: query-time measurements]

@vchulski
Author

@kaykanloo Thank you, I got results close to yours and was wondering what I was doing wrong to see such a difference from the reported times.

@pertschuk
Contributor

@kaykanloo @vchulski The numbers I posted are on a T4 GPU on Google Cloud.

The numbers you see on an AWS p3.2xlarge should be the most comparable to those, I would think.

The biggest discrepancy there is pt-tinybert-msmarco, so it seems like it's not actually the model inference on the GPU that's slowing things down.

I would be curious whether calling the model directly, like this:

from nboost.plugins import resolve_plugin

model_dir = 'nboost/pt-bert-base-uncased-msmarco'
model_cls = 'PtTransformersRerankPlugin'

reranker = resolve_plugin(model_cls, model_dir=model_dir)
ranks, scores = reranker.rank(query, question_texts, filter_results=filter_results)

has the same latency? There was an update to the networking code a while ago that may have slowed it down. Sorry if these numbers are not up to date.

@kaykanloo
Contributor

@pertschuk, I did some code profiling a few weeks ago to investigate the issue further, which resulted in my last pull request. The diagram below depicts the total CPU time spent in each function while processing 10 GET requests:
[image: NBoostProfiling diagram]
As you can see, the performance of the ML models is comparable to your results. In fact, the majority of CPU time is spent in the jsonpath-ng library's parser function. As you guessed, the networking code is slowing down the query response time.
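Since the profile points at jsonpath-ng's parser, one common mitigation (a sketch of the general pattern, not nboost's actual code) is to parse each path expression once and reuse the compiled result across requests. `expensive_parse` below is a hypothetical stand-in for `jsonpath_ng.parse`; the point is the `lru_cache` wrapper.

```python
from functools import lru_cache

def expensive_parse(expression: str):
    # Hypothetical stand-in for jsonpath_ng.parse: in the real library this
    # runs a grammar and dominates the CPU profile. Here we just split.
    return tuple(expression.split('.'))

@lru_cache(maxsize=None)
def cached_parse(expression: str):
    # Parse each distinct expression only once; repeat calls hit the cache,
    # so per-request cost drops to a dict lookup.
    return expensive_parse(expression)

first = cached_parse('hits.hits.[*]._source.passage')
second = cached_parse('hits.hits.[*]._source.passage')
assert first is second  # same cached object, no re-parse
```

Since search responses reuse the same handful of path expressions on every request, the cache hit rate approaches 100% after the first query.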
