
Speed of nboost #68

Open
vchulski opened this issue May 13, 2020 · 4 comments

Comments

@vchulski

I have a question about query processing speed.
I followed the installation guide and wrote a small script to test nboost:

import timeit
import requests

source_query_text = "vegas"

def make_request():
    # Query Elasticsearch through the nboost proxy and parse the JSON reply.
    requests.get(
        f"http://localhost:8000/travel/_search?pretty&q={source_query_text}&size=2"
    ).json()

time50 = timeit.timeit(make_request, number=50)
print(f"Time for 50 queries: {round(time50, 3)}s, mean time per query: {round(time50 / 50, 3)}s")
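A side note on methodology (my addition, not from the thread): the first request often pays one-off warm-up costs (connection setup, model loading on the server), so measuring with `timeit.repeat` and discarding the first batch usually gives more stable per-query numbers. `make_request` below is a stand-in for the HTTP call in the script above.

```python
import timeit

def make_request():
    # Stand-in for the real requests.get(...) call against the nboost proxy.
    sum(range(1000))

# Run 5 batches of 10 calls each; treat the first batch as warm-up and drop it.
batches = timeit.repeat(make_request, repeat=5, number=10)
steady = batches[1:]
print(f"mean time per query (steady state): {min(steady) / 10:.6f}s")
```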

Results of this script on an 8th-gen i7 are the following:
the mean time per query using nboost/pt-tinybert-msmarco is 0.54 seconds, while the mean time per query using nboost/pt-bert-base-uncased-msmarco is about 4 seconds. Both values are much higher than the ones in the benchmark table.

Could you please share the hardware specs on which you got the published results, and any recommendations for improving this time on CPU?

@kaykanloo
Contributor

For reference, here are the query times I get using the latest code from the repo:

[image: query-time measurements]

@vchulski
Author

@kaykanloo Thank you, I got results close to yours and was wondering what I was doing wrong to see such a difference from the reported times.

@pertschuk
Contributor

@kaykanloo @vchulski The numbers I posted are on a T4 GPU on Google Cloud.

The numbers you see on an AWS p3.2xlarge should be the most comparable to those, I would think.

The biggest discrepancy there is pt-tinybert-msmarco, so it seems like it's not actually the model inference on the GPU that's slowing things down.

I would be curious whether calling the model directly, like this:

from nboost.plugins import resolve_plugin

model_dir = 'nboost/pt-bert-base-uncased-msmarco'
model_cls = 'PtTransformersRerankPlugin'

reranker = resolve_plugin(model_cls, model_dir=model_dir)
ranks, scores = reranker.rank(query, question_texts, filter_results=filter_results)

has the same latency? There was an update to the networking code a while ago that may have slowed it down. Sorry if these numbers are not up to date.

@kaykanloo
Contributor

@pertschuk, I did some code profiling a few weeks ago to investigate the issue further, which resulted in my last pull request. The diagram below depicts the total CPU time spent in each function while processing 10 GET requests:
[image: NBoostProfiling diagram]
As you can see, the performance of the ML models is comparable to your results. In fact, the majority of CPU time is spent in the jsonpath-ng library's parser function. As you guessed, the networking code is slowing down the query response time.
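Since the profile points at jsonpath-ng's parser, one common mitigation (a sketch of the general pattern, not nboost's actual code) is to parse each path expression once and reuse the compiled result across requests. `expensive_parse` below is a hypothetical stand-in for `jsonpath_ng.parse`; the point is the `lru_cache` wrapper.

```python
from functools import lru_cache

def expensive_parse(expression: str):
    # Hypothetical stand-in for jsonpath_ng.parse: in the real library this
    # runs a grammar and dominates the CPU profile. Here we just split.
    return tuple(expression.split('.'))

@lru_cache(maxsize=None)
def cached_parse(expression: str):
    # Parse each distinct expression only once; repeat calls hit the cache,
    # so per-request cost drops to a dict lookup.
    return expensive_parse(expression)

first = cached_parse('hits.hits.[*]._source.passage')
second = cached_parse('hits.hits.[*]._source.passage')
assert first is second  # same cached object, no re-parse
```

Since search responses reuse the same handful of path expressions on every request, the cache hit rate approaches 100% after the first query.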
