[NLP] Evaluate batched inference calls singularly #2538
Merged
Batched inference calls use more memory, which can lead to OOM errors in extreme cases. This change iterates over the requests in a batch, evaluating them one at a time, as sketched below.
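A minimal sketch of the idea (not the actual ml-cpp code; the types and function names here are hypothetical): rather than handing the whole batch to a single inference call, loop over the requests and evaluate each one singularly, so peak memory stays close to that of a single request.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for one inference request and its result.
struct InferenceRequest {
    std::vector<int> tokenIds; // token IDs for one document
};

struct InferenceResult {
    std::vector<double> embedding; // model output for one document
};

// Stand-in for the model's forward pass on a single request.
InferenceResult evaluateSingle(const InferenceRequest& request) {
    // ... run the model on one request's tokens ...
    return InferenceResult{std::vector<double>(request.tokenIds.size(), 0.0)};
}

// Before: the whole batch was evaluated in one call, so peak memory grew
// with the batch size and could trigger OOM for large batches.
// After: iterate over the batch and evaluate each request one at a time.
std::vector<InferenceResult> evaluateBatch(const std::vector<InferenceRequest>& batch) {
    std::vector<InferenceResult> results;
    results.reserve(batch.size());
    for (const InferenceRequest& request : batch) {
        results.push_back(evaluateSingle(request));
    }
    return results;
}
```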
Benchmarking batched against un-batched evaluation shows that memory usage is significantly lower when requests are evaluated one at a time, while the total inference time is similar in both cases. The benchmark data was generated with the ELSER model using batches of different sizes; each item in a batch contained 512 tokens. Inference time is the time to process the entire batch, whether singularly or all at once.