Updated llm demo to report number of generated tokens in the last response #2373
Conversation
demos/python_demos/llm_text_generation/servable_stream/model.py (Outdated)
```diff
 def generate():
-    ov_model_exec.generate(**tokens, **generate_kwargs)
+    result = ov_model_exec.generate(**tokens, **generate_kwargs)
```
What's the reason we do not include special tokens? What are those special tokens?
In this case the special tokens are used for left/right padding. They are stripped to ensure accurate token counting when batch_size > 1.
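For illustration, a minimal sketch of that counting logic (assumed HF-style `input_ids` and a hypothetical `pad_token_id`, not the demo's exact code): padding tokens inflate the raw length of every padded sequence, so they are masked out before counting.

```python
import numpy as np

def count_real_tokens(input_ids: np.ndarray, pad_token_id: int) -> int:
    # input_ids: (batch_size, seq_len); shorter sequences are padded on the
    # left/right up to seq_len, and pad positions must not be counted.
    return int((input_ids != pad_token_id).sum())

ids = np.array([[0, 0, 5, 6], [7, 8, 9, 10]])  # batch of 2, left-padded with 0
print(count_real_tokens(ids, pad_token_id=0))  # -> 6
```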
include output changes in the readme
```diff
@@ -181,6 +181,7 @@ def generate():
     for partial_result in streamer:
         yield serialize_completions(batch_size, partial_result)
     t1.join()
+    token_count[0] -= len(tokens["input_ids"].flatten())
     yield [Tensor("token_count", np.array([token_count[0]], dtype=np.int32))]
```
why only the 1st element of token_count? Was it tested with bs>1 and variable response sizes?
The first element of token_count represents the final number of tokens. This snippet uses a list instead of a single variable because Python lists are mutable, so the nested function can update the shared value in place.
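A minimal sketch of that pattern (hypothetical names, not the demo's exact code): the one-element list acts as a shared mutable cell, so a callback running on another thread can update the count in place, whereas rebinding a plain int inside the nested function would merely shadow it unless declared `nonlocal`.

```python
from threading import Thread

def generate_with_count():
    token_count = [0]  # one-element list acting as a shared mutable cell

    def on_token(token_id):
        token_count[0] += 1  # mutates the shared cell in place

    worker = Thread(target=lambda: [on_token(t) for t in range(5)])
    worker.start()
    worker.join()
    return token_count[0]

print(generate_with_count())  # -> 5
```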
why are you re-creating a numpy array here? token_count is an array anyway
I currently do not understand the value in having token reporting fragmented per batch; it would be a simple change.
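If per-sequence reporting were wanted, a hedged sketch of what it might look like (assumed output shapes and a hypothetical `pad_token_id`, not code from the PR) is one count per batch item instead of a single aggregated number:

```python
import numpy as np

def per_sequence_counts(output_ids: np.ndarray, pad_token_id: int) -> np.ndarray:
    # output_ids: (batch_size, seq_len); count non-pad tokens per row
    return (output_ids != pad_token_id).sum(axis=1).astype(np.int32)
```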
```diff
-    return serialize_completions(batch_size, completions)
     t1.join()
+    return serialize_completions(batch_size, completions, token_count[0])
```
I'm not sure why you are working on a numpy array just to return a number in line 191. Then, in serialize_completions, you create a numpy array anyway: Tensor("token_count", np.array([token_count], dtype=np.int32)).
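A hedged sketch of the simplification suggested here (assuming `Tensor` is the servable's tensor type, as in the call quoted above): keep `token_count` a plain int end to end and create the numpy array only once, at serialization time.

```python
import numpy as np
from pyovms import Tensor  # assumed import, as used by the OVMS Python servables

def serialize_token_count(token_count: int) -> Tensor:
    # the single place where the count becomes a numpy array
    return Tensor("token_count", np.array([token_count], dtype=np.int32))
```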
@bstrzele force-pushed from 22b0135 to a899698, and later from 3c84f44 to 94f2cc8.
CVS-135176