[Pipeline Refactor] Additional Operators, Route update and completed generation functionality #1356

dsikka · 2023-10-26T02:20:30Z

Summary

Adds further operators specific to generating a new token
Updates the router to account for the new steps

Testing

So far tested locally using different inputs

Example:

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline

prompt = "Hello there, how are you?"
model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=16)
input_values = TextGenerationInput(prompt=prompt)
output = pipeline(input_values)
print(output)

Output:

created=datetime.datetime(2023, 10, 26, 23, 8, 51, 790177) prompts='Hello there, how are you?' generations=[GeneratedText(text=" I'm just so happy to see you. I", score=None, finished=True, finished_reason='length')]

bfineran

looks great - really like how we're able to build cleanly on top of the base and stick within the framework. only part that I'm still working through is the use of state.

will be great once we can get each of these operators unit tested

src/deepsparse/v2/text_generation/compile_generated_tokens.py

src/deepsparse/v2/text_generation/generate_new_token.py

bfineran · 2023-11-02T13:48:16Z

src/deepsparse/v2/text_generation/generate_new_token.py

+            finish_reason = FinishReason.CALLBACK
+
+        max_tokens = inference_state.current_state.get("max_tokens")
+        if len(inference_state.current_state.get("generated_tokens")) + 1 == max_tokens:


is there a reason we don't go from the generation config directly here?
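For context, the stop check under discussion can be sketched in isolation. This is a minimal, self-contained illustration rather than the code in generate_new_token.py: `FinishReason.CALLBACK` and the `max_tokens` / `generated_tokens` state keys come from the diff above, while `FinishReason.STOP`, `FinishReason.LENGTH`, the `check_finished` helper, and its arguments are assumptions made for the example.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class FinishReason(Enum):
    # CALLBACK appears in the diff; the other members are assumed here.
    STOP = "stop"
    LENGTH = "length"
    CALLBACK = "callback"


def check_finished(
    new_token: int,
    generated_tokens: Sequence[int],
    max_tokens: Optional[int],
    stop_token_ids: Sequence[int] = (),
    callback: Optional[Callable[[int], bool]] = None,
) -> Optional[FinishReason]:
    """Return why generation should stop after new_token, or None to continue."""
    finish_reason = None
    if new_token in stop_token_ids:
        finish_reason = FinishReason.STOP
    # Hypothetical convention: a callback returning False requests a stop.
    if callback is not None and callback(new_token) is False:
        finish_reason = FinishReason.CALLBACK
    # generated_tokens does not yet include new_token, hence the + 1.
    if max_tokens is not None and len(generated_tokens) + 1 == max_tokens:
        finish_reason = FinishReason.LENGTH
    return finish_reason


# Mirrors the state lookups in the diff, using a plain dict as a stand-in.
state = {"max_tokens": 3, "generated_tokens": [11, 12]}
reason = check_finished(13, state["generated_tokens"], state["max_tokens"])
print(reason)  # FinishReason.LENGTH
```

Passing `max_tokens` in as an argument keeps the sketch indifferent to where the value lives, whether in the inference state or in a generation config, which is the question raised above.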

src/deepsparse/v2/text_generation/pipeline.py

dbogunowicz · 2023-11-03T08:29:53Z

src/deepsparse/v2/text_generation/__init__.py

@@ -13,12 +13,19 @@
 # limitations under the License.
 # flake8: noqa
 from .autoregressive_preprocess_operator import *


autoregressive_preprocess_operator essentially does the same thing as multi_engine_prefill_operator, but they have slightly different names. Let's standardize it.

Also, why are some scripts named [name].py and others [name]_operator.py? This suggests that some scripts contain operators and some don't, which is a false assumption here. Let's standardize this as well.

It would also be great to arrange those scripts in subdirectories, so that they are logically grouped and easier to parse.

Until we've unit tested everything, the design will change. Happy to harden names and file locations once we've done that.

dbogunowicz · 2023-11-03T08:30:38Z

src/deepsparse/v2/text_generation/autoregressive_preprocess_operator.py

@@ -51,16 +50,19 @@ def can_operate(self, inp: Any) -> bool:
        tokens = inp.get("tokens")
        kv_cache = inp.get("kv_cache")

+        if inp.get("in_generation"):


what is "in_generation"?

flag to figure out if we're in the generation loop or prompt inference
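As an illustration of how such a flag could steer routing between the two phases, here is a minimal sketch. It is not the operator's actual logic: only the `tokens`, `kv_cache`, and `in_generation` keys come from the diff above. The two functions, the `FakeKVCache` class with its `processed` counter, and the `prompt_sequence_length` threshold rule are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FakeKVCache:
    # Hypothetical stand-in: counts how many prompt tokens were already processed.
    processed: int = 0


def single_token_can_operate(inp: Dict[str, Any], prompt_sequence_length: int) -> bool:
    """Hypothetical check for the one-token-at-a-time (autoregressive) path."""
    # Inside the generation loop every step feeds exactly one new token,
    # so the flag alone is enough to pick this path.
    if inp.get("in_generation"):
        return True
    # During prompt inference, use this path only for leftover tokens that
    # do not fill a whole multi-token chunk (assumed rule).
    remaining = len(inp["tokens"]) - inp["kv_cache"].processed
    return 0 < remaining < prompt_sequence_length


def multi_token_can_operate(inp: Dict[str, Any], prompt_sequence_length: int) -> bool:
    """Hypothetical check for the chunked prompt-prefill path."""
    if inp.get("in_generation"):
        return False
    remaining = len(inp["tokens"]) - inp["kv_cache"].processed
    return remaining >= prompt_sequence_length


prompt_step = {"tokens": list(range(20)), "kv_cache": FakeKVCache(processed=0)}
generation_step = {**prompt_step, "in_generation": True}
print(multi_token_can_operate(prompt_step, 16))       # True
print(single_token_can_operate(prompt_step, 16))      # False
print(single_token_can_operate(generation_step, 16))  # True
```

Checking the flag first means the generation loop never has to re-derive its phase from token counts, which is one plausible reason to carry it in the operator input.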

dbogunowicz · 2023-11-03T08:32:01Z

src/deepsparse/v2/text_generation/generate_new_token.py

+class GenerateNewTokenOperator(Operator):
+    def __init__(
+        self, tokenizer: transformers.PreTrainedTokenizerBase, force_max_tokens: bool
+    ):


Can we make sure that we have all the docstrings available for the new classes/methods?

yes, see comment above about doing this in a follow-up

dbogunowicz · 2023-11-03T08:33:00Z

src/deepsparse/v2/text_generation/nl_engine_operator.py

@@ -36,6 +36,7 @@ class NlEngineInput(BaseModel):
    engine_inputs: List = Field(description="engine inputs")
    kv_cache: Any = Field(description="kv_cache object")
    tokens: List = Field(description="tokens")
+    in_generation: bool = Field(description="in_generation", default=None)


This is roughly the final version of the code, right? Let's make sure that the descriptions are informative.

No, not until we've unit tested everything.

src/deepsparse/v2/text_generation/pipeline.py

src/deepsparse/v2/text_generation/process_inputs.py

* Pipelines Refactor - Initial Impl (#1287) * [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325) * initial functionality and working example with image classification * remove testing image * update args * initial functionality and working example with image classification * remove testing image * pr comments * defines schemas for operators and test * add image classification test, PR comments * fix input/output handling in pipeline and operator base classes to be more generic; remove context * add additional operator input message * typo fix * [v2] EngineOperator updates to make continuous batching easier (#1371) * [v2] EngineOperator updates to make continuous batching easier * test fixes * [Pipeline Refactor] Update routes, text generation initial functionality (#1348) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * initial functionality and working example with image classification * updates func * prompt inference, initial functionality * finish generation operators and update routes * further breakdown operators * add 
operators * fix can_operate condition * update can_operate to not rely on the inference_state * rebase + update * fix condition * fix capacity settting again * typo fixes * [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384) * add split/join functionality * update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function * process multiple generations * move map to base class * [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392) * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * fix name * [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373) * [Continuous Batching] Queue Implementation to support batching grouping and prioritization * has_key method * thread safety * add blocking option for pop_batch * update docstring * allow mutex to be shared across continuous batching objects * revert last commit * [Continuous Batching] Executor thread for running continuous batching (#1374) * [Continuous Batching] Executor thread for running continuous batching * quality * ensure that executor stops when main thread does - clean up test hack * [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375) * [ContinuousBatching] ContinuousBatchingScheduler Implementation * cleanup unnecessary stop condition * [continuous batching] singleton pattern for scheduler (#1391) * [continuous batching] singleton pattern for scheduler * catch from review * [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364) * rebasing off my initial commit * cleanups * unit testing for text generation operators * additional changes * unit testing completion * remove debug * 
fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * Delete tests/deepsparse/v2/unit/text_generation/test_msic.py --------- Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> * [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394) * add split/join functionality * update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function * process multiple generations * initial commit * fix error * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * Delete tests/deepsparse/v2/unit/text_generation/test_msic.py * pipeline runs, but incorrectly * Revert "pipeline runs, but incorrectly" This reverts commit 51c4ee6. 
* PR review comments --------- Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> * [Text Generation][V2] End-to-end tests (#1402) * initial commit * initial commit * its working now * beautification * thank you Dipika <3 * ready to review * [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409) * update split/join * use map * update * run end-to-end * clean-up * fix bug with batch size, introduce SplitRoute dataclass * update tests to use new inputs/outputs * use the normal scheduler for internal kv_cache * add pipeline inpuits * clean-up * change engine type, update docstrings, update override function to be more generic * move subgraph functionality to its own function; clean-up cont batching in text gen pipeline * update linear pathway to also use subgraph execution * rebase fix * fix tests * [Pipeline Refactor] Operator Registry (#1420) * initial registry functionality * use sparsezoo mixin * [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution (#1453) * fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type * fix warning message * [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457) * add pipeline create method for pipeline creation using the operator registry * add instance check * [Pipeline Refactor] async (#1380) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * initial functionality and working example with image classification * updates func * prompt inference, initial functionality * finish generation operators and update routes * further 
breakdown operators * add operators * fix can_operate condition * update can_operate to not rely on the inference_state * rebase + update * fix condition * async initial functionality * fix capacity settting again * add blocking * more testing * update to use split/join * fix * rebase fix * remove index * change event loop * rebase fix * update async run to use new operator scheduling properly * rebase fixes (#1458) * more fixes (#1459) --------- Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>


* Pipelines Refactor - Initial Impl (#1287) * [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325) * initial functionality and working example with image classification * remove testing image * update args * initial functionality and working example with image classification * remove testing image * pr comments * defines schemas for operators and test * add image classification test, PR comments * fix input/output handling in pipeline and operator base classes to be more generic; remove context * add additional operator input message * typo fix * [v2] EngineOperator updates to make continuous batching easier (#1371) * [v2] EngineOperator updates to make continuous batching easier * test fixes * [Pipeline Refactor] Update routes, text generation initial functionality (#1348) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * initial functionality and working example with image classification * updates func * prompt inference, initial functionality * finish generation operators and update routes * further breakdown operators * add 
operators * fix can_operate condition * update can_operate to not rely on the inference_state * rebase + update * fix condition * fix capacity settting again * typo fixes * add split/join functionality * update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function * process multiple generations * initial commit * fix error * [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384) * add split/join functionality * update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function * process multiple generations * move map to base class * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392) * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * fix name * Delete tests/deepsparse/v2/unit/text_generation/test_msic.py * [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373) * [Continuous Batching] Queue Implementation to support batching grouping and prioritization * has_key method * thread safety * add blocking option for pop_batch * update docstring * allow mutex to be shared across continuous batching objects * revert last commit * [Continuous Batching] Executor thread for running continuous batching (#1374) * [Continuous Batching] Executor thread for running continuous batching * quality * ensure that executor stops when main thread does - clean up test hack * 
[ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375) * [ContinuousBatching] ContinuousBatchingScheduler Implementation * cleanup unnecessary stop condition * [continuous batching] singleton pattern for scheduler (#1391) * [continuous batching] singleton pattern for scheduler * catch from review * [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364) * rebasing off my initial commit * cleanups * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * Delete tests/deepsparse/v2/unit/text_generation/test_msic.py --------- Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> * pipeline runs, but incorrectly * it works for a single sequence * cleanup. now lets figure out how to run multiple sequences * [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394) * add split/join functionality * update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function * process multiple generations * initial commit * fix error * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * Delete tests/deepsparse/v2/unit/text_generation/test_msic.py * pipeline runs, but incorrectly * Revert "pipeline runs, but incorrectly" This reverts commit 51c4ee6. 
* PR review comments --------- Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> * [Text Generation][V2] End-to-end tests (#1402) * initial commit * initial commit * its working now * beautification * thank you Dipika <3 * ready to review * integration tests pass * [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409) * update split/join * use map * update * run end-to-end * clean-up * fix bug with batch size, introduce SplitRoute dataclass * update tests to use new inputs/outputs * use the normal scheduler for internal kv_cache * add pipeline inpuits * clean-up * change engine type, update docstrings, update override function to be more generic * move subgraph functionality to its own function; clean-up cont batching in text gen pipeline * update linear pathway to also use subgraph execution * rebase fix * fix tests * [Pipeline Refactor] Operator Registry (#1420) * initial registry functionality * use sparsezoo mixin * fix tricky rebase * one more cleanup * got tests to work after rebase. implementing SPLIT and JOIN in linearouter now * pipeline working, with GraphRouter. 
Needs some more testing * ready for review * cleanup * simplify after PR review round * [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution (#1453) * fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type * fix warning message * [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457) * add pipeline create method for pipeline creation using the operator registry * add instance check * [Pipeline Refactor] async (#1380) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * initial functionality and working example with image classification * updates func * prompt inference, initial functionality * finish generation operators and update routes * further breakdown operators * add operators * fix can_operate condition * update can_operate to not rely on the inference_state * rebase + update * fix condition * async initial functionality * fix capacity settting again * add blocking * more testing * update to use split/join * fix * rebase fix * remove index * change event loop * rebase fix * update async run to use new operator scheduling properly * rebase fixes (#1458) * more fixes (#1459) * bring back functionalities that were lost in v2 during rebasing * Update src/deepsparse/transformers/helpers.py * ready for review * bring tests back" * quality * original readme * addressing Dipikas comments * Update src/deepsparse/transformers/pipelines/text_generation/pipeline_no_kv_cache.py * addressing PR review --------- Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com> 
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

dsikka marked this pull request as ready for review October 27, 2023 03:04

dsikka changed the title ~~[Pipeline Refactor] Additional Operators, Route Update and Generation functionality~~ [Pipeline Refactor] Additional Operators, Route update and completed generation functionality Oct 27, 2023

dsikka requested review from bfineran, dbogunowicz, rahul-tuli and Satrat October 27, 2023 03:13

dsikka mentioned this pull request Oct 27, 2023

[Pipeline Refactor] Feature branch for v2 text-generation #1358

Closed

dsikka force-pushed the features/v2/prompt_inference branch from 3721907 to 6007a75 Compare November 1, 2023 00:50

dsikka added 11 commits November 1, 2023 12:36

initial functionality and working example with image classification

6c75b65

remove testing image

75de103

rebase fixes

aa5d885

initial functionality and working example with image classification

8cc63ee

text gen

ab2b711

updates func

00cb85e

prompt inference, initial functionality

5cf4b3f

remove image; update state docstring

1b951dc

Fix typo

809cfc1

add todo for split/join

6336d8e

remove context, clean-up args, remove prefill_preprocess_operaator

3f2193d

dsikka force-pushed the features/v2/prompt_inference branch from 625a1c3 to 3f2193d Compare November 1, 2023 16:45

dsikka added 10 commits November 1, 2023 14:13

fix docstrings

216ceea

initial functionality and working example with image classification

02b74d4

updates func

37f090c

prompt inference, initial functionality

7bd25da

finish generation operators and update routes

98bc123

further breakdown operators

ef8277b

add operators

664abdd

fix can_operate condition

754ce2c

update can_operate to not rely on the inference_state

ed7cd58

rebase + update

5d56421

dsikka force-pushed the features/v2/generation branch from 50d0aa3 to 5d56421 Compare November 1, 2023 20:38

bfineran reviewed Nov 2, 2023


dsikka added 2 commits November 2, 2023 13:14

fix condition

5086e1f

fix capacity settting again

740eb67

Base automatically changed from features/v2/prompt_inference to v2 November 3, 2023 00:47

Merge branch 'v2' into features/v2/generation

c991c30

dbogunowicz reviewed Nov 3, 2023


typo fixes

209c0ec

dsikka merged commit 59457b7 into v2 Nov 3, 2023

dsikka deleted the features/v2/generation branch November 3, 2023 15:15
