[Pipeline Refactor] Additional Operators, Route update and completed generation functionality #1356

Merged: 25 commits merged on Nov 3, 2023

Contributor

@dsikka dsikka commented Oct 26, 2023

Summary

  • Adds further operators specific to generating a new token
  • Updates the router to account for the new steps

Testing

  • So far tested locally using different inputs

Example:

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline

prompt = "Hello there, how are you?"
model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=16)
input_values = TextGenerationInput(prompt=prompt)
output = pipeline(input_values)
print(output)

Output:

created=datetime.datetime(2023, 10, 26, 23, 8, 51, 790177) prompts='Hello there, how are you?' generations=[GeneratedText(text=" I'm just so happy to see you. I", score=None, finished=True, finished_reason='length')]

@dsikka dsikka marked this pull request as ready for review October 27, 2023 03:04
@dsikka dsikka changed the title [Pipeline Refactor] Additional Operators, Route Update and Generation functionality [Pipeline Refactor] Additional Operators, Route update and completed generation functionality Oct 27, 2023
@dsikka dsikka force-pushed the features/v2/prompt_inference branch from 3721907 to 6007a75 Compare November 1, 2023 00:50
@dsikka dsikka force-pushed the features/v2/prompt_inference branch from 625a1c3 to 3f2193d Compare November 1, 2023 16:45
Member

@bfineran bfineran left a comment

looks great - really like how we're able to build cleanly on top of the base and stick within the framework. only part that I'm still working through is the use of state.

will be great once we can get each of these operators unit tested

src/deepsparse/v2/text_generation/generate_new_token.py (outdated)
finish_reason = FinishReason.CALLBACK

max_tokens = inference_state.current_state.get("max_tokens")
if len(inference_state.current_state.get("generated_tokens")) + 1 == max_tokens:
Member

is there a reason we don't go from the generation config directly here?
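
For context on the check being discussed, here is a minimal, self-contained sketch of a length-based stop condition. The `check_finished` helper, its parameters, and the enum values are hypothetical stand-ins rather than the code in `generate_new_token.py`; the sketch only mirrors the `len(generated_tokens) + 1 == max_tokens` comparison from the diff and assumes that a callback returning `False` asks generation to stop.

```python
from enum import Enum
from typing import Callable, List, Optional


class FinishReason(Enum):
    # Hypothetical values; the PR only shows the CALLBACK member and a
    # 'length' finish reason in the example output.
    CALLBACK = "callback"
    LENGTH = "length"


def check_finished(
    generated_tokens: List[int],
    new_token: int,
    max_tokens: int,
    callback: Optional[Callable[[int], bool]] = None,
) -> Optional[FinishReason]:
    """Return why generation should stop after new_token, or None to continue."""
    # Assumption: a callback that returns False requests a stop.
    if callback is not None and not callback(new_token):
        return FinishReason.CALLBACK
    # Same comparison as in the diff: the new token has not been appended
    # to generated_tokens yet, so it is counted with the "+ 1".
    if len(generated_tokens) + 1 == max_tokens:
        return FinishReason.LENGTH
    return None
```

Under these assumptions, the comparison is the same whether `max_tokens` is read from the inference state or from the generation config; only the lookup would change.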

Base automatically changed from features/v2/prompt_inference to v2 November 3, 2023 00:47
@@ -13,12 +13,19 @@
# limitations under the License.
# flake8: noqa
from .autoregressive_preprocess_operator import *
Contributor

autoregressive_preprocess_operator essentially does the same thing as multi_engine_prefill_operator, but they have slightly different names. Let's standardize it.

Also, why are some scripts named [name].py and others [name]_operator.py? This suggests that some scripts contain operators and some don't, which is a false assumption here. Let's standardize this as well.

Also, it would be great to arrange those scripts in subdirectories, so that they are logically grouped and easier to parse.

Contributor Author

Until we've unit tested everything, the design will change. Happy to harden names and file locations once we've done that.

@@ -51,16 +50,19 @@ def can_operate(self, inp: Any) -> bool:
        tokens = inp.get("tokens")
        kv_cache = inp.get("kv_cache")

        if inp.get("in_generation"):
Contributor

what is "in_generation"?

Contributor Author

flag to figure out if we're in the generation loop or prompt inference
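
To illustrate the idea, here is a minimal sketch of how such a flag could separate the two phases when a router asks each operator whether it can handle an input. The class names, the `pick_operator` helper, and the dictionary-based input are hypothetical simplifications, not the operators or router in this PR.

```python
from typing import Any, Dict, List


class PromptOperator:
    """Hypothetical operator that handles the initial prompt pass."""

    def can_operate(self, inp: Dict[str, Any]) -> bool:
        # Runs only before the generation loop has started; a missing or
        # None flag is treated as "not in generation".
        return not inp.get("in_generation")


class GenerationOperator:
    """Hypothetical operator that handles one step of the generation loop."""

    def can_operate(self, inp: Dict[str, Any]) -> bool:
        # Runs only once the flag marks the input as part of the loop.
        return bool(inp.get("in_generation"))


def pick_operator(operators: List[Any], inp: Dict[str, Any]) -> Any:
    """Return the first operator that reports it can handle the input."""
    for op in operators:
        if op.can_operate(inp):
            return op
    raise ValueError("no operator can handle this input")


operators = [PromptOperator(), GenerationOperator()]
prompt_op = pick_operator(operators, {"tokens": [1, 2, 3]})
loop_op = pick_operator(operators, {"tokens": [1, 2, 3, 4], "in_generation": True})
print(type(prompt_op).__name__, type(loop_op).__name__)
```

Because the flag is the only thing the two `can_operate` methods inspect in this sketch, the same input shape can be routed to either phase without consulting any shared state.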

class GenerateNewTokenOperator(Operator):
    def __init__(
        self, tokenizer: transformers.PreTrainedTokenizerBase, force_max_tokens: bool
    ):
Contributor

Can we make sure that we have all the docstrings available for the new classes/methods?

Contributor Author

yes, see comment above about doing this in a follow-up

@@ -36,6 +36,7 @@ class NlEngineInput(BaseModel):
    engine_inputs: List = Field(description="engine inputs")
    kv_cache: Any = Field(description="kv_cache object")
    tokens: List = Field(description="tokens")
    in_generation: bool = Field(description="in_generation", default=None)
Contributor

This is roughly the final version of the code, right? Let's make sure that the descriptions are informative.

Contributor Author

No, not until we've unit tested everything.

@dsikka dsikka merged commit 59457b7 into v2 Nov 3, 2023
@dsikka dsikka deleted the features/v2/generation branch November 3, 2023 15:15
bfineran added a commit that referenced this pull request Dec 6, 2023
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

---------

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
dbogunowicz pushed a commit that referenced this pull request Dec 18, 2023
…generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes
dbogunowicz added a commit that referenced this pull request Jan 2, 2024
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* pipeline runs, but incorrectly

* it works for a single sequence

* cleanup. now lets figure out how to run multiple sequences

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* integration tests pass

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* fix tricky rebase

* one more cleanup

* got tests to work after rebase. implementing SPLIT and JOIN in linearouter now

* pipeline working, with GraphRouter. Needs some more testing

* ready for review

* cleanup

* simplify after PR review round

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

* bring back functionalities that were lost in v2 during rebasing

* Update src/deepsparse/transformers/helpers.py

* ready for review

* bring tests back"

* quality

* original readme

* addressing Dipikas comments

* Update src/deepsparse/transformers/pipelines/text_generation/pipeline_no_kv_cache.py

* addressing PR review

---------

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>