
Add kvcache config for Mistral #1766

Merged
mgoin merged 3 commits into main from mistral-kvinject on Oct 28, 2023
Conversation

mgoin (Member) commented Oct 16, 2023

# Export the checkpoint to ONNX, then inject the KV cache into the graph
git clone https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha  # zephyr-7b-alpha is a Mistral-7b fine-tune
sparseml.transformers.export_onnx --model_path ./zephyr-7b-alpha --task text-generation --sequence_length 512 --trust_remote_code
cp deployment/model.onnx deployment/model-orig.onnx  # keep the original export as a backup
python ~/onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
>>> from deepsparse import TextGeneration
>>> model = TextGeneration(model="deployment")
>>> out = model("Once upon a time,", max_new_tokens=100)
>>> out.generations[0].text
'there was a young woman named Lily. She was a kind and gentle soul, with a heart full of love and compassion. Lily had always been fascinated by the natural world, and she spent most of her free time exploring the forests and fields around her home.\n\nOne day, as she was wandering through the woods, Lily stumbled upon a small clearing. In the center of the clearing, she saw a beautiful butterfly fluttering its wings. The butterfly was unlike any she had'

HF Baseline :)
[screenshot: Hugging Face baseline generation for the same prompt, Oct 16, 2023]
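
The baseline above should be reproducible with something like the following sketch (the exact generation settings behind the screenshot are an assumption):

```python
# Sketch of the Hugging Face baseline run; settings are assumptions.
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-alpha")
out = pipe("Once upon a time,", max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"])
```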

mgoin marked this pull request as ready for review October 16, 2023 18:33
dbogunowicz previously approved these changes Oct 16, 2023

dsikka (Contributor) left a comment


Can you add an explanation/comment explaining why this is needed and how it differs from the other Llama models?

Also, will this work for the 70b model? If so, can we rename the config to reflect the group of Llama models it covers?

@@ -138,6 +138,16 @@ class Config:
multiply_batch_by_num_att_heads=False,
)

MISTRAL_CONFIG = KeyValueCacheConfig(
model_name="mistral",
additional_transforms=AdditionalTransformsLLAMA,
Contributor:

Just an FYI: a warning will show up every time about the number of updated nodes if it isn't the standard count.

mgoin (Member, Author):

Yeah, we should fix this for all Llama models.

Contributor:

@dsikka could you elaborate?

dbogunowicz (Contributor):

@dsikka it should work for the 70b model, but I am not sure whether @mgoin has the capacity to test it. I'd take a leap of faith and merge it regardless.

mgoin (Member, Author) commented Oct 16, 2023

> @dsikka it should work for the 70b model, but I am not sure whether @mgoin has the capacity to test it. I'd take a leap of faith and merge it regardless.

Mistral actually uses a "completely" new model/config class, so its config.json holds a "MistralForCausalLM" architecture. Here is an example config. There is only one pretrained Mistral-arch model, and it is 7b.

Based on how we're currently architecting this, I assume we need a new config entry for every new "model_type" we see in config.json files. That makes Mistral support not directly connected to the Llama GQA support, aka the 70b model y'all are talking about. So it is disconnected in practice, but I will add a comment noting that Mistral is closely related to Llama in origin.

We should make a separate PR to support Llama models with GQA enabled.
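
For illustration, a minimal sketch of the per-model_type registry pattern described above; the class and names below are hypothetical stand-ins, not sparseml's actual code:

```python
# Hypothetical sketch: one KV-cache config entry per Hugging Face
# "model_type", resolved from the model's config.json. Illustrative only.
import json
from dataclasses import dataclass


@dataclass
class KeyValueCacheConfig:  # stand-in for the real class in configs.py
    model_name: str
    multiply_batch_by_num_att_heads: bool = False


KV_CACHE_CONFIGS = {
    "llama": KeyValueCacheConfig(model_name="llama"),
    "mistral": KeyValueCacheConfig(model_name="mistral"),
}


def get_kv_cache_config(model_path: str) -> KeyValueCacheConfig:
    """Pick the registered cache config based on config.json's model_type."""
    with open(f"{model_path}/config.json") as f:
        model_type = json.load(f)["model_type"]
    if model_type not in KV_CACHE_CONFIGS:
        raise ValueError(f"no KV-cache config for model_type={model_type!r}")
    return KV_CACHE_CONFIGS[model_type]
```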

dsikka (Contributor) commented Oct 17, 2023

@mgoin from what we've seen, though, it seems like the config you added could also be used for GQA?

mgoin (Member, Author) commented Oct 17, 2023

@dsikka yes, it should work for Llama models with GQA if the name is changed to "llama". My position is that Llama GQA support is a separate issue from Mistral support, but if y'all want to add it in this diff I'm good with that. I'm just not sure how to structure the change without affecting non-GQA Llama models.
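
For context on why GQA changes things: with grouped-query attention the KV cache stores only num_key_value_heads heads rather than num_attention_heads, so the cache tensor shapes differ from the standard multi-head case. A rough sketch with real head counts (field names follow the usual Hugging Face config conventions):

```python
# Rough illustration of a per-layer KV-cache shape under GQA.
# Mistral-7b uses 32 query heads but only 8 KV heads (head_dim = 128);
# Llama-2-70b similarly pairs 64 query heads with 8 KV heads.
batch, seq_len, head_dim = 1, 512, 128
num_attention_heads = 32   # query heads
num_key_value_heads = 8    # KV heads shared across groups of query heads

# The cache holds only the KV heads, so it is 4x smaller than a
# multi-head-attention cache that kept all 32 heads:
kv_cache_shape = (batch, num_key_value_heads, seq_len, head_dim)
print(kv_cache_shape)  # -> (1, 8, 512, 128)
```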

mgoin merged commit 955ae11 into main Oct 28, 2023
11 checks passed
mgoin deleted the mistral-kvinject branch October 28, 2023 22:56
rahul-tuli pushed a commit that referenced this pull request Oct 30, 2023
* Add kvcache config for Mistral

* Update configs.py

* Update configs.py
rahul-tuli added a commit that referenced this pull request Oct 31, 2023
* - Update `src/sparseml/modifiers/obcq/pytorch.py`
to use the layer prefix from the model
- Remove `layer_prefix` from `SparseGPTModifier` base
- Update ModelMetaData to include layer_prefix
- Added a convenience function to update missing
values in RecipeMetaData instance from another RecipeMetaData instance
- Update simplify recipe to also include metadata
- Update simplify_combine_recipes to include metadata
- Add layer_prefix property to `ModifiableModel`
- propagate `layer_prefix` to superclass
- update session.py to set_layer_prefix on the model
before initializing modifiers
- Update example recipe to include layer_prefix in metadata

* Add missing docstring

* - address review comment
- update docstring
- add test for `update_missing_metadata`

* Add test

* Style

* Fix tests

* Style

* [modifier refactor] Add constant pruning tests  (#1752)

* Initial commit

* Add end to end tests

* Add e2e tests for constant pruning modifier

* Move imports inside the test functions so
that torch isn't imported unless running the tests

* Update setup.py to not run modifier tests unless pytorch is specified

* [Bugfix] .dict() method on Recipe (#1753)

* Bugfix .dict() method on Recipe

* Remove extraneous local test, [faulty commit]

* [modifier refactor] Add serialization tests (#1755)

* Add serialization tests

* Clean up

* Keep original stage and group names
Clean up _get_yaml_dict

* fix comment

* Typo

* [Unit Tests][Modifier Refactor] (#1756)

* Move valid recipes to a helper file
Add tests for session.py

* Increase test coverage of src/sparseml/core/session.py
to 100%
Run Style
Add logs to .gitignore

* Increase coverage of tests/sparseml/core/test_state.py
to 100%

* add tests for lifecycle/event.py

* Increase code coverage of lifecycle/event to
100%

* increase lifecycle/session.py code coverage to 93%

* Address review comments from @Satrat

* Address review comments on 1752 (#1772)

Update makefile to only ignore *pytorch.py files in modifier dir
Fix order in test
Add regex to makefile
Add helper function to determine if torch tests should be run
Check masks
Make transformers import optional in sparsegpt.py

* Fix merge conflict

* Add more tests to check valid modifiers are created (#1774)

* [Bug][ConstantPruningModifier] Fix mask de register bug (#1773)

* Fix mask de-register logic

* forgot to remove commented out line

* Move tests inside pytorch directory as requested

* Fix session reset (#1790)

* fix datasets version to be compatible with fsspec (#1797)

* Add kvcache config for Mistral (#1766)

* Add kvcache config for Mistral

* Update configs.py

* Update configs.py

* Fix reset logic

* Style after resolving merge conflicts

---------

Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
bfineran pushed a commit that referenced this pull request Nov 16, 2023
bfineran pushed a commit that referenced this pull request Nov 16, 2023
bfineran pushed a commit that referenced this pull request Nov 16, 2023
bfineran pushed a commit that referenced this pull request Nov 16, 2023