
Add kvcache config for Mistral #1766

Merged
mgoin merged 3 commits into main from mistral-kvinject on Oct 28, 2023
Conversation

mgoin (Member) commented Oct 16, 2023

# Export the checkpoint to ONNX, then inject the KV cache into the graph
git clone https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha  # zephyr-7b-alpha is a Mistral-7b fine-tune
sparseml.transformers.export_onnx --model_path ./zephyr-7b-alpha --task text-generation --sequence_length 512 --trust_remote_code
cp deployment/model.onnx deployment/model-orig.onnx  # keep the original export as a backup
python ~/onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
>>> from deepsparse import TextGeneration
>>> model = TextGeneration(model="deployment")
>>> out = model("Once upon a time,", max_new_tokens=100)
>>> out.generations[0].text
'there was a young woman named Lily. She was a kind and gentle soul, with a heart full of love and compassion. Lily had always been fascinated by the natural world, and she spent most of her free time exploring the forests and fields around her home.\n\nOne day, as she was wandering through the woods, Lily stumbled upon a small clearing. In the center of the clearing, she saw a beautiful butterfly fluttering its wings. The butterfly was unlike any she had'

HF Baseline :)
[screenshot: Hugging Face baseline generation for the same prompt, Oct 16, 2023]
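
The baseline above should be reproducible with something like the following sketch (the exact generation settings behind the screenshot are an assumption):

```python
# Sketch of the Hugging Face baseline run; settings are assumptions.
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-alpha")
out = pipe("Once upon a time,", max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"])
```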

mgoin marked this pull request as ready for review October 16, 2023 18:33
dbogunowicz previously approved these changes Oct 16, 2023

dsikka (Contributor) left a comment


Can you add an explanation/comment explaining why this is needed and how it differs from the other Llama models?

Also, will this work for the 70b model? If so, can we rename the config to reflect the group of Llama models it covers?

@@ -138,6 +138,16 @@ class Config:
multiply_batch_by_num_att_heads=False,
)

MISTRAL_CONFIG = KeyValueCacheConfig(
model_name="mistral",
additional_transforms=AdditionalTransformsLLAMA,
Contributor:

Just an FYI: a warning will show up every time about the number of updated nodes if it isn't the standard count.

mgoin (Member, Author):

Yeah, we should fix this for all Llama models.

Contributor:

@dsikka could you elaborate?

dbogunowicz (Contributor):

@dsikka it should work for the 70b model, but I am not sure whether @mgoin has the capacity to test it. I'd take a leap of faith and merge it regardless.

mgoin (Member, Author) commented Oct 16, 2023

> @dsikka it should work for the 70b model, but I am not sure whether @mgoin has the capacity to test it. I'd take a leap of faith and merge it regardless.

Mistral actually uses a "completely" new model/config class, so its config.json holds a "MistralForCausalLM" architecture. Here is an example config. There is only one pretrained Mistral-arch model, and it is 7b.

Based on how we're currently architecting this, I assume we need a new config entry for every new "model_type" we see in config.json files. That makes Mistral support not directly connected to the Llama GQA support, aka the 70b model y'all are talking about. So it is disconnected in practice, but I will add a comment noting that Mistral is closely related to Llama in origin.

We should make a separate PR to support Llama models with GQA enabled.
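
For illustration, a minimal sketch of the per-model_type registry pattern described above; the class and names below are hypothetical stand-ins, not sparseml's actual code:

```python
# Hypothetical sketch: one KV-cache config entry per Hugging Face
# "model_type", resolved from the model's config.json. Illustrative only.
import json
from dataclasses import dataclass


@dataclass
class KeyValueCacheConfig:  # stand-in for the real class in configs.py
    model_name: str
    multiply_batch_by_num_att_heads: bool = False


KV_CACHE_CONFIGS = {
    "llama": KeyValueCacheConfig(model_name="llama"),
    "mistral": KeyValueCacheConfig(model_name="mistral"),
}


def get_kv_cache_config(model_path: str) -> KeyValueCacheConfig:
    """Pick the registered cache config based on config.json's model_type."""
    with open(f"{model_path}/config.json") as f:
        model_type = json.load(f)["model_type"]
    if model_type not in KV_CACHE_CONFIGS:
        raise ValueError(f"no KV-cache config for model_type={model_type!r}")
    return KV_CACHE_CONFIGS[model_type]
```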

dsikka (Contributor) commented Oct 17, 2023

@mgoin from what we've seen, though, it seems like the config you added could also be used for GQA?

mgoin (Member, Author) commented Oct 17, 2023

@dsikka yes, it should work for Llama models with GQA if the name is changed to "llama". My position is that Llama GQA support is a separate issue from Mistral support, but if y'all want to add it in this diff I'm good with that. I'm just not sure how to structure the change without affecting non-GQA Llama models.
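
For context on why GQA changes things: with grouped-query attention the KV cache stores only num_key_value_heads heads rather than num_attention_heads, so the cache tensor shapes differ from the standard multi-head case. A rough sketch with real head counts (field names follow the usual Hugging Face config conventions):

```python
# Rough illustration of a per-layer KV-cache shape under GQA.
# Mistral-7b uses 32 query heads but only 8 KV heads (head_dim = 128);
# Llama-2-70b similarly pairs 64 query heads with 8 KV heads.
batch, seq_len, head_dim = 1, 512, 128
num_attention_heads = 32   # query heads
num_key_value_heads = 8    # KV heads shared across groups of query heads

# The cache holds only the KV heads, so it is 4x smaller than a
# multi-head-attention cache that kept all 32 heads:
kv_cache_shape = (batch, num_key_value_heads, seq_len, head_dim)
print(kv_cache_shape)  # -> (1, 8, 512, 128)
```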

mgoin merged commit 955ae11 into main Oct 28, 2023
11 checks passed
mgoin deleted the mistral-kvinject branch October 28, 2023 22:56
rahul-tuli pushed a commit that referenced this pull request Oct 30, 2023
* Add kvcache config for Mistral

* Update configs.py

* Update configs.py
rahul-tuli added a commit that referenced this pull request Oct 31, 2023
* - Update `src/sparseml/modifiers/obcq/pytorch.py`
to use the layer prefix from the model
- Remove `layer_prefix` from `SparseGPTModifier` base
- Update ModelMetaData to include layer_prefix
- Added a convenience function to update missing
values in RecipeMetaData instance from another RecipeMetaData instance
- Update simplify recipe to also include metadata
- Update simplify_combine_recipes to include metadata
- Add layer_prefix property to `ModifiableModel`
- propagate `layer_prefix` to superclass
- update session.py to set_layer_prefix on the model
before initializing modifiers
- Update example recipe to include layer_prefix in metadata

* Add missing docstring

* - address review comment
- update docstring
- add test for `update_missing_metadata`

* Add test

* Style

* Fix tests

* Style

* [modifier refactor] Add constant pruning tests  (#1752)

* Initial commit

* Add end to end tests

* Add e2e tests for constant pruning modifier

* Move imports inside the test functions so
that torch isn't imported unless running the tests

* Update setup.py to not run modifier tests unless pytorch is specified

* [Bugfix] .dict() method on Recipe (#1753)

* Bugfix .dict() method on Recipe

* Remove extraneous local test, [faulty commit]

* [modifier refactor] Add serialization tests (#1755)

* Add serialization tests

* Clean up

* Keep original stage and group names
Clean up _get_yaml_dict

* fix comment

* Typo

* [Unit Tests][Modifier Refactor] (#1756)

* Move valid recipes to a helper file
Add tests for session.py

* Increase test coverage of src/sparseml/core/session.py
to 100%
Run Style
Add logs to .gitignore

* Increase coverage of tests/sparseml/core/test_state.py
to 100%

* add tests for lifecycle/event.py

* Increase code coverage of lifecycle/event to
100%

* increase lifecycle/session.py code coverage to 93%

* Address review comments from @Satrat

* Address review comments on 1752 (#1772)

Update makefile to only ignore *pytorch.py files in modifier dir
Fix order in test
Add regex to makefile
Add helper function to determine if torch tests should be run
Check masks
Make transformers import optional in sparsegpt.py

* Fix merge conflict

* Add more tests to check valid modifiers are created (#1774)

* [Bug][ConstantPruningModifier] Fix mask de register bug (#1773)

* Fix mask de-register logic

* forgot to remove commented out line

* Move tests inside pytorch directory as requested

* Fix session reset (#1790)

* fix datasets version to be compatible with fsspec (#1797)

* Add kvcache config for Mistral (#1766)

* Add kvcache config for Mistral

* Update configs.py

* Update configs.py

* Fix reset logic

* Style after resolving merge conflicts

---------

Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
bfineran pushed a commit that referenced this pull request Nov 16, 2023
bfineran pushed a commit that referenced this pull request Nov 16, 2023
bfineran pushed a commit that referenced this pull request Nov 16, 2023
bfineran pushed a commit that referenced this pull request Nov 16, 2023