support automatic mixed bits assignment #851
Merged

Changes from all commits (118 commits)
6ffcf60  try to enable auto_scheme API (wenhuach21)
5d80825  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
a4ef495  update a little (wenhuach21)
4173c3e  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
87e9454  update a little (wenhuach21)
f86eedb  Merge branch 'main' into auto_scheme (wenhuach21)
242d1ee  try to refine parse layer config code (wenhuach21)
4fc6b64  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
63de904  Merge branch 'main' into auto_scheme (wenhuach21)
bb4d4ca  Merge branch 'main' into auto_scheme (wenhuach21)
7f76db2  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
ae8837b  fix (wenhuach21)
44ca92d  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
531224d  fix (wenhuach21)
c9fa408  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6453200  fix (wenhuach21)
5b2dd60  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
3811010  tmp_change (wenhuach21)
4de7b08  commit (wenhuach21)
a9f0e44  commit (wenhuach21)
59a9f5d  update a little (wenhuach21)
1b7e911  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
e068049  fix (wenhuach21)
1b84bf2  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
0357c0b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
7c034bd  Merge branch 'main' into auto_scheme (wenhuach21)
602421c  merge autoscheme to scheme (wenhuach21)
091c5ad  refine layer_config code (wenhuach21)
90b6fa1  Merge branch 'main' into auto_scheme (wenhuach21)
f027801  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
c6b78c6  tiny change (wenhuach21)
1b9f24e  tiny fix (wenhuach21)
2c0075a  tmp change (wenhuach21)
97198f0  tmp change (wenhuach21)
27b4b4d  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
2d3095a  update (wenhuach21)
35a298b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
4a594cd  fix (wenhuach21)
dcd08d6  fix uts, still one left (wenhuach21)
9172264  fix gguf issue (wenhuach21)
1d9e593  Merge branch 'main' into auto_scheme (wenhuach21)
f98092c  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
033d1f6  update a little (wenhuach21)
8ae1dfa  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
a3756ce  fix some issues (wenhuach21)
2f93471  fix some issues (wenhuach21)
e0c3d4b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
0130932  Merge branch 'main' into auto_scheme (wenhuach21)
6e04d10  update (wenhuach21)
04c604c  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
3880038  Merge branch 'main' into auto_scheme (wenhuach21)
87d3694  fix one bug (wenhuach21)
fa85d42  Merge branch 'main' into auto_scheme (wenhuach21)
3855c8f  fix (wenhuach21)
d3e28c2  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
706df03  Merge branch 'main' into auto_scheme (wenhuach21)
2d557d0  set up the first version, there are many details to be handled (wenhuach21)
567ebb8  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
cedad47  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
0c3a0e2  fix one bug (wenhuach21)
cced6d8  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
58d5ae2  uncomment ut (wenhuach21)
e9bcd4a  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
ea489c3  rename functions (wenhuach21)
c763761  Merge branch 'main' into auto_scheme (wenhuach21)
f74fcb4  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
982202b  Merge branch 'main' into auto_scheme (wenhuach21)
9cfa4e5  update (wenhuach21)
a7efdf6  Merge branch 'main' into auto_scheme (wenhuach21)
ac4036e  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
8a8cb61  fix (wenhuach21)
0e2be6c  fix a bug (wenhuach21)
8d854db  update (wenhuach21)
d7908f4  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
21dd1c2  support multiple gpu via device_map (wenhuach21)
ab81181  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
48e4feb  update ut (wenhuach21)
da7eac1  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
2b9c5bb  support large models (wenhuach21)
f6e214d  Merge branch 'main' into auto_scheme (wenhuach21)
2b1bf59  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
fcfb9c6  support shared layers (wenhuach21)
a100823  Merge branch 'main' into auto_scheme (wenhuach21)
91ec73d  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
9a2738a  update a little (wenhuach21)
f95840d  Merge branch 'main' into auto_scheme (wenhuach21)
6132faa  fix gguf issue (wenhuach21)
c111608  support gguf (wenhuach21)
07c3eb4  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
c74df50  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
4f7cd55  Merge branch 'main' into auto_scheme (wenhuach21)
ae87b77  revert test (wenhuach21)
2682247  update (wenhuach21)
30fee22  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
248cd21  Merge branch 'main' into auto_scheme (wenhuach21)
fd18a9f  Merge branch 'main' into auto_scheme (wenhuach21)
8ce5b1e  fix merge issue (wenhuach21)
8e16325  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
decdcce  fix merge issue (wenhuach21)
978082c  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
f9e80ab  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
efc69de  update (wenhuach21)
4d1f8de  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
f1ed097  update (wenhuach21)
bae0354  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
e014f41  update (wenhuach21)
dc56dc6  Merge branch 'main' into auto_scheme (wenhuach21)
5e1c4e8  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
7715eab  support torch enable compile (wenhuach21)
6f61ee1  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
7bd6273  add so file and cpu ut (wenhuach21)
9c0eb06  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
815de02  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
2959aa9  correct model path (wenhuach21)
019991f  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
76f4f8b  update so (wenhuach21)
bdf5421  update readme (wenhuach21)
5348376  Merge branch 'auto_scheme' of https://github.com/intel/auto-round int… (wenhuach21)
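This PR adds an `AutoScheme` object that automatically assigns mixed bit-widths across layers to hit a target average bit-width. As a rough usage sketch — the field names `avg_bits`, `options`, and `ignore_scale_zp_bits` come from the diff below, while the `AutoRound` wiring, model name, and option strings are assumptions:

```python
# Hedged usage sketch. AutoScheme fields are taken from the diff in this PR;
# the AutoRound integration and the option names are assumptions.
from auto_round import AutoRound, AutoScheme

scheme = AutoScheme(
    avg_bits=3.0,                # target average bit-width across quantized layers
    options=("W2A16", "W4A16"),  # candidate per-layer schemes the search may mix
    ignore_scale_zp_bits=False,  # whether scale/zero-point overhead counts toward avg_bits
)
autoround = AutoRound(model="facebook/opt-125m", scheme=scheme)
autoround.quantize_and_save("./qmodel")
```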
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
New file, +42 lines (likely `auto_round/auto_scheme/__init__.py`, judging by the import `from auto_round.auto_scheme import AUTO_SCHEME_METHODS` in the next file). It defines the method registry and its registration decorator:

```python
# Copyright (c) 2025 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Registry mapping method names to mixed-precision scheme algorithms.
AUTO_SCHEME_METHODS = {}


def register_scheme_methods(names):
    """Decorator that registers a mixed-precision algorithm in the registry.

    Use it above the algorithm class or function to be registered.

    Args:
        names: A string, or a tuple/list of strings, naming the method(s).

    Returns:
        The registered object, unchanged.
    """

    def register(alg):
        if isinstance(names, (tuple, list)):
            for name in names:
                AUTO_SCHEME_METHODS[name] = alg
        else:
            AUTO_SCHEME_METHODS[names] = alg
        return alg

    return register


# Imported last so the default algorithm can register itself via the decorator above.
import auto_round.auto_scheme.default_alg
```
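This registry pairs with `GenScheme.get_layer_config` below, which looks up `AUTO_SCHEME_METHODS[method_name]` and calls the registered object directly. A hypothetical registration might look like this — the name `"naive"` and the toy logic are invented; only the decorator and the call signature used by `GenScheme` are taken from the diff:

```python
from auto_round.auto_scheme import AUTO_SCHEME_METHODS, register_scheme_methods


@register_scheme_methods(("naive", "naive-alias"))
def naive_method(auto_scheme, model, quant_layer_names, fixed_layer_scheme,
                 dataset, tokenizer, device_map=None, enable_torch_compile=False):
    """Toy method: give every layer without a fixed scheme the first candidate option."""
    layer_config = dict(fixed_layer_scheme)  # fixed layers keep their scheme
    for name in quant_layer_names:
        layer_config.setdefault(name, {"bits": 4, "sym": True})  # placeholder scheme
    return layer_config


assert AUTO_SCHEME_METHODS["naive"] is naive_method  # both aliases now resolve to it
```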
Binary file not shown (presumably the prebuilt `.so` file referenced by commits 7bd6273 "add so file and cpu ut" and 76f4f8b "update so").
New file, +149 lines (the file path is not preserved in this capture). It defines `GenScheme`, the driver that validates the requested average bit-width, dispatches to the selected algorithm, and applies GGUF fallbacks:

```python
# Copyright (c) 2025 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
from dataclasses import asdict
from typing import Iterable, Union

import torch

from auto_round import AutoScheme
from auto_round.auto_scheme import AUTO_SCHEME_METHODS
from auto_round.auto_scheme.utils import compute_avg_bits_for_scheme
from auto_round.export.export_to_gguf.config import GGUF_INNER_CONFIG
from auto_round.logger import logger
from auto_round.utils import _gguf_type_fallback, get_layer_features, get_module


class GenScheme:
    """Generate and validate quantization schemes for model layers."""

    def __init__(
        self,
        auto_scheme: AutoScheme,  # TODO: support shared layers
        model: torch.nn.Module,
        quant_layer_names: Iterable[str],
        fixed_layer_scheme: dict[str, dict],
        dataset: str = "pile-10k",  # TODO: use the auto-round dataset
        device_map: Union[str, torch.device, int, dict, None] = None,
        tokenizer=None,
        enable_torch_compile=False,
    ):
        self.auto_scheme = auto_scheme
        self.model = model
        self.tokenizer = tokenizer
        self.quant_layer_names = quant_layer_names
        self.fixed_layer_scheme = fixed_layer_scheme
        self.dataset = dataset
        self.device_map = device_map if self.auto_scheme.device_map is None else self.auto_scheme.device_map
        self.enable_torch_compile = enable_torch_compile
        self._check_configs()

    def _check_configs(self) -> None:
        """Validate the auto_scheme configuration and ensure the avg_bits target is valid."""
        if isinstance(self.model, torch.nn.Module) and self.tokenizer is None:
            raise ValueError("tokenizer must not be None if model is nn.Module")

        if not isinstance(self.dataset, str):
            raise TypeError(f"`dataset` must be a string, got {type(self.dataset).__name__}.")

        min_avg_bit, max_avg_bit = self.compute_avg_bit_range()
        target = self.auto_scheme.avg_bits

        logger.info("Average bits range: [%.3f, %.3f], target = %.3f", min_avg_bit, max_avg_bit, target)
        # Snap the target to a boundary when it is within tolerance of one.
        if abs(target - min_avg_bit) < 1e-3 or abs(target - max_avg_bit) < 1e-3:
            target = min_avg_bit if abs(target - min_avg_bit) < 1e-3 else max_avg_bit
            self.auto_scheme.avg_bits = target

        if not (min_avg_bit <= target <= max_avg_bit):
            raise ValueError(
                f"Target avg_bits={target:.3f} is outside the valid range "
                f"[{min_avg_bit:.3f}, {max_avg_bit:.3f}]."
            )

    def get_layer_config(self) -> dict[str, dict]:
        method_name = self.auto_scheme.method
        method_func = AUTO_SCHEME_METHODS[method_name]
        layer_config = method_func(
            self.auto_scheme,
            self.model,
            self.quant_layer_names,
            self.fixed_layer_scheme,
            self.dataset,
            self.tokenizer,
            device_map=self.device_map,
            enable_torch_compile=self.enable_torch_compile,
        )
        layer_config = self.fallback_gguf_layer_config(layer_config)
        return layer_config

    def fallback_gguf_layer_config(self, layer_config: dict[str, dict]) -> dict[str, dict]:
        """Apply fallback configurations for GGUF-quantized layers when the current
        layer configuration is incompatible with input-feature alignment.

        Args:
            layer_config (dict[str, dict]): Mapping from layer name to its quantization scheme.

        Returns:
            dict[str, dict]: Updated layer configuration with fallbacks applied where necessary.
        """
        for name, scheme in layer_config.items():  # TODO: add unit test (wenhua); the code is a little tricky
            if scheme.get("super_bits") is None:
                continue  # Skip non-GGUF k-quant layers

            layer = get_module(self.model, name)
            input_features, out_features = get_layer_features(layer)
            if input_features is None:
                continue
            if input_features % 256 == 0 or isinstance(layer, torch.nn.Embedding):
                continue

            # Determine the fallback quantization type
            if input_features % 256 != 0 and input_features % 32 != 0:
                new_type = "gguf:bf16"
            elif input_features % 256 != 0:
                bits = scheme["bits"]
                prefix_idx = 0 if scheme["sym"] else 1
                new_type = f"gguf:q{bits}_{prefix_idx}"
                if new_type not in GGUF_INNER_CONFIG:
                    new_type = f"gguf:q{bits}_{1 - prefix_idx}"
                if new_type not in GGUF_INNER_CONFIG:
                    current_type = f"gguf:q{bits}_k"
                    new_type = _gguf_type_fallback(current_type)

            # Apply the fallback configuration
            target_config = GGUF_INNER_CONFIG[new_type]
            for key in scheme.keys():
                if key in target_config:
                    scheme[key] = target_config[key]

            logger.warning(f"Fallback applied: {name} → {new_type}")

        return layer_config

    def compute_avg_bit_range(self) -> tuple[float, float]:
        """Compute the min and max average bit-widths among candidate quantization options."""
        avg_bits = [
            compute_avg_bits_for_scheme(
                self.model,
                self.quant_layer_names,
                self.fixed_layer_scheme,
                option,
                self.auto_scheme.ignore_scale_zp_bits,
            )[0]
            for option in self.auto_scheme.options
        ]
        return min(avg_bits), max(avg_bits)
```
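To make the alignment rule in `fallback_gguf_layer_config` concrete, here is a small worked trace — the feature sizes are invented, while the divisibility thresholds of 256 (k-quant block size) and 32 come from the code above:

```python
# Worked trace of the fallback rules (feature sizes invented for illustration).
# A GGUF k-quant scheme needs in_features divisible by 256; the plain
# q{bits}_0/q{bits}_1 types need divisibility by 32; otherwise gguf:bf16.
for in_features in (4096, 4000, 1000):
    if in_features % 256 == 0:
        decision = "keep k-quant"                 # 4096 = 16 * 256
    elif in_features % 32 == 0:
        decision = "fall back to q{bits}_0 or _1"  # 4000 = 125 * 32, but 4000 % 256 == 160
    else:
        decision = "fall back to gguf:bf16"        # 1000 % 32 == 8
    print(in_features, "->", decision)
```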