
[Bug]: OnnxQuantization #573

Closed
akarym-sl opened this issue Sep 15, 2023 · 8 comments
Labels
bug Something isn't working python Pull requests that update Python code

Comments

@akarym-sl

What happened?

When running a first OnnxQuantization pass with default parameters and then a pass with QUInt8 weight and activation types, the model parameters are not quantized to QUInt8.
To clarify, running pass:

  • OnnxConversion->OnnxQuantization (QUInt8)

yields different accuracy than running two passes:

  • OnnxConversion->OnnxQuantization
  • OnnxConversion->OnnxQuantization (QUInt8)

Version?

0.3.1

@akarym-sl akarym-sl added the bug Something isn't working label Sep 15, 2023
@guotuofeng
Collaborator

@akarym-sl , what's the config json you used to run the optimization for quantization?

@guotuofeng guotuofeng added the waiting for response Waiting for response label Sep 17, 2023
@akarym-sl
Author

Here is the config. For the second case, I prepend ["onnx_conv", "onnx_quant"] to the "pass_flows" list.
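In other words, for the second case the "pass_flows" entry in the config below would read:

```json
"pass_flows": [["onnx_conv", "onnx_quant"], ["onnx_conv", "onnx_quant_u"]]
```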

{
    "input_model":{
        "type":"PyTorchModel",
        "config":{
            "model_path":"model.pt",
            "model_loader":"load_state_dict",
            "model_script":"save.py",
            "dummy_inputs_func":"get_dummy_inputs",
            "io_config":{
                "input_names":[
                    "input"
                ],
                "output_names":[
                    "output"
                ],
                "dynamic_axes":{
                    "input":{
                        "0":"batch"
                    },
                    "output":{
                        "0":"batch"
                    }
                }
            }
        }
    },
    "systems":{
        "local_system":{
            "type":"LocalSystem",
            "config":{
                "accelerators":[
                    "cpu"
                ]
            }
        }
    },
    "evaluators":{
        "custom_evaluator":{
            "metrics":[
                {
                    "name":"custom",
                    "type":"custom",
                    "user_config":{
                        "user_script":"user_script.py",
                        "batch_size":1,
                        "dataloader_func":"create_dataloader",
                        "evaluate_func":"evaluate"
                    },
                    "sub_types":[
                        {
                            "name":"latency",
                            "priority":1,
                            "higher_is_better":false
                        },
                        {
                            "name":"accuracy",
                            "priority":2,
                            "higher_is_better":true
                        }
                    ]
                }
            ]
        }
    },
    "engine":{
        "clean_cache":true,
        "cache_dir":".cache",
        "output_dir":"optimization",
        "host":"local_system",
        "target":"local_system",
        "execution_providers":[
            "CPUExecutionProvider"
        ],
        "evaluator":"custom_evaluator",
        "evaluate_input_model":false
    },
    "passes":{
        "onnx_conv":{
            "type":"OnnxConversion",
            "config":{
                "target_opset":15
            }
        },
        "onnx_quant":{
            "type":"OnnxQuantization",
            "config":{
                "user_script":"user_script.py",
                "dataloader_func":"create_calibrator"
            }
        },
        "onnx_quant_u":{
            "type":"OnnxQuantization",
            "config":{
                "user_script":"user_script.py",
                "dataloader_func":"create_calibrator",
                "weight_type":"QUInt8",
                "activation_type":"QUInt8"
            }
        }
    },
    "pass_flows":[["onnx_conv", "onnx_quant_u"]]
}

@guotuofeng
Collaborator

guotuofeng commented Sep 18, 2023

@akarym-sl, do you mean that the accuracy of the model optimized with this pass_flows differs from that of a run without the default onnx_quant?

From your description, it seems the accuracy from [["onnx_conv", "onnx_quant_u"]] differs from [["onnx_conv", "onnx_quant"], ["onnx_conv", "onnx_quant_u"]]. Your point is that two runs of the same pass group ["onnx_conv", "onnx_quant_u"] should give the same result. Is my understanding correct?

@akarym-sl
Author

Yes, in my understanding, previous passes shouldn't affect the current one. I observe that adding the ["onnx_conv", "onnx_quant"] pass flow affects the accuracy of the ["onnx_conv", "onnx_quant_u"] pass flow. My guess is that the model is not quantized to QUInt8 in the second pass, as it should be, but is instead quantized to QInt8, or left unchanged and loaded from the previous pass.

@guotuofeng
Collaborator

@trajepl is helping look into this.

@trajepl
Contributor

trajepl commented Sep 18, 2023

Thanks for raising this. It is a bug on the Olive side.
The root cause: Olive uses the pass's class name (OnnxQuantization) as the key to access the pass instance (onnx_quant, onnx_quant_u).
When the same pass appears with different configs (onnx_quant, onnx_quant_u), only the first one (onnx_quant) is used to run quantization.
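The collision described above can be sketched in a few lines (illustrative names only, not Olive's actual internals): if pass instances are registered under their class name, two differently-configured passes of the same class collapse into one entry, and the first registration wins for every lookup.

```python
# Illustrative sketch of the key-collision bug (not Olive's real code):
# keying a registry by the pass *class name* collapses two
# differently-configured passes of the same class into one entry.
passes = [
    ("onnx_quant", "OnnxQuantization", {"weight_type": "QInt8"}),
    ("onnx_quant_u", "OnnxQuantization", {"weight_type": "QUInt8"}),
]

buggy = {}  # keyed by class name: first registration wins
fixed = {}  # keyed by unique pass name: both configs survive
for pass_name, class_name, config in passes:
    buggy.setdefault(class_name, config)
    fixed[pass_name] = config

# Looking up onnx_quant_u through the class name returns the wrong config:
print(buggy["OnnxQuantization"])  # {'weight_type': 'QInt8'}
print(fixed["onnx_quant_u"])      # {'weight_type': 'QUInt8'}
```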

I changed the key to the pass name in the following PR and tested it with the bert case; it worked well for me.

{
    "input_model":{
        "type": "PyTorchModel",
        "config": {
            "hf_config": {
                "model_name": "Intel/bert-base-uncased-mrpc",
                "task": "text-classification",
                "dataset": {
                    "data_name":"glue",
                    "subset": "mrpc",
                    "split": "validation",
                    "input_cols": ["sentence1", "sentence2"],
                    "label_cols": ["label"],
                    "batch_size": 1
                }
            }
        }
    },
    "evaluators": {
        "common_evaluator": {
            "metrics":[
                {
                    "name": "accuracy",
                    "type": "accuracy",
                    "backend": "huggingface_metrics",
                    "sub_types": [
                        {"name": "accuracy", "priority": 1, "goal": {"type": "max-degradation", "value": 0.01}},
                        {"name": "f1"}
                    ]
                },
                {
                    "name": "latency",
                    "type": "latency",
                    "sub_types": [
                        {"name": "avg", "priority": 2, "goal": {"type": "percent-min-improvement", "value": 20}},
                        {"name": "max"},
                        {"name": "min"}
                    ]
                }
            ]
        }
    },
    "passes": {
        "conversion": {
            "type": "OnnxConversion",
            "config": {
                "target_opset": 13
            }
        },
        "onnx_quant": {
            "type": "OnnxQuantization",
            "config": {
                "data_config": "__input_model_data_config__"
            }
        },
        "onnx_quant_u": {
            "type": "OnnxQuantization",
            "config": {
                "data_config": "__input_model_data_config__",
                "weight_type":"QUInt8",
                "activation_type":"QUInt8"
            }
        }
    },
    "pass_flows": [
        ["conversion", "onnx_quant_u"]
    ],
    "engine": {
        "evaluator": "common_evaluator",
        "execution_providers": ["CPUExecutionProvider"],
        "cache_dir": "cache",
        "output_dir" : "models/bert_ptq_cpu",
        "clean_cache": true
    }
}

Could you try this PR? @akarym-sl #577

git clone https://github.com/microsoft/Olive
cd Olive
pip install .

trajepl added a commit that referenced this issue Sep 18, 2023
## Describe your changes

This PR fixes the following issue, where the same pass with different
configs appears in one Olive run config.
#573

[Root Cause]:
Olive uses the pass's class name as the key to identify the pass instance.
When there are passes with the same pass class, the first one defined in
the Olive run config is always picked.
https://github.com/microsoft/Olive/blob/main/olive/engine/engine.py#L436

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Format your code by running `pre-commit run --all-files`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.

## (Optional) Issue link
@guotuofeng
Collaborator

@akarym-sl, please let us know whether the bug is fixed or not.

@guotuofeng guotuofeng added python Pull requests that update Python code and removed waiting for response Waiting for response labels Sep 19, 2023
@akarym-sl
Author

I tested the new version (0.4.0) on the same setup and can confirm that the issue is gone! Therefore, closing the issue. Thank you!

trajepl added a commit that referenced this issue Sep 22, 2023
…598)

## Describe your changes
Unit tests for the same pass with different configs in one Olive config,
to cover this case: #573

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Format your code by running `pre-commit run --all-files`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.

## (Optional) Issue link