Add changes for strided calibration #20949
Conversation
Please fill in motivation and context.
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, Windows x64 QNN CI Pipeline, Linux MIGraphX CI Pipeline, Big Models
/azp run ONNX Runtime React Native CI Pipeline, orttraining-amd-gpu-ci-pipeline, Linux Android Emulator QNN CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 3 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
@RuomeiMS It looks like this same feature was already implemented in the past using a different option. However, that PR was not tested, and several follow-up PRs broke its functionality: #19416

I think that all we have to do is add the following new line to `collect_data`:

```python
def collect_data(self, data_reader: CalibrationDataReader):
    while True:
        inputs = data_reader.get_next()
        if not inputs:
            break
        self.intermediate_outputs.append(self.infer_session.run(None, inputs))
        if (
            self.max_intermediate_outputs is not None
            and len(self.intermediate_outputs) == self.max_intermediate_outputs
        ):
            self.compute_data()  # ADD this line
            self.clear_collected_data()
```

If you do this, then we may be able to just use the existing `max_intermediate_outputs` option.
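For reference, here is a minimal sketch of a data reader that the `collect_data` loop above can consume. `CalibrationDataReader` and its `get_next()` method are the real onnxruntime.quantization interfaces; the input name, shape, and random data are placeholders for illustration only.

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader

class RandomDataReader(CalibrationDataReader):
    """Illustrative reader: yields one feed dict per calibration sample."""

    def __init__(self, num_samples: int, shape=(1, 3, 224, 224)):
        # collect_data pulls these one at a time via get_next().
        self._feeds = iter(
            [{"input": np.random.rand(*shape).astype(np.float32)} for _ in range(num_samples)]
        )

    def get_next(self):
        # Returning None signals exhaustion and breaks the collect_data loop.
        return next(self._feeds, None)
```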
* Read quant data in chunks and calculate/accumulate ranges to resolve OOM
* Add one unit test, `test_stride_effect_on_data_collection`, in ORT style
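A hedged sketch (not the PR's actual test) of the property such a stride test can check: collecting ranges in chunks of any stride should yield the same min/max as a single pass over all the data. All names below are illustrative.

```python
import numpy as np

def ranges_with_stride(samples, stride):
    """Compute the (min, max) range over samples, folding in one chunk at a time."""
    merged = None
    for start in range(0, len(samples), stride):
        chunk = np.concatenate(samples[start:start + stride])
        lo, hi = chunk.min(), chunk.max()
        merged = (lo, hi) if merged is None else (min(merged[0], lo), max(merged[1], hi))
    return merged

def test_stride_effect_on_data_collection():
    samples = [np.random.randn(4).astype(np.float32) for _ in range(8)]
    full = ranges_with_stride(samples, stride=8)  # one pass over everything
    for stride in (1, 2, 3, 5):
        # min/max merging is associative, so the stride must not change the result
        assert ranges_with_stride(samples, stride) == full
```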
Force-pushed from c986b09 to d619de0
onnxruntime/python/tools/quantization/execution_providers/qnn/quant_config.py (Fixed)
Looking great!
We need to make sure all linting errors are fixed before we merge. Could you please run the following in your root onnxruntime directory and fix all issues?
`lintrunner -a`
https://github.com/microsoft/onnxruntime/actions/runs/9478693880/job/26136037422?pr=20949
@microsoft-github-policy-service agree
Force-pushed from d570ab4 to 92ab4bf
Context and motivation:
When quantizing large transformer models, we hit OOM issues as the number of calibration samples grows. To resolve this, this PR adds support for reading quantization data in chunks, calculating ranges for intermediate tensors per chunk, and then accumulating the results into the final ranges.
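To illustrate the idea, here is a sketch of the chunked accumulation approach described above, not the PR's actual implementation; `merge_chunk_ranges`, `running_ranges`, and the tensor names are hypothetical. Per-chunk min/max ranges are folded into running ranges so only one chunk of intermediate outputs is ever resident in memory.

```python
import numpy as np

def merge_chunk_ranges(running: dict, chunk_outputs: list) -> dict:
    """Fold one chunk of intermediate outputs into the running (min, max) ranges."""
    for outputs in chunk_outputs:  # one dict of tensor_name -> ndarray per sample
        for name, value in outputs.items():
            lo, hi = float(np.min(value)), float(np.max(value))
            if name in running:
                prev_lo, prev_hi = running[name]
                running[name] = (min(prev_lo, lo), max(prev_hi, hi))
            else:
                running[name] = (lo, hi)
    return running

# Usage: process calibration data chunk by chunk instead of all at once.
running_ranges: dict = {}
for _ in range(4):  # four chunks of two samples each (synthetic data for illustration)
    chunk = [{"conv1_out": np.random.randn(1, 8)} for _ in range(2)]
    running_ranges = merge_chunk_ranges(running_ranges, chunk)
print(running_ranges["conv1_out"])  # accumulated (min, max) across all chunks
```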