NV-ModelOPT INT4 quantization #1135

Merged: 5 commits, May 9, 2024

riyadshairi979 (Contributor) commented May 3, 2024

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
  • Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

Tests

  • Unit test: pytest test/unit_test/passes/onnx/test_nvmo_quantization.py
  • Example: python -m olive.workflows.run --config bert_nvmo_ptq.json
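
The two checks above can be scripted for repeat runs. The following is a minimal sketch and not part of this PR: `to_argv` and `run_checks` are hypothetical helper names, and the working directories are assumptions (the unit test path looks relative to the repository root, while the config file is assumed to sit in the example folder).

```python
import shlex
import subprocess
import sys

# Commands copied from the Tests list; the helper names below are hypothetical.
UNIT_TEST = "pytest test/unit_test/passes/onnx/test_nvmo_quantization.py"
EXAMPLE = "python -m olive.workflows.run --config bert_nvmo_ptq.json"


def to_argv(command):
    """Turn a shell-style command into an argv list for the current interpreter."""
    parts = shlex.split(command)
    if parts[0] == "python":
        # Reuse the active interpreter instead of whatever `python` is on PATH.
        return [sys.executable, *parts[1:]]
    # Run tools such as pytest as modules: `<python> -m pytest ...`.
    return [sys.executable, "-m", *parts]


def run_checks(repo_root=".", example_dir=".", dry_run=True):
    """Return (argv, cwd) pairs; execute them in order when dry_run is False."""
    steps = [(to_argv(UNIT_TEST), repo_root), (to_argv(EXAMPLE), example_dir)]
    if not dry_run:
        for argv, cwd in steps:
            # check=True raises CalledProcessError on the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)
    return steps
```

With the default `dry_run=True`, `run_checks()` only returns the commands and directories, which makes it easy to confirm the paths before starting a long quantization run.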

(Optional) Issue link

riyadshairi979 force-pushed the rislam/nv-ptq branch 3 times, most recently from 9b80683 to 803301e, on May 4, 2024 19:05
riyadshairi979 (Contributor Author) commented:

@microsoft-github-policy-service agree company="Nvidia"

riyadshairi979 force-pushed the rislam/nv-ptq branch 2 times, most recently from b8f6abc to 3a53972, on May 4, 2024 21:09
trajepl (Contributor) commented May 6, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Review thread on test/requirements-test.txt (outdated, resolved)
jambayk (Contributor) commented May 6, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

riyadshairi979 (Contributor Author) commented:

/azp run

Commenter does not have sufficient privileges for PR 1135 in repo microsoft/Olive

jambayk (Contributor) commented May 6, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

guotuofeng (Collaborator) commented:

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

jambayk (Contributor) commented May 8, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

jambayk (Contributor) previously approved these changes May 8, 2024 and left a comment:

Thanks for the contribution! Will merge once the CI passes.

jambayk (Contributor) commented May 8, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

riyadshairi979 force-pushed the rislam/nv-ptq branch 2 times, most recently from d83156d to 171bfe0, on May 8, 2024 20:24
@@ -99,6 +100,14 @@ This workflow performs BERT optimization on GPU with CUDA/TensorRT. It performs
2. TensorRT: `TensorrtExecutionProvider`
- *PyTorch Model -> Onnx Model -> ONNX Runtime performance tuning with trt_fp16_enable*
Config file: [bert_trt_gpu.json](bert_trt_gpu.json)

### BERT optimization with TensorRT-Model-Optimizer on CPU/GPU

riyadshairi979 (Contributor Author) commented:

@jambayk Should I add something like the following about deployment?
Updated, with two candidate wordings:
  • "Users can deploy the quantized ONNX model using TensorRT 10.x but that is not supported in ORT right now, stay tuned!"
  • "Deployment support for TensorRT-Model-Optimizer quantized models is coming soon in ORT, in the meantime try TensorRT 10.x"

Contributor replied:

The deployment note would probably be about ORT rather than Olive, since Olive only produces the model, which is then deployed using ORT, TensorRT, or some other engine. I will let @EmmaNingMS comment on this.

EmmaNingMS (Member) commented May 8, 2024

I vote for this version if ORT support is planned: "Deployment support for TensorRT-Model-Optimizer quantized models is coming soon in ORT, in the meantime try TensorRT 10.x"

Does ORT with TRT EP support TensorRT-Model-Optimizer quantized models? It is expected that ORT-TRT has equivalent capabilities to the TRT engine. If not, what's the gap for ORT-TRT to run TensorRT-Model-Optimizer quantized models? Will Nvidia teams support that?

riyadshairi979 (Contributor Author) replied:

@EmmaNingMS I will ask my team about the roadmap for this support. Thanks.

jambayk (Contributor) commented May 9, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

jambayk (Contributor) left a comment:

Thanks!

jambayk merged commit 40845a3 into microsoft:main on May 9, 2024 (35 checks passed)