Decouple quantization and activation dtypes for non-xnnpack #8652

@jackzhxng

Description

🚀 The feature, motivation and pitch

#8488 decouples the dtype used at weight quantization time from the dtype used during op computation, but only for XNNPack 8da4w. Once it lands, we should enable the same decoupling for other backends and quantization settings as well. This should be possible with the new quantize_ API.
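
To make the request concrete, here is a minimal, framework-free sketch of what "decoupling" could mean. It is not the ExecuTorch or torchao `quantize_` API: the helpers `quantize_int4` and `linear` and the `quant_dtype` / `compute_dtype` parameters are hypothetical names used only for illustration. It assumes symmetric group-wise 4-bit weights and emulates fp32/fp16 rounding with the standard library's `struct` module.

```python
import struct

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest float16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]

# Hypothetical dtype names mapped to rounding helpers.
CASTS = {"fp32": to_fp32, "fp16": to_fp16}

def quantize_int4(weights, group_size, quant_dtype="fp32"):
    """Symmetric group-wise 4-bit quantization; scales are computed in quant_dtype."""
    cast = CASTS[quant_dtype]
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = [cast(w) for w in weights[start:start + group_size]]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; use 1.0 for all-zero groups.
        scale = cast(max_abs / 7.0) if max_abs > 0 else 1.0
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales

def linear(qvals, scales, group_size, activations, compute_dtype="fp16"):
    """Dot product that dequantizes and accumulates in compute_dtype."""
    cast = CASTS[compute_dtype]
    acc = 0.0
    for i, (q, x) in enumerate(zip(qvals, activations)):
        w = cast(q * cast(scales[i // group_size]))  # dequantize in compute dtype
        acc = cast(acc + cast(w * cast(x)))
    return acc

weights = [0.12, -0.53, 0.91, -0.07, 0.33, 0.64, -0.88, 0.25]
acts = [1.0, 0.5, -0.25, 2.0, 0.75, -1.5, 0.1, 0.3]

# Quantize once in fp32, then run the op in either dtype.
qvals, scales = quantize_int4(weights, group_size=4, quant_dtype="fp32")
y_fp16 = linear(qvals, scales, 4, acts, compute_dtype="fp16")
y_fp32 = linear(qvals, scales, 4, acts, compute_dtype="fp32")
print(y_fp16, y_fp32)
```

The point of the sketch is that `quant_dtype` only affects how the integer weights and scales are produced, while `compute_dtype` only affects the op at run time, so the two can be chosen independently.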

cc @mergennachin @iseeyuan @lucylq @helunwencser @tarun292 @kimishpatel @andrewor14

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

Metadata

Labels

module: examples (Issues related to demos under examples/)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

Status

Done
