Contributor

@jiqing-feng jiqing-feng commented Aug 21, 2025

Update quantization overview for XPU.
Keep in draft until optimum-quanto PR 395 is merged.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng
Contributor Author

run-slow: aqlm_integration

@Rocketknight1
Member

cc @IlyasMoutawwakil @MekkCyber

  out = model.generate(**text, max_new_tokens=10)

- EXPECTED_TEXT = "Hello, I am a 20 year old male"
+ EXPECTED_TEXT = "Helloab, I am a 1000"
Member

Is this the expected output on CPU?

Contributor Author

@jiqing-feng jiqing-feng Aug 22, 2025

I got this output on an Intel Xeon CPU. The new ground truth does not look very reasonable, but you can see that other ground truths have the same issue, like here

Contributor

I don't think it's reasonable; we need to take a look.

Contributor

very weird indeed

Contributor Author

@jiqing-feng jiqing-feng Aug 26, 2025

OK, I've reverted this change. Will track it in a separate issue.

Contributor

@MekkCyber MekkCyber left a comment

LGTM, thanks for the update! Only a small question.

Comment on lines +425 to +428
EXPECTED_OUTPUTS = [
    "Hello my name is John, I am a professional photographer, I",  # CUDA output
    "Hello my name is Nils, I am a student of the University",  # XPU output
]
Contributor

Is this the only test where outputs differ?

Contributor Author

Yes, that's mainly because XPU enables a new code path in optimum-quanto PR 395.
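
For readers following the thread, here is a minimal sketch of how a test might accept either backend's reference string. The helper name `matches_expected` and the stand-in `decoded` value are assumptions made for illustration; they are not taken from the PR, and the real test may compare outputs differently.

```python
# Hypothetical helper, not the PR's actual test code: accept any one of
# several backend-specific reference strings for the same prompt.
EXPECTED_OUTPUTS = [
    "Hello my name is John, I am a professional photographer, I",  # CUDA output
    "Hello my name is Nils, I am a student of the University",  # XPU output
]


def matches_expected(generated: str, expected_outputs: list[str]) -> bool:
    """Return True if the generated text equals one of the accepted references."""
    return generated.strip() in expected_outputs


# Stand-in for the decoded result of model.generate(...) on an XPU machine.
decoded = "Hello my name is Nils, I am a student of the University"
assert matches_expected(decoded, EXPECTED_OUTPUTS)
```

Listing every accepted string keeps the check exact for each backend while still failing on an output that matches none of them, such as the CPU string questioned earlier in the review.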

@jiqing-feng jiqing-feng marked this pull request as ready for review August 27, 2025 01:18
@MekkCyber MekkCyber enabled auto-merge (squash) August 28, 2025 09:04
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aqlm_integration, ggml, quanto_integration

@MekkCyber MekkCyber merged commit f9b9a5e into huggingface:main Aug 28, 2025
14 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng jiqing-feng deleted the quant_overview branch December 15, 2025 02:11
6 participants