
Conversation

@Jack-Khuu
Contributor

As titled: originally the a8wxdq load error was shown regardless of whether the quantization scheme was actually used. Tested on Mac (ARM).


Not Using Quant

> python3 torchchat.py generate llama3.1 --device cpu
Using device=cpu Apple M1 Max
Loading model...
Time to load model: 2.24 seconds
-----------------------------------------------------------

Using Quant

> OMP_NUM_THREADS=6 python3 torchchat.py generate llama3.1 --device cpu --dtype float32 --quantize '{"linear:a8wxdq": {"bitwidth": 4, "groupsize": 256, "has_weight_zeros": false}}'

w/o Loading

Using device=cpu Apple M1 Max
Loading model...
Time to load model: 2.74 seconds
Quantizing the model with: {'linear:a8wxdq': {'bitwidth': 4, 'groupsize': 256, 'has_weight_zeros': False}}
Time to quantize model: 0.00 seconds
Traceback (most recent call last):
  File "/Users/jackkhuu/Desktop/oss/torchchat/torchchat.py", line 83, in <module>
    generate_main(args)
  File "/Users/jackkhuu/Desktop/oss/torchchat/torchchat/generate.py", line 1093, in main
    gen = Generator(
          ^^^^^^^^^^
  File "/Users/jackkhuu/Desktop/oss/torchchat/torchchat/generate.py", line 284, in __init__
    self.model = _initialize_model(self.builder_args, self.quantize, self.tokenizer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jackkhuu/Desktop/oss/torchchat/torchchat/cli/builder.py", line 568, in _initialize_model
    quantize_model(
  File "/Users/jackkhuu/Desktop/oss/torchchat/torchchat/utils/quantize.py", line 84, in quantize_model
    raise Exception(f"Note: Failed to load torchao experimental a8wxdq quantizer with error: {a8wxdq_load_error}")
Exception: Note: Failed to load torchao experimental a8wxdq quantizer with error: [Errno 2] No such file or directory: '/Users/jackkhuu/Desktop/oss/torchchat/torchao-build/src/ao/torchao/experimental/quant_api.py'

w/ Loading

Using device=cpu Apple M1 Max
Loading model...
Time to load model: 2.98 seconds
Quantizing the model with: {'linear:a8wxdq': {'bitwidth': 4, 'groupsize': 256, 'has_weight_zeros': False}}
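For reviewers skimming the diff, the behavior above boils down to the usual defer-the-import-error pattern: capture the exception when the optional a8wxdq quantizer fails to import, and re-raise it inside quantize_model only when `linear:a8wxdq` appears in the `--quantize` config. A minimal sketch of that pattern (the import path and the `quantize_model` signature here are assumptions for illustration, not the verbatim torchchat source):

```python
# Illustrative sketch, not the exact torchchat code: record the optional
# import failure up front instead of raising immediately.
a8wxdq_load_error = None
try:
    # torchchat loads this quantizer from a torchao source checkout; the
    # exact import path/symbol is an assumption for this sketch.
    from torchao.experimental.quant_api import Int8DynActIntxWeightLinearQuantizer
except Exception as e:
    a8wxdq_load_error = e


def quantize_model(model, device, quantize_options: dict):
    """Apply each quantization scheme requested via --quantize."""
    for scheme, q_kwargs in quantize_options.items():
        if scheme == "linear:a8wxdq" and a8wxdq_load_error is not None:
            # Surface the load error only now, when the scheme is used.
            raise Exception(
                "Note: Failed to load torchao experimental a8wxdq "
                f"quantizer with error: {a8wxdq_load_error}"
            )
        # ... dispatch to the scheme's quantizer with q_kwargs
        #     (e.g. bitwidth, groupsize, has_weight_zeros)
```

With this structure, a plain `python3 torchchat.py generate llama3.1 --device cpu` run never sees the message, matching the "Not Using Quant" output above, while the quantized invocation still fails loudly when the torchao experimental build is missing.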


pytorch-bot bot commented Sep 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1231

Note: Links to docs will display an error until the doc builds have completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 6e1cdd5 with merge base 8278aa2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 29, 2024
@Jack-Khuu requested a review from byjlw on September 29, 2024 22:23
@Jack-Khuu merged commit 9bbbc87 into main on Sep 30, 2024
52 checks passed
metascroy pushed a commit that referenced this pull request Sep 30, 2024
* Show a8wxdq load error only when the quant is used

* Update Error check
@Jack-Khuu deleted the a8wx-message branch on October 5, 2024 02:37