Conversation

jerryzh168
Contributor

Summary:
The `use_cache` option in lm_eval reads cached results for a (model_id, task) pair that has already been evaluated. During development, however, we often update the model and need to re-evaluate, so the cache has to be disabled to get fresh eval results.

This PR changes eval_quality.sh to not use the cache by default; users can still enable it by explicitly passing `--use_cache`.

Don't use cache (new default):
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4"
```

Use cache:
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4" --use_cache
```
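For context, the flag-forwarding inside eval_quality.sh might look roughly like the sketch below. This is a hypothetical illustration, not the script's actual code: the function name `build_eval_cmd`, the task choice, and the cache path are all made up. The only grounded detail is that lm_eval's `--use_cache` takes a path prefix for its sqlite cache db, as seen in the test-plan logs.

```shell
# Hypothetical sketch (not the actual eval_quality.sh): build the lm_eval
# command, forwarding --use_cache only when the caller opts in.
build_eval_cmd() {
  local model_id="" use_cache=false
  while [ $# -gt 0 ]; do
    case "$1" in
      --use_cache) use_cache=true; shift ;;
      --model_ids) model_id="$2"; shift 2 ;;
      *) shift ;;
    esac
  done
  local cmd="lm_eval --model hf --model_args pretrained=${model_id} --tasks mmlu"
  if [ "$use_cache" = true ]; then
    # lm_eval's --use_cache expects a path prefix for its sqlite .db files
    cmd="$cmd --use_cache /tmp/${model_id}_quality_mmlu"
  fi
  printf '%s\n' "$cmd"
}

# Default: no --use_cache, so an updated model is always re-evaluated
build_eval_cmd --model_ids my-model-AWQ-INT4
# Opt in to caching explicitly
build_eval_cmd --model_ids my-model-AWQ-INT4 --use_cache
```

With this shape, forgetting the flag during development can never silently serve stale cached results.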

Test Plan:

With cache:
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4" --use_cache
```
Logs in /home/jerryzh/local/ao/.github/scripts/torchao_model_releases/jerryzh168_gemma-3-12b-it-AWQ-INT4_quality_mmlu.log (note the evaluator reports the cache db it is using):
```
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.40s/it]
INFO:lm_eval.models.huggingface:Model type is 'gemma3', part of the Gemma family--a BOS token will be used as Gemma underperforms without it.
INFO:lm_eval.evaluator:Using cache at /tmp/jerryzh168_gemma-3-12b-it-AWQ-INT4_quality_mmlu_rank0.db
```

Without cache:
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4"
```
Logs in /home/jerryzh/local/ao/.github/scripts/torchao_model_releases/jerryzh168_gemma-3-12b-it-AWQ-INT4_quality_mmlu.log (no "Using cache" line; contexts are rebuilt from scratch):
```
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.20it/s]
INFO:lm_eval.models.huggingface:Model type is 'gemma3', part of the Gemma family--a BOS token will be used as Gemma underperforms without it.
INFO:lm_eval.api.task:Building contexts for mmlu_abstract_algebra on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1051.54it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_anatomy on rank 0...
100%|██████████| 135/135 [00:00<00:00, 1066.27it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_astronomy on rank 0...
100%|██████████| 152/152 [00:00<00:00, 1054.96it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_biology on rank 0...
100%|██████████| 144/144 [00:00<00:00, 938.81it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_chemistry on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1081.13it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_computer_science on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1109.74it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_mathematics on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1111.38it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_physics on rank 0...
```

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3073

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit dc5cd6e with merge base 8c5c33e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 26, 2025
@jerryzh168 jerryzh168 added the topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) label Sep 26, 2025
@jerryzh168 jerryzh168 merged commit e850079 into pytorch:main Sep 26, 2025
19 of 20 checks passed