Conversation

jerryzh168
Contributor

Summary:
The `use_cache` option in lm_eval reads cached results for a (model_id, task) pair that has already been evaluated. During development, however, we often update the model and need to re-evaluate, so the cache has to be disabled to get fresh eval results.

This PR changes eval_quality.sh to not use the cache by default; users can still enable it by explicitly passing `--use_cache`.

Don't use cache (new default):
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4"
```

Use cache:
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4" --use_cache
```
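For context, the flag-forwarding inside eval_quality.sh might look roughly like the sketch below. This is a hypothetical illustration, not the script's actual code: the function name `build_eval_cmd`, the task choice, and the cache path are all made up. The only grounded detail is that lm_eval's `--use_cache` takes a path prefix for its sqlite cache db, as seen in the test-plan logs.

```shell
# Hypothetical sketch (not the actual eval_quality.sh): build the lm_eval
# command, forwarding --use_cache only when the caller opts in.
build_eval_cmd() {
  local model_id="" use_cache=false
  while [ $# -gt 0 ]; do
    case "$1" in
      --use_cache) use_cache=true; shift ;;
      --model_ids) model_id="$2"; shift 2 ;;
      *) shift ;;
    esac
  done
  local cmd="lm_eval --model hf --model_args pretrained=${model_id} --tasks mmlu"
  if [ "$use_cache" = true ]; then
    # lm_eval's --use_cache expects a path prefix for its sqlite .db files
    cmd="$cmd --use_cache /tmp/${model_id}_quality_mmlu"
  fi
  printf '%s\n' "$cmd"
}

# Default: no --use_cache, so an updated model is always re-evaluated
build_eval_cmd --model_ids my-model-AWQ-INT4
# Opt in to caching explicitly
build_eval_cmd --model_ids my-model-AWQ-INT4 --use_cache
```

With this shape, forgetting the flag during development can never silently serve stale cached results.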

Test Plan:

With cache:
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4" --use_cache
```
Logs in /home/jerryzh/local/ao/.github/scripts/torchao_model_releases/jerryzh168_gemma-3-12b-it-AWQ-INT4_quality_mmlu.log (note the evaluator reports the cache db it is using):
```
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.40s/it]
INFO:lm_eval.models.huggingface:Model type is 'gemma3', part of the Gemma family--a BOS token will be used as Gemma underperforms without it.
INFO:lm_eval.evaluator:Using cache at /tmp/jerryzh168_gemma-3-12b-it-AWQ-INT4_quality_mmlu_rank0.db
```

Without cache:
```
sh eval.sh --eval_type quality --model_ids "$QMODEL_PREFIX-AWQ-INT4"
```
Logs in /home/jerryzh/local/ao/.github/scripts/torchao_model_releases/jerryzh168_gemma-3-12b-it-AWQ-INT4_quality_mmlu.log (no "Using cache" line; contexts are rebuilt from scratch):
```
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.20it/s]
INFO:lm_eval.models.huggingface:Model type is 'gemma3', part of the Gemma family--a BOS token will be used as Gemma underperforms without it.
INFO:lm_eval.api.task:Building contexts for mmlu_abstract_algebra on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1051.54it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_anatomy on rank 0...
100%|██████████| 135/135 [00:00<00:00, 1066.27it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_astronomy on rank 0...
100%|██████████| 152/152 [00:00<00:00, 1054.96it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_biology on rank 0...
100%|██████████| 144/144 [00:00<00:00, 938.81it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_chemistry on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1081.13it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_computer_science on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1109.74it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_mathematics on rank 0...
100%|██████████| 100/100 [00:00<00:00, 1111.38it/s]
INFO:lm_eval.api.task:Building contexts for mmlu_college_physics on rank 0...
```

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3073

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit dc5cd6e with merge base 8c5c33e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 26, 2025
@jerryzh168 jerryzh168 added the topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) label Sep 26, 2025
@jerryzh168 jerryzh168 merged commit e850079 into pytorch:main Sep 26, 2025
19 of 20 checks passed