[Bug]: limit doesn't work in task-by-task mode #1319

@xin3he

Problem Description

In eval mode with --eval_task_by_task, --limit 100 is set but all samples are still evaluated (for example, 5153 of 5153 for lambada_openai).
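The expected behavior can be sketched as follows. This is a hypothetical illustration, not auto-round's actual code: `evaluate_task_by_task` and `fake_evaluate` are made-up names, and the only point is that a per-task loop has to forward `limit` on every call, otherwise each task would silently run on its full dataset.

```python
from typing import Any, Callable, Dict, List, Optional


def evaluate_task_by_task(
    evaluate_fn: Callable[..., Dict[str, Any]],
    tasks: List[str],
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical per-task loop: run one task at a time, forwarding `limit`."""
    results: Dict[str, Any] = {}
    for task in tasks:
        # Dropping `limit=limit` here would produce the symptom described above.
        results.update(evaluate_fn(tasks=[task], limit=limit))
    return results


def fake_evaluate(tasks: List[str], limit: Optional[int] = None) -> Dict[str, Any]:
    """Stand-in evaluator (an assumption) that only records the limit it received."""
    return {task: {"limit_seen": limit} for task in tasks}


if __name__ == "__main__":
    print(evaluate_task_by_task(fake_evaluate, ["lambada_openai", "piqa"], limit=100))
```

With a real evaluator in place of `fake_evaluate`, the same check (did every task see the limit?) would be a quick regression test for this issue.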

Reproduction Steps

auto-round facebook/opt-125m --eval --eval_task_by_task --tasks lambada_openai,piqa --limit 100

Environment Information

Linux

Error Logs

root@ip-10-0-146-1: auto-round facebook/opt-125m --eval --eval_task_by_task --tasks lambada_openai,piqa --limit 100
100%|██████████████████████████████████████████████████████████████████████| 5153/5153 [00:05<00:00, 901.36it/s]
Running loglikelihood requests:   0%|                                                  | 0/5153 [00:00<?, ?it/s]Passed argument batch_size = auto:8.0. Detecting largest batch size
Determined largest batch size: 64
Running loglikelihood requests:  11%|████▎                                  | 578/5153 [00:01<00:07, 642.86it/s]Passed argument batch_size = auto:8.0. Detecting largest batch size
Determined largest batch size: 64
Running loglikelihood requests: 100%|█████████████████████████████████████| 5153/5153 [00:03<00:00, 1550.92it/s]bootstrapping for stddev: perplexity
100%|████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 185.14it/s]
|    Tasks     |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|lambada_openai|      1|none  |     0|acc       |↑  | 0.3788|±  |0.0068|
|              |       |none  |     0|perplexity|↓  |26.0217|±  |0.9382|

100%|█████████████████████████████████████████████████████████████████████| 1838/1838 [00:01<00:00, 1832.37it/s]
Running loglikelihood requests: 100%|█████████████████████████████████████| 3676/3676 [00:01<00:00, 3374.97it/s]
|    Tasks     |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|lambada_openai|      1|none  |     0|acc       |↑  | 0.3788|±  |0.0068|
|              |       |none  |     0|perplexity|↓  |26.0217|±  |0.9382|
|piqa          |      1|none  |     0|acc       |↑  | 0.6295|±  |0.0113|
|              |       |none  |     0|acc_norm  |↑  | 0.6197|±  |0.0113|

total eval time: 34.39299297332764
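
The request counts in the log can be checked against what --limit 100 should produce. This is a minimal sketch, assuming that `limit` caps the number of documents per task, that lambada_openai issues one loglikelihood request per document, and that piqa issues two (consistent with the 1838 documents and 3676 requests shown above). The helper name `expected_requests` is hypothetical.

```python
from typing import Optional


def expected_requests(num_docs: int, requests_per_doc: int, limit: Optional[int] = None) -> int:
    """Number of loglikelihood requests if `limit` caps the documents per task."""
    docs = num_docs if limit is None else min(num_docs, limit)
    return docs * requests_per_doc


# (documents, requests per document, requests observed in the log above)
tasks = {
    "lambada_openai": (5153, 1, 5153),
    "piqa": (1838, 2, 3676),
}

for name, (num_docs, per_doc, observed) in tasks.items():
    with_limit = expected_requests(num_docs, per_doc, limit=100)
    without_limit = expected_requests(num_docs, per_doc)
    print(f"{name}: observed={observed}, limit=100 -> {with_limit}, no limit -> {without_limit}")
# lambada_openai: observed=5153, limit=100 -> 100, no limit -> 5153
# piqa: observed=3676, limit=100 -> 200, no limit -> 3676
```

Under these assumptions, both observed counts match the unlimited case, which suggests the limit is not reaching the evaluation in task-by-task mode.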

Additional Context

No response
