Currently we use the word "benchmark" both for an individual benchmark of one model and for the benchmark suite (or the app as a whole).
The terminology would be easier to understand if we renamed individual benchmarks to "tasks".
We already use this name in some places.