We've a use-case, https://huggingface.co/spaces/alvations/llm_harness_mistral_arc/blob/main/llm_harness_mistral_arc.py, where the default feature input types for evaluate.Metric are empty, and we end up with something like this in our llm_harness_mistral_arc/llm_harness_mistral_arc.py:
import evaluate
import datasets
import lm_eval


@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class llm_harness_mistral_arc(evaluate.Metric):
    def _info(self):
        # TODO: Specifies the evaluate.EvaluationModuleInfo object
        return evaluate.MetricInfo(
            # This is the description that will appear on the modules page.
            module_type="metric",
            description="",
            citation="",
            inputs_description="",
            # This defines the format of each prediction and reference
            features={},
        )

    def _compute(self, pretrained=None, tasks=[]):
        outputs = lm_eval.simple_evaluate(
            model="hf",
            model_args={"pretrained": pretrained},
            tasks=tasks,
            num_fewshot=0,
        )
        results = {}
        for task in outputs['results']:
            results[task] = {'acc': outputs['results'][task]['acc,none'],
                             'acc_norm': outputs['results'][task]['acc_norm,none']}
        return results
And the expected user behavior is something like the following [in], with the expected output as per our tests.py (https://huggingface.co/spaces/alvations/llm_harness_mistral_arc/blob/main/tests.py) [out].
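Roughly, the call we have in mind looks like this (a sketch only; loading the module from the Space path alvations/llm_harness_mistral_arc is an assumption):

import evaluate

# Load the custom module from the Hub Space (path assumed from the URL above).
module = evaluate.load("alvations/llm_harness_mistral_arc")

# Intended call: no per-example predictions/references, just a model id and a task list.
results = module.compute(pretrained="mistralai/Mistral-7B-Instruct-v0.2", tasks=["arc_easy"])

# Expected shape of the output, per _compute above:
#     {"arc_easy": {"acc": ..., "acc_norm": ...}}
print(results)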
But evaluate.Metric.compute() somehow expects a default batch, and module.compute(pretrained="mistralai/Mistral-7B-Instruct-v0.2", tasks=["arc_easy"]) throws an error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-bd94e5882ca5> in <cell line: 1>()
----> 1 module.compute(pretrained="mistralai/Mistral-7B-Instruct-v0.2",
      2                tasks=["arc_easy"])

2 frames
/usr/local/lib/python3.10/dist-packages/evaluate/module.py in _get_all_cache_files(self)
    309         if self.num_process == 1:
    310             if self.cache_file_name is None:
--> 311                 raise ValueError(
    312                     "Evaluation module cache file doesn't exist. Please make sure that you call `add` or `add_batch` "
    313                     "at least once before calling `compute`."

ValueError: Evaluation module cache file doesn't exist. Please make sure that you call `add` or `add_batch` at least once before calling `compute`.
Q: Is it possible for the .compute() to expect no features?
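For reference, a stock metric is normally fed per-example features, either directly through compute() or by calling add()/add_batch() beforehand, which is the requirement the error above is pointing at. A minimal sketch with the built-in accuracy metric:

import evaluate

# A conventional metric declares "predictions" and "references" features,
# so compute() buffers them (an implicit add_batch) before scoring.
accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=[0, 1, 1], references=[0, 1, 0]))

# Equivalently, examples can be accumulated first and scored at the end:
accuracy = evaluate.load("accuracy")
accuracy.add_batch(predictions=[0, 1, 1], references=[0, 1, 0])
print(accuracy.compute())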
I've also tried this, but somehow evaluate.Metric.compute is still looking for some sort of predictions variable:
@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class llm_harness_mistral_arc(evaluate.Metric):
    def _info(self):
        # TODO: Specifies the evaluate.EvaluationModuleInfo object
        return evaluate.MetricInfo(
            # This is the description that will appear on the modules page.
            module_type="metric",
            description="",
            citation="",
            inputs_description="",
            # This defines the format of each prediction and reference
            features=[
                datasets.Features(
                    {
                        "pretrained": datasets.Value("string", id="sequence"),
                        "tasks": datasets.Sequence(datasets.Value("string", id="sequence"), id="tasks"),
                    }
                )
            ],
        )

    def _compute(self, pretrained, tasks):
        outputs = lm_eval.simple_evaluate(
            model="hf",
            model_args={"pretrained": pretrained},
            tasks=tasks,
            num_fewshot=0,
        )
        results = {}
        for task in outputs['results']:
            results[task] = {'acc': outputs['results'][task]['acc,none'],
                             'acc_norm': outputs['results'][task]['acc_norm,none']}
        return results
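If we keep the declared-features route, one thing we could try (a sketch only, assuming compute() forwards keyword arguments that match the declared feature names into an implicit add_batch(), which wants list-valued batches) is to pass single-element batches and unpack them inside _compute:

# Sketch: each declared feature is passed as a one-element batch so that the
# implicit add_batch() has something to write before _compute runs.
module.compute(
    pretrained=["mistralai/Mistral-7B-Instruct-v0.2"],
    tasks=[["arc_easy"]],
)

# _compute would then receive the gathered columns as lists, e.g.
#     pretrained == ["mistralai/Mistral-7B-Instruct-v0.2"]
#     tasks == [["arc_easy"]]
# so it would need to unpack them (pretrained[0], tasks[0]) before calling
# lm_eval.simple_evaluate.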