Hi guys,
I want to evaluate models like ModernBERT, Llama, and many others on SuperGLUE and on a benchmark of my own. In my setting, every model has to be fine-tuned on the specific task before evaluation, even the decoder models.
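For context, here is roughly what I mean, sketched with plain transformers on SuperGLUE's BoolQ rather than with LightEval (the checkpoint, hyperparameters, and accuracy metric are just illustrative choices):

```python
# Illustrative sketch: fine-tune an encoder on one SuperGLUE task (BoolQ),
# then evaluate on the validation split. Not a LightEval API.
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

model_name = "answerdotai/ModernBERT-base"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("super_glue", "boolq")

def tokenize(batch):
    # BoolQ is a (question, passage) -> yes/no classification task
    return tokenizer(batch["question"], batch["passage"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="boolq-finetune", num_train_epochs=3),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())
```

I would want this fine-tune-then-evaluate loop for every model/task pair, including decoder models (e.g. via a classification head or generation-based scoring).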
Is this currently supported by LightEval? From reading the code, my impression is that evaluations are done only by prompting (zero- or few-shot), with no fine-tuning step.
Thanks.