Conversation

LysandreJik
Member

This PR proposes a change in the way optional dependencies are managed so that it's easier to shield tasks that have specific dependencies. See below for an explanation of the different changes.

The first change renames the is_xxx_available() -> bool methods to a more versatile is_package_available(package_name: str) -> bool.

This unbloats the imports.py module while adding two additional methods that work on top of it:

  • is_multilingual_package_available(language_code: Optional[str] = None) -> bool which checks the presence of spacy, and optionally takes in a language code to check whether the language-related dependencies are installed.
  • the @requires(package_name) decorator, which can be added on top of classes/methods to raise an error, if the package isn't installed, as soon as the method (or any method of the class) is called.
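As a sketch of the decorator half of this (function-only, with assumed names and error text; the real implementation also handles classes via placeholder objects):

```python
import importlib.util
from functools import wraps


def is_package_available(package_name: str) -> bool:
    # A package counts as available if an import spec can be found for it.
    return importlib.util.find_spec(package_name) is not None


def requires(package_name: str):
    """Decorator: raise an ImportError at call time if `package_name` is missing."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not is_package_available(package_name):
                raise ImportError(
                    f"`{func.__name__}` requires `{package_name}`, which is not "
                    f"installed. Please install it, e.g. `pip install {package_name}`."
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

The key property is that the check runs at call time, not at import time, so modules can be imported and tasks listed even when optional dependencies are absent.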

FYI: this is a draft PR that isn't tested at this point, and that I need to test in a real-world scenario. I'd like to validate the direction before adding relevant tests.

Member

@clefourrier clefourrier left a comment

LGTM logic-wise; one nit on extended (either they are part of the core deps, or they are not and then they require protection). IMO you can merge when tests pass.


from lighteval.utils.imports import can_load_extended_tasks

import lighteval.tasks.extended.hle.main as hle
Member

So we're now adding their requirements as core deps? Works for me, since they're worth it for the community.

Member

They are not core deps, but you can run them without having installed the deps needed for every extended task.

Member

Hm, then I think we could have an is_package_available (or even an are_dependencies_available) for each of them

Member

And we could map packages to dependencies, for example:

DEPENDENCIES_MAPPING = {
    "accelerate": ["accelerate"],
    "extended:hle": [list of deps],
    ...
} 

then edit is_package_available to iterate over the list and check each dependency
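A sketch of what that dict approach could look like (the mapping contents below are illustrative placeholders, not the real dependency lists):

```python
import importlib.util

# Hypothetical mapping from a feature/package key to the pip packages it needs.
DEPENDENCIES_MAPPING = {
    "accelerate": ["accelerate"],
    "extended:hle": ["openai"],  # illustrative only, not the actual dep list
}


def is_package_available(key: str) -> bool:
    # Unmapped keys fall back to being treated as a single package name.
    dependencies = DEPENDENCIES_MAPPING.get(key, [key])
    return all(importlib.util.find_spec(dep) is not None for dep in dependencies)
```

With this shape, a feature key like "extended:hle" resolves to its whole dependency list, while plain package names keep working unchanged.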

Member

wdyt?

Member

we do have a @requires on the prompt functions (for example for ifeval)

Member

Hm, I don't understand the link. WDYT of the dict approach?

Member Author

@clefourrier they're not core deps, still optional! But we have access to their declarations, as the imports are now protected at the top level.

IMO this is good as it enables visibility into all tasks, even without some optional dependencies. I don't think visibility of the available tasks should be linked to the state of the installed dependencies.

import string

import langdetect
from ....utils.imports import is_package_available
Member

Not fond of the `....`; I would rather have an explicit path (unless there's a reason for this?)

Comment on lines 60 to 73
elif package_name == Extras.EXTENDED:
return "You are trying to run an evaluation requiring additional extensions. Please install the required extra: `pip install lighteval[extended]`"
elif package_name == "text_generation":
Member

What is that for, if the protection on extended was removed?

Comment on lines 101 to 111
     raise_if_package_not_available("accelerate")
 elif self.launcher_type == ParallelismManager.VLLM:
-    if not is_vllm_available():
-        raise ImportError(NO_VLLM_ERROR_MSG)
+    raise_if_package_not_available("vllm")
 elif self.launcher_type == ParallelismManager.SGLANG:
-    if not is_sglang_available():
-        raise ImportError(NO_SGLANG_ERROR_MSG)
+    raise_if_package_not_available("sglang")
 elif self.launcher_type == ParallelismManager.TGI:
-    if not is_tgi_available():
-        raise ImportError(NO_TGI_ERROR_MSG)
+    raise_if_package_not_available("tgi")
 elif self.launcher_type == ParallelismManager.NANOTRON:
-    if not is_nanotron_available():
-        raise ImportError(NO_NANOTRON_ERROR_MSG)
+    raise_if_package_not_available("nanotron")
 elif self.launcher_type == ParallelismManager.OPENAI:
-    if not is_openai_available():
-        raise ImportError(NO_OPENAI_ERROR_MSG)
+    raise_if_package_not_available("openai")
Member

We are checking for packages in 1000 different places; maybe not related to this PR, but I feel like we should harmonize this.

Member

@NathanHB NathanHB left a comment

👍

@HuggingFaceDocBuilderDev
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

     """
-    if not is_spacy_available():
-        raise ImportError(NO_SPACY_ERROR_MSG)
+    raise_if_package_not_available("spacy")
Member Author

This can actually be removed

-tokenizer = get_word_tokenizer(lang)

 def _inner_normalizer(text: str) -> str:
+    tokenizer = get_word_tokenizer(lang)
Member Author

@LysandreJik LysandreJik Sep 16, 2025

This move makes it lazily initialized; it doesn't change anything except the scope, as get_word_tokenizer is LRU-cached.
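For context, a toy version of the pattern being described (hypothetical tokenizer factory, assuming it is `lru_cache`-decorated as stated):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_word_tokenizer(lang: str):
    # Hypothetical stand-in: pretend building a tokenizer is expensive.
    return object()


def make_normalizer(lang: str):
    def _inner_normalizer(text: str) -> str:
        # Looked up lazily on first call; lru_cache means every subsequent call
        # returns the same cached tokenizer instance, so the move is free.
        tokenizer = get_word_tokenizer(lang)
        return text.lower()

    return _inner_normalizer
```

Because the cache is keyed on `lang`, moving the lookup inside the closure only delays the (one-time) construction until the normalizer is first used.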

     super().__init__()
-    if not can_load_stanza_tokenizer():
-        raise ImportError(NO_STANZA_TOKENIZER_ERROR_MSG)
+    raise_if_package_not_available("stanza")
Member Author

Same here, can be removed

Comment on lines 70 to 79
elif package_name == "text_generation":
return "You are trying to start a text generation inference endpoint, but TGI is not present in your local environment. Please install it using pip."
elif package_name in ["bitsandbytes", "auto-gptq"]:
return f"You are trying to load a model quantized with `{package_name}`, which is not available in your local environment. Please install it using pip."
elif package_name == "peft":
return "You are trying to use adapter weights models, for which you need `peft`, which is not available in your environment. Please install it using pip."
elif package_name == "openai":
return "You are trying to use an Open AI LLM as a judge, for which you need `openai`, which is not available in your environment. Please install it using pip."

CANNOT_USE_MULTILINGUAL_TASKS_MSG = "If you want to use multilingual tasks, make sure you installed their dependencies using `pip install -e .[multilingual]`."
return f"You requested the use of `{package_name}` for this evaluation, but it is not available in your current environment. Please install it using pip."
Member

Only nit with this system is that we don't suggest package versions to the user, when often they are pinned in the pyproject.toml (e.g. with vllm, where we're quite strict on the allowed versions).

Member

The user should install through the extras API, so no need to suggest a version IMO.

Member Author

Package versions specified in the pyproject.toml now get recommended

imports.append(importlib.util.find_spec(package))
return all(cur_import is not None for cur_import in imports)

class DummyObject(type):
Member

Can you explain what it's for? Is it only for tests?

Member Author

It's a metaclass for all "dummy objects". Dummy objects are placeholders that raise an ImportError as soon as you try to access any method tied to them.

This basically allows defining any class with any dependency; when you apply the @requires decorator to it, it checks the required dependencies. If you don't have them installed, the decorator returns such a placeholder instead of the original class.

The current doc is the following:

    Metaclass for the dummy objects. Any class inheriting from it will return the ImportError generated by
    `requires_backend` each time a user tries to access any method of that class.

Let me know if it's unclear; happy to provide more detail.
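For illustration, a simplified sketch of that metaclass pattern (modeled on the equivalent in transformers; the names and error text here are assumptions, not the merged lighteval code):

```python
class DummyObject(type):
    """Metaclass for placeholder classes: any public attribute access raises."""

    def __getattribute__(cls, name):
        # Let dunder/private lookups through so the class itself still works
        # (creation, repr, isinstance checks, ...).
        if name.startswith("_"):
            return super().__getattribute__(name)
        raise ImportError(
            f"Accessing `{cls.__name__}.{name}` requires an optional "
            "dependency that is not installed."
        )
```

A class swapped out for a `metaclass=DummyObject` placeholder can still be imported, listed, and printed; the ImportError only fires when someone actually tries to use it.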

Member

It's a bit clearer, but I'm still unsure in which cases I should use this.

Comment on lines +32 to +36
if is_package_available("syllapy"):
import syllapy

if is_package_available("spacy"):
import spacy
Member

Dreaming about being able to import them with no checks, even if not installed.

Member Author

it does add a bit of complexity to do so, but happy to do it if dependencies keep being added

}


@requires("langdetect")
Member

why is the @requires here too?

@NathanHB NathanHB merged commit 6ee4ff8 into main Sep 19, 2025
5 checks passed