[RFC] Rework the dependencies to be more versatile #951
Conversation
LGTM logic-wise. One nit on extended: either they are part of the core deps, or they are not and then they require protection. imo you can merge when tests pass.
```python
from lighteval.utils.imports import can_load_extended_tasks

import lighteval.tasks.extended.hle.main as hle
```
so we're now adding their requirements as core deps? works for me since they are worth it for the community
they are not core deps, but you can run them without having installed the needed deps for every extended task
hm, then I think we could have an `is_package_available` (or even `are_dependencies_available`) for each of them
and we could map packages to dependencies, for example:

```python
DEPENDENCIES_MAPPING = {
    "accelerate": ["accelerate"],
    "extended:hle": [list of deps],
    ...
}
```

then edit `is_package_available` to iterate over the list and check each entry
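A minimal sketch of that proposal (the mapping entries and dependency lists here are illustrative, not the real ones):

```python
import importlib.util

# Illustrative mapping from a logical name to the packages it pulls in;
# the real lists would live in lighteval's imports module.
DEPENDENCIES_MAPPING = {
    "accelerate": ["accelerate"],
    "extended:hle": ["openai", "datasets"],  # hypothetical entry
}


def is_package_available(name: str) -> bool:
    """Check every mapped dependency; fall back to the name itself."""
    packages = DEPENDENCIES_MAPPING.get(name, [name])
    return all(importlib.util.find_spec(p) is not None for p in packages)
```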
wdyt?
we do have a `@requires` on the prompt functions (for example for ifeval)
hm, I don't understand the link. wdyt of the dict approach?
@clefourrier they're not core deps, still optional! but we have access to their declaration as the imports are now protected at the top level.
IMO this is good as it enables visibility into all tasks, even without some optional dependencies. I don't think visibility of the available tasks should be linked to the state of the installed dependencies.
```python
import string

import langdetect
from ....utils.imports import is_package_available
```
not fond of the `....`; would rather have an explicit path (unless there's a reason for this?)
src/lighteval/utils/imports.py
Outdated
```python
elif package_name == Extras.EXTENDED:
    return "You are trying to run an evaluation requiring additional extensions. Please install the required extra: `pip install lighteval[extended]`"
elif package_name == "text_generation":
```
What is that for, if the protection on extended was removed?
src/lighteval/pipeline.py
Outdated
```diff
+            raise_if_package_not_available("accelerate")
         elif self.launcher_type == ParallelismManager.VLLM:
-            if not is_vllm_available():
-                raise ImportError(NO_VLLM_ERROR_MSG)
+            raise_if_package_not_available("vllm")
         elif self.launcher_type == ParallelismManager.SGLANG:
-            if not is_sglang_available():
-                raise ImportError(NO_SGLANG_ERROR_MSG)
+            raise_if_package_not_available("sglang")
         elif self.launcher_type == ParallelismManager.TGI:
-            if not is_tgi_available():
-                raise ImportError(NO_TGI_ERROR_MSG)
+            raise_if_package_not_available("tgi")
         elif self.launcher_type == ParallelismManager.NANOTRON:
-            if not is_nanotron_available():
-                raise ImportError(NO_NANOTRON_ERROR_MSG)
+            raise_if_package_not_available("nanotron")
         elif self.launcher_type == ParallelismManager.OPENAI:
-            if not is_openai_available():
-                raise ImportError(NO_OPENAI_ERROR_MSG)
+            raise_if_package_not_available("openai")
```
we are checking for packages in 1000 different places; this may not be related to this PR, but I feel like we should harmonize this
👍
""" | ||
if not is_spacy_available(): | ||
raise ImportError(NO_SPACY_ERROR_MSG) | ||
raise_if_package_not_available("spacy") |
This can actually be removed
```diff
-    tokenizer = get_word_tokenizer(lang)

     def _inner_normalizer(text: str) -> str:
+        tokenizer = get_word_tokenizer(lang)
```
This move makes it lazily initialized; it doesn't change anything except the scope, as `get_word_tokenizer` is LRU-cached.
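A sketch of why the move is behavior-neutral, using a stand-in `get_word_tokenizer` (the real one is more involved): because the factory is LRU-cached, calling it inside the closure only delays construction to first use.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_word_tokenizer(lang: str) -> str:
    # Stand-in for an expensive tokenizer constructor; cached per language.
    return f"tokenizer<{lang}>"


def make_normalizer(lang: str):
    def _inner_normalizer(text: str) -> str:
        # Lazy: the tokenizer is only built on first call, then served
        # from the LRU cache on every subsequent call.
        tokenizer = get_word_tokenizer(lang)
        return f"{tokenizer}:{text.lower()}"

    return _inner_normalizer
```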
```diff
         super().__init__()
-        if not can_load_stanza_tokenizer():
-            raise ImportError(NO_STANZA_TOKENIZER_ERROR_MSG)
+        raise_if_package_not_available("stanza")
```
Same here, can be removed
src/lighteval/utils/imports.py
Outdated
elif package_name == "text_generation": | ||
return "You are trying to start a text generation inference endpoint, but TGI is not present in your local environment. Please install it using pip." | ||
elif package_name in ["bitsandbytes", "auto-gptq"]: | ||
return f"You are trying to load a model quantized with `{package_name}`, which is not available in your local environment. Please install it using pip." | ||
elif package_name == "peft": | ||
return "You are trying to use adapter weights models, for which you need `peft`, which is not available in your environment. Please install it using pip." | ||
elif package_name == "openai": | ||
return "You are trying to use an Open AI LLM as a judge, for which you need `openai`, which is not available in your environment. Please install it using pip." | ||
|
||
CANNOT_USE_MULTILINGUAL_TASKS_MSG = "If you want to use multilingual tasks, make sure you installed their dependencies using `pip install -e .[multilingual]`." | ||
return f"You requested the use of `{package_name}` for this evaluation, but it is not available in your current environment. Please install it using pip." |
Only nit with this system is that we don't suggest package versions to the user, when they are often pinned in the pyproject.toml (e.g. with vllm, where we're quite strict on the versions allowed).
the user should install through the extras API, so no need to suggest a version imo
Package versions specified in the pyproject.toml now get recommended
```python
        imports.append(importlib.util.find_spec(package))
    return all(cur_import is not None for cur_import in imports)


class DummyObject(type):
```
Can you explain what it's for? Is it only for tests?
It's a metaclass for all "dummy objects". Dummy objects are placeholders that raise an ImportError as soon as you try to access any method tied to them.

This basically allows defining any class with any dependency; when you assign a `@requires` decorator to it, the decorator checks the required dependencies. If you don't have them installed, the decorator returns such a placeholder instead of the original class.

The current doc is the following:

> Metaclass for the dummy objects. Any class inheriting from it will return the ImportError generated by `requires_backend` each time a user tries to access any method of that class.

Let me know if unclear, happy to provide some more details.
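A minimal sketch of the pattern, with simplified names (the real implementation lives in `src/lighteval/utils/imports.py` and is more complete):

```python
import importlib.util


def is_package_available(package: str) -> bool:
    return importlib.util.find_spec(package) is not None


class DummyObject(type):
    """Metaclass for dummy placeholders: any public attribute access on the
    class raises an ImportError naming the missing packages."""

    def __getattribute__(cls, key):
        if key.startswith("_"):
            return super().__getattribute__(key)
        raise ImportError(
            f"`{cls.__name__}` requires packages that are not installed: {cls._missing}"
        )


def requires(*packages):
    """If all packages are installed, return the class unchanged; otherwise
    swap it for a placeholder that errors on first use."""

    def decorator(obj):
        missing = [p for p in packages if not is_package_available(p)]
        if not missing:
            return obj

        class Placeholder(metaclass=DummyObject):
            _missing = missing

        Placeholder.__name__ = getattr(obj, "__name__", "Placeholder")
        return Placeholder

    return decorator
```

The point is that the import error is deferred from module import time to the moment the class is actually used, so task declarations stay importable.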
It's a bit clearer, but I'm still unsure in which case I should use this
if is_package_available("syllapy"): | ||
import syllapy | ||
|
||
if is_package_available("spacy"): | ||
import spacy |
dreaming about being able to import them with no checks even if not installed
it does add a bit of complexity to do so, but happy to do it if dependencies keep being added
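One way to get that behavior (a sketch, not part of this PR) is a lazy module proxy that defers the real import until first attribute access, so the module-level `import` line always succeeds:

```python
import importlib


class LazyModule:
    """Defer importing `name` until an attribute is first accessed."""

    def __init__(self, name: str):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:
            # Raises ImportError here, at first use, not at import time.
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


spacy = LazyModule("spacy")  # safe even if spacy is not installed
```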
```python
}


@requires("langdetect")
```
why is the @requires here too?
This PR proposes a change in the way optional dependencies are managed so that it's easier to shield tasks that have specific dependencies. See below for an explanation of the different changes.
The first change renames the `is_xxx_available() -> bool` methods to a more versatile `is_package_available(package_name: str) -> bool`. This unbloats the `imports.py` module, while adding two additional methods that work on top of it:

- `is_multilingual_package_available(language_code: Optional[str] = None) -> bool`, which checks the presence of `spacy` and optionally takes in a language code to check whether the language-related dependencies are installed.
- a `@requires(package_name)` decorator, which can be added on top of classes/methods to raise an error as soon as any method of the class (or the method itself) is called, if the package isn't installed.

FYI: this is a draft PR that isn't tested at this point, and that I need to test in a real-world scenario. I'd like to validate the direction before adding relevant tests.
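The renamed check and the decorator can be sketched as follows, here with a simplified function-level stand-in (the real implementation in `lighteval.utils.imports` also handles classes):

```python
import functools
import importlib.util


def is_package_available(package_name: str) -> bool:
    return importlib.util.find_spec(package_name) is not None


def requires(package_name: str):
    """Simplified stand-in: delays the ImportError until the function is called."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not is_package_available(package_name):
                raise ImportError(
                    f"`{fn.__name__}` requires `{package_name}`; please install it."
                )
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@requires("langdetect")
def detect_language(text: str) -> str:
    import langdetect  # only imported once the guard has passed

    return langdetect.detect(text)
```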