You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When you try to download a dataset that uses biopython, like load_dataset("InstaDeepAI/multi_species_genomes"), you get the error:
>>> from datasets import load_dataset
>>> dataset = load_dataset("InstaDeepAI/multi_species_genomes")
/home/j.vangoey/.pyenv/versions/multi_species_genomes/lib/python3.10/site-packages/datasets/load.py:1454: FutureWarning: The repository for InstaDeepAI/multi_species_genomes contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/InstaDeepAI/multi_species_genomes
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
Downloading builder script: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.51k/7.51k [00:00<00:00, 7.67MB/s]
Downloading readme: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17.2k/17.2k [00:00<00:00, 11.0MB/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/j.vangoey/.pyenv/versions/multi_species_genomes/lib/python3.10/site-packages/datasets/load.py", line 2548, in load_dataset
builder_instance = load_dataset_builder(
File "/home/j.vangoey/.pyenv/versions/multi_species_genomes/lib/python3.10/site-packages/datasets/load.py", line 2220, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/home/j.vangoey/.pyenv/versions/multi_species_genomes/lib/python3.10/site-packages/datasets/load.py", line 1871, in dataset_module_factory
raise e1 from None
File "/home/j.vangoey/.pyenv/versions/multi_species_genomes/lib/python3.10/site-packages/datasets/load.py", line 1844, in dataset_module_factory
).get_module()
File "/home/j.vangoey/.pyenv/versions/multi_species_genomes/lib/python3.10/site-packages/datasets/load.py", line 1466, in get_module
local_imports = _download_additional_modules(
File "/home/j.vangoey/.pyenv/versions/multi_species_genomes/lib/python3.10/site-packages/datasets/load.py", line 346, in _download_additional_modules
raise ImportError(
ImportError: To be able to use InstaDeepAI/multi_species_genomes, you need to install the following dependency: Bio.
Please install it using 'pip install Bio' for instance.
>>>
Bio comes from the biopython package that can be installed with pip install biopython, not with pip install Bio as suggested.
This PR adds special logic to show the correct package name in the error message of _download_additional_modules, similarly as is done for sklearn / scikit-learn already.
There are more packages where importable module name differs from the PyPI package name, so this could be made more generic, like:
# Mapping of importable module names to their PyPI package names
package_map = {
"sklearn": "scikit-learn",
"Bio": "biopython",
"PIL": "Pillow",
"bs4": "beautifulsoup4"
}
for module_name, pypi_name in package_map.items():
if module_name in needs_to_be_installed.keys():
needs_to_be_installed[module_name] = pypi_name
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When you try to download a dataset that uses biopython, like
load_dataset("InstaDeepAI/multi_species_genomes")
, you get the error:Bio
comes from thebiopython
package that can be installed withpip install biopython
, not withpip install Bio
as suggested.This PR adds special logic to show the correct package name in the error message of
_download_additional_modules
, similarly as is done forsklearn
/scikit-learn
already.There are more packages where importable module name differs from the PyPI package name, so this could be made more generic, like: