Fix lazy init to stop hiding errors in import #14124

sgugger merged 1 commit into master from better_import_error on Oct 25, 2021

@sgugger sgugger commented Oct 22, 2021

What does this PR do?

The problem

As was pointed out in #13007 and reported more recently in the internal Slack, the lazy init used in Transformers hides the error messages one gets at import time. A reproducer is:

pyenv virtualenv 3.8.9 test-bug
pyenv activate test-bug

pip install datasets huggingface_hub
pip install torch transformers==4.9.2
python -c "from transformers import pipeline"

This only returns ImportError: cannot import name 'pipeline' from 'transformers', with no other information. The underlying error comes from a mismatch between Datasets and huggingface-hub in this env (in detail: pipeline tries to import AutoModel, which tries to import rag, which tries to import Dataset), but Transformers hides it. If one runs

python -c "from datasets import Dataset"

one sees the full error message:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/sgugger/.pyenv/versions/test-bug/lib/python3.8/site-packages/datasets/__init__.py", line 37, in <module>
    from .builder import ArrowBasedBuilder, BeamBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder
  File "/home/sgugger/.pyenv/versions/test-bug/lib/python3.8/site-packages/datasets/builder.py", line 44, in <module>
    from .data_files import DataFilesDict, _sanitize_patterns
  File "/home/sgugger/.pyenv/versions/test-bug/lib/python3.8/site-packages/datasets/data_files.py", line 120, in <module>
    dataset_info: huggingface_hub.hf_api.DatasetInfo,
AttributeError: module 'huggingface_hub.hf_api' has no attribute 'DatasetInfo'

That particular problem is fixed in later versions of Transformers since we decoupled the auto module from the others, but it's still a general bug in the lazy init.

The solution

The underlying problem is that some errors are silently ignored by Python's import machinery. @aphedges gave a preliminary report in #13007, which helped me pinpoint the problem to the _get_module private function: it errors when we try to import the pipelines module in the env mentioned above, but that error is then discarded and no useful error message is shown.

Changing that error to a RuntimeError solves the issue, and with the modifications suggested in this PR, the line

python -c "from transformers import pipeline"

then gives the more informative

Traceback (most recent call last):
  File "/home/sgugger/git/transformers/src/transformers/file_utils.py", line 1996, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/sgugger/.pyenv/versions/3.8.9/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/sgugger/git/transformers/src/transformers/pipelines/__init__.py", line 33, in <module>
    from .automatic_speech_recognition import AutomaticSpeechRecognitionPipeline
  File "/home/sgugger/git/transformers/src/transformers/pipelines/automatic_speech_recognition.py", line 20, in <module>
    from .base import Pipeline
  File "/home/sgugger/git/transformers/src/transformers/pipelines/base.py", line 43, in <module>
    from ..models.auto.modeling_auto import AutoModel
  File "/home/sgugger/git/transformers/src/transformers/models/auto/modeling_auto.py", line 230, in <module>
    from ..rag.modeling_rag import (  # noqa: F401 - need to import all RagModels to be in globals() function
  File "/home/sgugger/git/transformers/src/transformers/models/rag/modeling_rag.py", line 30, in <module>
    from .retrieval_rag import RagRetriever
  File "/home/sgugger/git/transformers/src/transformers/models/rag/retrieval_rag.py", line 33, in <module>
    from datasets import Dataset, load_dataset, load_from_disk
  File "/home/sgugger/.pyenv/versions/test-bug/lib/python3.8/site-packages/datasets/__init__.py", line 37, in <module>
    from .builder import ArrowBasedBuilder, BeamBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder
  File "/home/sgugger/.pyenv/versions/test-bug/lib/python3.8/site-packages/datasets/builder.py", line 44, in <module>
    from .data_files import DataFilesDict, _sanitize_patterns
  File "/home/sgugger/.pyenv/versions/test-bug/lib/python3.8/site-packages/datasets/data_files.py", line 120, in <module>
    dataset_info: huggingface_hub.hf_api.DatasetInfo,
AttributeError: module 'huggingface_hub.hf_api' has no attribute 'DatasetInfo'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/home/sgugger/git/transformers/src/transformers/file_utils.py", line 1985, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/sgugger/git/transformers/src/transformers/file_utils.py", line 1998, in _get_module
    raise RuntimeError(f"Failed to import {module_name} because of the following error (look up to see its traceback):\n{e}") from e
RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
module 'huggingface_hub.hf_api' has no attribute 'DatasetInfo'
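
For readers who want to see the mechanism in isolation, here is a minimal sketch; it is not the Transformers code. It assumes the swallowed error is an AttributeError raised inside the lazy module's `__getattr__` (as in the traceback above), which `from ... import ...` treats as "name not found" and replaces with a generic ImportError. `ToyLazyModule`, `broken_loader`, `try_import` and the `toy_lazy` module name are hypothetical names made up for this illustration.

```python
import sys
import types


def broken_loader(module_name):
    # Hypothetical stand-in for importing a submodule whose dependency is broken.
    raise AttributeError("module 'huggingface_hub.hf_api' has no attribute 'DatasetInfo'")


class ToyLazyModule(types.ModuleType):
    """Toy lazy module: resolves names to submodules on first attribute access."""

    def __init__(self, name, class_to_module, loader, wrap_errors):
        super().__init__(name)
        self._class_to_module = class_to_module
        self._loader = loader
        self._wrap_errors = wrap_errors

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. for names not loaded yet.
        if name.startswith("_") or name not in self._class_to_module:
            raise AttributeError(f"module {self.__name__} has no attribute {name}")
        module = self._get_module(self._class_to_module[name])
        value = getattr(module, name)
        setattr(self, name, value)
        return value

    def _get_module(self, module_name):
        if not self._wrap_errors:
            # Old behaviour: whatever the loader raises escapes unchanged.
            return self._loader(module_name)
        try:
            return self._loader(module_name)
        except Exception as e:
            # Re-raise as RuntimeError so it cannot be mistaken for a missing attribute.
            raise RuntimeError(
                f"Failed to import {self.__name__}.{module_name} because of the "
                f"following error (look up to see its traceback):\n{e}"
            ) from e


def try_import(wrap_errors):
    """Run `from toy_lazy import pipeline` and return the exception it raises."""
    sys.modules["toy_lazy"] = ToyLazyModule(
        "toy_lazy", {"pipeline": "pipelines"}, broken_loader, wrap_errors
    )
    try:
        from toy_lazy import pipeline  # noqa: F401
    except Exception as e:
        return e
    finally:
        sys.modules.pop("toy_lazy", None)


if __name__ == "__main__":
    print(repr(try_import(wrap_errors=False)))
    print(repr(try_import(wrap_errors=True)))
```

With `wrap_errors=False`, `try_import` returns the generic "cannot import name" ImportError and the original message is lost; with `wrap_errors=True` it returns a RuntimeError whose `__cause__` is the original AttributeError.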

Fixes #13007

@aphedges

@sgugger, thank you very much for creating a fix to this issue! I'm glad my write-up in #13007 helped.

@LysandreJik LysandreJik left a comment

Aha, very nice! LGTM!

@sgugger sgugger merged commit 8560b55 into master Oct 25, 2021
@sgugger sgugger deleted the better_import_error branch October 25, 2021 20:53
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
Development

Successfully merging this pull request may close these issues.

Importing hides underlying error