
Can't create transformer pipeline because pytorch failed to be detected #31454

Closed
dannikay opened this issue Jun 17, 2024 · 9 comments

Comments

@dannikay

System Info

Ubuntu 22.04
Python 3.12.3

Who can help?

@Narsil @zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import transformers
from transformers import is_torch_available
import torch

print(torch.__version__)
print(is_torch_available())

# Define the task that we want to use (required for proper pipeline construction).
task = "text2text-generation"

# Define the pipeline, using the task and a model instance that is applicable for our task.
generation_pipeline = transformers.pipeline(
    task=task,
    model="declare-lab/flan-alpaca-large",
)

# Define a simple input example that will be recorded with the model in MLflow, giving
# users of the model an indication of the expected input format.
input_example = ["prompt 1", "prompt 2", "prompt 3"]

# Define the parameters (and their defaults) for optional overrides at inference time.
parameters = {"max_length": 512, "do_sample": True, "temperature": 0.4}

Output:

2.3.1+cu121
False

...

RuntimeError: At least one of TensorFlow 2.0 or PyTorch should be installed. To install TensorFlow 2.0, read the instructions at https://www.tensorflow.org/install/ To install PyTorch, read the instructions at https://pytorch.org/.

Expected behavior

I do not expect this error: PyTorch is installed on my system (torch.__version__ prints 2.3.1+cu121), yet is_torch_available() returns False.

@amyeroberts
Collaborator

cc @ydshieh

@dannikay
Author

I can no longer reproduce this after restarting my notebook kernel. I suppose the PyTorch detection relies on some environment state that a kernel restart is required to pick up. Feel free to close this.

@dannikay
Author

It seems that _torch_available is a global variable that is set once at import time: https://github.com/huggingface/transformers/blob/02300273e220932a449a47ebbe453e7789be454b/src/transformers/utils/import_utils.py#L180C1-L180C17

So it won't be re-evaluated within the same notebook kernel once it has been set (until I restart the kernel). One improvement I can think of is to re-evaluate this variable inside is_torch_available(), but I don't know whether that would break other things.
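For context, a minimal sketch of the pattern being described: the availability flag is computed once when the module is imported, so installing torch afterwards in the same interpreter never refreshes it. The names mirror transformers/utils/import_utils.py but the body is simplified; treat it as an illustration, not the library's exact code.

import importlib.util

# Evaluated once, at module import time. If torch is installed *after*
# this module has been imported, the flag is never recomputed.
_torch_available = importlib.util.find_spec("torch") is not None

def is_torch_available():
    # Returns the cached module-level flag; no re-detection happens here.
    return _torch_available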

@amyeroberts
Collaborator

amyeroberts commented Jun 18, 2024

So it won't be re-evaluated within the same notebook kernel once it has been set (until I restart the kernel).

Does this mean that torch is installed in the notebook after importing transformers?

@ydshieh
Collaborator

ydshieh commented Jun 18, 2024

In general, for notebooks (well, I usually use Google Colab), if you install packages after the kernel has started, it is recommended to restart the kernel.

re-evaluate this variable inside is_torch_available()

We make use of _torch_available (and the same caching idea for many other is_xxx_available checks) because we try to avoid re-evaluation, which might slow things down a lot.

I am open to a PR that improves this (the mentioned issue) while keeping things fast (regarding what I mentioned above).
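One possible shape for such a PR, sketched here purely as an illustration (neither helper below exists in transformers): compute the flag lazily on first call instead of at import time, and expose an explicit invalidation hook so the hot path stays a single comparison.

import importlib.util

_torch_available = None  # None means "not checked yet"

def is_torch_available():
    # Fast path: after the first call this is one comparison plus a
    # return, so repeated checks inside the library stay cheap.
    global _torch_available
    if _torch_available is None:
        _torch_available = importlib.util.find_spec("torch") is not None
    return _torch_available

def invalidate_torch_availability():
    # Hypothetical helper: call it after "pip install torch" in a live
    # kernel to force re-detection on the next is_torch_available().
    global _torch_available
    _torch_available = None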

@dannikay
Author

@amyeroberts torch was installed in the notebook kernel ("!pip install ...") and I re-executed the cell that imports transformers multiple times, to no avail.

@amyeroberts
Collaborator

torch was installed in the notebook kernel ("!pip install ...") and I re-executed the cell that imports transformers multiple times, to no avail.

@dannikay Just to be clear, this means that transformers was already imported, torch installed, then the cell to import transformers re-executed?

As @ydshieh notes above, we have a cache, which means the check behind is_torch_available is executed once and its result is stored for the rest of the Python session. This helps speed things up within the transformers library - we have lots of is_xxx_available flags which enable us to safely guard for different framework and modality usage, e.g. PyTorch vs TensorFlow.

If you or anyone else wants to submit a PR which would make this more dynamic whilst maintaining speed, we'd be very happy to review!
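A side note for readers who hit this in a live notebook: re-executing "import transformers" is a no-op, because Python caches modules in sys.modules, so the module body (and the availability check) never runs again. A hedged workaround, short of restarting the kernel, is to reload the cached module; this pokes at transformers internals, may not propagate to code that already imported the flag, and can break between versions, so restarting the kernel remains the recommended fix.

import importlib
import transformers.utils.import_utils as import_utils

# reload() re-runs the module body, which recomputes the availability
# flags from the now-installed torch. Modules that already imported
# is_torch_available keep references to the old namespace, so this
# only reliably affects calls made through the reloaded module.
importlib.reload(import_utils)
print(import_utils.is_torch_available())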

@dannikay
Author

@amyeroberts correct.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
