Description
System Info
(torch) (base) anxiang.zhang@n214-176-142:~/DeepSeek-Coder$ transformers-cli env
WARNING:tensorflow:From /data02/home/anxiang.zhang/miniconda3/envs/torch/lib/python3.10/site-packages/transformers/commands/env.py:100: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.config.list_physical_devices('GPU') instead.
An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
- `transformers` version: 4.36.0
- Platform: Linux-5.4.56.bsk.9-amd64-x86_64-with-glibc2.28
- Python version: 3.10.13
- Huggingface_hub version: 0.20.1
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2 (True)
- Tensorflow version (GPU?): 2.9.3 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.4 (cpu)
- Jax version: 0.4.18
- JaxLib version: 0.4.18
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
At transformers.modeling_attn_mask_utils.py:238, the code is:

```python
tmp = torch.arange(attention_mask.shape[1], 0, -1)
indices = torch.argmax(attention_mask.cpu() * tmp, 1, keepdim=True)
```
The attention_mask.cpu() call is clearly an error when the global default tensor type is not a CPU type. For example, if you set torch.set_default_tensor_type(torch.cuda.HalfTensor), then torch.arange(attention_mask.shape[1], 0, -1) returns a CUDA tensor instead of a CPU tensor, and multiplying a CPU tensor with a CUDA tensor raises an error.
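A minimal sketch that reproduces the device mismatch, assuming a CUDA-capable machine (the mask shape and values here are arbitrary, only for illustration):

```python
import torch

# Assumption: a CUDA device is available; this only illustrates the device mismatch.
torch.set_default_tensor_type(torch.cuda.HalfTensor)

# Arbitrary example mask: batch of 1, sequence length 4, last position padded.
attention_mask = torch.tensor([[1, 1, 1, 0]], device="cuda")

# Same two lines as in modeling_attn_mask_utils.py:
tmp = torch.arange(attention_mask.shape[1], 0, -1)  # created on CUDA because of the default tensor type (as described above)
indices = torch.argmax(attention_mask.cpu() * tmp, 1, keepdim=True)
# RuntimeError: the CPU mask cannot be multiplied with the CUDA `tmp` tensor.
```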
A simple fix would be to replace attention_mask.cpu() with attention_mask.to(tmp.device), as sketched below.
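A sketch of what the patched lines could look like (untested, just applying the substitution above):

```python
tmp = torch.arange(attention_mask.shape[1], 0, -1)
# Follow `tmp` to whatever device it was created on instead of forcing CPU:
indices = torch.argmax(attention_mask.to(tmp.device) * tmp, 1, keepdim=True)
```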
Expected behavior
The mask handling should not assume a CPU default tensor type; replacing attention_mask.cpu() with attention_mask.to(tmp.device) would keep both tensors on the same device.