System Info
Optimum version: installed from source (python -m pip install git+https://github.com/huggingface/optimum.git)
OS: Windows 11 Pro
Python: 3.11.7

Who can help?
@MiCh

Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)
We are seeing the error below when running the following command:

optimum-cli export onnx --model meta-llama/Llama-2-7b-chat-hf Llama-2-7b-chat-onnx/ --dtype bf16
(py311) D:\Users\anilm\hf>optimum-cli export onnx --model meta-llama/Llama-2-7b-chat-hf Llama-2-7b-chat-onnx/ --dtype bf16
C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\transformers\utils\hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Framework not specified. Using pt to export the model.
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 614/614 [00:00<?, ?B/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 614/614 [00:00<00:00, 612kB/s]
model.safetensors.index.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 2.37MB/s]
model-00001-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.98G/9.98G [02:10<00:00, 76.5MB/s]
model-00002-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [00:41<00:00, 84.3MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:52<00:00, 86.35s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00, 5.04s/it]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 188/188 [00:00<?, ?B/s]
Automatic task detection to text-generation-with-past (possible synonyms are: causal-lm-with-past).
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.62k/1.62k [00:00<?, ?B/s]
tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<?, ?B/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 8.42MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<?, ?B/s]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Exporting the model LlamaForCausalLM in bfloat16 float dtype. After the export, ONNX Runtime InferenceSession with CPU/CUDA execution provider likely does not implement all operators for the bfloat16 data type, and the loading is likely to fail.
Using framework PyTorch: 2.2.1+cpu
Overriding 1 configuration item(s)
- use_cache -> True
C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\transformers\models\llama\modeling_llama.py:1057: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\ProgramData\anaconda3\envs\py311\Scripts\optimum-cli.exe\__main__.py", line 7, in <module>
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\optimum\commands\optimum_cli.py", line 163, in main
service.run()
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\optimum\commands\export\onnx.py", line 261, in run
main_export(
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\optimum\exporters\onnx\__main__.py", line 351, in main_export
onnx_export_from_model(
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 1152, in onnx_export_from_model
_, onnx_outputs = export_models(
^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 763, in export_models
export(
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 868, in export
export_output = export_pytorch(
^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 577, in export_pytorch
onnx_export(
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\torch\onnx\utils.py", line 516, in export
_export(
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\torch\onnx\utils.py", line 1613, in _export
graph, params_dict, torch_out = _model_to_graph(
^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\torch\onnx\utils.py", line 1139, in _model_to_graph
graph = _optimize_graph(
^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\torch\onnx\utils.py", line 677, in _optimize_graph
graph = _C._jit_pass_onnx(graph, operator_export_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\torch\onnx\utils.py", line 1957, in _run_symbolic_function
return symbolic_fn(graph_context, *inputs, **attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\torch\onnx\symbolic_helper.py", line 306, in wrapper
return fn(g, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\py311\Lib\site-packages\torch\onnx\symbolic_opset14.py", line 197, in scaled_dot_product_attention
raise ValueError(
ValueError: Unsupported type for attn_mask: 15
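For context: PyTorch's JIT scalar-type code 15 is bfloat16, and the opset-14 symbolic for scaled_dot_product_attention only handles boolean and float attention masks, so a bfloat16 mask falls through to this ValueError. Below is a minimal sketch that should reproduce the same failure outside Optimum (assuming torch 2.2.x; the module and tensor shapes are made up for illustration):

```python
# Minimal repro sketch outside Optimum (assumption: torch 2.2.x).
# JIT scalar-type code 15 is bfloat16; the opset-14 symbolic for
# scaled_dot_product_attention accepts only bool/float masks, so a
# bfloat16 attn_mask should raise the same ValueError during export.
import torch
import torch.nn.functional as F

class Sdpa(torch.nn.Module):
    def forward(self, q, k, v, mask):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
mask = torch.zeros(1, 1, 16, 16, dtype=torch.bfloat16)  # additive mask in bf16

torch.onnx.export(Sdpa(), (q, k, v, mask), "sdpa.onnx", opset_version=14)
# -> ValueError: Unsupported type for attn_mask: 15
```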
Expected behavior
The latest Optimum supports the bfloat16 data type, so we expect the export with bfloat16 to succeed.
We could add an option in Optimum to have the export use the manual attention implementation instead of torch.nn.functional.scaled_dot_product_attention. Would that help you?
As an alternative, you could downgrade to torch==2.1.0, for which SDPA is not picked by Transformers.
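Until such an option exists, one untested workaround sketch is to load the model with the eager (manual) attention implementation and export programmatically. This assumes transformers >= 4.36 (which accepts attn_implementation="eager" in from_pretrained) and that onnx_export_from_model — the function visible in the traceback above — accepts a preloaded model plus an output directory and task; the exact signature may differ:

```python
# Untested workaround sketch: force the eager (manual) attention path so
# tracing never reaches torch.nn.functional.scaled_dot_product_attention.
# Assumptions: transformers >= 4.36 (attn_implementation kwarg), and that
# onnx_export_from_model takes (model, output=..., task=...) -- the exact
# signature may differ from this sketch.
import torch
from transformers import AutoModelForCausalLM
from optimum.exporters.onnx.convert import onnx_export_from_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # avoid SDPA during ONNX tracing
)
onnx_export_from_model(
    model,
    output="Llama-2-7b-chat-onnx/",
    task="text-generation-with-past",
)
```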