oneflow._softmax_backward_data is not implemented #93

Closed
Yaodada12 opened this issue Feb 6, 2023 · 24 comments

Comments

@Yaodada12

Description

Traceback (most recent call last):
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1050, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 27, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/modeling_utils.py", line 41, in <module>
    from .generation_utils import GenerationMixin
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/generation_utils.py", line 61, in <module>
    from .pytorch_utils import torch_int_div
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/pytorch_utils.py", line 19, in <module>
    from torch import _softmax_backward_data, nn
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/oneflow/mock_torch/__init__.py", line 42, in __getattr__
    raise NotImplementedError(self.module.__name__ + "." + name + error_msg)
NotImplementedError: oneflow._softmax_backward_data is not implemented, please submit an issue at
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the
minimum reproduction code, and the system information.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/Desktop/yao/projects/Text2img/AIdraw/en/utils/scheduling_ddim_oneflow.py", line 9, in <module>
    from diffusers.configuration_utils import ConfigMixin, register_to_config
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/diffusers/__init__.py", line 22, in <module>
    from transformers import CLIPTextModel, CLIPFeatureExtractor
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1041, in __getattr__
    value = getattr(module, name)
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1040, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1052, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
oneflow._softmax_backward_data is not implemented, please submit an issue at
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the
minimum reproduction code, and the system information.

@strint
Collaborator

strint commented Feb 6, 2023

You may need to update transformers.

Uninstall the local OneFlow version of transformers and install the official release directly:

python3 -m pip install "transformers>=4.26"

cd diffusers

python3 -m pip install -e ".[oneflow]"

Reference: https://github.com/Oneflow-Inc/diffusers/pull/83#discussion_r1092913239
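
To confirm that the upgrade took effect, the installed transformers version can be compared against the 4.26 minimum named above. This is a minimal sketch using only the standard library; the helper names (parse_release, meets_minimum, installed_version) are hypothetical, and the comparison looks only at the leading numeric components, so a pre-release such as 4.26.0rc1 is treated like 4.26.

```python
from importlib import metadata  # part of the standard library since Python 3.8


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.26.0.dev0' -> (4, 26, 0)."""
    parts = []
    # Drop any local tag such as '+cu117' before splitting on dots.
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component ('dev0', '0rc1', ...)
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    elif meets_minimum(found, "4.26"):
        print(f"transformers {found} satisfies >=4.26")
    else:
        print(f"transformers {found} is older than 4.26; upgrade it")
```

Running the script inside the same environment that raises the import error shows which transformers copy Python actually picks up.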

@yuanms2
Collaborator

yuanms2 commented Feb 6, 2023

The cause of the error is explained here:

huggingface/transformers#20796
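
The traceback in the original report ends in the __getattr__ of oneflow/mock_torch, which raises NotImplementedError when transformers runs "from torch import _softmax_backward_data". The mechanism can be sketched as follows. This is a simplified, hypothetical stand-in (the MockModule class and the oneflow_like module are invented for illustration), not the actual oneflow.mock_torch code.

```python
import types


class MockModule(types.ModuleType):
    """Hypothetical wrapper that forwards attribute access to a backing module."""

    def __init__(self, backing: types.ModuleType):
        super().__init__(backing.__name__)
        self._backing = backing

    def __getattr__(self, name: str):
        # Python calls __getattr__ only after normal attribute lookup has failed.
        backing = self.__dict__["_backing"]
        if hasattr(backing, name):
            return getattr(backing, name)
        # A missing name is reported loudly instead of as a plain AttributeError.
        raise NotImplementedError(f"{backing.__name__}.{name} is not implemented")


# A stand-in backend that provides 'nn' but not '_softmax_backward_data'.
backend = types.ModuleType("oneflow_like")
backend.nn = object()

torch_like = MockModule(backend)
forwarded = torch_like.nn  # found on the backend, so it is forwarded

try:
    torch_like._softmax_backward_data  # missing on the backend
except NotImplementedError as err:
    message = str(err)
    print(message)
```

Because hasattr only swallows AttributeError, a NotImplementedError raised this way propagates through the import machinery, so any library that imports a private torch name the backend lacks fails at import time.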

@Yaodada12
Author

Yaodada12 commented Feb 6, 2023

You may need to update transformers.

Uninstall the local OneFlow version of transformers and install the official release directly:

python3 -m pip install "transformers>=4.26"

cd diffusers

python3 -m pip install -e ".[oneflow]"

Reference: Oneflow-Inc/diffusers#83 (comment)

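OK, thanks a lot. I noticed that the discussion there mentions sharing compilation results. Does static graph compilation now support inference with dynamic input sizes?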

@Yaodada12
Author

The cause of the error is explained here:

huggingface/transformers#20796

Thanks a lot!!

@strint
Collaborator

strint commented Feb 6, 2023

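I noticed that the discussion there mentions sharing compilation results. Does static graph compilation now support inference with dynamic input sizes?

See the updated comment here: https://github.com/Oneflow-Inc/diffusers/issues/75#issuecomment-1418789541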

@Yaodada12

@Yaodada12
Author

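I noticed that the discussion there mentions sharing compilation results. Does static graph compilation now support inference with dynamic input sizes?

See the updated comment here: Oneflow-Inc/diffusers#75 (comment)

@Yaodada12

OK, thumbs up.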

@terrancewang

Hello, I had the same issue as this post. After updating the transformers package, I now have a similar error:

Traceback (most recent call last):
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1110, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 27, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 83, in <module>
    from accelerate import __version__ as accelerate_version
  File "/home/terrance/.local/lib/python3.10/site-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/home/terrance/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 27, in <module>
    import torch.utils.hooks as hooks
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 674, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "/home/terrance/.local/lib/python3.10/site-packages/oneflow/mock_torch/__init__.py", line 88, in create_module
    raise NotImplementedError(oneflow_mod_fullname + error_msg)
NotImplementedError: oneflow.utils.hooks is not implemented, please submit an issue at
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the
minimum reproduction code, and the system information.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/terrance/oneflow/test_diffusion.py", line 2, in <module>
    from diffusers import OneFlowStableDiffusionPipeline
  File "/home/terrance/oneflow/diffusers/src/diffusers/__init__.py", line 22, in <module>
    from transformers import CLIPTextModel, CLIPFeatureExtractor
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1101, in __getattr__
    value = getattr(module, name)
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1100, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1112, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
oneflow.utils.hooks is not implemented, please submit an issue at
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the
minimum reproduction code, and the system information.

Has anyone seen this before?

@strint strint transferred this issue from Oneflow-Inc/oneflow Feb 8, 2023
@yuanms2
Collaborator

yuanms2 commented Feb 8, 2023

The error "oneflow.utils.hooks is not implemented" is discussed here:
https://github.com/Oneflow-Inc/diffusers/issues/90

@jackalcooper
Collaborator

It looks like this has been resolved. Feel free to reopen if not.

@Yaodada12
Author

@strint I've run into a problem: my code raises an error after installing the March build oneflow-0.9.1.dev20230312+cu117. Do you still have the February build oneflow==0.9.1.dev20230216+cu117?

@strint
Collaborator

strint commented Mar 13, 2023

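I've run into a problem: my code raises an error after installing the March build oneflow-0.9.1.dev20230312+cu117

What is the problem? Could you post the error message and the oneflow version number so that we can follow up and fix it? @Yaodada12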

@Yaodada12
Author

Yaodada12 commented Mar 13, 2023

@strint
Versions: both oneflow-0.9.1.dev20230312+cu117 and oneflow-0.9.1.dev20230309+cu117 raise the error; only oneflow-0.9.1.dev20230216+cu117 does not.
It is still the same problem as before.
F20230313 10:33:57.451692 1326001 user_op_conf.cpp:87] Check failed: attr.get() != nullptr attr_name: query_head_size
*** Check failure stack trace: ***
@ 0x7fe57e7a7c9a google::LogMessage::Fail()
@ 0x7fe57e7aabd1 google::LogMessage::SendToLog()
@ 0x7fe57e7a77c9 google::LogMessage::Flush()
@ 0x7fe57e7ab4b9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fe576a967a9 oneflow::user_op::UserOpConfWrapper::Attr4Name()
@ 0x7fe57866c50e oneflow::user_op::(anonymous namespace)::FusedMultiHeadAttentionInferenceKernel::Compute()
@ 0x7fe5776572f7 oneflow::UserKernel::ForwardUserKernel()
@ 0x7fe5776574bb oneflow::UserKernel::ForwardDataContent()
@ 0x7fe577623f63 oneflow::Kernel::Forward()
@ 0x7fe577624069 oneflow::Kernel::Launch()
@ 0x7fe577772e58 oneflow::(anonymous namespace)::LightActor<>::ProcessMsg()
@ 0x7fe577d4d62f oneflow::Thread::PollMsgChannel()
@ 0x7fe577d4dcc1 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7oneflow6ThreadC4ERKNS3_8StreamIdEEUlvE_EEEEE6_M_runEv
@ 0x7fe6b3ff9de4 (unknown)
@ 0x7fe72701e609 start_thread
@ 0x7fe726f43133 clone

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
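
When reporting which builds fail, it can help to split version strings like the ones above into their release, build date, and CUDA tag so that builds can be ordered by date. This is a minimal sketch; the regular expression and the NightlyBuild type are assumptions based only on the three version strings quoted in this comment, not on any documented OneFlow versioning scheme.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class NightlyBuild(NamedTuple):
    release: str                 # e.g. '0.9.1'
    build_date: Optional[date]   # taken from the '.devYYYYMMDD' part, if present
    local_tag: Optional[str]     # e.g. 'cu117', if present


_VERSION = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"   # dotted numeric release
    r"(?:\.dev(?P<stamp>\d{8}))?"    # optional .devYYYYMMDD
    r"(?:\+(?P<local>[\w.]+))?$"     # optional +local tag
)


def parse_build(version: str) -> NightlyBuild:
    """Split a version string such as '0.9.1.dev20230312+cu117' into parts."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    stamp = match.group("stamp")
    built = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:])) if stamp else None
    return NightlyBuild(match.group("release"), built, match.group("local"))


working = parse_build("0.9.1.dev20230216+cu117")
failing = [parse_build(v) for v in ("0.9.1.dev20230309+cu117",
                                    "0.9.1.dev20230312+cu117")]

# Every failing build in the report is newer than the one that works.
all_newer = all(b.build_date > working.build_date for b in failing)
print(working, all_newer)
```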

@Yaodada12
Author

@strint Do you have the whl file or a download link for the February build? I need it as a stopgap.

@strint
Collaborator

strint commented Mar 13, 2023

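Do you have the whl file or a download link for the February build? I need it as a stopgap.

@jackalcooper Do you know where one might still be available? I just checked https://staging.oneflow.info/branch/master/cu117 and it only has the latest builds.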

@Yaodada12
Author

@Ldpe2G @strint Thank you both, much appreciated.

@strint
Collaborator

strint commented Mar 13, 2023

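How do you trigger this problem? Could you share the commit id of the oneflow sd code you are using?

@liujuncheng Judging from the error user_op_conf.cpp:87] Check failed: attr.get() != nullptr attr_name: query_head_size, this is an op that was optimized recently: Oneflow-Inc/oneflow#9963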

@Yaodada12
Author

@strint OK, I'll take a look later.

@liujuncheng
Collaborator

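Are you using a compilation cache or something similar? If so, note that a compilation cache cannot be used across different OneFlow versions. If not, could you provide a reproduction script?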

@Yaodada12
Author

Yaodada12 commented Mar 13, 2023

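Wow, that is exactly it. The model I compiled offline earlier was built with the February version, while the oneflow I have now, installed with pip install --pre oneflow -f https://staging.oneflow.info/branch/master/cu117, is the March version, and that caused the error. Switching back to the February version fixes it.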

@strint
Collaborator

strint commented Mar 13, 2023

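Op information is also part of the graph runtime_state_dict (inside the execution plan). Recent versions changed some Op fields, so a runtime_state_dict saved by an earlier version is incompatible with recent versions, which makes loading the graph fail.

This case is quite typical, so I will open a new issue to record it.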
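
One way to catch this kind of mismatch early is to store the framework version next to the cached artifact and refuse to load it under a different version. This is a minimal sketch with a hypothetical cache file format and hypothetical helper names (save_cache, load_cache, CacheVersionMismatch). It does not use OneFlow's own save/load API, and the version strings are passed in by hand; in real code they would presumably come from the installed package.

```python
import pickle
import tempfile
from pathlib import Path


class CacheVersionMismatch(RuntimeError):
    """Raised when a cache was written by a different framework version."""


def save_cache(path: Path, payload: dict, framework_version: str) -> None:
    """Write the payload together with the version that produced it."""
    record = {"framework_version": framework_version, "payload": payload}
    with open(path, "wb") as handle:
        pickle.dump(record, handle)


def load_cache(path: Path, framework_version: str) -> dict:
    """Return the payload only if it was saved by the same framework version."""
    with open(path, "rb") as handle:
        record = pickle.load(handle)
    saved = record["framework_version"]
    if saved != framework_version:
        raise CacheVersionMismatch(
            f"cache was written by {saved}, but {framework_version} is "
            "installed; recompile instead of loading it"
        )
    return record["payload"]


def demo() -> str:
    """Save under the February build, then try to load under the March build."""
    with tempfile.TemporaryDirectory() as tmp:
        cache_file = Path(tmp) / "graph_cache.pkl"
        save_cache(cache_file, {"plan": "placeholder"}, "0.9.1.dev20230216+cu117")
        try:
            load_cache(cache_file, "0.9.1.dev20230312+cu117")
        except CacheVersionMismatch as err:
            return str(err)
    return "loaded"


print(demo())
```

With a check like this, the mismatch surfaces as a readable Python error at load time rather than as a fatal check failure inside a kernel.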

@Yaodada12
Author

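Op information is also part of the graph runtime_state_dict (inside the execution plan). Recent versions changed some Op fields, so a runtime_state_dict saved by an earlier version is incompatible with recent versions, which makes loading the graph fail.

This case is quite typical, so I will open a new issue to record it.

Kudos to you all.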

@strint
Collaborator

strint commented Apr 19, 2023

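If it is no longer available at the link above, there is probably no prebuilt one left.

Could you consider using the newer version? The interfaces are compatible.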
