
With OneDiff optimization enabled, the UNet output is NaN on A100, but the UNet output on V100 is correct #958

Closed
lss15151161 opened this issue Jun 17, 2024 · 8 comments

@lss15151161

lss15151161 commented Jun 17, 2024

Describe the bug

After enabling OneDiff optimization for StableVideoDiffusionPipeline, the UNet output is NaN on A100, while the UNet output on V100 is correct.

Your environment

debian

OneDiff git commit id

onediff 0.13.0.dev202403280124
onediffx 0.13.0.dev202403280124

OneFlow version info

oneflow 0.9.1.dev20240615+cu118

How To Reproduce

Steps to reproduce the behavior (code or script):

The complete error message

[Screenshot not available: the image upload did not complete.]


@lijunliangTG
Contributor

Could you please provide code that reproduces this behavior? You could also try updating OneDiff to the latest version first.

@lss15151161
Author

lss15151161 commented Jun 19, 2024

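> Could you please provide code that reproduces this behavior? You could also try updating OneDiff to the latest version first.

Hi, thanks for the reply. I upgraded to the latest version and the problem persists. The reproduction code is below. The model weights are too large to upload, but you should be able to reproduce the issue with the open-source weights.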

import json

import torch
from diffusers.models import UNetSpatioTemporalConditionModel
from onediff.infer_compiler import oneflow_compile

if __name__ == "__main__":
    unet_model_path = './xxx/unet'  # local UNet weights directory (path elided)
    device = "cuda"

    # Load the UNet in fp16, then wrap it with OneDiff's OneFlow compiler.
    m_torch = UNetSpatioTemporalConditionModel.from_pretrained(
        unet_model_path, torch_dtype=torch.float16
    ).to(device).eval()
    m_oneflow = oneflow_compile(m_torch)

    # Build the inputs from the attached JSON dump (mock_unet_input_a100.txt).
    with open("./mock_unet_input_a100.txt", "r") as f:
        mock_input = json.load(f)
    latent_model_input = torch.tensor(mock_input["latent_model_input"]).half().to(device)
    t = torch.tensor(mock_input["t"]).to(device)
    image_embeddings = torch.tensor(mock_input["image_embeddings"]).half().to(device)
    added_time_ids = torch.tensor(mock_input["added_time_ids"]).half().to(device)

    # Eager PyTorch reference output.
    with torch.no_grad():
        out = m_torch(latent_model_input, t, image_embeddings, added_time_ids, return_dict=False)
    print(out[0][0, 0, 0, 0, :5])

    # OneDiff-compiled output: NaN on A100, matches the reference on V100.
    with torch.no_grad():
        out = m_oneflow(latent_model_input, t, image_embeddings, added_time_ids, return_dict=False)
    print(out[0][0, 0, 0, 0, :5])

Attachment: mock_unet_input_a100.txt
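Printing five values from one corner of the output can miss a NaN or Inf elsewhere in the tensor. Below is a minimal sketch of a fuller check. It assumes both outputs have already been flattened to Python lists, for example with `out[0].float().flatten().tolist()`. The helper name `compare_outputs` and the default tolerance are hypothetical and are not part of OneDiff.

```python
import math
from typing import Dict, Sequence


def compare_outputs(reference: Sequence[float], candidate: Sequence[float],
                    atol: float = 1e-2) -> Dict[str, object]:
    """Summarize how far `candidate` (compiled output) is from `reference` (eager output).

    Hypothetical helper: both arguments are flat lists of floats of equal length.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")

    # Count non-finite values in the compiled output.
    nan_count = sum(1 for x in candidate if math.isnan(x))
    inf_count = sum(1 for x in candidate if math.isinf(x))

    # Compare only pairs where both values are finite; NaN/Inf are reported above.
    diffs = [abs(r - c) for r, c in zip(reference, candidate)
             if math.isfinite(r) and math.isfinite(c)]
    max_abs_diff = max(diffs) if diffs else float("nan")

    ok = nan_count == 0 and inf_count == 0 and bool(diffs) and max_abs_diff <= atol
    return {"nan_count": nan_count, "inf_count": inf_count,
            "max_abs_diff": max_abs_diff, "ok": ok}
```

For example, `compare_outputs([0.1, -0.2, 0.3], [0.1, float("nan"), 0.3])` reports one NaN and `ok=False`. This makes the A100 failure visible even when the printed slice happens to look finite.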

@lijunliangTG
Contributor

Could you tell me the Hugging Face model name so that I can reproduce your error?

@lss15151161
Author

lss15151161 commented Jun 20, 2024

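> Could you tell me the Hugging Face model name so that I can reproduce your error?

I used the model here. I tried both the fp16 and the fp32 weights, loading both in fp16, and both produce NaN: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main/unet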


@lss15151161
Author

A10, A30, and A100 all have the same problem.

@strint
Collaborator

strint commented Jul 12, 2024

@lss15151161

Try this:

export ONEFLOW_ATTENTION_ALLOW_HALF_PRECISION_ACCUMULATION=False
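The same flag can also be set inside the reproduction script instead of the shell. Below is a minimal sketch. To be safe, it assumes the variable must be in the environment before `onediff`/`oneflow` are imported and before the UNet is compiled. Only the variable name and the value `False` come from the suggestion above. The helper name `disable_half_precision_attention_accumulation` is hypothetical.

```python
import os

# Variable name taken from the maintainer's suggestion in this thread.
FLAG = "ONEFLOW_ATTENTION_ALLOW_HALF_PRECISION_ACCUMULATION"


def disable_half_precision_attention_accumulation() -> str:
    """Set the OneFlow attention flag to "False" and return the stored value.

    Hypothetical helper: it only writes an environment variable. Call it
    before importing onediff/oneflow and before compiling the UNet.
    """
    os.environ[FLAG] = "False"
    return os.environ[FLAG]


if __name__ == "__main__":
    disable_half_precision_attention_accumulation()
    # Import OneDiff only after the flag is set, e.g.:
    # from onediff.infer_compiler import oneflow_compile
    print(FLAG, "=", os.environ[FLAG])
```

Exporting the variable in the shell, as suggested, avoids any import-order concerns. The in-script form is mainly convenient when the script is launched from a notebook or another process.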

@strint
Collaborator

strint commented Jul 19, 2024

@lss15151161 Please have a try.

@lss15151161
Author

> @lss15151161 Please have a try.

Thanks, this solved my problem!
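The thread does not explain why the flag removes the NaNs, or why only A10/A30/A100 were affected. One plausible mechanism, stated as an assumption rather than a diagnosis, runs as follows. If the attention scores Q·K^T are accumulated in fp16, a long dot product can exceed the fp16 maximum of 65504 and become `inf`. The softmax's max-subtraction then computes `inf - inf = nan`, and the NaN spreads to the whole row. The pure-Python sketch below simulates fp16 rounding with `struct`'s half-precision `'e'` format to illustrate that chain. The vectors are made up, and the sketch ignores the 1/sqrt(d) scaling real attention applies.

```python
import math
import struct
from typing import List, Sequence

FP16_MAX = 65504.0  # largest finite IEEE half-precision value


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct rejects finite values that round past the fp16 range;
        # real fp16 arithmetic would produce +/-inf instead.
        return math.copysign(math.inf, x)


def dot(a: Sequence[float], b: Sequence[float], half_accumulate: bool) -> float:
    """Dot product of fp16 inputs; products are formed in Python float (double).

    If half_accumulate is True, the running sum is rounded to fp16 after every step.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
        if half_accumulate:
            acc = to_fp16(acc)
    return acc


def softmax(scores: Sequence[float]) -> List[float]:
    """Softmax that subtracts the row max before exp (the usual stability trick)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q, k = [40.0] * 64, [50.0] * 64  # illustrative values, 64-dim head
    print(softmax([dot(q, k, half_accumulate=False), 10.0]))  # finite probabilities
    print(softmax([dot(q, k, half_accumulate=True), 10.0]))   # NaN row
```

In this toy row, wide accumulation gives a score of 128000 and a finite softmax. fp16 accumulation overflows to `inf` and the entire row becomes NaN, which is consistent with the symptom reported above. It does not prove that this is what happens inside OneFlow's attention kernels.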


3 participants