[BUG] Triton Error [CUDA]: invalid argument #3382

Closed
abhijitpal1247 opened this issue Apr 26, 2023 · 6 comments
Labels: bug, inference

Comments


abhijitpal1247 commented Apr 26, 2023

Describe the bug
I am getting RuntimeError: Triton Error [CUDA]: invalid argument when running DeepSpeed inference with a Stable Diffusion model.

To Reproduce
Steps to reproduce the behavior:

  1. Simple inference script to reproduce
from diffusers import StableDiffusionPipeline
import torch
import deepspeed

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe._progress_bar_config = {"disable": True}

with torch.inference_mode():
    deepspeed.init_inference(
        model=getattr(pipe, "model", pipe),  # Transformers models
        # mp_size=1,                         # number of GPUs
        dtype=torch.float16,                 # dtype of the weights (fp16)
        # replace_method="auto",             # lets DS automatically identify the layers to replace
        replace_with_kernel_inject=True,     # replace the model with the kernel injector
    )
    image = pipe("A Happy CEO").images[0]
  2. Required packages and their versions
    deepspeed==0.9.1+fef5aa6e
    diffusers==0.13.1
    transformers==4.27.3
    triton==2.0.0.dev20221202
    accelerate==0.16.0
    xformers==0.0.16
    huggingface_hub==0.12.0
    torch==1.13.1

Expected behavior
The script should run to completion and generate an image without errors.

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/envs/trlx/lib/python3.10/site-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.9.1+fef5aa6e, fef5aa6e, HEAD
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
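
The warning above flags the installed Triton build as untested. A quick way to confirm which versions are actually loaded in the environment (plain Python, nothing DeepSpeed-specific assumed):

# Print the versions the ds_report warning refers to.
import torch
import triton
import deepspeed

print("torch    :", torch.__version__)      # 1.13.1+cu117 in this environment
print("triton   :", triton.__version__)     # 2.0.0.dev20221202; ds_report only lists 1.0.0 as known-compatible
print("deepspeed:", deepspeed.__version__)  # 0.9.1+fef5aa6e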

System info (please complete the following information):

  • OS: Amazon Linux 2 (NAME="Amazon Linux", VERSION="2", ID="amzn", ID_LIKE="centos rhel fedora")
  • GPU count and types: one Tesla T4
  • (if applicable) Hugging Face Transformers/Accelerate/etc. versions: listed under the package versions above
  • Python version: Python 3.10.9

Docker context
Using Conda to maintain environments

Additional context
Error log:

[2023-04-26 05:38:51,186] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1+fef5aa6e, git-hash=fef5aa6e, git-branch=HEAD
[2023-04-26 05:38:51,188] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
**** found and replaced vae w. <class 'deepspeed.model_implementations.diffusers.vae.DSVAE'>
Using /home/ec2-user/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ec2-user/.cache/torch_extensions/py310_cu117/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Time to load transformer_inference op: 0.08540964126586914 seconds
[2023-04-26 05:38:51,584] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Attention config: {'layer_id': 0, 'hidden_size': 320, 'intermediate_size': 1280, 'heads': 8, 'num_hidden_layers': -1, 'fp16': True, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-12, 'mp_size': 1, 'q_int8': False, 'scale_attention': True, 'triangular_masking': False, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 4096, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False}
Time to load transformer_inference op: 0.002669811248779297 seconds
Loading extension module transformer_inference...
Using /home/ec2-user/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Using /home/ec2-user/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ec2-user/.cache/torch_extensions/py310_cu117/spatial_inference/build.ninja...
Building extension module spatial_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module spatial_inference...
Time to load spatial_inference op: 0.08253216743469238 seconds
**** found and replaced unet w. <class 'deepspeed.model_implementations.diffusers.unet.DSUNet'>
Using /home/ec2-user/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module spatial_inference, skipping build step...
Loading extension module spatial_inference...
Time to load spatial_inference op: 0.002627134323120117 seconds
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in _fwd_kernel                                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 
('2-.-0-.-0-d82511111ad128294e9d31a6ac684238-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962
222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033
f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, 
torch.float16, 'fp32', torch.float32, torch.float16, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32',
'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, 128, 128), (True, True, True, 
(False,), True, True, (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), 
(True, False), (False, True), (True, False), (True, False), (True, False), (False, True), (True, False), (True, 
False), (True, False), (False, True), (False, False), (False, False), (True, False)))

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>                                                                                      │
│                                                                                                  │
│    7 │   │   #replace_method="auto", # Lets DS autmatically identify the layer to replace        │
│    8 │   │   replace_with_kernel_inject=True, # replace the model with the kernel injector       │
│    9 │   )                                                                                       │
│ ❱ 10   image = pipe("A Happy CEO").images[0]                                                     │
│   11                                                                                             │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27 in              │
│ decorate_context                                                                                 │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_ │
│ stable_diffusion.py:643 in __call__                                                              │
│                                                                                                  │
│   640 │   │   │   │   latent_model_input = self.scheduler.scale_model_input(latent_model_input   │
│   641 │   │   │   │                                                                              │
│   642 │   │   │   │   # predict the noise residual                                               │
│ ❱ 643 │   │   │   │   noise_pred = self.unet(                                                    │
│   644 │   │   │   │   │   latent_model_input,                                                    │
│   645 │   │   │   │   │   t,                                                                     │
│   646 │   │   │   │   │   encoder_hidden_states=prompt_embeds,                                   │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl  │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed/model_implementations/diffusers/unet │
│ .py:44 in forward                                                                                │
│                                                                                                  │
│   41 │   │   │   │   outputs = self._graph_replay(*inputs, **kwargs)                             │
│   42 │   │   │   return outputs                                                                  │
│   43 │   │   else:                                                                               │
│ ❱ 44 │   │   │   return self._forward(*inputs, **kwargs)                                         │
│   45 │                                                                                           │
│   46 │   def _create_cuda_graph(self, *inputs, **kwargs):                                        │
│   47 │   │   # warmup to create the workspace and cublas handle                                  │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed/model_implementations/diffusers/unet │
│ .py:73 in _forward                                                                               │
│                                                                                                  │
│   70 │   │   │   │   │   │   │    return_dict,                                                   │
│   71 │   │   │   │   │   │   │    cross_attention_kwargs=cross_attention_kwargs)                 │
│   72 │   │   else:                                                                               │
│ ❱ 73 │   │   │   return self.unet(sample, timestamp, encoder_hidden_states, return_dict)         │
│   74                                                                                             │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl  │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:580 in   │
│ forward                                                                                          │
│                                                                                                  │
│   577 │   │   down_block_res_samples = (sample,)                                                 │
│   578 │   │   for downsample_block in self.down_blocks:                                          │
│   579 │   │   │   if hasattr(downsample_block, "has_cross_attention") and downsample_block.has   │
│ ❱ 580 │   │   │   │   sample, res_samples = downsample_block(                                    │
│   581 │   │   │   │   │   hidden_states=sample,                                                  │
│   582 │   │   │   │   │   temb=emb,                                                              │
│   583 │   │   │   │   │   encoder_hidden_states=encoder_hidden_states,                           │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl  │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:837 in      │
│ forward                                                                                          │
│                                                                                                  │
│    834 │   │   │   │   )[0]                                                                      │
│    835 │   │   │   else:                                                                         │
│    836 │   │   │   │   hidden_states = resnet(hidden_states, temb)                               │
│ ❱  837 │   │   │   │   hidden_states = attn(                                                     │
│    838 │   │   │   │   │   hidden_states,                                                        │
│    839 │   │   │   │   │   encoder_hidden_states=encoder_hidden_states,                          │
│    840 │   │   │   │   │   cross_attention_kwargs=cross_attention_kwargs,                        │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl  │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/diffusers/models/transformer_2d.py:265 in      │
│ forward                                                                                          │
│                                                                                                  │
│   262 │   │                                                                                      │
│   263 │   │   # 2. Blocks                                                                        │
│   264 │   │   for block in self.transformer_blocks:                                              │
│ ❱ 265 │   │   │   hidden_states = block(                                                         │
│   266 │   │   │   │   hidden_states,                                                             │
│   267 │   │   │   │   encoder_hidden_states=encoder_hidden_states,                               │
│   268 │   │   │   │   timestep=timestep,                                                         │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl  │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_ │
│ transformer_block.py:91 in forward                                                               │
│                                                                                                  │
│    88 │   │   │   context = kwargs["encoder_hidden_states"]                                      │
│    89 │   │                                                                                      │
│    90 │   │   out_norm_1 = self.transformer_cuda_module.layer_norm(hidden_states, self.norm1_g   │
│ ❱  91 │   │   out_attn_1 = self.attn_1(out_norm_1)                                               │
│    92 │   │                                                                                      │
│    93 │   │   out_norm_2, out_attn_1 = self.transformer_cuda_module.layer_norm_residual_store_   │
│    94 │   │   │   out_attn_1, self.attn_1_bias, hidden_states, self.norm2_g, self.norm2_b, sel   │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl  │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_ │
│ attention.py:188 in forward                                                                      │
│                                                                                                  │
│   185 │   │   │   │   │   │   │   │   │   input.size()[1],                                       │
│   186 │   │   │   │   │   │   │   │   │   input.size()[0], DeepSpeedDiffusersAttention.layer_i   │
│   187 │   │   │   │   │   │   │   │   │   0, self.config.max_out_tokens, self.config.min_out_t   │
│ ❱ 188 │   │   output = DeepSpeedDiffusersAttentionFunction.apply(input, context, input_mask, s   │
│   189 │   │   │   │   │   │   │   │   │   │   │   │   │   │      self.attn_qw, self.attn_kw, s   │
│   190 │   │   │   │   │   │   │   │   │   │   │   │   │   │      self.num_attention_heads_per_   │
│   191 │   │   │   │   │   │   │   │   │   │   │   │   │   │      self.hidden_size_per_partitio   │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_ │
│ attention.py:88 in forward                                                                       │
│                                                                                                  │
│    85 │   │   │   output = linear_func(context_layer, attn_ow, attn_ob, do_out_bias, False, co   │
│    86 │   │   │   return output                                                                  │
│    87 │   │                                                                                      │
│ ❱  88 │   │   output = selfAttention_fp(input, context, input_mask)                              │
│    89 │   │                                                                                      │
│    90 │   │   return output                                                                      │
│    91                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_ │
│ attention.py:64 in selfAttention_fp                                                              │
│                                                                                                  │
│    61 │   │   │   │   qkv_out = linear_func(input, attn_qkvw, attn_qkvb if attn_qkvb is not No   │
│    62 │   │   │   │   │   │   │   │   │     is not None, do_flash_attn, config.heads, False)     │
│    63 │   │   │   │                                                                              │
│ ❱  64 │   │   │   │   context_layer = triton_flash_attn_kernel(qkv_out[0], qkv_out[1], qkv_out   │
│    65 │   │   │   │   │   │   │   │   │   │   │   │   │   │    input.shape[-2] % 128 == 0)       │
│    66 │   │   │   │   context_layer = _transpose_for_context(context_layer[:, :, :, :head_size   │
│    67                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl  │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/triton_ops │
│ .py:121 in forward                                                                               │
│                                                                                                  │
│   118 │   │   tmp = torch.empty((q.shape[0] * q.shape[1], q.shape[2]), device=q.device, dtype=   │
│   119 │   │   num_warps = 4 if Lk <= 64 else 8                                                   │
│   120 │   │                                                                                      │
│ ❱ 121 │   │   _fwd_kernel[grid](                                                                 │
│   122 │   │   │   q,                                                                             │
│   123 │   │   │   k,                                                                             │
│   124 │   │   │   v,                                                                             │
│                                                                                                  │
│ /opt/conda/envs/trlx/lib/python3.10/site-packages/triton/runtime/jit.py:106 in launcher          │
│                                                                                                  │
│   103 │   │   memorizes the grid.                                                                │
│   104 │   │   """                                                                                │
│   105 │   │   def launcher(*args, **kwargs):                                                     │
│ ❱ 106 │   │   │   return self.run(*args, grid=grid, **kwargs)                                    │
│   107 │   │   return launcher                                                                    │
│   108                                                                                            │
│   109                                                                                            │
│ in _fwd_kernel                                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Triton Error [CUDA]: invalid argument
abhijitpal1247 added the bug and inference labels Apr 26, 2023
@CrossNox

Hi @abhijitpal1247, perhaps my comment in another issue could be of help: microsoft/DeepSpeed-MII#170 (comment)

@abhijitpal1247 (Author)

@CrossNox It looks like the same problem. #2942 and #2702 also report a similar issue on a T4, across different versions of DeepSpeed.
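
Since the common factor in those reports appears to be the T4 (compute capability 7.5), below is a minimal workaround sketch that only applies kernel injection on Ampere-or-newer GPUs and otherwise runs the stock pipeline; the (8, 0) threshold is an assumption, not a confirmed fix:

import torch
import deepspeed
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Assumption: the Triton flash-attention kernel used by kernel injection may not
# support pre-Ampere GPUs; a Tesla T4 reports compute capability (7, 5).
major, minor = torch.cuda.get_device_capability()
if (major, minor) >= (8, 0):
    deepspeed.init_inference(
        model=pipe,
        dtype=torch.float16,
        replace_with_kernel_inject=True,  # inject the DeepSpeed kernels
    )
# On older GPUs (e.g. a T4) skip init_inference and use the stock diffusers pipeline.

with torch.inference_mode():
    image = pipe("A Happy CEO").images[0]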

Dentoty commented Apr 28, 2023

Run the following in a Colab notebook to reproduce #2968 and this issue.

Here is some code that quickly reproduces the error
AttributeError: 'StableDiffusionPipeline' object has no attribute 'children',
which I believe is still not fixed:

!pip install diffusers==0.15.0 torch==1.13.1 transformers==4.28.1 triton==2.0.0.dev20221105

%cd /content/sample_data
!git clone https://github.com/microsoft/DeepSpeed.git

%cd /content/sample_data/DeepSpeed/requirements
!pip install -r requirements.txt
%cd /content/sample_data/DeepSpeed
!pip install .
!export PYTHONPATH="$PYTHONPATH:/content/sample_data/DeepSpeed"


import os, torch, diffusers, deepspeed

pipe = diffusers.StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    revision="fp16",
    replace_with_kernel_inject=True # note: this kwarg is meant for deepspeed.init_inference, not from_pretrained
)

model = deepspeed.init_inference(pipe.to("cuda"), dtype=torch.float16)
model("hello from here")

And here is some code that quickly reproduces the error
RuntimeError: Triton Error [CUDA]: invalid argument:

!pip install torch
!pip install diffusers==0.14.0 triton==2.0.0.dev20221202
!pip install transformers accelerate

%cd /content/sample_data
!git clone https://github.com/microsoft/DeepSpeed.git

%cd /content/sample_data/DeepSpeed/requirements
!pip install -r requirements.txt
%cd /content/sample_data/DeepSpeed
!pip install .
!export PYTHONPATH="$PYTHONPATH:/content/sample_data/DeepSpeed"

import torch
import deepspeed
from diffusers import StableDiffusionPipeline
print(deepspeed.__version__)
# load vanilla pipeline
ds_pipeline = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    torch_dtype=torch.float16
).to("cuda")

# init deepspeed inference engine
deepspeed.init_inference(
    model=getattr(ds_pipeline,"model", ds_pipeline),      # Transformers models
    mp_size=1,        # Number of GPU
    dtype=torch.float16, # dtype of the weights (fp16)
    replace_method="auto", # lets DS automatically identify the layers to replace
    replace_with_kernel_inject=True, # replace the model with the kernel injector
)
print("DeepSpeed Inference Engine initialized")


image = ds_pipeline("a photo of an astronaut riding a horse on mars").images[0]

image.show()

loadams (Contributor) commented Jul 24, 2023

Can you test again with the latest DeepSpeed, where the Triton version requirements have been updated? If you are still seeing this, can you re-open this issue?
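
A minimal sketch of that retest, in the Colab style used above; the exact Triton pin should come from the current DeepSpeed requirements, so the bare --upgrade here is an assumption:

!pip install --upgrade deepspeed diffusers transformers
!pip install --upgrade triton   # assumption: newer DeepSpeed releases accept Triton 2.x; check the repo requirements for the exact pin

import deepspeed, triton
print(deepspeed.__version__, triton.__version__)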

loadams closed this as completed Jul 24, 2023
@hayday100

I ran into a similar error. I can confirm I updated both Triton (to 2.0.0) and DeepSpeed (to 0.10.0), but the problem persists. Here is the error message:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ miniconda3/envs/py39/lib/python3.9/site-packages/triton_pre_mlir/runtime/autotuner.py:200 in run
│
│   197 │   def run(self, *args, **kwargs):
│   198 │   │   for v, heur in self.values.items():
│   199 │   │   │   kwargs[v] = heur({**dict(zip(self.arg_names, args)), **kwargs})
│ ❱ 200 │   │   return self.fn.run(*args, **kwargs)
│   201
│   202
│   203 def heuristics(values):
│ in _fwd_kernel:43
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Triton Error [CUDA]: invalid argument

loadams (Contributor) commented Aug 8, 2023

@hayday100 - for now, can you use the Triton version listed in requirements-sd.txt? That is the version our unit tests specifically run against.
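
A minimal sketch of that suggestion, assuming a source checkout of DeepSpeed (requirements-sd.txt lives under requirements/ in the repo, and the pin inside it may change between releases):

!git clone https://github.com/microsoft/DeepSpeed.git
%cd DeepSpeed
!pip install -r requirements/requirements-sd.txt   # installs the Triton version the Stable Diffusion unit tests use
!pip install .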
