Does the diffusers example support a batch size option? #6

Closed

ericlormul opened this issue Oct 3, 2022 · 4 comments

Comments

@ericlormul

In plain diffusers, making the prompt a list batches the input, but in AITemplate I get the following error when I make the prompt a list of size 2.

```
{'trained_betas'} was not found in config. Values will be initialized to default values.
[18:28:36] ./tmp/CLIPTextModel/model-generated.h:275: Init AITemplate Runtime.
[18:28:37] ./tmp/UNet2DConditionModel/model-generated.h:3262: Init AITemplate Runtime.
[18:28:37] ./tmp/AutoencoderKL/model-generated.h:678: Init AITemplate Runtime.
[18:28:40] ./tmp/CLIPTextModel/model_interface.cu:92: Error: [SetValue] Dimension got value out of bounds; expected value to be in [1, 1], but got 2
Traceback (most recent call last):
  File "examples/05_stable_diffusion/demo.py", line 46, in <module>
    run()
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "examples/05_stable_diffusion/demo.py", line 37, in run
    image = pipe(prompt).images[0]
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/root/repos/AITemplate/examples/05_stable_diffusion/pipeline_stable_diffusion_ait.py", line 247, in __call__
    text_embeddings = self.clip_inference(text_input.input_ids.to(self.device))
  File "/home/root/repos/AITemplate/examples/05_stable_diffusion/pipeline_stable_diffusion_ait.py", line 139, in clip_inference
    exe_module.run_with_tensors(inputs, ys, graph_mode=True)
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 483, in run_with_tensors
    outputs_ait = self.run(
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 438, in run
    return self._run_impl(
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 377, in _run_impl
    self.DLL.AITemplateModelContainerRun(
  File "/home/root/miniconda3/envs/ldm/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 192, in _wrapped_func
    raise RuntimeError(f"Error in function: {method.__name__}")
RuntimeError: Error in function: AITemplateModelContainerRun
```
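
The telling line is the SetValue error from the compiled CLIP module: the modules under ./tmp/ were built with the batch dimension fixed at 1, so a batch of 2 fails the bounds check at runtime. A minimal sketch of the fix direction, assuming AITemplate's IntVar/Tensor frontend (the [1, 8] range and the sequence length of 64 are illustrative assumptions, not the repo's actual compile code):

```python
# Sketch: declare the batch dimension as a dynamic IntVar so the compiled
# module accepts any batch size in the declared range, instead of baking
# in a fixed batch of 1. Values shown are assumptions for illustration.
from aitemplate.frontend import IntVar, Tensor

batch_size = IntVar(values=[1, 8], name="batch_size")  # dynamic batch: 1..8
input_ids = Tensor(
    shape=[batch_size, 64],  # seq_len of 64 is an assumption here
    name="input0",
    dtype="int64",
    is_input=True,
)
```
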
@antinucleon
Contributor

Yes, the batched version is not merged yet. Check here:

https://github.com/terrychenism/AIT_StableDiffusion/tree/main/examples/05_stable_diffusion

@terrychenism
Contributor

batched sd: #8
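
For reference, a hypothetical usage sketch once the batched modules from that PR are compiled (pipe is the StableDiffusionAITPipeline instance from demo.py; the prompt count must match the batch size the modules were built for):

```python
# Assumes modules were compiled for batch size 2.
prompts = [
    "a photo of an astronaut riding a horse on mars",
    "a watercolor painting of a lighthouse at dawn",
]
images = pipe(prompts).images  # expect len(images) == 2
```
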

@ericlormul
Author

Hi, I pulled the latest commit, reinstalled, and recompiled everything, but when I change the prompt variable in demo.py to a list of strings it still gives the above error. What's the correct way to do batched inference with StableDiffusionAITPipeline? Thanks!
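
To illustrate why changing the prompt alone doesn't help: the pipeline tokenizes all prompts into one tensor, and the compiled module then validates the batch dimension against bounds fixed at compile time. A hedged sketch of that check (shapes and bounds below are illustrative; the real check happens inside the compiled module's SetValue):

```python
import torch

# The pipeline tokenizes all prompts into one [batch, seq_len] tensor;
# 77 is CLIP's usual max length and is an assumption here.
input_ids = torch.randint(0, 49408, (2, 77))

# Bounds baked into the module at compile time; demo.py cannot change them.
compiled_lo, compiled_hi = 1, 1
batch = input_ids.shape[0]
if not (compiled_lo <= batch <= compiled_hi):
    raise RuntimeError(
        f"[SetValue] expected value in [{compiled_lo}, {compiled_hi}], got {batch}"
    )
```

So the modules must be rebuilt for the desired batch size before the pipeline will accept a list of 2 prompts.
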

@lileilai

I have met the same problem. Did you find a solution?

asroy pushed a commit to shaojiewang/AITemplate that referenced this issue Nov 10, 2022
* fixed batch_size > 1

* load so file for benchmark
tissue3 pushed a commit to tissue3/AITemplate-1 that referenced this issue Feb 7, 2023
* [runner] unified parallel builder/profiler

* [lint] patched

* [test] recover avg pool2d test

* [task_runner] add comment for ftask_proc, fret_proc

Co-authored-by: Bing Xu <bingxu@fb.com>
evshiron pushed a commit to are-we-gfx1100-yet/AITemplate that referenced this issue Jun 21, 2023