Example does not run due to missing cutlass lib #2
Please clone the code with `git clone --recursive` so that the 3rdparty code is cloned as well.
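A quick way to verify whether the submodules were actually fetched is to check that the vendored cutlass directory is non-empty. This is a minimal sketch; the `3rdparty/cutlass` path is an assumption based on the "3rdparty" submodule layout referenced in this thread.

```shell
# Run from the repo root. An empty (or absent) 3rdparty/cutlass directory
# means the repo was cloned without --recursive.
if [ -n "$(ls -A 3rdparty/cutlass 2>/dev/null)" ]; then
  status="present"
else
  status="missing"
  echo "cutlass submodule missing: run 'git submodule update --init --recursive'"
fi
```

If the clone was already made without `--recursive`, `git submodule update --init --recursive` fetches the missing 3rdparty code in place, with no need to re-clone.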
On Mon, Oct 3, 2022, Chris Taylor wrote:

> To reproduce the error, I started a fresh install with these commands, following the README guides:
>
> ```
> cd python
> python setup.py bdist_wheel
> pip install dist/*.whl
> cd ..
> python3 examples/05_stable_diffusion/compile.py
> ```
>
> ModuleNotFoundError: No module named 'cutlass_lib'

— Bing Xu
This fixed it for me. To fix my install, I had to run `pip install dist/*.whl --force-reinstall`, so it's probably worth throwing that in the README as well if you want to make it more fool-proof.
Ok, we will add that today! Thanks for the suggestion!

On Mon, Oct 3, 2022, Chris Taylor wrote:

> This fixed it for me. To fix my install, I had to run `pip install dist/*.whl --force-reinstall`, so it's probably worth throwing that in the README as well if you want to make it more fool-proof.

— Bing Xu
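After rebuilding and force-reinstalling the wheel, a quick import check can confirm that the fix took effect. This is a minimal sketch; `aitemplate` and `cutlass_lib` are the module names taken from the error message and the discussion above.

```python
# Sanity check after reinstalling: both the main package and the vendored
# cutlass_lib module should be resolvable on the current Python path.
import importlib.util

def module_available(name):
    """Return True if `name` can be located without importing it."""
    return importlib.util.find_spec(name) is not None

for mod in ("aitemplate", "cutlass_lib"):
    status = "ok" if module_available(mod) else "MISSING"
    print(f"{mod}: {status}")
```

If `cutlass_lib` still shows as missing after a recursive clone and rebuild, the `--force-reinstall` flag ensures pip replaces the stale wheel instead of reporting "requirement already satisfied".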
asroy added a commit to shaojiewang/AITemplate that referenced this issue on Nov 10, 2022:

> Update docker to ROCm 5.3
>
> * upgrade compiler to the ROCm 5.3 version
> * remove unnecessary build fixes
>
> Co-authored-by: illsilin <Illia.Silin@amd.com>
tissue3 pushed a commit to tissue3/AITemplate-1 that referenced this issue on Feb 7, 2023:

> Summary: Pull Request resolved: fairinternal/AITemplate#1100
>
> With this diff, the ops from the `conv` family are getting `float32` support. Namely:
>
> - `conv2d`
> - `conv2d_bias`
> - `conv2d_bias_relu`
> - `conv2d_bias_hardswish`
> - `conv2d_bias_sigmoid`
> - `conv2d_bias_add`
> - `conv2d_bias_add_relu`
> - `conv2d_bias_add_hardswish`
> - `conv2d_bias_few_channels`
> - `conv2d_bias_relu_few_channels`
> - `conv2d_bias_hardswish_few_channels`
> - `transposed_conv2d`
> - `transposed_conv2d_bias`
> - `transposed_conv2d_bias_relu`
> - `depthwise_conv3d`
>
> A few points worth the reviewer's attention:
>
> 1. For the ops relying on the `cutlass` kernels, the assertion tolerance in the respective unit tests had to be increased from `1e-2` to `5e-2` to make the tests pass for the `float32` versions of the ops. If I've missed anything and the ops' output can be made closer to that of `pytorch`, please let me know.
> 2. `cutlass`'s SIMT kernels had to be excluded from selection for the `conv2d_bias_add_*` and `conv2d_*_bias_few_channels` kernels; otherwise, the generated CUDA code for these ops runs into template instantiation errors during compilation. Disabling SIMT kernels was inspired by the existing code here: https://www.internalfb.com/code/fbsource/[0f1fbb522f6ec10b23a6331da4adfdf2c9fe5908]/fbcode/aitemplate/AITemplate/python/aitemplate/backend/cuda/gemm_universal/common.py?lines=1072-1077
> 3. There don't seem to be any kernels with `cutlass_lib.library.DataType.f32` inputs / outputs (`op.A.element`, `op.B.element`, etc.) in the `Target.current()._operators[Conv3d]` dict. As a result, even though the `conv3d` op's code is extended to support `fp32`, it technically doesn't work with `fp32` inputs, because the list of selected kernels returned from here ends up being empty (the profiler fails first): https://www.internalfb.com/code/fbsource/[D41423689-V1]/fbcode/aitemplate/AITemplate/python/aitemplate/backend/cuda/conv3d/common.py?lines=235 My guess is that `conv3d`'s current limitation to `fp16` comes from the current content of [`generator.py`](https://www.internalfb.com/code/fbsource/[dc7b8ee10f0c]/fbcode/aitemplate/AITemplate/fb/3rdparty/cutlass/tools/library/scripts/generator.py) in the `cutlass` library. Currently, `conv3d` operators are only created with `fp16` arguments here: https://www.internalfb.com/code/fbsource/[dc7b8ee10f0c31078f1e1a2fbd703c91441ccd2a]/fbcode/aitemplate/AITemplate/fb/3rdparty/cutlass/tools/library/scripts/generator.py?lines=1663%2C1668%2C1673%2C1722-1724 `conv2d` operators, on the other hand, are also created with `fp32` arguments: https://www.internalfb.com/code/fbsource/[dc7b8ee10f0c]/fbcode/aitemplate/AITemplate/fb/3rdparty/cutlass/tools/library/scripts/generator.py?lines=2472%2C2505 Maybe inserting a `CreateConv3dOperator` call after line 2505 could add `fp32` versions of the `conv3d` op, too? Is this feasible? (A quick attempt at doing so ran into some `KeyError`s downstream in `emit_instance` calls on the created ops, so it's apparently not that trivial.) An `fp32` test for `conv3d` is written but disabled for now by a `unittest.skip` with a message. Importantly, `depthwise_conv3d` *does* support `fp32` now: its code is hand-written, hence it was possible to extend it to `fp32`.
> 4. In `V1` the newly added `fp32` tests passed Sandcastle but failed Circle CI. Looking into the similar diffs for gemm / bmm (D41168398, fairinternal/AITemplate@1549112, and D41246673, fairinternal/AITemplate@e81b808), I noticed that the `fp32` tests added there were guarded against CUDA arch < 80. As the CUDA arch in Circle CI seems to be 75, this probably explains the failure of the `fp32` tests there. So in `V2` I've added the same guard here, too.
> 5. As currently written, alignment-based filtering of the `conv2d` and `conv3d` ops won't allow any `fp32` cutlass kernels when the number of channels is divisible by `8` (as the maximum possible `ab_alignment` is `4` for `fp32`). E.g., for `conv2d`: https://www.internalfb.com/code/fbsource/[427a647ecb904df6e6b8556f524ebf1a7017e755]/fbcode/aitemplate/AITemplate/python/aitemplate/backend/cuda/conv2d/common.py?lines=217-226%2C246-254%2C229 Apparently, alignment-based filtering needs to become `dtype`-aware. To this end, the code above (also for `conv3d`) has been refactored in terms of the following function from `utils.alignment`: https://www.internalfb.com/code/fbsource/[bf9d94d11f61]/fbcode/aitemplate/AITemplate/python/aitemplate/utils/alignment.py?lines=39-48
>
> Reviewed By: chenyang78
> Differential Revision: D41423689
> fbshipit-source-id: 09c63e96238b3a9c6085b4bc3e4c0a49fde4b924
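The dtype-aware alignment filtering described in point 5 can be illustrated with a small sketch. This is a hypothetical reconstruction, not the actual AITemplate `utils.alignment` code; it only encodes the fact stated above that the maximum alignment for `fp32` is 4 (versus 8 for `fp16`, i.e. one 128-bit vectorized access).

```python
# Hypothetical sketch: an alignment is valid only if it does not exceed
# the dtype's maximum vector width and divides it evenly. With 128-bit
# accesses, fp16 (2 bytes) allows up to 8 elements, fp32 (4 bytes) up to 4,
# so channels divisible by 8 must still be filtered down to alignment 4
# for fp32 kernels.
MAX_ALIGNMENT = {"float16": 8, "float32": 4}

def valid_alignment(alignment: int, dtype: str) -> bool:
    max_align = MAX_ALIGNMENT[dtype]
    return 0 < alignment <= max_align and max_align % alignment == 0
```

Under this scheme, an alignment of 8 passes for `float16` but is rejected for `float32`, which is exactly the case the refactoring had to handle.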
evshiron pushed a commit to are-we-gfx1100-yet/AITemplate that referenced this issue on Jun 21, 2023. The merge commit message aggregates the following changes:

> * updated to 5th stable diffusion checkpoint (facebookincubator#57): updated all stable diffusion example files to checkpoint v1.5
> * Support different sizes via recompilation (StableDiff demo) (facebookincubator#71): Mostly, this commit just re-establishes the relationship between various previously-hardcoded constants and the target image size (since the latent size is 1/8 of the image size, hardcoding the latent sizes is inconvenient). This adds `--width` and `--height` options to both compile.py and demo.py; provided these both match, you can process different sizes. For img2img mode, the size options passed at compile time must match the size of the actual input image. Consequently, the `--img2img` flag for `compile.py` no longer exists: all this ever did was change the hardcoded size to match the default input image used by `demo_img2img.py`. So it's slightly more flexible than before, but still has no support for a single binary handling different image sizes. It isn't clear that compiling a generic binary is useful: the upstream project can do that just fine, and the whole point of AITemplate is to achieve performance gains via aggressive constant propagation and benchmarking to select the optimal kernels.
> * v0.1.1 (facebookincubator#74): update cutlass, add missing files, patch cutlass. Co-authored-by: Bing Xu <bingxu@fb.com>
> * fix sm86 conv (facebookincubator#81). Co-authored-by: Bing Xu <bingxu@fb.com>
> * fix README.md of bert example (facebookincubator#82)
> * Add negative prompts feature for txt2img pipeline (facebookincubator#75): add an optional negative prompt option for the txt2img pipeline
> * add missing copyright headers (facebookincubator#86)
> * Conv2d group (facebookincubator#73): add group conv and depthwise conv support (`conv2d` groups, `conv2d_depthwise`, `conv2d_depthwise_bias`), with frontends, tests, docstrings, and lint fixes
> * add more tile size for GN + update CK to main (facebookincubator#40) (facebookincubator#3). Co-authored-by: Terry Chen <terrychen@meta.com>, Terry Chen <hahakuku@hotmail.com>
> * Ck remove unnecessary compile include directories (facebookincubator#4): remove an unnecessary include directory while compiling CK code; refactor data_type.hpp under ck/utility/data_type.hpp
> * Update docker to ROCm 5.3 (facebookincubator#2): upgrade compiler to the ROCm 5.3 version; remove unnecessary build fixes. Co-authored-by: illsilin <Illia.Silin@amd.com>
> * Fix BERT benchmark for 2 gcd (facebookincubator#6): fixed batch_size > 1; load the .so file for benchmarking
> * Ci setup (facebookincubator#11): add scripts for CI and testing, followed by a long series of CI fix-up commits (docker setup and paths, pytorch and timm installation, running the BERT/ViT examples, skipping BERT tests on MI100, archiving logfiles, and posting test results to the database)
> * post-merge fix of pr 6 (facebookincubator#13). Co-authored-by: root <root@ctr-ubbsmc15.amd.com>, Chao Liu <lc.roy86@gmail.com>
> * Add stable diffusion benchmark to the CI (facebookincubator#16): add compilation of stable diffusion, missing python modules and new demos, the accelerate module, a parsing-script fix, batch size 1 only, and the benchmark result in the results table
> * sync upstream v0.1.1 (facebookincubator#15): repeats the upstream commit messages for facebookincubator#57, facebookincubator#71, and facebookincubator#74 above, plus: fix profiler bugs, update the CK commit, fix formatting, update timeout, add a ROCm unittest case
> * merge amd-develop
>
> Co-authored-by: Ivan Mikhnenkov <39604625+ivanmikhnenkov@users.noreply.github.com>, Chris Kitching <chriskitching@linux.com>, Bing Xu <antinucleon@gmail.com>, Bing Xu <bingxu@fb.com>, Zhang Jun <ewalker@live.cn>, Bozhao <yubz86@gmail.com>, Max Podkorytov <maxdp@meta.com>, Ehsan Azar <dashesy@gmail.com>, Chao Liu <lc.roy86@gmail.com>, Terry Chen <hahakuku@hotmail.com>, Terry Chen <terrychen@meta.com>, carlushuang <carlus.huang@amd.com>, illsilin <Illia.Silin@amd.com>, zjing14 <zhangjing14@gmail.com>, Illia Silin <98187287+illsilin@users.noreply.github.com>, root <root@ctr-ubbsmc15.amd.com>
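The size relationship stated in the #71 commit message (the latent size is 1/8 of the image size) is simple to sketch. The function below is hypothetical, not the actual demo code; it only illustrates how the previously-hardcoded latent dimensions can be derived from the `--width`/`--height` options at compile time.

```python
# Stable Diffusion's UNet operates on latents at 1/8 of the image
# resolution, so the latent dims follow directly from the image dims.
def latent_dims(width: int, height: int) -> tuple:
    if width % 8 != 0 or height % 8 != 0:
        raise ValueError("image dimensions must be multiples of 8")
    return (width // 8, height // 8)
```

For the default 512x512 image this yields 64x64 latents, which matches the constants that were previously hardcoded in the demo.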