OPT dispatch tracker #96
Comments
Nice... the next step then is to take the
Yes, we can start putting pieces together for the pack-based pipeline. As for the first step, once @erwei-xilinx's pack/unpack lowering pass is merged, we can add an e2e example with smaller input sizes and only parallel loops. After this works, we'll add reduction loops and peeling if needed. I have some additional comments in this PR.
CC: @MaheshRavishankar @nirvedhmeshram @yzhang93 I worked on creating a utility which should hopefully help each of us individually test various sizes of Matmul (it can later be extended to other individual ops): Testing utilities. Following are my observations based on the recent addition of the pack-based pipeline, without changing any config and using pack_pipeline_e2e.mlir as the base input IR structure to test the matmul sizes. NOTE: For each failure case I have attached the logs too. Also, the size combinations below are a superset of the ones found in OPT.
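A size-sweep utility like the one described could be sketched as follows. This is a hypothetical illustration, not the actual Testing utilities patch: the function names are invented, and the emitted IR only loosely mirrors the shape of pack_pipeline_e2e.mlir.

```python
# Hypothetical sketch of a matmul size-sweep generator (not the actual
# "Testing utilities" patch): emit one linalg.matmul test case per (M, N, K).

MATMUL_TEMPLATE = """func.func @matmul_{m}x{k}x{n}(
    %lhs: tensor<{m}x{k}xi32>, %rhs: tensor<{k}x{n}xi32>) -> tensor<{m}x{n}xi32> {{
  %c0 = arith.constant 0 : i32
  %init = tensor.empty() : tensor<{m}x{n}xi32>
  %fill = linalg.fill ins(%c0 : i32) outs(%init : tensor<{m}x{n}xi32>) -> tensor<{m}x{n}xi32>
  %res = linalg.matmul ins(%lhs, %rhs : tensor<{m}x{k}xi32>, tensor<{k}x{n}xi32>)
                       outs(%fill : tensor<{m}x{n}xi32>) -> tensor<{m}x{n}xi32>
  return %res : tensor<{m}x{n}xi32>
}}"""

def generate_matmul_ir(m: int, n: int, k: int) -> str:
    """Render an MLIR test function for an MxKxN integer matmul."""
    return MATMUL_TEMPLATE.format(m=m, n=n, k=k)

def sweep(sizes):
    """Yield (size-triple, IR text) for every size combination under test."""
    for m, n, k in sizes:
        yield (m, n, k), generate_matmul_ir(m, n, k)
```

Each rendered function could then be fed through the pipeline under test, with pass/fail and logs recorded per size triple.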
Hope we can start making progress on this front now. EDIT:
Thanks @Abhishek-Varma for testing on different input sizes. CC @erwei-xilinx in case he has some thoughts on the failure in AIR.
Thanks @Abhishek-Varma. I wanted to add a similar test that I also see a failure for; the error doesn't seem to come from the passes you mention either. Did you encounter this one?
Error:
Has the IREE Transform Dialect tiling script been updated accordingly? Here's an example of a 128x256x128xi32 GEMM: https://gist.gitenterprise.xilinx.com/erweiw/9cdc1d6ead27e40d551963e2328c50b0 Note how the tiling factors in the transform dialect are changed accordingly.
Why does the tile size need to be changed according to the input sizes? I know different tile sizes and data layouts would affect performance, but they shouldn't cause a failure? Also, we are using the pack pipeline for the tests.
AIR infers the herd size (number of tiles) and the L1/L2 memory buffer sizes from the loop structure generated by the script. At the moment AIR tries to strictly follow the scf loop structure generated by the Transform script.
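The kind of inference described can be sketched roughly as follows. This is not AIR's actual implementation, just an illustrative model under assumed names: parallel trip counts become the herd shape, and the per-iteration tile shapes determine the L1 footprint.

```python
# Rough sketch (assumed names, not AIR's actual code) of inferring a herd
# shape and per-core L1 footprint from a tiled matmul loop nest.

def infer_herd_and_buffers(problem, tiles, elem_bytes=4):
    """problem/tiles are (M, N, K); returns herd shape and L1 bytes per core."""
    M, N, K = problem
    tm, tn, tk = tiles
    # Trip counts of the parallel loops over M and N give the herd shape.
    herd = (M // tm, N // tn)
    # Each core holds one LHS, one RHS, and one accumulator tile in L1.
    l1_bytes = (tm * tk + tk * tn + tm * tn) * elem_bytes
    return herd, l1_bytes
```

Under this model, changing the input sizes without retuning the tiling changes the inferred herd shape and buffer sizes, which is consistent with tests being sensitive to the tile/pack configuration.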
Hi Nirvedh. No, I haven't encountered this error.
The Transform dialect script wasn't used for the tests above. We now have an equivalent C++ pack-based pipeline, which was used instead.
Thanks @erwei-xilinx! I think it makes sense to me now. So the tests are actually sensitive to the tile sizes and the pack sizes. For these particular cases
Let's discuss this further... Ideally the tile sizes are fully derived from the architecture size. Let's restrict ourselves to multiples of the tile sizes for now (because we have to see what happens with the padding semantics of packing). But apart from that, they should not matter. Maybe if we set a tile size of
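The proposed restriction can be expressed as a small check. This is a minimal sketch with an illustrative name: accept a problem size only when every dimension is a multiple of its tile size, and otherwise report the padded shape that packing would have to create.

```python
# Minimal sketch of the "multiples of tile sizes" restriction discussed
# above. Names are illustrative, not from the actual pipeline.

def check_tileable(problem, tiles):
    """Return (ok, padded): ok iff each dim divides evenly by its tile;
    padded rounds each dim up to the next multiple of its tile size."""
    padded = tuple(-(-dim // t) * t for dim, t in zip(problem, tiles))
    return padded == tuple(problem), padded
```

Sizes that fail the check would need the padding semantics of packing to be worked out before they can be supported.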
This example should help with explaining some of the issues being discussed here: #124 (comment)
Hi @MaheshRavishankar @yzhang93 @nirvedhmeshram This is based on Vivian's patch: Update tile and pack. This looks like a better thread for this update, since Vivian rightly mentioned in this thread that we're able to generate vmfbs for I have confirmed that each of the generated vmfbs yields correct results. I'm therefore currently working on
Hi @MaheshRavishankar @yzhang93 @nirvedhmeshram Continuing on yesterday's tryst with The IR after all
NOTE: You won't see This is now failing at Please let me know the following:
I'll accordingly look into
Hi @Abhishek-Varma, can you try and see if this patch fixes the
This is perhaps a feature that we may need to discuss, because AIR currently tries to convert this 3D forall loop into a physical 2D herd, and it is not yet able to automatically figure out a mapping from 3D space to 2D hardware...
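One possible mapping for the open question above would be to collapse the batch dimension of the 3D forall space into one of the two spatial herd axes. The sketch below is illustrative only, not AIR's implementation, and the function names are invented.

```python
# Illustrative sketch (not AIR's actual mapping): fold a 3D forall
# iteration space (batch x M-tiles x N-tiles) onto a 2D herd by stacking
# the batch copies of the i-axis end to end along the herd rows.

def fold_3d_to_2d(b, i, j, num_i):
    """Map a (batch, i, j) forall index to a 2D herd coordinate."""
    return (b * num_i + i, j)

def herd_shape_2d(num_b, num_i, num_j):
    """Resulting 2D herd shape after folding batch into the row axis."""
    return (num_b * num_i, num_j)
```

Whether such a folding is profitable (or even legal, given herd size limits on the hardware) is exactly the kind of question the comment raises.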
Hi @erwei-xilinx - Your fix seems to work. Thanks! This is the current state of the lowering. I've raised the current branch as a WIP PR for review: Enable linalg.batch_matmul PR.
Thanks for putting together the WIP PR. Just browsing through the IR, I noticed there seems to be an issue at I'll take a closer look tomorrow.
As discussed earlier today, we should also try to run different matmul sizes with the pad pipeline. The pad pipeline could be used as a baseline, and @erwei-xilinx is pushing the pad pipeline forward to work with a large K dimension this week. @Abhishek-Varma, would you like to run the same experiments with the pad pipeline?
Hi @yzhang93 - I'm currently taking a look at
CC: @MaheshRavishankar @yzhang93 @nirvedhmeshram @erwei-xilinx My updates for the day (besides the WIP PR for
Thanks @Abhishek-Varma for adding the infra support for the other required ops. For larger M/N sizes like 2048, you should probably test based on this PR.
Hi @MaheshRavishankar @yzhang93 @nirvedhmeshram @erwei-xilinx My updates concerning this thread:
@Abhishek-Varma For the pack-based pipeline to work with large GEMMs, it still needs some passes and new config sizes, which I'm currently working on. I would suggest holding off on testing this until I have something running with reduction loops. For the pad-based pipeline it's a similar situation: we should change the config sizes accordingly. Note there is another variable that needs to be changed for smaller GEMM sizes, which I haven't fully explored. https://github.com/nod-ai/iree-amd-aie/pull/129/files#diff-42f0d0bb098689f25ee68e8f05ec6c2eaa89ce41a6394d7d99c1d1c912943b38R390
@kumardeepakamd this may be useful for you, so sharing here: this is the "flat" linalg graph that IREE sees as input