
Integrating alp with iree-llvm-sandbox #83

Merged 1 commit on Dec 8, 2021

Conversation

@giuseros (Collaborator) commented Dec 3, 2021

This is the first integration for our project in alp/experimental. I tried to make the minimal number of changes outside of this folder, but:

  • I still wanted to integrate with mlir-proto-opt
  • I still want to be able to switch the experimental/alp folder on from the root CMakeLists.txt (although everything is disabled by default)

All of this is super-early stage; it is very buggy and not tested. However, I wanted to show earlier rather than later how we would like to integrate with this, and what the main ideas are. In particular:

  • I integrate at the C++ level (so mostly reusing mlir-proto-opt, but not reusing the Python harness you have)
  • The major components are there (minimal Python harness + autotuner + a few simple passes). Once I have it cleaned up, I can add a README.md to explain how to use it.
  • If you have any early comments on anything, please feel free to add them or to open an issue.

Thank you so much,
Giuseppe

CMakeLists.txt (review thread, outdated, resolved)
@@ -32,6 +32,15 @@ set(MLIR_MAIN_SRC_DIR ${LLVM_MAIN_SRC_DIR}/../mlir)
set(MLIR_INCLUDE_DIR ${LLVM_MAIN_SRC_DIR}/../mlir/include)
set(MLIR_TABLEGEN_OUTPUT_DIR ${CMAKE_BINARY_DIR}/tools/mlir/include)

# Disable experimental alp by default
set(SANDBOX_ENABLE_ALP OFF)
if (SANDBOX_ENABLE_ALP)
Contributor:
I am fine with this level of integration; I agree that for projects that want it, it is good to be buildable and integrated from the get-go.
The fact that it is optional and cannot break the CI SGTM!
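For reference, a minimal sketch of the guarded-subdirectory pattern discussed here (hypothetical; a cache-visible `option()` rather than the plain `set()` in the diff above, so the flag can be flipped from the command line):

```cmake
# Expose the switch as a cache variable so it can be set with
# -DSANDBOX_ENABLE_ALP=ON on the cmake command line; OFF by default.
option(SANDBOX_ENABLE_ALP "Build the experimental alp subproject" OFF)

if (SANDBOX_ENABLE_ALP)
  # Only descend into the experimental tree when explicitly requested,
  # so a broken alp build can never affect the default CI configuration.
  add_subdirectory(experimental/alp)
endif()
```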

experimental/alp/CMakeLists.txt (review thread, resolved)
experimental/alp/alp/compile_op.py (review thread, resolved)
experimental/alp/alp/library/blas.py (review thread, resolved)
experimental/alp/lib/Transforms/modulo_scheduling_pass.cpp (two review threads, outdated, resolved)
using namespace mlir;

namespace {
struct ModuloSchedulingPass
    : public ModuloSchedulingPassBase<ModuloSchedulingPass>
Contributor:

Not reviewing the pass itself for now; this would need a bunch of .mlir tests to qualify as "in reviewable state".

tools/mlir-proto-opt/mlir-proto-opt.cpp (review thread, resolved)
@nicolasvasilache (Contributor):

> I still wanted to integrate with mlir-proto-opt
> I still want to be able to switch the experimental/alp folder from the root CMakeLists.txt (although everything is disabled by default)

SGTM, since everything is optional I don't see an issue.
We may want to restructure the CMake a bit so that the number of places touched by a future such addition is minimized, but this is fine for now and you don't need to shoulder this burden (I expect the next project will have some refactoring requests).

> All of this is super-early stage; it is very buggy and not tested. However, I wanted to show earlier rather than later how we would like to integrate with this, and what the main ideas are. In particular:
> I integrate at the C++ level (so mostly reusing mlir-proto-opt, but not reusing the Python harness you have)

Sounds good for a first commit and to get started. I still think that in the future the proper effort should be made to either integrate with the harness or to come up with something better we can all reuse; I added comments where appropriate.
Note in particular the point about just reusing scikit-learn/pandas/PyTorch rather than having to reimplement a lot of functionality in C++ just to be able to do anything: composition is our friend, both in IR and with libraries.
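To illustrate the composition point, a hypothetical sketch of consuming autotuner measurements with pandas instead of bespoke C++ aggregation (the column names and numbers here are made up, not taken from alp):

```python
import pandas as pd

# Hypothetical autotuner output: one row per measured configuration.
runs = pd.DataFrame([
    {"tile_m": 4, "tile_n": 16, "gflops": 41.2},
    {"tile_m": 8, "tile_n": 16, "gflops": 55.7},
    {"tile_m": 8, "tile_n": 32, "gflops": 49.3},
])

# Best configuration overall, and best gflops per tile_m value,
# with no custom C++ needed for any of the bookkeeping.
best = runs.loc[runs["gflops"].idxmax()]
per_tile_m = runs.groupby("tile_m")["gflops"].max()
```

The same few lines scale from three rows to millions of tuning runs, which is the kind of leverage a library gives over reimplemented aggregation code.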

> The major components are there (minimal Python harness + autotuner + a few simple passes). Once I have it cleaned up, I can add a README.md to explain how to use it.
> If you have any early comments on anything, please feel free to add them or to open an issue.

I gave you review comments; no need to address everything now, but please open the proper issues (prefixed with [ALP]) for the stuff you punt on.

@nicolasvasilache (Contributor):

Note I created and landed the alp subdir in an effort to try and give you rights on it specifically.
However, it is not a default GitHub mode and I have to parse through this:
https://stackoverflow.com/questions/40567468/give-permissions-on-project-folder-in-github?noredirect=1&lq=1

It will take me a little time as I have a bunch of stuff in my stack.

@giuseros (Collaborator, Author) commented Dec 7, 2021

Thanks Nicolas for this first round of comments. I have started to reply to some. A few overall replies:

  • Our main focus for now is performance. We need to show that, by using this framework, we can compete with handwritten routines. We have proof of this for a very specific case, but now we are trying to get more general results.
  • Once we are happy with performance, we will refactor everything to generalize to other operations. There are some features in the harness that I feel are missing, but we can try to add them.
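The autotuning side can be pictured as a search loop over configurations. A purely illustrative sketch (the cost function here is a toy stand-in for actually compiling and timing a kernel, and the tile-size space is invented):

```python
def benchmark(tile_m, tile_n):
    """Stand-in for compiling and timing one configuration; a real
    harness would run the generated kernel and report gflops.
    This toy objective peaks at (8, 32)."""
    return -((tile_m - 8) ** 2 + (tile_n - 32) ** 2)

def grid_search(space_m=(2, 4, 8, 16), space_n=(8, 16, 32, 64)):
    """Exhaustively evaluate a tiny tile-size space.  In practice the
    space is too large to enumerate and one samples or uses a model."""
    candidates = [(m, n) for m in space_m for n in space_n]
    best_cfg = max(candidates, key=lambda cfg: benchmark(*cfg))
    return best_cfg, benchmark(*best_cfg)

best_cfg, best_score = grid_search()
```

Swapping the stand-in `benchmark` for a real compile-and-run measurement is where all the engineering effort goes; the loop structure itself stays this simple.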

One question I didn't ask yet: do you have any performance results for x86? Do they look promising?

@nicolasvasilache (Contributor):

> Our main focus for now is performance.
> Once we are happy with performance we will refactor everything.

Yes, understood; the objective here is to give you some broad feedback so you have a few directions for the future. You should absolutely optimize for your velocity at this point.

Still, it looks like you have a string-stitched and inflexible band-aid that limits you in ways you may not realize (what you have is very close to what the sandbox started with; we invested in making the QoL better for these reasons). You have to decide what is best for your current sprint; I can only share my experience 😄

Re x86, the various benchmarks run at a high fraction of peak on my AVX512 machine. I have previously dumped some perf results here:

The matmul cases have mixes of divisible and non-divisible sizes; I get similar perf with ? everywhere.
The conv / depthwise conv cases are all divisible; the first objective was to ensure we can get the main kernel to high perf.
I have not yet tried to plug padding for non-divisible cases with those.
I also plugged and ran the 2-D cases for conv and depthwise conv to similar high perf (note that depthwise 1-D is much less arithmetically intensive and is probably limited by instruction issue / number of instructions).

Lastly, I did a moderate amount of manual searching to get to these perf points; scaling up the search (especially for the kernel) and rooting out the type of issue you mention re outlining will be important for us too.

Hope this helps!

@nicolasvasilache (Contributor):

Could you please rebase, fix CI, and land?

@giuseros force-pushed the alp_experimental branch 2 times, most recently from 5c535ca to 76945be on December 8, 2021 at 16:32
@giuseros (Collaborator, Author) commented Dec 8, 2021

I tried to rebase and fix a few things. Let's see if CI is happy.

@giuseros (Collaborator, Author) commented Dec 8, 2021

Alright, CI is happy, but I am not authorized to merge.

@nicolasvasilache nicolasvasilache merged commit 3464454 into iree-org:main Dec 8, 2021