- 
                Notifications
    You must be signed in to change notification settings 
- Fork 64
RFC: test out new syntax for launch with type deduction #305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| Looks cleaner to me, but I would not define stuff in syclcompat namespace within cutlass | 
| I agree in general we shouldn't put things into syclcompat namespace. For this PR I put it in syclcompat, because the purpose of this PR is to protype what we want to ask syclcompat to add (intel/llvm#17832). Assuming it gets added to syclcompat and we switch our code to use it, we would require the latest version of syclcompat. That doesn't work because we want to support the latest released DPC++ version. Would it be reasonable to have in our headers the extra syclcompat code for older versions of DPC++ to avoid this dependency? | 
| namespace syclcompat { | ||
| template <class F, int Dim> | ||
| sycl::event launch(const sycl::nd_range<Dim> &range, sycl::queue q, const F& f) { | ||
| return q.parallel_for(detail::transform_nd_range<Dim>(range), [=](sycl::nd_item<Dim>) { f(); }); | ||
| } | ||
| template <class F, int Dim> | ||
| sycl::event launch(const sycl::nd_range<Dim> &range, const F& f) { | ||
| return launch(range, get_default_queue(), f); | ||
| } | ||
| // Alternative launch through dim3 objects | ||
| template <class F> | ||
| sycl::event launch(const dim3 &grid, const dim3 &threads, sycl::queue q, const F& f) { | ||
| return launch(sycl::nd_range<3>{grid * threads, threads}, q, f); | ||
| } | ||
| template <class F> | ||
| sycl::event launch(const dim3 &grid, const dim3 &threads, const F& f) { | ||
| return launch(grid, threads, get_default_queue(), f); | ||
| } | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we put those as PR on sycl::compat repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| We can add the  | 
| B, dB, sB, tB, | ||
| C, dC, sC, tC, | ||
| alpha, beta); | ||
| auto event = syclcompat::launch(dimGrid, dimBlock, [=] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you only want to record this event if you are doing fine-grained profiling?
It is possible you could see a notable performance improvement by only recording events when required: see https://github.com/intel/llvm/pull/18021/files#r2048537558
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if an analogue to cudaEventRecord in sycl would be useful to you then you could request creating an extension for e.g. oneapi_event_record that takes a sycl event.
Prototype to test out a solution for intel/llvm#17832 . Looking for review on the syntax. Is it OK to have to wrap the function in a lambda to gain the type deduction? Should the lambda be past as first or last argument?