- 
                Notifications
    You must be signed in to change notification settings 
- Fork 13.4k
SYCL: Kernel function refactor #11515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8b6a0d2    to
    e32b19a      
    Compare
  
    | Unfortunately glibc 2.41 update has broken the SYCL compiler. ... I will try to fix it in the meantime. | 
b07f6f3    to
    efb5773      
    Compare
  
    | Looks like non contiguous norm tests were added in #11659 . Our current kernels currently does not support it. I need to disable it in backend_supports_op function. | 
… at the end of debug logs
| @qnixsynapse I see some code change is to move code to right place. I think they are safe and no need more test. But some code change would impact the performance, like add try-catch(). Add it to more sub functions, that would impact the performance.  - It's my guess. So much code change will make the test is hard too. I know you are active developer. 
 So, I suggest refactor this PR, thought this PR is used refactor the SYCL backend: 
 There is still no real CI of SYCL backend. It's very hard to control the quality. | 
| @NeoZhangJianyu Thank you for your comments. Most of the changes in this PR is just moving the kernels from ggml-sycl.cpp to their individual files like that of CUDA backend. Try -- catch exception handling is done one time like it was before. I did not notice any performance regression on my system. The rest of the changes are just removals of intermediate type conversations and adding back split buffer type checks. Apart from this, the GGML_SYCL_DEBUG macro was fixed and added to places so that non tech savvy people can help us debug easily. I know that there is no CI for SYCL and that's why I tested and test at every point of this change. If this change is really big, I will close it and submit smaller changes. | 
| I agree with @NeoZhangJianyu here. @qnixsynapse, you've been putting a lot of effort into the PR and it contributes a lot of good things to the backend but the scale of these changes make it a bit harder for others to provide feedback. Since you refactor functions into different files along with other changes, some smaller modifications that require verification or understanding could get lost in the PR. Breaking these down might really help everyone else to follow along. | 
| @Alcpz Hmm.. I decided to close this PR and submit smaller changes. | 
| @qnixsynapse | 
| 
 Our backend is lagging behind others in OP support, so increased collaboration from the teams would be greatly appreciated. :) | 
This PR simplifies the SYCL backend by removing the
ggml_sycl_op_flattenfunction and integrating its responsibilities directly into the kernel functions.The current implementation of
ggml_sycl_op_flattenappears to have a misleading name, as its functionality does not align with the typical meaning of "flatten" in machine learning (i.e., collapsing dimensions). It currently performs the following operations:dst->src[1]if available anddstfor non-split tensors.float32before passing them to the kernel function.Additionally, some unused
sycl_pool_allocobjects are present in the code.The unconditional casting to
float32might present a significant concern. It can lead to unnecessary and potentially unstable type conversions. For instance, anint32tensor is cast tofloat32inggml_sycl_op_flattenand then back toint32within one of the kernels which might not be numerically stable IMO.This PR tries to achieve few things:
TODOs: