Allow for stateless addressing flags for >4GB allocations for devices to be passed through SYCL #10946

Open
simonlui opened this issue Aug 23, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

simonlui commented Aug 23, 2023

According to https://github.com/intel/compute-runtime/blob/master/programmers-guide/ALLOCATIONS_GREATER_THAN_4GB.md, it is possible to make allocations larger than 4GB on devices that currently follow the standard Intel stateful addressing model. To do so, you must be able to pass CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL through OpenCL or ze_relaxed_allocation_limits_exp_desc_t through Level Zero. Unfortunately, there doesn't seem to be a way to do this through SYCL right now. This applies to anything in the SYCL backend that would use zeMemAllocDevice, zeMemAllocShared, and zeMemAllocHost for Level Zero, and clCreateBuffer, clCreateBufferWithProperties, clCreateBufferWithPropertiesINTEL, clSVMAlloc, clSharedMemAllocINTEL, clDeviceMemAllocINTEL, and clHostMemAllocINTEL for OpenCL.
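To make the Level Zero side concrete: as I read the linked guide, the relaxed-limits descriptor is chained onto the allocation descriptor through its pNext pointer before the zeMemAlloc* call. Below is a rough sketch of that chaining only. It uses stand-in types so it compiles without the Level Zero headers; the Mock* structs, the helper names, the flag value, and the fixed 4 GiB threshold are all assumptions for illustration, not the real API. The real types would be ze_device_mem_alloc_desc_t and ze_relaxed_allocation_limits_exp_desc_t.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in types (hypothetical) so this sketch builds without Level Zero.
// They only mimic the pNext-chaining shape of the real descriptors
// ze_device_mem_alloc_desc_t and ze_relaxed_allocation_limits_exp_desc_t.
struct MockRelaxedLimitsDesc {
    const void* pNext = nullptr;
    std::uint32_t flags = 0;
};

struct MockDeviceMemAllocDesc {
    const void* pNext = nullptr;
    std::uint32_t flags = 0;
};

// Assumption for illustration: a fixed 4 GiB threshold. A real runtime
// would more likely compare against the limit reported by the device.
constexpr std::uint64_t kFourGiB = 4ull << 30;

// Hypothetical value standing in for
// ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE.
constexpr std::uint32_t kMockFlagMaxSize = 1u;

// True when the request is strictly larger than the assumed threshold.
constexpr bool needs_relaxed_limits(std::uint64_t bytes) {
    return bytes > kFourGiB;
}

// Builds the allocation descriptor and chains the relaxed-limits
// descriptor via pNext only when the requested size calls for it.
// `relaxed` is owned by the caller and must outlive the returned descriptor.
inline MockDeviceMemAllocDesc make_alloc_desc(std::uint64_t bytes,
                                              MockRelaxedLimitsDesc& relaxed) {
    MockDeviceMemAllocDesc desc;
    if (needs_relaxed_limits(bytes)) {
        relaxed.flags = kMockFlagMaxSize;
        desc.pNext = &relaxed;
    }
    return desc;
}
```

In a SYCL backend, something like make_alloc_desc would sit in front of the zeMemAllocDevice call, which is exactly the spot a SYCL user has no way to reach today.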

Since the compiler here is essentially what takes in SYCL and emits Level Zero or OpenCL code for various Intel projects, I think this is the right place to discuss this. Unfortunately, I'm not sure what it would take for this to happen. Would this become a non-standard vendor extension to SYCL, or would something like this need to be standardized? I am opening this because it seems to be affecting downstream packages like oneDNN and Intel Extension for PyTorch (IPEX), which use SYCL to make their allocations and are hitting this limitation. IPEX is choosing to limit allocations to 4GB and disallow anything larger, which I don't think is a good solution given that there are valid use cases for more than 4GB, even if it involves a performance penalty. I hope this can be considered and a path forward can be found. Thank you.

@simonlui simonlui added the enhancement New feature or request label Aug 23, 2023
@simonlui simonlui changed the title from "Allow for stateless addressing flags for >4GB for devices to be passed through SYCL" to "Allow for stateless addressing flags for >4GB allocations for devices to be passed through SYCL" Aug 23, 2023
@abagusetty
Contributor

Have you by chance tried this already?
export SYCL_PROGRAM_COMPILE_OPTIONS=" -ze-opt-greater-than-4GB-buffer-required"

Author

simonlui commented Aug 24, 2023

I don't doubt that this would let you pass the required compile flags for >4GB allocations. But according to the document I linked, it doesn't solve the issue of passing the allocation flags I mentioned, which are needed for the allocation to work correctly. I also don't personally have an application that would use this. It is more or less a gap I identified after frequently running into this 4GB memory limit when using Intel's Extension for PyTorch. That is why I submitted this report.
