Allow for stateless addressing flags for >4GB allocations for devices to be passed through SYCL #10946

simonlui · 2023-08-23T19:49:35Z

According to https://github.com/intel/compute-runtime/blob/master/programmers-guide/ALLOCATIONS_GREATER_THAN_4GB.md, there are ways to make allocations greater than 4GB on devices that currently follow the standard Intel stateful addressing model. To do so, you must be able to pass CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL or ze_relaxed_allocation_limits_exp_desc_t through OpenCL or Level Zero respectively. Unfortunately, there doesn't seem to be a way to do this through SYCL right now. This applies to anything in the SYCL backend that would use zeMemAllocDevice, zeMemAllocShared and zeMemAllocHost for Level Zero, and clCreateBuffer, clCreateBufferWithProperties, clCreateBufferWithPropertiesINTEL, clSVMAlloc, clSharedMemAllocINTEL, clDeviceMemAllocINTEL and clHostMemAllocINTEL for OpenCL.
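To make the condition concrete, here is a minimal sketch of the size check a runtime would need before deciding whether to attach one of those flags. It is only an illustration: `needs_relaxed_limits` and `kAssumedMaxAllocBytes` are hypothetical names that do not exist in SYCL, Level Zero, or OpenCL, and the 4 GiB threshold is an assumption standing in for whatever maximum allocation size the device actually reports.

```cpp
#include <cstdint>

// Assumed threshold of 4 GiB, for illustration only. The real per-allocation
// limit is whatever the device reports as its maximum allocation size.
constexpr std::uint64_t kAssumedMaxAllocBytes = std::uint64_t{4} << 30;

// Hypothetical helper (not part of SYCL, Level Zero, or OpenCL): returns true
// when a request exceeds the limit, i.e. when the backend would have to pass
// the relaxed-size flag or descriptor for the allocation to succeed.
constexpr bool needs_relaxed_limits(
    std::uint64_t requested_bytes,
    std::uint64_t max_alloc_bytes = kAssumedMaxAllocBytes) {
    return requested_bytes > max_alloc_bytes;
}

// Compile-time checks of the two obvious cases.
static_assert(!needs_relaxed_limits(std::uint64_t{1} << 30), "1 GiB fits");
static_assert(needs_relaxed_limits(std::uint64_t{5} << 30), "5 GiB needs the flag");
```

The gap described in this issue is that SYCL offers no way to act on a `true` result: the backend call that performs the allocation has no path for the caller to request the relaxed limit.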

Since the compiler here is what essentially takes in SYCL and emits Level Zero or OpenCL code for various Intel projects, I think this is the right place to discuss this. Unfortunately, I'm not sure what it would take for this to happen. Would this become a non-standard vendor extension to SYCL, or would something like this need to be standardized? I am opening this because it seems to be affecting downstream packages like oneDNN and Intel Extension for PyTorch (IPEX), which use SYCL to make their allocations and are hitting this limitation. IPEX is choosing to limit allocations to 4GB and disallow anything larger, which I don't think is a good solution given that there are valid use cases for needing more than 4GB, even if it involves a performance penalty. I hope this can be considered and some path forward can be found. Thank you.

abagusetty · 2023-08-23T22:20:29Z

By any chance, have you tried this already?
export SYCL_PROGRAM_COMPILE_OPTIONS=" -ze-opt-greater-than-4GB-buffer-required"

simonlui · 2023-08-24T03:46:31Z

I don't doubt that this would let you pass the required compile flags for >4GB allocations. But according to the document I linked, it doesn't solve the issue of passing the allocation flags I mentioned, which are needed for the allocation to work correctly. I also don't personally have an application that would use this; it is more or less a gap I identified after frequently running into this 4GB memory limit when using Intel's Extension for PyTorch. That is why I submitted this report.

simonlui added the enhancement New feature or request label Aug 23, 2023

simonlui changed the title ~~Allow for stateless addressing flags for >4GB for devices to be passed through SYCL~~ Allow for stateless addressing flags for >4GB allocations for devices to be passed through SYCL Aug 23, 2023
