Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL 2020][USM] Add USM memory management functions and usm_allocator #308

Merged
merged 1 commit into from
Sep 17, 2020

Conversation

illuhad
Copy link
Collaborator

@illuhad illuhad commented Sep 16, 2020

This adds USM memory management functions, pointer query functions, and usm_allocator. USM tests will follow later as we will need more functionality (in particular memcpy) for proper testing.

Known caveats:

  • As far as I am aware, CUDA and HIP do not allow specifying alignments for allocations. The SYCL aligned memory allocation functions therefore currently ignore the alignment argument on CUDA and HIP. On CPU it should work as expected.
  • HIP/ROCm only has partial support for shared allocations (unified memory in CUDA terminology). My understanding is that it is currently implemented as mapped host memory in ROCm, which means that performance will probably be very bad as every memory access goes across PCIe. I suspect that this is due to hardware limitations, and I expect that this will change with future AMD hardware. After all, AMD has promised a "coherent" architecture for Frontier and El Capitan. hipSYCL correctly calls hipMallocManaged() for shared allocations, so when this is improved hipSYCL will be ready.
  • Because of the previous point, I believe that there currently is no way in ROCm to distinguish between "shared" and (pinned) "host" allocations. So, I expect that pointer queries for SYCL shared allocations would identify as SYCL host allocations. We use the HIP pointer property hipMemoryTypeUnified to check for shared allocations; the HIP documentation says that it is currently not yet used but I suspect that this would be the right one when this is properly implemented in ROCm.

@aTom3333
Copy link
Contributor

As far as I am aware, CUDA and HIP do not allow specifying alignments for allocations. The SYCL aligned memory allocation functions therefore currently ignore the alignment argument on CUDA and HIP. On CPU it should work as expected.

Can't you align the pointers yourself by allocating a bit more? Something like:

void *cuda_allocator::allocate(size_t min_alignment, size_t size_bytes)
{
  void *ptr, *originalPtr;
  auto err = cudaSetDevice(_dev);
  err = cudaMalloc(&originalPtr, size_bytes + min_alignment);
  ptr = (orignalPtr & ~(min_alignment-1)) + min_alignment;
  int offset = ptr - originalPtr;
  *(static_cast<unsigned char*>(ptr) - 1) = offset;

  if (err != cudaSuccess) {
    register_error(__hipsycl_here(),
                   error_info{"cuda_allocator: cudaMalloc() failed",
                              error_code{"CUDA", err},
                              error_type::memory_allocation_error});
    return nullptr;
  }

  return ptr;
}

void cuda_allocator::free(void *mem)
{
  int offset = *(static_cast<unsigned char*>(mem) - 1);
  void *originalPtr = mem - offset;
  auto err = cudaFree(originalPtr);
  if (err != cudaSuccess) {
    register_error(__hipsycl_here(),
                   error_info{"cuda_allocator: cudaFree() failed",
                              error_code{"CUDA", err},
                              error_type::memory_allocation_error});
  }
}

@illuhad
Copy link
Collaborator Author

illuhad commented Sep 16, 2020

That might work, but it seems very much like a hack to me. To be honest I'm not sure if it's worth it. CUDA and HIP, while not allowing custom alignments, already guarantee that all allocations will be suitably aligned for any type - I assume this means either something like double4 or even alignment on page boundaries.

EDIT: Also, this approach would in general require launching a kernel or cudaMemset to properly store the offset as pure device allocations (non-shared allocations) are not accessible from the host. And in free() we would need to read from GPU memory across PCIe. Alternatively we could have some map to associate all user pointers with their offset, but that also seems like a lot of effort..

@aTom3333
Copy link
Contributor

If they guarantee enough alignment I agree it is unnecessary. Even more so with the fact that kernel would need to be launched

@illuhad illuhad force-pushed the feature/sycl-2020/add-usm-memory-mgmt-functions branch from 3228343 to 77015f1 Compare September 16, 2020 23:59
@illuhad illuhad merged commit 814b296 into develop Sep 17, 2020
@illuhad illuhad deleted the feature/sycl-2020/add-usm-memory-mgmt-functions branch September 17, 2020 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants