-
Notifications
You must be signed in to change notification settings - Fork 804
Closed
Labels
bugSomething isn't workingSomething isn't workinghipIssues related to execution on HIP backend.Issues related to execution on HIP backend.
Description
Describe the bug
I am trying to use multiGPU code on AMD MI210 and I can't allocate anything on second GPU. Even N=1 sized allocation fails with HIP Out of Memory error. I tried these prebuilt
nightly builds:
llvm-nighltly-2024-10-26
llvm-nighltly-2024-11-09
llvm-nighltly-2024-11-07
The code works if I use a single GPU and only allocate memory on single GPU
To reproduce
#include <sycl/sycl.hpp>
#include <iostream>
#define NUM_GPU 2
int main() {
auto Devs = sycl::device::get_devices(sycl::info::device_type::gpu);
std::vector<sycl::queue> Queues;
// Insert not all devices only the required ones for model
std::transform(Devs.begin(), Devs.begin() + NUM_GPU, std::back_inserter(Queues),
[](const sycl::device &D) { return sycl::queue{D}; });
////////////////////////////////////////////////////////////////////////
if (Devs.size() > 1){
if (!Devs[0].ext_oneapi_can_access_peer(
Devs[1], sycl::ext::oneapi::peer_access::access_supported)) {
std::cout << "P2P access is not supported by devices, exiting."
<< std::endl;
}
}
std::cout <<"\n----------------------------------------"<< std::endl;
std::cout << "Running on devices:" << std::endl;
for(int i =0; i < Queues.size(); i++){
std::cout << i << ":\t" << Queues[i].get_device().get_info<sycl::info::device::name>()
<< std::endl;
}
std::cout <<"----------------------------------------"<< std::endl;
const int N = 1;
std::vector<uint32_t> data_host(N);
std::vector<uint32_t> data_host_2(N);
uint32_t* data_dev = malloc_device<uint32_t>(N, Queues[0]);
uint32_t* data_dev_2 = malloc_device<uint32_t>(N, Queues[1]);
Queues[0].memcpy(data_dev, data_host.data(), N * sizeof(uint32_t));
Queues[1].memcpy(data_dev_2, data_host_2.data(), N * sizeof(uint32_t));
}
This is the compile command:
export PATH=$DPCPP_HOME/llvm-2024-10-26/bin:$PATH;
export LD_LIBRARY_PATH=$DPCPP_HOME/llvm-2024-10-26/lib:$LD_LIBRARY_PATH
sycl-ls
rm -rf a.out;clang++ -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a atomic_test.cpp ; ./a.out
This is the output :
$ ./build.sh
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
[hip:gpu][hip:1] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
[hip:gpu][hip:2] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
----------------------------------------
Running on devices:
0: AMD Instinct MI210
1: AMD Instinct MI210
----------------------------------------
<HIP>[ERROR]:
UR HIP ERROR:
Value: 2
Name: hipErrorOutOfMemory
Description: out of memory
Function: getNextTransferStream
Source Location: /__w/llvm/llvm/build/_deps/unified-runtime-src/source/adapters/hip/queue.cpp:106
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): Native API failed. Native API returns: 38 (UR_RESULT_ERROR_OUT_OF_HOST_MEMORY)
./build.sh: line 16: 542041 Aborted (core dumped) ./a.out
Environment
- OS: Ubuntu
- Target device and vendor: AMD MI210
- DPC++ version:clang version 20.0.0git (https://github.com/intel/llvm 91ad0ed)
- Dependencies version:
Device [Model : Revision] Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
Name (20 chars) (Edge) (Avg) (Mem, Compute)
==============================================================================================================
0 [0x0c34 : 0x02] 39.0°C 41.0W N/A, N/A 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
Instinct MI210
1 [0x0c34 : 0x02] 39.0°C 43.0W N/A, N/A 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
Instinct MI210
2 [0x0c34 : 0x02] 34.0°C 42.0W N/A, N/A 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
Instinct MI210
sycl-ls:
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
[hip:gpu][hip:1] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
[hip:gpu][hip:2] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
Platforms: 1
Platform [#1]:
Version : HIP 60140.9
Name : AMD HIP BACKEND
Vendor : AMD Corporation
Devices : 3
Device [#0]:
Type : gpu
Version : gfx90a:sramecc+:xnack-
Name : AMD Instinct MI210
Vendor : AMD Corporation
Driver : HIP 60140.9
UUID : 4950100981019857102575252102995310254
DeviceID : 0
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_queue_profiling_tag
info::device::sub_group_sizes: 64
Architecture: amd_gpu_gfx90a
Device [#1]:
Type : gpu
Version : gfx90a:sramecc+:xnack-
Name : AMD Instinct MI210
Vendor : AMD Corporation
Driver : HIP 60140.9
UUID : 5548569810097102505099994910097100100
DeviceID : 0
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_queue_profiling_tag
info::device::sub_group_sizes: 64
Architecture: amd_gpu_gfx90a
Device [#2]:
Type : gpu
Version : gfx90a:sramecc+:xnack-
Name : AMD Instinct MI210
Vendor : AMD Corporation
Driver : HIP 60140.9
UUID : 1014852485797999898535199551015355
DeviceID : 0
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_queue_profiling_tag
info::device::sub_group_sizes: 64
Architecture: amd_gpu_gfx90a
default_selector() : gpu, AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
accelerator_selector() : No device of requested type available.
cpu_selector() : No device of requested type available.
gpu_selector() : gpu, AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
custom_selector(gpu) : gpu, AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60140.9]
custom_selector(cpu) : No device of requested type available.
custom_selector(acc) : No device of requested type available.
Additional context
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workinghipIssues related to execution on HIP backend.Issues related to execution on HIP backend.