Vega64 + ROCm 1.7: Illegal instruction detected: Operand has incorrect register class. #1485

Closed
chron0 opened this issue Apr 22, 2018 · 7 comments

chron0 commented Apr 22, 2018

xmr-stak fails while compiling the OpenCL code, with the error: Illegal instruction detected: Operand has incorrect register class. To test whether this is a kernel/driver issue, I tried https://github.com/genesismining/sgminer-gm, which works. Is there any way to make xmr-stak more verbose about the compilation step, so I can figure out why and where it fails?
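One generic way to see what the OpenCL compiler reports, independently of xmr-stak, is to build a kernel yourself and read the log through the standard clGetProgramBuildInfo call with CL_PROGRAM_BUILD_LOG. The sketch below is a minimal standalone script, not part of xmr-stak. It assumes libOpenCL.so can be found by ctypes and that the Vega is the first GPU on the first platform. The helper names (get_build_log, _bind, DEMO_KERNEL) are hypothetical, and xmr-stak's own kernel source and build options are not reproduced; pass your own. If the "Illegal instruction detected" error is a fatal backend error, the process may abort before any log can be read, but that would at least isolate the failure from the miner.

```python
import ctypes
import ctypes.util

# Constant values from the Khronos CL/cl.h header.
CL_SUCCESS = 0
CL_DEVICE_TYPE_GPU = 1 << 2
CL_PROGRAM_BUILD_LOG = 0x1183

# Hypothetical stand-in kernel; replace it with the source you want to test.
DEMO_KERNEL = b"__kernel void noop(__global int *out) { out[get_global_id(0)] = 0; }"

vp, u32, i32 = ctypes.c_void_p, ctypes.c_uint, ctypes.c_int


def _bind(cl):
    """Declare argument/return types so 64-bit handles are not truncated."""
    P = ctypes.POINTER
    sigs = {
        "clGetPlatformIDs": ([u32, P(vp), P(u32)], i32),
        "clGetDeviceIDs": ([vp, ctypes.c_uint64, u32, P(vp), P(u32)], i32),
        "clCreateContext": ([vp, u32, P(vp), vp, vp, P(i32)], vp),
        "clCreateProgramWithSource": ([vp, u32, P(ctypes.c_char_p), P(ctypes.c_size_t), P(i32)], vp),
        "clBuildProgram": ([vp, u32, P(vp), ctypes.c_char_p, vp, vp], i32),
        "clGetProgramBuildInfo": ([vp, vp, u32, ctypes.c_size_t, vp, P(ctypes.c_size_t)], i32),
        "clReleaseProgram": ([vp], i32),
        "clReleaseContext": ([vp], i32),
    }
    for name, (args, res) in sigs.items():
        fn = getattr(cl, name)
        fn.argtypes, fn.restype = args, res


def get_build_log(source, options=b""):
    """Build `source` on the first GPU of the first platform.

    Returns (clBuildProgram status, build log) or None if no OpenCL GPU is usable.
    """
    path = ctypes.util.find_library("OpenCL")
    if path is None:
        return None
    cl = ctypes.CDLL(path)
    _bind(cl)

    platform, device, count = vp(), vp(), u32(0)
    if cl.clGetPlatformIDs(1, ctypes.byref(platform), ctypes.byref(count)) != CL_SUCCESS or count.value == 0:
        return None
    if cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, ctypes.byref(device), ctypes.byref(count)) != CL_SUCCESS or count.value == 0:
        return None

    err = i32(0)
    ctx = cl.clCreateContext(None, 1, ctypes.byref(device), None, None, ctypes.byref(err))
    if err.value != CL_SUCCESS or not ctx:
        return None
    src = ctypes.c_char_p(source)
    prog = cl.clCreateProgramWithSource(ctx, 1, ctypes.byref(src), None, ctypes.byref(err))
    if err.value != CL_SUCCESS or not prog:
        cl.clReleaseContext(ctx)
        return None

    # Keep the status: a failed build is exactly the case we want to inspect.
    status = cl.clBuildProgram(prog, 1, ctypes.byref(device), options, None, None)

    # Ask for the log size first, then fetch the log itself.
    size = ctypes.c_size_t(0)
    cl.clGetProgramBuildInfo(prog, device, CL_PROGRAM_BUILD_LOG, 0, None, ctypes.byref(size))
    buf = ctypes.create_string_buffer(size.value + 1)
    cl.clGetProgramBuildInfo(prog, device, CL_PROGRAM_BUILD_LOG, size.value, buf, None)

    cl.clReleaseProgram(prog)
    cl.clReleaseContext(ctx)
    return status, buf.value.decode(errors="replace")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            source = f.read()
    else:
        source = DEMO_KERNEL
    result = get_build_log(source)
    if result is None:
        print("no usable OpenCL GPU found")
    else:
        status, log = result
        print("clBuildProgram status:", status)
        print(log)
```

Usage would be something like `python3 cl_buildlog.py some_kernel.cl` (file names hypothetical). ctypes is used only so the sketch needs nothing beyond the Python standard library and the installed OpenCL ICD loader.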

@gstoner, @justxi: do you have any ideas from the ROCm perspective?
@justxi: Thanks for the ebuild submissions; no issues during emerge.

voyager /opt/xmr-stak/build9/bin # ./xmr-stak --noCPU --noNVIDIA
-------------------------------------------------------------------
xmr-stak 2.4.3 26a5d
-------------------------------------------------------------------
[2018-04-22 08:31:16] : Mining coin: monero7
[2018-04-22 08:31:16] : Compiling code and initializing GPUs. This will take a while...
[2018-04-22 08:31:16] : Device 0 work size 8 / 32.
[2018-04-22 08:31:16] : OpenCL device 0 - Precompiled code /root/.openclcache/c5bddd8e20cae2624555ebaf2d7e44155ccecd7abdb00a1e22a7ea711f26e927.openclbin not found. Compiling ...
error: Illegal instruction detected: Operand has incorrect register class.

Basic information

  • Intel(R) Xeon(R) CPU D-1587
  • Sapphire Nitro+ Radeon RX Vega 64 (8GB)
    • 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1)
  • Gentoo 17.0/desktop (stable), using @justxi's 1.7.x ROCm ebuilds and the ROCm 1.7 kernel
  • Version: xmr-stak/2.4.3/26a5d65/master/lin/nvidia-amd-cpu/aeon-cryptonight-monero/0

Autodetected amd.conf values

"gpu_threads_conf" : [
  // gpu: gfx900 memory:6821
  // compute units: 64
  { "index" : 0,
    "intensity" : 1536, "worksize" : 8,
    "affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2,
    "comp_mode" : true
  },

],

/*
 * Platform index. This will be 0 unless you have different OpenCL platform - eg. AMD and Intel.
 */
"platform_index" : 0,

Build Trace

voyager /opt/xmr-stak/build9 # cmake ..
-- The C compiler identification is GNU 6.4.0
-- The CXX compiler identification is GNU 6.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /opt/cuda (found suitable version "9.1", minimum required is "7.5")
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/lib/libOpenCL.so (found version "2.0")
-- Found OpenSSL: /usr/lib64/libcrypto.so (found version "1.0.2o")
-- Configuring done
-- Generating done
-- Build files have been written to: /opt/xmr-stak/build9
voyager /opt/xmr-stak/build9 # make -j4
Scanning dependencies of target xmr-stak-c
[  5%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_blake256.c.o
[  5%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_groestl.c.o
[  8%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_jh.c.o
[ 11%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_keccak.c.o
[ 14%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_skein.c.o
[ 17%] Linking C static library bin/libxmr-stak-c.a
[ 17%] Built target xmr-stak-c
Scanning dependencies of target xmr-stak-backend
[ 20%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/jconf.cpp.o
[ 22%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/version.cpp.o
[ 25%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/hwlocMemory.cpp.o
[ 28%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/jconf.cpp.o
[ 31%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/minethd.cpp.o
[ 34%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/backendConnector.cpp.o
[ 37%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/globalStates.cpp.o
[ 40%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/crypto/cryptonight_common.cpp.o
[ 42%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/http/httpd.cpp.o
[ 45%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/http/webdesign.cpp.o
[ 48%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/console.cpp.o
[ 51%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/executor.cpp.o
[ 54%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/telemetry.cpp.o
[ 57%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/uac.cpp.o
[ 60%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/utility.cpp.o
[ 62%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/net/jpsock.cpp.o
[ 65%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/net/socket.cpp.o
[ 68%] Linking CXX static library bin/libxmr-stak-backend.a
[ 68%] Built target xmr-stak-backend
[ 74%] Building NVCC (Device) object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/nvcc_code/xmrstak_cuda_backend_generated_cuda_extra.cu.o
[ 74%] Building NVCC (Device) object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/nvcc_code/xmrstak_cuda_backend_generated_cuda_core.cu.o
Scanning dependencies of target xmrstak_opencl_backend
Scanning dependencies of target xmr-stak
[ 77%] Building CXX object CMakeFiles/xmr-stak.dir/xmrstak/cli/cli-miner.cpp.o
[ 80%] Building CXX object CMakeFiles/xmrstak_opencl_backend.dir/xmrstak/backend/amd/amd_gpu/gpu.cpp.o
[ 82%] Linking CXX executable bin/xmr-stak
[ 82%] Built target xmr-stak
[ 85%] Building CXX object CMakeFiles/xmrstak_opencl_backend.dir/xmrstak/backend/amd/jconf.cpp.o
[ 88%] Building CXX object CMakeFiles/xmrstak_opencl_backend.dir/xmrstak/backend/amd/minethd.cpp.o
[ 91%] Linking CXX shared library bin/libxmrstak_opencl_backend.so
[ 91%] Built target xmrstak_opencl_backend
Scanning dependencies of target xmrstak_cuda_backend
[ 97%] Building CXX object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/minethd.cpp.o
[ 97%] Building CXX object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/jconf.cpp.o
[100%] Linking CXX shared library bin/libxmrstak_cuda_backend.so
[100%] Built target xmrstak_cuda_backend

CMAKE_AR:FILEPATH=/usr/bin/ar
CMAKE_BUILD_TYPE:STRING=Release
CMAKE_COLOR_MAKEFILE:BOOL=ON
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++
CMAKE_CXX_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar
CMAKE_CXX_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib
CMAKE_CXX_FLAGS:STRING=
CMAKE_CXX_FLAGS_DEBUG:STRING=-g
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG
CMAKE_C_COMPILER:FILEPATH=/usr/bin/cc
CMAKE_C_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar
CMAKE_C_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib
CMAKE_C_FLAGS:STRING=
CMAKE_C_FLAGS_DEBUG:STRING=-g
CMAKE_C_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG
CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG
CMAKE_EXE_LINKER_FLAGS:STRING=
CMAKE_EXE_LINKER_FLAGS_DEBUG:STRING=
CMAKE_EXE_LINKER_FLAGS_MINSIZEREL:STRING=
CMAKE_EXE_LINKER_FLAGS_RELEASE:STRING=
CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO:STRING=
CMAKE_EXPORT_COMPILE_COMMANDS:BOOL=OFF
CMAKE_INSTALL_PREFIX:PATH=/opt/xmr-stak
CMAKE_LINKER:FILEPATH=/usr/bin/ld
CMAKE_LINK_STATIC:BOOL=OFF
CMAKE_MAKE_PROGRAM:FILEPATH=/usr/bin/gmake
CMAKE_MODULE_LINKER_FLAGS:STRING=
CMAKE_MODULE_LINKER_FLAGS_DEBUG:STRING=
CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL:STRING=
CMAKE_MODULE_LINKER_FLAGS_RELEASE:STRING=
CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO:STRING=
CMAKE_NM:FILEPATH=/usr/bin/nm
CMAKE_OBJCOPY:FILEPATH=/usr/bin/objcopy
CMAKE_OBJDUMP:FILEPATH=/usr/bin/objdump
CMAKE_RANLIB:FILEPATH=/usr/bin/ranlib
CMAKE_SHARED_LINKER_FLAGS:STRING=                                                                                                                                                                                                                                    
CMAKE_SHARED_LINKER_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                              
CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                         
CMAKE_SHARED_LINKER_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                            
CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                     
CMAKE_SKIP_INSTALL_RPATH:BOOL=NO                                                                                                                                                                                                                                     
CMAKE_SKIP_RPATH:BOOL=NO                                                                                                                                                                                                                                             
CMAKE_STATIC_LINKER_FLAGS:STRING=                                                                                                                                                                                                                                    
CMAKE_STATIC_LINKER_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                              
CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                         
CMAKE_STATIC_LINKER_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                            
CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                     
CMAKE_STRIP:FILEPATH=/usr/bin/strip                                                                                                                                                                                                                                  
CMAKE_VERBOSE_MAKEFILE:BOOL=FALSE                                                                                                                                                                                                                                    
CPU_ENABLE:BOOL=ON                                                                                                                                                                                                                                                   
CUDA_64_BIT_DEVICE_CODE:BOOL=ON                                                                                                                                                                                                                                      
CUDA_ARCH:STRING=30;35;37;50;52;60;61;62;70                                                                                                                                                                                                                          
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE:BOOL=ON                                                                                                                                                                                                                       
CUDA_BUILD_CUBIN:BOOL=OFF                                                                                                                                                                                                                                            
CUDA_BUILD_EMULATION:BOOL=OFF                                                                                                                                                                                                                                        
CUDA_COMPILER:STRING=nvcc                                                                                                                                                                                                                                            
CUDA_CUDART_LIBRARY:FILEPATH=/opt/cuda/lib64/libcudart.so                                                                                                                                                                                                            
CUDA_CUDA_LIBRARY:FILEPATH=/usr/lib/libcuda.so                                                                                                                                                                                                                       
CUDA_ENABLE:BOOL=ON                                                                                                                                                                                                                                                  
CUDA_GENERATED_OUTPUT_DIR:PATH=                                                                                                                                                                                                                                      
CUDA_HOST_COMPILATION_CPP:BOOL=ON                                                                                                                                                                                                                                    
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/cc                                                                                                                                                                                                                              
CUDA_KEEP_FILES:BOOL=OFF                                                                                                                                                                                                                                             
CUDA_NVCC_EXECUTABLE:FILEPATH=/opt/cuda/bin/nvcc                                                                                                                                                                                                                     
CUDA_NVCC_FLAGS:STRING=                                                                                                                                                                                                                                              
CUDA_NVCC_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                                        
CUDA_NVCC_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                                   
CUDA_NVCC_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                                      
CUDA_NVCC_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                               
CUDA_PROPAGATE_HOST_FLAGS:BOOL=ON                                                                                                                                                                                                                                    
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND                                                                                                                                                                                                                    
CUDA_SEPARABLE_COMPILATION:BOOL=OFF                                                                                                                                                                                                                                  
CUDA_SHOW_CODELINES:BOOL=OFF                                                                                                                                                                                                                                         
CUDA_SHOW_REGISTER:BOOL=OFF                                                                                                                                                                                                                                          
CUDA_TOOLKIT_INCLUDE:PATH=/opt/cuda/include                                                                                                                                                                                                                          
CUDA_TOOLKIT_ROOT_DIR:PATH=/opt/cuda                                                                                                                                                                                                                                 
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON                                                                                                                                                                                                                                 
CUDA_VERBOSE_BUILD:BOOL=OFF                                                                                                                                                                                                                                          
CUDA_VERSION:STRING=9.1                                                                                                                                                                                                                                              
CUDA_cublas_LIBRARY:FILEPATH=/opt/cuda/lib64/libcublas.so                                                                                                                                                                                                            
CUDA_cublas_device_LIBRARY:FILEPATH=/opt/cuda/lib64/libcublas_device.a                                                                                                                                                                                               
CUDA_cudadevrt_LIBRARY:FILEPATH=/opt/cuda/lib64/libcudadevrt.a                                                                                                                                                                                                       
CUDA_cudart_static_LIBRARY:FILEPATH=/opt/cuda/lib64/libcudart_static.a                                                                                                                                                                                               
CUDA_cufft_LIBRARY:FILEPATH=/opt/cuda/lib64/libcufft.so                                                                                                                                                                                                              
CUDA_cupti_LIBRARY:FILEPATH=CUDA_cupti_LIBRARY-NOTFOUND                                                                                                                                                                                                              
CUDA_curand_LIBRARY:FILEPATH=/opt/cuda/lib64/libcurand.so                                                                                                                                                                                                            
CUDA_cusolver_LIBRARY:FILEPATH=/opt/cuda/lib64/libcusolver.so                                                                                                                                                                                                        
CUDA_cusparse_LIBRARY:FILEPATH=/opt/cuda/lib64/libcusparse.so                                                                                                                                                                                                        
CUDA_nppc_LIBRARY:FILEPATH=/opt/cuda/lib64/libnppc.so                                                                                                                                                                                                                
CUDA_nppi_LIBRARY:FILEPATH=CUDA_nppi_LIBRARY-NOTFOUND                                                                                                                                                                                                                
CUDA_npps_LIBRARY:FILEPATH=/opt/cuda/lib64/libnpps.so                                                                                                                                                                                                                
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/librt.so                                                                                                                                                                                                                           
EXECUTABLE_OUTPUT_PATH:STRING=bin                                                                                                                                                                                                                                    
HWLOC:FILEPATH=/usr/lib/libhwloc.so                                                                                                                                                                                                                                  
HWLOC_ENABLE:BOOL=ON                                                                                                                                                                                                                                                 
HWLOC_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                  
LIBRARY_OUTPUT_PATH:STRING=bin                                                                                                                                                                                                                                       
MHTD:FILEPATH=/usr/lib/libmicrohttpd.so                                                                                                                                                                                                                              
MICROHTTPD_ENABLE:BOOL=ON                                                                                                                                                                                                                                            
MTHD_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                   
OPENSSL_CRYPTO_LIBRARY:FILEPATH=/usr/lib64/libcrypto.so                                                                                                                                                                                                              
OPENSSL_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                
OPENSSL_SSL_LIBRARY:FILEPATH=/usr/lib64/libssl.so                                                                                                                                                                                                                    
OpenCL_ENABLE:BOOL=ON                                                                                                                                                                                                                                                
OpenCL_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                 
OpenCL_LIBRARY:FILEPATH=/usr/lib/libOpenCL.so                                                                                                                                                                                                                        
OpenSSL_ENABLE:BOOL=ON                                                                                                                                                                                                                                               
PKG_CONFIG_EXECUTABLE:FILEPATH=/usr/bin/pkg-config                                                                                                                                                                                                                   
XMR-STAK_COMPILE:STRING=native                                                                                                                                                                                                                                       
XMR-STAK_LARGEGRID:BOOL=ON                                                                                                                                                                                                                                           
XMR-STAK_THREADS:STRING=0                                             
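The cache listing above uses CMake's `NAME:TYPE=VALUE` entry layout, which makes it easy to lose track of which backends the binary was actually built with (here `CPU_ENABLE`, `CUDA_ENABLE` and `OpenCL_ENABLE` are all `ON`, even though the miner is started with `--noCPU --noNVIDIA`) and which paths CMake failed to resolve (values ending in `-NOTFOUND`). A minimal Python sketch for pulling those out of such a dump follows; `parse_cache`, `enabled_features` and `missing_paths` are hypothetical helpers, not part of xmr-stak or CMake, and the ON/OFF check is a simplification of CMake's full truthiness rules.

```python
import re
from typing import Dict, List, Tuple

# One cache entry per line: NAME:TYPE=VALUE (VALUE may be empty).
# Names such as "XMR-STAK_COMPILE" contain '-', so it is allowed in NAME.
_ENTRY = re.compile(r"^([A-Za-z0-9_.+-]+):([A-Z]+)=(.*)$")

# Simplified truthiness (assumption): CMake also treats non-zero numbers as
# true, which this sketch ignores.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE", "Y"}


def parse_cache(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each cache variable to (type, value); non-entry lines are skipped."""
    entries: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        # strip() also drops trailing padding left over from terminal copy-paste
        match = _ENTRY.match(line.strip())
        if match:
            name, var_type, value = match.groups()
            entries[name] = (var_type, value)
    return entries


def enabled_features(entries: Dict[str, Tuple[str, str]]) -> List[str]:
    """Sorted names of *_ENABLE switches whose value counts as true."""
    return sorted(
        name
        for name, (_, value) in entries.items()
        if name.endswith("_ENABLE") and value.upper() in _TRUE_VALUES
    )


def missing_paths(entries: Dict[str, Tuple[str, str]]) -> List[str]:
    """Sorted names whose value is NOTFOUND or ends in -NOTFOUND."""
    return sorted(
        name
        for name, (_, value) in entries.items()
        if value == "NOTFOUND" or value.endswith("-NOTFOUND")
    )
```

Calling `missing_paths(parse_cache(text))` on the text of `CMakeCache.txt` (or a listing like the one above) would report entries such as `CUDA_SDK_ROOT_DIR`, `CUDA_cupti_LIBRARY` and `CUDA_nppi_LIBRARY`; none of these belong to the OpenCL path, which is consistent with the failure being in the OpenCL kernel compilation rather than in the build configuration.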

clinfo

Number of platforms                               2
Platform Name                                   AMD Accelerated Parallel Processing
Platform Vendor                                 Advanced Micro Devices, Inc.
Platform Version                                OpenCL 2.0 AMD-APP.internal.dbg (2528.0)
Platform Profile                                FULL_PROFILE
Platform Extensions                             cl_khr_icd cl_amd_object_metadata cl_amd_event_callback
Platform Max metadata object keys (AMD)         8
Platform Extensions function suffix             AMD

Platform Name                                   NVIDIA CUDA
Platform Vendor                                 NVIDIA Corporation
Platform Version                                OpenCL 1.2 CUDA 9.1.84
Platform Profile                                FULL_PROFILE
Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform Extensions function suffix             NV

Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
Device Name                                     gfx900
Device Vendor                                   Advanced Micro Devices, Inc.
Device Vendor ID                                0x1002
Device Version                                  OpenCL 1.2
Driver Version                                  2528.0 (HSA1.1,LC)
Device OpenCL C Version                         OpenCL C 2.0
Device Type                                     GPU
Device Board Name (AMD)                         Vega 10 XT [Radeon RX Vega 64]
Device Topology (AMD)                           PCI-E, 06:00.0
Device Profile                                  FULL_PROFILE
Device Available                                Yes
Compiler Available                              Yes
Linker Available                                Yes
Max compute units                               64
SIMD per compute unit (AMD)                     4
SIMD width (AMD)                                16
SIMD instruction width (AMD)                    1
Max clock frequency                             1630MHz
Graphics IP (AMD)                               9.0
Device Partition                                (core)
  Max number of sub-devices                     64
  Supported partition types                     (n/a)
  Supported affinity domains                    (n/a)
Max work item dimensions                        3
Max work item sizes                             1024x1024x1024
Max work group size                             256
Preferred work group size (AMD)                 256
Max work group size (AMD)                       1024
Preferred work group size multiple              64
Wavefront width (AMD)                           64                                                                                                                                                                                                                 
Preferred / native vector sizes                                                                                                                                                                                                                                    
  char                                                 4 / 4                                                                                                                                                                                                       
  short                                                2 / 2                                                                                                                                                                                                       
  int                                                  1 / 1                                                                                                                                                                                                       
  long                                                 1 / 1                                                                                                                                                                                                       
  half                                                 1 / 1        (cl_khr_fp16)                                                                                                                                                                                  
  float                                                1 / 1                                                                                                                                                                                                       
  double                                               1 / 1        (cl_khr_fp64)                                                                                                                                                                                  
Half-precision Floating-point support           (cl_khr_fp16)                                                                                                                                                                                                      
  Denormals                                     No                                                                                                                                                                                                                 
  Infinity and NANs                             No                                                                                                                                                                                                                 
  Round to nearest                              No                                                                                                                                                                                                                 
  Round to zero                                 No                                                                                                                                                                                                                 
  Round to infinity                             No                                                                                                                                                                                                                 
  IEEE754-2008 fused multiply-add               No                                                                                                                                                                                                                 
  Support is emulated in software               No                                                                                                                                                                                                                 
Single-precision Floating-point support         (core)                                                                                                                                                                                                             
  Denormals                                     Yes                                                                                                                                                                                                                
  Infinity and NANs                             Yes                                                                                                                                                                                                                
  Round to nearest                              Yes                                                                                                                                                                                                                
  Round to zero                                 Yes                                                                                                                                                                                                                
  Round to infinity                             Yes                                                                                                                                                                                                                
  IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
  Support is emulated in software               No                                                                                                                                                                                                                 
  Correctly-rounded divide and sqrt operations  Yes                                                                                                                                                                                                                
Double-precision Floating-point support         (cl_khr_fp64)                                                                                                                                                                                                      
  Denormals                                     Yes                                                                                                                                                                                                                
  Infinity and NANs                             Yes                                                                                                                                                                                                                
  Round to nearest                              Yes                                                                                                                                                                                                                
  Round to zero                                 Yes                                                                                                                                                                                                                
  Round to infinity                             Yes                                                                                                                                                                                                                
  IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
  Support is emulated in software               No                                                                                                                                                                                                                 
Address bits                                    64, Little-Endian                                                                                                                                                                                                  
Global memory size                              8573157376 (7.984GiB)                                                                                                                                                                                              
Global free memory (AMD)                        8370176 (7.982GiB)                                                                                                                                                                                                 
Global memory channels (AMD)                    64                                                                                                                                                                                                                 
Global memory banks per channel (AMD)           4                                                                                                                                                                                                                  
Global memory bank width (AMD)                  256 bytes                                                                                                                                                                                                          
Error Correction support                        No                                                                                                                                                                                                                 
Max memory allocation                           7287183769 (6.787GiB)                                                                                                                                                                                              
Unified memory for Host and Device              No                                                                                                                                                                                                                 
Minimum alignment for any data type             128 bytes                                                                                                                                                                                                          
Alignment of base address                       1024 bits (128 bytes)                                                                                                                                                                                              
Global Memory cache type                        Read/Write                                                                                                                                                                                                         
Global Memory cache size                        16384 (16KiB)                                                                                                                                                                                                      
Global Memory cache line size                   64 bytes                                                                                                                                                                                                           
Image support                                   No                                                                                                                                                                                                                 
Local memory type                               Local                                                                                                                                                                                                              
Local memory size                               65536 (64KiB)                                                                                                                                                                                                      
Local memory syze per CU (AMD)                  65536 (64KiB)                                                                                                                                                                                                      
Local memory banks (AMD)                        32                                                                                                                                                                                                                 
Max number of constant args                     8                                                                                                                                                                                                                  
Max constant buffer size                        7287183769 (6.787GiB)                                                                                                                                                                                              
Preferred constant buffer size (AMD)            16384 (16KiB)                                                                                                                                                                                                      
Max size of kernel argument                     1024                                                                                                                                                                                                               
Queue properties                                                                                                                                                                                                                                                   
  Out-of-order execution                        No                                                                                                                                                                                                                 
  Profiling                                     Yes                                                                                                                                                                                                                
Prefer user sync for interop                    Yes                                                                                                                                                                                                                
Number of P2P devices (AMD)                     0                                                                                                                                                                                                                  
P2P devices (AMD)                               (n/a)                                                                                                                                                                                                              
Profiling timer resolution                      1ns                                                                                                                                                                                                                
Profiling timer offset since Epoch (AMD)        0ns (Thu Jan  1 00:00:00 1970)                                                                                                                                                                                     
Execution capabilities                                                                                                                                                                                                                                             
  Run OpenCL kernels                            Yes                                                                                                                                                                                                                
  Run native kernels                            No                                                                                                                                                                                                                 
  Thread trace supported (AMD)                  No                                                                                                                                                                                                                 
  Number of async queues (AMD)                  8                                                                                                                                                                                                                  
  Max real-time compute queues (AMD)            8                                                                                                                                                                                                                  
  Max real-time compute units (AMD)             64                                                                                                                                                                                                                 
printf() buffer size                            4194304 (4MiB)                                                                                                                                                                                                     
Built-in kernels                                (n/a)                                                                                                                                                                                                              
Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_liquid_flash cl_amd_copy_buffer_p2p                                        
                                                                                                                                                                                                                                                                   
Platform Name                                   NVIDIA CUDA                                                                                                                                                                                                        
Number of devices                                 1                                                                                                                                                                                                                  
Device Name                                     GeForce GTX 1070 Ti                                                                                                                                                                                                
Device Vendor                                   NVIDIA Corporation                                                                                                                                                                                                 
Device Vendor ID                                0x10de                                                                                                                                                                                                             
Device Version                                  OpenCL 1.2 CUDA                                                                                                                                                                                                    
Driver Version                                  390.42                                                                                                                                                                                                             
Device OpenCL C Version                         OpenCL C 1.2                                                                                                                                                                                                       
Device Type                                     GPU                                                                                                                                                                                                                
Device Topology (NV)                            PCI-E, 01:00.0                                                                                                                                                                                                     
Device Profile                                  FULL_PROFILE                                                                                                                                                                                                       
Device Available                                Yes                                                                                                                                                                                                                
Compiler Available                              Yes                                                                                                                                                                                                                
Linker Available                                Yes                                                                                                                                                                                                                
Max compute units                               19                                                                                                                                                                                                                 
Max clock frequency                             1683MHz                                                                                                                                                                                                            
Compute Capability (NV)                         6.1                                                                                                                                                                                                                
Device Partition                                (core)                                                                                                                                                                                                             
  Max number of sub-devices                     1                                                                                                                                                                                                                  
  Supported partition types                     None                                                                                                                                                                                                               
  Supported affinity domains                    (n/a)                                                                                                                                                                                                              
Max work item dimensions                        3                                                                                                                                                                                                                  
Max work item sizes                             1024x1024x64                                                                                                                                                                                                       
Max work group size                             1024                                                                                                                                                                                                               
Preferred work group size multiple              32                                                                                                                                                                                                                 
Warp size (NV)                                  32                                                                                                                                                                                                                 
Preferred / native vector sizes                                                                                                                                                                                                                                    
  char                                                 1 / 1                                                                                                                                                                                                       
  short                                                1 / 1                                                                                                                                                                                                       
  int                                                  1 / 1                                                                                                                                                                                                       
  long                                                 1 / 1                                                                                                                                                                                                       
  half                                                 0 / 0        (n/a)                                                                                                                                                                                          
  float                                                1 / 1                                                                                                                                                                                                       
  double                                               1 / 1        (cl_khr_fp64)                                                                                                                                                                                  
Half-precision Floating-point support           (n/a)                                                                                                                                                                                                              
Single-precision Floating-point support         (core)                                                                                                                                                                                                             
  Denormals                                     Yes                                                                                                                                                                                                                
  Infinity and NANs                             Yes                                                                                                                                                                                                                
  Round to nearest                              Yes                                                                                                                                                                                                                
  Round to zero                                 Yes                                                                                                                                                                                                                
  Round to infinity                             Yes                                                                                                                                                                                                                
  IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
  Support is emulated in software               No                                                                                                                                                                                                                 
  Correctly-rounded divide and sqrt operations  Yes                                                                                                                                                                                                                
Double-precision Floating-point support         (cl_khr_fp64)                                                                                                                                                                                                      
  Denormals                                     Yes                                                                                                                                                                                                                
  Infinity and NANs                             Yes                                                                                                                                                                                                                
  Round to nearest                              Yes                                                                                                                                                                                                                
  Round to zero                                 Yes                                                                                                                                                                                                                
  Round to infinity                             Yes                                                                                                                                                                                                                
  IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
  Support is emulated in software               No                                                                                                                                                                                                                 
Address bits                                    64, Little-Endian                                                                                                                                                                                                  
Global memory size                              8513978368 (7.929GiB)                                                                                                                                                                                              
Error Correction support                        No                                                                                                                                                                                                                 
Max memory allocation                           2128494592 (1.982GiB)                                                                                                                                                                                              
Unified memory for Host and Device              No                                                                                                                                                                                                                 
Integrated memory (NV)                          No                                                                                                                                                                                                                 
Minimum alignment for any data type             128 bytes                                                                                                                                                                                                          
Alignment of base address                       4096 bits (512 bytes)                                                                                                                                                                                              
Global Memory cache type                        Read/Write                                                                                                                                                                                                         
Global Memory cache size                        311296 (304KiB)                                                                                                                                                                                                    
Global Memory cache line size                   128 bytes                                                                                                                                                                                                          
Image support                                   Yes                                                                                                                                                                                                                
  Max number of samplers per kernel             32                                                                                                                                                                                                                 
  Max size for 1D images from buffer            134217728 pixels                                                                                                                                                                                                   
  Max 1D or 2D image array size                 2048 images                                                                                                                                                                                                        
  Max 2D image size                             16384x32768 pixels                                                                                                                                                                                                 
  Max 3D image size                             16384x16384x16384 pixels                                                                                                                                                                                           
  Max number of read image args                 256                                                                                                                                                                                                                
  Max number of write image args                16                                                                                                                                                                                                                 
Local memory type                               Local                                                                                                                                                                                                              
Local memory size                               49152 (48KiB)                                                                                                                                                                                                      
Registers per block (NV)                        65536                                                                                                                                                                                                              
Max number of constant args                     9                                                                                                                                                                                                                  
Max constant buffer size                        65536 (64KiB)                                                                                                                                                                                                      
Max size of kernel argument                     4352 (4.25KiB)                                                                                                                                                                                                     
Queue properties                                                                                                                                                                                                                                                   
  Out-of-order execution                        Yes                                                                                                                                                                                                                
  Profiling                                     Yes                                                                                                                                                                                                                
Prefer user sync for interop                    No                                                                                                                                                                                                                 
Profiling timer resolution                      1000ns                                                                                                                                                                                                             
Execution capabilities                                                                                                                                                                                                                                             
  Run OpenCL kernels                            Yes                                                                                                                                                                                                                
  Run native kernels                            No                                                                                                                                                                                                                 
  Kernel execution timeout (NV)                 No                                                                                                                                                                                                                 
Concurrent copy and kernel execution (NV)       Yes                                                                                                                                                                                                                
  Number of async copy engines                  2                                                                                                                                                                                                                  
printf() buffer size                            1048576 (1024KiB)                                                                                                                                                                                                  
Built-in kernels                                (n/a)                                                                                                                                                                                                              
Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer                                                                                                                                                             
                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                   
NULL platform behavior                                                                                                                                                                                                                                               
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform                                                                                                                                                                                                        
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform                                                                                                                                                                                                        
clCreateContext(NULL, ...) [default]            No platform                                                                                                                                                                                                        
clCreateContext(NULL, ...) [other]              Success [AMD]                                                                                                                                                                                                      
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)                                                                                                                                                                                                 
  Platform Name                                 AMD Accelerated Parallel Processing                                                                                                                                                                                
  Device Name                                   gfx900                                                                                                                                                                                                             
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform                                                                                                                                                                                    
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)                                                                                                                                                                                                     
  Platform Name                                 AMD Accelerated Parallel Processing                                                                                                                                                                                
  Device Name                                   gfx900                                                                                                                                                                                                             
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform                                                                                                                                                                            
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform                                                                                                                                                                                 
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)                                                                                                                                                                                                     
  Platform Name                                 AMD Accelerated Parallel Processing                                                                                                                                                                                
  Device Name                                   gfx900                               
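In the dump above, the device with `(NV)` fields and `cl_nv_*` extensions is an NVIDIA card. The NULL-platform checks at the end pick up the AMD gfx900. A small filter can pull just the platform and device names out of output in this layout. This is a sketch that assumes the plain-text clinfo format shown here. The function name `list_cl_devices` is hypothetical.

```bash
# list_cl_devices (hypothetical helper): read clinfo output on stdin and
# print one "Platform Name: ..." / "Device Name: ..." line per entry.
# Assumes the clinfo text layout shown in this issue.
list_cl_devices() {
  grep -E '^[[:space:]]*(Platform Name|Device Name)[[:space:]]' |
    # Strip leading/trailing blanks, then turn the first run of 2+ blanks
    # (the column gap) into ": ".
    sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//; s/[[:space:]]{2,}/: /'
}
```

Usage: `clinfo | list_cl_devices`.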

Modules

Module                  Size  Used by
nvidia_uvm            675840  4
ext2                   57344  1
dm_mod                 98304  0
dax                    20480  1 dm_mod
ipmi_ssif              24576  0
x86_pkg_temp_thermal    16384  0
kvm_intel             188416  0
kvm                   491520  1 kvm_intel
irqbypass              16384  1 kvm
aesni_intel           184320  0
cp210x                 20480  0
usbserial              28672  1 cp210x
amdkfd                126976  3
nvidia_drm             32768  1
aes_x86_64             20480  1 aesni_intel
nvidia_modeset       1069056  3 nvidia_drm
amdgpu               2154496  2
nvidia              13799424  348 nvidia_modeset,nvidia_uvm
ttm                    81920  1 amdgpu
backlight              16384  1 amdgpu
coretemp               16384  0
crypto_simd            16384  1 aesni_intel
igb                   155648  0
cryptd                 20480  2 crypto_simd,aesni_intel
glue_helper            16384  1 aesni_intel
ipmi_si                53248  0
ipmi_devintf           20480  0
ipmi_msghandler        36864  4 nvidia,ipmi_ssif,ipmi_devintf,ipmi_si
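The module list shows both driver stacks loaded side by side. On one side are `amdgpu` and `amdkfd`, the kernel pieces ROCm relies on. On the other are the proprietary `nvidia` modules. A quick way to confirm the AMD side is present is to check `lsmod` output for those names. The helper below, `check_modules`, is a hypothetical sketch. It reads lsmod-format text on stdin and reports any module that is missing.

```bash
# check_modules MOD... (hypothetical helper): read `lsmod` output on stdin
# and print "missing: NAME" for each named module that is not loaded.
# Returns 0 if all are loaded, 1 otherwise.
check_modules() {
  local loaded missing=0 mod
  # Skip the "Module Size Used by" header and keep only the first column.
  loaded=$(awk 'NR > 1 { print $1 }')
  for mod in "$@"; do
    if ! printf '%s\n' "$loaded" | grep -qx -- "$mod"; then
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `lsmod | check_modules amdgpu amdkfd`.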
@gstoner

gstoner commented Apr 22, 2018

We will take a look

greg

@ghost

ghost commented Apr 29, 2018

@chron0 Not sure if this will help, but this is what I did to get it compiling with the ROCm drivers.

  1. Don't install the AMD SDK.
  2. Install the ROCm drivers following the steps in the GitHub repo.
  3. Follow the xmr-stak compile steps in the Linux docs, but pass cmake the include and library paths of the ROCm install location. See below:
# Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-39-generic x86_64)
sudo apt install libmicrohttpd-dev libssl-dev cmake build-essential libhwloc-dev
git clone https://github.com/fireice-uk/xmr-stak.git
mkdir xmr-stak/build
cd xmr-stak/build
cmake .. -DCUDA_ENABLE=OFF -DOpenCL_INCLUDE_DIR=/opt/rocm/opencl/include/ -DOpenCL_LIBRARY=/opt/rocm/opencl/lib/x86_64/libOpenCL.so
make install
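Before running the cmake line above, it can help to check that the ROCm OpenCL header and library actually exist at the paths being passed. This can save a failed configure. The function below is a hypothetical helper (`rocm_cmake_args`). It assumes the headers live under `include/CL/cl.h` inside the ROCm OpenCL prefix. It prints the two `-D` options only when both files are found.

```bash
# rocm_cmake_args [PREFIX] (hypothetical helper): print the OpenCL cmake
# options for a ROCm OpenCL install (default /opt/rocm/opencl), or fail
# with a message on stderr if the header or library is missing.
# Assumed layout: PREFIX/include/CL/cl.h and PREFIX/lib/x86_64/libOpenCL.so.
rocm_cmake_args() {
  local prefix=${1:-/opt/rocm/opencl}
  local inc="$prefix/include" lib="$prefix/lib/x86_64/libOpenCL.so"
  if [ ! -f "$inc/CL/cl.h" ]; then
    echo "OpenCL header not found under $inc" >&2
    return 1
  fi
  # -e follows symlinks, since libOpenCL.so is often a link.
  if [ ! -e "$lib" ]; then
    echo "OpenCL library not found: $lib" >&2
    return 1
  fi
  printf '%s\n' "-DOpenCL_INCLUDE_DIR=$inc" "-DOpenCL_LIBRARY=$lib"
}
```

Usage: `args=$(rocm_cmake_args) && cmake .. -DCUDA_ENABLE=OFF $args`. The unquoted `$args` relies on word splitting, so this only works while the install path contains no spaces.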

I used a clean install of Ubuntu 16.04 with the following hardware:

  • ASUS Z97-K Motherboard
  • Intel 4790K
  • 2 x Vega 56

xmr-stak runs OK (about 1200 H/s). The only problem is that I can't get more than one GPU detected by either xmr-stak or /opt/rocm/opencl/bin/x86_64/clinfo.
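When only one of several GPUs shows up, it is worth checking whether the ROCm kernel driver (KFD) enumerates all of them before OpenCL is involved at all. The sketch below assumes the KFD topology is exposed under `/sys/class/kfd/kfd/topology/nodes`. It also assumes each node has a `gpu_id` file that reads `0` for CPU nodes. Treat that layout, and the helper name `count_kfd_gpus`, as assumptions to verify on your system.

```bash
# count_kfd_gpus [NODES_DIR] (hypothetical helper): count KFD topology
# nodes with a non-zero gpu_id.
# Assumption: each node directory holds a gpu_id file, 0 for CPU nodes.
count_kfd_gpus() {
  local dir=${1:-/sys/class/kfd/kfd/topology/nodes} node id count=0
  for node in "$dir"/*/; do
    # Skips the unexpanded glob and nodes without a readable gpu_id.
    [ -r "${node}gpu_id" ] || continue
    id=$(cat "${node}gpu_id")
    if [ -n "$id" ] && [ "$id" != "0" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

Suppose this reports both cards but clinfo still lists one. That would suggest the problem sits in the OpenCL runtime layer rather than the kernel driver.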

@gstoner

gstoner commented Apr 29, 2018

Please do not install the historical OpenCL SDK with ROCm; it is not needed to build OpenCL applications. We removed this requirement in ROCm, and installing the base driver now also installs rocm-opencl-dev, so you no longer need this step as you did in the past.
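One way to see whether leftovers from an older SDK are still registered next to ROCm is to inspect the OpenCL ICD loader's vendor directory. On Linux the Khronos ICD loader reads `*.icd` files from `/etc/OpenCL/vendors`. Each file names a vendor library to load. The hypothetical helper below prints each file together with the library it points to, so duplicate or stale AMD entries stand out. The directory is a parameter so it can be pointed elsewhere.

```bash
# list_icds [VENDORS_DIR] (hypothetical helper): print "file -> library"
# for every *.icd entry (default /etc/OpenCL/vendors).
list_icds() {
  local dir=${1:-/etc/OpenCL/vendors} f
  for f in "$dir"/*.icd; do
    # Skips the unexpanded glob when the directory has no .icd files.
    [ -f "$f" ] || continue
    # Each .icd file holds a single library name or path.
    printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
  done
}
```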

@Spudz76
Contributor

Spudz76 commented Apr 30, 2018

Apparently there is a new 17.50 driver series that claims to fix some OpenCL + Vega issues...

https://support.amd.com/en-us/download/workstation?os=Linux%20x86_64#pro-driver

@chron0
Author

chron0 commented Apr 30, 2018

That may do the trick; I do have AMDSDK 3.0 installed. I'll uninstall it and try a new build tomorrow.

@chron0
Author

chron0 commented Apr 30, 2018

@Spudz76: I've tried 17.50 before as well and still had no working OpenCL interface, but it's a pain to work with the AMD "pro" stuff on Gentoo.

@chron0
Author

chron0 commented May 1, 2018

Full strace: http://termbin.com/t4g3
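A full strace is long. Filtering the log shows which OpenCL loader, ICD files and vendor libraries the miner actually opened. The sketch assumes the trace was captured with standard strace options, for example `strace -f -e trace=open,openat -o trace.log ./xmr-stak --noCPU --noNVIDIA`. The helper name `opencl_opens` is hypothetical. Its match patterns are only a guess at the relevant file names.

```bash
# opencl_opens (hypothetical helper): read strace output on stdin and print
# the successful open/openat calls whose quoted path mentions OpenCL,
# an .icd file, or amdocl (assumed patterns).
opencl_opens() {
  grep -E 'open(at)?\(' |
    grep -E '"[^"]*(OpenCL|\.icd|amdocl)[^"]*"' |
    # Drop failed attempts such as "= -1 ENOENT".
    grep -vE '= -1 '
}
```

Usage: `opencl_opens < trace.log`.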
