[BUG] Dense PCA fails with CUDA12 #5555

Open
Intron7 opened this issue Aug 15, 2023 · 10 comments
Labels: 1 - On Deck (to be worked on next), ? - Needs Triage (need team to review and classify), bug (something isn't working)

@Intron7
Contributor

Intron7 commented Aug 15, 2023

Describe the bug
Dense PCA fails on larger datasets with the following error:
RuntimeError: cuSOLVER error encountered at: file=/home/sdicks/micromamba/envs/rapids-23.08_12/include/raft/linalg/detail/eig.cuh line=118:
With CUDA 11.8 and RAPIDS 23.08 it works.
Steps/Code to reproduce bug

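import cupy as cp
from cuml.decomposition import PCA

# 90,000 samples x 5,000 features, float32, generated on the GPU
X = cp.random.rand(90000, 5000, dtype=cp.float32)
pca_func = PCA(n_components=100, random_state=42, output_type="numpy")
# Raises the cuSOLVER RuntimeError under CUDA 12
X_pca = pca_func.fit_transform(X)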
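
For scale, here is a back-of-the-envelope size check of the arrays in the reproducer. It is only a sketch: it assumes the dense solver eigendecomposes an n_features x n_features covariance matrix, which the `eig.cuh` location in the error suggests but this report does not confirm. The helper name `nbytes` is made up for illustration.

```python
# Rough memory footprint of the reproducer's arrays (float32 = 4 bytes).
# Assumption: the dense PCA path builds an (n_features x n_features)
# covariance matrix and eigendecomposes it; the report does not confirm this.

def nbytes(rows: int, cols: int, itemsize: int = 4) -> int:
    """Bytes needed for a dense rows x cols array."""
    return rows * cols * itemsize

n_samples, n_features = 90_000, 5_000
input_bytes = nbytes(n_samples, n_features)  # the input matrix X
cov_bytes = nbytes(n_features, n_features)   # hypothetical covariance matrix

print(f"X:          {input_bytes / 1e9:.2f} GB")
print(f"covariance: {cov_bytes / 1e6:.0f} MB")
```

Under that assumption, the matrix handed to the eigensolver is about 100 MB, far smaller than X itself.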

Expected behavior
It should work as it does with cuML 23.08 on CUDA 11.8.

Environment details (please complete the following information):

  • Environment location: [Bare-metal]
  • Linux Distro/Architecture: [Ubuntu 22.04 amd64]
  • GPU Model/Driver: [30902 and driver 535.86.10]
  • CUDA: [12.0]
  • Method of cuDF & cuML install: [conda]
    List of packages in environment: "/home/sdicks/micromamba/envs/rapids-23.08_12"

Name Version Build Channel
───────────────────────────────────────────────────────────────────────────────────────────────
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
aiohttp 3.8.5 py310h2372a71_0 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
anyio 3.7.1 pyhd8ed1ab_0 conda-forge
aom 3.5.0 h27087fc_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge
argon2-cffi-bindings 21.2.0 py310h5764c6d_3 conda-forge
arpack 3.7.0 hdefa2d7_2 conda-forge
arrow 1.2.3 pyhd8ed1ab_0 conda-forge
asttokens 2.2.1 pyhd8ed1ab_0 conda-forge
async-lru 2.0.4 pyhd8ed1ab_0 conda-forge
async-timeout 4.0.3 pyhd8ed1ab_0 conda-forge
attrs 23.1.0 pyh71513ae_1 conda-forge
aws-c-auth 0.7.0 hbbaa140_3 conda-forge
aws-c-cal 0.6.0 h93469e0_0 conda-forge
aws-c-common 0.8.23 hd590300_0 conda-forge
aws-c-compression 0.2.17 h862ab75_1 conda-forge
aws-c-event-stream 0.3.1 h9599702_1 conda-forge
aws-c-http 0.7.11 hbe98c3e_0 conda-forge
aws-c-io 0.13.28 h3870b5a_0 conda-forge
aws-c-mqtt 0.9.0 h2e270ba_0 conda-forge
aws-c-s3 0.3.13 heb0bb06_2 conda-forge
aws-c-sdkutils 0.1.12 h862ab75_0 conda-forge
aws-checksums 0.1.16 h862ab75_1 conda-forge
aws-crt-cpp 0.21.0 h87b6960_2 conda-forge
aws-sdk-cpp 1.10.57 h7062fed_18 conda-forge
babel 2.12.1 pyhd8ed1ab_1 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 pyhd8ed1ab_3 conda-forge
backports.functools_lru_cache 1.6.5 pyhd8ed1ab_0 conda-forge
beautifulsoup4 4.12.2 pyha770c72_0 conda-forge
bleach 6.0.0 pyhd8ed1ab_0 conda-forge
blosc 1.21.4 h0f2a231_0 conda-forge
bokeh 3.2.2 pyhd8ed1ab_0 conda-forge
boost-cpp 1.78.0 h6582d0a_3 conda-forge
branca 0.6.0 pyhd8ed1ab_0 conda-forge
brotli 1.0.9 h166bdaf_9 conda-forge
brotli-bin 1.0.9 h166bdaf_9 conda-forge
brotlipy 0.7.0 py310h5764c6d_1005 conda-forge
brunsli 0.1 h9c3ff4c_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.19.1 hd590300_0 conda-forge
c-blosc2 2.10.0 hb4ffafa_0 conda-forge
ca-certificates 2023.7.22 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.3.1 pyhd8ed1ab_0 conda-forge
cairo 1.16.0 hbbf8b49_1016 conda-forge
certifi 2023.7.22 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py310h255011f_3 conda-forge
cfitsio 4.2.0 hd9d235c_0 conda-forge
charls 2.4.2 h59595ed_0 conda-forge
charset-normalizer 3.2.0 pyhd8ed1ab_0 conda-forge
click 8.1.6 unix_pyh707e725_0 conda-forge
click-plugins 1.1.1 py_0 conda-forge
cligj 0.7.2 pyhd8ed1ab_1 conda-forge
cloudpickle 2.2.1 pyhd8ed1ab_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
colorcet 3.0.1 pyhd8ed1ab_0 conda-forge
comm 0.1.4 pyhd8ed1ab_0 conda-forge
contourpy 1.1.0 py310hd41b1e2_0 conda-forge
cryptography 41.0.3 py310h75e40e8_0 conda-forge
cucim 23.08.00 cuda12_py310_230809_gf3a294b_0 rapidsai
cuda-cccl 12.2.128 0 nvidia
cuda-cudart 12.2.128 0 nvidia
cuda-cudart-dev 12.2.128 0 nvidia
cuda-nvcc-dev_linux-64 12.0.76 ha770c72_0 conda-forge
cuda-nvcc-impl 12.0.76 h59595ed_0 conda-forge
cuda-nvcc-tools 12.0.76 h59595ed_0 conda-forge
cuda-nvrtc 12.2.128 0 nvidia
cuda-nvtx 12.2.128 0 nvidia
cuda-profiler-api 12.2.128 0 nvidia
cuda-python 12.2.0 py310h79c70a0_0 nvidia
cuda-version 12.0 hffde075_2 conda-forge
cudf 23.08.00 cuda12_py310_230809_g8150d38e08_0 rapidsai
cudf_kafka 23.08.00 cuda12_py310_230809_g8150d38e08_0 rapidsai
cudnn 8.8.0.121 h459966d_1 conda-forge
cugraph 23.08.00 cuda12_py310_230809_g3079227b_0 rapidsai
cuml 23.08.00 cuda12_py310_230809_gd7162cdea_0 rapidsai
cuproj 23.08.01 cuda12_py310_230810_g2660aba7_0 rapidsai
cupy 12.1.0 py310hfc31588_1 conda-forge
curl 8.2.1 hca28451_0 conda-forge
cusparselt 0.4.0.7 h3a97aeb_2 conda-forge
cuspatial 23.08.01 cuda12_py310_230810_g2660aba7_0 rapidsai
custreamz 23.08.00 cuda12_py310_230809_g8150d38e08_0 rapidsai
cutensor 1.7.0.1 0 nvidia
cutensor-cuda-12 1.7.0.1 0 nvidia
cuxfilter 23.08.02 cuda12_py310_230811_g9d32b65_0 rapidsai
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cyrus-sasl 2.1.27 h54b06d7_7 conda-forge
cytoolz 0.12.2 py310h2372a71_0 conda-forge
dask 2023.7.1 pyhd8ed1ab_0 conda-forge
dask-core 2023.7.1 pyhd8ed1ab_0 conda-forge
dask-cuda 23.08.00 py310_230809_gefbd6ca_0 rapidsai
dask-cudf 23.08.00 cuda12_py310_230809_g8150d38e08_0 rapidsai
datashader 0.15.1 pyhd8ed1ab_0 conda-forge
datashape 0.5.4 py_1 conda-forge
dav1d 1.2.1 hd590300_0 conda-forge
debugpy 1.6.8 py310hc6cd4ac_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
distributed 2023.7.1 pyhd8ed1ab_0 conda-forge
dlpack 0.5 h9c3ff4c_0 conda-forge
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.1.3 pyhd8ed1ab_0 conda-forge
executing 1.2.0 pyhd8ed1ab_0 conda-forge
expat 2.5.0 hcb278e6_1 conda-forge
fa2 0.3.5 py310h5764c6d_2 conda-forge
fastrlock 0.8 py310hd8f1fbe_3 conda-forge
fiona 1.9.4 py310h111440e_0 conda-forge
flit-core 3.9.0 pyhd8ed1ab_0 conda-forge
fmt 9.1.0 h924138e_0 conda-forge
folium 0.14.0 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.42.0 py310h2372a71_0 conda-forge
fqdn 1.5.1 pyhd8ed1ab_0 conda-forge
freetype 2.12.1 hca18f0e_1 conda-forge
freexl 1.0.6 h166bdaf_1 conda-forge
frozenlist 1.4.0 py310h2372a71_0 conda-forge
fsspec 2023.6.0 pyh1a96a4e_0 conda-forge
gdal 3.7.0 py310h65bb550_3 conda-forge
gdk-pixbuf 2.42.10 h6b639ba_2 conda-forge
geopandas 0.13.2 pyhd8ed1ab_1 conda-forge
geopandas-base 0.13.2 pyha770c72_1 conda-forge
geos 3.11.2 hcb278e6_0 conda-forge
geotiff 1.7.1 h22adcc9_11 conda-forge
gettext 0.21.1 h27087fc_0 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
giflib 5.2.1 h0b41bf4_3 conda-forge
glog 0.6.0 h6f12383_0 conda-forge
glpk 5.0 h445213a_0 conda-forge
gmock 1.14.0 ha770c72_1 conda-forge
gmp 6.2.1 h58526e2_0 conda-forge
gtest 1.14.0 h00ab1b0_1 conda-forge
hdf4 4.2.15 h501b40f_6 conda-forge
hdf5 1.14.1 nompi_h4f84152_100 conda-forge
holoviews 1.17.0 pyhd8ed1ab_0 conda-forge
icu 72.1 hcb278e6_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
igraph 0.10.6 h97b68dd_0 conda-forge
imagecodecs 2023.7.10 py310hc929067_2 conda-forge
imageio 2.31.1 pyh24c5eb1_0 conda-forge
importlib-metadata 6.8.0 pyha770c72_0 conda-forge
importlib_metadata 6.8.0 hd8ed1ab_0 conda-forge
importlib_resources 6.0.1 pyhd8ed1ab_0 conda-forge
ipykernel 6.25.1 pyh71e2992_0 conda-forge
ipython 8.14.0 pyh41d4057_0 conda-forge
isoduration 20.11.0 pyhd8ed1ab_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.19.0 pyhd8ed1ab_0 conda-forge
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
joblib 1.3.2 pyhd8ed1ab_0 conda-forge
json-c 0.16 hc379101_0 conda-forge
json5 0.9.14 pyhd8ed1ab_0 conda-forge
jsonpointer 2.0 py_0 conda-forge
jsonschema 4.19.0 pyhd8ed1ab_1 conda-forge
jsonschema-specifications 2023.7.1 pyhd8ed1ab_0 conda-forge
jsonschema-with-format-nongpl 4.19.0 pyhd8ed1ab_1 conda-forge
jupyter-lsp 2.2.0 pyhd8ed1ab_0 conda-forge
jupyter-server-proxy 4.0.0 pyhd8ed1ab_0 conda-forge
jupyter_client 8.3.0 pyhd8ed1ab_0 conda-forge
jupyter_core 5.3.1 py310hff52083_0 conda-forge
jupyter_events 0.7.0 pyhd8ed1ab_2 conda-forge
jupyter_server 2.7.0 pyhd8ed1ab_0 conda-forge
jupyter_server_terminals 0.4.4 pyhd8ed1ab_1 conda-forge
jupyterlab 4.0.5 pyhd8ed1ab_0 conda-forge
jupyterlab_pygments 0.2.2 pyhd8ed1ab_0 conda-forge
jupyterlab_server 2.24.0 pyhd8ed1ab_0 conda-forge
jxrlib 1.1 h7f98852_2 conda-forge
kealib 1.5.1 h3e6883b_4 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py310hbf28c38_1 conda-forge
krb5 1.21.1 h659d440_0 conda-forge
lazy_loader 0.2 pyhd8ed1ab_0 conda-forge
lcms2 2.15 haa2dc70_1 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
leidenalg 0.10.1 py310hc6cd4ac_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20230125.3 cxx17_h59595ed_0 conda-forge
libaec 1.0.6 hcb278e6_1 conda-forge
libarchive 3.6.2 h039dbb9_1 conda-forge
libarrow 11.0.0 h10ac928_33_cpu conda-forge
libavif 0.11.1 h8182462_2 conda-forge
libblas 3.9.0 17_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h166bdaf_9 conda-forge
libbrotlidec 1.0.9 h166bdaf_9 conda-forge
libbrotlienc 1.0.9 h166bdaf_9 conda-forge
libcblas 3.9.0 17_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcublas 12.2.4.5 0 nvidia
libcublas-dev 12.2.4.5 0 nvidia
libcucim 23.08.00 cuda12_230809_gf3a294b_0 rapidsai
libcudf 23.08.00 cuda12_230809_g8150d38e08_0 rapidsai
libcudf_kafka 23.08.00 cuda12_230809_g8150d38e08_0 rapidsai
libcufft 11.0.8.91 0 nvidia
libcufile 1.7.1.12 0 nvidia
libcufile-dev 1.7.1.12 0 nvidia
libcugraph 23.08.00 cuda12_230809_g3079227b_0 rapidsai
libcugraph_etl 23.08.00 cuda12_230809_g3079227b_0 rapidsai
libcugraphops 23.08.00 cuda12_230809_g0cc04e51_0 nvidia
libcuml 23.08.00 cuda12_230809_gd7162cdea_0 rapidsai
libcumlprims 23.08.00 cuda12_230809_g71c0a86_0 nvidia
libcurand 10.3.3.129 0 nvidia
libcurand-dev 10.3.3.129 0 nvidia
libcurl 8.2.1 hca28451_0 conda-forge
libcusolver 11.5.1.129 0 nvidia
libcusolver-dev 11.5.1.129 0 nvidia
libcusparse 12.1.2.129 0 nvidia
libcusparse-dev 12.1.2.129 0 nvidia
libcuspatial 23.08.01 cuda12_230810_g2660aba7_0 rapidsai
libcutensor-cuda-12 1.7.0.1 0 nvidia
libcutensor-dev-cuda-12 1.7.0.1 0 nvidia
libdeflate 1.18 h0b41bf4_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libgdal 3.7.0 h4a547c6_3 conda-forge
libgfortran-ng 13.1.0 h69a702a_0 conda-forge
libgfortran5 13.1.0 h15d22d2_0 conda-forge
libglib 2.76.4 hebfc3b9_0 conda-forge
libgomp 13.1.0 he5830b7_0 conda-forge
libgoogle-cloud 2.12.0 h840a212_1 conda-forge
libgrpc 1.56.2 h3905398_1 conda-forge
libhwloc 2.9.2 nocuda_h7313eea_1008 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libjpeg-turbo 2.1.5.1 h0b41bf4_0 conda-forge
libkml 1.3.0 h37653c0_1015 conda-forge
libkvikio 23.08.00 cuda12_230809_g51a9036_0 rapidsai
liblapack 3.9.0 17_linux64_openblas conda-forge
libleidenalg 0.11.1 h00ab1b0_0 conda-forge
libllvm14 14.0.6 hcd5def8_4 conda-forge
libnetcdf 4.9.2 nompi_h7e745eb_109 conda-forge
libnghttp2 1.52.0 h61bc06f_0 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libntlm 1.4 h7f98852_1002 conda-forge
libnuma 2.0.16 h0b41bf4_1 conda-forge
libnvjpeg 12.2.1.2 0 nvidia
libopenblas 0.3.23 pthreads_h80387f5_0 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libpq 15.4 hfc447b1_0 conda-forge
libprotobuf 4.23.3 hd1fb520_0 conda-forge
libraft 23.08.00 cuda12_230809_ge588d7b5_0 rapidsai
libraft-headers 23.08.00 cuda12_230809_ge588d7b5_0 rapidsai
libraft-headers-only 23.08.00 cuda12_230809_ge588d7b5_0 rapidsai
librdkafka 1.9.2 ha5a0de0_2 conda-forge
librmm 23.08.00 cuda12_230809_gf3af0e8d_0 rapidsai
librttopo 1.1.0 h0d5128d_13 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libspatialindex 1.9.3 h9c3ff4c_4 conda-forge
libspatialite 5.0.1 hca56755_27 conda-forge
libsqlite 3.42.0 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libthrift 0.18.1 h8fd135c_2 conda-forge
libtiff 4.5.1 h8b53f26_0 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.44.2 hd590300_1 conda-forge
libwebp 1.3.1 hbf2b3c1_0 conda-forge
libwebp-base 1.3.1 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxgboost 1.7.4 rapidsai_hbb0ba15_6 rapidsai
libxml2 2.11.5 h0d562d8_0 conda-forge
libzip 1.9.2 hc929e4a_1 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
libzopfli 1.0.3 h9c3ff4c_0 conda-forge
linkify-it-py 2.0.0 pyhd8ed1ab_0 conda-forge
llvmlite 0.40.1 py310h1b8f574_0 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
louvain 0.8.1 py310hc6cd4ac_0 conda-forge
lz4 4.3.2 py310h0cfdcf0_0 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
lzo 2.10 h516909a_1000 conda-forge
mapclassify 2.5.0 pyhd8ed1ab_1 conda-forge
markdown 3.4.4 pyhd8ed1ab_0 conda-forge
markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.3 py310h2372a71_0 conda-forge
matplotlib-base 3.7.2 py310hf38f957_0 conda-forge
matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge
mdit-py-plugins 0.4.0 pyhd8ed1ab_0 conda-forge
mdurl 0.1.0 pyhd8ed1ab_0 conda-forge
metis 5.1.1 h59595ed_0 conda-forge
mistune 3.0.0 pyhd8ed1ab_0 conda-forge
mpfr 4.2.0 hb012696_0 conda-forge
msgpack-python 1.0.5 py310hdf3cbec_0 conda-forge
multidict 6.0.4 py310h1fa729e_0 conda-forge
multipledispatch 0.6.0 py_0 conda-forge
munch 4.0.0 pyhd8ed1ab_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
nbclient 0.8.0 pyhd8ed1ab_0 conda-forge
nbconvert-core 7.7.3 pyhd8ed1ab_0 conda-forge
nbformat 5.9.2 pyhd8ed1ab_0 conda-forge
nccl 2.18.3.1 h3a97aeb_0 conda-forge
ncurses 6.4 hcb278e6_0 conda-forge
nest-asyncio 1.5.6 pyhd8ed1ab_0 conda-forge
networkx 3.1 pyhd8ed1ab_0 conda-forge
nodejs 20.1.0 hf52ce11_0 conda-forge
notebook-shim 0.2.3 pyhd8ed1ab_0 conda-forge
nspr 4.35 h27087fc_0 conda-forge
nss 3.89 he45b914_0 conda-forge
numba 0.57.1 py310h0f6aa51_0 conda-forge
numpy 1.24.4 py310ha4c1d20_0 conda-forge
nvcomp 2.6.1 h3a97aeb_2 conda-forge
nvtx 0.2.5 py310h1fa729e_0 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openslide 3.4.1 ha896ae7_9 conda-forge
openssl 3.1.2 hd590300_0 conda-forge
orc 1.9.0 h385abfd_1 conda-forge
overrides 7.4.0 pyhd8ed1ab_0 conda-forge
packaging 23.1 pyhd8ed1ab_0 conda-forge
pandas 1.5.3 py310h9b08913_1 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
panel 1.2.1 pyhd8ed1ab_0 conda-forge
param 1.13.0 pyh1a96a4e_0 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
partd 1.4.0 pyhd8ed1ab_0 conda-forge
pcre2 10.40 hc3806b6_0 conda-forge
pexpect 4.8.0 pyh1a96a4e_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 10.0.0 py310h582fbeb_0 conda-forge
pip 23.2.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_0 conda-forge
platformdirs 3.10.0 pyhd8ed1ab_0 conda-forge
pooch 1.7.0 pyha770c72_3 conda-forge
poppler 23.05.0 hd18248d_1 conda-forge
poppler-data 0.4.12 hd8ed1ab_0 conda-forge
postgresql 15.4 h8972f4a_0 conda-forge
proj 9.2.1 ha643af7_0 conda-forge
prometheus_client 0.17.1 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.39 pyha770c72_0 conda-forge
prompt_toolkit 3.0.39 hd8ed1ab_0 conda-forge
protobuf 4.23.3 py310hb875b13_0 conda-forge
psutil 5.9.5 py310h1fa729e_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
py-xgboost 1.7.4 rapidsai_py310h1395376_6 rapidsai
pyarrow 11.0.0 py310he6bfd7f_33_cpu conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyct 0.4.6 py_0 conda-forge
pyct-core 0.4.6 py_0 conda-forge
pyee 8.1.0 pyhd8ed1ab_0 conda-forge
pygments 2.16.1 pyhd8ed1ab_0 conda-forge
pylibcugraph 23.08.00 cuda12_py310_230809_g3079227b_0 rapidsai
pylibraft 23.08.00 cuda12_py310_230809_ge588d7b5_0 rapidsai
pynvml 11.4.1 pyhd8ed1ab_0 conda-forge
pyopenssl 23.2.0 pyhd8ed1ab_1 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyppeteer 1.0.2 pyhd8ed1ab_0 conda-forge
pyproj 3.6.0 py310h24ef57a_1 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.12 hd12c33a_0_cpython conda-forge
python-confluent-kafka 1.9.2 py310h5764c6d_2 conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-fastjsonschema 2.18.0 pyhd8ed1ab_0 conda-forge
python-igraph 0.10.6 py310h33b8572_0 conda-forge
python-json-logger 2.0.7 pyhd8ed1ab_0 conda-forge
python_abi 3.10 3_cp310 conda-forge
pytz 2023.3 pyhd8ed1ab_0 conda-forge
pyviz_comms 2.3.2 pyhd8ed1ab_0 conda-forge
pywavelets 1.4.1 py310h0a54255_0 conda-forge
pyyaml 6.0 py310h5764c6d_5 conda-forge
pyzmq 25.1.1 py310h5bbb5d0_0 conda-forge
raft-dask 23.08.00 cuda12_py310_230809_ge588d7b5_0 rapidsai
rapids 23.08.00 cuda12_py310_230809_g2a5b6f0_0 rapidsai
rapids-xgboost 23.08.00 cuda12_py310_230809_g2a5b6f0_0 rapidsai
rdma-core 28.9 h59595ed_1 conda-forge
re2 2023.03.02 h8c504da_0 conda-forge
readline 8.2 h8228510_1 conda-forge
referencing 0.30.2 pyhd8ed1ab_0 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge
rfc3986-validator 0.1.1 pyh9f0ad1d_0 conda-forge
rmm 23.08.00 cuda12_py310_230809_gf3af0e8d_0 rapidsai
rocm-smi 5.6.0 h59595ed_1 conda-forge
rpds-py 0.9.2 py310hcb5633a_0 conda-forge
rtree 1.0.1 py310hbdcdc62_2 conda-forge
s2n 1.3.46 h06160fa_0 conda-forge
scikit-image 0.21.0 py310hc6cd4ac_0 conda-forge
scikit-learn 1.3.0 py310hf7d194e_0 conda-forge
scipy 1.11.1 py310ha4c1d20_0 conda-forge
send2trash 1.8.2 pyh41d4057_0 conda-forge
setuptools 68.0.0 pyhd8ed1ab_0 conda-forge
shapely 2.0.1 py310h056c13c_1 conda-forge
simpervisor 1.0.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.10 h9fff704_0 conda-forge
sniffio 1.3.0 pyhd8ed1ab_0 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
soupsieve 2.3.2.post1 pyhd8ed1ab_0 conda-forge
spdlog 1.11.0 h9b3ece8_1 conda-forge
sqlite 3.42.0 h2c6b66d_0 conda-forge
stack_data 0.6.2 pyhd8ed1ab_0 conda-forge
streamz 0.6.4 pyh6c4a22f_0 conda-forge
suitesparse 5.10.1 h9e50725_1 conda-forge
tbb 2021.10.0 h00ab1b0_0 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
terminado 0.17.1 pyh41d4057_0 conda-forge
texttable 1.6.7 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.2.0 pyha21a80b_0 conda-forge
tifffile 2023.7.18 pyhd8ed1ab_0 conda-forge
tiledb 2.13.2 hd532e3d_0 conda-forge
tinycss2 1.2.1 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
toolz 0.12.0 pyhd8ed1ab_0 conda-forge
tornado 6.3.2 py310h2372a71_0 conda-forge
tqdm 4.66.1 pyhd8ed1ab_0 conda-forge
traitlets 5.9.0 pyhd8ed1ab_0 conda-forge
treelite 3.2.0 py310h1be96d9_0 conda-forge
typing-extensions 4.7.1 hd8ed1ab_0 conda-forge
typing_extensions 4.7.1 pyha770c72_0 conda-forge
typing_utils 0.1.0 pyhd8ed1ab_0 conda-forge
tzcode 2023c h0b41bf4_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
uc-micro-py 1.0.1 pyhd8ed1ab_0 conda-forge
ucx 1.14.1 h195a15c_3 conda-forge
ucx-proc 1.0.0 gpu rapidsai
ucx-py 0.33.00 py310_230809_gea1eb8f_0 rapidsai
unicodedata2 15.0.0 py310h5764c6d_0 conda-forge
uri-template 1.3.0 pyhd8ed1ab_0 conda-forge
urllib3 1.26.15 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.6 pyhd8ed1ab_0 conda-forge
webcolors 1.13 pyhd8ed1ab_0 conda-forge
webencodings 0.5.1 py_1 conda-forge
websocket-client 1.6.1 pyhd8ed1ab_0 conda-forge
websockets 10.4 py310h5764c6d_1 conda-forge
wheel 0.41.1 pyhd8ed1ab_0 conda-forge
xarray 2023.7.0 pyhd8ed1ab_0 conda-forge
xerces-c 3.2.4 h8d71039_2 conda-forge
xgboost 1.7.4 rapidsai_py310h1395376_6 rapidsai
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.6 h8ee46fc_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xyzservices 2023.7.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.9.2 py310h2372a71_0 conda-forge
zeromq 4.3.4 h9c3ff4c_1 conda-forge
zfp 1.0.0 h27087fc_3 conda-forge
zict 3.0.0 pyhd8ed1ab_0 conda-forge
zipp 3.16.2 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zlib-ng 2.0.7 h0b41bf4_0 conda-forge
zstd 1.5.2 hfc55251_7 conda-forge
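
The list above is long, so a small filter can help when comparing CUDA-related packages between environments. The sketch below parses `conda list`-style rows (name, version, build, channel) and keeps only the packages whose names start with a few prefixes. The prefix tuple and the function name `cuda_packages` are illustrative choices, not part of any tool. The sample rows are copied from the list above, where `cuda-version` is 12.0 from conda-forge and `cuda-cudart` is 12.2.128 from the nvidia channel; whether that matters for this bug is not established here.

```python
# Filter `conda list`-style rows (name, version, build, channel) down to
# CUDA-related packages. The prefixes below are an illustrative choice.

CUDA_PREFIXES = ("cuda-", "libcublas", "libcusolver", "libcusparse",
                 "cupy", "cuml", "libcuml", "libraft")

def cuda_packages(listing: str) -> dict[str, tuple[str, str]]:
    """Map package name -> (version, channel) for CUDA-related rows."""
    found = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip rule lines, blank lines, and malformed rows
        name, version, _build, channel = parts
        if name.startswith(CUDA_PREFIXES):
            found[name] = (version, channel)
    return found

sample = """\
cuda-cudart 12.2.128 0 nvidia
cuda-version 12.0 hffde075_2 conda-forge
libcurl 8.2.1 hca28451_0 conda-forge
libcusolver 11.5.1.129 0 nvidia
numpy 1.24.4 py310ha4c1d20_0 conda-forge
"""

for name, (version, channel) in cuda_packages(sample).items():
    print(f"{name:<14} {version:<12} {channel}")
```

Running the same filter on the CUDA 11.8 environment's list would give a side-by-side view of which library versions differ.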

Intron7 added the "? - Needs Triage" and "bug" labels Aug 15, 2023
dantegd added the "1 - On Deck" label Sep 6, 2023
@divyegala
Member

@Intron7 thanks for raising this issue. Is there a stack trace that you could share?

@Intron7
Contributor Author

Intron7 commented Sep 11, 2023

Traceback (most recent call last):
  File "/tmp/ipykernel_5301/2924074572.py", line 2, in <module>
    X_pca = pca_func.fit_transform(X)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 665, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 509, in cuml.decomposition.pca.PCA.fit_transform
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/home/severin/conda/envs/rapids-23.08_12/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 665, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 470, in cuml.decomposition.pca.PCA.fit
RuntimeError: cuSOLVER error encountered at: file=/home/severin/conda/envs/rapids-23.08_12/include/raft/linalg/detail/eig.cuh line=118: 

pca-23.08_cu1.zip

@dantegd I hope that is enough. I can also attach nsys reports for the CUDA 12 and CUDA 11.8 runs.

@Intron7
Contributor Author

Intron7 commented Nov 6, 2023

It seems the bug still persists in RAPIDS 23.10.

@Intron7
Contributor Author

Intron7 commented Dec 12, 2023

Just checking in: is work still being done to fix this issue? As far as I can tell, the bug still persists in 23.12.

@lharri73

Also wondering about this. It must be related to RAPIDS, because torch has no issue running the same calculation on one GPU.

@Intron7
Contributor Author

Intron7 commented Mar 26, 2024

This is still broken, including in the development version for 24.04.

@Intron7
Contributor Author

Intron7 commented Apr 18, 2024

@dantegd I did some more testing on this, and it seems to mainly affect Ampere GPUs. Can this please be fixed?

@lharri73

@Intron7 no, this also affects H100s (Hopper).

@lowener
Contributor

lowener commented May 23, 2024

I submitted PR rapidsai/raft#2332 to fix this issue. It should be resolved by version 24.08.
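
For anyone who wants to guard code on the installed version, a minimal sketch follows. It assumes the fix ships in 24.08 as stated above and that versions follow the YY.MM.patch pattern seen in this thread (for example 23.08.00). `has_eig_fix` is a hypothetical helper, not part of cuML or RAFT.

```python
# Check whether a RAPIDS-style version string (YY.MM.patch) is at or past 24.08.
# Assumption: the fix ships in 24.08; `has_eig_fix` is a hypothetical helper.

def has_eig_fix(version: str, fixed_in: tuple[int, int] = (24, 8)) -> bool:
    """Return True if `version` (e.g. '23.08.00') is >= the fixed release."""
    year, month = (int(part) for part in version.split(".")[:2])
    return (year, month) >= fixed_in

for v in ("23.08.00", "24.04.00", "24.08.00"):
    print(v, has_eig_fix(v))
```

In practice the argument could be `cuml.__version__`, with a fallback path (smaller input or a CPU implementation) chosen when the check returns False.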

@liyaodev

liyaodev commented Jul 4, 2024

@lowener Hi, I hit a similar problem with CUDA 12.1. Do I need to switch to CUDA 12.0?

Traceback (most recent call last):
  File "/app/r_sc_test.py", line 115, in <module>
    rsc.tl.pca(adata, n_comps=100)
  File "/usr/local/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_pca.py", line 163, in pca
    X_pca = pca_func.fit_transform(X)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 507, in cuml.decomposition.pca.PCA.fit_transform
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 468, in cuml.decomposition.pca.PCA.fit
RuntimeError: cuSOLVER error encountered at: file=/__w/cuml/cuml/python/build/cp310-cp310-linux_x86_64/_deps/raft-src/cpp/include/raft/linalg/detail/eig.cuh line=121: call='cusolverDnxsyevd(cusolverH, dn_params, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_UPPER, static_cast<int64_t>(n_rows), eig_vectors, static_cast<int64_t>(n_cols), eig_vals, d_work.data(), workspaceDevice, h_work.data(), workspaceHost, d_dev_info.data(), stream)', Reason=7:CUSOLVER_STATUS_INTERNAL_ERROR
Obtained 42 stack frames
#1 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so: raft::cusolver_error::cusolver_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0xbd [0x7f301aaa289d]
#2 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so: void raft::linalg::detail::eigDC<float>(raft::resources const&, float const*, unsigned long, unsigned long, float*, float*, CUstream_st*) +0xe6b [0x7f301b0468fb]
#3 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so: void ML::truncCompExpVars<float, ML::solver>(raft::handle_t const&, float*, float*, float*, float*, ML::paramsTSVDTemplate<ML::solver> const&, CUstream_st*) +0x5de [0x7f301b5f579e]
#4 in /usr/local/lib/python3.10/site-packages/cuml/internals/../libcuml++.so(+0x2cb60c5) [0x7f301b5e90c5]
#5 in /usr/local/lib/python3.10/site-packages/cuml/decomposition/pca.cpython-310-x86_64-linux-gnu.so(+0x3fedb) [0x7f2fe3c8eedb]
#6 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x1006e) [0x7f2fe455006e]
#7 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x2e7a6) [0x7f2fe456e7a6]
#8 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#9 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
#10 in python(+0x131d08) [0x55ee470dad08]
#11 in python(+0x2248c1) [0x55ee471cd8c1]
#12 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#13 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
#14 in python(+0x131d08) [0x55ee470dad08]
#15 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#16 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
#17 in python(+0x131d08) [0x55ee470dad08]
#18 in /usr/local/lib/python3.10/site-packages/cuml/decomposition/pca.cpython-310-x86_64-linux-gnu.so(+0x305d5) [0x7f2fe3c7f5d5]
#19 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x1006e) [0x7f2fe455006e]
#20 in /usr/local/lib/python3.10/site-packages/cuml/internals/base.cpython-310-x86_64-linux-gnu.so(+0x2e7a6) [0x7f2fe456e7a6]
#21 in python: PyVectorcall_Call +0x6c [0x55ee4702169c]
#22 in python: _PyEval_EvalFrameDefault +0x44c6 [0x55ee47008c06]
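
The RuntimeError above packs the source location, the failing call, and the status code into one line. When collecting reports from several machines, it can help to split those fields out. The sketch below does that with a regular expression inferred only from the message in this comment, so the pattern is an assumption rather than a documented format. The sample message uses a placeholder path and a shortened call string.

```python
import re

# Pull the structured fields out of a RAFT-style cuSOLVER error line.
# The pattern is inferred from the single message in this thread (assumption).
ERROR_RE = re.compile(
    r"file=(?P<file>\S+)\s+line=(?P<line>\d+):"
    r"(?:\s*call='(?P<call>.*)',\s*Reason=(?P<code>\d+):(?P<status>\w+))?"
)

def parse_cusolver_error(message: str) -> dict:
    """Return file, line, and (when present) call, code, and status."""
    match = ERROR_RE.search(message)
    if match is None:
        raise ValueError("not a recognised cuSOLVER error message")
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    if fields["code"] is not None:
        fields["code"] = int(fields["code"])
    return fields

# Placeholder path and shortened call, for illustration only.
msg = (
    "cuSOLVER error encountered at: file=/path/to/raft/linalg/detail/eig.cuh "
    "line=121: call='cusolverDnxsyevd(cusolverH, dn_params)', "
    "Reason=7:CUSOLVER_STATUS_INTERNAL_ERROR"
)
info = parse_cusolver_error(msg)
print(info["line"], info["code"], info["status"])
```

Messages that stop after `line=...:`, like the one in the original report, still parse; the `call`, `code`, and `status` fields are then `None`.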

6 participants