A problem installing k2 #569
Comments
---> change to
|
I don't think it's about CUDNN but about CUBLAS. Don't you have to tell it
the root of the whole CUDA toolkit? I forget the variable name.
…On Tue, Jan 5, 2021 at 1:17 PM Fangjun Kuang ***@***.***> wrote:
-D CUDNN_LIBRARY_PATH="/usr/local/cuda/cudnn/lib64/"
---> change to
-D CUDNN_LIBRARY_PATH="/usr/local/cuda/cudnn/lib64/libcudnn.so"
|
|
Do you mean
|
For the second error:
Please use
In general, you do not need to specify so many values for |
If I don't specify the cuDNN path, CMake can't find it, because cuDNN is not in the default location on the server cluster. The CUDA and cuDNN paths on the cluster are:
I will follow your suggestion and try it. |
The compile command is as follows:
because
when I run
|
Does /usr/local/cuda/cudnn/lib64/libcudnn.so exist?
…On Tue, Jan 5, 2021 at 2:20 PM shanguanma ***@***.***> wrote:
The compile command is as follows:
$ cmake -D CMAKE_CUDA_COMPILER="/usr/local/cuda/bin/nvcc " -D CMAKE_CXX_COMPILER="/usr/bin/g++" -D CUDNN_LIBRARY_PATH="/usr/local/cuda/cudnn/lib64/libcudnn.so" -D CUDA_cublas_LIBRARY="/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/libcublas.so" -D CUDNN_INCLUDE_PATH="/usr/local/cuda/cudnn/include" -DCMAKE_BUILD_TYPE=Release ..
because /usr/local/cuda doesn't contain libcublas.so:
grep -rn "libcublas.so" /usr/local
grep: /usr/local/libexec/dgx-cgroup/cgroup-classify: Permission denied
grep: /usr/local/libexec/dgx-cgroup/cgroup-remove: Permission denied
grep: /usr/local/libexec/dgx-cgroup/cgroup-create: Permission denied
grep: /usr/local/libexec/dgx-cgroup/cgroup-cleanup: Permission denied
grep: /usr/local/libexec/dgx-cgroup/common: Permission denied
/usr/local/cuda-9.0/doc/EULA.txt:1009: Linux : libcublas.so, libcublas_static.a, libcublas_device.a
/usr/local/cuda-9.0/doc/EULA.txt:1010: Android : libcublas.so, libcublas_static.a, libcublas_device.a
Binary file /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvgraph.so.9.0.176 matches
Binary file /usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/libcublas.so matches
Binary file /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcublas.so.9.0.333 matches
Binary file /usr/local/cuda-9.0/targets/x86_64-linux/lib/libnvblas.so.9.0.333 matches
grep: /usr/local/bin/pbs-dgx-cgroup-create: Permission denied
grep: /usr/local/bin/pbs-dgx-cleanup: Permission denied
grep: /usr/local/bin/dgx-cgroup-create: Permission denied
grep: /usr/local/bin/dgx-cgroup-remove: Permission denied
grep: /usr/local/bin/dgx-cgroup-classify: Permission denied
grep: /usr/local/bin/dgx-docker-cleanup: Permission denied
grep: /usr/local/bin/pam-sshd-attach: Permission denied
grep: /usr/local/bin/dgx-cgroup-cleanup: Permission denied
grep: /usr/local/etc/dgx-cgroup: Permission denied
grep: /usr/local/sbin/docker-log: Permission denied
grep: /usr/local/sbin/pbs-move-undelivered: Permission denied
grep: /usr/local/sbin/node-load: Permission denied
grep: /usr/local/sbin/purge-log: Permission denied
grep: /usr/local/sbin/cleanup-tmp: Permission denied
/usr/local/cuda-10.1/doc/EULA.txt:649:libcublas.so, libcublasLt.so, libcublas_static.a,
/usr/local/cuda-10.1/doc/EULA.txt:654:libcublas.so, libcublasLt.so, libcublas_static.a,
/usr/local/cuda-8.0/doc/EULA.txt:535: Linux : libcublas.so, libcublas_static.a, libcublas_device.a
/usr/local/cuda-8.0/doc/EULA.txt:536: Android : libcublas.so, libcublas_static.a, libcublas_device.a
Binary file /usr/local/cuda-8.0/targets/x86_64-linux/lib/libnvblas.so.8.0.61 matches
Binary file /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0.61 matches
Binary file /usr/local/cuda-8.0/targets/x86_64-linux/lib/libnvgraph.so.8.0.61 matches
Binary file /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0.88 matches
Binary file /usr/local/cuda-8.0/targets/x86_64-linux/lib/libnvblas.so.8.0.88 matches
When I run `make _k2`, the error is as follows:
[ 70%] Building CUDA object k2/csrc/CMakeFiles/context.dir/utils.cu.o
[ 74%] Building CUDA object k2/csrc/CMakeFiles/context.dir/pytorch_context.cu.o
make[3]: *** No rule to make target '/usr/local/cuda/cudnn/lib64/libcudnn.so', needed by 'k2/csrc/CMakeFiles/context.dir/cmake_device_link.o'. Stop.
CMakeFiles/Makefile2:706: recipe for target 'k2/csrc/CMakeFiles/context.dir/all' failed
make[2]: *** [k2/csrc/CMakeFiles/context.dir/all] Error 2
CMakeFiles/Makefile2:2210: recipe for target 'k2/python/csrc/CMakeFiles/_k2.dir/rule' failed
make[1]: *** [k2/python/csrc/CMakeFiles/_k2.dir/rule] Error 2
Makefile:727: recipe for target '_k2' failed
make: *** [_k2] Error 2
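The `grep -rn` search quoted above matches file contents, which is why EULA text files and "Permission denied" lines crowd the output. A minimal sketch of searching by file name instead is shown below; the function name `find_libraries` and the demo directory layout are hypothetical and chosen only for illustration.

```python
import fnmatch
import os
import tempfile


def find_libraries(root, pattern="libcublas.so*"):
    """Return sorted paths under `root` whose file name matches `pattern`."""
    matches = []
    # os.walk ignores directories it cannot read unless an onerror handler
    # is supplied, so unreadable directories produce no noise here.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demo on a throwaway tree (hypothetical layout, not the cluster's real one).
with tempfile.TemporaryDirectory() as root:
    lib_dir = os.path.join(root, "cuda-x", "lib64")
    os.makedirs(lib_dir)
    for name in ("libcublas.so.9.0.333", "libnvblas.so.9.0.333", "EULA.txt"):
        open(os.path.join(lib_dir, name), "w").close()
    found = [os.path.relpath(p, root) for p in find_libraries(root)]

print(found)
```

On a real machine one would call `find_libraries("/usr/local")` and look at which CUDA toolkit directory actually holds the library.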
|
yes,
|
|
Run `ls -l`; it may be a permission problem or a dangling symlink.
…On Tue, Jan 5, 2021 at 2:28 PM shanguanma ***@***.***> wrote:
$ ls /usr/local/cuda/cudnn/lib64/*
/usr/local/cuda/cudnn/lib64/libcudnn.so /usr/local/cuda/cudnn/lib64/libcudnn.so.7.6.0 /usr/local/cuda/cudnn/lib64/libcudnn_static_v7.a
/usr/local/cuda/cudnn/lib64/libcudnn.so.7 /usr/local/cuda/cudnn/lib64/libcudnn_static.a
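The permission and dangling-symlink possibilities mentioned above can also be checked programmatically. The following is a rough sketch, not a diagnosis of this particular cluster; `diagnose_path` and the demo file names are hypothetical.

```python
import os
import tempfile


def diagnose_path(path):
    """Classify `path` as 'dangling symlink', 'missing', 'not readable' or 'ok'."""
    if os.path.islink(path) and not os.path.exists(path):
        return "dangling symlink"  # the link exists but its target does not
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "not readable"
    return "ok"


# Demo in a temporary directory with one good link and one broken link.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "libdemo.so.7")
    open(real, "w").close()
    good_link = os.path.join(tmp, "libdemo.so")
    os.symlink(real, good_link)
    bad_link = os.path.join(tmp, "libbroken.so")
    os.symlink(os.path.join(tmp, "no-such-file"), bad_link)
    results = {
        "good_link": diagnose_path(good_link),
        "bad_link": diagnose_path(bad_link),
        "absent": diagnose_path(os.path.join(tmp, "absent.so")),
    }

print(results)
```

Note that `os.path.exists` also returns False when a parent directory cannot be traversed, so "missing" here can mean "unreachable with the current permissions".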
|
What is the output of
You only posted the compilation log, without the configuration log. |
yes, it is as follows:
|
Yes, maybe the problem is there. As far as I know, k2 currently supports only CUDA 10.1 and 10.2. Can k2 support more CUDA versions, e.g. CUDA 10.0 or 9.2? I don't know if there is such a plan. |
We only check that k2 is compiled with the same CUDA version that PyTorch is using. You can try k2 with cuda 10.0 or 9.2. It may work but I think it has not been tested. |
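To illustrate what "the same CUDA version" means in practice, here is a small sketch that compares two version strings on their major and minor components. It is an assumption-laden illustration, not k2's actual CMake logic; the helper names and the example version strings are hypothetical inputs.

```python
def cuda_major_minor(version):
    """Reduce a version string such as '10.2.89' to the tuple (10, 2)."""
    major, minor = version.strip().split(".")[:2]
    return int(major), int(minor)


def same_cuda_version(pytorch_cuda, toolkit_cuda):
    """True if both strings agree on the major and minor CUDA version."""
    return cuda_major_minor(pytorch_cuda) == cuda_major_minor(toolkit_cuda)


# Example strings only: a PyTorch build reporting "10.2" and a toolkit
# whose compiler reports "10.2.89" agree; "10.1" against "10.2.89" does not.
matches = same_cuda_version("10.2", "10.2.89")
mismatch = same_cuda_version("10.1", "10.2.89")
print(matches, mismatch)
```

In a real environment the first argument could come from `torch.version.cuda` (which is `None` for CPU-only builds) and the second from the release number printed by `nvcc --version`.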
I tried that previously, but it failed. Anyway, the server shut down just now; once it is working again, I will retry with the newest master branch. |
I tried to install k2 with CUDA 10.0. Because the newest PyTorch version available for CUDA 10.0 is 1.4.0, I used the commands below to install k2 step by step:
then I run the
|
I would recommend using CUDA 9.2, as there are lots of different PyTorch versions for it. Only PyTorch 1.6.0 and 1.7.0 have been tested and are known to work. |
Sorry, I currently don't have a server with CUDA 9.2, so I can't test it right now. |
We are using cuda=10.1 for at least some development.
You have to find the right version of pytorch that's compiled for that
though.
…On Tue, Jan 5, 2021 at 8:07 PM Fangjun Kuang ***@***.***> wrote:
I try to install k2 with cuda=10.0, because when cuda=10.0, max support
pytorch version =1.4.0
I would recommend to use CUDA 9.2 as there are lots of different PyTorch
versions for it.
Only PyTorch 1.6.0 and 1.7.0 have been tested and are known to work.
|
@danpovey, OK, I see. Thanks for your reply. |
@danpovey, @csukuangfj, today (2021-01-12) the server was upgraded to CUDA 10.2 and cuDNN 7.6.5, so I compiled the latest k2 master branch. The installation details are as follows:
Next, install lhotse:
Next, install snowfall:
Run the LibriSpeech recipe:
|
Mm. Try running the tests and see if any fail, e.g.
cd build
ctest
…On Tue, Jan 12, 2021 at 5:47 PM shanguanma ***@***.***> wrote:
Today (2021-01-12) the server was upgraded to
CUDA 10.2, so I compiled the latest k2 master branch. The installation details
are as follows:
$ conda create -n k2-fsa1 python=3.7
$ conda activate k2-fsa1
$ conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
$ git clone https://github.com/k2-fsa/k2.git
$ cd k2
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
-- The CUDA compiler identification is NVIDIA 10.2.89
-- The CXX compiler identification is GNU 7.5.0
-- Check for working CUDA compiler: /cm/shared/apps/cuda10.2/toolkit/10.2.89/bin/nvcc
-- Check for working CUDA compiler: /cm/shared/apps/cuda10.2/toolkit/10.2.89/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CXX compiler: /home4/md510/gcc-7.5.0/bin/g++
-- Check for working CXX compiler: /home4/md510/gcc-7.5.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- K2_OS: CentOS Linux release 7.8.2003 (Core)
-- Found Git: /usr/bin/git (found version "1.8.3.1")
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for C++ include execinfo.h
-- Looking for C++ include execinfo.h - found
-- Performing Test K2_COMPILER_SUPPORTS_CXX14
-- Performing Test K2_COMPILER_SUPPORTS_CXX14 - Success
-- C++ Standard version: 14
CMake Warning at CMakeLists.txt:112 (message):
arch 62/72 are not supported for now
-- Found Valgrind: /usr/bin
-- Found Valgrind: /usr/bin/valgrind
-- To check memory, run `ctest -R <NAME> -D ExperimentalMemCheck`
-- Downloading pybind11
-- pybind11 is downloaded to /home4/md510/w2020/k2-fsa/k2/build/_deps/pybind11-src
-- pybind11 v2.6.0
-- Found PythonInterp: /home4/md510/anaconda3/envs/k2-fsa1/bin/python (found version "3.7.9")
-- Found PythonLibs: /home4/md510/anaconda3/envs/k2-fsa1/lib/libpython3.7m.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Python executable: /home4/md510/anaconda3/envs/k2-fsa1/bin/python
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
CMake Warning (dev) at /home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
Run "cmake --help-policy CMP0074" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
Environment variable CUDA_ROOT is set to:
/cm/shared/apps/cuda10.2/toolkit/10.2.89
For compatibility, CMake is ignoring the variable.
Call Stack (most recent call first):
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
cmake/torch.cmake:11 (find_package)
CMakeLists.txt:134 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found CUDA: /cm/shared/apps/cuda10.2/toolkit/10.2.89 (found version "10.2")
-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda10.2/toolkit/10.2.89/bin/nvcc
-- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda10.2/toolkit/10.2.89
-- Caffe2: Header version is: 10.2
-- Found CUDNN: /cm/shared/apps/cuda10.2/toolkit/10.2.89/lib64/libcudnn.so
-- Found cuDNN: v7.6.5 (include: /cm/shared/apps/cuda10.2/toolkit/10.2.89/include, library: /cm/shared/apps/cuda10.2/toolkit/10.2.89/lib64/libcudnn.so)
-- Autodetected CUDA architecture(s): 7.5 7.5 7.5 7.5 7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
-- Found Torch: /home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/torch/lib/libtorch.so
-- PyTorch version: 1.7.1
-- PyTorch cuda version: 10.2
-- Downloading cub
-- cub is downloaded to /home4/md510/w2020/k2-fsa/k2/build/_deps/cub-src
-- Downloading moderngpu
-- moderngpu is downloaded to /home4/md510/w2020/k2-fsa/k2/build/_deps/moderngpu-src
-- Downloading googletest
-- googletest is downloaded to /home4/md510/w2020/k2-fsa/k2/build/_deps/googletest-src
-- googletest's binary dir is /home4/md510/w2020/k2-fsa/k2/build/_deps/googletest-build
-- The C compiler identification is GNU 7.5.0
-- Check for working C compiler: /home4/md510/gcc-7.5.0/bin/gcc
-- Check for working C compiler: /home4/md510/gcc-7.5.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Generated /home4/md510/w2020/k2-fsa/k2/build/k2/csrc/version.h
-- Configuring done
-- Generating done
-- Build files have been written to: /home4/md510/w2020/k2-fsa/k2/build
$ make _k2 ## no error
$ python3 -m pip install --no-deps --force-reinstall graphviz ## no error
$ make -j ## no error
$ ctest --parallel 5 ## no error
$ make test ## no error
$ pip3 install wheel twine
$ ./scripts/build_pip.sh
$ python3 -m pip install --no-deps --force-reinstall dist/k2-*.whl
Next, install lhotse:
$ pip install --force-reinstall git+https://github.com/lhotse-speech/lhotse
next install snowfall:
$ git clone https://github.com/k2-fsa/snowfall.git
$ cd snowfall
$ vim ../readme.txt
#k2
kaldialign
***@***.***+https://github.com/lhotse-speech/lhotse
tensorboard
#torch>=1.6.0
#torchaudio
$ python3 -m pip install -e .
run the LibriSpeech recipe:
$ ./run.sh --stage 1 --stop_stage 5 ## no error
$ ./run.sh --stage 6 ## fails with the following error:
2021-01-12 17:42:56,883 INFO [mmi_bigram_train.py:400] epoch 0, learning rate 0.001
[F] [F] [F] [F] [F] [F] [F] [F] [F] [F] [F] [F] [F] [F] [F] [F] [F] /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 /home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:lambda [](int)->void::operator()(int)->void:722 block:[0,0,0], thread: [37,0,0] block:[0,0,0], thread: [38,0,0] block:[0,0,0], thread: [39,0,0] block:[0,0,0], thread: [40,0,0] block:[0,0,0], thread: [41,0,0] 
block:[0,0,0], thread: [42,0,0] block:[0,0,0], thread: [43,0,0] block:[0,0,0], thread: [44,0,0] block:[0,0,0], thread: [45,0,0] block:[0,0,0], thread: [46,0,0] block:[0,0,0], thread: [47,0,0] block:[0,0,0], thread: [49,0,0] block:[0,0,0], thread: [50,0,0] block:[0,0,0], thread: [51,0,0] block:[0,0,0], thread: [52,0,0] block:[0,0,0], thread: [56,0,0] block:[0,0,0], thread: [57,0,0] Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: Check failed: tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [37,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [38,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [39,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [40,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [41,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [42,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [43,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [44,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [45,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [46,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [47,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [49,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [50,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [51,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [52,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [56,0,0] Assertion `Some bad things happened` failed.
/home4/md510/w2020/k2-fsa/k2/k2/csrc/intersect_dense.cu:722: lambda [](int)->void::operator()(int)->void: block: [0,0,0], thread: [57,0,0] Assertion `Some bad things happened` failed.
[F] /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int]:280 Check failed: ret == cudaSuccess (710 vs. 0) Error: device-side assert triggered.
[ Stack-Trace: ]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x34) [0x2aaccdcc1904]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/libk2context.so(k2::internal::Logger::~Logger()+0x28) [0x2aaccaaf4108]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/libk2context.so(k2::Array1<int>::operator[](int) const+0x1929) [0x2aaccaaf5d89]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/libk2context.so(k2::Renumbering::ComputeOld2New()+0x13a) [0x2aaccaaf160a]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/libk2context.so(k2::Renumbering::ComputeNew2Old()+0x5e0) [0x2aaccaaf2640]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/libk2context.so(k2::MultiGraphDenseIntersect::FormatOutput(k2::Array1<int>*, k2::Array1<int>*)+0x13dc) [0x2aaccabf44bc]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/libk2context.so(k2::IntersectDense(k2::Ragged<k2::Arc>&, k2::DenseFsaVec&, float, k2::Ragged<k2::Arc>*, k2::Array1<int>*, k2::Array1<int>*)+0x364) [0x2aaccabe6ef4]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/_k2.cpython-37m-x86_64-linux-gnu.so(+0x51f23) [0x2aacc742df23]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/_k2.cpython-37m-x86_64-linux-gnu.so(+0x1a3a3) [0x2aacc73f63a3]
python3(_PyMethodDef_RawFastCallKeywords+0x316) [0x5555556b99b6]
python3(_PyCFunction_FastCallKeywords+0x21) [0x5555556b9a31]
python3(_PyEval_EvalFrameDefault+0x53e3) [0x555555726483]
python3(_PyFunction_FastCallDict+0x10b) [0x55555566985b]
/home4/md510/anaconda3/envs/k2-fsa1/lib/python3.7/site-packages/torch/lib/libtorch_python.so(THPFunction_apply(_object*, _object*)+0x93d) [0x2aaab378fa6d]
python3(_PyMethodDef_RawFastCallKeywords+0x1e4) [0x5555556b9884]
python3(_PyCFunction_FastCallKeywords+0x21) [0x5555556b9a31]
python3(_PyEval_EvalFrameDefault+0x4e1d) [0x555555725ebd]
python3(_PyFunction_FastCallKeywords+0xfb) [0x5555556b8e7b]
python3(_PyEval_EvalFrameDefault+0x4a89) [0x555555725b29]
python3(_PyEval_EvalCodeWithName+0xc30) [0x555555669160]
python3(_PyFunction_FastCallKeywords+0x387) [0x5555556b9107]
python3(_PyEval_EvalFrameDefault+0x416) [0x5555557214b6]
python3(_PyEval_EvalCodeWithName+0x2f9) [0x555555668829]
python3(_PyFunction_FastCallKeywords+0x387) [0x5555556b9107]
python3(_PyEval_EvalFrameDefault+0x14e5) [0x555555722585]
python3(_PyFunction_FastCallKeywords+0xfb) [0x5555556b8e7b]
python3(_PyEval_EvalFrameDefault+0x416) [0x5555557214b6]
python3(_PyEval_EvalCodeWithName+0x2f9) [0x555555668829]
python3(PyEval_EvalCodeEx+0x44) [0x555555669714]
python3(PyEval_EvalCode+0x1c) [0x55555566973c]
python3(+0x22cf14) [0x555555780f14]
python3(PyRun_FileExFlags+0xa1) [0x55555578b331]
python3(PyRun_SimpleFileExFlags+0x1c3) [0x55555578b523]
python3(+0x238655) [0x55555578c655]
python3(_Py_UnixMain+0x3c) [0x55555578c77c]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaaaf0d555]
python3(+0x1dcff0) [0x555555730ff0]
Aborted
|
Yes, I ran the tests again and there is no error.
|
Try `pip uninstall`ing the package and reinstalling..
Otherwise I'm not sure, it would require debugging by modifying code,
possibly.
…On Tue, Jan 12, 2021 at 6:41 PM shanguanma ***@***.***> wrote:
Yes, I ran the tests again and there is no error.
***@***.*** k2]$ cd build/
***@***.*** build]$ ctest
Test project /home4/md510/w2020/k2-fsa/k2/build
Start 1: Test.Cuda.cu_algorithms_test
1/75 Test #1: Test.Cuda.cu_algorithms_test ....... Passed 6.42 sec
Start 2: Test.Cuda.cu_array_ops_test
2/75 Test #2: Test.Cuda.cu_array_ops_test ........ Passed 8.96 sec
Start 3: Test.Cuda.cu_array_test
3/75 Test #3: Test.Cuda.cu_array_test ............ Passed 6.39 sec
Start 4: Test.Cuda.cu_fsa_algo_test
4/75 Test #4: Test.Cuda.cu_fsa_algo_test ......... Passed 8.75 sec
Start 5: Test.Cuda.cu_fsa_test
5/75 Test #5: Test.Cuda.cu_fsa_test .............. Passed 6.53 sec
Start 6: Test.Cuda.cu_fsa_utils_test
6/75 Test #6: Test.Cuda.cu_fsa_utils_test ........ Passed 6.84 sec
Start 7: Test.Cuda.cu_hash_test
7/75 Test #7: Test.Cuda.cu_hash_test ............. Passed 6.75 sec
Start 8: Test.Cuda.cu_host_shim_test
8/75 Test #8: Test.Cuda.cu_host_shim_test ........ Passed 0.19 sec
Start 9: Test.Cuda.cu_intersect_test
9/75 Test #9: Test.Cuda.cu_intersect_test ........ Passed 7.11 sec
Start 10: Test.Cuda.cu_log_test
10/75 Test #10: Test.Cuda.cu_log_test .............. Passed 6.42 sec
Start 11: Test.Cuda.cu_macros_test
11/75 Test #11: Test.Cuda.cu_macros_test ........... Passed 6.32 sec
Start 12: Test.Cuda.cu_nvtx_test
12/75 Test #12: Test.Cuda.cu_nvtx_test ............. Passed 4.21 sec
Start 13: Test.Cuda.cu_pinned_context_test
13/75 Test #13: Test.Cuda.cu_pinned_context_test ... Passed 40.72 sec
Start 14: Test.Cuda.cu_ragged_shape_test
14/75 Test #14: Test.Cuda.cu_ragged_shape_test ..... Passed 6.40 sec
Start 15: Test.Cuda.cu_ragged_test
15/75 Test #15: Test.Cuda.cu_ragged_test ........... Passed 7.07 sec
Start 16: Test.Cuda.cu_ragged_utils_test
16/75 Test #16: Test.Cuda.cu_ragged_utils_test ..... Passed 6.32 sec
Start 17: Test.Cuda.cu_rm_epsilon_test
17/75 Test #17: Test.Cuda.cu_rm_epsilon_test ....... Passed 7.27 sec
Start 18: Test.Cuda.cu_tensor_ops_test
18/75 Test #18: Test.Cuda.cu_tensor_ops_test ....... Passed 6.71 sec
Start 19: Test.Cuda.cu_tensor_test
19/75 Test #19: Test.Cuda.cu_tensor_test ........... Passed 0.19 sec
Start 20: Test.Cuda.cu_thread_pool_test
20/75 Test #20: Test.Cuda.cu_thread_pool_test ...... Passed 0.28 sec
Start 21: Test.Cuda.cu_top_sort_test
21/75 Test #21: Test.Cuda.cu_top_sort_test ......... Passed 8.10 sec
Start 22: Test.Cuda.cu_utils_test
22/75 Test #22: Test.Cuda.cu_utils_test ............ Passed 6.78 sec
Start 23: Test.arcsort_test
23/75 Test #23: Test.arcsort_test .................. Passed 0.01 sec
Start 24: Test.array_test
24/75 Test #24: Test.array_test .................... Passed 0.01 sec
Start 25: Test.aux_labels_test
25/75 Test #25: Test.aux_labels_test ............... Passed 0.01 sec
Start 26: Test.connect_test
26/75 Test #26: Test.connect_test .................. Passed 0.01 sec
Start 27: Test.determinize_test
27/75 Test #27: Test.determinize_test .............. Passed 0.02 sec
Start 28: Test.fsa_equivalent_test
28/75 Test #28: Test.fsa_equivalent_test ........... Passed 0.01 sec
Start 29: Test.fsa_renderer_test
29/75 Test #29: Test.fsa_renderer_test ............. Passed 0.01 sec
Start 30: Test.fsa_test
30/75 Test #30: Test.fsa_test ...................... Passed 0.01 sec
Start 31: Test.fsa_util_test
31/75 Test #31: Test.fsa_util_test ................. Passed 0.01 sec
Start 32: Test.intersect_test
32/75 Test #32: Test.intersect_test ................ Passed 0.01 sec
Start 33: Test.properties_test
33/75 Test #33: Test.properties_test ............... Passed 0.01 sec
Start 34: Test.rmepsilon_test
34/75 Test #34: Test.rmepsilon_test ................ Passed 0.01 sec
Start 35: Test.topsort_test
35/75 Test #35: Test.topsort_test .................. Passed 0.01 sec
Start 36: Test.weights_test
36/75 Test #36: Test.weights_test .................. Passed 0.01 sec
Start 37: add_epsilon_self_loops_test_py
37/75 Test #37: add_epsilon_self_loops_test_py ..... Passed 1.07 sec
Start 38: arc_sort_test_py
38/75 Test #38: arc_sort_test_py ................... Passed 0.68 sec
Start 39: closure_test_py
39/75 Test #39: closure_test_py .................... Passed 7.34 sec
Start 40: compose_test_py
40/75 Test #40: compose_test_py .................... Passed 0.74 sec
Start 41: connect_test_py
41/75 Test #41: connect_test_py .................... Passed 0.79 sec
Start 42: ctc_gradients_test_py
42/75 Test #42: ctc_gradients_test_py .............. Passed 8.10 sec
Start 43: dense_fsa_vec_test_py
43/75 Test #43: dense_fsa_vec_test_py .............. Passed 6.63 sec
Start 44: determinize_test_py
44/75 Test #44: determinize_test_py ................ Passed 0.73 sec
Start 45: fsa_test_py
45/75 Test #45: fsa_test_py ........................ Passed 7.19 sec
Start 46: get_tot_scores_test_py
46/75 Test #46: get_tot_scores_test_py ............. Passed 6.39 sec
Start 47: index_add_test_py
47/75 Test #47: index_add_test_py .................. Passed 7.25 sec
Start 48: index_select_test_py
48/75 Test #48: index_select_test_py ............... Passed 7.22 sec
Start 49: index_test_py
49/75 Test #49: index_test_py ...................... Passed 7.26 sec
Start 50: intersect_dense_pruned_test_py
50/75 Test #50: intersect_dense_pruned_test_py ..... Passed 6.69 sec
Start 51: intersect_dense_test_py
51/75 Test #51: intersect_dense_test_py ............ Passed 6.80 sec
Start 52: intersect_test_py
52/75 Test #52: intersect_test_py .................. Passed 0.74 sec
Start 53: invert_test_py
53/75 Test #53: invert_test_py ..................... Passed 0.67 sec
Start 54: linear_fsa_test_py
54/75 Test #54: linear_fsa_test_py ................. Passed 0.66 sec
Start 55: numerical_gradient_check_test_py
55/75 Test #55: numerical_gradient_check_test_py ... Passed 10.05 sec
Start 56: ragged_ops_test_py
56/75 Test #56: ragged_ops_test_py ................. Passed 0.79 sec
Start 57: ragged_shape_test_py
57/75 Test #57: ragged_shape_test_py ............... Passed 6.92 sec
Start 58: ragged_test_py
58/75 Test #58: ragged_test_py ..................... Passed 0.66 sec
Start 59: remove_epsilon_test_py
59/75 Test #59: remove_epsilon_test_py ............. Passed 0.66 sec
Start 60: shortest_path_test_py
60/75 Test #60: shortest_path_test_py .............. Passed 0.74 sec
Start 61: symbol_table_test_py
61/75 Test #61: symbol_table_test_py ............... Passed 0.73 sec
Start 62: top_sort_test_py
62/75 Test #62: top_sort_test_py ................... Passed 0.68 sec
Start 63: union_test_py
63/75 Test #63: union_test_py ...................... Passed 6.74 sec
Start 64: host_arcsort_test_py
64/75 Test #64: host_arcsort_test_py ............... Passed 0.68 sec
Start 65: host_array_test_py
65/75 Test #65: host_array_test_py ................. Passed 0.70 sec
Start 66: host_aux_labels_test_py
66/75 Test #66: host_aux_labels_test_py ............ Passed 0.68 sec
Start 67: host_connect_test_py
67/75 Test #67: host_connect_test_py ............... Passed 0.67 sec
Start 68: host_determinize_test_py
68/75 Test #68: host_determinize_test_py ........... Passed 0.63 sec
Start 69: host_fsa_equivalent_test_py
69/75 Test #69: host_fsa_equivalent_test_py ........ Passed 0.69 sec
Start 70: host_fsa_test_py
70/75 Test #70: host_fsa_test_py ................... Passed 0.68 sec
Start 71: host_intersect_test_py
71/75 Test #71: host_intersect_test_py ............. Passed 0.65 sec
Start 72: host_properties_test_py
72/75 Test #72: host_properties_test_py ............ Passed 0.65 sec
Start 73: host_rmepsilon_test_py
73/75 Test #73: host_rmepsilon_test_py ............. Passed 0.62 sec
Start 74: host_topsort_test_py
74/75 Test #74: host_topsort_test_py ............... Passed 0.71 sec
Start 75: host_weights_test_py
75/75 Test #75: host_weights_test_py ............... Passed 0.71 sec
100% tests passed, 0 tests failed out of 75
Total Test time (real) = 278.15 sec
|
Also, this could result from over-aggressive compiler optimization. The test is probably checking that -inf == -inf. Comparisons involving infinity can sometimes be optimized out, e.g. if the compiler assumes that fabs(a-b) should be zero whenever a==b. |
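To illustrate the point about infinities: the sketch below is not the k2 test code, only a minimal Python example of the IEEE 754 behaviour involved. The helper names `naive_close` and `safe_close` are hypothetical. It shows that `a == b` holds for two negative infinities while `abs(a - b)` is NaN rather than zero, so an optimization that treats the two checks as interchangeable can change the result.

```python
import math


def naive_close(a: float, b: float, tol: float = 1e-6) -> bool:
    # (-inf) - (-inf) is NaN, and every ordered comparison with NaN is False.
    return abs(a - b) <= tol


def safe_close(a: float, b: float, tol: float = 1e-6) -> bool:
    # Test exact equality first so that equal infinities count as close.
    if a == b:
        return True
    return abs(a - b) <= tol


neg_inf = float("-inf")
print("equal:", neg_inf == neg_inf)
print("difference is NaN:", math.isnan(neg_inf - neg_inf))
print("naive_close:", naive_close(neg_inf, neg_inf))
print("safe_close:", safe_close(neg_inf, neg_inf))
```

A comparison written like `safe_close` does not depend on how the subtraction of infinities is evaluated.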
Do you mean that I should pip uninstall PyTorch and torchaudio, and then reinstall k2? OK, I will do it again. |
No, I meant uninstall just k2.
The build.yml is just for GitHub Actions.
…On Tue, Jan 12, 2021 at 8:16 PM shanguanma ***@***.***> wrote:
Do you mean that I should pip uninstall PyTorch and torchaudio, and then
reinstall k2? OK, I will do it again.
However, I found in
https://github.com/k2-fsa/k2/blob/master/.github/workflows/build.yml#L25
that the k2 build environment is only ubuntu16.04 and ubuntu18.04, but the OS of my
server cluster is CentOS 7.
|
OK, I have reinstalled k2 with the commands below, as you suggested:
Compilation and installation completed without errors. However, when I run
|
Make sure your k2 codebase is reasonably up to date and that the file time
of /home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so
is recent.
Please also show the `nvidia-smi` output.
It may be a build problem.
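One way to check the file time is a small script like the sketch below. It is only an illustration: the helper names `file_age_seconds` and `is_recent` and the one-hour threshold are assumptions, and the demo uses the running Python interpreter as a stand-in path. To check the installed extension, pass the `_k2...so` path from site-packages instead.

```python
import os
import sys
import time


def file_age_seconds(path: str) -> float:
    """Return how many seconds ago `path` was last modified."""
    return time.time() - os.path.getmtime(path)


def is_recent(path: str, max_age_seconds: float = 3600.0) -> bool:
    """Return True if `path` was modified within the last `max_age_seconds`."""
    return file_age_seconds(path) <= max_age_seconds


# Stand-in path for the demo; replace with the path of the installed
# _k2 shared library to check whether the reinstall actually replaced it.
example_path = sys.executable
print(example_path)
print("last modified:", time.ctime(os.path.getmtime(example_path)))
print("modified within the last hour:", is_recent(example_path))
```

If the shared library is much older than the wheel that was just built, the old build is probably still the one being loaded.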
…On Tue, Jan 12, 2021 at 11:05 PM shanguanma ***@***.***> wrote:
OK, I have reinstalled k2 with the commands below, as you suggested:
$ conda create -n k2-fsa2 python=3.8
$ conda activate k2-fsa2
$ conda install pytorch torchaudio cudatoolkit=10.2 -c pytorch
$ git clone https://github.com/k2-fsa/k2.git
$ cd k2
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make
$ python3 -m pip install --no-deps --force-reinstall graphviz
$ ctest
$ cd ..
$ pip3 install wheel twine
$ ./scripts/build_pip.sh
$ python3 -m pip install --no-deps --force-reinstall dist/k2-*.whl
install snowfall
$ git clone https://github.com/k2-fsa/snowfall.git
$ cd snowfall
$ python3 -m pip install -e .
Compilation and installation completed without errors. However, when I run
gdb --args python3 mmi_bigram_train.py
it gives an error, and it is not the same as the previous error:
***@***.*** simple_v1]$ gdb --args python3 mmi_bigram_train.py
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-119.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home4/md510/anaconda3/envs/k2-fsa2/bin/python3.8...done.
(gdb) r
Starting program: /home4/md510/anaconda3/envs/k2-fsa2/bin/python3 mmi_bigram_train.py
warning: Unable to open "librpm.so.3" (/home4/md510/anaconda3/lib/liblzma.so.5: version `XZ_5.1.2alpha' not found (required by /lib64/librpmio.so.3)), missing debuginfos notifications will not be displayed
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/27/ffd1fbc69569c776e666474eed723395e6d727.debug
Missing separate debuginfo for /lib64/libpthread.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/2b/482b3bae79def4e5bc9791bc6bbdae0e93e359.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /lib64/libc.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/d7/8066a9c36f5fd63e2f6ac851ae3515c4c9792a.debug
Missing separate debuginfo for /lib64/libdl.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/f2/c36986e11a291a0d4bcb3a81632b24ae2359ea.debug
Missing separate debuginfo for /lib64/libutil.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/15/86cefa927d26f144de15389f28c1cbf04c81ef.debug
Missing separate debuginfo for /lib64/librt.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/cc/d4be566dd5a8fc7fa62b224c14b698f51b0d0d.debug
Missing separate debuginfo for /lib64/libm.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/08/5d924f5d23b9f15a8ad28b7231ee93c09e13f1.debug
[Detaching after fork from child process 46736]
Missing separate debuginfo for /lib64/libcuda.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ca/3a587b4d79216ae274467480fa10f2c44ed2d0.debug
[Detaching after fork from child process 46744]
Missing separate debuginfo for /lib64/libsndfile.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/bf/637fda83ef4f46cd3e5c172031e926dac51faa.debug
Missing separate debuginfo for /lib64/libgsm.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ca/8c2bd826e5837d3cee7c5cee8ed85827a90d5c.debug
Missing separate debuginfo for /lib64/libFLAC.so.8
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/d1/9584153c0799926a60973fb77de214161e7072.debug
Missing separate debuginfo for /lib64/libvorbisenc.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/e5/4da1382c034ef216379710265df600eb741e6d.debug
Missing separate debuginfo for /lib64/libvorbis.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/75/48d115412cc33bf67c1598e446c70daa1b7461.debug
Missing separate debuginfo for /lib64/libogg.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/6c/77e88fb8736ffe5770b2e96ee60c8a6460d782.debug
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch/audio#903 for the detail.
warnings.warn(
[New Thread 0x2aab3309b700 (LWP 46745)]
2021-01-12 22:54:24,746 INFO [mmi_bigram_train.py:310] Loading L.fst
2021-01-12 22:54:25,032 INFO [mmi_bigram_train.py:328] About to get train cuts
2021-01-12 22:54:30,810 INFO [mmi_bigram_train.py:330] About to get dev cuts
2021-01-12 22:54:30,903 INFO [mmi_bigram_train.py:333] About to create train dataset
2021-01-12 22:54:31,388 INFO [mmi_bigram_train.py:337] About to create dev dataset
2021-01-12 22:54:31,409 INFO [mmi_bigram_train.py:341] About to create train dataloader
2021-01-12 22:54:31,409 INFO [mmi_bigram_train.py:343] About to create dev dataloader
[New Thread 0x2aab451f3700 (LWP 46754)]
2021-01-12 22:54:31,441 INFO [mmi_bigram_train.py:350] About to create model
[New Thread 0x2aab453f4700 (LWP 46755)]
[New Thread 0x2aab455f5700 (LWP 46756)]
================================================================================
Model parameters summary:
================================================================================
* P_scores: 7568
* tdnn.0.weight: 60000
* tdnn.0.bias: 500
* tdnn.3.weight: 750000
* tdnn.3.bias: 500
* tdnn.6.weight: 750000
* tdnn.6.bias: 500
* lstms.0.weight_ih_l0: 1000000
* lstms.0.weight_hh_l0: 1000000
* lstms.0.bias_ih_l0: 2000
* lstms.0.bias_hh_l0: 2000
* lstms.1.weight_ih_l0: 1000000
* lstms.1.weight_hh_l0: 1000000
* lstms.1.bias_ih_l0: 2000
* lstms.1.bias_hh_l0: 2000
* lstms.2.weight_ih_l0: 1000000
* lstms.2.weight_hh_l0: 1000000
* lstms.2.bias_ih_l0: 2000
* lstms.2.bias_hh_l0: 2000
* lstms.3.weight_ih_l0: 1000000
* lstms.3.weight_hh_l0: 1000000
* lstms.3.bias_ih_l0: 2000
* lstms.3.bias_hh_l0: 2000
* lstms.4.weight_ih_l0: 1000000
* lstms.4.weight_hh_l0: 1000000
* lstms.4.bias_ih_l0: 2000
* lstms.4.bias_hh_l0: 2000
* linear.weight: 43500
* linear.bias: 87
================================================================================
Total: 11632655
================================================================================
2021-01-12 22:54:38,940 INFO [mmi_bigram_train.py:400] epoch 0, learning rate 0.001
[Detaching after fork from child process 46807]
[Detaching after fork from child process 46808]
[Detaching after fork from child process 46809]
[Detaching after fork from child process 46810]
[New Thread 0x2aab45a08700 (LWP 46811)]
[New Thread 0x2aab45c09700 (LWP 46812)]
[New Thread 0x2aab45e0a700 (LWP 46813)]
[New Thread 0x2aab48200700 (LWP 46814)]
[F] /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged.cu:bool k2::RaggedShape::Validate(bool) const:385 Problem validating row-ids: for layers_[0], row_splits = [ 0 1 3 5 9 13 15 17 20 22 25 27 29 34 39 41 43 48 53 58 60 63 65 68 71 73 76 79 81 84 87 89 91 100 102 109 111 113 115 117 119 122 124 126 129 131 134 136 139 141 144 146 149 151 154 156 159 161 164 166 169 172 174 179 181 184 186 189 191 193 196 198 201 204 206 211 ... (some numbers omitted here because there are too many)
077 35077 35077 35077 ], see index 96409 of row_ids, whose dim is 101526
[ Stack-Trace: ]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x2aab3048cc12]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::internal::Logger::~Logger()+0x2e) [0x2aab2cf365ee]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape::Validate(bool) const+0xe8a) [0x2aab2d083846]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape::Check()+0x1e) [0x2aab2cfdba5e]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape::RaggedShape(std::vector<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> > const&, bool)+0x57) [0x2aab2cfdba1b]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape2(k2::Array1<int>*, k2::Array1<int>*, int)+0x59a) [0x2aab2d08ec52]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape3(k2::Array1<int>*, k2::Array1<int>*, int, k2::Array1<int>*, k2::Array1<int>*, int)+0x27a) [0x2aab2d08f86c]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::GetIncomingArcs(k2::Ragged<k2::Arc>&, k2::Array1<int> const&)+0x38b) [0x2aab2cfc7398]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::MultiGraphDenseIntersect::MultiGraphDenseIntersect(k2::Ragged<k2::Arc>&, k2::DenseFsaVec&, float)+0x551) [0x2aab2d040b2b]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::IntersectDense(k2::Ragged<k2::Arc>&, k2::DenseFsaVec&, float, k2::Ragged<k2::Arc>*, k2::Array1<int>*, k2::Array1<int>*)+0x91) [0x2aab2d03b65e]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb356e) [0x2aab296be56e]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xbc772) [0x2aab296c7772]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xbb9b0) [0x2aab296c69b0]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb99d5) [0x2aab296c49d5]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb9a5f) [0x2aab296c4a5f]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0x48c20) [0x2aab29653c20]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyCFunction_Call+0x56) [0x5555556d3f76]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyObject_MakeTpCall+0x22f) [0x55555569185f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalFrameDefault+0x11d0) [0x555555715b90]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyVectorcall_Call+0x71) [0x555555691041]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/torch/lib/libtorch_python.so(THPFunction_apply(_object*, _object*)+0x93d) [0x2aaacd9aa98d]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyCFunction_Call+0xdb) [0x5555556d3ffb]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyObject_MakeTpCall+0x22f) [0x55555569185f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalFrameDefault+0x4596) [0x555555718f56]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x10077f) [0x55555565477f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x7df) [0x5555556def9f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x1e3) [0x5555556df943]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0xfeb84) [0x555555652b84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x2d2) [0x5555556dea92]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x1e3) [0x5555556df943]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x10011a) [0x55555565411a]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0xfeb84) [0x555555652b84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x2d2) [0x5555556dea92]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyEval_EvalCodeEx+0x44) [0x5555556df754]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyEval_EvalCode+0x1c) [0x55555576dedc]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x219f84) [0x55555576df84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x24c1f4) [0x5555557a01f4]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyRun_FileExFlags+0xa1) [0x5555556686e1]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyRun_SimpleFileExFlags+0x3b4) [0x555555668ac6]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x11598b) [0x55555566998b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(Py_BytesMain+0x39) [0x5555557a2d19]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaaaf0d555]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x1dee93) [0x555555732e93]
Program received signal SIGABRT, Aborted.
0x00002aaaaaf21387 in raise () from /lib64/libc.so.6
(gdb)
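For readers unfamiliar with the "Problem validating row-ids" message above: in a ragged shape, `row_splits[r]` is the index of the first element of row `r`, and `row_ids[i]` is the row that element `i` belongs to, so the two arrays must agree. The sketch below is a plain-Python illustration of that consistency check, not k2's `RaggedShape::Validate`; the function names are hypothetical.

```python
from typing import List


def row_ids_from_row_splits(row_splits: List[int]) -> List[int]:
    """Expand row_splits into the row index of every element."""
    row_ids: List[int] = []
    for row in range(len(row_splits) - 1):
        # Row `row` owns elements row_splits[row] .. row_splits[row + 1] - 1.
        row_ids.extend([row] * (row_splits[row + 1] - row_splits[row]))
    return row_ids


def first_mismatch(row_splits: List[int], row_ids: List[int]) -> int:
    """Return the first index where row_ids disagrees with row_splits, or -1."""
    expected = row_ids_from_row_splits(row_splits)
    for i, (want, got) in enumerate(zip(expected, row_ids)):
        if want != got:
            return i
    if len(expected) != len(row_ids):
        # One array is a strict prefix of the other.
        return min(len(expected), len(row_ids))
    return -1


# The first four row_splits values from the log above: [0, 1, 3, 5].
print(row_ids_from_row_splits([0, 1, 3, 5]))
print(first_mismatch([0, 1, 3, 5], [0, 1, 1, 2, 2]))
print(first_mismatch([0, 1, 3, 5], [0, 1, 2, 2, 2]))
```

A failure of this kind means the two arrays were not produced from the same data, or that one of them was corrupted, for example by an earlier out-of-bounds write on the GPU.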
|
Try running it in gdb and show the whole stack trace with line numbers.
gdb python3 train.py
(gdb) r
...
…On Tue, Jan 12, 2021 at 11:22 PM shanguanma ***@***.***> wrote:
Yes, the k2 codebase is from the latest master branch. These are the files I just built.
***@***.*** k2]$ ls dist/ -larth
total 54M
drwxr-xr-x 12 md510 users 4.0K Jan 12 22:47 ..
drwxr-xr-x 2 md510 users 4.0K Jan 12 22:47 .
-rw-r--r-- 1 md510 users 54M Jan 12 22:47 k2-0.1.3+cu102.dev20210112-cp38-cp38-linux_x86_64.whl
***@***.*** k2]$ ls /home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so -larth
-rwxr-xr-x 1 md510 users 34M Jan 12 22:48 /home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so
Also show nvidia-smi output.
May be build problem.
***@***.*** k2]$ nvidia-smi
Tue Jan 12 23:17:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 8000 On | 00000000:1D:00.0 Off | 0 |
| 33% 49C P2 158W / 260W | 3835MiB / 45553MiB | 49% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 8000 On | 00000000:1E:00.0 Off | 0 |
| 33% 55C P2 110W / 260W | 4013MiB / 45553MiB | 56% Default |
+-------------------------------+----------------------+----------------------+
| 2 Quadro RTX 8000 On | 00000000:20:00.0 Off | 0 |
| 33% 46C P2 136W / 260W | 3721MiB / 45553MiB | 41% Default |
+-------------------------------+----------------------+----------------------+
| 3 Quadro RTX 8000 On | 00000000:21:00.0 Off | 0 |
| 40% 64C P2 260W / 260W | 32389MiB / 45553MiB | 91% Default |
+-------------------------------+----------------------+----------------------+
| 4 Quadro RTX 8000 On | 00000000:24:00.0 Off | 0 |
| 40% 64C P2 226W / 260W | 22959MiB / 45553MiB | 84% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 164572 C ...t/tools/venv/envs/md_espnet/bin/python3 3823MiB |
| 1 164573 C ...t/tools/venv/envs/md_espnet/bin/python3 4001MiB |
| 2 164574 C ...t/tools/venv/envs/md_espnet/bin/python3 3709MiB |
| 3 56223 C nnet3-chain-train 22947MiB |
| 4 56905 C nnet3-chain-train 22947MiB |
+-----------------------------------------------------------------------------+
|
|
Do the same after running
export K2_SYNC_KERNELS=1
I want to see whether that error was the first one to occur.
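If exporting the variable in the shell is inconvenient, the same effect can be had by launching the script from Python with the variable set only for the child process. This is a minimal sketch: `K2_SYNC_KERNELS` is the variable named above, while the helper names are hypothetical. The variable presumably makes k2 synchronize after each kernel, so that a CUDA error is reported at the call that caused it rather than at a later one.

```python
import os
import subprocess
import sys
from typing import List, Mapping


def env_with_sync_kernels(base: Mapping[str, str]) -> dict:
    """Return a copy of `base` with K2_SYNC_KERNELS set to "1"."""
    env = dict(base)
    env["K2_SYNC_KERNELS"] = "1"
    return env


def run_with_sync_kernels(cmd: List[str]) -> int:
    """Run `cmd` with K2_SYNC_KERNELS=1 and return its exit code."""
    completed = subprocess.run(cmd, env=env_with_sync_kernels(os.environ))
    return completed.returncode


# Demo: a child Python process that only prints the variable it received.
demo_cmd = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('K2_SYNC_KERNELS'))",
]
print("exit code:", run_with_sync_kernels(demo_cmd))
```

For the case in this thread, the command list would be `["gdb", "--args", "python3", "mmi_bigram_train.py"]` in place of `demo_cmd`.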
…On Tue, Jan 12, 2021 at 11:49 PM shanguanma ***@***.***> wrote:
***@***.*** simple_v1]$ gdb --args python3 mmi_bigram_train.py
(gdb) r
Starting program: /home4/md510/anaconda3/envs/k2-fsa2/bin/python3 mmi_bigram_train.py
warning: Unable to open "librpm.so.3" (/home4/md510/anaconda3/lib/liblzma.so.5: version `XZ_5.1.2alpha' not found (required by /lib64/librpmio.so.3)), missing debuginfos notifications will not be displayed
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/27/ffd1fbc69569c776e666474eed723395e6d727.debug
Missing separate debuginfo for /lib64/libpthread.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/2b/482b3bae79def4e5bc9791bc6bbdae0e93e359.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /lib64/libc.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/d7/8066a9c36f5fd63e2f6ac851ae3515c4c9792a.debug
Missing separate debuginfo for /lib64/libdl.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/f2/c36986e11a291a0d4bcb3a81632b24ae2359ea.debug
Missing separate debuginfo for /lib64/libutil.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/15/86cefa927d26f144de15389f28c1cbf04c81ef.debug
Missing separate debuginfo for /lib64/librt.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/cc/d4be566dd5a8fc7fa62b224c14b698f51b0d0d.debug
Missing separate debuginfo for /lib64/libm.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/08/5d924f5d23b9f15a8ad28b7231ee93c09e13f1.debug
[Detaching after fork from child process 66884]
Missing separate debuginfo for /lib64/libcuda.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ca/3a587b4d79216ae274467480fa10f2c44ed2d0.debug
[Detaching after fork from child process 66894]
Missing separate debuginfo for /lib64/libsndfile.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/bf/637fda83ef4f46cd3e5c172031e926dac51faa.debug
Missing separate debuginfo for /lib64/libgsm.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ca/8c2bd826e5837d3cee7c5cee8ed85827a90d5c.debug
Missing separate debuginfo for /lib64/libFLAC.so.8
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/d1/9584153c0799926a60973fb77de214161e7072.debug
Missing separate debuginfo for /lib64/libvorbisenc.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/e5/4da1382c034ef216379710265df600eb741e6d.debug
Missing separate debuginfo for /lib64/libvorbis.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/75/48d115412cc33bf67c1598e446c70daa1b7461.debug
Missing separate debuginfo for /lib64/libogg.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/6c/77e88fb8736ffe5770b2e96ee60c8a6460d782.debug
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch/audio#903 for the detail.
warnings.warn(
[New Thread 0x2aab3309b700 (LWP 66896)]
2021-01-12 23:40:11,250 INFO [mmi_bigram_train.py:310] Loading L.fst
2021-01-12 23:40:11,533 INFO [mmi_bigram_train.py:328] About to get train cuts
2021-01-12 23:40:17,630 INFO [mmi_bigram_train.py:330] About to get dev cuts
2021-01-12 23:40:17,727 INFO [mmi_bigram_train.py:333] About to create train dataset
2021-01-12 23:40:18,201 INFO [mmi_bigram_train.py:337] About to create dev dataset
2021-01-12 23:40:18,223 INFO [mmi_bigram_train.py:341] About to create train dataloader
2021-01-12 23:40:18,223 INFO [mmi_bigram_train.py:343] About to create dev dataloader
[New Thread 0x2aab451f3700 (LWP 66931)]
2021-01-12 23:40:18,276 INFO [mmi_bigram_train.py:350] About to create model
[New Thread 0x2aab453f4700 (LWP 66933)]
[New Thread 0x2aab455f5700 (LWP 66934)]
================================================================================
Model parameters summary:
================================================================================
* P_scores: 7568
* tdnn.0.weight: 60000
* tdnn.0.bias: 500
* tdnn.3.weight: 750000
* tdnn.3.bias: 500
* tdnn.6.weight: 750000
* tdnn.6.bias: 500
* lstms.0.weight_ih_l0: 1000000
* lstms.0.weight_hh_l0: 1000000
* lstms.0.bias_ih_l0: 2000
* lstms.0.bias_hh_l0: 2000
* lstms.1.weight_ih_l0: 1000000
* lstms.1.weight_hh_l0: 1000000
* lstms.1.bias_ih_l0: 2000
* lstms.1.bias_hh_l0: 2000
* lstms.2.weight_ih_l0: 1000000
* lstms.2.weight_hh_l0: 1000000
* lstms.2.bias_ih_l0: 2000
* lstms.2.bias_hh_l0: 2000
* lstms.3.weight_ih_l0: 1000000
* lstms.3.weight_hh_l0: 1000000
* lstms.3.bias_ih_l0: 2000
* lstms.3.bias_hh_l0: 2000
* lstms.4.weight_ih_l0: 1000000
* lstms.4.weight_hh_l0: 1000000
* lstms.4.bias_ih_l0: 2000
* lstms.4.bias_hh_l0: 2000
* linear.weight: 43500
* linear.bias: 87
================================================================================
Total: 11632655
================================================================================
2021-01-12 23:40:21,868 INFO [mmi_bigram_train.py:400] epoch 0, learning rate 0.001
[Detaching after fork from child process 66939]
[Detaching after fork from child process 66940]
[Detaching after fork from child process 66941]
[Detaching after fork from child process 66942]
[New Thread 0x2aab45a08700 (LWP 66943)]
[New Thread 0x2aab45c09700 (LWP 66944)]
[New Thread 0x2aab45e0a700 (LWP 66945)]
[New Thread 0x2aab48200700 (LWP 66946)]
[F] /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int]:280 Check failed: ret == cudaSuccess (700 vs. 0) Error: an illegal memory access was encountered.
[ Stack-Trace: ]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x2aab3048cc12]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::internal::Logger::~Logger()+0x2e) [0x2aab2cf365ee]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::Array1<int>::operator[](int) const+0x56c) [0x2aab2cf3ad80]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::Array1<int>::Back() const+0x130) [0x2aab2cf385a0]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape2(k2::Array1<int>*, k2::Array1<int>*, int)+0x27f) [0x2aab2d08e937]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape3(k2::Array1<int>*, k2::Array1<int>*, int, k2::Array1<int>*, k2::Array1<int>*, int)+0x70) [0x2aab2d08f662]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::GetIncomingArcs(k2::Ragged<k2::Arc>&, k2::Array1<int> const&)+0x38b) [0x2aab2cfc7398]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::MultiGraphDenseIntersect::MultiGraphDenseIntersect(k2::Ragged<k2::Arc>&, k2::DenseFsaVec&, float)+0x551) [0x2aab2d040b2b]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::IntersectDense(k2::Ragged<k2::Arc>&, k2::DenseFsaVec&, float, k2::Ragged<k2::Arc>*, k2::Array1<int>*, k2::Array1<int>*)+0x91) [0x2aab2d03b65e]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb356e) [0x2aab296be56e]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xbc772) [0x2aab296c7772]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xbb9b0) [0x2aab296c69b0]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb99d5) [0x2aab296c49d5]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb9a5f) [0x2aab296c4a5f]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0x48c20) [0x2aab29653c20]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyCFunction_Call+0x56) [0x5555556d3f76]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyObject_MakeTpCall+0x22f) [0x55555569185f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalFrameDefault+0x11d0) [0x555555715b90]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyVectorcall_Call+0x71) [0x555555691041]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/torch/lib/libtorch_python.so(THPFunction_apply(_object*, _object*)+0x93d) [0x2aaacd9aa98d]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyCFunction_Call+0xdb) [0x5555556d3ffb]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyObject_MakeTpCall+0x22f) [0x55555569185f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalFrameDefault+0x4596) [0x555555718f56]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x10077f) [0x55555565477f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x7df) [0x5555556def9f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x1e3) [0x5555556df943]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0xfeb84) [0x555555652b84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x2d2) [0x5555556dea92]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x1e3) [0x5555556df943]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x10011a) [0x55555565411a]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0xfeb84) [0x555555652b84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x2d2) [0x5555556dea92]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyEval_EvalCodeEx+0x44) [0x5555556df754]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyEval_EvalCode+0x1c) [0x55555576dedc]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x219f84) [0x55555576df84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x24c1f4) [0x5555557a01f4]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyRun_FileExFlags+0xa1) [0x5555556686e1]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyRun_SimpleFileExFlags+0x3b4) [0x555555668ac6]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x11598b) [0x55555566998b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(Py_BytesMain+0x39) [0x5555557a2d19]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaaaf0d555]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x1dee93) [0x555555732e93]
Program received signal SIGABRT, Aborted.
0x00002aaaaaf21387 in raise () from /lib64/libc.so.6
(gdb) bt full
#0 0x00002aaaaaf21387 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00002aaaaaf22a78 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00002aab2cf36630 in k2::internal::Logger::~Logger (this=0x7fffffffb340, __in_chrg=<optimized out>) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/log.h:149
stack_trace = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0x5555c7e0dee8 "[ Stack-Trace: ]\n/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x2aab3048cc12]\n/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/sit"...}}
#3 0x00002aab2cf3ad80 in k2::Array1<int>::operator[] (this=0x7fffffffb680, i=64) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:280
ans = 21845
ret = cudaErrorIllegalAddress
__PRETTY_FUNCTION__ = "T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int]"
k2_nvtx_6 = {<No data fields>}
data = 0x2aabaae45100
type = k2::kCuda
#4 0x00002aab2cf385a0 in k2::Array1<int>::Back (this=0x7fffffffb680) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:289
__PRETTY_FUNCTION__ = "T k2::Array1<T>::Back() const [with T = int]"
#5 0x00002aab2d08e937 in k2::RaggedShape2 (row_splits=0x7fffffffb680, row_ids=0x7fffffffb6a0, cached_tot_size=35078) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu:112
k2_nvtx_65 = {<No data fields>}
__PRETTY_FUNCTION__ = "k2::RaggedShape k2::RaggedShape2(k2::Array1<int>*, k2::Array1<int>*, int32_t)"
ctx = {<std::__shared_ptr<k2::Context, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Context, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c69b9920, _M_refcount = {_M_pi = 0x5555c69b9910}}, <No data fields>}
axes = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
_M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
_M_start = 0x5555c69c4e38, _M_finish = 0x7fffffffb498, _M_end_of_storage = 0xffffffffffffb460}}, <No data fields>}
#6 0x00002aab2d08f662 in k2::RaggedShape3 (row_splits1=0x7fffffffb680, row_ids1=0x7fffffffb6a0, cached_tot_size1=35078, row_splits2=0x7fffffffb6c0, row_ids2=0x7fffffffb6e0,
cached_tot_size2=101526) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu:193
k2_nvtx_68 = {<No data fields>}
__PRETTY_FUNCTION__ = "k2::RaggedShape k2::RaggedShape3(k2::Array1<int>*, k2::Array1<int>*, int32_t, k2::Array1<int>*, k2::Array1<int>*, int32_t)"
shape1 = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
_M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
_M_start = 0x5555c69bd278, _M_finish = 0x7fffffffb5b8, _M_end_of_storage = 0x2aab29689143
<__gnu_cxx::__atomic_add_dispatch(_Atomic_word*, int)+46>}}, <No data fields>}}
temp_array = {dim_ = -962881248, byte_offset_ = 140737488337984,
region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fffffffb5a0, _M_refcount = {_M_pi = 0x12cf6eaa2}}, <No data fields>}}
#7 0x00002aab2cfc7398 in k2::GetIncomingArcs (fsas=..., dest_states=...) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/fsa_utils.cu:837
k2_nvtx_76 = {<No data fields>}
__PRETTY_FUNCTION__ = "k2::Ragged<int> k2::GetIncomingArcs(k2::FsaVec&, const k2::Array1<int>&)"
c = @0x5555c8017fa0: {<std::__shared_ptr<k2::Context, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Context, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c69b9920, _M_refcount = {_M_pi = 0x5555c69b9910}}, <No data fields>}
dest_states_tensor = {shape = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
_M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
_M_start = 0x5555c8014070, _M_finish = 0x5555c8014100, _M_end_of_storage = 0x5555c8014100}}, <No data fields>}}, values = {dim_ = 101526, byte_offset_ = 0,
region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c8056db0, _M_refcount = {_M_pi = 0x5555c8056da0}}, <No data fields>}}}
num_fsas = 64
num_states = 35078
num_arcs = 101526
incoming_arcs_order = {dim_ = 101526, byte_offset_ = 0,
region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c7fc3b10, _M_refcount = {_M_pi = 0x5555c7fc3b00}}, <No data fields>}}
ans_row_ids2 = {dim_ = 101526, byte_offset_ = 0,
|
It could be a bug in GetTransposeReordering(), which is called by
GetIncomingArcs().
If anyone has time to suggest debug code that verifies the output of
GetTransposeReordering(), that would be good. It's getting late for me.
|
GetTransposeReordering() should return a permutation of the
numbers (0, 1, 2, 3, 4, ...). That should be reasonably easy to test, e.g. by
summing its output and comparing with the closed-form sum n(n-1)/2 for n elements.
We should first add a Sum() function for arrays, e.g. modeled on the Max()
function declared in array_ops.h.
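The check described above could look roughly like the following. This is only a sketch on a plain host-side `std::vector`, not k2 code: the function names `SumMatchesPermutation`, `IsPermutation` and `CheckReordering` are made up for illustration, and a device array would presumably have to be copied to the CPU first. Note that the sum test is a necessary condition only, so the sketch also includes an exact check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Cheap necessary check: a permutation of 0..n-1 must sum to n*(n-1)/2.
// Passing this test does not prove that `order` is a permutation.
bool SumMatchesPermutation(const std::vector<int32_t> &order) {
  const int64_t n = static_cast<int64_t>(order.size());
  int64_t sum = 0;  // 64-bit accumulator so large arrays do not overflow
  for (int32_t v : order) sum += v;
  return sum == n * (n - 1) / 2;
}

// Exact check: every index in [0, n) appears exactly once.
bool IsPermutation(const std::vector<int32_t> &order) {
  const int64_t n = static_cast<int64_t>(order.size());
  std::vector<bool> seen(order.size(), false);
  for (int32_t v : order) {
    if (v < 0 || v >= n || seen[v]) return false;  // out of range or repeated
    seen[v] = true;
  }
  return true;
}

// Hypothetical debug helper: aborts if `order` is not a valid reordering.
void CheckReordering(const std::vector<int32_t> &order) {
  assert(SumMatchesPermutation(order));
  assert(IsPermutation(order));
}
```

An array such as (0, 0, 3, 3) passes the sum test but fails the exact one, so the sum is best treated as a quick first filter; an out-of-range entry usually changes the sum and is caught by either check.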
>> _M_start = 0x5555c69c4e38, _M_finish = 0x7fffffffb498, _M_end_of_storage = 0xffffffffffffb460}}, <No data fields>}
>> #6 0x00002aab2d08f662 in k2::RaggedShape3 (row_splits1=0x7fffffffb680, row_ids1=0x7fffffffb6a0, cached_tot_size1=35078, row_splits2=0x7fffffffb6c0, row_ids2=0x7fffffffb6e0,
>> cached_tot_size2=101526) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu:193
>> k2_nvtx_68 = {<No data fields>}
>> __PRETTY_FUNCTION__ = "k2::RaggedShape k2::RaggedShape3(k2::Array1<int>*, k2::Array1<int>*, int32_t, k2::Array1<int>*, k2::Array1<int>*, int32_t)"
>> shape1 = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
>> _M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
>> _M_start = 0x5555c69bd278, _M_finish = 0x7fffffffb5b8, _M_end_of_storage = 0x2aab29689143
>> <__gnu_cxx::__atomic_add_dispatch(_Atomic_word*, int)+46>}}, <No data fields>}}
>> temp_array = {dim_ = -962881248, byte_offset_ = 140737488337984,
>> region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fffffffb5a0, _M_refcount = {_M_pi = 0x12cf6eaa2}}, <No data fields>}}
>> #7 0x00002aab2cfc7398 in k2::GetIncomingArcs (fsas=..., dest_states=...) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/fsa_utils.cu:837
>> k2_nvtx_76 = {<No data fields>}
>> __PRETTY_FUNCTION__ = "k2::Ragged<int> k2::GetIncomingArcs(k2::FsaVec&, const k2::Array1<int>&)"
>> c = @0x5555c8017fa0: {<std::__shared_ptr<k2::Context, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Context, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c69b9920, _M_refcount = {_M_pi = 0x5555c69b9910}}, <No data fields>}
>> dest_states_tensor = {shape = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
>> _M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
>> _M_start = 0x5555c8014070, _M_finish = 0x5555c8014100, _M_end_of_storage = 0x5555c8014100}}, <No data fields>}}, values = {dim_ = 101526, byte_offset_ = 0,
>> region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c8056db0, _M_refcount = {_M_pi = 0x5555c8056da0}}, <No data fields>}}}
>> num_fsas = 64
>> num_states = 35078
>> num_arcs = 101526
>> incoming_arcs_order = {dim_ = 101526, byte_offset_ = 0,
>> region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c7fc3b10, _M_refcount = {_M_pi = 0x5555c7fc3b00}}, <No data fields>}}
>> ans_row_ids2 = {dim_ = 101526, byte_offset_ = 0,
>>
>>
>>
|
Will training with CPU give the same error?
Tuesday, 12 January 2021, 23:49 +0800 from notifications@github.com <notifications@github.com>:
…***@***.*** simple_v1]$ gdb --args python3 mmi_bigram_train.py
(gdb) r
Starting program: /home4/md510/anaconda3/envs/k2-fsa2/bin/python3 mmi_bigram_train.py
warning: Unable to open "librpm.so.3" (/home4/md510/anaconda3/lib/liblzma.so.5: version `XZ_5.1.2alpha' not found (required by /lib64/librpmio.so.3)), missing debuginfos notifications will not be displayed
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/27/ffd1fbc69569c776e666474eed723395e6d727.debug
Missing separate debuginfo for /lib64/libpthread.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/2b/482b3bae79def4e5bc9791bc6bbdae0e93e359.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /lib64/libc.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/d7/8066a9c36f5fd63e2f6ac851ae3515c4c9792a.debug
Missing separate debuginfo for /lib64/libdl.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/f2/c36986e11a291a0d4bcb3a81632b24ae2359ea.debug
Missing separate debuginfo for /lib64/libutil.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/15/86cefa927d26f144de15389f28c1cbf04c81ef.debug
Missing separate debuginfo for /lib64/librt.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/cc/d4be566dd5a8fc7fa62b224c14b698f51b0d0d.debug
Missing separate debuginfo for /lib64/libm.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/08/5d924f5d23b9f15a8ad28b7231ee93c09e13f1.debug
[Detaching after fork from child process 66884]
Missing separate debuginfo for /lib64/libcuda.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ca/3a587b4d79216ae274467480fa10f2c44ed2d0.debug
[Detaching after fork from child process 66894]
Missing separate debuginfo for /lib64/libsndfile.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/bf/637fda83ef4f46cd3e5c172031e926dac51faa.debug
Missing separate debuginfo for /lib64/libgsm.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ca/8c2bd826e5837d3cee7c5cee8ed85827a90d5c.debug
Missing separate debuginfo for /lib64/libFLAC.so.8
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/d1/9584153c0799926a60973fb77de214161e7072.debug
Missing separate debuginfo for /lib64/libvorbisenc.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/e5/4da1382c034ef216379710265df600eb741e6d.debug
Missing separate debuginfo for /lib64/libvorbis.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/75/48d115412cc33bf67c1598e446c70daa1b7461.debug
Missing separate debuginfo for /lib64/libogg.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/6c/77e88fb8736ffe5770b2e96ee60c8a6460d782.debug
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch/audio#903 for the detail.
warnings.warn(
[New Thread 0x2aab3309b700 (LWP 66896)]
2021-01-12 23:40:11,250 INFO [mmi_bigram_train.py:310] Loading L.fst
2021-01-12 23:40:11,533 INFO [mmi_bigram_train.py:328] About to get train cuts
2021-01-12 23:40:17,630 INFO [mmi_bigram_train.py:330] About to get dev cuts
2021-01-12 23:40:17,727 INFO [mmi_bigram_train.py:333] About to create train dataset
2021-01-12 23:40:18,201 INFO [mmi_bigram_train.py:337] About to create dev dataset
2021-01-12 23:40:18,223 INFO [mmi_bigram_train.py:341] About to create train dataloader
2021-01-12 23:40:18,223 INFO [mmi_bigram_train.py:343] About to create dev dataloader
[New Thread 0x2aab451f3700 (LWP 66931)]
2021-01-12 23:40:18,276 INFO [mmi_bigram_train.py:350] About to create model
[New Thread 0x2aab453f4700 (LWP 66933)]
[New Thread 0x2aab455f5700 (LWP 66934)]
================================================================================
Model parameters summary:
================================================================================
* P_scores: 7568
* tdnn.0.weight: 60000
* tdnn.0.bias: 500
* tdnn.3.weight: 750000
* tdnn.3.bias: 500
* tdnn.6.weight: 750000
* tdnn.6.bias: 500
* lstms.0.weight_ih_l0: 1000000
* lstms.0.weight_hh_l0: 1000000
* lstms.0.bias_ih_l0: 2000
* lstms.0.bias_hh_l0: 2000
* lstms.1.weight_ih_l0: 1000000
* lstms.1.weight_hh_l0: 1000000
* lstms.1.bias_ih_l0: 2000
* lstms.1.bias_hh_l0: 2000
* lstms.2.weight_ih_l0: 1000000
* lstms.2.weight_hh_l0: 1000000
* lstms.2.bias_ih_l0: 2000
* lstms.2.bias_hh_l0: 2000
* lstms.3.weight_ih_l0: 1000000
* lstms.3.weight_hh_l0: 1000000
* lstms.3.bias_ih_l0: 2000
* lstms.3.bias_hh_l0: 2000
* lstms.4.weight_ih_l0: 1000000
* lstms.4.weight_hh_l0: 1000000
* lstms.4.bias_ih_l0: 2000
* lstms.4.bias_hh_l0: 2000
* linear.weight: 43500
* linear.bias: 87
================================================================================
Total: 11632655
================================================================================
2021-01-12 23:40:21,868 INFO [mmi_bigram_train.py:400] epoch 0, learning rate 0.001
[Detaching after fork from child process 66939]
[Detaching after fork from child process 66940]
[Detaching after fork from child process 66941]
[Detaching after fork from child process 66942]
[New Thread 0x2aab45a08700 (LWP 66943)]
[New Thread 0x2aab45c09700 (LWP 66944)]
[New Thread 0x2aab45e0a700 (LWP 66945)]
[New Thread 0x2aab48200700 (LWP 66946)]
[F] /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int]:280 Check failed: ret == cudaSuccess (700 vs. 0) Error: an illegal memory access was encountered.
[ Stack-Trace: ]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x2aab3048cc12]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::internal::Logger::~Logger()+0x2e) [0x2aab2cf365ee]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::Array1<int>::operator[](int) const+0x56c) [0x2aab2cf3ad80]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::Array1<int>::Back() const+0x130) [0x2aab2cf385a0]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape2(k2::Array1<int>*, k2::Array1<int>*, int)+0x27f) [0x2aab2d08e937]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::RaggedShape3(k2::Array1<int>*, k2::Array1<int>*, int, k2::Array1<int>*, k2::Array1<int>*, int)+0x70) [0x2aab2d08f662]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::GetIncomingArcs(k2::Ragged<k2::Arc>&, k2::Array1<int> const&)+0x38b) [0x2aab2cfc7398]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::MultiGraphDenseIntersect::MultiGraphDenseIntersect(k2::Ragged<k2::Arc>&, k2::DenseFsaVec&, float)+0x551) [0x2aab2d040b2b]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::IntersectDense(k2::Ragged<k2::Arc>&, k2::DenseFsaVec&, float, k2::Ragged<k2::Arc>*, k2::Array1<int>*, k2::Array1<int>*)+0x91) [0x2aab2d03b65e]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb356e) [0x2aab296be56e]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xbc772) [0x2aab296c7772]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xbb9b0) [0x2aab296c69b0]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb99d5) [0x2aab296c49d5]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb9a5f) [0x2aab296c4a5f]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0x48c20) [0x2aab29653c20]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyCFunction_Call+0x56) [0x5555556d3f76]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyObject_MakeTpCall+0x22f) [0x55555569185f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalFrameDefault+0x11d0) [0x555555715b90]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyVectorcall_Call+0x71) [0x555555691041]
/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/torch/lib/libtorch_python.so(THPFunction_apply(_object*, _object*)+0x93d) [0x2aaacd9aa98d]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyCFunction_Call+0xdb) [0x5555556d3ffb]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyObject_MakeTpCall+0x22f) [0x55555569185f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalFrameDefault+0x4596) [0x555555718f56]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x10077f) [0x55555565477f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x7df) [0x5555556def9f]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x1e3) [0x5555556df943]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0xfeb84) [0x555555652b84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x2d2) [0x5555556dea92]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x1e3) [0x5555556df943]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x10011a) [0x55555565411a]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyFunction_Vectorcall+0x10b) [0x5555556df86b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0xfeb84) [0x555555652b84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(_PyEval_EvalCodeWithName+0x2d2) [0x5555556dea92]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyEval_EvalCodeEx+0x44) [0x5555556df754]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyEval_EvalCode+0x1c) [0x55555576dedc]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x219f84) [0x55555576df84]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x24c1f4) [0x5555557a01f4]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyRun_FileExFlags+0xa1) [0x5555556686e1]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyRun_SimpleFileExFlags+0x3b4) [0x555555668ac6]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x11598b) [0x55555566998b]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(Py_BytesMain+0x39) [0x5555557a2d19]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaaaf0d555]
/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(+0x1dee93) [0x555555732e93]
Program received signal SIGABRT, Aborted.
0x00002aaaaaf21387 in raise () from /lib64/libc.so.6
(gdb) bt full
#0 0x00002aaaaaf21387 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00002aaaaaf22a78 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00002aab2cf36630 in k2::internal::Logger::~Logger (this=0x7fffffffb340, __in_chrg=<optimized out>) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/log.h:149
stack_trace = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0x5555c7e0dee8 "[ Stack-Trace: ]\n/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x2aab3048cc12]\n/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/sit"...}}
#3 0x00002aab2cf3ad80 in k2::Array1<int>::operator[] (this=0x7fffffffb680, i=64) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:280
ans = 21845
ret = cudaErrorIllegalAddress
__PRETTY_FUNCTION__ = "T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int]"
k2_nvtx_6 = {<No data fields>}
data = 0x2aabaae45100
type = k2::kCuda
#4 0x00002aab2cf385a0 in k2::Array1<int>::Back (this=0x7fffffffb680) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:289
__PRETTY_FUNCTION__ = "T k2::Array1<T>::Back() const [with T = int]"
#5 0x00002aab2d08e937 in k2::RaggedShape2 (row_splits=0x7fffffffb680, row_ids=0x7fffffffb6a0, cached_tot_size=35078) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu:112
k2_nvtx_65 = {<No data fields>}
__PRETTY_FUNCTION__ = "k2::RaggedShape k2::RaggedShape2(k2::Array1<int>*, k2::Array1<int>*, int32_t)"
ctx = {<std::__shared_ptr<k2::Context, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Context, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c69b9920, _M_refcount = {_M_pi = 0x5555c69b9910}}, <No data fields>}
axes = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
_M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
_M_start = 0x5555c69c4e38, _M_finish = 0x7fffffffb498, _M_end_of_storage = 0xffffffffffffb460}}, <No data fields>}
#6 0x00002aab2d08f662 in k2::RaggedShape3 (row_splits1=0x7fffffffb680, row_ids1=0x7fffffffb6a0, cached_tot_size1=35078, row_splits2=0x7fffffffb6c0, row_ids2=0x7fffffffb6e0,
cached_tot_size2=101526) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu:193
k2_nvtx_68 = {<No data fields>}
__PRETTY_FUNCTION__ = "k2::RaggedShape k2::RaggedShape3(k2::Array1<int>*, k2::Array1<int>*, int32_t, k2::Array1<int>*, k2::Array1<int>*, int32_t)"
shape1 = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
_M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
_M_start = 0x5555c69bd278, _M_finish = 0x7fffffffb5b8, _M_end_of_storage = 0x2aab29689143
<__gnu_cxx::__atomic_add_dispatch(_Atomic_word*, int)+46>}}, <No data fields>}}
temp_array = {dim_ = -962881248, byte_offset_ = 140737488337984,
region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fffffffb5a0, _M_refcount = {_M_pi = 0x12cf6eaa2}}, <No data fields>}}
#7 0x00002aab2cfc7398 in k2::GetIncomingArcs (fsas=..., dest_states=...) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/fsa_utils.cu:837
k2_nvtx_76 = {<No data fields>}
__PRETTY_FUNCTION__ = "k2::Ragged<int> k2::GetIncomingArcs(k2::FsaVec&, const k2::Array1<int>&)"
c = @0x5555c8017fa0: {<std::__shared_ptr<k2::Context, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Context, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c69b9920, _M_refcount = {_M_pi = 0x5555c69b9910}}, <No data fields>}
dest_states_tensor = {shape = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
_M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
_M_start = 0x5555c8014070, _M_finish = 0x5555c8014100, _M_end_of_storage = 0x5555c8014100}}, <No data fields>}}, values = {dim_ = 101526, byte_offset_ = 0,
region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c8056db0, _M_refcount = {_M_pi = 0x5555c8056da0}}, <No data fields>}}}
num_fsas = 64
num_states = 35078
num_arcs = 101526
incoming_arcs_order = {dim_ = 101526, byte_offset_ = 0,
region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c7fc3b10, _M_refcount = {_M_pi = 0x5555c7fc3b00}}, <No data fields>}}
ans_row_ids2 = {dim_ = 101526, byte_offset_ = 0,
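Most of the frames in the stack trace above belong to the Python interpreter; only a few come from k2 itself. As a reading aid, here is a minimal sketch that keeps just the frames from `libk2context.so`. It is not part of k2: the helper name `k2_frames` is hypothetical, and the sample lines are copied from the trace above.

```python
import re

# Matches backtrace lines of the form: path(symbol+0xoffset) [0xaddress]
FRAME_RE = re.compile(
    r"^(?P<lib>\S+?)\((?P<sym>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$"
)


def k2_frames(trace_lines, lib_suffix="libk2context.so"):
    """Return the symbol names of frames that come from `lib_suffix`."""
    symbols = []
    for line in trace_lines:
        m = FRAME_RE.match(line.strip())
        # Skip frames from other libraries and frames without a symbol name.
        if m and m.group("lib").endswith(lib_suffix) and m.group("sym"):
            symbols.append(m.group("sym"))
    return symbols


sample = [
    "/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2context.so(k2::Array1<int>::Back() const+0x130) [0x2aab2cf385a0]",
    "/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xb356e) [0x2aab296be56e]",
    "/home4/md510/anaconda3/envs/k2-fsa2/bin/python3(PyCFunction_Call+0x56) [0x5555556d3f76]",
]
for name in k2_frames(sample):
    print(name)
```

Applied to the full trace, this should leave the chain from `IntersectDense` down to `Array1<int>::operator[]`, which is the part relevant to the failure.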
|
I run
|
Yes, I tried it again and the same error occurs. Note: I use a single GPU for mmi_bigram_train.py. |
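Some background on the `K2_SYNC_KERNELS=1` rerun: CUDA kernel launches are asynchronous, so an illegal memory access (error 700 in the log) is often reported by a later call, here the device read in `Array1<int>::operator[]`, rather than by the kernel that caused it. Running kernels synchronously helps show whether the reported error was really the first one. Below is a minimal sketch of launching the training script that way from Python. The helper names `debug_env` and `run_synchronously` are hypothetical, and `CUDA_LAUNCH_BLOCKING` is the generic CUDA runtime setting, added here as an assumption rather than something requested in this thread.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Copy an environment and enable synchronous kernel execution."""
    env = dict(os.environ if base is None else base)
    env["K2_SYNC_KERNELS"] = "1"       # the setting suggested in this thread
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # assumption: generic CUDA equivalent
    return env


def run_synchronously(script, *args):
    """Run `script` with the debug environment and return its exit code."""
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=debug_env()).returncode


# Example (not executed here):
#   run_synchronously("mmi_bigram_train.py")
```

Copying the environment instead of modifying `os.environ` keeps the setting local to the child process.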
I am creating some extra checking code that may discover the source of the
problem.
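To illustrate the kind of checks that could be useful (a sketch under stated assumptions, not the checking code mentioned above): the failing call passes `row_splits`, `row_ids` and a cached total size to `RaggedShape2`, so one can test whether those three agree with each other, and whether an arc reordering is a real permutation that sorts the arcs by destination state. The function names and the plain-list representation are hypothetical; k2 itself stores these as `Array1<int>`, possibly on the GPU, and the intended output of `GetTransposeReordering()` is assumed here to be such a sorting permutation.

```python
def check_ragged_layer(row_splits, row_ids, cached_tot_size=-1):
    """Return a list of problems found (empty if the layer looks consistent)."""
    problems = []
    if not row_splits or row_splits[0] != 0:
        problems.append("row_splits must be non-empty and start at 0")
        return problems
    if any(a > b for a, b in zip(row_splits, row_splits[1:])):
        problems.append("row_splits is not non-decreasing")
    if row_splits[-1] != len(row_ids):
        problems.append("row_splits[-1] != len(row_ids)")
    if cached_tot_size >= 0 and cached_tot_size != len(row_ids):
        problems.append("cached_tot_size != len(row_ids)")
    num_rows = len(row_splits) - 1
    for i, r in enumerate(row_ids):
        # Element i must lie inside the range that row_splits gives for row r.
        if not (0 <= r < num_rows and row_splits[r] <= i < row_splits[r + 1]):
            problems.append(f"row_ids[{i}] = {r} disagrees with row_splits")
            break  # report only the first mismatch
    return problems


def check_reordering(order, keys):
    """Check that `order` is a permutation of range(len(keys)) sorting `keys`."""
    n = len(keys)
    if sorted(order) != list(range(n)):
        return ["order is not a permutation of 0..n-1"]
    reordered = [keys[i] for i in order]
    if any(a > b for a, b in zip(reordered, reordered[1:])):
        return ["keys are not sorted after reordering"]
    return []


# Two rows: elements 0 and 1 belong to row 0, element 2 to row 1.
print(check_ragged_layer([0, 2, 3], [0, 0, 1], cached_tot_size=3))
# Destination states 5, 7, 1 are sorted by visiting arcs 2, 0, 1.
print(check_reordering([2, 0, 1], [5, 7, 1]))
```

Running such checks on CPU copies of the arrays just before the call that crashes would show whether the inputs were already corrupted at that point.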
…On Wed, Jan 13, 2021 at 10:08 AM shanguanma ***@***.***> wrote:
Do the same after doing export K2_SYNC_KERNELS=1 .. wanna see if the error
was the first one.
Yes, I tried it again and the same error occurs. Note: I use a single GPU for
mmi_bigram_train.py.
The error is the same as in #569 (comment).
|
Please try running with this code: |
OK, I will try running it. |
Will fix it after lunch.
Wednesday, 13 January 2021, 00:14 +0800 from notifications@github.com <notifications@github.com>:
….. it could be a bug in GetTransposeReordering() which is called by
GetIncomingArcs().
If anyone has time to suggest what debug code to add, to verify the output
of that, it might be good. getting late for me.
>>
>>
>> Program received signal SIGABRT, Aborted.
>> 0x00002aaaaaf21387 in raise () from /lib64/libc.so.6
>>
>> (gdb) bt full
>> #0 0x00002aaaaaf21387 in raise () from /lib64/libc.so.6
>> No symbol table info available.
>> #1 0x00002aaaaaf22a78 in abort () from /lib64/libc.so.6
>> No symbol table info available.
>> #2 0x00002aab2cf36630 in k2::internal::Logger::~Logger (this=0x7fffffffb340, __in_chrg=<optimized out>) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/log.h:149
>> stack_trace = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
>> _M_p = 0x5555c7e0dee8 "[ Stack-Trace: ]\n/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x2aab3048cc12]\n/home4/md510/anaconda3/envs/k2-fsa2/lib/python3.8/sit"...}}
>> #3 0x00002aab2cf3ad80 in k2::Array1<int>::operator[] (this=0x7fffffffb680, i=64) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:280
>> ans = 21845
>> ret = cudaErrorIllegalAddress
>> __PRETTY_FUNCTION__ = "T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int]"
>> k2_nvtx_6 = {<No data fields>}
>> data = 0x2aabaae45100
>> type = k2::kCuda
>> #4 0x00002aab2cf385a0 in k2::Array1<int>::Back (this=0x7fffffffb680) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/array.h:289
>> __PRETTY_FUNCTION__ = "T k2::Array1<T>::Back() const [with T = int]"
>> #5 0x00002aab2d08e937 in k2::RaggedShape2 (row_splits=0x7fffffffb680, row_ids=0x7fffffffb6a0, cached_tot_size=35078) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu:112
>> k2_nvtx_65 = {<No data fields>}
>> __PRETTY_FUNCTION__ = "k2::RaggedShape k2::RaggedShape2(k2::Array1<int>*, k2::Array1<int>*, int32_t)"
>> ctx = {<std::__shared_ptr<k2::Context, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Context, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c69b9920, _M_refcount = {_M_pi = 0x5555c69b9910}}, <No data fields>}
>> axes = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
>> _M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
>> _M_start = 0x5555c69c4e38, _M_finish = 0x7fffffffb498, _M_end_of_storage = 0xffffffffffffb460}}, <No data fields>}
>> #6 0x00002aab2d08f662 in k2::RaggedShape3 (row_splits1=0x7fffffffb680, row_ids1=0x7fffffffb6a0, cached_tot_size1=35078, row_splits2=0x7fffffffb6c0, row_ids2=0x7fffffffb6e0,
>> cached_tot_size2=101526) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu:193
>> k2_nvtx_68 = {<No data fields>}
>> __PRETTY_FUNCTION__ = "k2::RaggedShape k2::RaggedShape3(k2::Array1<int>*, k2::Array1<int>*, int32_t, k2::Array1<int>*, k2::Array1<int>*, int32_t)"
>> shape1 = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
>> _M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
>> _M_start = 0x5555c69bd278, _M_finish = 0x7fffffffb5b8, _M_end_of_storage = 0x2aab29689143
>> <__gnu_cxx::__atomic_add_dispatch(_Atomic_word*, int)+46>}}, <No data fields>}}
>> temp_array = {dim_ = -962881248, byte_offset_ = 140737488337984,
>> region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fffffffb5a0, _M_refcount = {_M_pi = 0x12cf6eaa2}}, <No data fields>}}
>> #7 0x00002aab2cfc7398 in k2::GetIncomingArcs (fsas=..., dest_states=...) at /home4/md510/w2020/k2-fsa/k2/k2/csrc/fsa_utils.cu:837
>> k2_nvtx_76 = {<No data fields>}
>> __PRETTY_FUNCTION__ = "k2::Ragged<int> k2::GetIncomingArcs(k2::FsaVec&, const k2::Array1<int>&)"
>> c = @0x5555c8017fa0: {<std::__shared_ptr<k2::Context, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Context, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c69b9920, _M_refcount = {_M_pi = 0x5555c69b9910}}, <No data fields>}
>> dest_states_tensor = {shape = {layers_ = {<std::_Vector_base<k2::RaggedShapeLayer, std::allocator<k2::RaggedShapeLayer> >> = {
>> _M_impl = {<std::allocator<k2::RaggedShapeLayer>> = {<__gnu_cxx::new_allocator<k2::RaggedShapeLayer>> = {<No data fields>}, <No data fields>},
>> _M_start = 0x5555c8014070, _M_finish = 0x5555c8014100, _M_end_of_storage = 0x5555c8014100}}, <No data fields>}}, values = {dim_ = 101526, byte_offset_ = 0,
>> region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c8056db0, _M_refcount = {_M_pi = 0x5555c8056da0}}, <No data fields>}}}
>> num_fsas = 64
>> num_states = 35078
>> num_arcs = 101526
>> incoming_arcs_order = {dim_ = 101526, byte_offset_ = 0,
>> region_ = {<std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<k2::Region, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x5555c7fc3b10, _M_refcount = {_M_pi = 0x5555c7fc3b00}}, <No data fields>}}
>> ans_row_ids2 = {dim_ = 101526, byte_offset_ = 0,
>>
>>
>>
|
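For anyone reproducing a trace like the one above: it comes from running the Python script under gdb and asking for `bt full` after the SIGABRT. The frame that reports `ret = cudaErrorIllegalAddress` (`Array1<int>::operator[]`) is where the error was noticed, which may not be where it was caused, because CUDA reports kernel faults asynchronously. A rough sketch of how such a capture could be scripted is below. The helper name `debug_cuda_abort` and the script name in the usage line are hypothetical; `CUDA_LAUNCH_BLOCKING=1` is the standard CUDA environment variable that forces synchronous kernel launches.

```shell
# Hypothetical helper (name and usage are assumptions, not from this thread).
# Runs a command under gdb with synchronous CUDA kernel launches and prints
# a full backtrace once the program stops (for example on SIGABRT).
debug_cuda_abort() {
  # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an illegal
  # address tends to be reported close to the kernel that caused it.
  CUDA_LAUNCH_BLOCKING=1 gdb -batch -ex run -ex "bt full" --args "$@"
}
```

It could be called as `debug_cuda_abort python3 your_script.py`, where `your_script.py` stands in for whatever script triggers the abort.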
@shanguanma I think it should fix your problem. |
I applied your code to my k2 codebase and re-installed k2. Then I ran
OK, I will try it now. |
I applied your code and re-installed k2. When I run
|
Can you check that you did |
I added your #586 (comment) to my local k2 codebase and then re-installed. Is that the wrong way to do it? |
What does line 1221 in your local |
Maybe he merged with my PR?
…On Wed, Jan 13, 2021 at 2:22 PM Fangjun Kuang ***@***.***> wrote:
/home4/md510/w2020/k2-fsa/k2/k2/csrc/ragged_ops.cu(1221): error: variable "context" is not a type name
What does line 1221 in your local ragged_ops.cu look like? Is it the same as the one in #586?
|
Yes. I first merged your (@danpovey) code, then merged @csukuangfj's code.
|
Don't include my patch, just start from master and then merge his commit.
…On Wed, Jan 13, 2021 at 2:38 PM shanguanma ***@***.***> wrote:
Yes. I first merged your (@danpovey) code, then merged @csukuangfj's code.
Anyway, here is my code:
#if __CUDACC_VER_MAJOR__ > 10 || \
1192 (__CUDACC_VER_MAJOR__ == 10 && \
1193 (__CUDACC_VER_MINOR__ > 1 || \
1194 (__CUDACC_VER_MINOR__ == 1 && __CUDACC_VER_BUILD__ > 105)))
1195 // Enable it only for NVCC > 10.1.105
1196 //
1197 // Refer to LLNL/axom#88
1198 // NVCC 10.1.105 has a known issue for cub::DeviceRadixSort
1199 int32_t num_buckets = num_cols;
1200 int32_t num_elements = src.values.Dim();
1201 int32_t log_buckets = static_cast<int32_t>(ceilf(log2f(num_buckets)));
1202
1203 //Array1<int32_t> ans = Range(context, num_elements, 0);
1204 Array1<int32_t> order = Range(context, num_elements, 0);
1205 Array1<int32_t> src_tmp_out(context, num_elements);
1206 Array1<int32_t> ans(context, num_elements);
1207
1208 cudaStream_t stream = context->GetCudaStream();
1209
1210 size_t temp_storage_bytes = 0;
1211 K2_CUDA_SAFE_CALL(cub::DeviceRadixSort::SortPairs(
1212 //nullptr, temp_storage_bytes, src.values.Data(),
1213 //static_cast<int32_t *>(nullptr), ans.Data(), ans.Data(), num_elements, 0,
1214 //log_buckets, stream));
1215 nullptr, temp_storage_bytes, src.values.Data(), src_tmp_out.Data(),
1216 order.Data(), ans.Data(), num_elements, 0, log_buckets, stream));
1217
1218
1219 Array1<int8_t> d_temp_storage(
1220 //context, temp_storage_bytes + num_elements * sizeof(int32_t));
1221 Array1<int8_t> d_temp_storage(context, temp_storage_bytes));
1222
1223 K2_CUDA_SAFE_CALL(cub::DeviceRadixSort::SortPairs(
1224 //d_temp_storage.Data() + sizeof(int32_t) * num_elements,
1225 //temp_storage_bytes, src.values.Data(),
1226 //reinterpret_cast<int32_t *>(d_temp_storage.Data()), ans.Data(),
1227 //ans.Data(), num_elements, 0, log_buckets, stream));
1228 d_temp_storage.Data(), temp_storage_bytes, src.values.Data(),
1229 src_tmp_out.Data(), order.Data(), ans.Data(), num_elements, 0,
1230 log_buckets, stream));
1231
1232 //if (!kDisableDebug && !DisableChecks())
1233 CheckGetTransposeReordering(src, ans);
1234 return ans;
1235 #else
1236 //if (src.NumAxes() == 3)
1237 // return GetTransposeReorderingThreeAxesCuda(src, num_cols);
1238 if (src.NumAxes() == 3){
1239 Array1<int3_t> ans = GetTansposeReorderingThreeAxesCuda(src, num_cols);
1240 //if (!kDisableDebug && !DisableChecks())
1241 CheckGetTransposeReordering(src, ans);
1242 return ans;
|
Please use |
You did not resolve the merge conflicts correctly. |
Yes, I merged your code manually. I am not good with git commands, and I don't know how to merge your #586 (comment) automatically. |
I would suggest
|
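The advice in this part of the thread is to start from a clean master and let git merge the pull request, rather than copying the changes in by hand. A minimal sketch of that workflow follows. It is not the exact command the maintainers posted: the function name, the remote name `origin`, and the local branch name `pr-586` are assumptions, and it relies on GitHub publishing each pull request under `refs/pull/<N>/head`.

```shell
# Sketch only: merge a GitHub pull request into an up-to-date base branch.
# Arguments (all supplied by the caller): remote name, base branch, PR number.
merge_pr_from_clean_base() {
  remote="$1"
  base="$2"
  pr="$3"

  # Refuse to continue if tracked files carry local edits, so earlier manual
  # changes cannot leak into the merge. Untracked files (e.g. a build
  # directory) are ignored.
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    echo "tracked files have local changes; commit, stash, or discard them first" >&2
    return 1
  fi

  git checkout "$base" || return 1
  # Fast-forward only: fails if the local base has diverged from the remote.
  git pull --ff-only "$remote" "$base" || return 1
  # Fetch the pull request head into a local branch named pr-<N>.
  git fetch "$remote" "pull/${pr}/head:pr-${pr}" || return 1
  # Let git perform the merge; a non-zero status means conflicts were reported.
  git merge --no-edit "pr-${pr}"
}
```

From inside the k2 checkout this might be called as `merge_pr_from_clean_base origin master 586`, followed by a rebuild and re-install. If git reports conflicts, `git merge --abort` returns the tree to its pre-merge state.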
Thanks, I will follow your suggestion. |
Solved. Thanks a lot, @danpovey and @csukuangfj. |
So did it work? |
Yes, Dan, it works. |
I installed k2 on another server and encountered an error during installation. The install steps are as follows:
The log is as follows:
Then I ran
make _k2
and the error is as follows: