Sentiment analysis failures: invalid device function #34

elvinpoon · 2016-09-03T06:12:27Z

Here is the error output:

./train.sh
I0903 14:10:57.917793 18690 Util.cpp:144] commandline: /data2/package/pypaddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=1 --trainer_count=4 --num_passes=10 --log_period=10 --dot_period=20 --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I0903 14:11:01.704715 18690 Util.cpp:113] Calling runInitFunctions
I0903 14:11:01.705032 18690 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-03 14:11:02,367 networks.py:1122] The input order is [word, label]
[INFO 2016-09-03 14:11:02,368 networks.py:1129] The output order is [__cost_0__]
I0903 14:11:02.395427 18690 Trainer.cpp:169] trainer mode: Normal
I0903 14:11:02.395754 18690 MultiGradientMachine.cpp:108] numLogicalDevices=1 numThreads=4 numDevices=4
F0903 14:11:02.400593 18690 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
    @     0x7fc9c37175cd  google::LogMessage::Fail()
    @     0x7fc9c3719433  google::LogMessage::SendToLog()
    @     0x7fc9c371715b  google::LogMessage::Flush()
    @     0x7fc9c3719e1e  google::LogMessageFatal::~LogMessageFatal()
    @           0x7d65b2  hl_gpu_apply_unary_op<>()
    @           0x79d156  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x79ccf0  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x780733  paddle::BaseMatrixT<>::zero()
    @           0x561960  paddle::Parameter::enableType()
    @           0x564531  paddle::parameterInitNN()
    @           0x567fe9  paddle::NeuralNetwork::init()
    @           0x55ee4b  paddle::TrainerThread::TrainerThread()
    @           0x55fab7  paddle::MultiGradientMachine::MultiGradientMachine()
    @           0x58788e  paddle::GradientMachine::create()
    @           0x6e296d  paddle::TrainerInternal::init()
    @           0x6dc144  paddle::Trainer::init()
    @           0x54622d  main
    @     0x7fc9c2699830  __libc_start_main
    @           0x54db19  _start
    @              (nil)  (unknown)
/data2/package/pypaddle/bin/paddle: line 46: 18690 Aborted                 ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}


gangliao · 2016-09-03T07:29:06Z

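See #18. It looks like you are using CUDA 8.0 with a recent GPU.

"invalid device function" indicates a CUDA/GPU incompatibility: the binary most likely contains no kernel code compiled for your GPU's architecture.
You may be able to fix it by modifying the CMake configuration.

Open cmake/flags.cmake and add the following code: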

if (CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
endif()

Then rebuild the project.

elvinpoon · 2016-09-06T01:40:51Z

I tried this, but it doesn't work; I still get the same error.

gangliao · 2016-10-08T02:02:09Z

Fix CUDA_VERSION Comparsion #165
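
One possible reason the suggested snippet had no effect, consistent with the title of the fix referenced above but not confirmed in this thread: CMake's VERSION_GREATER is a strict comparison, so with CUDA 8.0 the test CUDA_VERSION VERSION_GREATER "8.0" is false and the sm_60 flag is never appended. The sketch below is a standalone script that demonstrates this. The hard-coded CUDA_VERSION value and the empty __arch_flags are assumptions for illustration; in the real build they come from CUDA detection and cmake/flags.cmake.

```cmake
# Standalone sketch; run with: cmake -P check_cuda_version.cmake
# Assumption: CUDA detection reports "8.0" for a CUDA 8.0 toolkit.
set(CUDA_VERSION "8.0")
set(__arch_flags "")

# Strict comparison, as in the snippet suggested in this thread.
# "8.0" is not greater than "8.0", so this branch is skipped.
if (CUDA_VERSION VERSION_GREATER "8.0")
    message(STATUS "strict check: sm_60 flag would be added")
else()
    message(STATUS "strict check: sm_60 flag is skipped")
endif()

# Inclusive comparison. NOT ... VERSION_LESS works on older CMake
# releases that lack VERSION_GREATER_EQUAL (added in CMake 3.7).
if (NOT CUDA_VERSION VERSION_LESS "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
    message(STATUS "inclusive check: sm_60 flag is added")
endif()

message(STATUS "arch flags:${__arch_flags}")
```

If this is the cause, using the inclusive form in cmake/flags.cmake and rebuilding from a clean build directory should make the extra -gencode flag take effect. The architecture in the flag still has to be compatible with the GPU actually installed.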


gangliao added feature request labels Sep 3, 2016

gangliao closed this as completed Sep 5, 2016

hedaoyuan mentioned this issue Jul 18, 2017

Error when running the GPU version on a cloud machine #2931

Closed

typhoonzero mentioned this issue Aug 4, 2017

Paddle inference fails when running on a P4 machine #3206

Closed

typhoonzero mentioned this issue Mar 21, 2018

RuntimeError: function_attributes(): after cudaFuncGetAttributes: invalid device function #9290

Closed

xiuechen mentioned this issue Oct 28, 2019

Inference produces a core dump; could you help find the cause? Both training and inference use Paddle v1.3.0 #20859

Closed

DemoMoon mentioned this issue Mar 24, 2021

How can oneDNN improve DeepSpeech's speech-processing performance #31838

Closed

marsbzp mentioned this issue Jan 11, 2023

Crash in the RNN operator when calling the C++ inference library from multiple threads!!!! #49737

Open

chlyzzo mentioned this issue Mar 29, 2023

paddle/fluid/core_avx.so paddle::memory::allocation::MemoryMapFdSet::Clear() #52269

Closed
