update #24

niuliling123 · 2021-08-11T05:30:29Z

PR types

PR changes

Describe

* add gradients_with_optimizer api * modify gradients_with_optimizer * add gradients_with_optimizer api into paddle.auto.backward_mode * add gradients_with_optimizer test case * add doc for gradients_with_optimizer * add doc for gradients_with_optimizer

* fix paddle.optimizer test=document_fix * fix paddle.optimizer test=document_fix * fix bugs in paddle.nn.functional document test=document_fix * fix bugs in paddle.nn.functional document test=document_fix * fix bugs in paddle.nn.functional document test=document_fix * fix bugs in paddle.nn.functional document test=document_fix

* revert commit id 34212

* fix bug of p2p for partial * fix error

* add not_equal NPU op * add not_equal NPU op * add not_equal NPU op * add not_equal NPU op

…elop (#34625)

* add NPU support for zero_copy_tensor. * revert unnecessary codes. * revert unnecessary codes.

…34519)

…st apis (#34310) * replace boost::algorithm::ends_with with self-defined ends_with function * remove BOOST macro in certain operators * remove boost::lexical_cast * add test for string_helper * add more test cases for string_helper * modify join_string func and test case * fix build_strategy_test failure * remove string_helper_test from parallel_UT_rule.py

* Support Mixed Precision training in @to_static * fix block.vars logic * fix GPU training loss diff * remove unused code

* support bool dtype for paddle.sum

…ly (#34556) * integrated gast library * integrated gast library * fix unittest and remove ast2.py * remove 'gast' from __all__ in __init__.py * add copyright in other files * fix copyright

* first test version * add test exec; * add data transfer; test=develop * add new exec head; * add memcpy; test=develop * add python fetch * add new test * add graph node; test=develop * remove useless new executor test; test=develop * remove gperf dependency; test=develop * fix compile bugs; test=develop * remove useless code; test=develop * remove useless code; test=develop * add unit test; test=develop * polish code; test=develop * polish code; test=develop * add interpreter cmakefile; test=develop * remove useless code; test=develop

* Add relu6 and relu6_grad npu op * fixed pre-commit-config.yaml * fixed for CI

* [NPU] Support npu op: (1) cos (2) cos_grad * Update test_cos_op_npu.py * Update activation_op_npu.cc * rm redundant {1}

…#34603) * fix ut * decrease gpu memory consumption * remove exclusive

* add eye npu op * remove useless headers * code style * Update eye_op_npu.cc * Update eye_op_npu.cc * remove useless code in test file * code style check * change Copyright to 2021 * add test case and do some fix * fix * update code * fix for CI * return * fix

* add lock * fix typo

This reverts commit 090c863.

* Fix error of HSigmoidLoss * update unittest * update unittest

* Support npu kernel for expand_as_v2 op * modify the registry data type name * fix unit test * fix npu compile error, test=develop * fix compute function Co-authored-by: qili93 <qili93@qq.com>

* Support npu kernel for tile op * modify according to the comments * fix compute function

* fix for div zero * fix err;test=develop * fix lod

…any (#34613) * add any.hpp to utils and replace boost::any with self-defined paddle::any * add copy any.hpp to custom op depends * modify any.hpp include path * remove boost from setup.py.in * add copy any.hpp to custom op depends * move any.hpp to paddle/utils/ dirs * move any.h to extension/include directory * copy utils to the right directories

* fix npu compile error, test=develop * add fill constant batch size like op npu,test=develop Co-authored-by: qili93 <qili93@qq.com>

* Support npu kernel for fill_any_like op * modify the description of exception * remove useless template element * remove useless decorator * fix the code format error

* [NPU] add squared_l2_norm squared_l2_norm_grad and tests * [NPU] replace Square&ReduceSumD with SquareSumV1

Add Kernel primitives API: ReadData, WriteData, ComputeFunctor

#34642) * fix npu compile error, test=develop * [NPU] Support npu kernel for flatten_contiguous_range op, test=develop * [NPU] Support npu kernel for flatten_contiguous_range op, test=develop * [NPU] Support npu kernel for flatten_contiguous_range op, test=develop * [NPU] Support npu kernel for flatten_contiguous_range op, test=develop * [NPU] Support npu kernel for flatten_contiguous_range op, test=develop * [NPU] Support npu kernel for flatten_contiguous_range op, test=develop * [NPU] Support npu kernel for flatten_contiguous_range op, test=develop * Update flatten_op_npu.cc * Update flatten_op_npu.cc Co-authored-by: qili93 <qili93@qq.com>
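For reference, `flatten_contiguous_range` collapses the axes from `start_axis` through `stop_axis` (inclusive) into a single axis whose size is the product of the collapsed extents. The pure-Python sketch below shows only that output-shape rule. It assumes negative axes count from the end, as `paddle.flatten` documents. `flattened_shape` is a hypothetical helper, not part of Paddle, and this is not the NPU kernel.

```python
from functools import reduce
import operator


def flattened_shape(shape, start_axis=0, stop_axis=-1):
    """Hypothetical helper: shape after merging axes start_axis..stop_axis (inclusive)."""
    rank = len(shape)
    # Negative axes count from the end (assumed to match paddle.flatten).
    start = start_axis + rank if start_axis < 0 else start_axis
    stop = stop_axis + rank if stop_axis < 0 else stop_axis
    if not 0 <= start <= stop < rank:
        raise ValueError("expected 0 <= start_axis <= stop_axis < rank")
    # The merged dimension is the product of the collapsed extents.
    merged = reduce(operator.mul, shape[start:stop + 1], 1)
    return list(shape[:start]) + [merged] + list(shape[stop + 1:])


print(flattened_shape([2, 3, 4, 5], start_axis=1, stop_axis=2))  # [2, 12, 5]
```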

* add not used output var to gc_check_list; test=develop * add useless output to gc check list; test=develop

* Add NPU kernel for TopKV2 op * deleted unnecessary cache file static_mode_white_list.cpython-37.pyc * A draft for error checking * A commit with accuracy error for float32 data * Modify codes according to the review comments * Modify codes according to the review comments

)

* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with compile and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, because the direction is not an option in these cases.
* add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernel * fix bugs in python APIs * add cuda fft c2r grad kernel functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from parm list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation if n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (PaddlePaddle#30) * Documentation of the common interfaces of c2r and c2c (PaddlePaddle#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (PaddlePaddle#32) * clean code * Add numpy-based implementation of spectral ops (PaddlePaddle#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (PaddlePaddle#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (PaddlePaddle#35) * Add overlap_add op unittest.
* Register complex kernels of squeeze/unsqueeze op. * Add stft/istft api unittest. * Add unittest for fft helper functions (PaddlePaddle#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (PaddlePaddle#37) * Unittest of op with FFT C2C, C2R and r2c added (PaddlePaddle#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <lijiaqi0612@163.com> * add fft related options to CMakeLists.txt * fix typos and clean code (PaddlePaddle#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (PaddlePaddle#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (PaddlePaddle#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoiding using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (PaddlePaddle#42) * Add api doc and update unittest. (PaddlePaddle#43) * Add doc strings.
* Update overlap_add op unittest * fix MKL-based FFT implementation (PaddlePaddle#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (PaddlePaddle#45) * use dynload for cufft (PaddlePaddle#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (PaddlePaddle#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (PaddlePaddle#48) 1. use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (PaddlePaddle#49) fix conflicts and merge upstream * fix compile error: only link dynload_cuda when cuda is available (PaddlePaddle#50) * fix compile error: only link dynload_cuda when cuda is available * fix dynload for cufft on windows (PaddlePaddle#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (PaddlePaddle#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (PaddlePaddle#55) explicitly specify capture mode for lambdas * fix fft sample (PaddlePaddle#53) * fix fft sample * update scipy and numpy version for unittests of fft (PaddlePaddle#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (PaddlePaddle#57) * Remove cache of cuFFT & Disable ONEMKL (PaddlePaddle#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 does not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <jeff41404@gmail.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com> Co-authored-by: KP <109694228@qq.com> Co-authored-by: lijiaqi <lijiaqi0612@163.com> Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>
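Several of the FFT items above ("fft r2c onesided", "fill fft r2c result with conjugate symmetry") depend on one property. The DFT of a real signal satisfies X[n-k] = conj(X[k]), so a onesided r2c transform only has to keep n//2 + 1 bins, and the rest can be recovered. The sketch below illustrates this with a naive O(n^2) DFT in pure Python. It is not the pocketfft, MKL, or cuFFT code path, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT: X[k] = sum_t x[t] * exp(-2j*pi*k*t/n)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def rfft_onesided(x):
    """Keep only the n//2 + 1 bins that are not redundant for real input."""
    return dft(x)[: len(x) // 2 + 1]


def fill_conjugate_symmetry(half, n):
    """Rebuild all n bins from the onesided half via X[n-k] = conj(X[k])."""
    full = list(half) + [0j] * (n - len(half))
    for k in range(len(half), n):
        # n - k always falls inside the kept half, so it is already filled.
        full[k] = full[n - k].conjugate()
    return full


signal = [1.0, 2.0, -0.5, 3.0, 0.25]
half = rfft_onesided(signal)  # 3 bins for n = 5
rebuilt = fill_conjugate_symmetry(half, len(signal))
```

Because only about half of the complex outputs are stored, the onesided form roughly halves output memory for real inputs. The full spectrum is recoverable whenever a caller needs it.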

Add first-order-model to applications

kuizhiqing and others added 30 commits August 4, 2021 15:38

Elastic as module (#34572)

1f76a2f

Revert pull request 34212 (#34558)

0989211

* revert commit id 34212

add cudaEvent destructor function (#34610)

090c863

[HybridParallel]Fix bug of p2p for partial_send/recv (#34615)

4cc3d9a

* fix bug of p2p for partial * fix error

[NPU] Support npu op flatten2 (#34579)

8144a73

add not_equal NPU op (#34560)

7e707ce

* add not_equal NPU op * add not_equal NPU op * add not_equal NPU op * add not_equal NPU op

optimize ClipGradByGlobalNorm (#34586)

4d6f8f2
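For context, `ClipGradByGlobalNorm` rescales every gradient by one shared factor, clip_norm / max(global_norm, clip_norm). Here global_norm is the L2 norm taken over all gradients together. The pure-Python sketch below applies that rule to gradients stored as plain lists. It says nothing about how #34586 optimized the Paddle implementation, and `clip_by_global_norm` is a hypothetical name.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by clip_norm / max(global_norm, clip_norm)."""
    # global_norm = sqrt of the summed squares of every element of every gradient
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Taking max(...) leaves gradients unchanged when they are already within
    # clip_norm, and avoids dividing by zero for all-zero gradients.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads]


clipped = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)  # global norm is 5.0
```

Using a single shared scale preserves the direction of the combined gradient vector. Clipping each tensor separately would not.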

[pass_enhance]fix the mkldnn model performance drop problem. test=develop (#34625)

e47d8a5

[NPU] Support npu op index_select (#34611)

7a38b76

add NPU support for zero_copy_tensor. (#34629)

a68709d

* add NPU support for zero_copy_tensor. * revert unnecessary codes. * revert unnecessary codes.

Support Ternary ops in elementwise and broadcast (#33976)

1d7b75d
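#33976 extends the elementwise and broadcast kernels to ops with three inputs. The shape rule is assumed here to be the NumPy-style one that Paddle documents for broadcasting, applied across all inputs. Shapes are aligned from the right, and in each position the sizes must be equal or 1. The pure-Python sketch below computes the common output shape. `broadcast_shape` is a hypothetical helper that illustrates the rule, not the kernel.

```python
def broadcast_shape(*shapes):
    """Hypothetical helper: common output shape under NumPy-style broadcasting."""
    rank = max(len(s) for s in shapes)
    out = []
    # Walk dimensions from the right (trailing axes are aligned).
    for i in range(1, rank + 1):
        dims = {s[-i] for s in shapes if len(s) >= i}
        dims.discard(1)  # size-1 dims stretch to match the others
        if len(dims) > 1:
            raise ValueError(f"incompatible sizes {sorted(dims)} at axis -{i}")
        out.append(dims.pop() if dims else 1)
    return out[::-1]


# Three inputs, as in a ternary elementwise op:
print(broadcast_shape((2, 1, 4), (3, 1), (4,)))  # [2, 3, 4]
```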

optimize pipeline performance with recompute and amp, test=allcase (#34519)

911c859

[Dy2Stat]Support Mixed Precision training in @to_static (#34562)

a842828

* Support Mixed Precision training in @to_static * fix block.vars logic * fix GPU training loss diff * remove unused code

fix output dtype for paddle.sum (#34313)

ff062a4

* support bool dtype for paddle.sum

[Dy2st]Integrated gast library to fix compatibility problem permanently (#34556)

a9ee383

* integrated gast library * integrated gast library * fix unittest and remove ast2.py * remove 'gast' from __all__ in __init__.py * add copyright in other files * fix copyright

[NPU] Add relu6 and relu6_grad npu op (#34596)

6839994

* Add relu6 and relu6_grad npu op * fixed pre-commit-config.yaml * fixed for CI
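As a reference for what the new NPU kernels compute, relu6 clips its input to [0, threshold], with threshold defaulting to 6. Its gradient passes `dout` through only where the forward output lies strictly inside that range. The pure-Python sketch below computes the gradient from the forward output. It assumes Paddle's default threshold of 6.0, and the boundary behavior at exactly 0 and 6 is an assumption that should be checked against the CPU/GPU kernels.

```python
def relu6(xs, threshold=6.0):
    """Forward: clip each element to [0, threshold]."""
    return [min(max(0.0, x), threshold) for x in xs]


def relu6_grad(outs, douts, threshold=6.0):
    """Backward from the forward output: pass dout only where 0 < out < threshold."""
    return [d if 0.0 < o < threshold else 0.0 for o, d in zip(outs, douts)]


out = relu6([-1.0, 0.0, 3.0, 6.0, 7.5])
print(out)                                # [0.0, 0.0, 3.0, 6.0, 6.0]
print(relu6_grad(out, [1.0] * len(out)))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```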

[NPU] Support npu op: (1) cos (2) cos_grad (#34573)

6151ccd

* [NPU] Support npu op: (1) cos (2) cos_grad * Update test_cos_op_npu.py * Update activation_op_npu.cc * rm redundant {1}

rm detach (#34644)

6c8a10a

[paddle-trt] fix_teller_reshape (#34583)

4a52c0c

fix dygraph has_grad (#34649)

68377b4

Fix ut test_pe_fix_op_run_order by using smaller model and batch size (#34603)

06651c4

* fix ut * decrease gpu memory consumption * remove exclusive

fix log_softmax if any dimension is 0-d (#34635)

436a9f1

[NPU] Use another method to avoid the c_allreduce_sum core dump (#34619)

c91b1e0

del wait in sharding for npu (#34637)

ce73349

fix npu compile error, test=develop (#34656)

c16421c

zhwesky2010 and others added 26 commits August 9, 2021 14:23

Increase the speed of incremental compilation (#34616)

aab4d6e

limit chunk.axis (#34630)

3380778

[NPU] Support npu op flatten2_grad (#34669)

7afd31b

fix_trt_int8 (#34704)

8009257

[NPU] add lock for npu_pinned_allocator (#34700)

e285258

* add lock * fix typo

Revert "add cudaEvent destructor function (#34610)" (#34720)

bf54534

This reverts commit 090c863.

Fix error of HSigmoidLoss (#34719)

3f32b73

* Fix error of HSigmoidLoss * update unittest * update unittest

Support npu kernel for expand_as_v2 op (#34620)

202c240

* Support npu kernel for expand_as_v2 op * modify the registry data type name * fix unit test * fix npu compile error, test=develop * fix compute function Co-authored-by: qili93 <qili93@qq.com>

Support npu kernel for tile op (#34606)

8a6aa59

* Support npu kernel for tile op * modify according to the comments * fix compute function

kill all procs on exiting (#34741)

84eb675

fix for div zero (#34724)

d86c26d

* fix for div zero * fix err;test=develop * fix lod

add cudaEvent destructor function (#34734)

f30a5c4

[hybrid] refine sharding code (#34678)

a160379

[bug fix] fix unfold fpe bug (#34673)

4f4662b

fix a quantization bug (#34647)

cfd49ac

[NPU] Support op kernel for Fill constant batch size like op (#34721)

ed2641c

* fix npu compile error, test=develop * add fill constant batch size like op npu,test=develop Co-authored-by: qili93 <qili93@qq.com>

Support npu op fill_any_like (#34518)

e8df322

* Support npu kernel for fill_any_like op * modify the description of exception * remove useless template element * remove useless decorator * fix the code format error

[NPU] add squared_l2_norm squared_l2_norm_grad and tests (#34708)

b64312f

* [NPU] add squared_l2_norm squared_l2_norm_grad and tests * [NPU] replace Square&ReduceSumD with SquareSumV1
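Mathematically, squared_l2_norm reduces a tensor to the sum of the squares of its elements. Its gradient is 2 * x * dout, where dout is the scalar upstream gradient. The commit fuses the Square + ReduceSumD pair into a single SquareSumV1 call, and both versions should compute the same value. The pure-Python sketch below shows only that math, not the NPU kernels.

```python
def squared_l2_norm(xs):
    """Forward: sum of squares (square, then reduce-sum)."""
    return sum(x * x for x in xs)


def squared_l2_norm_grad(xs, dout):
    """Backward: d(sum x_i^2)/dx_i = 2 * x_i, scaled by the scalar upstream grad."""
    return [2.0 * x * dout for x in xs]


xs = [1.0, -2.0, 3.0]
print(squared_l2_norm(xs))            # 14.0
print(squared_l2_norm_grad(xs, 0.5))  # [1.0, -2.0, 3.0]
```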

fix format_string_append test case,test=develop (#34753)

8b9bd16

Kernel primitives api (#34672)

8f9d573

Add Kernel primitives API: ReadData, WriteData, ComputeFunctor

Add no need output to gc check list (#34754)

17c1dae

* add not used output var to gc_check_list; test=develop * add useless output to gc check list; test=develop

modified reduce_sum_op and reduce_mean_op for higher_performance (#32885)

6a9fac1

Optimize fused allreduce in raw program (#34509)

4d2994c

niuliling123 merged commit 7addd79 into niuliling123:develop Aug 11, 2021

niuliling123 pushed a commit that referenced this pull request Sep 19, 2022

Merge pull request #24 from LielinJiang/first-order

a7088eb

Add first-order-model to applications
