Something has changed in the last commits ? #326

ghost · 2018-09-20T16:35:43Z

Hi,

I'm using mkl-dnn in my convolution layer for all strided convolutions and for all convolutions with a 1x1 kernel.
Everything was working 3 days ago. After bringing the latest mkl-dnn changes (including the new Intel TBB support) into my repo and rebuilding my code as I always do, all my trained models still run, but during testing they behave like untrained models, as if the trained weights were ignored. Nothing in the non-mkl-dnn code has changed that would explain this strange behaviour.
In all convolutions I use a fixed nchw format for the input/output and the oihw format for the weights.
Is it possible that something has changed in how mkl-dnn behaves when convolutions are used this way?

Thanks

Environment

Intel Haswell
Windows 10 x64
Visual Studio 2017
Built with OpenMP and linked against libiomp5md.lib and mklml.lib
(vcomp.lib excluded from linking)
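The thread does not show the layer code itself, so as an illustration only: with fixed nchw activations and oihw weights, element offsets in the plain buffers follow directly from the dimension order, and a naive reference convolution built on those offsets can be compared against mkl-dnn's output to check whether the trained weights are actually being applied. This is a hypothetical sketch (the names `nchw_offset`, `oihw_offset` and `reference_conv_nchw` are not part of mkl-dnn) that assumes dense buffers with no padding, dilation, groups or bias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of element (n, c, h, w) in a dense nchw buffer.
inline std::size_t nchw_offset(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                               std::size_t C, std::size_t H, std::size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

// Hypothetical helper: offset of weight (o, i, kh, kw) in a dense oihw buffer.
inline std::size_t oihw_offset(std::size_t o, std::size_t i, std::size_t kh, std::size_t kw,
                               std::size_t I, std::size_t KH, std::size_t KW) {
    return ((o * I + i) * KH + kh) * KW + kw;
}

// Naive reference forward convolution (assumed: no padding, no dilation, no bias).
// src: N x C x H x W (nchw), wei: O x C x KH x KW (oihw), result: N x O x OH x OW (nchw).
std::vector<float> reference_conv_nchw(const std::vector<float>& src,
                                       const std::vector<float>& wei,
                                       std::size_t N, std::size_t C, std::size_t H, std::size_t W,
                                       std::size_t O, std::size_t KH, std::size_t KW,
                                       std::size_t stride) {
    assert(stride > 0 && H >= KH && W >= KW);
    assert(src.size() == N * C * H * W && wei.size() == O * C * KH * KW);
    const std::size_t OH = (H - KH) / stride + 1;
    const std::size_t OW = (W - KW) / stride + 1;
    std::vector<float> dst(N * O * OH * OW, 0.0f);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t o = 0; o < O; ++o)
            for (std::size_t oh = 0; oh < OH; ++oh)
                for (std::size_t ow = 0; ow < OW; ++ow) {
                    float acc = 0.0f;
                    // Sum over input channels and the kernel window.
                    for (std::size_t c = 0; c < C; ++c)
                        for (std::size_t kh = 0; kh < KH; ++kh)
                            for (std::size_t kw = 0; kw < KW; ++kw)
                                acc += src[nchw_offset(n, c, oh * stride + kh, ow * stride + kw, C, H, W)]
                                     * wei[oihw_offset(o, c, kh, kw, C, KH, KW)];
                    dst[nchw_offset(n, o, oh, ow, O, OH, OW)] = acc;
                }
    return dst;
}
```

For a 1x1 kernel with stride 2, for example, this reduces to a per-pixel weighted sum over the input channels, sampled at every second row and column.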

vpirogov · 2018-09-20T17:15:17Z

The TBB support introduced massive threading-related changes, although the configuration you describe is covered by validation. Could you please run the MKL-DNN tests and check whether any of them fail in your configuration?

ghost · 2018-09-21T00:31:04Z

Hi,

I get errors in some convolution unit tests:

1>------ Build started: Project: RUN_TESTS, Configuration: Release x64 ------
1>Test project C:/Users/dhaen/Downloads/mkl-dnn/build
1> Start 1: api-c
1> 1/35 Test #1: api-c ......................................... Passed 0.10 sec
1> Start 2: test_batch_normalization
1> 2/35 Test #2: test_batch_normalization ...................... Passed 31.33 sec
1> Start 3: test_concat
1> 3/35 Test #3: test_concat ................................... Passed 1.07 sec
1> Start 4: test_convolution_backward_data_f32
1> 4/35 Test #4: test_convolution_backward_data_f32 ............***Failed 253.29 sec
1> Start 5: test_convolution_backward_data_s16s16s32
1> 5/35 Test #5: test_convolution_backward_data_s16s16s32 ...... Passed 37.48 sec
1> Start 6: test_convolution_backward_weights_f32
1> 6/35 Test #6: test_convolution_backward_weights_f32 .........***Failed 335.88 sec
1> Start 7: test_convolution_backward_weights_s16s16s32
1> 7/35 Test #7: test_convolution_backward_weights_s16s16s32 ... Passed 35.25 sec
1> Start 8: test_convolution_format_any
1> 8/35 Test #8: test_convolution_format_any ................... Passed 0.91 sec
1> Start 9: test_convolution_forward_f32
1> 9/35 Test #9: test_convolution_forward_f32 ..................***Failed 146.34 sec
1> Start 10: test_convolution_forward_s16s16s32
1>10/35 Test #10: test_convolution_forward_s16s16s32 ............ Passed 191.28 sec
1> Start 11: test_convolution_forward_u8s8fp
1>11/35 Test #11: test_convolution_forward_u8s8fp ............... Passed 1.30 sec
1> Start 12: test_convolution_forward_u8s8s32
1>12/35 Test #12: test_convolution_forward_u8s8s32 .............. Passed 0.98 sec
1> Start 13: test_convolution_relu_forward_f32
1>13/35 Test #13: test_convolution_relu_forward_f32 .............***Failed 31.32 sec
1> Start 14: test_convolution_relu_forward_neg_slope_f32
1>14/35 Test #14: test_convolution_relu_forward_neg_slope_f32 ...***Failed 31.55 sec
1> Start 15: test_convolution_relu_forward_s16s16s32
1>15/35 Test #15: test_convolution_relu_forward_s16s16s32 ....... Passed 34.71 sec
1> Start 16: test_deconvolution
1>16/35 Test #16: test_deconvolution ............................***Failed 4.00 sec
1> Start 17: test_eltwise
1>17/35 Test #17: test_eltwise .................................. Passed 29.42 sec
1> Start 18: test_gemm
1>18/35 Test #18: test_gemm ..................................... Passed 409.84 sec
1> Start 19: test_iface_attr
1>19/35 Test #19: test_iface_attr ............................... Passed 1.08 sec
1> Start 20: test_iface_pd_iter
1>20/35 Test #20: test_iface_pd_iter ............................ Passed 0.94 sec
1> Start 21: test_inner_product_backward_data
1>21/35 Test #21: test_inner_product_backward_data .............. Passed 1.21 sec
1> Start 22: test_inner_product_backward_weights
1>22/35 Test #22: test_inner_product_backward_weights ........... Passed 3.02 sec
1> Start 23: test_inner_product_forward
1>23/35 Test #23: test_inner_product_forward .................... Passed 1.43 sec
1> Start 24: test_lrn_backward
1>24/35 Test #24: test_lrn_backward ............................. Passed 15.25 sec
1> Start 25: test_lrn_forward
1>25/35 Test #25: test_lrn_forward .............................. Passed 5.76 sec
1> Start 26: test_memory
1>26/35 Test #26: test_memory ................................... Passed 0.96 sec
1> Start 27: test_mkldnn_threading
1>27/35 Test #27: test_mkldnn_threading ......................... Passed 0.97 sec
1> Start 28: test_pooling_backward
1>28/35 Test #28: test_pooling_backward ......................... Passed 87.18 sec
1> Start 29: test_pooling_forward
1>29/35 Test #29: test_pooling_forward .......................... Passed 60.02 sec
1> Start 30: test_relu
1>30/35 Test #30: test_relu ..................................... Passed 3.16 sec
1> Start 31: test_reorder
1>31/35 Test #31: test_reorder .................................. Passed 4.62 sec
1> Start 32: test_softmax_backward
1>32/35 Test #32: test_softmax_backward ......................... Passed 5.06 sec
1> Start 33: test_softmax_forward
1>33/35 Test #33: test_softmax_forward .......................... Passed 4.84 sec
1> Start 34: test_sum
1>34/35 Test #34: test_sum ...................................... Passed 3.47 sec
1> Start 35: benchdnn
1>35/35 Test #35: benchdnn ...................................... Passed 0.03 sec
1>
1>83% tests passed, 6 tests failed out of 35
1>
1>Total Test time (real) = 1798.61 sec
1>
1>The following tests FAILED:
1> 4 - test_convolution_backward_data_f32 (Failed)
1> 6 - test_convolution_backward_weights_f32 (Failed)
1> 9 - test_convolution_forward_f32 (Failed)
1> 13 - test_convolution_relu_forward_f32 (Failed)
1> 14 - test_convolution_relu_forward_neg_slope_f32 (Failed)
1> 16 - test_deconvolution (Failed)
1>Errors while running CTest
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: The command "setlocal
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: "C:\Program Files\CMake\bin\ctest.exe" --force-new-ctest-process -C Release
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :cmEnd
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: endlocal & call :cmErrorLevel %errorlevel% & goto :cmDone
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :cmErrorLevel
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: exit /b %1
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :cmDone
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: if %errorlevel% neq 0 goto :VCEnd
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :VCEnd" exited with code 8.
1>Done building project "RUN_TESTS.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 1 up-to-date, 0 skipped ==========

emfomenk · 2018-09-21T00:57:27Z

@zeno40,

Could you please post the CMake output?
Also, could you please specify the exact hardware?

ghost · 2018-09-21T01:09:37Z

cmake output:
CMake Deprecation Warning at CMakeLists.txt:21 (cmake_policy):
The OLD behavior for policy CMP0048 will be removed from a future version
of CMake.

The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.

CMake Deprecation Warning at CMakeLists.txt:22 (cmake_policy):
The OLD behavior for policy CMP0054 will be removed from a future version
of CMake.

The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.

Selecting Windows SDK version 10.0.17134.0 to target Windows 10.0.17763.
CMAKE_BUILD_TYPE is unset, defaulting to Release
Detecting Intel(R) MKL: trying mklml_intel
Intel(R) MKL: include C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/include
Intel(R) MKL: lib C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/mklml.lib
Intel(R) MKL: OpenMP lib C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/libiomp5md.lib
Intel(R) MKL: dll C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/mklml.dll
Intel(R) MKL: OpenMP dll C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/libiomp5md.dll
VTune profiling environment is unset
Configuring done

Hardware: Intel Haswell (Devil's Canyon)
OS: Windows 10/VS2017

emfomenk · 2018-09-21T01:10:57Z

Regarding the CPU: how many cores does it have?

// I still cannot reproduce the issue on my side...

ghost · 2018-09-21T01:14:36Z

I found the culprit! I was compiling the mkldnn project with /std:c++latest instead of the default language standard.
With the default setting everything works again, and all unit tests pass.
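Since the failures depended on the language-standard switch, it can help to confirm which standard a given target is actually compiled with. The sketch below is a hypothetical diagnostic, not part of mkl-dnn: compiled into a source file of the target in question, it reports the standard macro for that translation unit. On MSVC, `__cplusplus` stays at 199711L unless /Zc:__cplusplus is passed, so the MSVC-specific `_MSVC_LANG` macro is used there instead. It does not detect /permissive-.

```cpp
#include <cassert>
#include <cstdio>

// Returns the C++ standard value for the current translation unit.
// MSVC: _MSVC_LANG reflects the /std: switch; elsewhere use __cplusplus.
long compiled_cpp_standard() {
#if defined(_MSVC_LANG)
    return static_cast<long>(_MSVC_LANG);
#else
    return static_cast<long>(__cplusplus);
#endif
}

// Hypothetical helper: print the value so it can be compared between the
// library build and the application build.
void print_cpp_standard(const char* label) {
    std::printf("%s: C++ standard macro = %ld\n", label, compiled_cpp_standard());
}
```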

emfomenk · 2018-09-21T01:53:34Z

:) great, thx for the update!

ghost · 2018-09-27T22:44:22Z

For your information: the same happens when compiling mkl-dnn in /permissive- conformance mode with the default C++ language standard.

ghost changed the title from "Something has changed in the last commits" to "Something has changed in the last commits ?" Sep 20, 2018

vpirogov added the "sighting" label (suspicious library behavior; should be promoted to a bug when confirmed) Sep 20, 2018

ghost closed this as completed Sep 21, 2018
