Something has changed in the last commits ? #326

Closed
ghost opened this issue Sep 20, 2018 · 8 comments
Labels
sighting Suspicious library behavior. Should be promoted to a bug when confirmed

Comments

@ghost

ghost commented Sep 20, 2018

Hi,

I'm using mkl-dnn in my convolution layer for all strided and 1x1-kernel convolutions.
Everything worked three days ago. After pulling the latest mkl-dnn changes (with support for Intel TBB) into my repo and rebuilding my code as I always do, my trained models still run as before, but they behave like untrained models when testing, as if the trained weights were ignored. Nothing in the non-mkl-dnn code has changed that could explain this strange behaviour.
All convolutions use a fixed nchw format for input and output and the oihw format for the weights.
Is it possible that something has changed in the behaviour of mkl-dnn when convolutions are used this way?

thanks
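
For readers unfamiliar with the layout names used above, here is a minimal sketch of how dense nchw (activations) and oihw (weights) tensors are usually indexed. The helper names `nchw_offset` and `oihw_offset` are hypothetical and are not part of the mkl-dnn API; the sketch assumes plain, unpadded buffers where the last dimension varies fastest.

```cpp
#include <cstddef>

// Hypothetical helper (not an mkl-dnn function): linear offset of element
// (n, c, h, w) in a dense nchw buffer with C channels, height H and width W.
// The width index varies fastest, the batch index slowest.
constexpr std::size_t nchw_offset(std::size_t n, std::size_t c,
                                  std::size_t h, std::size_t w,
                                  std::size_t C, std::size_t H, std::size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

// Hypothetical helper: linear offset of weight (o, i, kh, kw) in a dense oihw
// buffer with I input channels and a KH x KW kernel.
constexpr std::size_t oihw_offset(std::size_t o, std::size_t i,
                                  std::size_t kh, std::size_t kw,
                                  std::size_t I, std::size_t KH, std::size_t KW) {
    return ((o * I + i) * KH + kh) * KW + kw;
}

// Worked examples, checked at compile time.
// Element (n=1, c=2, h=3, w=4) of a tensor with C=3, H=5, W=7.
static_assert(nchw_offset(1, 2, 3, 4, 3, 5, 7) == 200, "nchw example");
// Weight (o=2, i=1) of a 1x1 convolution with I=4 input channels.
static_assert(oihw_offset(2, 1, 0, 0, 4, 1, 1) == 9, "oihw example");
```

For a 1x1 kernel the oihw offset reduces to `o * I + i`, so the weights are effectively a plain output-by-input matrix.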


Environment

  • Intel Haswell
  • Windows 10 x64
  • Visual Studio 2017
    built with OpenMP and linked with libiomp5md.lib;mklml.lib,
    excluding vcomp.lib
@ghost ghost changed the title Something has changed in the last commits Something has changed in the last commits ? Sep 20, 2018
@vpirogov vpirogov added the sighting Suspicious library behavior. Should be promoted to a bug when confirmed label Sep 20, 2018
@vpirogov
Member

TBB support included massive changes related to threading, though the configuration you describe is covered in validation. Could you please run MKL-DNN tests and see whether any fail in your configuration?

@ghost
Author

ghost commented Sep 21, 2018

Hi,

I get errors in some convolution unit tests:

1>------ Build started: Project: RUN_TESTS, Configuration: Release x64 ------
1>Test project C:/Users/dhaen/Downloads/mkl-dnn/build
1> Start 1: api-c
1> 1/35 Test #1: api-c ......................................... Passed 0.10 sec
1> Start 2: test_batch_normalization
1> 2/35 Test #2: test_batch_normalization ...................... Passed 31.33 sec
1> Start 3: test_concat
1> 3/35 Test #3: test_concat ................................... Passed 1.07 sec
1> Start 4: test_convolution_backward_data_f32
1> 4/35 Test #4: test_convolution_backward_data_f32 ............***Failed 253.29 sec
1> Start 5: test_convolution_backward_data_s16s16s32
1> 5/35 Test #5: test_convolution_backward_data_s16s16s32 ...... Passed 37.48 sec
1> Start 6: test_convolution_backward_weights_f32
1> 6/35 Test #6: test_convolution_backward_weights_f32 .........***Failed 335.88 sec
1> Start 7: test_convolution_backward_weights_s16s16s32
1> 7/35 Test #7: test_convolution_backward_weights_s16s16s32 ... Passed 35.25 sec
1> Start 8: test_convolution_format_any
1> 8/35 Test #8: test_convolution_format_any ................... Passed 0.91 sec
1> Start 9: test_convolution_forward_f32
1> 9/35 Test #9: test_convolution_forward_f32 ..................***Failed 146.34 sec
1> Start 10: test_convolution_forward_s16s16s32
1>10/35 Test #10: test_convolution_forward_s16s16s32 ............ Passed 191.28 sec
1> Start 11: test_convolution_forward_u8s8fp
1>11/35 Test #11: test_convolution_forward_u8s8fp ............... Passed 1.30 sec
1> Start 12: test_convolution_forward_u8s8s32
1>12/35 Test #12: test_convolution_forward_u8s8s32 .............. Passed 0.98 sec
1> Start 13: test_convolution_relu_forward_f32
1>13/35 Test #13: test_convolution_relu_forward_f32 .............***Failed 31.32 sec
1> Start 14: test_convolution_relu_forward_neg_slope_f32
1>14/35 Test #14: test_convolution_relu_forward_neg_slope_f32 ...***Failed 31.55 sec
1> Start 15: test_convolution_relu_forward_s16s16s32
1>15/35 Test #15: test_convolution_relu_forward_s16s16s32 ....... Passed 34.71 sec
1> Start 16: test_deconvolution
1>16/35 Test #16: test_deconvolution ............................***Failed 4.00 sec
1> Start 17: test_eltwise
1>17/35 Test #17: test_eltwise .................................. Passed 29.42 sec
1> Start 18: test_gemm
1>18/35 Test #18: test_gemm ..................................... Passed 409.84 sec
1> Start 19: test_iface_attr
1>19/35 Test #19: test_iface_attr ............................... Passed 1.08 sec
1> Start 20: test_iface_pd_iter
1>20/35 Test #20: test_iface_pd_iter ............................ Passed 0.94 sec
1> Start 21: test_inner_product_backward_data
1>21/35 Test #21: test_inner_product_backward_data .............. Passed 1.21 sec
1> Start 22: test_inner_product_backward_weights
1>22/35 Test #22: test_inner_product_backward_weights ........... Passed 3.02 sec
1> Start 23: test_inner_product_forward
1>23/35 Test #23: test_inner_product_forward .................... Passed 1.43 sec
1> Start 24: test_lrn_backward
1>24/35 Test #24: test_lrn_backward ............................. Passed 15.25 sec
1> Start 25: test_lrn_forward
1>25/35 Test #25: test_lrn_forward .............................. Passed 5.76 sec
1> Start 26: test_memory
1>26/35 Test #26: test_memory ................................... Passed 0.96 sec
1> Start 27: test_mkldnn_threading
1>27/35 Test #27: test_mkldnn_threading ......................... Passed 0.97 sec
1> Start 28: test_pooling_backward
1>28/35 Test #28: test_pooling_backward ......................... Passed 87.18 sec
1> Start 29: test_pooling_forward
1>29/35 Test #29: test_pooling_forward .......................... Passed 60.02 sec
1> Start 30: test_relu
1>30/35 Test #30: test_relu ..................................... Passed 3.16 sec
1> Start 31: test_reorder
1>31/35 Test #31: test_reorder .................................. Passed 4.62 sec
1> Start 32: test_softmax_backward
1>32/35 Test #32: test_softmax_backward ......................... Passed 5.06 sec
1> Start 33: test_softmax_forward
1>33/35 Test #33: test_softmax_forward .......................... Passed 4.84 sec
1> Start 34: test_sum
1>34/35 Test #34: test_sum ...................................... Passed 3.47 sec
1> Start 35: benchdnn
1>35/35 Test #35: benchdnn ...................................... Passed 0.03 sec
1>
1>83% tests passed, 6 tests failed out of 35
1>
1>Total Test time (real) = 1798.61 sec
1>
1>The following tests FAILED:
1> 4 - test_convolution_backward_data_f32 (Failed)
1> 6 - test_convolution_backward_weights_f32 (Failed)
1> 9 - test_convolution_forward_f32 (Failed)
1> 13 - test_convolution_relu_forward_f32 (Failed)
1> 14 - test_convolution_relu_forward_neg_slope_f32 (Failed)
1> 16 - test_deconvolution (Failed)
1>Errors while running CTest
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: The command "setlocal
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: "C:\Program Files\CMake\bin\ctest.exe" --force-new-ctest-process -C Release
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :cmEnd
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: endlocal & call :cmErrorLevel %errorlevel% & goto :cmDone
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :cmErrorLevel
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: exit /b %1
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :cmDone
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: if %errorlevel% neq 0 goto :VCEnd
1>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(138,5): error MSB3073: :VCEnd" exited with code 8.
1>Done building project "RUN_TESTS.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 1 up-to-date, 0 skipped ==========

@emfomenk

@zeno40,

Could you please dump the cmake output?
Also could you please specify the exact hardware?

@ghost
Author

ghost commented Sep 21, 2018

cmake output:
CMake Deprecation Warning at CMakeLists.txt:21 (cmake_policy):
The OLD behavior for policy CMP0048 will be removed from a future version
of CMake.

The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.

CMake Deprecation Warning at CMakeLists.txt:22 (cmake_policy):
The OLD behavior for policy CMP0054 will be removed from a future version
of CMake.

The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.

Selecting Windows SDK version 10.0.17134.0 to target Windows 10.0.17763.
CMAKE_BUILD_TYPE is unset, defaulting to Release
Detecting Intel(R) MKL: trying mklml_intel
Intel(R) MKL: include C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/include
Intel(R) MKL: lib C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/mklml.lib
Intel(R) MKL: OpenMP lib C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/libiomp5md.lib
Intel(R) MKL: dll C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/mklml.dll
Intel(R) MKL: OpenMP dll C:/Users/dhaen/Downloads/mkl-dnn/external/mklml_win_2019.0.20180710/lib/libiomp5md.dll
VTune profiling environment is unset
Configuring done

Hardware: Intel Haswell (Devil's Canyon)
OS: Windows 10/VS2017

@emfomenk

For CPU -- how many cores do you have?

// still cannot reproduce the issue on my side...

@ghost
Author

ghost commented Sep 21, 2018

I found the culprit! I was compiling the mkldnn project with /std:c++latest instead of the default value.
Everything is working now, and all unit tests pass.
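
To spot this kind of configuration drift, a translation unit can report which language standard it was compiled with. The following is only a sketch: `kLanguageStandard`, `is_newer_than_cpp14` and `report_language_standard` are hypothetical names, and the check assumes the Visual Studio 2017 default of /std:c++14. It detects the setting only; it does not explain why the tests fail under it.

```cpp
#include <iostream>

// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed, so
// _MSVC_LANG is the reliable macro there; other compilers use __cplusplus.
#if defined(_MSVC_LANG)
constexpr long kLanguageStandard = _MSVC_LANG;
#else
constexpr long kLanguageStandard = __cplusplus;
#endif

// Hypothetical helper: true when the standard is newer than C++14 (201402L),
// which is what /std:c++17 or /std:c++latest would produce on MSVC.
constexpr bool is_newer_than_cpp14(long standard) {
    return standard > 201402L;
}

// Prints the detected standard and flags anything newer than C++14.
void report_language_standard() {
    std::cout << "language standard macro: " << kLanguageStandard << "\n";
    if (is_newer_than_cpp14(kLanguageStandard)) {
        std::cout << "note: built with a standard newer than C++14\n";
    }
}
```

Calling `report_language_standard()` once at startup of a test binary would make a non-default /std setting visible in the test log.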

@ghost ghost closed this as completed Sep 21, 2018
@emfomenk

:) great, thx for the update!

@ghost
Author

ghost commented Sep 27, 2018

For your information: the same thing happens when compiling mkl-dnn with the /permissive- conformance mode and the default C++ Language Standard.

This issue was closed.