
Added Swish and Mish activations #15808

Merged · 13 commits · Dec 1, 2019

Conversation

thebhatman
Contributor

@thebhatman thebhatman commented Oct 30, 2019

I have added the Swish and Mish activation functions. This resolves #15693

### This pull request changes
[Feature addition] New activation functions Swish and Mish.

force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda:18.04

build_image:Custom Mac=openvino-2019r3.0
test_modules:Custom Mac=dnn,python2,python3,java

@dkurt
Member

dkurt commented Oct 30, 2019

Hi! Thanks for the contribution.

Please check that all the backends work correctly (OpenCL, CUDA, Intel's Inference Engine). Otherwise, keep only the default C++ implementation.

@asmorkalov
Contributor

cc @VadimLevin

@dkurt
Member

dkurt commented Nov 22, 2019

@thebhatman, Please add the tests for both layers.

@thebhatman
Contributor Author

Yeah, I will add the tests. Where exactly are the tests for activations? Should I define sample neural networks with these activations, or is there a test file covering all the activation functions, such as sigmoid and tanh?

@dkurt
Member

dkurt commented Nov 22, 2019

@thebhatman, since the activations are simple, you can test them without external test data: just create a test that runs on all the backends and compares against reference data computed inside the test. Take a look at test_layers.cpp and test_halide_layers.cpp. The only parameters you need to vary are target and backend.

@thebhatman
Contributor Author

I have added the tests in test_halide_layers.cpp and they are passing. It seems that all the tests in test_layers.cpp use Caffe models, reading data from files in opencv_extra. Is there anything else to be done?

@@ -583,7 +583,7 @@ TEST_P(NoParamActivation, Accuracy)
     testInPlaceActivation(lp, backendId, targetId);
 }
 INSTANTIATE_TEST_CASE_P(Layer_Test_Halide, NoParamActivation, Combine(
-/*type*/ Values("TanH", "Sigmoid", "AbsVal", "BNLL"),
+/*type*/ Values("TanH", "Sigmoid", "AbsVal", "BNLL", "Swish", "Mish"),
Member

Good choice! 😄

Contributor

@YashasSamaga commented Nov 24, 2019

The CUDA tests are passing for the Swish activation.

The test for DNN_TARGET_CUDA is passing for the Mish activation. The test for DNN_TARGET_CUDA_FP16 is failing.

[ RUN      ] Layer_Test_Halide/NoParamActivation.Accuracy/21, where GetParam() = ("Mish", CUDA/CUDA_FP16)
.../opencv/modules/dnn/test/test_common.impl.hpp:68: Failure
Expected: (normL1) <= (l1), actual: 0.0288897 vs 0.004
.../opencv/modules/dnn/test/test_common.impl.hpp:71: Failure
Expected: (normInf) <= (lInf), actual: 0.0909604 vs 0.02

This means that the errors in the outputs exceed the accepted thresholds.

You will have to increase the error tolerance for Mish activation when target is DNN_TARGET_CUDA_FP16. I am not sure what would be the correct way to do this. In the worst case, the test can be skipped for the FP16 target.

The test calls void testInPlaceActivation(LayerParams& lp, Backend backendId, Target targetId), which in turn calls void test(Mat& input, Net& net, Backend backendId, Target targetId, bool skipCheck = false). I think there is currently no way to pass custom thresholds to test. A solution could be to pass the thresholds to testInPlaceActivation and test as arguments.

Contributor Author

The default thresholds come from here: https://github.com/thebhatman/opencv/blob/f6221044661dedbf4ad729c8f24c2b72ff37aca0/modules/dnn/test/test_common.hpp#L113. I don't see a way to pass the thresholds as arguments without changing the prototype of testInPlaceActivation.

Contributor

@YashasSamaga commented Nov 25, 2019

I think you should add new parameters (l1 and lInf) to test and testInPlaceActivation. Their default values would be 0.0.

Inside the test function:

  • if l1 and lInf arguments are zero, set them to the values given by getDefaultThresholds
  • if l1 and lInf arguments are not zero, use them

@dkurt is this ok?

Member

@YashasSamaga, yes, that would be fine. Thanks!

@thebhatman
Contributor Author

Is this PR ready to be merged now?

@YashasSamaga
Contributor

YashasSamaga commented Nov 24, 2019

To enable the CUDA backend, you have to tick the following CMake options:

  • WITH_CUDA
  • WITH_CUDNN
  • OPENCV_DNN_CUDA

Are you able to build the CUDA backend on your PC?

The CUDA build is failing on CI. Please have a look at the CI build; there are errors in the compile log (open the log and search for "error:").

The CI does not run tests on CUDA devices, so you (or someone else) will have to run them locally to verify that the tests pass. Let me know when you are done with the PR, and I'll build and test it once.
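The configure step with those three options might look like this (the source-tree path and everything besides the three option names above are assumptions):

```shell
# Enable the CUDA backend of the DNN module when configuring OpenCV.
# Run from an empty build directory next to the opencv source tree.
cmake -D WITH_CUDA=ON \
      -D WITH_CUDNN=ON \
      -D OPENCV_DNN_CUDA=ON \
      ../opencv
```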

@thebhatman
Contributor Author

thebhatman commented Nov 24, 2019

The error comes from the log function (via using device::log;): log is being applied to the __half type, but it is only defined for float and long double.

@thebhatman
Contributor Author

Thanks @YashasSamaga. The CUDA builds are now passing. They were failing due to type-mismatch errors, which I resolved by using log1pexp instead of calling log and exp separately.

@dkurt
Member

dkurt commented Nov 24, 2019

@thebhatman, there is still no OpenCL kernel implementation. Please add it or remove the unused applyOCL.

dkurt previously approved these changes Nov 29, 2019

Member

@dkurt left a comment

👍
@alalek, Can we merge it now to master? I'll backport it to 3.4 with https://github.com/dkurt/opencv/tree/thebhatman/Mish_swish

@dkurt self-assigned this Nov 29, 2019
@dkurt dismissed their stale review November 29, 2019 12:48

please do not merge yet

Member

@dkurt left a comment

👍

@alalek, can we merge this PR to master and another one to 3.4 branch (#16025)?

@cansik

cansik commented Apr 27, 2020

Has this already been released? In 4.3.0, I still get the error that it is unsupported:

OpenCV(4.3.0) /Users/travis/build/bytedeco/javacpp-presets/opencv/cppbuild/macosx-x86_64/opencv-4.3.0/modules/dnn/src/darknet/darknet_io.cpp:821: error: (-212:Parsing error) Unsupported activation: mish in function 'ReadDarknetFromCfgStream'

@dkurt
Member

dkurt commented Apr 27, 2020

@cansik, if I'm not mistaken, this fix is for the TensorFlow importer. Please track #17148

@BlueNotesRobot

From what I found, mish is available in 3.4.10 but not in 4.3.0.
Would be keen to know when we can expect it in 4.3.0

@YashasSamaga
Contributor

@BlueNotesRobot It is already there on master.

@BlueNotesRobot

Thanks @YashasSamaga I just compiled the newest master and can confirm it worked.
Master is currently at version '4.4.0-pre'.

a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
* Added Swish and Mish activations

* Fixed whitespace errors

* Kernel implementation done

* Added function for launching kernel

* Changed type of 1.0

* Attempt to add test for Swish and Mish

* Resolving type mismatch for log

* exp from device

* Use log1pexp instead of adding 1

* Added openCL kernels
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: dnn · feature · port/backport done

Successfully merging this pull request may close these issues:

[Feature Request opencv-dnn] Swish Activation, Mish Activation

7 participants