Added DNN Darknet Yolo v2 for object detection #9705

Merged
merged 1 commit into from Oct 10, 2017

Conversation

7 participants
@AlexeyAB
Contributor

AlexeyAB commented Sep 24, 2017

opencv_extra=dnn_model_darknet_yolo_v2

This pull request changes:

Added the Darknet Yolo v2 neural network for object detection: https://pjreddie.com/darknet/yolo/
Added a usage example: yolo_object_detection.cpp / example_dnn-yolo_object_detection.exe

Supported layers:

  • route (as concat-layer)
  • reorg (as an addition to the reshape-layer)
  • maxpool
  • convolutional (conv+bn+relu)
  • region (detection_out) - added layer

Merge with extra: opencv/opencv_extra#385


Usage comparison:

  • original Darknet-Yolo-v2: darknet.exe detector test data/voc.data yolo-voc.cfg yolo-voc.weights -i 0 -thresh 0.24 data/dog.jpg

  • OpenCV Yolo example: example_dnn-yolo_object_detection.exe -cfg=yolo/yolo.cfg -model=yolo/yolo.weights -image=yolo/dog.jpg -min_confidence=0.24


Comparison of results, OpenCV example vs. original Darknet: https://github.com/pjreddie/darknet

Using the cfg, weights, and jpg files from: https://drive.google.com/drive/folders/0BwRgzHpNbsWBN3JtSjBocng5YW8

  • Network resolution: 416 x 416
  • threshold = 0.24
  • nms-threshold = 0.4
  1. yolo.cfg & yolo.weights

    • dog.jpg using yolo.cfg
      coco_dog

    • eagle.jpg using yolo.cfg
      coco_eagle

    • giraffe.jpg using yolo.cfg
      coco_giraffe


  2. yolo-voc.cfg & yolo-voc.weights

    • dog.jpg using yolo-voc.cfg
      voc_dog

    • eagle.jpg using yolo-voc.cfg
      voc_eagle

    • giraffe.jpg using yolo-voc.cfg
      voc_giraffe


How to train (to detect your custom objects): https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects


Accuracy-speed:

https://hsto.org/files/a24/21e/068/a2421e0689fb43f08584de9d44c2215f.jpg

https://hsto.org/files/3a6/fdf/b53/3a6fdfb533f34cee9b52bdd9bb0b19d9.jpg

@vpisarev

Contributor

vpisarev commented Sep 25, 2017

@AlexeyAB, thank you, this is a very valuable contribution! Could you please add some regression test(s) for this functionality?

@AlexeyAB

Contributor

AlexeyAB commented Sep 25, 2017

@vpisarev I added: modules/dnn/test/test_darknet_importer.cpp
Also added test data and models (cfg, weights) for DNN Darknet Yolo v2: opencv/opencv_extra#385

for (it_type i = net->layers_cfg.begin(); i != net->layers_cfg.end(); ++i) {
    ++layers_counter;
    std::map<std::string, std::string> &layer_params = i->second;
    std::string layer_type = layer_params["type"];

@dkurt

dkurt Sep 26, 2017

Member

Please add an assertion for unknown layer types to prevent unexpected errors. For example, I can't read any model right now because every layer_type ends with a ] character (convolutional], maxpool]) on Ubuntu.
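
Such a guard might look roughly like this (a sketch only; layer_params and layer_type are the variables from the excerpt above, and the set of known type names is illustrative):

// Hypothetical check inside the cfg-parsing loop: fail fast on an
// unrecognized layer type instead of silently building a broken net.
std::string layer_type = layer_params["type"];
if (layer_type != "convolutional" && layer_type != "maxpool" &&
    layer_type != "route" && layer_type != "reorg" && layer_type != "region")
    CV_Error(cv::Error::StsParseError,
             "Unsupported Darknet layer type: " + layer_type);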

@dkurt

dkurt Sep 27, 2017

Member

It works now, but the Reproducibility_TinyYoloVoc and Reproducibility_YoloVoc tests fail for me. Do they pass locally?

* @param darknetModel path to the .weights file with learned network.
* @returns Pointer to the created importer, NULL in failure cases.
*/
CV_EXPORTS_W Ptr<Importer> createDarknetImporter(const String &cfgFile, const String &darknetModel = String());

@dkurt

dkurt Sep 26, 2017

Member

We have marked methods like createCaffeImporter as deprecated. Please keep only readNetFromDarknet.
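
For reference, the non-deprecated entry point reads both files in one call; a minimal usage sketch (the file names are placeholders):

#include <opencv2/dnn.hpp>

// Build the network directly, without an intermediate Importer object.
cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolo-voc.cfg", "yolo-voc.weights");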

cv::Mat frame = cv::imread(parser.get<string>("image"), -1);
if (frame.channels() == 4)

@AlexeyAB

AlexeyAB Sep 26, 2017

Contributor

@dkurt I fixed it.
Initially I did it the same way as in ssd_object_detection.cpp and thought that it had some hidden meaning :)

if (frame.channels() == 4)
    cvtColor(frame, frame, cv::COLOR_BGRA2BGR);
//! [Prepare blob]
Mat preprocessedFrame = preprocess(frame, network_width, network_height);

@dkurt

dkurt Sep 26, 2017

Member

Please use blobFromImage's arguments to do the preprocessing (http://docs.opencv.org/3.3.0/d6/d0f/group__dnn.html#ga0507466a789702eda8ffcdfa37f4d194).
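
Applied to this sample, the suggestion would look roughly like this (a sketch assuming the Darknet-style preprocessing: resize to the 416x416 network input, scale pixels to [0, 1], swap BGR to RGB):

// blobFromImage handles resizing, scaling and channel swapping itself,
// replacing the hand-written preprocess() above.
cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0 / 255.0, cv::Size(416, 416),
                                      cv::Scalar(), true);
net.setInput(blob);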

return false;
// Darknet ROUTE-layer
if (useRoute) return true;

@dkurt

dkurt Sep 27, 2017

Member

Is there some difference between the Route layer and Concat? getMemoryShapes returns true if a layer can work in-place (as all element-wise layers can).

@AlexeyAB

AlexeyAB Sep 27, 2017

Contributor

I don't know why, but it doesn't work for Yolo if getMemoryShapes returns false.
The route layer simply copies the outputs of several layers unchanged: https://github.com/pjreddie/darknet/blob/master/src/route_layer.c#L83

It uses copy_cpu() with INCX=1 and INCY=1: https://github.com/pjreddie/darknet/blob/master/src/blas.c#L208

@dkurt

dkurt Sep 30, 2017

Member

@AlexeyAB, it seems to me the problem is in a route layer with a single input (that is, in the current concat layer with #inputs == 1), https://github.com/pjreddie/darknet/blob/master/cfg/yolo-voc.cfg#L208. It is used like an identity layer, right?

@AlexeyAB

AlexeyAB Sep 30, 2017

Contributor

@dkurt Yes, a route with a single input (bottom layer) is used as an identity layer.

@dkurt

dkurt Oct 3, 2017

Member

@AlexeyAB, could you add an extra branch during route layer creation: a Concat layer if the number of inputs is more than 1, or an Identity layer otherwise?

@AlexeyAB

AlexeyAB Oct 3, 2017

Contributor

@dkurt I added an identity layer for the single-input case. But why can't the concat layer work with 1 input, and why is there no CV_Assert for this case?
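
The branch in question might look roughly like this (a sketch; setParams and layers_vec are from the importer code, and setIdentity is a hypothetical helper name):

// Hypothetical route-layer dispatch: several bottoms -> Concat,
// a single bottom -> Identity.
if (layers_vec.size() > 1)
    setParams.setConcat(layers_vec.size(), layers_vec.data());
else
    setParams.setIdentity(layers_vec.at(0));  // illustrative helper name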

setParams.setConcat(layers_vec.size(), layers_vec.data());
}
else if (layer_type == "reorg")

@dkurt

dkurt Sep 27, 2017

Member

I'm a bit confused about the reorg layer. Suppose the input is:

channel_0  channel_1  channel_2  channel_3
0 1        4 5        8 9        c d
2 3        6 7        a b        e f

and reorgStride = 2. Then the output shape is 4x4x1 and the values are:

output
1 4 1 5
8 c 9 d
2 6 3 7
a e b f

?

@AlexeyAB

AlexeyAB Sep 27, 2017

Contributor

I left the somewhat strange original implementation of this layer unchanged. It increases the field of view of each final activation.
Reshape: 26 x 26 x 64 -> 13 x 13 x 256
reorg

For stride = 2

input
    0, 1, 2, 3,
    4, 5, 6, 7, 
    8, 9, a, b,
    c, d, e, f
output
channel_0  channel_1  channel_2  channel_3
0 2        1 3        4 6        5 7
8 a        9 b        c e        d f

@dkurt

dkurt Sep 27, 2017

Member

Thanks! Anyway, I suggest implementing it as a single dedicated layer, or thinking about how to do the same transformation using existing ones (Permute, Reshape). By definition, a Reshape layer doesn't change the data in any of the frameworks.

@AlexeyAB

AlexeyAB Sep 30, 2017

Contributor

@dkurt I added reorg as a separate layer: reorg_layer.cpp

setParams.setReshape(stride, current_shape.input_channels, current_shape.input_h, current_shape.input_w);
current_shape.input_channels = 256;

@dkurt

dkurt Sep 27, 2017

Member

Magic number?

@dkurt

Member

dkurt commented Sep 27, 2017

@AlexeyAB, thank you for the valuable contribution! We need to test all the new functionality carefully. Can you add some unit tests with small few-layer networks, like we do for the other importers? (https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_torch_importer.cpp and https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/torch/torch_gen_test_data.lua, https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_tf_importer.cpp and https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/tensorflow/generate_tf_models.py). For example: write simple configs, run darknet to initialize the weights, pass some random input and capture the output, then put the configs/weights/inputs/outputs into a darknet subfolder of opencv_extra/testdata/dnn.

@AlexeyAB

Contributor

AlexeyAB commented Sep 27, 2017

@dkurt I already added test data and models for object detection using DNN Darknet Yolo v2 to opencv_extra: opencv/opencv_extra#385

  • testdata/dnn/dogr.jpg - test image resized to the network size 416x416, to eliminate the side effects of resizing
  • testdata/dnn/tiny-yolo-voc.cfg - tiny model of Yolo v2 for Pascal VOC dataset
  • testdata/dnn/yolo-voc.cfg - full model of Yolo v2 for Pascal VOC dataset
  • Changed testdata/dnn/download_models.py:
    • Downloads https://pjreddie.com/media/files/yolo-voc.weights - full model of Yolo v2 trained for Pascal VOC dataset
    • Downloads https://pjreddie.com/media/files/tiny-yolo-voc.weights - tiny model of Yolo v2 trained for Pascal VOC dataset

This pull request itself contains the test file modules/dnn/test/test_darknet_importer.cpp.

@dkurt

Member

dkurt commented Sep 27, 2017

@AlexeyAB, yeah, it's great, but I meant tests for separate layers. First of all, it's necessary to protect your work from bugs that might appear in future development. Besides, BuildBot doesn't test these models for now because they aren't there. My local tests fail, and I think we can solve the problem with small checks for the separate layers.

[----------] 1 test from Reproducibility_TinyYoloVoc
[ RUN      ] Reproducibility_TinyYoloVoc.Accuracy
unknown file: Failure
C++ exception with description "/home/dkurtaev/opencv/modules/ts/src/ts_func.cpp:1374: error: (-215) src1.type() == src2.type() && src1.size == src2.size in function norm
" thrown in the test body.
[  FAILED  ] Reproducibility_TinyYoloVoc.Accuracy (101 ms)
[----------] 1 test from Reproducibility_TinyYoloVoc (101 ms total)

[----------] 1 test from Reproducibility_YoloVoc
[ RUN      ] Reproducibility_YoloVoc.Accuracy
/home/dkurtaev/opencv/modules/dnn/test/test_common.hpp:54: Failure
Expected: (normL1) <= (l1), actual: 0.000232658 vs 1e-05
/home/dkurtaev/opencv/modules/dnn/test/test_common.hpp:57: Failure
Expected: (normInf) <= (lInf), actual: 0.00485086 vs 0.0001
[  FAILED  ] Reproducibility_YoloVoc.Accuracy (317 ms)
[----------] 1 test from Reproducibility_YoloVoc (317 ms total)

I referenced how we write unit tests for the different frameworks. The binary size of the required data is not so huge (e.g. less than 0.5 MB for the TensorFlow layers) and you can add it in a single PR to opencv_extra.

@AlexeyAB

Contributor

AlexeyAB commented Sep 28, 2017

@dkurt

  1. I replaced the test image dogr.jpg with the lossless dog416.png in opencv_extra, and now it works.
  2. I added tests for the Region and Reorg layers. Added to opencv_extra: region.cfg, region.npy, region.input.npy, reorg.cfg, reorg.npy, reorg.input.npy (a sketch of such a single-layer config follows below).
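
For orientation, a single-layer test config of this kind might look roughly as follows (a sketch, not the actual contents of reorg.cfg; Darknet configs start with a [net] section that fixes the input dimensions, followed by the layer under test):

[net]
batch=1
width=4
height=4
channels=4

[reorg]
stride=2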

All tests passed on both Windows 7 x64 and Linux Debian 8.2 x64:

[----------] 2 tests from Test_Darknet
[ RUN      ] Test_Darknet.read_tiny_yolo_voc
[       OK ] Test_Darknet.read_tiny_yolo_voc (0 ms)
[ RUN      ] Test_Darknet.read_yolo_voc
[       OK ] Test_Darknet.read_yolo_voc (1 ms)
[----------] 2 tests from Test_Darknet (3 ms total)

[----------] 1 test from Reproducibility_TinyYoloVoc
[ RUN      ] Reproducibility_TinyYoloVoc.Accuracy
[       OK ] Reproducibility_TinyYoloVoc.Accuracy (134 ms)
[----------] 1 test from Reproducibility_TinyYoloVoc (135 ms total)

[----------] 1 test from Reproducibility_YoloVoc
[ RUN      ] Reproducibility_YoloVoc.Accuracy
[       OK ] Reproducibility_YoloVoc.Accuracy (475 ms)
[----------] 1 test from Reproducibility_YoloVoc (475 ms total)
...
[----------] 1 test from Layer_Test_Region
[ RUN      ] Layer_Test_Region.Accuracy
[       OK ] Layer_Test_Region.Accuracy (2 ms)
[----------] 1 test from Layer_Test_Region (3 ms total)

[----------] 1 test from Layer_Test_Reorg
[ RUN      ] Layer_Test_Reorg.Accuracy
[       OK ] Layer_Test_Reorg.Accuracy (0 ms)
[----------] 1 test from Layer_Test_Reorg (1 ms total)

The reference results for comparison with the OpenCV version were obtained on Linux Debian 8.2 using the latest commit of Darknet Yolo v2, compiled with GPU=0, OPENMP=1 and OPENCV=1: https://github.com/pjreddie/darknet

Using commands:

  • ./darknet detector test ./cfg/voc.data ./cfg/tiny-yolo-voc.cfg ./tiny-yolo-voc.weights -thresh 0.24 ./dog416.png

  • ./darknet detector test ./cfg/voc.data ./cfg/yolo-voc.cfg ./yolo-voc.weights -thresh 0.24 ./dog416.png

}
net->transpose = (net->major_ver > 1000) || (net->minor_ver > 1000);
layerShape current_shape;

@dkurt

dkurt Sep 30, 2017

Member

Why do we track shapes? Doesn't the weights file contain the kernel shapes?

@AlexeyAB

AlexeyAB Sep 30, 2017

Contributor

Right, the weights file doesn't contain the kernel shapes.
Darknet also tracks layer shapes while parsing the cfg file:

int convolutional_out_height(convolutional_layer l)
{
    return (l.h + 2*l.pad - l.size) / l.stride + 1;
}

@dkurt

dkurt Oct 3, 2017

Member

@AlexeyAB, maybe we can at least remove the width/height tracking? As far as I can see, only current_shape.input_channels is used to read the convolutional layer weights.

ifile.open(darknetModel, std::ios::binary);
CV_Assert(ifile.is_open());
ifile.read(reinterpret_cast<char *>(&net->major_ver), sizeof(int32_t));

@dkurt

dkurt Sep 30, 2017

Member

The version numbers are used only to decide how many bytes to skip for the seen value, and transpose isn't used at all. Please make all unused NetParameter variables local.


void setMaxpool(size_t kernel, size_t pad, size_t stride, size_t channels_num)
{
    cv::dnn::experimental_dnn_v1::LayerParams maxpool_param;
    maxpool_param.set<cv::String>("pool", "max");

@dkurt

dkurt Sep 30, 2017

Member

Please set only the parameters that are actually used: "pool", "kernel_size", "pad", "stride".

@AlexeyAB

AlexeyAB Sep 30, 2017

Contributor

OK. maxpool_param.set<cv::String>("pad_mode", "SAME"); is also required for odd layer sizes.

@dkurt

dkurt Oct 3, 2017

Member

However, only one padding strategy is used at a time: either manual values or padMode ("SAME", "VALID") from TensorFlow. Please take a look at the "ceil_mode" flag instead: https://github.com/opencv/opencv/blob/master/modules/dnn/src/layers/pooling_layer.cpp#L629.

@AlexeyAB

AlexeyAB Oct 3, 2017

Contributor

  • The accuracy test passes for Tiny-Yolo if padMode="SAME", with any ceil_mode value.
  • The accuracy test cannot pass for Tiny-Yolo with any other padMode setting (padMode="VALID" or padMode unset), regardless of the ceil_mode value.
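
Putting the thread together, the maxpool setup that passes the accuracy tests would look roughly like this (a sketch restricted to the parameters that matter; pad_mode="SAME" reproduces Darknet's behaviour on odd input sizes):

cv::dnn::experimental_dnn_v1::LayerParams maxpool_param;
maxpool_param.set<cv::String>("pool", "max");
maxpool_param.set<int>("kernel_size", (int)kernel);
maxpool_param.set<int>("pad", (int)pad);
maxpool_param.set<int>("stride", (int)stride);
maxpool_param.set<cv::String>("pad_mode", "SAME");
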
int w2 = i*reorgStride + offset % reorgStride;
int h2 = j*reorgStride + offset / reorgStride;
int out_index = w2 + width*reorgStride*(h2 + height*reorgStride*c2);
dstData[in_index] = srcData[out_index];

@dkurt

dkurt Sep 30, 2017

Member

Isn't there a typo with the in/out indices swapped?

@AlexeyAB

AlexeyAB Sep 30, 2017

Contributor

No, there is no typo; initially I left the somewhat strange original implementation of this layer unchanged.
But I have now changed this place so that there is no confusion.

CV_Assert(outputs[0][0] > 0 && outputs[0][1] > 0 && outputs[0][2] > 0 && outputs[0][3] > 0);
return true;

@dkurt

dkurt Sep 30, 2017

Member

It seems to me the Reorg layer can't work in-place, and getMemoryShapes should return true only if a layer can.

{
    CV_Assert(inputs.size() > 0);
    outputs = std::vector<MatShape>(inputs.size(), shape(inputs[0][1] * inputs[0][2] * anchors, inputs[0][3] / anchors));
    return true;

@dkurt

dkurt Sep 30, 2017

Member

The same as the Reorg layer: it should return false.
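
In other words, Region's getMemoryShapes should keep computing the output shape as in the excerpt above but end with return false, since the layer reshapes the data and cannot run in-place (a sketch using the cv::dnn::Layer signature):

bool getMemoryShapes(const std::vector<MatShape> &inputs,
                     const int requiredOutputs,
                     std::vector<MatShape> &outputs,
                     std::vector<MatShape> &internals) const
{
    CV_Assert(inputs.size() > 0);
    outputs = std::vector<MatShape>(inputs.size(),
        shape(inputs[0][1] * inputs[0][2] * anchors, inputs[0][3] / anchors));
    return false;  // output layout differs from input: no in-place computation
}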

darknet::LayerParameter lp;
std::string layer_name = toString(layer_id);
if (use_batch_normalize || use_relu) layer_name = "conv_" + layer_name;

@dkurt

dkurt Sep 30, 2017

Member

It's better to always name layers with a type prefix. Right now some layers are named with bare numbers, and that makes them hard to debug.

}
cv::dnn::experimental_dnn_v1::LayerParams getParamConvolution(int kernel, int pad,
int stride, int filters_num, int channels_num)

@dkurt

dkurt Oct 3, 2017

Member

Unused variable channels_num

fused_layer_names.push_back(last_layer);
}
void setMaxpool(size_t kernel, size_t pad, size_t stride, size_t channels_num)

@dkurt

dkurt Oct 3, 2017

Member

Unused variable channels_num

std::string top(const int index) const { return layer_name; }
};
struct layerShape {

@dkurt

dkurt Oct 6, 2017

Member

Unused structure

params.blobs = blobs;
}
void setLastLayerName(std::string layer_name)

@dkurt

dkurt Oct 6, 2017

Member

Unused function

inputs[0][3] / reorgStride));
CV_Assert(outputs[0][0] > 0 && outputs[0][1] > 0 && outputs[0][2] > 0 && outputs[0][3] > 0);

@dkurt

dkurt Oct 6, 2017

Member

Please add an assertion that total(outputs[0]) == total(inputs[0]).
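
Since reorg only permutes elements, the check is a one-liner with the total() helper from opencv2/dnn/shape_utils.hpp:

// The output must hold exactly as many elements as the input.
CV_Assert(total(outputs[0]) == total(inputs[0]));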

int out_c = channels / (reorgStride*reorgStride);
for (int k = 0; k < channels; ++k) {

@dkurt

dkurt Oct 6, 2017

Member

Please make it more clear: iterate over output dimensions and map them to input ones.

@AlexeyAB

AlexeyAB Oct 6, 2017

Contributor

Most likely a logical mistake was made in the reorg layer of the original version of Darknet: https://github.com/pjreddie/darknet/blob/master/src/blas.c#L9

It works as I described if it is called as reorg(input, output, out_w, out_h, out_c, ...); #9705 (comment)

But in the original Darknet the function is called as reorg(input, output, in_w, in_h, in_c, ...);, so a one-to-one correspondence between input and output elements is preserved, but very strange permutations occur.

But because the original Darknet works with this implementation of reorg and all models were trained using it, we can't fix this logical mistake.

Why hasn't the author found and corrected this error? I think:

  • Perhaps this logical error does not spoil the detection accuracy, so it was never noticed.
  • Theoretically, we can even assume that this error increased the accuracy, so the author found it and left it.

"iterate over output dimensions and map them to input ones."

So I can implement it that way, but it works correctly only if (in_w % 2 == 0 && in_h % 2 == 0 && in_c % 4 == 0): http://coliru.stacked-crooked.com/a/b962d20938362d4f

void reorg_my(const float*const srcData,  float *const dstData, int width, int height, int channels, int reorgStride)
{
	int outChannels = channels * reorgStride * reorgStride;
	int outHeight = height / reorgStride;
	int outWidth = width / reorgStride;

	for (int y = 0; y < outHeight; ++y) {
		for (int x = 0; x < outWidth; ++x) {
			for (int c = 0; c < outChannels; ++c) {
				int out_index = x + outWidth*(y + outHeight*c);

				int step = c / channels;
				int x_offset = step % reorgStride;
				int y_offset = reorgStride * ((step / reorgStride) % reorgStride);

				int in_x = x * reorgStride + x_offset;
				
				int out_seq_y = y + c*outHeight;
				int in_intermediate_y = out_seq_y*2 - out_seq_y%2;
				in_intermediate_y = in_intermediate_y % (channels*height);
				int in_c = in_intermediate_y / height;
				int in_y = in_intermediate_y % height + y_offset;
						
				int in_index = in_x + width*(in_y + height*in_c);
				dstData[out_index] = srcData[in_index];
			}
		}
	}
}
const float confidenceThreshold = 0.24;
for (int i = 0; i < out.rows; i++) {
float const*const prob_ptr = &out.at<float>(i, 5);

@dkurt

dkurt Oct 6, 2017

Member

float const*const is a bit confusing (there are 4 places with it).

Also, may I ask you to use named constants or add comments? The magic numbers are hard to understand, especially in the samples and tests.

@AlexeyAB

AlexeyAB Oct 8, 2017

Contributor

I removed const*const, added named constants, and described the format of the network output that is compared against the reference in the tests.

But why is const*const confusing? Does it contradict the code style conventions accepted in OpenCV?
The 1st const forbids modifying the values the pointer points to; the 2nd const forbids modifying the pointer itself.

getParamConvolution(kernel, pad, stride, filters_num);
darknet::LayerParameter lp;
std::string layer_name = "conv_" + toString(layer_id);

@dkurt

dkurt Oct 6, 2017

Member

Please try to use cv::format("conv_%d", layer_id) instead of toString here and in the other places.

namespace darknet {
class LayerParameter {

@dkurt

dkurt Oct 6, 2017

Member

I hope we can omit this structure. Layers are connected either sequentially or via explicit numeric offsets relative to the newly added layer, so I think it's possible to use a single vector of layers during network building. May I ask you to try it?

@AlexeyAB

AlexeyAB Oct 6, 2017

Contributor

Do you mean that I should try to use cv::dnn::experimental_dnn_v1::LayerParams instead of darknet::LayerParameter?

@dkurt

dkurt Oct 6, 2017

Member

Yeah, I think we can just parse the text and binary files simultaneously: for every entry in the config we create a new LayerParams and fill it depending on the layer type. If a specific layer has weights, we read them from the already opened binary file. Then we add the layer to the final network (addLayerToPrev, or addLayer with multiple connections based on the id of the new layer and route's offsets, i.e. -1, -4).
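
A rough sketch of that single-pass scheme (addLayerToPrev is the actual cv::dnn::Net method; makeLayerParams and readConvWeights are hypothetical helpers standing in for the cfg and weights readers):

cv::dnn::Net net;
for (size_t i = 0; i < sections.size(); ++i)   // one parsed cfg section per layer
{
    cv::dnn::LayerParams params = makeLayerParams(sections[i]);  // hypothetical
    if (params.type == "Convolution")
        readConvWeights(weightsFile, params);                    // hypothetical
    net.addLayerToPrev(params.name, params.type, params);
}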

@dkurt

dkurt Oct 9, 2017

Member

@AlexeyAB, on the other hand, let's keep it as it is for now. I'll just install darknet, compare it with the PR, and then we can merge.

@@ -0,0 +1,202 @@
/*M///////////////////////////////////////////////////////////////////////////////////////

@dkurt

dkurt Oct 9, 2017

Member

I think we need to replace it. Please take a look at the more recent versions of the licence.

@dkurt

dkurt Oct 9, 2017

Member

@AlexeyAB, I don't think we should put past years' copyrights into newly added sources.
@vpisarev, please tell us whether the following header is enough:


// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
//
// Copyright (C) 2017, Intel Corporation, all rights reserved.
// Third party copyrights are property of their respective owners.

Or the same as above but with a current date.

{
template<typename T>
static cv::String toString(const T &v)

@dkurt

dkurt Oct 9, 2017

Member

Can we remove it? There is only one use of toString, and it can be replaced with cv::format. Thanks!

@@ -0,0 +1,623 @@
/*M///////////////////////////////////////////////////////////////////////////////////////

@dkurt

dkurt Oct 9, 2017

Member

The same here: please update the licence.

#include "darknet_io.hpp"
namespace darknet {

@dkurt

dkurt Oct 9, 2017

Member

Is it possible to nest it into the cv::dnn namespace?

@@ -0,0 +1,115 @@
/*M///////////////////////////////////////////////////////////////////////////////////////

@dkurt

dkurt Oct 9, 2017

Member

Once again: the licence update.

#include <opencv2/dnn/dnn.hpp>
namespace darknet {

@dkurt

dkurt Oct 9, 2017

Member

namespace cv::dnn::darknet

@@ -0,0 +1,185 @@
/*M///////////////////////////////////////////////////////////////////////////////////////

@dkurt

dkurt Oct 9, 2017

Member

The licence update.

//M*/
#include "test_precomp.hpp"
#include "npy_blob.hpp"

@dkurt

dkurt Oct 9, 2017

Member

An extra include directive.

@dkurt

Member

dkurt commented Oct 10, 2017

👍
@AlexeyAB, thank you!

@opencv-pushbot opencv-pushbot merged commit ecc34dc into opencv:master Oct 10, 2017

1 check passed

default: Required builds passed
@adamhrv

adamhrv commented Nov 30, 2017

Will there be a Python example for CV2 Darknet DNN?

When running readNetFromDarknet --> net.forward() in Python, the YoloV2 yolo-voc.cfg and yolo-voc.weights predictions don't seem to provide any detection info.

This works OK:

net = cv2.dnn.readNetFromDarknet(path_to_prototxt, path_to_model)

This seems to work OK:

imw,imh = (416,416)
blob = cv2.dnn.blobFromImage(cv2.resize(im, (416, 416)))  
net.setInput(blob)
detections = net.forward()

But the detection result doesn't make sense:

print('Detections len: {}'.format(len(detections)))
Detections len: 845
print('Detections 0: {}'.format(detections[0]))
Detections 0: [  5.25983423e-02   2.70044785e-02   9.04742535e-03   2.62199971e-03
   2.96895425e-10   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00]

Referring to yolo_object_detection.cpp

Thanks for your work porting this to OpenCV.

@fabito

fabito commented Jan 27, 2018

This repo shows a custom face detection model demo using Python. The models were trained on the Widerface dataset and are available for download (weights and cfgs).

@cansik

cansik commented Apr 12, 2018

It seems that sometimes the implementation does not return the correct detection class. Even with the default pre-trained 80-class model, the class is sometimes unknown (only zeros in columns 5 to 85).

Also, when I train my own model with two classes, it only gives me the confidence, but not the class itself. Here is a dump of the result matrix (2-class model; every class score is always zero):

[0.0401995, 0.031487927, 0.031374719, 0.056794502, 0.0012705148, 0, 0;
 0.04043559, 0.049086317, 0.22208165, 0.25339442, 0.00040254797, 0, 0;
 0.037753381, 0.036353372, 0.60338461, 0.96344143, 0.00046182834, 0, 0;
 0.034598812, 0.039104842, 0.82008636, 0.50834161, 0.0003691405, 0, 0;
 0.038062122, 0.038740713, 1.2965844, 0.84108508, 0.00030294483, 0, 0;
 0.11062718, 0.023617726, 0.083281159, 0.038289066, 0.00061990606, 0, 0;
 0.11572818, 0.046723619, 0.24326266, 0.19994149, 0.00070420129, 0, 0;
 0.11441542, 0.03014034, 0.48347056, 0.84074426, 0.00024233655, 0, 0;
 0.11178426, 0.035311423, 0.71218336, 0.46975818, 0.00027817892, 0, 0;
 0.11641605, 0.038460143, 1.1835779, 0.68950963, 0.00037787345, 0, 0;
 0.19388326, 0.02510196, 0.091353334, 0.028100489, 0.00042861109, 0, 0;
 0.19292459, 0.048351433, 0.22517532, 0.21802522, 0.00040042991, 0, 0;
 0.18578193, 0.031620376, 0.63702989, 0.94969654, 0.0002352587, 0, 0;
 0.18794204, 0.038929399, 0.74781221, 0.44536978, 0.00019949637, 0, 0;
 0.19130239, 0.03961565, 0.9100669, 0.58336502, 0.00025648749, 0, 0;
 0.27135891, 0.02685423, 0.090977632, 0.032328393, 0.00058053283, 0, 0;
 0.27235663, 0.050213691, 0.25882462, 0.26637048, 0.00038077682, 0, 0;
 0.25689453, 0.0360987, 0.69405353, 0.82805032, 0.00038927642, 0, 0;
 0.25953323, 0.045636557, 0.58337092, 0.38006222, 0.00038867182, 0, 0;
 0.2683081, 0.042552318, 0.89848542, 0.72007042, 0.00038684186, 0, 0;
...

When using darknet to detect the objects, it is always able to tell what kind of object it is (with a confidence), even if the confidence is very low. Do you experience the same behaviour?

@AlexeyAB

Contributor

AlexeyAB commented Apr 12, 2018

@cansik The fact is that for values less than the threshold, Darknet zeroes the scale, while OpenCV zeroes the prob. And Darknet zeroes prob only if scale isn't zeroed.

You can get the same bounding boxes with the same probabilities in both OpenCV and Darknet (but not the same probs below the threshold) only with:

  • a roughly 1-year-old Darknet build
  • the default Yolo v2 model
  • the same threshold
  • the same nms threshold
  • an image whose size equals the network size (416x416), due to different resize approaches: AlexeyAB/darknet#232 (comment)

Note: in the original Darknet, if scale < thresh then objectness=0 and prob will not be zeroed even if prob < thresh. But in OpenCV-dnn, prob will be zeroed whenever prob < thresh.

What thresh did you use for Darknet and for OpenCV-dnn-yolo?


  1. OpenCV-dnn-yolo:

    for (int j = 0; j < classes; ++j) {
        float prob = scale*dstData[class_index + j]; // prob = IoU(box, object) = t0 * class-probability
        dstData[class_index + j] = (prob > thresh) ? prob : 0; // if (IoU < threshold) IoU = 0;
    }

  2. Darknet:

dets[index].objectness = scale > thresh ? scale : 0;
if (dets[index].objectness) {
    for (j = 0; j < l.classes; ++j) {
        int class_index = entry_index(l, 0, n*l.w*l.h + i, l.coords + 1 + j);
        float prob = scale*predictions[class_index];
        dets[index].prob[j] = (prob > thresh) ? prob : 0;
    }
}

@cansik

cansik commented Apr 19, 2018

@AlexeyAB As far as I know, the probability gets cleared by OpenCV. That is OK for me if the confidence is under the threshold.

How do I set the threshold in OpenCV? Is it possible to lower it to zero, to get the probabilities of all predictions? Or by threshold do you mean the threshold defined in the cfg file?

Here is an example which is really strange:

image

In this image the trained network finds three characters (in OpenCV). All of them have a confidence higher than 80%, but the probabilities for their classes are zeroed. Do you know why this happens?

Class: none, Confidence: 0.8652
Class: none, Confidence: 0.8734
Class: none, Confidence: 0.8448

Second example:

image

For this picture I have exported the result matrix: yolo_results.sheets

Why are there only three probabilities, and not more for each item? Or am I misreading the result matrix?

@dkurt

Member

dkurt commented Apr 19, 2018

@cansik, have you tried varying the thresh parameter of the [region] layer in the .cfg file? You can set it to zero and threshold out low-confidence detections yourself.

@cansik

cansik commented Apr 19, 2018

@dkurt Yes, that helped, thank you. I was not sure where to set the threshold, and a (stupid) bug in my result evaluation led to no difference even when I played with this param.

Now everything works as expected. But is it possible to set this threshold directly on the Net object?

@dkurt

Member

dkurt commented Apr 19, 2018

@cansik, OpenCV parses the .cfg file and extracts this threshold for the non-maximum suppression procedure. I think the best solution is to set thresh to zero and post-process the output detections in your application. You may try the object_detection.py sample; there is a slider you can use to change the confidence threshold and see the difference.
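
That post-processing might look roughly like this in C++ (a sketch assuming the output layout discussed above: out has one row per detection, with the relative box center x, y, width, height in columns 0-3, objectness in column 4, and per-class scores from column 5 on; frame is the input image):

const float confThreshold = 0.24f;
for (int i = 0; i < out.rows; ++i)
{
    const float* row = out.ptr<float>(i);
    if (row[4] < confThreshold)       // objectness too low: skip
        continue;
    int bestClass = -1;
    float bestScore = 0.f;
    for (int j = 5; j < out.cols; ++j)
        if (row[j] > bestScore) { bestScore = row[j]; bestClass = j - 5; }
    // Box coordinates are relative to the image size.
    float cx = row[0] * frame.cols, cy = row[1] * frame.rows;
    float w  = row[2] * frame.cols, h  = row[3] * frame.rows;
    cv::Rect box(int(cx - w / 2), int(cy - h / 2), int(w), int(h));
    // ... draw box with the label bestClass / bestScore on the frame
}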
