Resizing 448 darknet: ./src/network.c:392: resize_network: Assertion `0' failed. #426

AhaEdgar · 2018-01-21T08:11:32Z

Can anyone help me solve this issue?
I changed yolo.2.0cfg and added one CNN layer.

Li-Lai · 2018-01-24T03:26:57Z

The way you ask for help is funny. You didn't post your changes, so only a ghost would know what you changed.

christopher5106 · 2018-03-08T11:11:42Z

I have the same issue on an NVIDIA V100 (I chose -gencode arch=compute_70,code=sm_70), while everything works well on an NVIDIA 1080 Ti:

Region Avg IOU: 0.141527, Class: 0.026888, Obj: 0.652683, No Obj: 0.566986, Avg Recall: 0.000000, count: 4
10: 363.796783, 444.599030 avg, 0.000000 rate, 0.045145 seconds, 10 images
Resizing
544
darknet: ./src/network.c:392: resize_network: Assertion `0' failed.
Aborted (core dumped)

AlexeyAB · 2018-03-08T11:40:23Z

@Ahagpp @christopher5106 You can try this fork, in which I fixed excessive memory allocation for several (unfortunate) network sizes: https://github.com/AlexeyAB/darknet

Also, if you use a V100 GPU, you can use Tensor Cores for mixed-precision calculations (mixed precision is now supported for a single GPU and for multi-GPU). How to use it: AlexeyAB#407

christopher5106 · 2018-03-08T14:44:55Z

Sounds good, it is working well on a DGX-Station with V100. On Power9 with V100, I have a problem when using CUDNN=1 with cuDNN 7.0:

27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125
31 detection
Loading weights from darknet19_448.conv.23...
seen 32

after which it freezes. Without CUDNN it works well, but then I cannot benefit from half precision.

AlexeyAB · 2018-03-08T21:50:34Z

@christopher5106 To localize the problem, there are a few questions:

Does it freeze only during training, or during detection too?
Does it work with GPU=1 CUDNN=0 in the Makefile?
Does it work with GPU=0 CUDNN=0?
Do you use OpenCV?
Did you try mixed precision (-DCUDNN_HALF in the Makefile) to train on the V100? (It now supports multi-GPU for DGX.)
Do you use little-endian 64-bit Linux?

christopher5106 · 2018-03-09T18:20:46Z

It seems there was a performance issue. We did a complete reinstall and the problem seems to have disappeared. Thanks a lot, I'll tell you more about this next week.

christopher5106 · 2018-03-12T14:54:07Z

On some runs, I get

Region Avg IOU: nan, Class: nan, Obj: -nan, No Obj: -nan, Avg Recall: 0.000000, count: 6
78444: -nan, -nan avg, 0.000010 rate, 0.180000 seconds, 78444 images

Is that normal? When I rerun it, it is OK.

AlexeyAB · 2018-03-12T14:59:12Z

@christopher5106

Region Avg IOU: nan, Class: nan, Obj: -nan, No Obj: -nan, Avg Recall: 0.000000, count: 6

If lines like this occur only occasionally, this is normal.
If at some point all the lines contain nan, then the training went wrong.

78444: -nan, -nan avg, 0.000010 rate, 0.180000 seconds, 78444 images

A line like this always means the training went wrong.
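
The rule above can be checked automatically on a saved training log. The following is a minimal Python sketch, not part of darknet: the function names `is_nan` and `training_diverged` are hypothetical, and the regular expression assumes the iteration summary format shown in this thread. It ignores the "Region Avg IOU" lines and only flags an iteration summary line whose loss or average loss is nan.

```python
import re

# Iteration summary lines in this thread look like:
#   "78444: -nan, -nan avg, 0.000010 rate, 0.180000 seconds, 78444 images"
# (format assumed from the logs quoted above)
ITERATION_LINE = re.compile(r"^\s*(\d+):\s*(\S+),\s*(\S+) avg,")


def is_nan(token: str) -> bool:
    """Return True for 'nan' or '-nan' as printed in the training log."""
    return token.strip().lstrip("-").lower() == "nan"


def training_diverged(log_lines):
    """Return the iteration number of the first summary line whose loss or
    average loss is nan, or None if there is no such line.

    'Region Avg IOU' lines do not match the pattern and are skipped, since
    occasional nan values there are described as normal.
    """
    for line in log_lines:
        match = ITERATION_LINE.match(line)
        if match and (is_nan(match.group(2)) or is_nan(match.group(3))):
            return int(match.group(1))
    return None


if __name__ == "__main__":
    sample = [
        "Region Avg IOU: nan, Class: nan, Obj: -nan, No Obj: -nan, Avg Recall: 0.000000, count: 6",
        "78444: -nan, -nan avg, 0.000010 rate, 0.180000 seconds, 78444 images",
    ]
    print(training_diverged(sample))
```

This sketch does not cover the other case mentioned above, where every "Region Avg IOU" line contains nan from some point onward.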

christopher5106 · 2018-03-12T15:05:13Z

The training did go wrong indeed... Is that normal?

AlexeyAB · 2018-03-12T15:14:21Z

@christopher5106 No, this is not normal. Something is wrong in the dataset, the model, or the source code.

nurCoban · 2018-03-29T14:44:29Z

@Ahagpp I had the same problem. Increase subdivisions in the cfg file; that solved it for me.
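
As background for this suggestion: as far as I understand darknet's cfg handling, the `batch` value is split into `subdivisions` chunks that are processed one after another, so a larger `subdivisions` value means fewer images on the GPU per pass. The Python sketch below only illustrates that arithmetic; the function name `images_per_pass` and the value batch=64 are assumptions for illustration, not taken from the thread, and the thread does not confirm that this is the mechanism behind the fix.

```python
def images_per_pass(batch: int, subdivisions: int) -> int:
    """Number of images processed at once when `batch` is split into
    `subdivisions` chunks (integer division, as assumed for darknet)."""
    if batch <= 0 or subdivisions <= 0:
        raise ValueError("batch and subdivisions must be positive")
    return batch // subdivisions


if __name__ == "__main__":
    # Hypothetical cfg values: batch=64 with increasing subdivisions.
    for subdivisions in (8, 16, 32):
        print(subdivisions, images_per_pass(64, subdivisions))
```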

interface-bin · 2018-08-02T06:35:49Z

I had the same problem and solved it in two steps:

1. Edit the Makefile and rebuild the project:
   ARCH= -gencode arch=compute_30,code=sm_30 \
         -gencode arch=compute_35,code=sm_35 \
         -gencode arch=compute_50,code=[sm_50,compute_50] \
         -gencode arch=compute_52,code=[sm_52,compute_52] \
         -gencode arch=compute_60,code=sm_60 \
         -gencode arch=compute_61,code=sm_61
   because my GPU is a GTX 1080, whose compute capability is 6.1.
2. Edit src/network.c and comment out this statement:
   if(l.workspace_size > 2000000000) assert(0);

After these two steps, the problem was solved.
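
The ARCH entries in this thread follow one pattern: compute capability 6.1 becomes compute_61/sm_61, and the V100 entry mentioned earlier is compute_70/sm_70. A small Python sketch of that mapping follows; the helper name `gencode_flag` is hypothetical and only formats the string, so you still need to look up your GPU's compute capability yourself.

```python
def gencode_flag(compute_capability: str) -> str:
    """Build a Makefile ARCH entry from a compute capability such as '6.1'."""
    major, minor = compute_capability.split(".")
    sm = f"{int(major)}{int(minor)}"
    return f"-gencode arch=compute_{sm},code=sm_{sm}"


if __name__ == "__main__":
    print(gencode_flag("6.1"))  # GTX 1080, as stated in this thread
    print(gencode_flag("7.0"))  # V100, as used earlier in this thread
```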
