DNN inference uses 3.5 times more memory in 4.8.0 compared to 4.5.2 #24134
Comments
Seems like I misunderstood you initially. The speed of the model is OK, but on 4.8.0 it consumes more memory, is that right?
Yes, and that might lead to swapping and drastically longer runtimes, as reported in the other 2 issues.
Hi @ukoehler, in 4.8.0 we use the Winograd optimization for 3x3 stride-1 convolutions, which speeds up inference but costs more memory (maybe about 3 times as much). If you don't want this feature, you can disable it with `net.enableWinograd(false)`. In conclusion, the memory consumed by 4.8 is at least twice as much as before (4.5), since we keep the original convolution weights and additionally store a re-packed weight blob. In the future we will have full graph optimization and a better memory strategy to optimize this part.
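To illustrate the trade-off being discussed, here is a minimal, self-contained sketch of the 1-D Winograd F(2,3) scheme (not OpenCV's actual implementation): it computes two convolution outputs with 4 multiplications instead of 6, at the cost of keeping extra transformed buffers (`U`, `V`) alive, which is the kind of memory overhead this thread is about.

```python
# Toy 1-D Winograd F(2,3) sketch: 2 outputs of a 3-tap convolution with
# 4 multiplications instead of 6, but with extra transform buffers.
# Illustration only -- this is not OpenCV's convolution code.

def conv_direct(d, g):
    # Plain sliding-window correlation: 2 outputs, 6 multiplications.
    return [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]

def conv_winograd_f23(d, g):
    # Filter transform U = G @ g (precomputable and cached per layer --
    # this cached copy is part of the extra memory cost).
    U = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Input transform V = B^T @ d (extra per-tile scratch buffer).
    V = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Elementwise product: only 4 multiplications.
    M = [u * v for u, v in zip(U, V)]
    # Output transform A^T @ M.
    return [M[0] + M[1] + M[2], M[1] - M[2] - M[3]]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
assert conv_direct(d, g) == conv_winograd_f23(d, g)
```

Scaled up to 2-D tiles and many channels, the transformed weight and input blobs are several times larger than the raw weights, which matches the roughly 3x memory figure mentioned above.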
FYI: #22825
Hmm, if that was merged Jan 8th, why do I still see such high memory usage?
Hi @ukoehler, please check the details; it is not fixed completely. And due to the current dnn memory strategy, it cannot be fixed in 4.x. We will try to fix it completely in 5.x.
I did read that the change brought little improvement for some models, and the tests showed it broke others. Then the tests were just disabled.
Just ran other tests with non-YOLO models: memory usage increased from 1.352 GB to 3.147 GB, and it looks like it is leaking memory.
[memory-usage graphs for 4.5.2 and 4.8.0]
@ukoehler I'm curious, have you tried disabling Winograd?
No. There is no build switch I could find in the documentation. How would I go about it? Very eager to test.
I thought @zihaomu said you could disable it with `net.enableWinograd(false)`.
Right. But after disabling that, it still costs about twice the memory of 4.5.2, since we re-pack every convolution weight blob.
Hmm, found it and did a speed test. `net.enableWinograd(false)` makes it faster?! It returns to the 4.5.2 speed. The first memory test is still running.
That depends on your machine platform. For some models, the main bottleneck is memory rather than computation, so Winograd can indeed increase runtime because it needs more memory.
For the currently running test on an AMD EPYC 7302P virtual machine with AVX2 enabled, it slows down from 1.368 s to 4.351 s. In my recent speed tests I saw fluctuations of 18%.
`net.enableWinograd(false)` still leaks memory and therefore uses about twice as much as version 4.5.2. I do not consider this an optimization issue, but a serious bug.
@zihaomu could you check memory leaks?
Hi @asmorkalov, I will work on it.
@zihaomu: Here is the RAM graph for the YOLO3 code shown above with `net.enableWinograd(false)`: [RAM graph]
As a hint: in this case I run several images through the same network. In the graphs shown before, I ran one image through 4 networks. It appears that memory is not cleaned up after an inference run (as it was in version 4.5.2). The memory is only cleared when the network is destructed or when another inference is triggered. That means using valgrind to find the problem will not help. I will run the above test case with Winograd enabled, but that will take a couple of hours.
Hi @ukoehler, how can I reproduce your issue? Test environment: Mac Intel, i9.
Hi @zihaomu, I do not think the while loop is necessary. I would only run through it once and see accumulating memory usage. It looks like memory is not cleaned after running `forward()`, but is cleaned at destruction and before the next `forward()` run. I use
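For measurements like the ones in this thread, peak resident memory can be sampled from inside the process with the standard library alone (no valgrind needed). A small sketch, Linux/macOS only; the 50 MB `bytearray` is just a stand-in for the allocation a `forward()` call would make:

```python
# Sample the process's peak resident set size (the same quantity as
# VmHWM in /proc/<pid>/status on Linux) before and after a big allocation.
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024

before = peak_rss_mb()
buf = bytearray(50 * 1024 * 1024)  # stand-in for a net.forward() allocation
after = peak_rss_mb()
print(f"peak RSS grew by about {after - before:.0f} MB")
```

Because this reads the *peak*, it never decreases, which makes it well suited to catching the "memory kept alive after forward()" behaviour described above even if the memory is later freed.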
Hi @ukoehler, in OpenCV dnn we allocate memory layer by layer on the first run (for convolution weight re-packing and Winograd initialization). So I think the accumulating memory usage on the first run is expected; it should not be a memory leak. If the memory usage is still increasing during the following runs, that would be a different problem. We cannot do the convolution weight re-packing and Winograd initialization at the beginning of every run. For a big model, it does not make sense to forward only with CPU, and this memory increase aims to speed up forward. For small models on ARM we can speed up about 2x, tested with ResNet-50.
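The allocation policy described above can be sketched as a toy model (this is an illustration of the pattern, not OpenCV's code): each layer allocates its scratch buffers lazily on the first `forward()` and reuses them afterwards, so memory grows on the first run and stays flat on later runs of the *same* net, but accumulates across *different* live nets.

```python
# Toy model of lazy per-layer allocation: buffers are created on the
# first forward() and reused afterwards, released only at destruction.

class Layer:
    def __init__(self, scratch_bytes):
        self.scratch_bytes = scratch_bytes
        self.scratch = None  # allocated lazily on first use

    def forward(self):
        if self.scratch is None:             # first run: allocate
            self.scratch = bytearray(self.scratch_bytes)
        return len(self.scratch)             # later runs: reuse

class Net:
    def __init__(self, layers):
        self.layers = layers
        self.allocations = 0                 # counts buffer allocations

    def forward(self):
        for layer in self.layers:
            if layer.scratch is None:
                self.allocations += 1
            layer.forward()
        return self.allocations

net = Net([Layer(1024), Layer(2048)])
assert net.forward() == 2   # first run allocates every layer's buffer
assert net.forward() == 2   # second run allocates nothing new
```

This matches both observations in the thread: flat memory when re-running one net, and accumulating memory when several nets are kept alive side by side.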
As you can see from the graphs, that behaviour changed from version 4.5.2, and the drastic increase in memory usage poses a big problem. The memory usage does not increase when running the same net again, but it accumulates when running different nets one after the other, and it takes too long to reload the nets from disk every time. Something must have changed that causes that memory build-up. Memory should be freed at the end of a forward, or the possibility should at least exist. As for speed: the changes make all the nets I tested dramatically slower, not faster. For that regression alone, Winograd should not be on by default. Add swapping due to massively increased memory usage (not in any of the cases above; these are just our unit tests) and we would have to seriously think about replacing OpenCV altogether (especially in light of the other, obscure bugs I found).
That's correct. We have found this issue and will try to fix it in 5.x, which requires refactoring our whole memory allocation strategy; a lot of work to do.
Have you tested on an ARM platform?
@ukoehler are your findings specific to a given backend, or all of them?
@zihaomu , sorry for not specifying here:
This is the only platform we target for inference, sorry.
Finished the 1 hour run: [memory-usage graph]
Again, still a bug. Your test is not testing the problem at all. The memory is wasted when running several nets for the first time, not when running one net many times.
Hi @asmorkalov, please take a look. The reply from my side is #24134 (comment).
That's true. We will try to fix it in the future. From my point of view, this is an optimization trade-off, not a memory leak.
Keeping the memory might be fine when only dealing with one network. I am dealing with six nets now, and that number will likely increase. The old behaviour worked fine for that; the new behaviour leads to swapping due to the large amount of memory used. Is there a way to at least release memory to restore the old behaviour? All in all, this is a massive regression without any warning in the release notes. I am clearly not the only one having the problem, as the two other issues about slower inference prove. It looks like the changes have a negative effect in many cases. So far I am just glad that our unit tests caught the problem early.
Hi @ukoehler, currently we do not have such a flag. In the future, we would like to add a new API like
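The comment above is truncated, so the API name is unknown; purely as a hypothetical sketch (all names invented for illustration, extending the toy lazy-allocation model rather than OpenCV's real classes), such a call could drop per-layer scratch buffers after `forward()`, trading re-allocation cost on the next run for a smaller steady-state footprint when many nets are kept alive:

```python
# Hypothetical "release buffers" API on the toy lazy-allocation model.
# release_intermediate_buffers() is an invented name, not an OpenCV API.

class Layer:
    def __init__(self, scratch_bytes):
        self.scratch_bytes = scratch_bytes
        self.scratch = None

    def forward(self):
        if self.scratch is None:
            self.scratch = bytearray(self.scratch_bytes)

class Net:
    def __init__(self, layers):
        self.layers = layers

    def forward(self):
        for layer in self.layers:
            layer.forward()

    def release_intermediate_buffers(self):   # hypothetical API
        for layer in self.layers:
            layer.scratch = None              # next forward() re-allocates

nets = [Net([Layer(1 << 20)]) for _ in range(6)]  # six nets, as in the thread
for net in nets:
    net.forward()
    net.release_intermediate_buffers()  # only one net's buffers live at a time

assert all(l.scratch is None for net in nets for l in net.layers)
```

With this pattern, peak memory is bounded by the largest single net instead of the sum of all six, at the cost of repeating the first-run allocation on every inference.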
That's a good idea.
Significant memory footprint improvement: #25163
@asmorkalov, it is related to the import stage only, but I'll probably take a look at inference too.
Yes, I know. The significant memory consumption is caused by the Winograd branch in convolution.
Release convolution weightsMat after usage #25181

### Pull Request Readiness Checklist

Related (but not resolved): #24134. Minor memory footprint improvement. Also adds a test for VmHWM.

RAM top memory usage (-230 MB):

| YOLOv3 (237 MB file) | 4.x | PR |
|----------------------|---------|---------|
| no winograd | 808 MB | 581 MB |
| winograd | 1985 MB | 1750 MB |

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
@asmorkalov, yes, it's still relevant.
System Information
OpenCV version: 4.8.0 vs. 4.5.2, compiled from source
Operating system: both Windows and Linux
Compiler: GCC 11
Detailed description
During regression testing between versions 4.8.0 and 4.5.2, most regression tests ran perfectly: all results were the same. However, the runtime with larger data sets increased dramatically, while for smaller data sets I see at most a slightly longer runtime. It turns out that swapping was the problem for the larger data sets.
I used valgrind's massif tool to measure memory usage for a singled-out unit test with one of the networks and noticed that 4.5.2 used 553.8 MB of RAM while version 4.8.0 needed 1.89 GB of RAM for the same task.
Find the network data here:
https://drive.google.com/file/d/1hPetNOt76xze8cD3DriDuZira65IeKc4/view?usp=sharing
and
https://drive.google.com/file/d/1swyevb0xsQhKeFxHOz-G00Dl-mz2wVCD/view?usp=sharing
As requested in issue #23223, I provide the performance data (there was NO swapping) for 4.8.0:
and 4.5.2
Steps to reproduce
Issue submission checklist