Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicam: CMA Memory leak #3220

Closed
kbingham opened this issue Sep 11, 2019 · 26 comments
Closed

unicam: CMA Memory leak #3220

kbingham opened this issue Sep 11, 2019 · 26 comments

Comments

@kbingham
Copy link
Contributor

Describe the bug
Capturing through the unicam driver leaks CMA memory after the device is closed.

To reproduce

  • Build libcamera
  • Identify current CMA consumption
    • cat /proc/meminfo | grep Cma
  • Run qcam for a bit then exit
    • ./build/src/qcam/qcam
  • Calculate difference in CMA usage
    • cat /proc/meminfo | grep Cma
    • echo $((198732-187880))
      • 10852
    • Each repeated launch consumes another ~10mB of CMA.

Expected behaviour
There should be no considerable consumption of CMA when an application closes and releases buffers.
(even if there is a bug in libcamera, the driver should have released all resources on close)

Actual behaviour

Each iteration and launch of qcam (from libcamera) consumes about 10mB from CMA. I doubled the allocation on the unicam from 4 buffers of 1080p to 8 buffers of 1080p, and the memory leak doubled to ~20mB per iteration.

System

  • Which model of Raspberry Pi? e.g. Pi3B+, PiZeroW

    • RPi 3 with Camera v2
  • Which OS and version (cat /etc/rpi-issue)?

    • Raspberry Pi reference 2019-07-10
  • Which firmware version (vcgencmd version)?

Jul  9 2019 14:42:57 
Copyright (c) 2012 Broadcom
version 6c3fe3f096a93de3b34252ad98cdccadeb534be2 (clean) (release) (start_x)
@kbingham
Copy link
Contributor Author

@6by9 Have you experienced any memory leaks with your testing on the unicam driver?

@kbingham
Copy link
Contributor Author

https://github.com/kbingham/libcamera/ branch kbingham/rpi for the libcamera pipeline development

@6by9
Copy link
Contributor

6by9 commented Sep 11, 2019

I hadn't seen anything from bcm2835-unicam. All the buffer handling is within videobuf2, so that should deal with most of the housekeeping.

There was a report that the kernel module refcount for videobuf2_dma_contig keeps on increasing when using GStreamer. I'd put that down to bcm2835-codec, but rereading it could be bcm2835-unicam as that is also used in their pipeline.
https://www.raspberrypi.org/forums/viewtopic.php?f=67&t=250634
There is a comment in there that dropping the use of dmabufs between unicam and encoder solved the issue.

@6by9
Copy link
Contributor

6by9 commented Sep 11, 2019

I haven't had a chance to investigate that report, but my first steps were going to be checking as to whether vcm-cma was keeping references to dmabufs.
sudo cat /sys/kernel/debug/vcsm-cma/state should dump out the state of all the buffers that it is keeping hold of on behalf of the VPU. It'd be interesting to know if that reported anything in your use case as that would narrow it down as being unicam or codec related.

@kbingham
Copy link
Contributor Author

Yes, @6by9 - I can indeed see the module use count is increasing:

lsmod | grep ^videobuf2_dma
videobuf2_dma_contig    20480  38 bcm2835_unicam,bcm2835_codec

./build/src/qcam/qcam

lsmod | grep ^videobuf2_dma
videobuf2_dma_contig    20480  46 bcm2835_unicam,bcm2835_codec

./build/src/qcam/qcam

lsmod | grep ^videobuf2_dma
videobuf2_dma_contig    20480  54 bcm2835_unicam,bcm2835_codec

So the usecount increases by 8 each iteration
(the number of buffers I am currently allocating and believe to be leaking)

@kbingham
Copy link
Contributor Author

Hrm: On an idle run (No current QCam executing)

sudo cat /sys/kernel/debug/vcsm-cma/state
VC-ServiceHandle     8c729b63
Resources
Total resource count:   0

@6by9
Copy link
Contributor

6by9 commented Sep 11, 2019

Thanks for running those couple of tests.
If vcsm-cma isn't reporting issues then that does point towards unicam.

I suspect something is unbalanced on the dmabuf refcounts, therefore each dmabuf is incrementing the exporters module refcount.
Does sudo cat /sys/kernel/debug/dma_buf/bufinfo reveal anything about who may be doing dodgy things?

@kbingham
Copy link
Contributor Author

Aha ! Thanks - that's just the debug I was looking for :-D I'd been grepping for CMA info and all sorts.

I'm afraid that's pointing to vcsm-cma!

pi@rpi:~/libcamera $ sudo cat /sys/kernel/debug/dma_buf/bufinfo

Dma-buf Objects:
size    	flags   	mode    	count   	exp_name
02613248	00000002	00080007	00000005	videobuf2_dma_contig
	Attached Devices:
	vcsm-cma
Total 1 devices attached

02613248	00000002	00080007	00000005	videobuf2_dma_contig
	Attached Devices:
	vcsm-cma
Total 1 devices attached

02613248	00000002	00080007	00000006	videobuf2_dma_contig
	Attached Devices:
	vcsm-cma
Total 1 devices attached
....

pi@rpi:~/libcamera $ sudo cat /sys/kernel/debug/dma_buf/bufinfo | grep "devices attached" | wc -l
52

So 52 leaked buffers, which closely matches the current:
pi@rpi:~/libcamera $ lsmod | grep ^videobuf2_dma_contig
videobuf2_dma_contig 20480 54 bcm2835_unicam,bcm2835_codec

assuming an extra refcount for each of the unicam/codec modules.

@6by9
Copy link
Contributor

6by9 commented Sep 11, 2019

Hmm, so vcsm-cma thinks it has released all resources it had, but the dmabufs don't.

I don't know how easy it is for you to do, but I've just asked on the forum thread if we can try swinging the allocations round so that codec allocates, and unicam imports. Ignore the request if it is more than 5mins of effort as I'll hook a GStreamer pipeline together to replicate the forum setup instead.

(Ooh, I see an interesting WIP commit at the top of your libcamera branch. Now I'm curious what you're up to with that!)

@kbingham
Copy link
Contributor Author

Actually - those dmabufs get exported into the ISP, so it could equally be an issue with not correctly unexporting them there :-D

Aha- and your next comment just popped up - yes, I'll try to swap the allocations around. I think that's easy.

@kbingham
Copy link
Contributor Author

(Ooh, I see an interesting WIP commit at the top of your libcamera branch. Now I'm curious what you're up to with that!)

I bought a B101 to play with and experiment :-)

@6by9
Copy link
Contributor

6by9 commented Sep 11, 2019

I bought a B101 to play with and experiment :-)

Next you'll be integrating the audio side into libcamera! Nothing like a bit of feature creep ;-)

@kbingham
Copy link
Contributor Author

I don't know how easy it is for you to do, but I've just asked on the forum thread if we can try swinging the allocations round so that codec allocates, and unicam imports. Ignore the request if it is more than 5mins of effort as I'll hook a GStreamer pipeline together to replicate the forum setup instead.

Ok - so it seems there is no leak in this direction...
Buffers allocated at ISP, exported and imported into the unicam - the module use count doesn't go up, and there is no where near the same leakage (edit, CmaFree went up!):

Before
CmaFree: 106520 kB

After 2 runs:
CmaFree: 106724 kB (gone up so must be system usage...?)

@6by9
Copy link
Contributor

6by9 commented Sep 11, 2019

Thanks. That does pretty much confirm that it is dmabuf import with vcsm-cma.
Now to stare at the code to try and work out where.....

@6by9
Copy link
Contributor

6by9 commented Sep 16, 2019

Is there some magic required to run qcam?
I've compiled your kbingham/rpi branch, it runs, select imx219 0-0010, hit OK, and it crashes

qt5ct: using qt5ct plugin
[0:09:55.710525042] INFO Camera camera_manager.cpp:94 libcamera v0.0.0+708-893f2dde
Using camera imx219 0-0010
[0:10:12.339871854] INFO Camera camera.cpp:666 configuring streams: (0) 320x240-0x56595559
[0:10:12.340219534]  ERR RPI raspberrypi.cpp:188 Failed to set format on Video device: 1920x1080-0x41414270
Failed to configure camera
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 705, resource id: 20973393, major code: 40 (TranslateCoords), minor code: 0

/dev/video0 is set to 1920x1080 raw10 BGGR

Gstreamer seems not to be hitting issues when doing things like video transcode (filesrc ! matroskademux ! h264parse ! v4l2h264dec ! v4l2h264enc). That may be because it intercepts ctrl-c and does some cleanup.
v4l2convert doesn't support raw10, and throws a minor grump with raw8 from ov5647. I'm putting that down to a Gstreamer quirk rather than anything else.

@kbingham
Copy link
Contributor Author

@6by9, No there "shouldn't" be ... but we are lacking any kind of format negotiation with the sensor currently. In my tests, it always just worked with a fixerd format from the IMX219.

This should just ask the sensor what it can support and pick one, and then set that on the ISP.
Let me clean up my local branch and test...

@andrey-konovalov
Copy link
Contributor

Is there some magic required to run qcam?
I've compiled your kbingham/rpi branch, it runs, select imx219 0-0010, hit OK, and it crashes

[0:10:12.340219534]  ERR RPI raspberrypi.cpp:188 Failed to set format on Video device: 1920x1080-0x41414270

...

/dev/video0 is set to 1920x1080 raw10 BGGR

The 0x41414270 (in the error message above) is indeed V4L2_PIX_FMT_SBGGR10P (what your /dev/video0 is using). But the pipeline handler (raspberrypi.cpp) tries to set V4L2_PIX_FMT_SRGGB10P (the hardcoded value), and returns the error ("Failed to set format on Video device...") if the resulting format is different from RGGB.

The quick hack I've used was to change the V4L2_PIX_FMT_SRGGB10P to V4L2_PIX_FMT_SBGGR10P in src/libcamera/pipeline/raspberrypi.cpp. (The proper fix would be to make the pipeline handler more flexible WRT sensor pixel format)

@6by9
Copy link
Contributor

6by9 commented Sep 16, 2019

Thanks @andrey-konovalov - that was exactly it.

@kbingham
Copy link
Contributor Author

BTW @6by9 (in relation to this issue) kbingham/libcamera@2332f1d shows how I swapped the buffer allocation around from the unicam to the ISP. If you are testing this issue on my kbingham/rpi branch, please be aware of that commit. It has a #if 0 to switch allocations.

@6by9
Copy link
Contributor

6by9 commented Sep 16, 2019

I was running your old TOT with manual hack to swap from RGGB to BGGR. That's reproduced the issue, so now trying to remember how this all works....

@6by9
Copy link
Contributor

6by9 commented Sep 16, 2019

It looks like the cleanup code for releasing an imported dmabuf is just plain wrong.
The function vc_sm_clean_up_dmabuf which releases the attachment is only called from the release function for self allocated buffers (vc_sm_dma_buf_release).
(It's feeling totally wrong as we're unmapping and detaching BEFORE the VPU has acknowledged that it has finished with the buffer, which feels very wrong. If only I could rememeber why I'd done that!)

It's going to require a touch more thought, but the cause looks fairly obvious.

@6by9
Copy link
Contributor

6by9 commented Oct 7, 2019

I'm not sure if this is the bug you were seeing, but I've noted that we're unbalanced in calling dma_buf_get/dma_buf_put when reusing buffers vs seeing them for the first time. I'm not sure why I haven't seen this before, but there you go.

https://github.com/raspberrypi/linux/blob/rpi-4.19.y/drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c#L2063 needs an

                        buf->mmal.dma_buf = dma_buf;
+               } else {
+                       /* We already have a reference count on the dmabuf, so
+                        * release the one we acquired above.
+                        */
+                       dma_buf_put(dma_buf);
                }

Before this I was seeing dmabufs left described in /sys/kernel/debug/dma_buf/bufinfo with no attachments but a refcount roughly equal to the number of times the buffer had been presented to the component. I can't say for definite that that is the leak you're seeing, but it would generally follow.

PR incoming.

@kbingham
Copy link
Contributor Author

kbingham commented Oct 9, 2019

Sounds like it's along the right lines. I'll be back in the office next week so I can test things for you then.

@andrey-konovalov
Copy link
Contributor

andrey-konovalov commented Oct 25, 2019

https://github.com/raspberrypi/linux/blob/rpi-4.19.y/drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c#L2063 needs an

                        buf->mmal.dma_buf = dma_buf;
+               } else {
+                       /* We already have a reference count on the dmabuf, so
+                        * release the one we acquired above.
+                        */
+                       dma_buf_put(dma_buf);
                }

I've tried this change on RPi0 with recent rpi/rpi-4.19.y (when it was at commit a287d34 "rpi-wm8804-soundcard: Fixed MCLKDIV for Allo Digione"). After I ran qcam 5 times, the amount of free CMA memory didn't change ('cat /proc/meminfo |grep Cma'), the used count of videobuf2_dma_contig didn't increase ('lsmod | grep ^videobuf2_dma'), and 'sudo cat /sys/kernel/debug/dma_buf/bufinfo' showed "Total 0 objects, 0 bytes".

So this change fixes the CMA memory leak in the unicam driver.

Guess this issue can be resolved after #3293 is merged.

@andrey-konovalov
Copy link
Contributor

#3293 has been merged into rpi-4.19.y.
@kbingham are you OK to close it?

@kbingham
Copy link
Contributor Author

Hi @andrey-konovalov

Thanks for the ping here, Yes I have confirmed with @6by9 that the updates here have fixed the issues we saw.

Closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants