GStreamer, Jupyter, and V4L2 clash #1769

projekter · 2023-01-07T15:33:38Z

I reported this first with GStreamer and was then advised to put the report here.

Describe your issue

Using the v4l2jpegdec within a Jupyter session will work in the first run, but then permanently (until a restart) breaks all further uses of this element.

Setup

Operating System: Raspberry Pi OS 4.0 Lite x64

> cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
> uname -a
Linux raspberrypi 5.15.76-v8+ #1597 SMP PREEMPT Fri Nov 4 12:16:41 GMT 2022 aarch64 GNU/Linux
> vcgencmd version
Oct 26 2022 11:09:05
Copyright (c) 2012 Broadcom
version c72ad6b26ff40c91ef776b847436094ee63fabee (clean) (release) (start)

Device: Raspberry Pi 4B
GStreamer Version: 1.20.5 or 1.21.3 (compiled from source)
Python Version: 3.9.2 (also tested in 3.9.15)

Steps to reproduce the bug

You will need some device that produces motion jpeg, here I have a camera on /dev/video0. The sink doesn't matter, a fakesink will do, but to "see" better, I use a visible output.

Good behavior

Run

gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! image/jpeg,width=1280,height=720,framerate=30/1 ! v4l2jpegdec ! kmssink

(or similar caps, or another sink) on the console. The video device will open and display properly (or, in case of a fakesink, the clock will start to run). Note that you can cancel gst-launch-1.0 and then run the same command again, as often as you like, as expected.

The same good behavior can be found when running the corresponding Python code (say, test.py):

import gi, time
gi.require_version("Gst", "1.0")
from gi.repository import Gst
Gst.init(None)
pipe = Gst.parse_launch("v4l2src device=/dev/video0 io-mode=4 ! image/jpeg,width=1280,height=720,framerate=30/1 ! v4l2jpegdec ! kmssink")
pipe.set_state(Gst.State.PLAYING)
print("Running")
time.sleep(10)
pipe.set_state(Gst.State.NULL)
del pipe
print("Finished")
# maybe do something else or just end the script.

You can now run this code using python test.py again and again, and everything is fine.

Bad behavior

Run the python code in a Jupyter notebook. I checked it both with native Jupyter Lab, VS Code Jupyter, and also using IPython and XPython as kernels (which is also the reason why I submit the issue here, as I'm not sure where to put it in Jupyter).
The code will run perfectly fine the first time and everything works as expected. However, when I now try to run the code a second time - or even when I try to use any of the "good" procedures (command line, Python script) after first running the code in Jupyter, the pipeline won't start. The command line stops after "Redistribute latency" and after a couple of seconds gives the error (here for 1.20.5)

ERROR: from element /GstPipeline:pipeline0/v4l2jpegdec:v4l2jpegdec0: Failed to allocate required memory.
Additional debug info:
../subprojects/gst-plugins-good/sys/v4l2/gstv4l2videodec.c(668): gst_v4l2_video_dec_setup_capture (): /GstPipeline:pipeline0/v4l2jpegdec:v4l2jpegdec0:
Buffer pool activation failed
Execution ended after 0:00:06.572893138
Setting pipeline to NULL ...
ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Internal data stream error.
Additional debug info:
../subprojects/gstreamer/libs/gst/base/gstbasesrc.c(3127): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming stopped, reason not-negotiated (-4)
Freeing pipeline ...

This also happens if instead of deleting the pipeline in the Jupyter session, I just try to set the same pipeline object to PLAYING again. And, most importantly, it also happens if I completely kill the Jupyter kernel. Until I restart the pi, the pipeline will never start again whatsoever. However, note that the device itself is opened, as indicated by the LED on the camera, it is just the decoder that fails (and jpegdec works, but the performance and resource consumption is much worse, of course).

How reproducible is the bug?

Always, and also using different kinds of sinks (tried with kmssink, fbdevsink, glimagesink, fakesink).

Solutions you have tried

Killing pretty much any nonessential process, including everything responsible for the SSH connection and reconnecting. Tried two different GStreamer setups. Installed Miniconda with the latest Python 3.9 and a XEUS Jupyter kernel, to see whether it is IPython-related.

More details

In a freshly botted RPi, I'm doing the following:

Create the script
Launch the pipeline once using the command line to init all drivers
dmesg into 00-dmesg.log
Run the python script, logging into 01-pyscript.log
dmesg into 02-dmesg.log
Start jupyter (all logging goes into 03-jupyter.log as well as the notebook cells themselves), create the notebook (where here I replace the sleep by two different cells which I execute one after the other). Run the pipeline, when it's been running for some time shut it down.
Run the pipeline again in the same jupyter session (will not start). Get the status (async). Shut down the pipeline (takes a couple of seconds).
Shut down jupyter. This also takes a couple of seconds, which is unusual.
dmesg into 04-dmesg.log
Run the python script again (will also not start any more), logging into 05-pyscript.log
dmesg into 06-dmesg.log

I hope all the log files help to narrow down the cause. The notebook, the script and the full commandline are also included.

From a cursory glance, the kernel log seems pretty suggestive:

[ 253.894858] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000b765ecde in active state

The text was updated successfully, but these errors were encountered:

pelwell · 2023-01-09T13:11:10Z

gstreamer is outside my comfort zone so I can't offer any real help, but it sounds as though something isn't being cleaned up when the script exits, perhaps because Jupyter doesn't run up a new Python interpreter for each script. Does the Gst class have an explicit uninit (or similar) method that you could add to the end of the script?

pelwell · 2023-01-09T13:16:00Z

Gst.deinit(), perhaps?

6by9 · 2023-01-09T13:33:59Z

I'm surprised gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! image/jpeg,width=1280,height=720,framerate=30/1 ! v4l2jpegdec ! kmssink works if this is a USB webcam (uvcvideo).
UVC uses the vmalloc heap as it has to copy the incoming image data out of the USB buffers into the V4L2 buffers. v4l2jpegdec on the Pi requires contiguous buffers, so I would expect it to fail on the import.

Please can you try adding start_debug=1 to /boot/config.txt. Run your script to lock the system, and then use vclog to sudo ./vclog --assert to dump any asserts on the firmware side? sudo ./vclog --msg may also be of interest.

Likewise there is extra logging available from the V4L2 kernel driver via module parameter bcm2835_codec.debug. It's a level from 0 (no debug logging) to 5 (most verbose).

ndufresne · 2023-01-09T16:30:11Z

@6by9 io-mode=4 is the same as io-mode=dmabuf, which is the default / dmabuf exporter, in this mode, as for v4l2jpegdec it does not implement impoter role, in fact, all the v4l2 decoder will copy the bitstream into the driver allocated buffer, there is no zero-copy path here.

From v4l2jpegec to kmssink, DRM/KMS importation will be tried, though an automatic copy fallback is in-place, I cannot comment if this part is zero-copy.

What you should notice, is that the V4L2 common code in the Linux kernel is reporting a kernel bug:

[ 253.894858] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000b765ecde in active state

User space cannot be blamed for a kernel bug. After this error is reported, the driver no longer works.

6by9 · 2023-01-09T19:32:20Z

io-mode=4 is the same as io-mode=dmabuf, which is the default / dmabuf exporter, in this mode, as for v4l2jpegdec it does not implement impoter role, in fact, all the v4l2 decoder will copy the bitstream into the driver allocated buffer, there is no zero-copy path here.

I'm confused then as all the v4l2dec (and v4l2enc) components list

  output-io-mode      : Output side I/O mode (matches sink pad)
                        flags: readable, writable
                        Enum "GstV4l2IOMode" Default: 0, "auto"
                           (0): auto             - GST_V4L2_IO_AUTO
                           (1): rw               - GST_V4L2_IO_RW
                           (2): mmap             - GST_V4L2_IO_MMAP
                           (3): userptr          - GST_V4L2_IO_USERPTR
                           (4): dmabuf           - GST_V4L2_IO_DMABUF
                           (5): dmabuf-import    - GST_V4L2_IO_DMABUF_IMPORT

Why is dmabuf-import listed here if it isn't supported? What am I missing?
The V4L2 device supports dmabuf import, so is this a GStreamer component limitation?

I'm not blaming userspace for this, just commenting that I was surprised it works as I'm expecting a heap type mismatch on the zero-copy path I was expecting to see.

Yes the kernel reports

videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000b765ecde in active state

however videobuf2 cleans up on behalf of the driver by calling vb2_buffer_done(..., VB2_BUF_STATE_ERROR), therefore there is nothing fundamentally broken at that point.

It looks like something has locked up on the firmware, hence needing further debug.

projekter · 2023-01-09T20:55:32Z

@6by9 I'll try to fit in time tomorrow morning and post the logs that you requested.

ndufresne · 2023-01-09T21:14:36Z

@6by9 I was just trying to explain that zero-copy was not enabled in this pipeline. But yes, I was wrong, it seems that you can zero-copy bitstream data now, but as said, this is not used at all, io-mode=dmabuf is exporter, if you have two exporters chained in gstreamer, a copy is made. If you want to try zero-copy, you'd have to have one side dmabuf-import and the other dmabuf, there is a reason why zero copy is not enabled by default, which is basically because there is not enough code to fallback when it fails.

projekter · 2023-01-10T07:48:17Z

So I now set bcm2835_codec.debug=3 (modinfo bcm2835_codec lists 3 as the highest possible level for debug, not 5).

Something that completely baffles me: after setting the parameter and rebooting, all v4l2 elements were just completely gone from gstreamer. I had to re-install the gstreamer files from my build directory to recover them (I tried resetting the parameters to see if it was a load error, to no avail - and the second time I set them after the recovery, this didn't happen again). No idea if or how this could be related.

Next observation: It look like if I run the command in Jupyter for the very first time on the system, the lock also does not happen. Only if GStreamer was called (in any kind of fashion) before, it happens. So the attached log files are generated when I first call the command line, then do the same using Jupyter, and then verify via the command line that indeed the driver is locked.

vclogassert.log
vclogmsg.log

Finally, @ndufresne thank you for this info, I learn a lot probably unrelated to this bug. I was pretty sure that some time ago I read that GStreamer now always tries to match the DMA io-modes (before, I always set capture-io-mode and then again output-io-mode , which I dropped afterwards), and that it also tries to auto-select zero-copy buffers whenever possible (so the io-mode=4 is more a sign of me trying to be extra sure, although in my head I saw it as redundant). But I don't know whether I'll be able to find out again where I read this - perhaps it was more of a development plan instead of a completed feature.

projekter · 2023-04-07T16:08:21Z

A quick report after a few months: The problem seems to be solved. I just the same steps again with the latest GStreamer 1.22 and also updated via apt, which included kernel updates. I'm now running on

> uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
> vcgencmd version
Mar 17 2023 10:51:11
Copyright (c) 2012 Broadcom
version 82f3750a65fadae9a38077e3c2e217ad158c8d54 (clean) (release) (start_db)

It appears that I can now also run and close the pipeline as often as I want from Jupyter. I don't know whether this is due to the kernel update or whether GStreamer now contains some additional safeguarding that is maybe able to circumvent the bug, but I'm happy with this either way.

projekter closed this as completed Apr 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GStreamer, Jupyter, and V4L2 clash #1769

GStreamer, Jupyter, and V4L2 clash #1769

projekter commented Jan 7, 2023

pelwell commented Jan 9, 2023

pelwell commented Jan 9, 2023

6by9 commented Jan 9, 2023

ndufresne commented Jan 9, 2023

6by9 commented Jan 9, 2023

projekter commented Jan 9, 2023

ndufresne commented Jan 9, 2023

projekter commented Jan 10, 2023

projekter commented Apr 7, 2023

GStreamer, Jupyter, and V4L2 clash #1769

GStreamer, Jupyter, and V4L2 clash #1769

Comments

projekter commented Jan 7, 2023

Describe your issue

Setup

Steps to reproduce the bug

Good behavior

Bad behavior

How reproducible is the bug?

Solutions you have tried

More details

pelwell commented Jan 9, 2023

pelwell commented Jan 9, 2023

6by9 commented Jan 9, 2023

ndufresne commented Jan 9, 2023

6by9 commented Jan 9, 2023

projekter commented Jan 9, 2023

ndufresne commented Jan 9, 2023

projekter commented Jan 10, 2023

projekter commented Apr 7, 2023