-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GStreamer, Jupyter, and V4L2 clash #1769
Comments
gstreamer is outside my comfort zone so I can't offer any real help, but it sounds as though something isn't being cleaned up when the script exits, perhaps because Jupyter doesn't run up a new Python interpreter for each script. Does the |
|
I'm surprised Please can you try adding Likewise there is extra logging available from the V4L2 kernel driver via module parameter bcm2835_codec.debug. It's a level from 0 (no debug logging) to 5 (most verbose). |
@6by9 io-mode=4 is the same as io-mode=dmabuf, which is the default / dmabuf exporter, in this mode, as for v4l2jpegdec it does not implement impoter role, in fact, all the v4l2 decoder will copy the bitstream into the driver allocated buffer, there is no zero-copy path here. From v4l2jpegec to kmssink, DRM/KMS importation will be tried, though an automatic copy fallback is in-place, I cannot comment if this part is zero-copy. What you should notice, is that the V4L2 common code in the Linux kernel is reporting a kernel bug:
User space cannot be blamed for a kernel bug. After this error is reported, the driver no longer works. |
I'm confused then as all the v4l2dec (and v4l2enc) components list
Why is dmabuf-import listed here if it isn't supported? What am I missing? I'm not blaming userspace for this, just commenting that I was surprised it works as I'm expecting a heap type mismatch on the zero-copy path I was expecting to see. Yes the kernel reports
however videobuf2 cleans up on behalf of the driver by calling vb2_buffer_done(..., VB2_BUF_STATE_ERROR), therefore there is nothing fundamentally broken at that point. It looks like something has locked up on the firmware, hence needing further debug. |
@6by9 I'll try to fit in time tomorrow morning and post the logs that you requested. |
@6by9 I was just trying to explain that zero-copy was not enabled in this pipeline. But yes, I was wrong, it seems that you can zero-copy bitstream data now, but as said, this is not used at all, io-mode=dmabuf is exporter, if you have two exporters chained in gstreamer, a copy is made. If you want to try zero-copy, you'd have to have one side dmabuf-import and the other dmabuf, there is a reason why zero copy is not enabled by default, which is basically because there is not enough code to fallback when it fails. |
So I now set Something that completely baffles me: after setting the parameter and rebooting, all v4l2 elements were just completely gone from gstreamer. I had to re-install the gstreamer files from my build directory to recover them (I tried resetting the parameters to see if it was a load error, to no avail - and the second time I set them after the recovery, this didn't happen again). No idea if or how this could be related. Next observation: It look like if I run the command in Jupyter for the very first time on the system, the lock also does not happen. Only if GStreamer was called (in any kind of fashion) before, it happens. So the attached log files are generated when I first call the command line, then do the same using Jupyter, and then verify via the command line that indeed the driver is locked. Finally, @ndufresne thank you for this info, I learn a lot probably unrelated to this bug. I was pretty sure that some time ago I read that GStreamer now always tries to match the DMA io-modes (before, I always set |
A quick report after a few months: The problem seems to be solved. I just the same steps again with the latest GStreamer 1.22 and also updated via apt, which included kernel updates. I'm now running on > uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
> vcgencmd version
Mar 17 2023 10:51:11
Copyright (c) 2012 Broadcom
version 82f3750a65fadae9a38077e3c2e217ad158c8d54 (clean) (release) (start_db) It appears that I can now also run and close the pipeline as often as I want from Jupyter. I don't know whether this is due to the kernel update or whether GStreamer now contains some additional safeguarding that is maybe able to circumvent the bug, but I'm happy with this either way. |
I reported this first with GStreamer and was then advised to put the report here.
Describe your issue
Using the
v4l2jpegdec
within a Jupyter session will work in the first run, but then permanently (until a restart) breaks all further uses of this element.Setup
Steps to reproduce the bug
You will need some device that produces motion jpeg, here I have a camera on
/dev/video0
. The sink doesn't matter, a fakesink will do, but to "see" better, I use a visible output.Good behavior
Run
(or similar caps, or another sink) on the console. The video device will open and display properly (or, in case of a fakesink, the clock will start to run). Note that you can cancel
gst-launch-1.0
and then run the same command again, as often as you like, as expected.The same good behavior can be found when running the corresponding Python code (say,
test.py
):You can now run this code using
python test.py
again and again, and everything is fine.Bad behavior
Run the python code in a Jupyter notebook. I checked it both with native Jupyter Lab, VS Code Jupyter, and also using IPython and XPython as kernels (which is also the reason why I submit the issue here, as I'm not sure where to put it in Jupyter).
The code will run perfectly fine the first time and everything works as expected. However, when I now try to run the code a second time - or even when I try to use any of the "good" procedures (command line, Python script) after first running the code in Jupyter, the pipeline won't start. The command line stops after "Redistribute latency" and after a couple of seconds gives the error (here for 1.20.5)
This also happens if instead of deleting the pipeline in the Jupyter session, I just try to set the same pipeline object to
PLAYING
again. And, most importantly, it also happens if I completely kill the Jupyter kernel. Until I restart the pi, the pipeline will never start again whatsoever. However, note that the device itself is opened, as indicated by the LED on the camera, it is just the decoder that fails (andjpegdec
works, but the performance and resource consumption is much worse, of course).How reproducible is the bug?
Always, and also using different kinds of sinks (tried with
kmssink
,fbdevsink
,glimagesink
,fakesink
).Solutions you have tried
Killing pretty much any nonessential process, including everything responsible for the SSH connection and reconnecting. Tried two different GStreamer setups. Installed Miniconda with the latest Python 3.9 and a XEUS Jupyter kernel, to see whether it is IPython-related.
More details
In a freshly botted RPi, I'm doing the following:
00-dmesg.log
01-pyscript.log
02-dmesg.log
03-jupyter.log
as well as the notebook cells themselves), create the notebook (where here I replace the sleep by two different cells which I execute one after the other). Run the pipeline, when it's been running for some time shut it down.04-dmesg.log
05-pyscript.log
06-dmesg.log
I hope all the log files help to narrow down the cause. The notebook, the script and the full commandline are also included.
From a cursory glance, the kernel log seems pretty suggestive:
The text was updated successfully, but these errors were encountered: