Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VC4 freeze in (atomic,non-atomic) KMS on arm64 #4627

Closed
satmandu opened this issue Oct 13, 2021 · 29 comments
Closed

VC4 freeze in (atomic,non-atomic) KMS on arm64 #4627

satmandu opened this issue Oct 13, 2021 · 29 comments
Labels
Close within 30 days Issue will be closed within 30 days unless requested to stay open KMS Issue Issues related to KMS/DRM drivers

Comments

@satmandu
Copy link
Contributor

Background:
Mutter 40 (using in Ubuntu 21.10 and beyond) is using atomic KMS, which can be disabled by setting MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0.

The issue is this: After mutter sets the display to go to sleep, it never comes up again, reporting this error within minutes of going to sleep:

[  500.967377] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] flip_done timed out
[  500.967374] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  500.967659] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] commit wait timed out
[  511.207505] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  511.207773] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:71:plane-3] commit wait timed out
[  521.447604] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  521.447862] vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit

A reboot is needed to resolve this.

This isn't unique to kernel 5.15. Ubuntu is seeing it in their release 5.13 kernels too, but I have verified that this issue still exists by building the current 5.15-rc5 kernel here.

The issue exists with current firmware.

It has been mentioned at https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/1944397/comments/20 and https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/1944397/comments/45 by people with familiarity with KMS far far above mine.

As @vanvugt suggests, this may be a kernel issue:


Oct 08 16:16:31 pi400 gnome-shell[968]: Mode setting implementation for '/dev/dri/card1' (vc4) forced (non-atomic).

...

Oct 08 16:24:38 pi400 kernel: [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
Oct 08 16:24:38 pi400 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:76:crtc-3] commit wait timed out
Oct 08 16:24:48 pi400 kernel: [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
Oct 08 16:24:48 pi400 kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:70:plane-3] commit wait timed out

I guess the vc4 kernel is trying to implement non-atomic KMS using a buggy atomic implementation. :(

To reproduce
dtoverlay=vc4-kms-v3d-pi4 is being used.
With a kernel newer than 5.11, install mutter 40, and let the display go to sleep. The display can then not be woken.

Expected behaviour
Display should wake again.

Actual behaviour
Display goes blank.

System
Debug output:
gowat-10_13_2021_1435EDT.txt

  • Which model of Raspberry Pi? RPi4B
  • Which OS and version (cat /etc/rpi-issue)? Ubuntu 21.10
  • Which firmware version (vcgencmd version)?
    (This is the release firmware for Ubuntu 21.10)
Aug 31 2021 14:52:47
Copyright (c) 2012 Broadcom
version 67615e950e1e28b92dfae6303cf7a8b879a8908f (clean) (release) (start)

I have also tried this firmware:

Sep 30 2021 19:21:03
Copyright (c) 2012 Broadcom
version 51215b4f6e3966401ecd99652a35cf1c25069113 (clean) (release) (start)
  • Which kernel version (uname -a)?

5.15.0-rc5-v8-gc0acaa66d (built from c0acaa6 )

Logs

[  500.967377] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] flip_done timed out
[  500.967374] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  500.967659] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] commit wait timed out
[  511.207505] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  511.207773] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:71:plane-3] commit wait timed out
[  521.447604] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  521.447862] vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit
@pelwell
Copy link
Contributor

pelwell commented Oct 13, 2021

FYI you have a typo in your config.txt:

dtovelay=rpivid-v4l2

@satmandu
Copy link
Contributor Author

@pelwell Thanks for the correction! I should stop doing edits over ssh from my phone!

Unfortunately, I see the same issue with that fix:

[  501.633276] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] flip_done timed out
[  501.633279] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  501.633541] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] commit wait timed out
[  511.873337] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  511.873620] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:71:plane-3] commit wait timed out
[  522.113376] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  522.113631] vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit

@popcornmix
Copy link
Collaborator

Can you test with our 5.10 kernel?
I think this is an upstream regression that hit 5.10 when we pulled in upstream commits, and was resolved with a (partial) revert.
I don't think upstream has been fixed yet.

@satmandu
Copy link
Contributor Author

This is indeed not an issue with the 5.10 kernel from here.

Any chance of getting this fix upstreamed, or merged with the 5.15 kernel here to confirm that it isn't some other issue?

@popcornmix
Copy link
Collaborator

@mripard has been investigating but it wasn't trivial to see what was wrong with the upstream code.

@mripard
Copy link
Contributor

mripard commented Oct 14, 2021

One of the issue is that it's not clear what the fix is exactly :)

But as it turns out, I'm working on it at the moment. One of the reason it's been taking some time to address has been that there's several issues intertwined, and it's hard to isolate and fix them. It could also be that there's been a regression in 5.11 that has never been uncovered since it receives less testing than 5.10 can.

I've pushed my current fixes rebased on rpi-5.15.y here:
https://github.com/mripard/rpi-linux/tree/rpi/5.15-async-modeset-helpers

If it doesn't fix the issue you're seeing, could you let me know where I can grab/create an image of the system that triggers it?

@satmandu
Copy link
Contributor Author

Thanks. I'll do a build from that branch and see what happens.

@satmandu
Copy link
Contributor Author

@mripard building with your current kernel tree

/build/source/rpi-linux/drivers/gpu/drm/vc4/vc4_kms.c: In function 'vc4_atomic_commit_tail':
/build/source/rpi-linux/drivers/gpu/drm/vc4/vc4_kms.c:376:27: error: redeclaration of 'commit' with no linkage
  376 |   struct drm_crtc_commit *commit;
      |                           ^~~~~~
/build/source/rpi-linux/drivers/gpu/drm/vc4/vc4_kms.c:374:27: note: previous declaration of 'commit' was here
  374 |   struct drm_crtc_commit *commit;
      |                           ^~~~~~
make[8]: *** [/build/source/rpi-linux/scripts/Makefile.build:277: drivers/gpu/drm/vc4/vc4_kms.o] Error 1
make[7]: *** [/build/source/rpi-linux/scripts/Makefile.build:540: drivers/gpu/drm/vc4] Error 2
make[6]: *** [/build/source/rpi-linux/scripts/Makefile.build:540: drivers/gpu/drm] Error 2
make[5]: *** [/build/source/rpi-linux/scripts/Makefile.build:540: drivers/gpu] Error 2
make[4]: *** [/build/source/rpi-linux/Makefile:1868: drivers] Error 2
make[4]: *** Waiting for unfinished jobs....

@pelwell
Copy link
Contributor

pelwell commented Oct 14, 2021

It's a bad merge. Try deleting one of the two commit declarations.

@satmandu
Copy link
Contributor Author

After patching the rpi top of tree 5.15 with the two last commits in @mripard's tree, namely mripard@061eb3b and mripard@72be6d5, and then running sed -i '/struct drm_crtc_commit \*commit\;/{x;//!d;x}' drivers/gpu/drm/vc4/vc4_kms.c and building the kernel, I still get:

[  500.196128] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] flip_done timed out
[  500.196117] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  500.196379] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] commit wait timed out
[  510.436203] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  510.436475] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:71:plane-3] commit wait timed out
[  520.676264] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  520.676542] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:206:plane-25] commit wait timed out

Building your kernel @mripard I get the same error after the system goes to sleep:

[  500.196607] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] flip_done timed out
[  500.196601] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  500.196928] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:77:crtc-3] commit wait timed out
[  510.436696] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  510.436972] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:71:plane-3] commit wait timed out
[  520.676780] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  520.677050] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:206:plane-25] commit wait timed out
[  530.916876] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  530.917163] vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit

Happy to help debug this further.

My configuration may be particular because of the edid file I am using in cmdline.txt...

@mripard
Copy link
Contributor

mripard commented Oct 15, 2021

Can you give me a way to download or create your image so that I can test?

@satmandu
Copy link
Contributor Author

@mripard Let me see if I can get a reproducer setup based on the Ubuntu Desktop 21.10 image at https://ubuntu.com/download/raspberry-pi

@satmandu
Copy link
Contributor Author

Actually, I made minor settings changes to the stock Ubuntu 21.10 image and got this issue running on a RPI4B. I think the only relevant change was to enable automatic login (during setup) and then disable automatic screen lock thus:

Settings -> Privacy -> Automatic Screen Lock (set to disabled.)

I left the blank screen delay at 5 minutes.

I then rebooted and waited. Once the screen went to sleep, I got the error.

[   35.804756] vcc-sd: disabling
[  348.637920] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:76:crtc-3] flip_done timed out
[  348.637912] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  348.638118] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:76:crtc-3] commit wait timed out
[  358.878171] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  358.878399] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:70:plane-3] commit wait timed out
[  369.118405] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  369.118626] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:205:plane-25] commit wait timed out
[  379.358641] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[  379.358909] vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit

@juergh
Copy link
Contributor

juergh commented Oct 18, 2021

@satmandu You're running Ubuntu and we are working on this: https://bugs.launchpad.net/bugs/1946368.
I'm in discussion with Maxime so please be a little a patient. In the meantime you can try the kernel mentioned in comment 13 .

@satmandu
Copy link
Contributor Author

@juergh It was silly of me to assume this discussion had not already gone upstream. Thanks all!

@mripard
Copy link
Contributor

mripard commented Oct 19, 2021

Indeed, I can reproduce it with the setup you provided, I'll try to figure out what's going on. Thanks!

@juergh
Copy link
Contributor

juergh commented Oct 19, 2021

Indeed, I can reproduce it with the setup you provided, I'll try to figure out what's going on. Thanks!

Oh. You as in me? I was just about to upload an image for you Maxime... I guess you don't need that anymore?

@mripard
Copy link
Contributor

mripard commented Oct 19, 2021

I meant you as in @satmandu

@satmandu
Copy link
Contributor Author

For what it is worth. I built a 5.15-rc7 kernel at 3a3fc22 yesterday, and I noticed that the screen shows up correctly at boot, but that playing videos with ffplay appears to not actually show videos on the screen, which is interesting. Reverting to the stock ubuntu kernel (linux-image-5.13.0-1008-raspi) does allow videos to play correctly. Maybe this is another issue though...

@popcornmix
Copy link
Collaborator

popcornmix commented Oct 28, 2021

I've noticed kodi doesn't display video for the first few seconds on 5.15-rc7, but then starts and is fine for rest of file.
No errors in dmesg. Is this ffplay from ubuntu?
Any idea if it is trying to display a drm YUV plane? (modetest from another ssh session may show what should be onscreen).

@satmandu
Copy link
Contributor Author

Yes I'm using the ubuntu ffplay.

Here's what I see running FFREPORT=level=56:file=/tmp/ffplay.log XDG_RUNTIME_DIR=/run/user/1000 DISPLAY=:0 WAYLAND_DISPLAY=wayland-0 SDL_VIDEODRIVER=x11 /usr/bin/flock -n /tmp/ffplayvideo.lock nice -n 20 /usr/bin/ffplay -vcodec h264_v4l2m2m -rtsp_transport tcp rtsp://<rtsp URL> -fs -framedrop -fast

ffplay started on 2021-10-28 at 12:09:05
Report written to "/tmp/ffplay.log"
Log level: 56
ffplay version 4.4-6ubuntu5 Copyright (c) 2003-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-7ubuntu1)
  configuration: --prefix=/usr --extra-version=6ubuntu5 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, rtsp, from 'rtsp://<rtsp URL>/live':0
  Metadata:
    title           : Session streamed by the WYZE Media Server
    comment         : live
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709, progressive), 1920x1080, 20 fps, 20 tbr, 90k tbn, 40 tbc
  Stream #0:1: Audio: pcm_alaw, 16000 Hz, 1 channels, s16, 128 kb/s
[h264_v4l2m2m @ 0x7f6c09a450] Using device /dev/video10
[h264_v4l2m2m @ 0x7f6c09a450] driver 'bcm2835-codec' on card 'bcm2835-codec-decode' in mplane mode
[h264_v4l2m2m @ 0x7f6c09a450] requesting formats: output=H264 capture=YU12

Turns out that adding -vcodec h264_v4l2m2m to the ffplay command line is what breaks the output. When I remove that I get video on the screen:

ffplay started on 2021-10-28 at 12:14:24
Report written to "/tmp/ffplay.log"
Log level: 56
ffplay version 4.4-6ubuntu5 Copyright (c) 2003-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-7ubuntu1)
  configuration: --prefix=/usr --extra-version=6ubuntu5 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, rtsp, from 'rtsp://<rtsp URL>/live':0
  Metadata:
    title           : Session streamed by the WYZE Media Server
    comment         : live
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709, progressive), 1920x1080, 20 fps, 20 tbr, 90k tbn, 40 tbc
  Stream #0:1: Audio: pcm_alaw, 16000 Hz, 1 channels, s16, 128 kb/s
   5.83 A-V: -0.005 fd=   0 aq=   39KB vq=  248KB sq=    0B f=0/0

@popcornmix
Copy link
Collaborator

Funnily enough my kodi issue seems to be h264 only. hevc and sw decode are okay. Possibly a v4l2 decoder issue.

@satmandu
Copy link
Contributor Author

@popcornmix It does look like trying to offload h264 decoding causes the problem here.

@satmandu
Copy link
Contributor Author

Aside from that, 3a3fc22 does appear to have the sleep -> vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit issue resolved. I can run sudo vcgencmd display_power 1 to bring the screen back on...

@popcornmix
Copy link
Collaborator

Are you able to revert 56c8fad and test?
That fixes it for kodi.

@popcornmix
Copy link
Collaborator

Changing line enable = (unsigned int)-5; to -6 makes it work for me.
I think the limit to the number of buffers allowed is too low and can deadlock if client holds onto more than expected.
Kodi gets out of it as it resets stream after a few seconds (and for some reason that is subsequently happy).

@jc-kynesim
Copy link
Contributor

I thought/hoped/prayed that that setting only affected input buffers, but I guess I was too hopeful. At this point I'm not sure what the correct answer is.

@pelwell pelwell added the KMS Issue Issues related to KMS/DRM drivers label Mar 3, 2022
@popcornmix
Copy link
Collaborator

Sounds like this was resolved - okay to close?

@popcornmix popcornmix added the Close within 30 days Issue will be closed within 30 days unless requested to stay open label Jan 24, 2023
@satmandu
Copy link
Contributor Author

Definitely resolved. I haven't seen this in a long time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Close within 30 days Issue will be closed within 30 days unless requested to stay open KMS Issue Issues related to KMS/DRM drivers
Projects
None yet
Development

No branches or pull requests

6 participants