H264 MMAL encoder stops sending output buffers #1562

Closed
mdevaev opened this issue Apr 13, 2021 · 16 comments

@mdevaev

mdevaev commented Apr 13, 2021

Describe the bug
The H264 MMAL encoder stops sending output buffers shortly after the process starts. The problem seems to be tied to the hardware, possibly to a specific board; it may be a hardware bug or a defect. It reproduces on two boards out of at least several hundred. It does not reproduce on my own Pi 4, but it does on my user's board. The problem is not related to any video capture device: simply starting to encode a RAW image is enough.

To reproduce
The exact method of reproduction is unknown. All that is known is that two different boards (rev 1.2 4 GB and rev 1.1 1 GB) with the same kernel and firmware give different results.

  • Build ustreamer with omx/mmal support (WITH_OMX=1 make flag).
  • Dummy run: ustreamer --encoder=noop --device=foo --h264-sink=test.
  • Observe (or not) hanging after ~5 seconds.

I've attached logs for both cases: for rev 1.2, where there is a hang, and for rev 1.1, where there is none. On rev 1.2 the hang occurs in 100% of cases after startup because the wait for the output buffer never completes. After killing and restarting ustreamer, mmal_wrapper_create() freezes as well. I guess the component goes into some kind of dead state. The logs are attached at the bottom: reports from the hung board and one log from the working one.

System
Copy and paste the results of the raspinfo command in to this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:

  • Pi 4 rev 1.2 4Gb
  • Arch Linux ARM
  • vcgencmd version:
    Mar 15 2021 15:49:58
    Copyright (c) 2012 Broadcom
    version 15a030ce95da2e128fe35f05e9246edca4839eb9 (clean) (release) (start)
  • Linux pikvm 5.10.23-2-ARCH #1 SMP Thu Mar 18 18:33:35 MSK 2021 armv7l GNU/Linux

Logs
Hanging: fail.txt and vcdbg.txt
No hanging:
ok.txt

@mdevaev
Author

mdevaev commented Apr 13, 2021

@6by9 Hi again. I've brought a strange problem here that looks like #417 from 2015. It looks a little different, but it may be related. I would like to think that the problem is in my code, but it seems that something is wrong with the board. Interestingly, the encoders behave a little differently. The encoder on the "working" board generates a key frame followed by small frames of only a few bytes. The "faulty" board generates frames of 1 kilobyte each and then the encoder freezes. See fail.txt and ok.txt.

@mdevaev
Author

mdevaev commented Apr 13, 2021

Found another two rev 1.2 4 GB boards. Can't reproduce on them. Only one board is affected.

PS: Kernel params: cma=128, gpu_mem=128M. Also tried 256 for both. No underpower issues.

@mdevaev
Author

mdevaev commented Apr 16, 2021

My user bought another 4 GB rev 1.2 board and the problem does not reproduce on it. Could it be that he just came across a defective device? Or maybe it's some difference in microcode (I don't know whether the RPi has any) or in the EEPROM?

@popcornmix
Contributor

popcornmix commented Apr 16, 2021

Just to confirm, are you using the same sdcard on the two Pi4s?
The bootloader is unlikely to affect this, but you should ensure both are up to date to rule it out; see [here](https://www.raspberrypi.org/documentation/hardware/raspberrypi/booteeprom.md).
It might be worth trying over_voltage=2 (added to config.txt) on the failing Pi in case it is a bit marginal at default voltage.

@mdevaev
Author

mdevaev commented Apr 16, 2021

The new Pi had a new memory card. However, on the old Pi with the old memory card, we completely overwrote the image. Do you think it could be related to a broken card?

About the EEPROM and power supply: I'll ask him to run a test and let you know. I will also ask him to swap the memory cards and check again.

@popcornmix
Contributor

If you want to confirm if the problem is a hardware issue in the Pi4 then it's much more conclusive to test with the same sdcard on each. Otherwise there may be some unexpected difference in configuration between the two sdcards that is causing the issue.

@mdevaev
Author

mdevaev commented Apr 16, 2021

You're right. We'll check it out.

@mdevaev
Author

mdevaev commented Apr 18, 2021

@popcornmix The user says that he tried 5 different memory cards on the broken board. They were not the same cards as on the working one, but this seems to rule out a broken memory card. He will still test this, along with the overvoltage setting, in the coming days. We also checked the EEPROM version: the working board had an older one. We updated it, but nothing broke, so it's not about the EEPROM.

@popcornmix
Contributor

> but it seems to rule out the problem of a broken memory card

I wasn't thinking of a broken memory card, but rather of a difference in configuration.
For example, if one user has added custom config.txt settings, or has different versions of packages, then you want to rule that out as the source of the issue. To do that, you want to test the same sdcard on each board.

You are currently saying
sdcard A on board 1 works
sdcard B on board 2 fails

That doesn't narrow down the issue. If you can say:
sdcard A on board 1 works
sdcard B on board 1 works
sdcard A on board 2 fails
sdcard B on board 2 fails

it's a lot easier to say the hardware is faulty. Even better if you only swap the Pi board. e.g. leave sdcard, power supply, attached peripherals, display connected, camera etc the same.
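
The swap test described above can be written down as a small decision rule. The following is only an illustrative sketch: the `diagnose` function, the card and board labels, and the result strings are hypothetical and are not part of any Raspberry Pi tooling. It takes the outcome of each sdcard/board combination and reports whether the failure follows the board, follows the sdcard, or cannot be attributed yet.

```python
def diagnose(results):
    """Attribute a failure to the board or the sdcard from swap-test results.

    `results` maps (sdcard, board) -> True if the run worked, False if it hung.
    All names here are hypothetical, chosen only for illustration.
    """
    cards = sorted({card for card, _ in results})
    boards = sorted({board for _, board in results})

    # Every sdcard must have been tried on every board, otherwise the
    # card and board effects cannot be separated.
    complete = all((c, b) in results for c in cards for b in boards)
    if len(cards) < 2 or len(boards) < 2 or not complete:
        return "inconclusive: test every sdcard on every board"

    if all(results.values()):
        return "no failure observed"

    # A board is "failing" if it hangs with every card, "working" if it
    # works with every card.
    failing_boards = [b for b in boards if not any(results[(c, b)] for c in cards)]
    working_boards = [b for b in boards if all(results[(c, b)] for c in cards)]
    if failing_boards and working_boards and \
            len(failing_boards) + len(working_boards) == len(boards):
        return "failure follows board(s): " + ", ".join(failing_boards)

    # Same reasoning with the roles of card and board exchanged.
    failing_cards = [c for c in cards if not any(results[(c, b)] for b in boards)]
    working_cards = [c for c in cards if all(results[(c, b)] for b in boards)]
    if failing_cards and working_cards and \
            len(failing_cards) + len(working_cards) == len(cards):
        return "failure follows sdcard(s): " + ", ".join(failing_cards)

    return "inconclusive: mixed results"


# Only two combinations tested: nothing can be concluded.
partial = {("A", "1"): True, ("B", "2"): False}
print(diagnose(partial))

# Full matrix: both cards work on board 1 and both hang on board 2.
full = {("A", "1"): True, ("B", "1"): True,
        ("A", "2"): False, ("B", "2"): False}
print(diagnose(full))
```

With only the two combinations tested so far the sketch reports an inconclusive result, while the full four-way matrix points at the board.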

@mdevaev
Author

mdevaev commented Apr 19, 2021

I understand you now. We have already ruled out a difference in software: it was the same OS image each time.

We have just tested over_voltage=2, and it looks like this option stops the hang. I've done several reboots with and without it. It hangs without it, but not with it.

Does it still make sense to repeat the test with a replacement Raspberry Pi? If so, I will ask the user to run the test exactly as you described.

@White-SAndS

ok, i'm the user with the somewhat broken pi4.
i have used 6 uSD cards, one from samsung and 5 from sandisk. all of these cards were written with the same untouched image. if any of the files in the image were changed, that was done by the pi4 or the system, not by me.
the cards i used work in one of the pi4s but not in the other.
the change to "over_voltage=2" seems to be working. as i understand the overclocking document, it does not affect the warranty or the reliability.

@White-SAndS

i have indeed done exactly this: "it's a lot easier to say the hardware is faulty. Even better if you only swap the Pi board. e.g. leave sdcard, power supply, attached peripherals, display connected, camera etc the same."

@popcornmix
Contributor

Can you report output of vcgencmd get_version on the problematic board?

@mdevaev
Author

mdevaev commented Apr 19, 2021

You mean vcgencmd version? Because get_version is an unknown command. If so:

Mar 15 2021 15:49:58 
Copyright (c) 2012 Broadcom
version 15a030ce95da2e128fe35f05e9246edca4839eb9 (clean) (release) (start)

@popcornmix
Contributor

Correct. That version is new enough.

I think we'd describe that board as an outlier, where one of the blocks is unreliable at default voltage.
(h264 and PCIE tend to be the blocks that have the least margin here).

You could also try with over_voltage=1 which may also work.

over_voltage settings have no effect on the warranty bit and should have no effect on reliability, so this should be a safe workaround.

Up to you if you stick with that setting in config.txt or seek a replacement unit from your seller (which is unlikely to have this issue).
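
For anyone who keeps the workaround, a quick way to confirm that the setting is actually present in config.txt is sketched below. This is a minimal illustration, not an official tool: `read_over_voltage` is a hypothetical helper, the sample file contents are made up, and the parser deliberately ignores conditional section filters such as `[pi4]`, so it only suits simple files.

```python
def read_over_voltage(config_text):
    """Return the last over_voltage value found in config.txt text, or None.

    Simplifying assumption: section filters like [pi4] or [all] are ignored,
    so every over_voltage line is treated as if it applied to this board.
    """
    value = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "over_voltage":
            try:
                value = int(val.strip())
            except ValueError:
                # Not an integer: ignore this line rather than guess.
                continue
    return value


# Illustrative contents only; a real config.txt will differ.
sample = """\
# example config.txt
gpu_mem=128
over_voltage=1
"""
print(read_over_voltage(sample))
```

On a real system the text would be read from the config.txt on the boot partition, whose path depends on the distribution.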

@mdevaev
Author

mdevaev commented Apr 19, 2021

@popcornmix It looks like over_voltage=1 also works. Well, I'm glad that this is not some hard-to-debug software problem, and even more glad that there is a solution. That's quite enough for me. Thank you!

@mdevaev mdevaev closed this as completed Apr 19, 2021