Pi 3 (B/B+) video output goes blank if gamma is enabled in HVS #971

Open
rps102 opened this issue Apr 11, 2018 · 20 comments

@rps102

rps102 commented Apr 11, 2018

Recently the RISC OS video driver has enabled the gamma unit. There doesn't appear to be a mailbox message to do this, so instead we program the 3*256 tables in (using auto-increment, by setting b31 of the respective DISPCTRLx register) and then set b29 of the respective DISPBKGNDx register to enable it.

The gamma correction itself operates as expected.

However, after ~4h (sometimes longer, but always less than 12h) the screen blanks as though the pixel clock has stopped - ie. the monitor reports no signal, as opposed to the image being black.

All 9 recorded reports have been on a Pi 3B or 3B+. Other models seem unaffected, though that could just be sampling skew, because anyone buying a new Pi 3 needs the newer OS to run on it.

Having found issue #407 it sounded like the GPU may have run out of memory bandwidth, but (via VNC) we find the OS is still running and:

*vcgencmd get_hvs_asserts
hvs_asserts=0
*vcgencmd dispmanx_list
display:2 format:RGBX32 transform:0 layer:-127 src:0,0,1280,1024 dst:0,0,1280,1024 cost:801 lbm:0
display:2 format:ARGB8888 transform:0 layer:2000 src:0,0,32,32 dst:516,499,32,32 cost:107 lbm:0

These cost values seem low compared with the ones in that issue, and there are no asserts.
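For anyone comparing cost figures across reports, here is a minimal sketch that parses `dispmanx_list` output and sums the per-layer cost. It assumes the `key:value` line layout shown above; the function names are hypothetical, and the meaning of the cost unit is not defined in this thread.

```python
# Sample text copied from the vcgencmd dispmanx_list output quoted above.
SAMPLE = """\
display:2 format:RGBX32 transform:0 layer:-127 src:0,0,1280,1024 dst:0,0,1280,1024 cost:801 lbm:0
display:2 format:ARGB8888 transform:0 layer:2000 src:0,0,32,32 dst:516,499,32,32 cost:107 lbm:0
"""


def parse_dispmanx_list(text):
    """Turn each 'key:value key:value ...' line into a dict of strings."""
    layers = []
    for line in text.splitlines():
        fields = dict(tok.split(":", 1) for tok in line.split() if ":" in tok)
        if "cost" in fields:  # skip blank or unrelated lines
            layers.append(fields)
    return layers


def total_cost(layers):
    """Sum the per-layer cost values reported by the firmware."""
    return sum(int(layer["cost"]) for layer in layers)


layers = parse_dispmanx_list(SAMPLE)
print(len(layers), total_cost(layers))
```

Running this on captures taken before and after a blanking event would show whether the composition cost changed at all.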

The whole Pi is drawing ~600mA consistently during normal operation and after the video cuts out, from a TTi bench power supply. The chip package temperature is ~60C (40C above ambient) from a thermocouple. Heating the package with a hot air gun to ~100C (80C above ambient) didn't trigger the problem when applied for 5 minutes constantly.

The ARM can be idling or busy.

When the problem occurs it is still possible to read values via the mailbox interface.

Removing the setting of b29 of DISPBKGNDx allows good video output for several days (ie. all the other code is still active, still using autoincrement to write 3*256 table entries). It's the action of enabling then waiting a few hours...then no video signal.

@popcornmix
Contributor

Can you show the code you use to write the tables and DISPCTRLx/DISPBKGNDx?
Writing to registers that the firmware believes it owns is in the "undefined behaviour" territory - I'd like to know exactly which parts of the HVS are being changed without the firmware's knowledge.

@rps102
Author

rps102 commented Apr 11, 2018

At startup this code enables the respective b31 auto increment and the b29 enable:
https://www.riscosopen.org/viewer/view/mixed/RiscOS/Sources/Video/HWSupport/BCMVideo/s/BCMVideo?rev=1.40;content-type=text%2Fx-cvsweb-markup#l1040

And a linear gamma is programmed by just writing 0-255 into slots 0-255 by ResetGamma:
https://www.riscosopen.org/viewer/view/mixed/RiscOS/Sources/Video/HWSupport/BCMVideo/s/BCMVideo?rev=1.40;content-type=text%2Fx-cvsweb-markup#l1470
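For readers who would rather not follow the ARM assembler, the logic described so far can be sketched in Python. This is only a model of the values involved, not a translation of BCMVideo: the bit positions are taken from the description in this thread, the function names are hypothetical, and no hardware access is performed.

```python
# Bit positions as described in this thread (not checked against a datasheet).
AUTO_INCREMENT_BIT = 1 << 31  # b31 of DISPCTRLx: auto-increment for table writes
GAMMA_ENABLE_BIT = 1 << 29    # b29 of DISPBKGNDx: gamma unit enable


def linear_gamma_tables():
    """Identity ramp: slot n holds n, for each of the three channels."""
    ramp = list(range(256))
    return {"red": ramp[:], "green": ramp[:], "blue": ramp[:]}


def with_gamma_enabled(dispbkgnd_value, enable):
    """Return a DISPBKGNDx value with b29 set or cleared, other bits untouched."""
    if enable:
        return (dispbkgnd_value | GAMMA_ENABLE_BIT) & 0xFFFFFFFF
    return dispbkgnd_value & ~GAMMA_ENABLE_BIT & 0xFFFFFFFF


tables = linear_gamma_tables()
print(sum(len(t) for t in tables.values()))  # 3 * 256 entries in total
print(hex(with_gamma_enabled(0, True)))
```

The sketch deliberately leaves out how each entry is encoded in the hardware table, since the thread only says that 0-255 is written into slots 0-255.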

At runtime the user can supply a buffer of their own which is copied in similarly to ResetGamma, but as mentioned above it's not necessary to call this at all. For completeness, here's SetGamma:
https://www.riscosopen.org/viewer/view/mixed/RiscOS/Sources/Video/HWSupport/BCMVideo/s/BCMVideo?rev=1.40;content-type=text%2Fx-cvsweb-markup#l1416

Not sure how comfortable you are reading ARM assembler; I could convert it to C if that's helpful.

@Behodar

Behodar commented Jun 25, 2018

Is there a "proper" solution for this (eg. creating a proper mailbox message for it)?

@pittdj

pittdj commented Jun 25, 2018

This issue continues to be a problem with the current development RISC OS builds, OS5.25. Disabling gamma by not setting bit 29 in the RISC OS source has been a complete 'fix' here with no further screen blanks.

Is there a way forward?

The RISC OS Forum thread may be of interest. https://www.riscosopen.org/forum/forums/11/topics/10346?page=1

@timg236

timg236 commented Jul 13, 2018

A mailbox would certainly be possible and, as @popcornmix says, writing to HVS registers directly can cause problems. However, looking at the code I can't see anything that would explain this failure, and I would expect such a failure to happen immediately.

The memory bandwidth for the composition looks reasonable and the gamma table read overhead should be pretty small.

If VideoCore is still alive then it should be possible to capture all the HVS registers when the failure occurs e.g. if you poll DISPSTATx you should see the LSBs advancing as it scans out.

Dumping the context/display list memory (see SCALER_DLIST_START in the vc4 KMS driver) and the gamma tables would also be useful, just in case they have been corrupted.

Afterwards, it would be interesting to know whether a HDMI hotplug made any difference.

@JonAbbott2

HDMI hotplugging does not recover the situation. I can pretty much reproduce this issue within seconds of enabling gamma with the desktop resolution set at 1360x768. If I knock the resolution down to 640x480 (the HVS is upscaling to 1360x768), I've yet to reproduce the issue, so it's possible that bandwidth and/or source resolution is a factor.

When the issue occurs, both audio and video output cease but there does appear to be a video signal on the HDMI. The CPU is unaffected by the issue - the OS is still usable in the background.

CONFIG.TXT for reference:

hdmi_cvt=1360 768 50 3 0 0 0
hdmi_group=2
hdmi_mode=87
hdmi_drive=2
hdmi_force_edid_audio=1
fake_vsync_isr=1
framebuffer_swap=0
gpu_mem=64
init_emmc_clock=100000000

@Behodar

Behodar commented Jul 14, 2018

For what it's worth, I've had the issue at 1280x1024, without any specific HDMI lines in config.txt (I do normally have hdmi_blanking=1 but confirmed that it still happens without it). All other settings are the RISC OS image defaults.

@timg236

timg236 commented Aug 8, 2018

Would it be possible to get a dump of the DISPSTATx registers and the HVS context memory from the failing case? That would help confirm whether it's a bandwidth issue or whether the HVS is unhappy with a display list.

If it's a bandwidth issue then dispmanx_offline=1 in config.txt might help

@JonAbbott2

> Would it be possible to get a dump of the DISPSTATx registers and the HVS context memory from the failing case?

Sure, what are the address ranges?

@timg236

timg236 commented Aug 9, 2018

The HVS register address range can be found from the VC4 DRM driver dtsi

base: 0x7e400000
size: 0x6000

https://github.com/raspberrypi/linux/blob/9d2ad143e40c38d34be86578840499a976c0a5b0/arch/arm/boot/dts/bcm283x.dtsi

The normal registers are in the range
0x7e400000 to 0x7e4000e0

Gamma data starts at 0x7e4000e0 and the display list memory is 0x7e402000 to 0x7e406000

https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/gpu/drm/vc4/vc4_regs.h

The registers are the most important bit but it's useful to get a snapshot of context memory. IIRC reads from dlist memory from the ARM are not 100% reliable so it can be worth dumping it a couple of times in order to get a consistent view.

Another register of interest is the pixelvalve status because this is responsible for the scanout/timing (see PV_STAT in vc4_crtc.c)
0x7e20602c, 0x7e20702c, 0x7e80702c
https://dri.freedesktop.org/docs/drm/gpu/vc4.html
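The addresses above are VideoCore bus addresses, so code running on the ARM has to translate them before reading. A minimal sketch of that translation follows, assuming the usual Pi 2/Pi 3 mapping of bus 0x7E000000 to ARM physical 0x3F000000; the region names are hypothetical labels, and an OS such as RISC OS may map the peripherals at some other logical address.

```python
BUS_PERIPHERAL_BASE = 0x7E000000
ARM_PERIPHERAL_BASE_PI3 = 0x3F000000  # assumption: Pi 2 / Pi 3 physical mapping
PERIPHERAL_WINDOW = 0x01000000        # 16 MB peripheral window

# Ranges and PV_STAT addresses quoted in the comment above (bus addresses).
REGIONS = {
    "hvs_registers": (0x7E400000, 0x7E4000E0),
    "hvs_dlist": (0x7E402000, 0x7E406000),
}
PV_STAT = (0x7E20602C, 0x7E20702C, 0x7E80702C)


def bus_to_arm(bus_addr, arm_base=ARM_PERIPHERAL_BASE_PI3):
    """Translate a peripheral bus address to an ARM physical address."""
    offset = bus_addr - BUS_PERIPHERAL_BASE
    if not 0 <= offset < PERIPHERAL_WINDOW:
        raise ValueError(f"{bus_addr:#x} is not a peripheral bus address")
    return arm_base + offset


for name, (start, end) in REGIONS.items():
    print(name, hex(bus_to_arm(start)), hex(bus_to_arm(end)))
print([hex(bus_to_arm(addr)) for addr in PV_STAT])
```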

@JonAbbott2

You can download the register dump from here

I've dumped the HVS register range three times, and there are dumps from before and after the issue occurs. Where an address is unreadable, the value is set to 0xDEADDEAD.
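Since reads of dlist memory from the ARM were described earlier as not fully reliable, one way to combine the three dumps is a per-word majority vote. The sketch below is an illustration only: it treats 0xDEADDEAD as "unreadable", as in these dumps, and the function name and in-memory list format are assumptions.

```python
from collections import Counter

UNREADABLE = 0xDEADDEAD  # marker used in these dumps for unreadable addresses


def merge_dumps(dumps):
    """Majority-vote each word across several equal-length dumps.

    Returns a list where each entry is the agreed value, or None when the
    word was unreadable or no value was seen in more than half of the dumps.
    """
    merged = []
    for words in zip(*dumps):
        readable = [w for w in words if w != UNREADABLE]
        if not readable:
            merged.append(None)
            continue
        value, count = Counter(readable).most_common(1)[0]
        merged.append(value if count * 2 > len(words) else None)
    return merged


dumps = [
    [0x1, UNREADABLE, 0x3, 0x9],
    [0x1, UNREADABLE, 0x4, 0x9],
    [0x1, UNREADABLE, 0x5, UNREADABLE],
]
print(merge_dumps(dumps))
```

Words that come back as None are the ones worth re-reading. Registers that legitimately change between reads (DISPSTATx, for example) will also show up as None, which is useful in its own right.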

@timg236

timg236 commented Aug 17, 2018

Thanks for attaching those files. I don't know the interval between the memory dumps (e.g. within same v-sync). However, DISPSTAT1 at offset 0x58 is the same in each of the 3 post files but changes in the pre_ files.

Bits 28/29 are both zero, so it doesn't look like an underrun, but the frame counter in bits 17:12 is unchanged, so perhaps it has not seen a VSTART from the pixelvalve.
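The check described here can be sketched as follows. The field positions (bits 28, 29 and 17:12) are taken from this comment rather than from a datasheet, and the function names are hypothetical. As noted above, samples taken within the same frame would also show an unchanged counter, so the samples need to be spaced at least a frame apart.

```python
def decode_dispstat(value):
    """Extract the DISPSTATx fields discussed in this thread."""
    return {
        "bit28": bool(value & (1 << 28)),
        "bit29": bool(value & (1 << 29)),
        "frame_count": (value >> 12) & 0x3F,  # bits 17:12, a 6-bit counter
    }


def scanout_stalled(samples):
    """True if the frame counter is identical across two or more samples."""
    counts = {decode_dispstat(v)["frame_count"] for v in samples}
    return len(samples) > 1 and len(counts) == 1


# Illustrative values only; these are not taken from the attached dumps.
print(decode_dispstat(0x00005000))
print(scanout_stalled([0x00005000, 0x00005000, 0x00005000]))
print(scanout_stalled([0x00005000, 0x00006000]))
```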

I haven't gone through the display lists yet. It might be worth grabbing all of pixelvalve 1 registers. 7E807000 - 7E80707c

If I get a chance I'll enable gamma on a Pi 3 and see if I can duplicate the issue with a couple of fullscreen RGBA32 layers.

@JonAbbott2

> I don't know the interval between the memory dumps (e.g. within same v-sync). However, DISPSTAT1 at offset 0x58 is the same in each of the 3 post files but changes in the pre_ files.

The three dumps are within a few VSync of each other.

> It might be worth grabbing all of pixelvalve 1 registers. 7E807000 - 7E80707c

I've done another dump with all the pixelvalve registers, which you can download here.

@JamesH65
Contributor

JamesH65 commented Jan 9, 2019

There is a new mailbox call for setting gamma: 0x00008012.

// first parameter table display ID, second VC pointer to buffer
// of 768 byte entries, R,G,B tables

See example linux code here. https://github.com/JamesH65/setgamma
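As a rough illustration of what such a request might look like, the sketch below packs a property-interface message for tag 0x00008012. The framing (total size, request code, tag header, values, end tag) is the general mailbox property layout as I understand it; the parameter order comes from the comment above, the table layout (256 red, then 256 green, then 256 blue bytes) is an assumption, and the function names and example values are hypothetical. Buffer alignment, cache handling and the actual mailbox write are not shown; the linked setgamma code is the authoritative example.

```python
import struct

SET_GAMMA_TAG = 0x00008012  # tag value given in the comment above
REQUEST_CODE = 0            # "process request" in the property interface
END_TAG = 0


def build_gamma_table(gamma=1.0):
    """768 bytes: assumed layout of 256 R, then 256 G, then 256 B entries."""
    ramp = bytes(round(255 * (i / 255) ** (1 / gamma)) for i in range(256))
    return ramp * 3


def build_set_gamma_message(display_id, vc_table_ptr):
    """Pack a little-endian property message carrying one set-gamma tag."""
    value = struct.pack("<II", display_id, vc_table_ptr)
    tag = struct.pack("<III", SET_GAMMA_TAG, len(value), 0) + value
    body = struct.pack("<I", REQUEST_CODE) + tag + struct.pack("<I", END_TAG)
    return struct.pack("<I", 4 + len(body)) + body


# Hypothetical values: display 2 and a made-up VC address for the table.
message = build_set_gamma_message(2, 0xC0001000)
print(len(message), len(build_gamma_table()))
```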

I've not tried running for the length of time indicated in the OP though.

@JonAbbott2

Thanks for providing the mailbox to set gamma.

Unfortunately, having coded this up today the issue remains, with the screen blanking within seconds on my Pi3.

Was any analysis done on the additional register dumps I grabbed on 17th Aug to confirm the state of pixelvalve 1 after it blanks?

@JamesH65
Contributor

Weird, as I did run it for longer than a few seconds - running through the test app linked above, and saw no problems with blanking at all. Does it fail on multiple different Pi's? Did you try the test app?

@timg236 Any ideas?

@JonAbbott2

The tests are being done under RISC OS, as that's the OS that's affected. For completeness however I have run the test app today, which works as expected under Linux with no blanking issues.

Testing is being done by myself on a Pi3 (original version) in a pi-top. Other users have seen the issue on Pi3B's as reported in the OP and at resolutions different to mine. I can however substantially reduce the issue by lowering the desktop resolution in RISC OS while leaving the hardware at 1360x768 @ 50Hz.

@JamesH65
Contributor

I have no idea what that could be. Under RISC OS, do you set up all the HVS/Pixelvalve stuff manually or do you rely on the firmware to do it?

@Phlamethrower

RISC OS relies on the firmware to set up the hardware. We use the mailbox property interface for resizing and controlling the desktop overlay, and dispmanx/vchiq for a hardware mouse pointer overlay. We also use dispmanx for YUV overlays for video playback and tvservice for switching to different mode timings, but this screen blanking problem happens even if those features are disabled / not used.

Jon's recent reports on the RISC OS Open forums suggest that the hardware mouse pointer is a contributor, as switching to a software pointer appears to avoid the problem. But since that's a mere 32x32 image I get the feeling that there must be more going on - I don't remember running into any screen blanking problems when I was implementing & testing the YUV overlay support. (But I've never been able to find a reliable repro for the problem anyway - it seems very dependent on specific Pis or setups).

I'll double-check our code for the hardware mouse pointer, and see if I can feed some more test builds/code to Jon so that we can find out any more about what's causing it to fail.

@JonAbbott2

JonAbbott2 commented Jan 26, 2019

I've done a lot of testing on this issue since being notified about the gamma mailbox. Where we initially thought gamma was causing the issue, it's looking more likely that creating a hardware pointer via VCHIQ is the root of the problem. At the time RISC OS was modified for the Pi there was no mailbox to create a hardware pointer, and the advice from @popcornmix was to implement it via VCHIQ.

VCHIQ in RISC OS is based on the Linux source, with a RISC OS front end, and was updated in December, so it is based on recent source code.

Although I've yet to produce a repro to force the screen to blank, I have managed to get pointers to be randomly left on screen when the pointer is created via VCHIQ and updated constantly. This does eventually result in the GPU either shutting down or zooming in on a small section of the desktop. @Phlamethrower is investigating to see if this is down to an issue in RISC OS, although from what we can tell it's possibly timing related.

I have however managed to completely eliminate the blanking issue by lowering the minimum CPU/GPU frequencies in CONFIG.TXT:
arm_freq_min=100
gpu_freq_min=75

Oddly, even when the CPU is forced to full speed the issue doesn't occur so WFI/WFE, internal CPU throttling etc may be coming into play.

Rather interestingly, these settings also result in a blank screen from boot on my Pi3 in probably 3/10 boots. The only fix is to power cycle the Pi, so this may be a new issue, or possibly related to the one we're investigating.
