ninja install fails with install_megadrivers.py error #10

Open
emulti opened this issue Jul 15, 2020 · 35 comments

Comments

@emulti

emulti commented Jul 15, 2020

I'm trying to update my Toshiba AC100 with the latest grate-driver (kernel 5.7.8, Arch Linux Arm) using the instructions on https://github.com/grate-driver/grate/wiki/Grate-driver:

Building natively on the AC100.

1.  git clone https://github.com/grate-driver/mesa.git
2.     cd mesa
3.     meson -Dprefix=/usr -Dgallium-drivers=grate -Ddri-drivers=swrast -Dplatforms=x11,drm -Dshared-glapi=true -Dgbm=true -Dglx=dri -Dosmesa=none -Dgles1=false -Dgles2=true -Degl=true -Dgallium-xa=false -Dgallium-vdpau=false -Dgallium-va=false -Dgallium-xvmc=false -Duse-elf-tls=false -Dgallium-nine=false -Db_ndebug=true -Dvulkan-drivers= -Dlibunwind=false -Dllvm=false build
4.     cd build/
5.     ninja && ninja install

Step 3 failed at first with a 'no choice for 'grate'' error. I added 'grate' to the choices array under gallium-drivers in meson_options.txt and was then able to build the mesa master branch with ninja in the first part of step 5.

However, the 'sudo ninja install' command fails:

[1/15] Generating git_sha1.h with a custom command
[1/2] Installing files.
installing /home/chris/Downloads/build/mesa/build/src/mesa/drivers/dri/libmesa_dri_drivers.so to /usr/lib/dri/swrast_dri.so
Installing src/mapi/shared-glapi/libglapi.so.0.0.0 to /usr/lib
Installing src/mapi/es2api/libGLESv2.so.2.0.0 to /usr/lib
Installing src/mesa/drivers/dri/libmesa_dri_drivers.so to /usr/lib/dri
Installing src/glx/libGL.so.1.2.0 to /usr/lib
Installing src/gbm/libgbm.so.1.0.0 to /usr/lib
Installing src/egl/libEGL.so.1.0.0 to /usr/lib
Installing src/gallium/targets/dri/libgallium_dri.so to /usr/lib/dri
Installing /home/chris/Downloads/build/mesa/include/KHR/khrplatform.h to /usr/include/KHR
Installing /home/chris/Downloads/build/mesa/include/GLES2/gl2.h to /usr/include/GLES2
Installing /home/chris/Downloads/build/mesa/include/GLES2/gl2ext.h to /usr/include/GLES2
Installing /home/chris/Downloads/build/mesa/include/GLES2/gl2platform.h to /usr/include/GLES2
Installing /home/chris/Downloads/build/mesa/include/GLES3/gl3.h to /usr/include/GLES3
Installing /home/chris/Downloads/build/mesa/include/GLES3/gl31.h to /usr/include/GLES3
Installing /home/chris/Downloads/build/mesa/include/GLES3/gl32.h to /usr/include/GLES3
Installing /home/chris/Downloads/build/mesa/include/GLES3/gl3ext.h to /usr/include/GLES3
Installing /home/chris/Downloads/build/mesa/include/GLES3/gl3platform.h to /usr/include/GLES3
Installing /home/chris/Downloads/build/mesa/include/GL/gl.h to /usr/include/GL
Installing /home/chris/Downloads/build/mesa/include/GL/glcorearb.h to /usr/include/GL
Installing /home/chris/Downloads/build/mesa/include/GL/glext.h to /usr/include/GL
Installing /home/chris/Downloads/build/mesa/include/GL/glx.h to /usr/include/GL
Installing /home/chris/Downloads/build/mesa/include/GL/glxext.h to /usr/include/GL
Installing /home/chris/Downloads/build/mesa/include/EGL/egl.h to /usr/include/EGL
Installing /home/chris/Downloads/build/mesa/include/EGL/eglext.h to /usr/include/EGL
Installing /home/chris/Downloads/build/mesa/include/EGL/eglplatform.h to /usr/include/EGL
Installing /home/chris/Downloads/build/mesa/include/EGL/eglmesaext.h to /usr/include/EGL
Installing /home/chris/Downloads/build/mesa/include/EGL/eglextchromium.h to /usr/include/EGL
Installing /home/chris/Downloads/build/mesa/include/GL/internal/dri_interface.h to /usr/include/GL/internal
Installing /home/chris/Downloads/build/mesa/src/gbm/main/gbm.h to /usr/include
Installing /home/chris/Downloads/build/mesa/src/util/00-mesa-defaults.conf to /usr/share/drirc.d
Installing /home/chris/Downloads/build/mesa/build/meson-private/glesv2.pc to /usr/lib/pkgconfig
Installing /home/chris/Downloads/build/mesa/build/meson-private/dri.pc to /usr/lib/pkgconfig
Installing /home/chris/Downloads/build/mesa/build/meson-private/gbm.pc to /usr/lib/pkgconfig
Installing /home/chris/Downloads/build/mesa/build/meson-private/egl.pc to /usr/lib/pkgconfig
Installing /home/chris/Downloads/build/mesa/build/meson-private/gl.pc to /usr/lib/pkgconfig
Running custom install script '/usr/bin/python /home/chris/Downloads/build/mesa/bin/install_megadrivers.py /home/chris/Downloads/build/mesa/build/src/mesa/drivers/dri/libmesa_dri_drivers.so /usr/lib/dri swrast_dri.so'
Running custom install script '/usr/bin/python /home/chris/Downloads/build/mesa/bin/install_megadrivers.py /home/chris/Downloads/build/mesa/build/src/gallium/targets/dri/libgallium_dri.so /usr/lib/dri'
FAILED: meson-install
/usr/bin/meson install --no-rebuild
ninja: build stopped: subcommand failed.

It looks like there is an argument missing to the install_megadrivers.py script for the libgallium_dri.so file.
Can you offer advice on how to fix this error please? I am new to the Meson build system, so don't know where the install script is taking values from.
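One way to compare the two `install_megadrivers.py` invocations in the log is to pull out the arguments that follow the script path on each `Running custom install script` line. The sketch below does only that: the helper name `script_args` is hypothetical, the sample paths are shortened stand-ins for the ones in the log, and it makes no claim about how the script itself treats its arguments.

```python
import re
import shlex

# Matches lines of the form seen in the install log:
#   Running custom install script '<command line>'
_SCRIPT_RE = re.compile(r"^Running custom install script '(.*)'$")


def script_args(log_text):
    """Return, for each custom install script line, the arguments after the script path."""
    results = []
    for line in log_text.splitlines():
        match = _SCRIPT_RE.match(line.strip())
        if not match:
            continue
        words = shlex.split(match.group(1))
        # words[0] is the interpreter, words[1] the script; the rest are arguments.
        results.append(words[2:])
    return results


# Shortened sample paths (assumption: same shape as the lines in the log).
sample = """\
Running custom install script '/usr/bin/python /x/bin/install_megadrivers.py /x/build/libmesa_dri_drivers.so /usr/lib/dri swrast_dri.so'
Running custom install script '/usr/bin/python /x/bin/install_megadrivers.py /x/build/libgallium_dri.so /usr/lib/dri'
"""

for args in script_args(sample):
    print(len(args), args)
```

On the sample, the first invocation carries a driver name after the install directory and the second does not, which matches the observation about the `libgallium_dri.so` line.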

@kusma
Member

kusma commented Jul 16, 2020

Sounds like you're trying to use the wrong Mesa branch. We don't have a grate driver upstream, only our own fork. And that driver isn't useful for much more than running glxgears.

@emulti
Author

emulti commented Jul 16, 2020

Thanks, seems it's the Mesa 19.3 branch that contains the grate driver. I will try and build that one.
The last build I did was 13 May 2019 when Autotools were still used.
I am happy to help with testing if there is interest. The AC100 is a nice little device though constrained with RAM and storage.

@digetx
Member

digetx commented Jul 16, 2020

The 19.3 branch is the most recent one, and indeed you can only run glxgears using the current Mesa driver. Besides 3D, Mesa is also useful for libvdpau-tegra, because the libvdpau core uses DRI to retrieve the VDPAU driver name; otherwise you'll need to specify the driver name manually in environment variables.

Help with testing is much appreciated! And you can do quite a lot of things on the AC100 without a 3D driver!

All mobile devices are very resource-constrained, and optimizing everything is a big part of the development fun :)

@emulti
Author

emulti commented Jul 17, 2020

Thanks for the info. I can confirm the 19.3 branch of Mesa builds fine, along with the other components including libvdpau-tegra. Maybe put a note on the wiki page to use the 19.3 branch rather than master?
I also built the libraries in 'Grate' which get installed in /usr/local/lib by default. Is there an associated test application?
The AC100 hardware is very nice, well balanced with a good keyboard. It runs nicely with i3 and lightweight apps like Sylpheed, Gnumeric, Abiword and company, and I got rid of a lot of bloat to make space on the EMMC.
There are some kernel oops during boot associated with clk.c and the nvec keyboard/touchpad doesn't always work on every boot. Nothing to do with Xorg/Mesa of course, but do you know where bug reports can be submitted?

@digetx
Member

digetx commented Jul 17, 2020

I guess it would be better to replace the master branch with 19.3. Actually, I was going to update master some time ago, but haven't got to it yet. It will be fixed soon, thank you for bringing attention to it!

The 'Grate' test applications aren't installed, but you can run them manually by executing tests/grate/* tests/host1x/*. Also, there is no need to install the libraries because they are not used anywhere. There was an intention to use libgrate in the past, but then plans changed.

The vanilla upstream 5.7 kernel is known to produce the clk warnings; they are harmless and will eventually be fixed once the patches are backported from 5.8.

The NVEC driver isn't actively maintained, so it would be more productive if you could submit patches instead of bug reports :)

@emulti
Author

emulti commented Jul 18, 2020

Thanks again. I built the grate-driver kernel (5.8.0-rc4...) and that doesn't have the problem with clk.c oops or the nvec failing to reset/touchpad freezing. I also rebuilt the libdrm, xf86-opentegra, mesa and libvdpau-grate packages.
While I would be happy to submit patches sadly it's beyond my technical capability.

However, I did find an issue with the 5.8.0-rc4 kernel on AC100.
It is not present with mainline 5.7.8.

After a cold start from power off, Xorg fails to start with repeated messages:
Jul 18 09:22:23 alarm kernel: [drm] tegra_drm_sched_timedout_job: 3d channel: pipes 0x2 (process:Xorg pid:616)
Jul 18 09:22:23 alarm kernel: [drm:tegra_drm_sched_timedout_job] ERROR 3d channel: pipes 0x2 (process:Xorg pid:616)
Jul 18 09:22:23 alarm kernel: tegra-gr3d 54180000.gr3d: [drm:tegra_drm_sched_timedout_job] resetting hardware
Jul 18 09:22:23 alarm kernel: [drm] tegra_drm_sched_timedout_job: 3d channel: pipes 0x2 (process:Xorg pid:616)
Jul 18 09:22:23 alarm kernel: [drm:tegra_drm_sched_timedout_job] ERROR 3d channel: pipes 0x2 (process:Xorg pid:616)
Jul 18 09:22:23 alarm kernel: tegra-gr3d 54180000.gr3d: [drm:tegra_drm_sched_timedout_job] resetting hardware
Jul 18 09:22:24 alarm kernel: [drm] tegra_drm_sched_timedout_job: 2d channel: pipes 0x1 (process:Xorg pid:616)
Jul 18 09:22:24 alarm kernel: [drm:tegra_drm_sched_timedout_job] ERROR 2d channel: pipes 0x1 (process:Xorg pid:616)
Jul 18 09:22:24 alarm kernel: tegra-gr2d 54140000.gr2d: [drm:tegra_drm_sched_timedout_job] resetting hardware
Jul 18 09:22:24 alarm kernel: [drm] tegra_drm_sched_timedout_job: 2d channel: pipes 0x1 (process:Xorg pid:616)
Jul 18 09:22:24 alarm kernel: [drm:tegra_drm_sched_timedout_job] ERROR 2d channel: pipes 0x1 (process:Xorg pid:616)
Jul 18 09:22:24 alarm kernel: tegra-gr2d 54140000.gr2d: [drm:tegra_drm_sched_timedout_job] resetting hardware
Jul 18 09:22:24 alarm kernel: [drm] tegra_drm_sched_timedout_job: 2d channel: pipes 0x1 (process:Xorg pid:616)

In the dmesg in this state there is this message:
tegra-mc 7000f000.memory-controller: host1xdmar: DMA blocked
tegra-mc 7000f000.memory-controller: host1xdmar: read @0xb565fa30: EMEM address decode error (EMEM decode error)

After a warm reboot, Xorg starts fine. The issue only happens when starting from power off. I am wondering if it is related to how u-boot (2013-07) initialises hardware, and if it is related to the inability to power off completely with poweroff command in recent kernels.
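To summarise the repeated timeout messages quoted above, a short script can count them per channel and process. This is a sketch that assumes the log lines have exactly the `[drm:tegra_drm_sched_timedout_job] ERROR ...` shape shown in this comment; the function name `count_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Matches the ERROR lines from the log, e.g.
#   [drm:tegra_drm_sched_timedout_job] ERROR 3d channel: pipes 0x2 (process:Xorg pid:616)
_TIMEOUT_RE = re.compile(
    r"\[drm:tegra_drm_sched_timedout_job\] ERROR (\w+) channel: "
    r"pipes (0x[0-9a-f]+) \(process:(\S+) pid:(\d+)\)"
)


def count_timeouts(log_text):
    """Count scheduler timeout errors per (channel, process, pid)."""
    counts = Counter()
    for match in _TIMEOUT_RE.finditer(log_text):
        channel, _pipes, process, pid = match.groups()
        counts[(channel, process, int(pid))] += 1
    return counts


sample = """\
Jul 18 09:22:23 alarm kernel: [drm:tegra_drm_sched_timedout_job] ERROR 3d channel: pipes 0x2 (process:Xorg pid:616)
Jul 18 09:22:23 alarm kernel: tegra-gr3d 54180000.gr3d: [drm:tegra_drm_sched_timedout_job] resetting hardware
Jul 18 09:22:23 alarm kernel: [drm:tegra_drm_sched_timedout_job] ERROR 3d channel: pipes 0x2 (process:Xorg pid:616)
Jul 18 09:22:24 alarm kernel: [drm:tegra_drm_sched_timedout_job] ERROR 2d channel: pipes 0x1 (process:Xorg pid:616)
"""

for key, n in sorted(count_timeouts(sample).items()):
    print(key, n)
```

The "resetting hardware" lines are deliberately not counted, since they lack the `ERROR` marker the pattern looks for.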

On a separate question, I tried running the tests in grate and host1x.
Most (gr3d) seem to fail with
INFO: x11_overlay_create:39 overlay unsupported
ERROR: grate_overlay_create: host1x_overlay_create() failed: -1

dmesg-cold.txt
dmesg-warm.txt

Xorg.0.cold.log
Xorg.0.warm.log

@digetx
Member

digetx commented Jul 19, 2020

Thank you very much for the report!

Could you please try the recent grate-kernel update? I added the change grate-driver/linux@15996b0. It looks like a kernel driver problem that shows up only if the hardware is in a certain state during boot. This problem was already reported for the AC100 not long ago, but I couldn't reproduce it.

The Host1x driver doesn't support recovering from a blocked DMA. So the hardware gets reset, but DMA stays blocked until the warm reboot, which is why it works after a warm reboot.

The overlay unsupported message and the error that follows are expected when running the tests under Xorg; you may ignore them. Could you please clarify what you mean by "seem to fail"? Do you get any other errors?

@emulti
Author

emulti commented Jul 20, 2020

I'll update the kernel on AC100 and test again tonight if possible.
I should not have said 'seem', the following are results, running from the grate folder with tests/xxx/yyy so the files in asm are found:
host1x:
gr2d-blit: INFO: main:175: test passed
gr2d-clear: -displays magenta window, and 'overlay support missing'
gr2d-context: - see attachment
gr3d-triangle - displays gamut triangle, then magenta window, and 'overlay support missing'

grate:
clear: magenta window, ERROR: grate_overlay_create: host1x_overlay_create() failed: -1
cube, cube-textured: 'CgDrv_Create: BLOB compiler is unavailable' (x2) 'grate_program_new() failed'
cube-textured2: displays jailbars scene keyed over floating cubes behind
cube-textured3 fails after loading to '24%' with ERROR: host1x bo_create_helper:237: host1x_bo_create failed; ERROR:grate_create_texture: failed to allocate texture 2048x2048 bpp:8 pitch:4096; Segmentation fault (AC100 display too small?)
interactive: floating cube responds correctly to keyboard commands, in 'face cull mode' the text on cube is mirrored except in 'none', can be flipped with '2' key, front face direction. '3' depth function blanks cube in some modes, maybe as intended.
quad: 'CgDrv_Create: BLOB compiler is unavailable' (x2) 'grate_program_new() failed'
stencil: "Stencil test works!"
texture-filter: works, stepping pixelation on cube
texture-wrap: runs, not sure what is expected behaviour!
triangle: 'CgDrv_Create: BLOB compiler is unavailable' (x2) 'grate_program_new() failed'
triangle-rotate: 'CgDrv_Create: BLOB compiler is unavailable' (x2) 'grate_program_new() failed'
gr2-context.txt

@digetx
Member

digetx commented Jul 20, 2020

All results look good! Please ignore the failed tests: they depend on extra bits and are thus expected to fail in a default userspace/kernel configuration.

The cube-textured3 test requires a lot of free contiguous memory; you may try to get it working by adding cma=128M (maybe even 256M) to the kernel's cmdline arguments.

@emulti
Author

emulti commented Jul 20, 2020

Does allocating a larger CMA reduce the amount of memory for non-GPU-related applications? If it does, I was thinking of trying to reduce it from the current 64M, since memory is short on the AC100.

I built the linux-grate 5.8.0-rc5-g81e240239be6 (after yesterday's commits)
Unfortunately on cold boot it hangs, before systemd journaling starts, and I don't have a serial console connection to capture the issue. But we probably know what it's from...
dmesg-5.8.0-rc5.txt

On warm boot there is an issue with a duplicate regulator name regulator.5:cpu0 (from the device tree, maybe cpufreq/DVFS related?) as in the attached dmesg.
Strangely, the device tree binary is a different size, but I haven't found any new dts commits; maybe I am looking in the wrong place.

@digetx
Member

digetx commented Jul 20, 2020

A larger CMA shouldn't reduce the amount of memory. It's reusable (system) memory that can be swapped out during a contiguous allocation; it's not a carveout.
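To see how much CMA the running kernel has reserved and how much of it is currently free, the `CmaTotal` and `CmaFree` fields of `/proc/meminfo` can be read (they are present when the kernel is built with CMA support). A minimal sketch, with a hypothetical helper name `cma_info` and made-up sample values:

```python
def cma_info(meminfo_text):
    """Return (CmaTotal, CmaFree) in kB, or None if the CMA fields are absent."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("CmaTotal", "CmaFree"):
            # Values look like "  131072 kB"; keep the number only.
            fields[name] = int(rest.split()[0])
    if "CmaTotal" not in fields or "CmaFree" not in fields:
        return None
    return fields["CmaTotal"], fields["CmaFree"]


# Made-up sample in the /proc/meminfo format. On a real system, read the file:
#   with open("/proc/meminfo") as f:
#       text = f.read()
sample = """\
MemTotal:         505000 kB
CmaTotal:         131072 kB
CmaFree:          120000 kB
"""

print(cma_info(sample))
```

With `cma=128M` on the command line, `CmaTotal` would be expected to read 131072 kB; the free figure varies with whatever is borrowing the region at the time.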

Hmm.. now I'm also seeing that there is some kernel problem using next-20200717:

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1
 CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.8.0-rc5-next-20200717-00162-g4bcedc60754a #2833
 Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
 Workqueue: rcu_gp srcu_invoke_callbacks
 [<c010dad5>] (unwind_backtrace) from [<c0109481>] (show_stack+0x11/0x14)
 [<c0109481>] (show_stack) from [<c0469da9>] (dump_stack+0x8d/0x9c)
 [<c0469da9>] (dump_stack) from [<c013e67d>] (___might_sleep+0xed/0x11c)
 [<c013e67d>] (___might_sleep) from [<c09b8d6d>] (mutex_lock+0x1d/0x54)
 [<c09b8d6d>] (mutex_lock) from [<c055bd03>] (device_del+0x2b/0x2a4)
 [<c055bd03>] (device_del) from [<c055bfd9>] (__device_link_free_srcu+0x41/0x50)
 [<c055bfd9>] (__device_link_free_srcu) from [<c017037f>] (srcu_invoke_callbacks+0x9b/0x100)
 [<c017037f>] (srcu_invoke_callbacks) from [<c0133dbd>] (process_one_work+0x145/0x408)
 [<c0133dbd>] (process_one_work) from [<c0134179>] (worker_thread+0xf9/0x3c4)
 [<c0134179>] (worker_thread) from [<c0138ef3>] (kthread+0x10b/0x13c)
 [<c0138ef3>] (kthread) from [<c010015d>] (ret_from_fork+0x11/0x34)
 Exception stack(0xef159fb0 to 0xef159ff8)

It probably has the same root cause as your regulator issue. I'll check whether today's next still has that issue and will ping you once the problem is resolved.

@digetx
Member

digetx commented Jul 21, 2020

Please give the recent grate-kernel update a try; I reverted the offending commits.

@emulti
Author

emulti commented Jul 23, 2020

Yes, the reverted patches (5.8.0-rc6...) put things back as they were (host1xdmar: DMA blocked on first cold boot).
In testing with the previous kernel 5.8.0-rc5... I found that the results are not consistent from cold boot. It depends whether the system was powered off by software (holding down power button after 'poweroff', because the system is in unresponsive state after with power LED still on), or by removing the battery. So the condition of the EC varies.
This incomplete powerdown issue has been reported on kernels after at least 5.4, but is not present on 5.1.1, which I have used previously and tested again today. Perhaps this is linked to the incorrect initialisation of host1x. I will try to find out what changed in the nvec code between 5.1 and 5.4, but it doesn't look like much.
The issue 'sysfs: cannot create duplicate filename '/devices/virtual/devlink/regulator.5:cpu0' is not present any more in rc6. I changed the nvec driver to built-in rather than a module, so it is loaded much earlier. The uvcvideo driver is also now compiled in.

@digetx
Member

digetx commented Jul 26, 2020

Thank you for the report! I'll add some debug messages to the host1x driver that may help to figure out what's wrong, and will ping you once the debugging is ready for testing.

@digetx
Member

digetx commented Aug 3, 2020

@emulti Could you please fetch a recent grate-kernel update and post kernel boot log? I added some debug messages which may shed some light on the host1x problem.

@emulti
Author

emulti commented Aug 3, 2020

Updated- here are cold and warm boot dmesg

I am tracking down the issue with incorrect power-off using git bisect. It takes a while... Somewhere between 5.4.8 (good) and 5.4.10 (bad) on the linux-stable tree. But the indication is that these issues are not linked, after a 'good' shutdown, the 'Host1x DMA blocked' issue still occurs with linux-grate after a cold boot.
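For readers unfamiliar with why bisecting "takes a while" but still finishes, the idea behind `git bisect` is a binary search over the commit range: each build-and-test step halves the remaining candidates. The sketch below illustrates that search on a plain list; the commit count and the position of the bad commit are made up, and `first_bad` is a hypothetical helper, not part of git.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one kernel build + power-off test per step
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Hypothetical range of 200 commits where number 137 introduced the problem.
commits = list(range(200))
culprit, tests = first_bad(commits, lambda c: c >= 137)
print(culprit, tests)
```

The number of tests grows with the logarithm of the range size, so even a few hundred commits need only a handful of builds, each of which is slow when done natively on the AC100.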

dmesg-grate-warm.txt
dmesg-grate-cold.txt

@digetx
Member

digetx commented Aug 3, 2020

Thank you! I pushed another update to the grate-kernel, now host1x driver resets the memory client state and there are couple more messages. Please give it a try and post the cold-boot log.

@emulti
Author

emulti commented Aug 4, 2020

Here are cold and warm boot dmesg of master branch as of 4 Aug.
One patch has been applied to fix the 'incomplete power off' issue with the AC100. This is a revert of the offending commit:
43cf75d96409a20ef06b756877a2e72b10a026fc upstream.
exit: panic before exit_mm() on global init exit (21 Dec 2019)

The cold boot log was taken after sudo poweroff following a warm boot; it shows the errors:
tegra-mc 7000f000.memory-controller: host1xdmar: DMA blocked
tegra-mc 7000f000.memory-controller: host1xdmar: read @0xb560fa30: EMEM address decode error (EMEM decode error)

There is also an error
tegra2-devfreq: memory controller has no timings
which I think is because this unit has the type of DRAM for which no timing info is available for the device tree.

dmesg-grate-cold-0408.txt
dmesg-grate-warm-0408.txt
poweroff first bad commit.txt

@digetx
Member

digetx commented Aug 4, 2020

Thanks for the testing!

I don't have any good comments regarding the power-off issue, could be that the offending change unmasks some other problem.

The host1x trouble remains mysterious for now.

@thierryreding @cyndis do you have any idea why host1x isn't idling on AC100 on a cold boot? That's likely to be a bug in the grate-kernel host1x driver, but it's not apparent to me what's wrong.. although the problem is known to exist only on AC100. Does host1x have any register-writes buffering (not mentioned in TRM) that needs to be flushed? The upstream host1x driver also uses a different DMA usage scheme and it could be that the problem isn't visible in upstream because the CDMA limits are set to the push buffer's start/end, while in grate-kernel the addressing is unlimited, hence host1x should silently stop on fetching from a wrong memory address in upstream.

@emulti
Author

emulti commented Aug 4, 2020

The bad commit causing the power-off issue was identified with a git bisect on the linux-stable tree, which I assume means it is definitely the culprit.

I also built 5.7.9 from the stable tree. Reverting that commit also restores power-off behaviour. When the nvec keyboard is probed an oops in the clk driver is reliably caused, as in attached dmesg. Sorry, I don't know how to analyze the stack traces.

dmesg-5.7.9.txt

The logging is different and includes:
[ 0.085113] tegra20-emc 7000f400.memory-controller: no memory timings for RAM code 1 found in device tree
[ 0.085151] tegra20-emc: probe of 7000f400.memory-controller failed with error -22

linux-grate has:
[ 1.866723] tegra20-emc 7000f400.memory-controller: no memory timings for RAM code 1 found in device tree
[ 1.877055] tegra20-devfreq tegra20-devfreq: memory controller has no timings

Should I build again with DVFS disabled?

@digetx
Member

digetx commented Aug 4, 2020

Looking at the NVEC driver, I think it's the source of the problem because interrupts might be disabled at the kernel's power-off stage.

https://elixir.bootlin.com/linux/v5.8/source/drivers/staging/nvec/nvec.c#L760
https://elixir.bootlin.com/linux/v5.8/source/drivers/staging/nvec/nvec.c#L273

The Tegra I2C suffered from a similar problem until it got support for atomic transfers, which allow I2C transfers to be made with disabled interrupts.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ede2299f7101a79fe8610ca0000734c9887ad4b2

Somebody needs to implement the polling transfer mode for NVEC driver and use it for nvec_power_off() in order to resolve the problem.
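The difference between an interrupt-driven wait and a polling wait can be illustrated outside the kernel. The sketch below is a conceptual model only, not NVEC or Tegra I2C code: `poll_status`, `FakeController`, and the status bit `0x1` are all hypothetical. It shows the shape of a polling loop with a timeout, which does not depend on an interrupt ever being delivered.

```python
import time


def poll_status(read_status, done_mask, timeout_s=0.1, interval_s=0.001):
    """Repeatedly read a status value until any bit in done_mask is set.

    Returns the status value, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = read_status()
        if status & done_mask:
            return status
        if time.monotonic() >= deadline:
            return None
        # In kernel code running with interrupts disabled this would be a
        # busy-wait delay rather than a sleep; sleep is used here for brevity.
        time.sleep(interval_s)


class FakeController:
    """Stand-in for hardware: reports 'done' (bit 0) after a few reads."""

    def __init__(self, ready_after):
        self.reads = 0
        self.ready_after = ready_after

    def read_status(self):
        self.reads += 1
        return 0x1 if self.reads >= self.ready_after else 0x0


dev = FakeController(ready_after=3)
print(poll_status(dev.read_status, done_mask=0x1), dev.reads)
```

The timeout matters: if the controller never reports completion, the loop gives up instead of hanging the shutdown path.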

The backtrace in the 5.7 log is harmless, it's a known problem that eventually should be fixed by backporting fix from 5.8.

The EMC / devfreq messages are also okay, the EMC driver improvement which will silence the error is pending to be upstreamed.

@emulti
Author

emulti commented Aug 6, 2020

Thanks for your advice! The EC does shut down reliably on power-off (battery doesn't drain, or very slowly) once the commit mentioned is reverted. Because, I assume, the shutdown message gets through (by luck) to the EC before interrupts are disabled. The commit maybe makes that happen earlier, so the EC remains on and isn't listening for a power-button-press to power up again.

I read up on atomic transfers enough to understand that implementing them in the nvec driver is way beyond my capabilities...
Would it be enough to implement them just for nvec_power_off and presumably nvec_suspend, and leave the other transfers interrupt-driven?

I do have a trivial but useful patch to nvec.c that unmutes the AC100 internal speakers on Resume. I have no idea how to get that into the kernel though.

I have surprised myself with how much functionality and how many simultaneous tasks an AC100 with a dual-core 1 GHz CPU and 512MB of RAM can handle, once bloat is carefully removed.

@digetx
Member

digetx commented Aug 8, 2020

Yeah, it probably happened to work by luck before. If NVEC driver could re-enable interrupts on power-off, then it could become a one-line fix. Somebody should check the kernel's shutdown code path in order to see if it's a safe thing to do.

Adding a polling alternative to the interrupt-driven code shouldn't be much work; I may try to type up a draft patch sometime later. I don't know whether it's possible to change only nvec_power_off(), because I'm not very familiar with the NVEC. Maybe @paulfertser could help?

Regarding submitting patches upstream, please see https://www.kernel.org/doc/html/latest/process/submitting-patches.html; you may also find video tutorials on YouTube. And of course please feel free to ask any questions on IRC, I'll be glad to help.

Short example of submitting a kernel patch:

# git format-patch -v1 -1 df8476db20b7

# ./scripts/checkpatch.pl --strict v1*

# ./scripts/get_maintainer.pl v1*

# git send-email --smtp-server=smtp.gmail.com --smtp-user=digetx@gmail.com --smtp-encryption=tls --smtp-server-port=587 --suppress-cc=all --to 'Thierry Reding <thierry.reding@gmail.com>' --to 'Jonathan Hunter <jonathanh@nvidia.com>'  --cc 'linux-tegra@vger.kernel.org' --confirm=always v1*

Could you please give the recent grate-kernel update a try? I added a DMA addressing limitation for Host1x https://github.com/grate-driver/linux/blob/master/drivers/gpu/host1x/soc/channel_hw.c#L288, maybe it will help. Although, in the best case it will put the grate-kernel driver on par with the upstream driver, and the real origin of the problem will remain unknown.

@emulti
Author

emulti commented Aug 8, 2020

Thanks for info on submitting patches etc.
After building grate-kernel update, the "tegra-mc 7000f000.memory-controller: host1xdmar: DMA blocked" is still present after cold boot.
I also noticed the Xorg X-video extension is most often not initialised correctly, ("opentegra(0) xv.c:
Xorg.0.log_bad_0807.txt

This after a warm reboot. Maybe one time in ten it will initialise correctly:
Xorg.0.log_good_0807.txt

On one occasion Xorg failed to start ("VGA arbiter: cannot open kernel arbiter, no multi-card support") but I think this could be a systemd or configuration issue ((EE) systemd-logind: failed to get session: PID 475 does not belong to any known session)
Xorg.0.log_fail.txt

I noticed a new Staging driver is in WIP for an Acer Iconia A500 EC, an ENE KB930 with custom firmware according to the file. This is actually broadly similar to the AC100 EC, which is an ENE KB926 with Toshiba (Compal?) firmware. In the Toshiba case the KB926 handles the keyboard and touchpad rather than the T250. But maybe some parts of the code can point to how to improve the Nvec driver?

@digetx
Member

digetx commented Aug 10, 2020

Thanks, it's a good sign that the DMA isn't fixed, meaning that it should be a local driver issue. I pushed an update to the grate-kernel which dumps the channels' hardware state on boot; could you please post the cold boot log?

The Xv problem is odd; I have never seen it and I'm not sure how it could happen. Could you please give detailed steps to reproduce it?

The Acer EC uses the Tegra I2C driver for the I2C transfers, hence it uses an atomic transfer for sending the power-off command. The NVEC driver should do the same thing as Tegra I2C, i.e. poll the interrupt status instead of waiting for an interrupt event on shutdown.

@emulti
Author

emulti commented Aug 10, 2020

I'll rebuild tomorrow and send the new log. By local driver issue, do you mean a config mistake in the kernel? The config I am using is based on tegra_defconfig as attached.

The XV initialization issue occurs just by starting Xorg using xinit/.xinitrc/startx. It is the same whether Xorg is running as root or as the unprivileged user. Just sometimes, it inits correctly, with no other changes in configuration, maybe a warm reboot. 'xvinfo' shows 'no adaptors present' for screen 0.

I fixed the session issue with systemd-logind by adding '-keeptty' to the Xserver config file. It was linked to the communication between systemd-logind and Xorg-server introduced in 1.16 to allow Xorg to run as an unprivileged user. The user is in the 'video' group, and permissions on /dev/dri/card0 are 660, owner root, group video.

config.txt

I'll see if I can figure out how to adapt the polling method used in the Acer EC driver to work in NVEC.

@digetx
Member

digetx commented Aug 10, 2020

I meant it should be a grate-kernel driver bug.

Could you reproduce the Xv problem by starting Xorg without using startx? Just by running Xorg from root.

Unprivileged Xorg doesn't work well on many Linux distros, I only managed to load unprivileged Xorg on Debian and even then it doesn't work from ssh session.

@emulti
Author

emulti commented Aug 11, 2020

Running Xorg from root with sudo Xorg: XVideo adaptor is initialised correctly every time. But if -keeptty is appended (to allow systemd-logind to control the session) it fails.
Same with a DM like lxdm: Xvideo is OK every time
But from startx or xinit it fails (almost) every time, both with Xorg running as root or as a user.
I haven't been able to find out the reason. The environment seems to be the same, and the same files .xinitrc .xserverrc and .xprofile are sourced with a DM and with startx/xinit. I am guessing it is to do with systemd-logind and permissions. Under the DM systemd-logind complains it can't keep track of the session: "PID xxx does not belong to any known session"

@emulti
Author

emulti commented Aug 11, 2020

Taking the various init, profile files etc. out of the equation:

sudo Xorg
...
(II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
...

(II) opentegra(0): XV adaptor initialized

But

sudo Xorg -keeptty (or sudo Xorg vt$XDG_VTNR) : 
...
systemd-logind: took control of session /org/freedesktop/login1/session/c1
...

(EE) opentegra(0): xv.c:1749/TegraXvGetDrmPlaneProperty(): Failed to get "CRTC_ID" property
(EE) opentegra(0): xv.c:1750/TegraXvGetDrmPlaneProperty(): Available properties:
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "type"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "IN_FORMATS"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "zpos"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "YUV to RGB CSC"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "rotation"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "colorkey.plane_mask"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "colorkey.mode"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "colorkey.mask"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "colorkey.min"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "colorkey.max"
(EE) opentegra(0): xv.c:2011/TegraXvScreenInit(): XV initialization failed
(II) opentegra(0): VBLANK initialized
(II) opentegra(0): [DRI2] Setup complete
(II) opentegra(0): [DRI2]   DRI driver: tegra
(II) opentegra(0): [DRI2]   VDPAU driver: tegra
(II) opentegra(0): DRI2 initialized
(EE) opentegra(0): failed to set mode: Permission denied
(II) Initializing extension Generic Event Extension

It appears that systemd-logind and its partner systemd-pam are somehow restricting access to the hardware if they grab control of the session.
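To confirm from a log which plane properties the driver could see when Xv initialization failed, the property names printed under "Available properties:" can be extracted and checked for the one it asked for. This sketch assumes the Xorg log lines have the same `TegraXvGetDrmPlaneProperty(): "name"` shape as above; the helper name `listed_plane_properties` is hypothetical.

```python
import re

# Matches only the lines that consist of a quoted property name, e.g.
#   (EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "zpos"
_PROP_RE = re.compile(r'TegraXvGetDrmPlaneProperty\(\):\s+"([^"]+)"\s*$')


def listed_plane_properties(xorg_log):
    """Return the property names listed after 'Available properties:'."""
    props = []
    for line in xorg_log.splitlines():
        match = _PROP_RE.search(line)
        if match:
            props.append(match.group(1))
    return props


sample = """\
(EE) opentegra(0): xv.c:1749/TegraXvGetDrmPlaneProperty(): Failed to get "CRTC_ID" property
(EE) opentegra(0): xv.c:1750/TegraXvGetDrmPlaneProperty(): Available properties:
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "type"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "IN_FORMATS"
(EE) opentegra(0): xv.c:1757/TegraXvGetDrmPlaneProperty():         "zpos"
"""

props = listed_plane_properties(sample)
print(props, "CRTC_ID" in props)
```

In the failing log, "CRTC_ID" is absent from the listed properties, so the driver is being shown a reduced property set rather than failing to open the device at all.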

@emulti
Author

emulti commented Aug 11, 2020

Cold and warm boot logs with linux-grate kernel from 20200810 are attached.
dmesg_cold_5.8.0-20200810.txt
dmesg_warm_5.8.0-20200810.txt

edit, adding a third log of warm boot after cold boot+reboot, the 'random' values for 'dmaget' become cleaned up.
dmesg_warm2_5.8.0-20200810.txt

@digetx
Member

digetx commented Aug 11, 2020

Thank you for the testing! I still don't know why Host1x is misbehaving on cold boot on the AC100, but I pushed another change to the grate-kernel that makes CDMA stop after initialization; could you please give it a try?

I couldn't reproduce the Xv problem on Ubuntu 20.04. Could you please tell me what distro you're using?

(II) systemd-logind: took control of session /org/freedesktop/login1/session/c1
...
(0): XV adaptor initialized

@emulti
Author

emulti commented Aug 11, 2020

I have Arch Linux ARM installed to eMMC.
By default, Xorg runs rootless when started from xinit or startx. I have tried overriding this (/etc/X11/Xwrapper.config), but it makes no difference whether Xorg runs as root or rootless. It seems that once systemd takes over the session, opentegra has trouble reading the card properties.

I corrected an error in the directory permissions on /usr/share/polkit-1/rules.d that was preventing polkit from loading its default rules (logged as a bug on the Arch site), but I don't think it's linked to the XV issue. I'll keep investigating.
DRI devices are tagged as 'uaccess' in udev, so they should be fully accessible, but systemd-logind has some method of revoking that access when a user session is not 'Active'. Does the opentegra driver run as group 'video'?
These are the contents of /dev/dri:

drwxr-xr-x  2 root root         80 Aug 11 19:56 by-path
crw-rw----+ 1 root video  226,   0 Aug 11 19:56 card0
crw-rw-rw-  1 root render 226, 128 Aug 11 19:56 renderD128

It's interesting that /dev/dri/card0 has no user-level access.
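
One way to check what the listing above means in practice is to try opening the node from the affected session. The trailing `+` on `card0` indicates that an ACL is attached, so the mode bits alone may not tell the whole story. The following is a minimal sketch, not part of any driver; `probe_rdwr` is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a device node read/write, roughly the way a DRM client would.
 * Returns 0 on success, or the errno value reported by open(2) on failure.
 * The caller supplies the path; /dev/dri/card0 is the node discussed here.
 */
int probe_rdwr(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return errno;   /* e.g. EACCES when neither mode bits nor ACL allow it */

    close(fd);
    return 0;
}
```

Calling `probe_rdwr("/dev/dri/card0")` as the logged-in user and getting `EACCES` would suggest the node itself is inaccessible. A result of 0 would suggest the failure happens later, at the ioctl level.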

Here are the boot logs with latest update.
On cold boot the tegra-mc: DMA blocked message is gone and Xorg can be started.
Warm boot is also OK.
The debug info from the last update isn't shown now.
dmesg_cold_5.8.0-20200811.txt
dmesg_warm_5.8.0-20200811.txt

@digetx
Member

digetx commented Aug 11, 2020

Xv needs access to the DRM atomic UAPI; I guess the permission is getting dropped somehow. It could be a systemd or Xorg issue, or some configuration problem. For a start, I need to reproduce the problem and will try Arch. Meanwhile, you may check whether the problem exists on Ubuntu or Debian.

I dropped the boot logs because enough logs have been collected, thank you very much! It's great that a remedy has been found!

digetx added a commit to grate-driver/xf86-video-opentegra that referenced this issue Sep 12, 2020
The atomic capabilities were requested only after getting DRM-master
access, but under rootless Xorg that request was erroneously skipped.

Link: grate-driver/mesa#10 (comment)
@digetx
Member

digetx commented Sep 12, 2020

@emulti Hello! I got around to trying Arch and managed to reproduce and fix the Xv problem! It's now fixed in the Opentegra driver: grate-driver/xf86-video-opentegra@3c22a03

I also fixed the real root cause of the "DMA blocked" bug after finding that the AC100 fix broke the Nexus 7 in a similar way. If you try a recent grate-kernel and the AC100 fails again on a cold boot, please let me know!

@emulti
Author

emulti commented Sep 13, 2020

Thanks digetx, I will test this out in the next few days. Looks promising from the commit.
As long as I avoid a cold reboot, my AC100 has been very stable for the last month, with a patch to unmute the speakers after resume.

digetx pushed a commit that referenced this issue Oct 26, 2020
This patch fixes memory leaks that occur when a reply is allocated but not
freed on the error execution path.

Found by enabling the address sanitizer on a simple EGL app.

```c
#include <EGL/egl.h>

int main(void)
{
    /* Initialize and immediately tear down the default display. */
    EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    EGLint major;
    EGLint minor;

    if (!eglInitialize(display, &major, &minor))
    {
        return 1;
    }
    eglTerminate(display);
    return 0;
}
```

Compiled with: `gcc testme.c -o testme -fsanitize=address -lasan -lEGL`

Execution environment:
- Windows 10, VMware Player 15.5.2 build-15785246 without 3D acceleration
- Guest OS: OpenSUSE Leap 15.2
- Mesa 19.3.4

Program output:

```sh
ASAN_OPTIONS=fast_unwind_on_malloc=0 ./testme

libEGL warning: DRI2: failed to authenticate
==52510==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fa62315f500 in malloc (/usr/lib64/libasan.so.4+0xdc500)
    #1 0x7fa61e12d86b  (/usr/lib64/libxcb.so.1+0xf86b)
    #2 0x7fa61e12b5c7  (/usr/lib64/libxcb.so.1+0xd5c7)
    #3 0x7fa61e12cc3e  (/usr/lib64/libxcb.so.1+0xec3e)
    #4 0x7fa61e12cd4f in xcb_wait_for_reply (/usr/lib64/libxcb.so.1+0xed4f)
    #5 0x7fa61ebe02a5  (/usr/lib64/libEGL_mesa.so.0+0x202a5)
    #6 0x7fa61ebdb5ca  (/usr/lib64/libEGL_mesa.so.0+0x1b5ca)
    #7 0x7fa61ebd750c  (/usr/lib64/libEGL_mesa.so.0+0x1750c)
    #8 0x7fa61ebd7554  (/usr/lib64/libEGL_mesa.so.0+0x17554)
    #9 0x7fa61ebd1107  (/usr/lib64/libEGL_mesa.so.0+0x11107)
    #10 0x400856 in main (/home/user/testme+0x400856)
    #11 0x7fa622ad8349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    #12 0x4006e9 in _start (/home/user/testme+0x4006e9)

SUMMARY: AddressSanitizer: 32 byte(s) leaked in 1 allocation(s).
```

Signed-off-by: Andrey Vostrikov <av.linux.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6611>
(cherry picked from commit 4242073)
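
The leak pattern described in this commit message (a heap-allocated reply that is not freed on the error path) can be illustrated with a small sketch. This is not the Mesa code. `struct reply`, `fetch_reply`, and `authenticate` are hypothetical stand-ins for an XCB reply and the function that consumes it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap-allocated XCB reply. */
struct reply {
    bool authenticated;
};

/* Hypothetical stand-in for an xcb_*_reply() call: the caller owns the result. */
struct reply *fetch_reply(bool authenticated)
{
    struct reply *r = malloc(sizeof(*r));

    if (r)
        r->authenticated = authenticated;
    return r;
}

/* Returns true on success; the reply is released on every exit path. */
bool authenticate(bool server_says_ok)
{
    struct reply *r = fetch_reply(server_says_ok);
    bool ok = false;

    if (!r)
        return false;   /* nothing was allocated, so nothing to free */

    if (!r->authenticated)
        goto out;       /* error path: still reaches free() below */

    ok = true;
out:
    free(r);
    return ok;
}
```

Routing both the success and the failure path through a single `free()` is one common way to keep LeakSanitizer quiet in cases like the "failed to authenticate" run shown above.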
digetx pushed a commit that referenced this issue May 15, 2022
need to iterate over the descriptors in the binding to invalidate the whole
thing here

=================================================================
==546534==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a0000ae6c0 at pc 0x7fe20e26fd9d bp 0x7ffd92be6bc0 sp 0x7ffd92be6bb8
READ of size 8 at 0x61a0000ae6c0 thread T0
    #0 0x7fe20e26fd9c in zink_descriptor_set_refs_clear ../src/gallium/drivers/zink/zink_descriptors.c:950
    #1 0x7fe20e401304 in zink_destroy_surface ../src/gallium/drivers/zink/zink_surface.c:340
    #2 0x7fe20e21311b in zink_surface_reference ../src/gallium/drivers/zink/zink_surface.h:106
    #3 0x7fe20e21a5b9 in zink_sampler_view_destroy ../src/gallium/drivers/zink/zink_context.c:835
    #4 0x7fe20c41d35f in tc_sampler_view_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:1848
    #5 0x7fe20e210ff7 in pipe_sampler_view_reference ../src/gallium/auxiliary/util/u_inlines.h:216
    #6 0x7fe20e22d592 in zink_set_sampler_views ../src/gallium/drivers/zink/zink_context.c:1532
    #7 0x7fe20c41a3d8 in tc_call_set_sampler_views ../src/gallium/auxiliary/util/u_threaded_context.c:1393
    #8 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
    #9 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
    #10 0x7fe20c42b728 in tc_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:4250
    #11 0x7fe20b65176a in st_destroy_context_priv ../src/mesa/state_tracker/st_context.c:387
    #12 0x7fe20b65669f in st_destroy_context ../src/mesa/state_tracker/st_context.c:1009
    #13 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
    #14 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
    #15 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
    #16 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
    #17 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
    #18 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
    #19 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384
    #20 0x41b88a in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:734
    #21 0x41b8e9 in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:735
    #22 0x2323aa7 in deqp::gles31::Context::destroyRenderContext() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:77
    #23 0x2323969 in deqp::gles31::Context::~Context() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:55
    #24 0x232278e in deqp::gles31::TestPackage::deinit() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestPackage.cpp:102
    #25 0x2c866c2 in tcu::DefaultHierarchyInflater::leaveTestPackage(tcu::TestPackage*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:75
    #26 0x2c87058 in tcu::TestHierarchyIterator::next() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:252
    #27 0x2c365da in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:122
    #28 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
    #29 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
    #30 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
    #31 0x7fe2263e160b in __libc_start_main_impl (/lib64/libc.so.6+0x2d60b)
    #32 0x413fa4 in _start (/home/zmike/src/VK-GL-CTS/build/external/openglcts/modules/glcts+0x413fa4)

0x61a0000ae6c0 is located 64 bytes inside of 1328-byte region [0x61a0000ae680,0x61a0000aebb0)
freed by thread T0 here:
    #0 0x7fe226cb6627 in free (/usr/lib64/libasan.so.6+0xae627)
    #1 0x7fe20aab1751 in unsafe_free ../src/util/ralloc.c:302
    #2 0x7fe20aab16c8 in unsafe_free ../src/util/ralloc.c:295
    #3 0x7fe20aab13c3 in ralloc_free ../src/util/ralloc.c:265
    #4 0x7fe20e269234 in descriptor_pool_free ../src/gallium/drivers/zink/zink_descriptors.c:286
    #5 0x7fe20e26937d in descriptor_pool_delete ../src/gallium/drivers/zink/zink_descriptors.c:296
    #6 0x7fe20e26ff53 in zink_descriptor_pool_reference ../src/gallium/drivers/zink/zink_descriptors.c:967
    #7 0x7fe20e270db2 in zink_descriptor_program_deinit ../src/gallium/drivers/zink/zink_descriptors.c:1071
    #8 0x7fe20e3b6536 in zink_destroy_gfx_program ../src/gallium/drivers/zink/zink_program.c:695
    #9 0x7fe20e1eaaf9 in zink_gfx_program_reference ../src/gallium/drivers/zink/zink_program.h:242
    #10 0x7fe20e20d386 in zink_shader_free ../src/gallium/drivers/zink/zink_compiler.c:2099
    #11 0x7fe20e3b9f0b in zink_delete_shader_state ../src/gallium/drivers/zink/zink_program.c:1074
    #12 0x7fe20c3e29ad in util_shader_reference ../src/gallium/auxiliary/util/u_live_shader_cache.c:188
    #13 0x7fe20e3ba11e in zink_delete_cached_shader_state ../src/gallium/drivers/zink/zink_program.c:1093
    #14 0x7fe20c41709e in tc_call_delete_fs_state ../src/gallium/auxiliary/util/u_threaded_context.c:998
    #15 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
    #16 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
    #17 0x7fe20c423683 in tc_flush ../src/gallium/auxiliary/util/u_threaded_context.c:3003
    #18 0x7fe20b62d996 in st_flush ../src/mesa/state_tracker/st_cb_flush.c:60
    #19 0x7fe20b62dbe3 in st_glFlush ../src/mesa/state_tracker/st_cb_flush.c:94
    #20 0x7fe20ae4bded in _mesa_make_current ../src/mesa/main/context.c:1493
    #21 0x7fe20ae49702 in _mesa_free_context_data ../src/mesa/main/context.c:1187
    #22 0x7fe20b65668b in st_destroy_context ../src/mesa/state_tracker/st_context.c:1005
    #23 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
    #24 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
    #25 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
    #26 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
    #27 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
    #28 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
    #29 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384

previously allocated by thread T0 here:
    #0 0x7fe226cb691f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xae91f)
    #1 0x7fe20aab0c81 in ralloc_size ../src/util/ralloc.c:120
    #2 0x7fe20aab0e33 in rzalloc_size ../src/util/ralloc.c:153
    #3 0x7fe20aab12c8 in rzalloc_array_size ../src/util/ralloc.c:233
    #4 0x7fe20e26c76d in allocate_desc_set ../src/gallium/drivers/zink/zink_descriptors.c:657
    #5 0x7fe20e26e9cb in zink_descriptor_set_get ../src/gallium/drivers/zink/zink_descriptors.c:840
    #6 0x7fe20e2747aa in zink_descriptors_update ../src/gallium/drivers/zink/zink_descriptors.c:1424
    #7 0x7fe20e36fc48 in void zink_draw<(zink_multidraw)1, (zink_dynamic_state)2, true, false>(pipe_context*, pipe_draw_info const*, unsigned int, pipe_draw_indirect_info const*, pipe_draw_start_count_bias const*, unsigned int, pipe_vertex_state*, unsigned int) ../src/gallium/drivers/zink/zink_draw.cpp:788
    #8 0x7fe20e29166d in zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)2, true> ../src/gallium/drivers/zink/zink_draw.cpp:907
    #9 0x7fe20c424982 in tc_call_draw_single ../src/gallium/auxiliary/util/u_threaded_context.c:3155
    #10 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
    #11 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
    #12 0x7fe20c41f7a9 in tc_texture_map ../src/gallium/auxiliary/util/u_threaded_context.c:2279
    #13 0x7fe20b630757 in pipe_texture_map_3d ../src/gallium/auxiliary/util/u_inlines.h:572
    #14 0x7fe20b6341f6 in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:546
    #15 0x7fe20b42fea7 in read_pixels ../src/mesa/main/readpix.c:1178
    #16 0x7fe20b42fea7 in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1195
    #17 0x7fe20b42ffc0 in _mesa_ReadPixels ../src/mesa/main/readpix.c:1210
    #18 0x2a6d094 in glu::readPixels(glu::RenderContext const&, int, int, tcu::PixelBufferAccess const&) /home/zmike/src/VK-GL-CTS/framework/opengl/gluPixelTransfer.cpp:61
    #19 0x29eaa06 in deqp::gls::ShaderExecUtil::FragmentOutExecutor::execute(int, void const* const*, void* const*) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:677
    #20 0x25a600b in iterate /home/zmike/src/VK-GL-CTS/modules/gles31/functional/es31fOpaqueTypeIndexingTests.cpp:585
    #21 0x2322b53 in deqp::gles31::TestCaseWrapper<deqp::gles31::TestPackage>::iterate(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestCaseWrapper.hpp:86
    #22 0x2c376fd in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:302
    #23 0x2c366e3 in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:139
    #24 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
    #25 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
    #26 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

cc: mesa-stable

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
(cherry picked from commit 698ae34)
digetx pushed a commit that referenced this issue May 15, 2022
Test case 'dEQP-GLES31.functional.shaders.builtin_functions.common.modf.vec2_mediump_tess_control'..
=================================================================
==539161==ERROR: AddressSanitizer: unknown-crash on address 0x60400008cfef at pc 0x7fffdb47b2d6 bp 0x7fffffffa490 sp 0x7fffffffa488
READ of size 4 at 0x60400008cfef thread T0
    #0 0x7fffdb47b2d5 in XXH_read32 ../src/util/xxhash.h:531
    #1 0x7fffdb47bfbf in XXH_readLE32 ../src/util/xxhash.h:608
    #2 0x7fffdb47bfbf in XXH_readLE32_align ../src/util/xxhash.h:620
    #3 0x7fffdb47bfbf in XXH32_endian_align ../src/util/xxhash.h:797
    #4 0x7fffdb47bfbf in XXH32 ../src/util/xxhash.h:831
    #5 0x7fffdb480b49 in _mesa_hash_data ../src/util/hash_table.c:631
    #6 0x7fffded8c10a in shader_module_hash ../src/gallium/drivers/zink/zink_program.c:82
    #7 0x7fffded8cad8 in get_shader_module_for_stage ../src/gallium/drivers/zink/zink_program.c:144
    #8 0x7fffded8cf64 in update_gfx_shader_modules ../src/gallium/drivers/zink/zink_program.c:182
    #9 0x7fffded8dcc2 in zink_update_gfx_program ../src/gallium/drivers/zink/zink_program.c:257
    #10 0x7fffdec63463 in update_gfx_program ../src/gallium/drivers/zink/zink_draw.cpp:223
    #11 0x7fffded7aab9 in update_gfx_pipeline<true> ../src/gallium/drivers/zink/zink_draw.cpp:445
    #12 0x7fffded4a88b in void zink_draw<(zink_multidraw)1, (zink_dynamic_state)2, true, false>(pipe_context*, pipe_draw_info const*, unsigned int, pipe_draw_indirect_info const*, pipe_draw_start_count_bias const*, unsigned int, pipe_vertex_state*, unsigned int) ../src/gallium/drivers/zink/zink_draw.cpp:777
    #13 0x7fffdec6c5b2 in zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)2, true> ../src/gallium/drivers/zink/zink_draw.cpp:907
    #14 0x7fffdcdff982 in tc_call_draw_single ../src/gallium/auxiliary/util/u_threaded_context.c:3155
    #15 0x7fffdcdec706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
    #16 0x7fffdcded4ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
    #17 0x7fffdcdfa492 in tc_buffer_map ../src/gallium/auxiliary/util/u_threaded_context.c:2251
    #18 0x7fffdb7f2439 in pipe_buffer_map_range ../src/gallium/auxiliary/util/u_inlines.h:393
    #19 0x7fffdb7f56c2 in _mesa_bufferobj_map_range ../src/mesa/main/bufferobj.c:488
    #20 0x7fffdb803300 in map_buffer_range ../src/mesa/main/bufferobj.c:3734
    #21 0x7fffdb8036e7 in _mesa_MapBufferRange ../src/mesa/main/bufferobj.c:3817
    #22 0x29ecb02 in deqp::gls::ShaderExecUtil::BufferIoExecutor::readOutputBuffer(void* const*, int) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:1069
    #23 0x29ee499 in deqp::gls::ShaderExecUtil::TessControlExecutor::execute(int, void const* const*, void* const*) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:1390
    #24 0x246264c in deqp::gles31::Functional::CommonFunctionCase::iterate() /home/zmike/src/VK-GL-CTS/modules/gles31/functional/es31fShaderCommonFunctionTests.cpp:400
    #25 0x2322b53 in deqp::gles31::TestCaseWrapper<deqp::gles31::TestPackage>::iterate(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestCaseWrapper.hpp:86
    #26 0x2c376fd in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:302
    #27 0x2c366e3 in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:139
    #28 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
    #29 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
    #30 0x7ffff6dbc55f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
    #31 0x7ffff6dbc60b in __libc_start_main_impl (/lib64/libc.so.6+0x2d60b)
    #32 0x413fa4 in _start (/home/zmike/src/VK-GL-CTS/build/external/openglcts/modules/glcts+0x413fa4)

0x60400008cff1 is located 0 bytes to the right of 33-byte region [0x60400008cfd0,0x60400008cff1)
allocated by thread T0 here:
    #0 0x7ffff769191f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xae91f)
    #1 0x7fffded8c608 in get_shader_module_for_stage ../src/gallium/drivers/zink/zink_program.c:115
    #2 0x7fffded8cf64 in update_gfx_shader_modules ../src/gallium/drivers/zink/zink_program.c:182
    #3 0x7fffded8dcc2 in zink_update_gfx_program ../src/gallium/drivers/zink/zink_program.c:257
    #4 0x7fffdec63463 in update_gfx_program ../src/gallium/drivers/zink/zink_draw.cpp:223
    #5 0x7fffded7aab9 in update_gfx_pipeline<true> ../src/gallium/drivers/zink/zink_draw.cpp:445
    #6 0x7fffded4a88b in void zink_draw<(zink_multidraw)1, (zink_dynamic_state)2, true, false>(pipe_context*, pipe_draw_info const*, unsigned int, pipe_draw_indirect_info const*, pipe_draw_start_count_bias const*, unsigned int, pipe_vertex_state*, unsigned int) ../src/gallium/drivers/zink/zink_draw.cpp:777
    #7 0x7fffdec6c5b2 in zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)2, true> ../src/gallium/drivers/zink/zink_draw.cpp:907
    #8 0x7fffdcdff982 in tc_call_draw_single ../src/gallium/auxiliary/util/u_threaded_context.c:3155
    #9 0x7fffdcdec706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
    #10 0x7fffdcded4ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
    #11 0x7fffdcdfa492 in tc_buffer_map ../src/gallium/auxiliary/util/u_threaded_context.c:2251
    #12 0x7fffdb7f2439 in pipe_buffer_map_range ../src/gallium/auxiliary/util/u_inlines.h:393
    #13 0x7fffdb7f56c2 in _mesa_bufferobj_map_range ../src/mesa/main/bufferobj.c:488
    #14 0x7fffdb803300 in map_buffer_range ../src/mesa/main/bufferobj.c:3734
    #15 0x7fffdb8036e7 in _mesa_MapBufferRange ../src/mesa/main/bufferobj.c:3817
    #16 0x29ecb02 in deqp::gls::ShaderExecUtil::BufferIoExecutor::readOutputBuffer(void* const*, int) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:1069
    #17 0x29ee499 in deqp::gls::ShaderExecUtil::TessControlExecutor::execute(int, void const* const*, void* const*) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:1390
    #18 0x246264c in deqp::gles31::Functional::CommonFunctionCase::iterate() /home/zmike/src/VK-GL-CTS/modules/gles31/functional/es31fShaderCommonFunctionTests.cpp:400
    #19 0x2322b53 in deqp::gles31::TestCaseWrapper<deqp::gles31::TestPackage>::iterate(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestCaseWrapper.hpp:86
    #20 0x2c376fd in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:302
    #21 0x2c366e3 in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:139
    #22 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
    #23 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
    #24 0x7ffff6dbc55f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

cc: mesa-stable

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
(cherry picked from commit 62b8daa)

Conflicts:
	src/gallium/drivers/zink/zink_program.c
digetx pushed a commit that referenced this issue Dec 17, 2022
This avoids u_blitter recursion:

 #0  util_blitter_set_running_flag
 #1  util_blitter_custom_color
 #2  si_blit_decompress_color
 #3  si_decompress_dcc
 #4  si_texture_disable_dcc
 #5  si_update_ps_colorbuf0_slot
 #6  si_bind_ps_shader
 #7  util_blitter_restore_fragment_states
 #8  util_blitter_custom_color
 #9  si_blit_decompress_color
 #10 si_decompress_dcc
 #11 si_sdma_copy_image
 #12 si_blit

cc: mesa-stable

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16962>
(cherry picked from commit 3d37291)

Conflicts:
	src/gallium/drivers/radeonsi/si_blit.c