
rpi-5.4.y: Fixes for V3D DRM timeout issues #3816

Draft
wants to merge 2 commits into base: rpi-5.4.y


Terminus-IMRC

The current V3D DRM driver in the rpi-5.4.y kernel has two issues:

  1. The NULL return value of v3d_cache_clean_job_run() is mistreated as an error by the DRM scheduler, which results in the following warning:
[  653.831148] ------------[ cut here ]------------
[  653.836262] WARNING: CPU: 1 PID: 259 at ./include/linux/dma-fence.h:533 drm_sched_main+0x238/0x31c [gpu_sched]
[  653.847204] Modules linked in: <snip>
[  653.902464] CPU: 1 PID: 259 Comm: v3d_cache_clean Tainted: G         C        5.4.51-v7l+ #1327
[  653.912189] Hardware name: BCM2711
[  653.916085] Backtrace:
[  653.919026] [<c020d46c>] (dump_backtrace) from [<c020d768>] (show_stack+0x20/0x24)
[  653.927584]  r6:e8418000 r5:00000000 r4:c129c8f8 r3:31a472ee
[  653.933747] [<c020d748>] (show_stack) from [<c0a39a24>] (dump_stack+0xe0/0x124)
[  653.942062] [<c0a39944>] (dump_stack) from [<c0221c50>] (__warn+0xec/0x104)
[  653.949548]  r8:00000215 r7:00000009 r6:bf158610 r5:00000000 r4:00000000 r3:31a472ee
[  653.958350] [<c0221b64>] (__warn) from [<c0221d20>] (warn_slowpath_fmt+0xb8/0xc0)
[  653.966933]  r9:bf158610 r8:00000215 r7:bf1566a0 r6:00000009 r5:00000000 r4:c1204f88
[  653.975826] [<c0221c6c>] (warn_slowpath_fmt) from [<bf1566a0>] (drm_sched_main+0x238/0x31c [gpu_sched])
[  653.986377]  r9:00000000 r8:c1204f88 r7:efa9a300 r6:00000000 r5:e8fc8700 r4:ef3b58a0
[  653.995297] [<bf156468>] (drm_sched_main [gpu_sched]) from [<c0244e70>] (kthread+0x13c/0x168)
[  654.004996]  r10:e8f126dc r9:ef3c3ae4 r8:bf156468 r7:ef3b58a0 r6:00000000 r5:e8bd4b80
[  654.013994]  r4:e8f126c0
[  654.017095] [<c0244d34>] (kthread) from [<c02010ac>] (ret_from_fork+0x14/0x28)
[  654.025452] Exception stack(0xe8419fb0 to 0xe8419ff8)
[  654.031067] 9fa0:                                     00000000 00000000 00000000 00000000
[  654.040363] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  654.049671] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  654.056876]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0244d34
[  654.065856]  r4:e8bd4b80 r3:c0204648
[  654.070019] ---[ end trace 9b7e1a26f0895ac5 ]---

Printing this warning takes time, which lengthens the execution time of CL/CSD programs.
This warning is silenced by 3c37926, which in turn requires 93db7d6.
However, 93db7d6 is known to break the behavior in which v3d_{cl,csd}_job_timedout() expect the timer to be rearmed.
Since no workaround for this is available yet, we changed the code in b465bea so that it no longer expects the timer to be rearmed.
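To make the first issue concrete, here is a minimal user-space sketch of the scheduler's error path before and after the fix. It assumes that the 5.4-era drm_sched_main() passes any IS_ERR_OR_NULL() fence to dma_fence_set_error(&s_fence->finished, PTR_ERR(fence)), that the fix presumably only does so when IS_ERR(fence) is true, and that dma_fence_set_error() warns whenever the error is not a negative errno (the dma-fence.h line in the trace). err_ptr(), is_err(), ptr_err() and the *_path_warns() helpers are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR();
 * the kernel versions live in <linux/err.h>. */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO); /* only errnos are encodable */
	return (void *)(intptr_t)error;
}

static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Same condition as the WARN_ON() in dma_fence_set_error(): a fence error
 * must be a negative errno. Returns true where the kernel would warn. */
static bool set_error_would_warn(long error)
{
	return error >= 0 || error < -MAX_ERRNO;
}

/* Old path: NULL and ERR_PTR fences both reach PTR_ERR(), so a legitimate
 * NULL ("nothing to wait for") becomes error 0 and trips the warning. */
static bool old_path_warns(const void *fence)
{
	if (fence != NULL && !is_err(fence))
		return false; /* real hardware fence: a callback is attached */
	return set_error_would_warn(ptr_err(fence));
}

/* Fixed path: only ERR_PTR fences set an error; NULL simply completes. */
static bool fixed_path_warns(const void *fence)
{
	return is_err(fence) && set_error_would_warn(ptr_err(fence));
}
```

With these helpers, old_path_warns(NULL) is true while fixed_path_warns(NULL) is false; for err_ptr(-ECANCELED) neither path warns, since -125 is a valid errno.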

The non-rearming timer issue should be fixed in the DRM scheduler itself (as Daniel Vetter said in the thread), because the Etnaviv DRM driver also expects this behavior.
Therefore, we think these commits are appropriate only for the rpi tree.
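Why the timeout handlers depend on rearming can be shown with a small model. This simplified sketch assumes the progress check used by the CL timeout handler: on each timer expiry it compares the current control-list position with the one recorded at the previous expiry, records the new position and returns if it moved, and resets the GPU only if nothing moved. struct cl_pos, cl_job_timedout() and simulate() are hypothetical names, and the sketch does not reproduce the actual change in b465bea.

```c
#include <stdbool.h>
#include <stdint.h>

/* Snapshot of the control-list position compared between timeouts
 * (hypothetical struct; the driver keeps two u32 fields per job). */
struct cl_pos {
	uint32_t ctca;
	uint32_t ctra;
};

enum verdict { KEEP_WAITING, RESET_GPU };

/* If the position moved since the last expiry, the job is still making
 * progress: remember the new position and keep waiting. Otherwise the job
 * is considered hung. */
static enum verdict cl_job_timedout(struct cl_pos *last, struct cl_pos now)
{
	if (last->ctca != now.ctca || last->ctra != now.ctra) {
		*last = now;
		return KEEP_WAITING;
	}
	return RESET_GPU;
}

/* Drive the handler with the positions observed at successive expiries.
 * timer_rearmed models whether anything restarts the timer after a
 * KEEP_WAITING verdict. Returns the expiry index at which the GPU is
 * reset, or -1 if no reset ever happens. */
static int simulate(const struct cl_pos *seen, int n, bool timer_rearmed)
{
	struct cl_pos last = {0, 0};

	for (int i = 0; i < n; i++) {
		if (cl_job_timedout(&last, seen[i]) == RESET_GPU)
			return i;
		if (!timer_rearmed)
			return -1; /* no further expiry: a later hang is never caught */
	}
	return -1;
}
```

With positions {0x100, 0}, {0x200, 0}, {0x200, 0}, the rearmed model resets at the third expiry, while the non-rearmed one never resets: the job stalls after the first progress check, but nothing fires again to notice it.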

  2. v3d_csd_job_run() ignores the timeout error flag, which results in an infinite GPU reset loop once a timeout occurs:
[  178.799106] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  178.807836] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[  179.839132] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  179.847865] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[  180.879146] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  180.887925] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[  181.919188] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  181.928002] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
...

This issue is fixed in 410a046 by correctly checking the timed-out flag.
In 69af0ae, we also added a way to set the timeout in milliseconds through the kernel command line.
Because these patches depend on the timeout behavior described in 1, and bcm2711_defconfig is not available in drm-next for now, we have chosen not to post these patches to dri-devel yet.
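The missing guard can be illustrated with a toy model of the reset cycle. This is a sketch under stated assumptions: finished_error stands in for the error the scheduler stores on the job's finished fence (drm_sched_resubmit_jobs() sets -ECANCELED on the guilty job), the job is assumed to hang every time it reaches the hardware, and struct csd_job, csd_job_run() and count_resets() are hypothetical names, not the driver's own.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy CSD job: finished_error mirrors the error stored on the job's
 * "finished" fence (e.g. -ECANCELED for a guilty job). */
struct csd_job {
	int finished_error;
};

/* Returns true if the job is handed to the hardware again. With the guard,
 * a job that already carries an error is skipped, as the bin/render paths
 * do; without it, the job is resubmitted unconditionally. */
static bool csd_job_run(const struct csd_job *job, bool with_guard)
{
	if (with_guard && job->finished_error) {
		printf("Skipping CSD job resubmission due to previous error (%d)\n",
		       job->finished_error);
		return false;
	}
	return true;
}

/* A job that always hangs: whenever it is on the hardware it times out,
 * the GPU is reset, the job is marked -ECANCELED and resubmitted. Returns
 * the number of resets within max_rounds timeout periods. */
static int count_resets(bool with_guard, int max_rounds)
{
	struct csd_job job = { .finished_error = 0 };
	bool on_hw = true; /* initial submission */
	int resets = 0;

	for (int round = 0; round < max_rounds && on_hw; round++) {
		resets++;                        /* timeout -> GPU reset */
		job.finished_error = -ECANCELED; /* job marked guilty */
		on_hw = csd_job_run(&job, with_guard);
	}
	return resets;
}
```

With the guard, count_resets(true, 10) stops after a single reset and prints a line modelled on the "Skipping CSD job resubmission" message; without it, the count equals the number of timeout periods, like the roughly once-per-second reset loop in the log.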

We tested these patches with Piglit, with SaschaWillems/Vulkan on Igalia's Vulkan driver, and with our CSD programs in Idein/py-videocore6.
The programs run peacefully alongside each other even when a timeout occurs.

Note that the first two drm/sched commits are from the linux-5.6.y tree, so you can cherry-pick just our other three commits to rpi-5.8.y if we move there.

@anholt Is there any reason that the timeout flag is referred to in v3d_{bin,render}_job_run() but is ignored in v3d_csd_job_run()?

@6by9
Contributor

6by9 commented Aug 27, 2020

Two of these patches appear to be from upstream
torvalds/linux@135517d
torvalds/linux@d7c5782

Please add the line
"Commit upstream."
to the commit text. It helps when rebasing to a new upstream release.

@naushir Could you ping Igalia over the two v3d patches please? They really ought to be upstreamed too if correct.

@txenoo
Contributor

txenoo commented Aug 27, 2020

We have been dealing with Piglit timeout errors in the kernel since we upgraded to 5.4.y, and had arrived at a fix similar to drm/sched: Fix passing zero to 'PTR_ERR' warning v2 torvalds/linux@135517d.

I was about to create a pull request to cherry-pick torvalds/linux@135517d when I found this recent issue. @6by9 We will have a look at the patches and check whether they fix some of the errors we still see when running the full test suites, which didn't happen on 4.19.y.

@Terminus-IMRC Do you have a particular test to reproduce the behaviour you describe in 2)?

@Terminus-IMRC
Author

Terminus-IMRC commented Aug 28, 2020

Thank you for the responses!

@6by9 I see. We've finished editing the commit messages.

@txenoo For convenience, you may use our 32-bit build linux-image-5.4.58-v7l-idein-109 (you will need a kernel=vmlinuz-5.4.58-v7l-idein line in your /boot/config.txt).

The current sgemm.py runs in about 0.55 seconds. Changing P = 1024 to P = 2048 doubles the run time, which triggers the original 1-second timeout.

@txenoo
Contributor

txenoo commented Aug 28, 2020

@Terminus-IMRC, I've been testing your branch, and CSD timeouts are reported and handled properly. A Piglit run exposes one of those situations, so I'm going to look into it.

However, test execution time is still higher than with 4.19.y. Comparing per-test execution times, I found that the Piglit tests in the group spec/arb_depth_buffer_float/depthstencil-render-miplevels are terribly slow.

I think the cause is related to MMU errors, which are hidden in the rpi-5.4.y kernel log because of f23e93b "drm/v3d: Suppress all but the first MMU error". I've checked, and there is a reported issue related to this in 5.4: #3574 "RPi4 2Gb MMU crash V3d".

@Terminus-IMRC
Author

@txenoo Thank you for testing!
Please note that appending v3d.hang_limit=xxx to your /boot/cmdline.txt sets the timeout to xxx milliseconds (the current default is 5000 milliseconds).
In addition, if you need it, please use linux-image-5.4.58-v7l-idein-110, which is 109 with f23e93b reverted.

I've run the tests (piglit run all results/ --dmesg --include-tests spec --include-tests arb_depth_buffer_float --include-tests depthstencil-render-miplevels).
Mysteriously, I found some pte invalid (invalid read) errors, but no write violation, pte invalid (invalid write) errors.
Because only a few (~10) MMU errors are produced, I don't think they are the cause of the long execution time.
I'll also have a look into what is making the Piglit tests slow.

@txenoo
Contributor

txenoo commented Sep 1, 2020

@Terminus-IMRC I've confirmed that these slow tests are HW dependent. I was launching the runs on an RPI4-8Gb, where the MMU errors number in the hundreds. On an RPI4-4Gb, my numbers are similar to yours, with fewer MMU errors, and the execution time is similar to 4.19.y: 20 minutes with my default test selection. The RPI4-8Gb runs, however, take 1h30m, with ~200 tests just returning timeout (90s). All of the timed-out tests also report MMU errors that are not reported on the 4Gb RPI4.

@txenoo
Contributor

txenoo commented Sep 1, 2020

> The current V3D DRM driver in rpi-5.4.y kernel has two issues:
> 1. Return value NULL from v3d_cache_clean_job_run() is mis-treated as an error by the DRM scheduler, which results in the following warning:
> [...]
> It takes some time to print the warning and thus the execution time of CL/CSD programs will get longer.
> This warning can be ceased with 3c37926, and this commit requires 93db7d6.

@Terminus-IMRC I couldn't find why torvalds/linux@135517d is required. I think that torvalds/linux@d7c5782 is just a fix-up of the regression introduced in 167bf96014a09, and that "drm/sched: Fix passing zero to 'PTR_ERR' warning v2" should have been tagged for 5.4.y stable, but its Fixes: tag wasn't added. In any case, we would need your fix for kernels after 5.6, as "drm/scheduler: Avoid accessing freed bad job" has been available since then. But maybe I'm missing something.

> However, 93db7d6 is known to break the behavior where v3d_{cl,csd}_job_timedout() expect the timer to be rearmed.
> Since workarounds for this issue are not available yet, we changed the codes not to expect the timer rearming in b465bea.

> The non-rearming timer issue should be fixed in the DRM scheduler itself (as said by Daniel Vetter in the thread) because the Etnaviv DRM driver also expects the behavior.
> Therefore, we think these commits are appropriate only for the rpi tree.

I've seen on the dri-devel mailing list that etnaviv has recently sent their fix for this issue, so it would probably make sense to check their solution and submit a fix upstream for v3d too.

@txenoo
Contributor

txenoo commented Sep 2, 2020

> @Terminus-IMRC I've confirmed that theese slow tests are HW dependent, I've was launching the runs in a RPI4-8Gb and the MMU errors are hundreds. But in a RPI4-4Gb, my numbers are similar to yours with fewer MMU errors. And time execution is similar to 4.19.y with 20 minutes with my default test selection. But the RPI4-8Gb times are 1h30m with ~200 tests just returning timeout (90s). All timeout tests are also reporting MMU errors that are not reported in the 4Gb RPI4.

I looked into this, and in the end it was a deployment issue. The RPI4-8Gb I was using remotely had xscreensaver installed, and it was activated during test runs. Some of the screensavers it ran generate extra MMU errors, which made the tests time out.

So, after reinstalling RaspbianOS on one RPI4-8Gb and running only Piglit, run times were around 22 minutes. In any case, we need to check what these screensavers are doing, but that's a different issue. Mystery solved. Sorry for the noise about my execution times.

@Terminus-IMRC
Author

@txenoo Thank you for the reports and for letting me know the Etnaviv patch!

Sorry, I was confused about torvalds/linux@135517d. I thought it was needed for torvalds/linux@d7c5782, but it turns out the commit is needed to prevent a kernel NULL pointer dereference when many CSD timeouts occur while a Vulkan program is running:

[  177.881821] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  177.889486] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[  177.897256] [drm] Skipping CSD job resubmission due to previous error (-125)
[  177.904503] 8<--- cut here ---
[  177.907560] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[  177.915673] pgd = 9b7d4e2e
[  177.918376] [00000010] *pgd=80000000004003, *pmd=00000000
[  177.923791] Internal error: Oops: 207 [#1] SMP ARM
[  177.928576] Modules linked in: <snip>
[  177.979784] CPU: 2 PID: 54 Comm: kworker/2:1 Tainted: G         C        5.4.61-v7l-idein #112
[  177.988386] Hardware name: BCM2711
[  177.991803] Workqueue: events drm_sched_job_timedout [gpu_sched]
[  177.997818] PC is at drm_sched_increase_karma+0x74/0x108 [gpu_sched]
[  178.004164] LR is at 0xc8d4cc08
[  178.007297] pc : [<bf27a074>]    lr : [<c8d4cc08>]    psr: 20000113
[  178.013554] sp : efab7e80  ip : c87bf008  fp : efab7ea4
[  178.018770] r10: 00000000  r9 : c88e3000  r8 : c88e39a8
[  178.023986] r7 : c7c94c00  r6 : 00000001  r5 : c88e3548  r4 : c88e350c
[  178.030504] r3 : 00000000  r2 : 0000004e  r1 : 00000000  r0 : c88e3504
[  178.037025] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  178.044151] Control: 30c5383d  Table: 07ccb140  DAC: 55555555
[  178.049891] Process kworker/2:1 (pid: 54, stack limit = 0xbb6daef9)
[  178.056149] Stack: (0xefab7e80 to 0xefab8000)
[  178.060499] 7e80: c88e34e0 c88e3990 c88e3990 c7c94c00 c88e39a8 c88e3000 efab7ecc efab7ea8
[  178.068669] 7ea0: bf1fc404 bf27a00c c88e3578 c7c94c18 c88e34e0 c7c94c00 00000000 00000080
[  178.076838] 7ec0: efab7edc efab7ed0 bf1fc7c4 bf1fc3b0 efab7efc efab7ee0 bf27a938 bf1fc770
[  178.085007] 7ee0: c88e3578 efa21000 efebb300 efebe500 efab7f34 efab7f00 c023e520 bf27a8fc
[  178.093176] 7f00: efebb300 efebb300 efebb300 efa21000 efa21014 efebb300 00000008 efebb318
[  178.101346] 7f20: c1203d00 efebb300 efab7f74 efab7f38 c023e8a0 c023e2dc c0a57d20 c0dbdd7c
[  178.109515] 7f40: c12a4053 ffffe000 efab7f74 efa9d740 efa20140 00000000 efab6000 efa21000
[  178.117685] 7f60: c023e840 ef929e74 efab7fac efab7f78 c0245c3c c023e84c efa9d75c efa9d75c
[  178.117689] 7f80: ffffe000 efa20140 c0245acc 00000000 00000000 00000000 00000000 00000000
[  178.117695] 7fa0: 00000000 efab7fb0 c02010ac c0245ad8 00000000 00000000 00000000 00000000
[  178.142192] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  178.142196] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  178.142198] Backtrace:
[  178.142223] [<bf27a000>] (drm_sched_increase_karma [gpu_sched]) from [<bf1fc404>] (v3d_gpu_reset_for_timeout+0x60/0xa4 [v3d])
[  178.142228]  r9:c88e3000 r8:c88e39a8 r7:c7c94c00 r6:c88e3990 r5:c88e3990 r4:c88e34e0
[  178.180030] [<bf1fc3a4>] (v3d_gpu_reset_for_timeout [v3d]) from [<bf1fc7c4>] (v3d_bin_job_timedout+0x60/0x64 [v3d])
[  178.180035]  r9:00000080 r8:00000000 r7:c7c94c00 r6:c88e34e0 r5:c7c94c18 r4:c88e3578
[  178.180052] [<bf1fc764>] (v3d_bin_job_timedout [v3d]) from [<bf27a938>] (drm_sched_job_timedout+0x48/0x98 [gpu_sched])
[  178.180070] [<bf27a8f0>] (drm_sched_job_timedout [gpu_sched]) from [<c023e520>] (process_one_work+0x250/0x570)
[  178.218884]  r7:efebe500 r6:efebb300 r5:efa21000 r4:c88e3578
[  178.218890] [<c023e2d0>] (process_one_work) from [<c023e8a0>] (worker_thread+0x60/0x5d0)
[  178.218894]  r10:efebb300 r9:c1203d00 r8:efebb318 r7:00000008 r6:efebb300 r5:efa21014
[  178.218897]  r4:efa21000
[  178.218903] [<c023e840>] (worker_thread) from [<c0245c3c>] (kthread+0x170/0x174)
[  178.218907]  r10:ef929e74 r9:c023e840 r8:efa21000 r7:efab6000 r6:00000000 r5:efa20140
[  178.218911]  r4:efa9d740
[  178.260700] [<c0245acc>] (kthread) from [<c02010ac>] (ret_from_fork+0x14/0x28)
[  178.260703] Exception stack(0xefab7fb0 to 0xefab7ff8)
[  178.260707] 7fa0:                                     00000000 00000000 00000000 00000000
[  178.260710] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  178.260714] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  178.260718]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0245acc
[  178.260722]  r4:efa20140
[  178.306244] Code: e593c000 0a00000f e5971008 e1c322d8 (e1c101d0)
[  178.306282] ---[ end trace 7e9b29bfd443fbac ]---

However, this issue is not related to this PR, so I have now removed the relevant commits from the branch.
Because the changes seem applicable upstream, I'm planning to post them to dri-devel, along with a fix similar to the Etnaviv one.
I think this PR should be merged to the rpi tree after it is merged to drm-next.

@Terminus-IMRC Terminus-IMRC marked this pull request as draft September 2, 2020 10:32
@pelwell
Contributor

pelwell commented Sep 2, 2020

Comment here when it gets merged upstream.

@Terminus-IMRC
Author

The patchset waiting for comments is here: https://lists.freedesktop.org/archives/dri-devel/2020-September/278609.html .
I mistakenly re-sent the patchset, so please ignore the second one.

@Gandalfthegreybeard

So, no activity since Sept as far as I can see? Is there something that is blocking this patchset from being merged?

@Terminus-IMRC
Author

No, for reasons I don't know... A few weeks after sending the patchset, I sent a reminder to @anholt, the sole maintainer of the driver, but I haven't received any response so far. Do you see something wrong in my patchset?

@Gandalfthegreybeard

> No, with a reason I don't know... A few weeks later from the patchset I sent a reminder to @anholt, the sole maintainer of the driver, but currently, I've received no responses. Do you see something wrong in my patchset?

I don't know that section of the kernel, so I can't comment at the moment, but I wanted to try out qmkl6 on the 64-bit Raspbian kernel, which is based on 5.4.79*.

The previous code misses a check for the timeout error set by
drm_sched_resubmit_jobs(), which results in an infinite GPU reset loop
once a timeout occurs:

[  178.799106] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  178.807836] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[  179.839132] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  179.847865] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[  180.879146] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  180.887925] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[  181.919188] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[  181.928002] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
...

This commit adds the timeout check, as in v3d_{bin,render}_job_run():

[   66.408962] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[   66.417734] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[   66.428296] [drm] Skipping CSD job resubmission due to previous error (-125)

where -125 is -ECANCELED. Note, however, that users currently have no
way to check whether a timeout has occurred other than inspecting dmesg.

Signed-off-by: Yukimasa Sugizaki <ysugi@idein.jp>

The default timeout is 500 ms, which is too short for some workloads,
including Piglit. Adding this parameter helps users run heavier
tasks.

Signed-off-by: Yukimasa Sugizaki <ysugi@idein.jp>
@Terminus-IMRC
Author

Thank you for the interest!

If you trust us, then you can use our build for aarch64. You can install this by running sudo dpkg -i linux-image-5.4.83-v8+_5.4.83-v8+-1_arm64.deb and by adding kernel=vmlinuz-5.4.83-v8+ to /boot/config.txt.

Otherwise, you can build the Idein/linux:rpi-5.4.y-v3d-timeout tree yourself. make bcm2711_defconfig && make bindeb-pkg should work (you will be asked to install some additional development packages, which can be done with sudo apt install xxx).

Adding v3d.timeout=10000 (10 seconds) to /boot/cmdline.txt should be enough for running tests of QMKL6.
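To see how such a setting interacts with a job's run time, here is a user-space sketch. It assumes the semantics stated in the patch's MODULE_PARM_DESC (timeout in ms, 0 means infinity, default 500 ms). parse_v3d_timeout_ms() and job_would_time_out() are hypothetical helpers that only mimic reading a v3d.timeout=<ms> token from a command-line string; they are not how the kernel's module_param machinery actually parses it.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Default taken from the patch's parameter description. */
#define V3D_DEFAULT_TIMEOUT_MS 500L

/* Hypothetical helper: return the value of the first space-separated
 * "v3d.timeout=<ms>" token in cmdline, or the default if it is absent or
 * not a non-negative integer. */
static long parse_v3d_timeout_ms(const char *cmdline)
{
	static const char key[] = "v3d.timeout=";
	const size_t key_len = sizeof(key) - 1;

	for (const char *p = strstr(cmdline, key); p != NULL;
	     p = strstr(p + key_len, key)) {
		if (p != cmdline && p[-1] != ' ')
			continue; /* e.g. "xv3d.timeout=": not our option */

		char *end;
		long ms = strtol(p + key_len, &end, 10);
		bool ok = end != p + key_len &&
			  (*end == '\0' || *end == ' ') && ms >= 0;
		return ok ? ms : V3D_DEFAULT_TIMEOUT_MS;
	}
	return V3D_DEFAULT_TIMEOUT_MS;
}

/* 0 means "no timeout" per the parameter description. */
static bool job_would_time_out(long timeout_ms, long job_ms)
{
	return timeout_ms != 0 && job_ms > timeout_ms;
}
```

For instance, a job taking roughly 2 × 0.55 s = 1.1 s (the sgemm.py run with P = 2048 mentioned earlier) would trip the 500 ms default, but not a v3d.timeout=10000 setting.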

@6by9
Contributor

6by9 commented Dec 29, 2020

The patches are blocked waiting for Reviewed-by responses.

Eric has moved on to pastures new, although is still listed as the supporter for V3D.
We probably ought to have a discussion at Pi Towers as to whether we ask Igalia to formally take over maintainership of V3D, seeing as they do most of the work on it for us.

Kernel patch reviewing generally requires some knowledge of the system, and certainly in the 3D land that's not me.

@wimrijnders

Hi there, the issue that this commit is attempting to fix is hurting my project. I would love to see it merged and fixed.

May I kindly enquire when it will be released?

@wimrijnders

> Kernel patch reviewing generally requires some knowledge of the system, and certainly in the 3D land that's not me.

I've been knee-deep in v3d code for the past few months, and in the kernel code associated with it. I've actually been over the changes in the fixes and found them to be good.
Would that qualify me as being knowledgeable enough? (disclaimer: afflicted by Impostor syndrome).

@Terminus-IMRC
Author

@wimrijnders I'd be happy if you replied to my patch mail with a Tested-by or Reviewed-by tag along with a description of your tests (see https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes ). I don't know who is responsible for appointing the maintainer.

@6by9
Contributor

6by9 commented Jan 13, 2021

@wimrijnders Anyone can send a Tested-by.
Terminus-IMRC has linked to the implications of sending a Reviewed-by. Being involved in the kernel development process to get your "face" known in the kernel community is the way to build reputation so the maintainer knows to trust your R-B replies.

Looking at "drm/v3d: Add job timeout module param", whilst module parameters aren't nice, the patch is fine.
Although

MODULE_PARM_DESC(timeout,
	"Timeout for a job in ms (0 means infinity and default is 500 ms)");

would presumably get checkpatch complaining about the indentation not matching the opening parenthesis.

The other patch does require some knowledge of v3d. If you feel you have sufficient knowledge then feel free to send a Reviewed-by. Terminus-IMRC has done a good job of describing the issue and solution in the commit text, and it does appear to do what it says.
I'll ping the Igalia guys for their review too.

@Terminus-IMRC
Author

Thank you for the review! I'll try to move the timeout module parameter to debugfs.

@6by9
Contributor

6by9 commented Jan 13, 2021

> Thank you for the review! I'll try to move the timeout module parameter to debugfs.

Don't bother - debugfs is overkill for this. I'm happy with it being a module parameter, although perhaps the default just needs to be increased.
It was only that one indent that looked odd, although GitHub can sometimes mis-format things.

@wimrijnders

@Terminus-IMRC Now there's a handle I've encountered often in my search for information on v3d. I see that the py-videocore6 project is yours; I've been over it frequently. Thank you for your work.

Since my project depends on this bug being fixed, I'll put in the time for a proper test. I'll read up and then see what I can do.

@wimrijnders

OK, I said I would look into confirming the fix, and I'm gearing up for it.
Would you mind giving me suggestions as to how to do it?

This is the approach that looks most logical to me:

  • Download the Linux kernel source code on a Pi 4 and compile it
  • Run this kernel and confirm that the bug happens
  • Patch the solution from this PR into the kernel code and compile it
  • Run this kernel and confirm that the bug is fixed

Is this the proper and accepted way of doing this? This looks like it will take a long time. But, whatever it takes.

@txenoo
Contributor

txenoo commented Feb 4, 2021

@Terminus-IMRC I've devoted some time to reviewing the two patches currently included in this PR. I still have to review "drm/v3d: Correctly restart the timer when progress is made", but I will try to check it in the following days.

@Terminus-IMRC
Author

@wimrijnders Yes, please go ahead.

@txenoo Thank you for reviewing! Please feel free to point out my mistakes, as I don't have much experience in making patches.

@marcusnchow

Is there a plan for this to be merged?
