PCIe-port not working on RK3399 #116
I'm having exactly the same issue with 3 of the 4 cards I've plugged into my RockPro64. The only card that functions as expected is an Intel I350-T4; the three Mellanox cards I've attempted to use all cause PCIe initialization to fail. I'll get a serial cable out later and dump a full boot log with the 4.4 kernel and mainline. |
Bootlogs below. Kernel 4.4.132-1075 (ayufan 0.7.9): Kernel 4.18.0-rc8-1060 (ayufan): |
Having the same issue, cross ref: |
Can we please have some updates on this issue? |
Anyone? I'm about ready to sell my RockPro64 and just use x86_64 for my project. I'm 100% willing to supply any debug information you might need, and I've got about a dozen different PCIe network cards here I can test with. |
commit 4ea7701 upstream. When running kill(72057458746458112, 0) in userspace I hit the following issue. UBSAN: Undefined behaviour in kernel/signal.c:1462:11 negation of -2147483648 cannot be represented in type 'int': CPU: 226 PID: 9849 Comm: test Tainted: G B ---- ------- 3.10.0-327.53.58.70.x86_64_ubsan+ rockchip-linux#116 Hardware name: Huawei Technologies Co., Ltd. RH8100 V3/BC61PBIA, BIOS BLHSV028 11/11/2014 Call Trace: dump_stack+0x19/0x1b ubsan_epilogue+0xd/0x50 __ubsan_handle_negate_overflow+0x109/0x14e SYSC_kill+0x43e/0x4d0 SyS_kill+0xe/0x10 system_call_fastpath+0x16/0x1b Add code to avoid the UBSAN detection. [akpm@linux-foundation.org: tweak comment] Link: http://lkml.kernel.org/r/1496670008-59084-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhongjiang <zhongjiang@huawei.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I'm running into similar issues with an LSI HBA card. It works without issue in a standard x86 motherboard. On the RockPro64, gen1 training times out when the card is inserted and nothing shows up in lspci. If I plug in a USB3 host card it seems to work fine, so the PCIe slot itself is fine (gen2.1 rockpro64 board).

I built 4.4.154-1124-rockchip-ayufan with PCI_DEBUG enabled and captured this dmesg output with the LSI card installed:

Now, this card is an 8x card in a 4x slot, so as an experiment I ran it through a 1x PCIe mining adapter. It works just fine in an x86 motherboard in this config. When I use this with the rockpro64 I get a new error:

To try something different I rebuilt the kernel, extending the gen1 PCIe training timeout from 500ms to 5s (drivers/pci/host/pcie-rockchip.c line 619). The board boots normally without the LSI card, just giving the usual gen1 timeout message. If I boot it with the LSI card installed directly (no 1x adapter), I now get the error again:

So, perhaps with the default timeout it is timing out too quickly, or taking too long to train without the 1x adapter, which prevents it from getting as far as the error. For some reason it trains faster with the 1x adapter.

Apologies if this is an unrelated issue; if so I'm happy to create a new one. I'm happy to test anything at this point. |
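Editor's illustration of the timeout experiment described above: a minimal Python sketch of a generic poll-until-ready loop with a deadline. It is not the code in pcie-rockchip.c; `wait_for_link`, `FakeLink`, and the 2-second training time are hypothetical names and values chosen only to show why a device that trains slowly is missed with a 500ms timeout but reached with a 5s one.

```python
import time
from typing import Callable


def wait_for_link(link_is_up: Callable[[], bool],
                  timeout_s: float,
                  poll_interval_s: float = 0.1,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll link_is_up() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if link_is_up():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeLink:
    """Hypothetical device whose link comes up train_time_s after time zero.

    It carries its own simulated clock so the example runs instantly.
    """

    def __init__(self, train_time_s: float) -> None:
        self.now = 0.0
        self.train_time_s = train_time_s

    def clock(self) -> float:
        return self.now

    def sleep(self, dt: float) -> None:
        self.now += dt

    def is_up(self) -> bool:
        return self.now >= self.train_time_s


def trained_within(train_time_s: float, timeout_s: float) -> bool:
    """Return True if a FakeLink needing train_time_s trains before timeout_s."""
    link = FakeLink(train_time_s)
    return wait_for_link(link.is_up, timeout_s, 0.1, link.clock, link.sleep)


if __name__ == "__main__":
    # Assumed 2 s training time, purely for illustration.
    print("500 ms timeout:", trained_within(2.0, 0.5))
    print("5 s timeout:   ", trained_within(2.0, 5.0))
```

With a longer deadline the loop simply keeps polling, so any failure that only occurs after training completes becomes visible, which matches the behaviour reported in the comment.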
Any news about this? I'm having the same problem with an LSI 9201 card. If it's of any help, here are a few logs from my Rockpro64. With 4.4 kernel (ayufan's), serial console log (3 crashes): With 4.20-rc6 (ayufan's + patch to disable mmc command queueing), serial console log (3 crashes) and dmesg from last attempt: Edit: Like @rich0 above I tested the card in an x86 setup (Ubuntu 18.04, 64bit). There the card works in both PCIe3 16x and 1x slots and |
I am having the same issue with a Delock PCI Express Card > Mini PCIe adapter connected to a Telit LM960 LTE module: https://pastebin.com/FHMGRgVG |
Has there been any motion on this at all? I'm sitting on what's effectively a useless board for my application (10GbE routing) at the moment since I can't bring up any of the PCIe NICs I've tested (I've tried around 10 different NICs at this point.) The only NICs I've managed to use successfully are Intel i350 and similar boards. |
I haven't debugged why my rockpro64 does not boot with an LSI 3Ware 9650, but I bet I have the same issue that others have reported above. It would be nice to know where we can start to solve this problem. |
Updating my status: Looks like I got my LSI 9201 working at least with one SSD drive. For some reason PCIe driver seems to need some delay between training and bus scanning. I built a test kernel with this workaround on top of ayufan's latest 4.4: https://github.com/nuumio/linux-kernel/releases/tag/nuumio-4.4-pcie-scan-sleep-02 The most relevant change is: nuumio@5a65b17 (in branch: https://github.com/nuumio/linux-kernel/commits/nuumio-4.4-pcie-scan-sleep). Last time I tried this with a bit older kernel I got the controller up but it kept resetting the connection to SSD every few seconds. Now with more patches it seems somewhat stable. I have no idea about the root cause but hopefully this gives ideas where the actual problem is. Curiously the delay needed for this is about the same that was needed earlier for deferring SDIO initialization to get WiFi/BT module and PCIe working at the same time for Rockpro64 (that was finally done so that SDIO driver waits until PCIe is finished). My current setup:
4.4-development branch seems quite active currently. I hope you get this one resolved too :) |
Just to comment for the record and the benefit of the many others with this issue, nuumio's patch (which seems to be in line to be released on ayufan) fixes my issue. You just need to set a command line parameter to enable the delay (I haven't worked out the minimum required delay yet). I was also having power issues which were solved by a 1x mining adapter. Using a 5A power supply is likely to address that problem though nobody has tested the whole thing under heavy load yet. I'll be doing actual testing of the drives/etc but for now I can get the HBA to show up in lspci. ayufan also enabled LSI HBAs in his kernels. |
I bet the riser is shorting PCIe pin A1 to B17 to signal "presence" as a 1x card. See https://imgur.com/a/AJB71Ih Shorting pin A1 (PRSNT1) to the second presence pin B31 (PRSNT2) would make a PCIe card detect as a 4x.* (The presence pins are a little bit shorter.) https://electronics.stackexchange.com/questions/201437/pcie-prsnt-signal-connection Explanation: PCIe cards short these presence pins to indicate the number of PCIe lanes / bus width the connection will be using. (Note: I don't have a RockPro64 to test this on yet, but the same thing would happen on an Intel chipset without these pins jumped. Here is an example of shorting the pin on the riser https://imgur.com/a/4rl7T5I taken from my Plex server...) |
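Editor's illustration of the presence-pin explanation above, as a small Python lookup. The B17 (1x) and B31 (4x) entries come from the comment; the B48 (8x) and B81 (16x) entries are added from the standard PCIe edge-connector pinout and are not mentioned in the thread. The function name `signalled_width` is hypothetical.

```python
# PRSNT2# pin positions on a PCIe edge connector, mapped to the lane width
# a card signals when that pin is shorted to PRSNT1# (pin A1).
# B17 and B31 are from the comment; B48 and B81 are from the standard pinout.
PRSNT2_PIN_TO_WIDTH = {
    "B17": 1,
    "B31": 4,
    "B48": 8,
    "B81": 16,
}


def signalled_width(shorted_pin: str) -> int:
    """Return the lane width signalled by shorting A1 to the given PRSNT2# pin."""
    pin = shorted_pin.strip().upper()
    if pin not in PRSNT2_PIN_TO_WIDTH:
        raise ValueError(f"{shorted_pin!r} is not a PRSNT2# pin")
    return PRSNT2_PIN_TO_WIDTH[pin]


if __name__ == "__main__":
    for pin in ("B17", "B31"):
        print(f"A1 shorted to {pin}: card presents as x{signalled_width(pin)}")
```

This only models the presence signalling; the lane width actually negotiated is decided during link training.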
The card works fine on a powered 16x riser cable as well, like this one: The only downside to this powered riser is that it seems to drive power back into the rockpro64 such that it remains powered on even after disconnected from the power supply. I don't generally run it this way as I am not certain that drawing current in this way isn't harmful. So, aside from the likely power issue, the current ayufan kernels address my issues. |
I have similar back-powering issues just using a USB UART; I've found I
have to disconnect *everything* from the board when powering it down.
I rarely shut mine off so it hasn't been much of a problem but if I were
shutting off regularly I'd put everything powered on one power strip with a
switch and just use that.
|
I am seeing the same with a Rock Pi 4B.
lspci returns nothing, then other times I will boot and
It seems completely random: sometimes I hit runs of it working, sometimes runs of it not working. [Edit] I think the bridge has died on me, as now, with or without, I cannot get any listing over multiple tries. |
I see the same issue with the following setup:
|
I see similar issues on a Rock Pi 4 running kernel 5.6.7: PCIe fails to come up (error -110) and hence the NVMe M.2 drive is not detected. The issue is intermittent, as the NVMe drive is detected in Linux about 5% of the time. If I use the u-boot provided by Radxa (rather than mainline with Rockchip patches) this becomes 100%, so I suspect some form of PCIe initialisation is happening in that u-boot that resolves the issue, but I am not sure. I would still like to see my NVMe working with mainline kernel and mainline u-boot. |
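Editor's note on the "error -110" above: on Linux, errno 110 is ETIMEDOUT, which fits a link that never finishes training. The Python sketch below pulls a negative error code out of a log line and names it. The sample line and the `decode_error` helper are hypothetical, not taken from the logs in this thread, and the table hardcodes a few Linux errno values so the result does not depend on the host OS.

```python
import re
from typing import Optional

# A few Linux errno values, hardcoded so the lookup is the same on any host.
# 517 is the kernel-internal EPROBE_DEFER, which also shows up in probe logs.
LINUX_ERRNO = {
    5: "EIO",
    12: "ENOMEM",
    16: "EBUSY",
    19: "ENODEV",
    22: "EINVAL",
    110: "ETIMEDOUT",
    517: "EPROBE_DEFER",
}

ERROR_RE = re.compile(r"error\s+-(\d+)")


def decode_error(line: str) -> Optional[str]:
    """Return the errno name for an 'error -N' found in a log line, if any."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return LINUX_ERRNO.get(code, f"unknown errno {code}")


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "example-driver: probe failed with error -110"
    print(decode_error(sample))
```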
Here too a boot freeze when a PCIe adapter is present (PCIe x1 to Mini PCIe adapter with Coral Edge TPU).
|
Here's my output from a rockpro64 with a Compex WLE1216VX attached to the PCIE slot root@FarmBox:~# dmesg | grep pci |
@clarkis117 |
@StuartIanNaylor it is a mini PCIe card, which I have in a mini PCIe to PCIe adapter. The rockpro64 has a 4x PCIe card slot on its board. It may be a power design issue, as I was able to use an Intel wifi adapter in the same setup, and the Compex card works in an x86 PC with the same adapter. The Compex card has a TDP greater than 10 watts. |
One thing I found while testing is that enabling CONFIG_DEBUG_SHIRQ exposes some issues in the driver. Some details here: https://patchwork.kernel.org/project/linux-rockchip/patch/1502353273-123788-1-git-send-email-shawn.lin@rock-chips.com/ |
[ Upstream commit 8cbcc5e ] Handle destruction of rules with port destination type to enable full destruction of flow. Without this handling of TX rules the deletion of these rules fails. Dmesg of flow destruction failure: [ 203.714146] mlx5_core 0000:00:0b.0: mlx5_cmd_check:753:(pid 342): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a) [ 210.547387] ------------[ cut here ]------------ [ 210.548663] refcount_t: decrement hit 0; leaking memory. [ 210.550651] WARNING: CPU: 4 PID: 342 at lib/refcount.c:31 refcount_warn_saturate+0x5c/0x110 [ 210.550654] Modules linked in: mlx5_ib mlx5_core ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm ib_umad ib_uverbs ib_core [ 210.550675] CPU: 4 PID: 342 Comm: test Not tainted 5.8.0-rc2+ rockchip-linux#116 [ 210.550678] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 210.550680] RIP: 0010:refcount_warn_saturate+0x5c/0x110 [ 210.550685] Code: c6 d1 1b 01 00 0f 84 ad 00 00 00 5b 5d c3 80 3d b5 d1 1b 01 00 75 f4 48 c7 c7 20 d1 15 82 c6 05 a5 d1 1b 01 01 e8 a7 eb af ff <0f> 0b eb dd 80 3d 99 d1 1b 01 00 75 d4 48 c7 c7 c0 cf 15 82 c6 05 [ 210.550687] RSP: 0018:ffff8881642e77e8 EFLAGS: 00010282 [ 210.550691] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000 [ 210.550694] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102c85ceef [ 210.550696] RBP: ffff888161720428 R08: ffffffff8124c10e R09: ffffed103243beae [ 210.550698] R10: ffff8881921df56b R11: ffffed103243bead R12: ffff8881841b4180 [ 210.550701] R13: ffff888161720428 R14: ffff8881616d0000 R15: ffff888161720380 [ 210.550704] FS: 00007fc27f025740(0000) GS:ffff888192000000(0000) knlGS:0000000000000000 [ 210.550706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 210.550708] CR2: 0000557e4b41a6a0 CR3: 0000000002415004 CR4: 0000000000360ea0 [ 210.550711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 210.550713] DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400 [ 210.550715] Call Trace: [ 210.550717] mlx5_del_flow_rules+0x484/0x490 [mlx5_core] [ 210.550720] ? mlx5_cmd_set_fte+0xa80/0xa80 [mlx5_core] [ 210.550722] mlx5_ib_destroy_flow+0x17f/0x280 [mlx5_ib] [ 210.550724] uverbs_free_flow+0x4c/0x90 [ib_uverbs] [ 210.550726] destroy_hw_idr_uobject+0x41/0xb0 [ib_uverbs] [ 210.550728] uverbs_destroy_uobject+0xaa/0x390 [ib_uverbs] [ 210.550731] __uverbs_cleanup_ufile+0x129/0x1b0 [ib_uverbs] [ 210.550733] ? uverbs_destroy_uobject+0x390/0x390 [ib_uverbs] [ 210.550735] uverbs_destroy_ufile_hw+0x78/0x190 [ib_uverbs] [ 210.550737] ib_uverbs_close+0x36/0x140 [ib_uverbs] [ 210.550739] __fput+0x181/0x380 [ 210.550741] task_work_run+0x88/0xd0 [ 210.550743] do_exit+0x5f6/0x13b0 [ 210.550745] ? sched_clock_cpu+0x30/0x140 [ 210.550747] ? is_current_pgrp_orphaned+0x70/0x70 [ 210.550750] ? lock_downgrade+0x360/0x360 [ 210.550752] ? mark_held_locks+0x1d/0x90 [ 210.550754] do_group_exit+0x8a/0x140 [ 210.550756] get_signal+0x20a/0xf50 [ 210.550758] do_signal+0x8c/0xbe0 [ 210.550760] ? hrtimer_nanosleep+0x1d8/0x200 [ 210.550762] ? nanosleep_copyout+0x50/0x50 [ 210.550764] ? restore_sigcontext+0x320/0x320 [ 210.550766] ? __hrtimer_init+0xf0/0xf0 [ 210.550768] ? timespec64_add_safe+0x150/0x150 [ 210.550770] ? mark_held_locks+0x1d/0x90 [ 210.550772] ? lockdep_hardirqs_on_prepare+0x14c/0x240 [ 210.550774] __prepare_exit_to_usermode+0x119/0x170 [ 210.550776] do_syscall_64+0x65/0x300 [ 210.550778] ? trace_hardirqs_off+0x10/0x120 [ 210.550781] ? mark_held_locks+0x1d/0x90 [ 210.550783] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 210.550785] ? lockdep_hardirqs_on+0x112/0x190 [ 210.550787] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 210.550789] RIP: 0033:0x7fc27f1cd157 [ 210.550791] Code: Bad RIP value. 
[ 210.550793] RSP: 002b:00007ffd4db27ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023 [ 210.550798] RAX: fffffffffffffdfc RBX: ffffffffffffff80 RCX: 00007fc27f1cd157 [ 210.550800] RDX: 00007fc27f025740 RSI: 00007ffd4db27eb0 RDI: 00007ffd4db27eb0 [ 210.550803] RBP: 0000000000000016 R08: 0000000000000000 R09: 000000000000000e [ 210.550805] R10: 00007ffd4db27dc7 R11: 0000000000000246 R12: 0000000000400c00 [ 210.550808] R13: 00007ffd4db285f0 R14: 0000000000000000 R15: 0000000000000000 [ 210.550809] irq event stamp: 49399 [ 210.550812] hardirqs last enabled at (49399): [<ffffffff81172d36>] console_unlock+0x556/0x6f0 [ 210.550815] hardirqs last disabled at (49398): [<ffffffff81172897>] console_unlock+0xb7/0x6f0 [ 210.550818] softirqs last enabled at (48706): [<ffffffff81e0037b>] __do_softirq+0x37b/0x60c [ 210.550820] softirqs last disabled at (48697): [<ffffffff81c00e2f>] asm_call_on_stack+0xf/0x20 [ 210.550822] ---[ end trace ad18c0e6fa846454 ]--- [ 210.581862] mlx5_core 0000:00:0c.0: mlx5_destroy_flow_table:2132:(pid 342): Flow table 262150 wasn't destroyed, refcount > 1 Fixes: a7ee18b ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hello. Have you solved the problem with the "Coral Edge TPU"? |
I've had reports the Coral Edge TPU does work on Fedora. Note the Edge TPU PCIe driver which was in staging upstream has now been dropped from the upstream kernel so the testing was by someone that built their own kernel to bring those drivers back. |
@nullr0ute |
https://gitlab.manjaro.org/manjaro-arm/packages/core/linux/-/issues/34 |
…er invert" This reverts commit 334791b. Reason for revert: The following warning appears on rk3588-evb1-lp4-v10 when suspend: [ 31.636037][ T414] unbalanced disables for vcc3v3_lcd0_n [ 31.636166][ T414] WARNING: CPU: 2 PID: 414 at drivers/regulator/core.c:2768 _regulator_disable+0x2e8/0x2f4 [ 31.636191][ T414] Modules linked in: bcmdhd dhd_static_buf [ 31.636256][ T414] CPU: 2 PID: 414 Comm: composer@2.1-se Not tainted 5.10.110 rockchip-linux#116 [ 31.636279][ T414] Hardware name: Rockchip RK3588 EVB1 LP4 V10 Board (DT) [ 31.636309][ T414] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--) [ 31.636338][ T414] pc : _regulator_disable+0x2e8/0x2f4 [ 31.636366][ T414] lr : _regulator_disable+0x2e8/0x2f4 ... [ 31.636950][ T414] Call trace: [ 31.636980][ T414] _regulator_disable+0x2e8/0x2f4 [ 31.637009][ T414] regulator_disable+0x40/0x84 [ 31.637036][ T414] panel_simple_unprepare+0x78/0xa4 [ 31.637064][ T414] drm_panel_unprepare+0x28/0x48 [ 31.637094][ T414] dw_mipi_dsi2_encoder_disable+0x70/0xbc [ 31.637123][ T414] drm_atomic_helper_commit_modeset_disables+0x174/0x4d0 [ 31.637154][ T414] rockchip_drm_atomic_helper_commit_tail_rpm+0x44/0x184 [ 31.637180][ T414] commit_tail+0x110/0x200 [ 31.637209][ T414] drm_atomic_helper_commit+0x1f0/0x210 [ 31.637238][ T414] drm_atomic_commit+0x50/0x64 [ 31.637268][ T414] drm_mode_atomic_ioctl+0x620/0x744 [ 31.637298][ T414] drm_ioctl+0x24c/0x3b8 [ 31.637328][ T414] __arm64_sys_ioctl+0x94/0xd0 [ 31.637359][ T414] el0_svc_common+0xc0/0x23c [ 31.637388][ T414] do_el0_svc+0x28/0x88 [ 31.637417][ T414] el0_svc+0x14/0x24 [ 31.637446][ T414] el0_sync_handler+0x88/0xec [ 31.637474][ T414] el0_sync+0x1a8/0x1c0 Signed-off-by: Tao Huang <huangtao@rock-chips.com> Change-Id: Id27946e0ef3a6c320214c961b8e9b02978a15f6b
[ Upstream commit fb6df43 ] Lockdep reports that acpi_nfit_shutdown() may deadlock against an opportune acpi_nfit_scrub(). acpi_nfit_scrub () is run from inside a 'work' and therefore has already acquired workqueue-internal locks. It also acquiires acpi_desc->init_mutex. acpi_nfit_shutdown() first acquires init_mutex, and was subsequently attempting to cancel any pending workqueue items. This reversed locking order causes a potential deadlock: ====================================================== WARNING: possible circular locking dependency detected 6.2.0-rc3 rockchip-linux#116 Tainted: G O N ------------------------------------------------------ libndctl/1958 is trying to acquire lock: ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450 but task is already holding lock: ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit] which lock already depends on the new lock. ... Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&acpi_desc->init_mutex); lock((work_completion)(&(&acpi_desc->dwork)->work)); lock(&acpi_desc->init_mutex); lock((work_completion)(&(&acpi_desc->dwork)->work)); *** DEADLOCK *** Since the workqueue manipulation is protected by its own internal locking, the cancellation of pending work doesn't need to be done under acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the init_mutex to fix the deadlock. Any work that starts after acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the cancel_delayed_work_sync() will safely flush it out. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20230112-acpi_nfit_lockdep-v1-1-660be4dd10be@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have a RockPro64 board here and tried to get the pcie port working.
Depending on the card that was inserted, the port was either simply disabled, or the kernel panicked during boot.
Dmesg when no card is in the slot (4.4.138-1094):
With cards like the Dell PowerEdge PERC 5i SAS RAID controller, the kernel occasionally boots, ignoring the card and giving the dmesg output above, but most of the time it crashes with one of a couple of different stack traces.
This is a stack trace with this card using a mainline kernel. Unlike the 4.4 kernel, mainline continues booting, so we were able to copy it out: