-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aspeed_smc: Calculate checksum on normal DMA #15
Conversation
This patch adds the missing checksum calculation on normal DMA transfer. According to the datasheet this is how the SMC should behave. Verified on AST1250 that the hardware matches the behaviour. Signed-off-by: Christian Svensson <bluecmd@google.com>
Hi @bluecmd - have you sent this upstream? Ideally all of our patches would be at least on the upstream list(s), similar to how we handle kernel development. If you Cc the OpenBMC mailing list we can pick it up from there, as well as follow the conversation. Please also Cc @legoater and myself directly. |
I have no idea how to submit kernel patches like these. If you have any documentation or guide I can follow then I can do that :-) |
There's a description here of how to send qemu patches: https://github.com/qemu/qemu/blob/master/README#L52 |
Thanks, this is correct indeed. I recently did some changes in my patchset 034ee5d#diff-1c5ed2b17fd2d1367fc3dd9db76216eeR722 I prefer to work with email patches, as this is how most open-source |
@legoater I fully understand, no hard feelings. My goal was to raise awareness of this patch and have it merged to mainline one way or another - worst case I figured people would just redirect me to where to post it. Mission accomplished :-).
I don't fully understand what you mean here. I did look at that function, that's how I figured what was missing from the _rw one. The patch you're referring to is on the same branch as this fix. Btw, if your statement from your original post ("I will fix it directly in |
please send you patch by email on the openbmc mailing list. I will comment there. |
Patch sent |
Let start from the beginning: Commit b9e413d (in 2.9) "block: explicitly acquire aiocontext in aio callbacks that need it" added pairs of aio_context_acquire/release to mirror_write_complete and mirror_read_complete, when they were aio callbacks for blk_aio_* calls. Then, commit 2e1990b (in 3.0) "block/mirror: Convert to coroutines" dropped these blk_aio_* calls, than mirror_write_complete and mirror_read_complete are not callbacks more, and don't need additional aiocontext acquiring. Furthermore, mirror_read_complete calls blk_co_pwritev inside these pair of aio_context_acquire/release, which leads to the following dead-lock with mirror: (gdb) info thr Id Target Id Frame 3 Thread (LWP 145412) "qemu-system-x86" syscall () 2 Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait () * 1 Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait () (gdb) bt #0 __lll_lock_wait () #1 _L_lock_812 () #2 __GI___pthread_mutex_lock #3 qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at util/qemu-thread-posix.c:66 #4 qemu_mutex_lock_iothread_impl #5 os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236 #6 main_loop_wait (nonblocking=0) at util/main-loop.c:497 #7 main_loop () at vl.c:1892 #8 main Printing contents of qemu_global_mutex, I see that "__owner = 145416", so, thr1 is main loop, and now it wants BQL, which is owned by thr2. (gdb) thr 2 (gdb) bt #0 __lll_lock_wait () #1 _L_lock_870 () #2 __GI___pthread_mutex_lock #3 qemu_mutex_lock_impl (mutex=0x561034d25dc0, ... #4 aio_context_acquire (ctx=0x561034d25d60) #5 dma_blk_cb #6 dma_blk_io #7 dma_blk_read #8 ide_dma_cb #9 bmdma_cmd_writeb #10 bmdma_write #11 memory_region_write_accessor #12 access_with_adjusted_size #15 flatview_write #16 address_space_write #17 address_space_rw #18 kvm_handle_io #19 kvm_cpu_exec #20 qemu_kvm_cpu_thread_fn #21 qemu_thread_start #22 start_thread #23 clone () Printing mutex in fr 2, I see "__owner = 145411", so thr2 wants aio context mutex, which is owned by thr1. Classic dead-lock. Then, let's check that aio context is hold by mirror coroutine: just print coroutine stack of first tracked request in mirror job target: (gdb) [...] (gdb) qemu coroutine 0x561035dd0860 #0 qemu_coroutine_switch #1 qemu_coroutine_yield #2 qemu_co_mutex_lock_slowpath #3 qemu_co_mutex_lock #4 qcow2_co_pwritev #5 bdrv_driver_pwritev #6 bdrv_aligned_pwritev #7 bdrv_co_pwritev #8 blk_co_pwritev #9 mirror_read_complete () at block/mirror.c:232 #10 mirror_co_read () at block/mirror.c:370 #11 coroutine_trampoline #12 __start_context Yes it is mirror_read_complete calling blk_co_pwritev after acquiring aio context. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow overlapping request by removing the assert that made it impossible. There are only two callers: 1. block_copy_task_create() It already asserts the very same condition before calling reqlist_init_req(). 2. cbw_snapshot_read_lock() There is no need to have read requests be non-overlapping in copy-before-write when used for snapshot-access. In fact, there was no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges and this could lead to an assertion failure [1]. In particular, with the reproducer script below [0], two cbw_co_snapshot_block_status() callers could race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request. [0]: > #!/bin/bash -e > dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 > ./qemu-img create /tmp/fleecing.raw -f raw 1G > ( > ./qemu-system-x86_64 --qmp stdio \ > --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ > --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ > <<EOF > {"execute": "qmp_capabilities"} > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } } > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } } > {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } } > {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}} > EOF > ) & > sleep 5 > while true; do > ./qemu-nbd -d /dev/nbd0 > ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r > nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' > done [1]: > #5 0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 > #6 0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 > #7 0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 > #8 0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 > #9 0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 > #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 > #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 > #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 > #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 > #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 > #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 > #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 > #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 > #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121 > #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 Cc: qemu-stable@nongnu.org Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240712140716.517911-1-f.ebner@proxmox.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> (cherry picked from commit 6475155) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch adds the missing checksum calculation on normal DMA transfer.
According to the datasheet this is how the SMC should behave.
Verified on AST1250 that the hardware matches the behaviour.
Signed-off-by: Christian Svensson bluecmd@google.com