The examples/riscv has a stuck bug #125

Closed
elliott10 opened this issue Apr 15, 2024 · 7 comments · Fixed by #128

Comments

@elliott10

Running examples/riscv gets stuck.

As shown below:

Screenshot 2024-04-12 161449

@fslongjin
Contributor

Me too. My QEMU version is:

QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.17)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

@fslongjin
Contributor

fslongjin commented Apr 22, 2024

This issue is a bit magical: after adding #[inline(never)] to the can_pop function of VirtQueue in src/queue.rs, the test case runs normally. The same happens if I change this loop:

while !self.can_pop() {
    info!("add_notify_wait_pop: can_pop=false");
    spin_loop();
}

With this added log statement, it also runs normally. I suspect the cause might be a missing volatile read.

I modified the code in VirtQueue and other structs to make these variables volatile, and that solved the bug. I'll open a PR later.
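A minimal sketch of the volatile-read workaround described above, assuming a hypothetical `UsedRingHeader`/`Queue` layout (these are illustrative types, not the actual virtio-drivers structs). `read_volatile` forces a fresh load of the used index on every `can_pop` call, so the compiler cannot hoist the read out of a spin loop. Note that volatile by itself does not order this load relative to other memory accesses.

```rust
use std::ptr;

/// Hypothetical stand-in for the device-written header of the used ring.
/// Field names are illustrative, not the real virtio-drivers layout.
#[repr(C)]
pub struct UsedRingHeader {
    pub flags: u16,
    pub idx: u16,
}

/// Hypothetical queue state: the driver tracks the last used index it consumed.
pub struct Queue {
    pub used: *const UsedRingHeader,
    pub last_used_idx: u16,
}

impl Queue {
    /// Returns true if the device has published entries not yet consumed.
    /// The volatile read is re-executed on every call, so a
    /// `while !q.can_pop() { spin_loop(); }` loop keeps observing memory.
    pub fn can_pop(&self) -> bool {
        // SAFETY: the caller guarantees `used` points to a valid, aligned header.
        let idx = unsafe { ptr::read_volatile(ptr::addr_of!((*self.used).idx)) };
        self.last_used_idx != idx
    }
}
```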

@qwandor
Collaborator

qwandor commented Apr 22, 2024

I don't think volatile access should be necessary for the used ring (e.g. in VirtQueue::can_pop), because it is shared memory not MMIO. My guess would be that we're missing a read barrier somewhere, which is allowing the compiler (or possibly the CPU?) to reorder memory access in a way that gives incorrect values.
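A sketch of what a driver-side read barrier could look like, assuming a hypothetical `UsedRing` model whose fields are atomics (the names and ring size are illustrative, not virtio-drivers code). The used index is loaded, then `fence(Ordering::Acquire)` prevents the subsequent entry read from being reordered before it, by either the compiler or the CPU. On the writer side this pairs with a `Release` store of the index after the entry is written.

```rust
use std::sync::atomic::{fence, AtomicU16, AtomicU32, Ordering};

/// Hypothetical used-ring model: the device writes `ring` entries, then
/// publishes them by advancing `idx`. Names and size are illustrative only.
pub struct UsedRing {
    pub idx: AtomicU16,
    pub ring: [AtomicU32; 4],
}

/// Returns the next used entry if the device has published one.
pub fn pop_used(used: &UsedRing, last_used_idx: &mut u16) -> Option<u32> {
    let idx = used.idx.load(Ordering::Relaxed);
    if idx == *last_used_idx {
        return None;
    }
    // Read barrier: the entry load below cannot be reordered before the
    // index load above, so we never read a slot the device has not filled.
    fence(Ordering::Acquire);
    let slot = (*last_used_idx as usize) % used.ring.len();
    let id = used.ring[slot].load(Ordering::Relaxed);
    *last_used_idx = last_used_idx.wrapping_add(1);
    Some(id)
}
```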

@fslongjin
Contributor

Maybe you are right.

Although the riscv test case in virtio-drivers now runs properly, I have another test case that still does not.

My test case runs in S-mode with Sv39 on RV64. In it, virtio-blk-mmio (legacy) hangs, and virtio-blk-mmio (modern) reports 'not ready' when reading blocks.

However, the virtio-net driver works well (both legacy and modern).

I'll try to investigate what happens.

@qwandor
Collaborator

qwandor commented Apr 22, 2024

After talking to some other people about this, I think the issue is that we should be using atomics for some of these shared memory accesses. I'll send a PR shortly for you to test.
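A minimal sketch of the atomics approach, assuming a hypothetical `Shared` struct standing in for a used-ring element and its index (not the actual change in #128). The driver spins on an `Acquire` load of the index, mirroring the `while !self.can_pop() { spin_loop(); }` loop quoted earlier; unlike a plain read, the atomic load is not hoisted out of the loop, and `Acquire` orders the later element read after it. A thread plays the role of the device, writing the element first and then publishing it with a `Release` increment.

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicU16, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared state: one used-element field plus the used index.
pub struct Shared {
    pub used_idx: AtomicU16,
    pub used_len: AtomicU32,
}

/// Driver side: wait until the "device" advances `used_idx`, then read the element.
pub fn wait_pop(shared: &Shared, last_used_idx: u16) -> u32 {
    while shared.used_idx.load(Ordering::Acquire) == last_used_idx {
        spin_loop();
    }
    shared.used_len.load(Ordering::Relaxed)
}

/// "Device" side: write the element, then publish it with a Release increment.
pub fn complete(shared: &Shared, len: u32) {
    shared.used_len.store(len, Ordering::Relaxed);
    shared.used_idx.fetch_add(1, Ordering::Release);
}

/// Runs a simulated device thread against the waiting driver.
pub fn demo() -> u32 {
    let shared = Arc::new(Shared {
        used_idx: AtomicU16::new(0),
        used_len: AtomicU32::new(0),
    });
    let device = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || complete(&shared, 512))
    };
    let len = wait_pop(&shared, 0);
    device.join().unwrap();
    len
}
```

The Release/Acquire pair guarantees that once the driver sees the new index, it also sees the element written before it, which is the property a missing barrier would break.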

@qwandor
Collaborator

qwandor commented Apr 22, 2024

@fslongjin Can you test if #128 fixes your issues reliably?

@fslongjin
Contributor

Sure, I'll test it tomorrow morning.

Labels
None yet
Projects
None yet

3 participants