Skip to content

RAID1 Disk Setup and Asynchronous Reads#223

Merged
ioeddk merged 20 commits into
mainfrom
yingqi/raid1_merge/setting_up_raid_qemu
May 16, 2026
Merged

RAID1 Disk Setup and Asynchronous Reads#223
ioeddk merged 20 commits into
mainfrom
yingqi/raid1_merge/setting_up_raid_qemu

Conversation

@ioeddk
Copy link
Copy Markdown
Contributor

@ioeddk ioeddk commented May 5, 2026

As one of the intermediate PR of #170, merging the following commits:

  1. Updated the QEMU start-up options to use three physical partitions as RAID1 member devices, rather than disk images.
  2. Updated the RAID1 process_read to perform asynchronous reads.
  3. Resolve the long tail latency of the RAID device by creating the RAID1 thread with the real-time scheduler.
  4. Changed the OQueue used in the VirtIO module to the new OQueue.

@ioeddk ioeddk requested a review from a team as a code owner May 5, 2026 00:10
Copy link
Copy Markdown
Contributor

@arthurp arthurp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits that should be trivial to fix.

The complex bit is creating and configuring the block devices. See comments below.

Comment thread kernel/comps/virtio/src/device/block/device.rs Outdated
Comment thread kernel/comps/raid/src/lib.rs Outdated
Comment thread kernel/comps/virtio/src/device/block/device.rs
Comment thread kernel/comps/virtio/src/device/mod.rs Outdated
Comment thread test/apps/test_common.mk
Comment thread test/Makefile
Comment thread tools/qemu_args.sh
Comment thread OSDK.toml
Comment on lines -65 to +67
-drive if=none,format=raw,id=r0,file=./test/build/raid1_0.img \
-drive if=none,format=raw,id=r1,file=./test/build/raid1_1.img \
-drive if=none,format=raw,id=r0,file=/dev/nvme0n1p1 \
-drive if=none,format=raw,id=r1,file=/dev/nvme1n1p1 \
-drive if=none,format=raw,id=r2,file=/dev/nvme2n1p1 \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to handle this given the discussion of multiple configurations above. Do you know off hand when this file is actually used? I think, most of the time, it's actually ignored. If we can understand when this file is actually used we can potentially remove the raid devices entirely and just document that raid isn't configured in the specific configuration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configurations here is configuring QEMU if emulating Asterinas on RISC-V. Just deleting them doesn't affect the CI/CD or development for us (we are defaulting to x86_64). But if there are RISC-V emulation in CI/CD it may fail.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. If the makefile creates image files by default, it's probably best to just point this at those files.

Copy link
Copy Markdown
Contributor Author

@ioeddk ioeddk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed minor changes. Will change qemu_args.sh and optimize snafu error handling ASAP.

Comment thread test/Makefile
Comment thread tools/qemu_args.sh
Comment thread OSDK.toml
Comment on lines -65 to +67
-drive if=none,format=raw,id=r0,file=./test/build/raid1_0.img \
-drive if=none,format=raw,id=r1,file=./test/build/raid1_1.img \
-drive if=none,format=raw,id=r0,file=/dev/nvme0n1p1 \
-drive if=none,format=raw,id=r1,file=/dev/nvme1n1p1 \
-drive if=none,format=raw,id=r2,file=/dev/nvme2n1p1 \
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configurations here is configuring QEMU if emulating Asterinas on RISC-V. Just deleting them doesn't affect the CI/CD or development for us (we are defaulting to x86_64). But if there are RISC-V emulation in CI/CD it may fail.

Comment thread test/apps/test_common.mk
Comment thread kernel/comps/virtio/src/device/block/device.rs Outdated
Comment thread kernel/comps/raid/src/lib.rs Outdated
Comment thread tools/qemu_args.sh Outdated
Comment thread tools/qemu_args.sh
@ioeddk ioeddk force-pushed the yingqi/raid1_merge/setting_up_raid_qemu branch from 508dd72 to 6819de0 Compare May 14, 2026 02:26
Copy link
Copy Markdown
Contributor

@arthurp arthurp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to nit pick so much. But there are just a lot of things that feel like they haven't been checked or that I think we have discussed before but you haven't addressed. Please read the entire diff yourself again with all previous comments in mind. Let's chat a little about this when we meet today.

Since we have reached a point where nothing is a major issue I can't fix in an already planned next PR, go ahead and do your full pass and then merge.

Comment thread tools/qemu_args.sh Outdated
# Configure RAID drive sources. Set RAID_DEVICES to a comma-separated list of
# exactly three existing block devices (e.g.
# RAID_DEVICES=/dev/nvme0n1p1,/dev/nvme1n1p1,/dev/nvme2n1p1) to pass them directly to the guest.
# If unset or any device path is invalid, 1GB ext2 image files are used instead.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: In general, I want something to fail if there is invalid data (as opposed to missing), not use a default. This is because the user probably meant to specify something, but did it wrong. If they wanted the default, they wouldn't have specified anything at all. This is something AI does wrong constantly, I don't know if you used it here, but it's worth noting.

Comment thread tools/qemu_args.sh
cp "$RAID_DEV_0" "$RAID_IMG"
fi
done
RAID_CACHE="writeback"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the cache mode different? I'm not saying it's wrong, I'm just curious.

Comment thread tools/qemu_args.sh Outdated
Comment on lines +23 to +56
SSH_RAND_PORT=${SSH_PORT:-22}
SSH_RAND_PORT=${SSH_PORT:-61541}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some fixes, like changing this back to 22, are in some later commits in #170. Since this branch is a cherry-picked first few commits from #170, those places appear to be unfixed.

Comment on lines -28 to +37
#[derive(Clone)]
#[derive(Clone, Copy, Default, BinarySerde)]
#[repr(C)]
pub struct BlockDeviceCompletionStats {
/// The latency of the I/O request (time from submission to completion).
pub latency: Duration,
/// The latency of the I/O request in microseconds.
pub latency_us: u64,
/// The number of outstanding requests at completion time.
pub outstanding_requests: usize,
pub outstanding_requests: u64,
/// The index of the device that produced this stat.
pub device_index: u64,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Aren't we dropping these changes? Once I move to CBOR, Duration works and the derive is different. It doesn't really matter I guess, but I'm going to immediately undo these changes. I thought we discussed this, but maybe I'm wrong, or maybe I said not to worry about it. Sorry, if those are the case.

Comment on lines +168 to +172
// Disable IRQs while holding the OQueue's SpinLock inside
// try_strong_observe to prevent deadlock with the IRQ handler
// that produces to the same OQueue (bio completion stats).
let _irq_guard = ostd::trap::irq::disable_local();
o.try_strong_observe()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed after #201 landed?

Comment thread kernel/comps/raid/src/lib.rs Outdated
/// Dispatches a request by type. The RAID-1 device accepts the same BIOs as
/// any `BlockDevice` and applies RAID semantics underneath.
fn process_request(&self, request: BioRequest) {
// log::info!("Raid1Device process request, type: {:?}", request.type_());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented code.

/// Handles the irq issued from the device
fn handle_irq(&self) {
info!("Virtio block device handle irq");
// info!("Virtio block device handle irq");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented code.

Comment thread kernel/src/fs/mod.rs Outdated
Comment on lines +85 to +96
// single disk benchmark
// let nvme_device_name = "raid0";
// if let Ok(block_device_nvme) = start_block_device(nvme_device_name) {
// let nvme_fs = Ext2::open(block_device_nvme).unwrap();
// let target_path = FsPath::try_from("/raid1").unwrap();
// self::rootfs::mount_fs_at(nvme_fs, &target_path).unwrap();
// info!("[kernel] Mounted NVMe fs at {:?} ", target_path);
// } else {
// error!("[kernel] Failed to start NVMe block device '{}'", nvme_device_name);
// }
// return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented code.

@ioeddk
Copy link
Copy Markdown
Contributor Author

ioeddk commented May 16, 2026

All issues addressed.

@ioeddk ioeddk merged commit 13484de into main May 16, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants