Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime-rs: built-in Dragonball sandbox part I - resource and device managers #4265

Merged
merged 15 commits into from
Jun 12, 2022

Conversation

studychao
Copy link
Member

@studychao studychao commented May 16, 2022

This pull request initializes the basic framework for Dragonball Sandbox and introduces the resource and device managers part.

In this PR, we introduce the abstractions for several management for virtual machine managers such as LegacyDeviceManager, AddressSpaceManager, ResourceManager and etc. They are responsible for maintaining different kinds of resources for virtual machines.

This is the first pull request for Dragonball Sandbox and there are approximately 6 PRs covering different parts of VMM in total.

Fixes: #4257

@studychao studychao requested a review from a team as a code owner May 16, 2022 15:46
@studychao
Copy link
Member Author

studychao commented May 16, 2022

Related issue: #4257

@studychao studychao force-pushed the anolis/dragonball-1 branch 4 times, most recently from c859a25 to 57c7151 Compare May 17, 2022 02:49
@studychao studychao force-pushed the anolis/dragonball-1 branch 3 times, most recently from 7caa412 to 5338bfb Compare May 17, 2022 10:00
Copy link
Contributor

@quanweiZhou quanweiZhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

atomic-guest-memory = []
virtio-vsock = ["dbs-virtio-devices/virtio-vsock", "virtio-queue"]

[patch.'crates-io']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unnecessary to use patch for lib.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed at the final version of Dragonball:)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed at the final version of Dragonball:)

const MSI_IRQ_BASE: u32 = LEGACY_IRQ_MAX + 1;

// Linux: KVM_MAX_IRQ_ROUTES is defined as 4096 in linux/include/linux/kvm_host.h
const MSI_IRQ_MAX: u32 = 1023;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment does not match for MSI_IRQ_MAX.

// Linux: KVM_MAX_IRQ_ROUTES is defined as 4096 in linux/include/linux/kvm_host.h
const MSI_IRQ_MAX: u32 = 1023;
// Linux: KVM_USER_MEM_SLOTS is 509 for x86_64 and 512 for aarch64.
const KVM_USER_MEM_SLOTS: u32 = 509;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it need to use #[cfg(target_arch = "x86_64")] and #[cfg(target_arch = "aarch64")] to distinguish KVM_USER_MEM_SLOTS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this value come from? Could you add a reference to the kernel source line that specifies this value?

Copy link
Contributor

@jongwu jongwu Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm curious about it too. I check it in aarch64 and KVM_USER_MEM_SLOTS is 32767. If I'm wrong, pls correct me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thans for your comments!
As Dragonball only supports kernel-4.19 and kernel-5.10 currently, the macro "KVM_USER_MEM_SLOTS" is defined in file linux/arch/arm64/include/asm/kvm_host.h, in which KVM_USER_MEM_SLOTS = 512.
In kernel-5.12 and above, this macro is defined in linux/include/linux/kvm_host.h and KVM_USER_MEM_SLOTS = 32767.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation.
Does that mean we must run kata3.0 on top of specific host kernel version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jongwu :)
For now, we only test Arm Dragonball on these two kernel version(4.19 & 5.10). In the future, we may support higher versions of kernel.

src/dragonball/src/resource_manager.rs Outdated Show resolved Hide resolved

/// Type of the dragonball virtio mmio devices.
#[cfg(feature = "dbs-virtio-devices")]
pub type DbsMmioV2Device = dbs_virtio_devices::mmio::MmioV2Device<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary qualification dbs_virtio_devices::mmio::

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Thanks.

stdin_handle,
sock_listener,
sock_conn: None,
logger: logger.new(slog::o!()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to add a subsystem for the logger?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

}

if should_drop {
let events = Events::with_data_raw(libc::STDIN_FILENO, EPOLL_EVENT_STDIN, EventSet::IN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it need to set self.stdin_handle to None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If self.stdin_handle is set to None, stdin related connection will be lost. We only want to drop the data in stdin channel here.

@studychao studychao force-pushed the anolis/dragonball-1 branch 3 times, most recently from 38fc116 to f3b4a41 Compare May 18, 2022 05:25
src/dragonball/README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/dragonball/README.md Outdated Show resolved Hide resolved
@studychao studychao force-pushed the anolis/dragonball-1 branch 4 times, most recently from 8cea7db to 8681a42 Compare May 20, 2022 02:50
Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @studychao - A few initial comments...

default: build

build:
cargo update -p vm-memory:0.8.0 --precise 0.7.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Buglet: This line is prefixed by spaces, not tabs as required for make.
  • Why is this line required / why can't version 0.8.0 be specified in Cargo.toml?

Copy link
Contributor

@wllenyj wllenyj May 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the crates in dbs-xxx uses version 0.7.0 of vm-memory, but rust-vmm uses 0.8.0. This will be resolved by upgrading dbs-xxx in later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space is changed to tab


test:
@echo "INFO: testing dragonball for development build"
cargo test --all-features -- --test-threads 1 --nocapture
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this limited to a single thread? Do the unit tests need to be fixed (or annotated with #[serial]) maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem has been solved, and the --test-threads 1 will be deleted in the following patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already deleted.

@@ -0,0 +1,32 @@
# Introduction
`Dragonball Sandbox` is a light-weight virtual machine manager(VMM) based on Linux Kernel-based Virtual Machine(KVM),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a single space before brackets:

Suggested change
`Dragonball Sandbox` is a light-weight virtual machine manager(VMM) based on Linux Kernel-based Virtual Machine(KVM),
`Dragonball Sandbox` is a light-weight virtual machine manager (VMM) based on Linux Kernel-based Virtual Machine (KVM),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed:) Thanks.

which is optimized for container workloads with:
- container image management and acceleration service
- flexible and high-performance virtual device drivers
- low CPU & memory overhead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- low CPU & memory overhead
- low CPU and memory overhead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

- container image management and acceleration service
- flexible and high-performance virtual device drivers
- low CPU & memory overhead
- industry-leading startup time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider rewording this to something like:

Suggested change
- industry-leading startup time
- Minimal startup time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed~

|| irq.checked_add(count).is_none()
|| irq + count - 1 > MSI_IRQ_MAX
{
panic!("invalid irq number when freeing legacy irq");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please add a unit test using #[should_panic] for this and any other functions that panic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic has been removed.

Comment on lines 505 to 521
Resource::PioAddressRange { base, size } => self.free_pio_address(*base, *size),
Resource::MmioAddressRange { base, size } => self.free_mmio_address(*base, *size),
Resource::MemAddressRange { base, size } => self.free_mem_address(*base, *size),
Resource::LegacyIrq(base) => self.free_legacy_irq(*base),
Resource::MsiIrq { ty: _, base, size } => self.free_msi_irq(*base, *size),
Resource::KvmMemSlot(slot) => self.free_kvm_mem_slot(*slot),
Resource::MacAddresss(_) => {}
}
}
}

fn free_allocated_resources(
&self,
resources: DeviceResources,
) -> Result<DeviceResources, ResourceError> {
self.free_device_resources(&resources);
Err(ResourceError::NoAvailResource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly odd code: All the free_*() handlers panic on error and the function that calls the handlers then returns an error "on success". Also, if the resources are freed, saying there are no resources available is "correct", but it's potentially misleading as it really means, "there (probably) are system resources to create the requested resources, but those requested resources haven't been (re-)created yet".

Why can we just use the conventional methodology here and make the handlers return a Result to allow free_device_resources() to return that Result. This function could then return that Result on error, or Ok on success, as is normal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion.

We have refactored this part.

Comment on lines +10 to +11
/// The descriptor to the initrd file, if there is one
initrd_file: Option<File>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply that normal rootfs images cannot be used? Also, what does it mean if initrd_file == None?

Copy link
Member Author

@studychao studychao May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support normal rootfs images and initrd is for providing the possibility for some cases if users need to use initrd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I'm still not clear on this point. If initrd_file = None, where is the rootfs image specified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rootfs image related setting will be introduced in the next PR related to block device manager

Comment on lines 14 to 15
/// Share guest kernel text/ro sections.
share_ro_sections: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a String and not just a bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a performance feature and we decide to not include this in the stage one for Dragonball:)

Comment on lines +15 to +17
pub host_numa_node_id: Option<u32>,
/// numa node id on guest for this region
pub guest_numa_node_id: Option<u32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Are these values always different?
  • Can multiple NumaRegionInfo object contain the same values for these elements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • host_numa_node_id and guest_numa_node_id could be the same or different, it's related to users' actual needs.
  • Technically, it won't be multiple NumaRegionInfo that contain the same value. We will introduce the validation check in vm configuration part PR.

@studychao
Copy link
Member Author

Hi @jodh-intel thanks a lot for your comments. Your advice are very important to us.
I'll address your comments soon. :)

@studychao
Copy link
Member Author

@jodh-intel Thanks for your review, I have addressed the latest comments.
Please take a look at them when you are free:)

@studychao
Copy link
Member Author

ooi, are you planning to add tracing to dragonball?

For the tracing question, I am not sure what you mean by tracing here. Does that refer to tracing rust lib or the support for tracing tools like GDB?

If you mean GDB, we have achieved the GDB support for Dragonball but it will not be included in the first stage of Dragonball. It could be added in the future stages.

For tracing rust lib, we don't have support for it.

Hope that helps and please let me know if I misunderstood anything:)

@studychao
Copy link
Member Author

/test

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @studychao. One query regarding that default issue, but other than that...

lgtm

/// Invalid address space operation.
#[error("invalid address space operation")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. That seems a bit pointless if there is an #[error], but I guess we have to keep the compiler happy ;)

Comment on lines 40 to 41
ConfigInfos {
configs: Vec::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? I'm not seeing that...?

@studychao
Copy link
Member Author

@jodh-intel Thanks for your review:)
Sorry that I forgot to push the default issue-related code ... Now I have pushed the code and please take a look at it.

wllenyj and others added 14 commits June 11, 2022 17:21
The dragonball crate initial commit that includes dragonball README and
basic code structure.

Fixes: kata-containers#4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Resource manager manages all resources of a virtual machine instance.

Fixes: kata-containers#4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Address space abstraction to manage virtual machine's physical address space.
The AddressSpaceMgr Struct to manage address space.

Fixes: kata-containers#4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Console manager to manage frontend and backend console devcies.

A virtual console are composed up of two parts: frontend in virtual
machine and backend in host OS. A frontend may be serial port,
virtio-console etc, a backend may be stdio or Unix domain socket. The
manager connects the frontend with the backend.

Fixes: kata-containers#4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
The legacy devices manager is used for managing legacy devices.

Fixes: kata-containers#4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
It is used for managing a group of configuration information.

Fixes: kata-containers#4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
It is used for holding guest kernel configuration information.

Fixes: kata-containers#4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Device manager to manage IO devices for a virtual machine. And added
DeviceManagerTx to provide operation transaction for device management,
added DeviceManagerContext to operation context for device management.

Fixes: kata-containers#4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Added VsockDeviceMgr struct to manage all vsock devices.

Fixes: kata-containers#4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Update Dragonball Readme to fix style problem and add github issue for
TODOs.

Add document for devices in dragonball. This is the document for the
current dragonball device status and we'll keep updating it when we
introduce more devices in later pull requets.

Fixes: kata-containers#4257

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Currently supported: build, clippy, check, format, test, clean

Fixes: kata-containers#4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Revert this patch, after dragonball-sandbox is ready. And all
subsequent implementations are submitted.

Fixes: kata-containers#4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
add dragonball description into kata README to help introduce dragonball
sandbox.

Fixes: kata-containers#4257

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
add THIRD-PARTY file to add license for crosvm.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
@katacontainersbot katacontainersbot added the size/huge Largest and most complex task (probably needs breaking into small pieces) label Jun 11, 2022
@studychao studychao force-pushed the anolis/dragonball-1 branch 2 times, most recently from ebbd3ed to 9199e35 Compare June 11, 2022 14:49
fix clippy warnings in safe-path lib to make clippy happy.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
@liubin liubin merged commit f23d709 into kata-containers:runtime-rs Jun 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
runtime-rs size/huge Largest and most complex task (probably needs breaking into small pieces)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants