QEMU orchestrator: implement support for remote hosts #1760

Merged
merged 12 commits into from
Mar 3, 2022

Contributor

@anirudhrb anirudhrb commented Feb 21, 2022

The current QEMU orchestrator treats the local machine as the host for the VMs. This PR adds support for specifying a remote host in the runbook. The remote host must have libvirtd running and configured to allow remote connections. VMs under test are spawned on this remote host.

Example runbook for specifying a remote host

name: qemu default
...<snip>...
platform:
  - type: qemu
    admin_private_key_file: $(admin_private_key_file)
    keep_environment: $(keep_environment)
    qemu:
      hosts:
        - address: "10.77.0.5"
          username: "anirudh"
          private_key_file: $(admin_private_key_file)
    requirement:
      qemu:
        qcow2: $(qcow2)
        cloud_init:
          extra_user_data: $(extra_user_data)

If no host is specified in the runbook, the local machine is treated as the host (existing behavior).
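
To illustrate the local versus remote distinction, here is a minimal sketch of how the host fields from the runbook could be turned into a libvirt connection URI. The function name `build_libvirt_uri` is hypothetical and this is not necessarily how the PR does it. It assumes the SSH transport (`qemu+ssh://`) and libvirt's `keyfile` URI parameter.

```python
from typing import Optional
from urllib.parse import urlencode


def build_libvirt_uri(
    address: Optional[str] = None,
    username: Optional[str] = None,
    private_key_file: Optional[str] = None,
) -> str:
    """Return a libvirt connection URI for a local or remote QEMU host.

    Hypothetical helper: the parameter names mirror the runbook fields
    shown above, but the mapping itself is an assumption.
    """
    if not address:
        # No host in the runbook: fall back to the local system daemon.
        return "qemu:///system"

    user_prefix = f"{username}@" if username else ""
    uri = f"qemu+ssh://{user_prefix}{address}/system"
    if private_key_file:
        # "keyfile" is a libvirt remote URI parameter for the SSH transport.
        uri += "?" + urlencode({"keyfile": private_key_file}, safe="/")
    return uri
```

With the example runbook, `build_libvirt_uri("10.77.0.5", "anirudh")` would point libvirt at the remote host, while calling it with no arguments keeps the existing local behavior.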

@squirrelsc
Member

@cwize1 Can you have a look at this change too?

node_addr = address
node_port = 22
if self.qemu_platform_runbook.is_host_remote():
    self.host_node.tools[Iptables].start_forwarding(10022, address, 22)
Contributor

This doesn't seem like it'll support testing multiple VMs on the same host.

What I recommend you do is allow the user to set the libvirt network that the VM connects to. Then the user can set up their own network bridge on the host, create a libvirt "network" that points to that bridge, and then tell LISA to use that "network" for the VMs. Then the test runner will be able to access the VMs directly without needing to deal with port forwarding.
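
A rough sketch of the two pieces of libvirt XML this suggestion involves: a network definition that forwards to an existing host bridge, and the domain `<interface>` element that attaches a VM to that network by name. The helper names and the example names `lisa-net` and `br0` are hypothetical, and the `virtio` NIC model is an assumption. How LISA would expose the network name in the runbook is not decided here.

```python
import xml.etree.ElementTree as ET


def network_definition_xml(network_name: str, bridge_name: str) -> str:
    """Build a libvirt network that forwards to an existing host bridge.

    The bridge itself must already exist on the host; libvirt only
    references it by name here.
    """
    network = ET.Element("network")
    ET.SubElement(network, "name").text = network_name
    ET.SubElement(network, "forward", {"mode": "bridge"})
    ET.SubElement(network, "bridge", {"name": bridge_name})
    return ET.tostring(network, encoding="unicode")


def interface_xml(network_name: str) -> str:
    """Build the domain <interface> element for a NIC on a named network."""
    interface = ET.Element("interface", {"type": "network"})
    ET.SubElement(interface, "source", {"network": network_name})
    # The NIC model is an assumption for this sketch.
    ET.SubElement(interface, "model", {"type": "virtio"})
    return ET.tostring(interface, encoding="unicode")
```

With this shape, several VMs on the same host each get their own address on the bridged network, so no per-VM port mapping is needed.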

Member

We can use the consistent way for local and remote hosts; it will simplify the code logic. The only difference is that localhost doesn't need connection info.

Contributor

What do you mean by "the consistent way"?

Contributor Author

With port forwarding, we do need special logic for remote hosts (setting up iptables rules for forwarding), so a consistent way is not possible.

Member

Anyway, I think it can make the logic simpler if port forwarding, or whichever approach is chosen, is used for the local host too.
If port forwarding is used, one way is to start from a high port such as 30000 and map ports one by one incrementally, skipping any port that is already in use.
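
The incremental port mapping idea could look roughly like the sketch below. `allocate_forwarding_ports` is a hypothetical helper, and how the set of used ports is discovered on the host (for example from existing iptables rules) is left out as an assumption.

```python
from typing import Iterable, List


def allocate_forwarding_ports(
    used_ports: Iterable[int],
    count: int,
    start: int = 30000,
    end: int = 65535,
) -> List[int]:
    """Pick `count` free host ports, scanning upward from `start`.

    `used_ports` is whatever the caller already knows to be taken on
    the host; ports in that set are skipped.
    """
    taken = set(used_ports)
    allocated: List[int] = []
    for port in range(start, end + 1):
        if len(allocated) == count:
            break
        if port not in taken:
            allocated.append(port)
    if len(allocated) < count:
        raise RuntimeError(f"only {len(allocated)} free ports in {start}-{end}")
    return allocated
```

Each allocated port would then be forwarded to port 22 of one VM, which replaces the single hard-coded 10022 mapping and allows more than one VM per host.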

Contributor Author

I need to figure out the bridged network approach. I tried it at first but it didn't work, so I am inclined to do it in a follow-up PR.

Just to understand, in what cases do we need multiple VMs on the same host? When I do implement it, how do I test it?

Contributor

You can create a simple test function with the following annotation:

@TestCaseMetadata(
    description="",
    requirement=node_requirement(
        node=schema.NodeSpace(
            node_count=2,
        )
    ),
)

There are a bunch of "Microsoft suite" tests that use multiple nodes, I think mainly around testing networking. I am using it to test multi-node Kubernetes clusters.

Member

@cwize1 The field schema.Environment.topology is meant to differentiate network topologies, but it isn't used so far; the value is always subnet. Please let me know if you need to test a different topology. We can discuss the requirement and think about how to support it.

Contributor

@squirrelsc Unfortunately, unlike Hyper-V, which has a fairly unified networking API, libvirt/QEMU doesn't provide much assistance for networking. Unless you are doing something super simple (e.g. using the default NAT network), a developer has to manually create the network outside of libvirt and then point the libvirt VM at that network. Typically, either Linux kernel bridges or OVS (Open vSwitch) are used, but both have their challenges. OVS is a big piece of software that is complicated to use. And every Linux distro has its own network manager API that you have to use to provision networks, such as kernel bridges. I don't think it is worth trying to pull that complexity into LISA.

Member

Thank you for the thoughts. IMO, it depends on the test scenarios, and not everything needs to be fully automated. Lab environments are complex, and preconfigured steps are acceptable. If some servers are set up with certain topologies, LISA just needs to use them and leave them unchanged. LISA needs to be aware of the topology and assign test cases according to their requirements. If a test case needs a different topology, the platform checks the settings (from the runbook) of each server and then allocates VMs from matching servers. ADO agents support this through "capabilities". The capabilities are just a set of key/value pairs, which is very similar to what I want to do.
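
As a sketch of the capability idea, matching a test case's requirements against preconfigured hosts can be a plain key/value comparison. Everything here is hypothetical: `find_matching_hosts`, the host names, and the capability keys are illustrative and do not exist in LISA.

```python
from typing import List, Mapping


def find_matching_hosts(
    hosts: Mapping[str, Mapping[str, str]],
    required: Mapping[str, str],
) -> List[str]:
    """Return names of hosts whose capabilities satisfy every requirement.

    `hosts` maps a host name to its declared key/value capabilities
    (as they might be read from the runbook).
    """
    return [
        name
        for name, capabilities in hosts.items()
        if all(capabilities.get(key) == value for key, value in required.items())
    ]


# Illustrative data only.
example_hosts = {
    "host-a": {"topology": "subnet"},
    "host-b": {"topology": "bridged", "hypervisor": "qemu"},
}
bridged_hosts = find_matching_hosts(example_hosts, {"topology": "bridged"})
```

The platform would then allocate VMs only on the returned hosts and leave their preconfigured networking untouched.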

Contributor

@cwize1 cwize1 left a comment

.

Contributor

@cwize1 cwize1 left a comment

.

@squirrelsc
Member

BTW, please update the documentation for the new schema.

Contributor

@cwize1 cwize1 left a comment

.

@@ -280,12 +317,21 @@ def _configure_nodes(self, environment: Environment, log: Logger) -> None:
node_context.cloud_init_file_path = os.path.join(
    vm_disks_dir, f"{node_context.vm_name}-cloud-init.iso"
)
node_context.os_disk_base_file_path = qemu_node_runbook.qcow2

if self.host_node.is_remote:
Member

  1. is_remote is incorrect here and will always be True, because it refers to the method itself. is_remote() is right.
  2. I'm not sure the "remote" check is useful here. Can remote and local use the same folder structure?

Contributor Author

  1. is_remote is correct based on usages in other files. is_remote() throws the error "bool is not callable".
  2. This is an optimization: we avoid copying the OS disk image for a local node, which is why the remote check is useful here.

Contributor

is_remote is marked with @property. So, it behaves like a C# style property.
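
A small standalone illustration of the difference being discussed. The two classes are hypothetical stand-ins, not LISA's Node class. With `@property`, attribute access returns the bool and calling it fails. With a plain method, forgetting the parentheses yields a bound method, which is always truthy.

```python
class HostWithProperty:
    """Stand-in where is_remote is a property."""

    def __init__(self, remote: bool) -> None:
        self._remote = remote

    @property
    def is_remote(self) -> bool:
        return self._remote


class HostWithMethod:
    """Stand-in where is_remote is a regular method."""

    def __init__(self, remote: bool) -> None:
        self._remote = remote

    def is_remote(self) -> bool:
        return self._remote


local_property_host = HostWithProperty(remote=False)
local_method_host = HostWithMethod(remote=False)

# Property: plain attribute access returns the bool itself.
property_value = local_property_host.is_remote

# Method: without parentheses you get the bound method object,
# which is truthy regardless of what the method would return.
method_object_is_truthy = bool(local_method_host.is_remote)

# Calling a property's value means calling a bool, which raises TypeError.
try:
    local_property_host.is_remote()
    call_error = ""
except TypeError as error:
    call_error = str(error)
```

So the first review concern would apply to the method form, while the "bool is not callable" error reported above is what the property form produces.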

Member

Sorry, I was confused by LibVirtHost; I thought its type was LibVirtHost. BTW, why does it need a LibVirtHost schema? Can it reuse the RemoteNode schema?

Contributor Author

@anirudhrb anirudhrb Mar 2, 2022

I think there is some utility in having a separate schema because we could have libvirt-specific fields in it. Right now we have the lisa_working_dir property, which is not part of RemoteNode. In the future we could have a supported_hypervisors property, for example, to indicate the hypervisors (qemu, ch, etc.) that the host supports.

Member

The QEMU schema can inherit from RemoteNode, so it's easy to get the general node support. Node has working_path, and it can be overridden to read from the schema.
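
The inheritance idea can be sketched with plain dataclasses. `RemoteNodeSketch` and `LibvirtHostSketch` are hypothetical stand-ins. The real RemoteNode schema in LISA has its own fields and base classes, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class RemoteNodeSketch:
    """Stand-in for the generic connection fields of a remote node."""

    address: str = ""
    port: int = 22
    username: str = ""
    private_key_file: str = ""


@dataclass
class LibvirtHostSketch(RemoteNodeSketch):
    """Adds libvirt-specific fields on top of the generic connection info."""

    # Directory on the host where VM related files are stored.
    lisa_working_dir: str = "/var/tmp"

    def is_remote(self) -> bool:
        # In this sketch, an empty address means the local machine is the host.
        return bool(self.address)


remote_host = LibvirtHostSketch(address="10.77.0.5", username="anirudh")
local_host = LibvirtHostSketch()
```

This keeps the connection fields in one place while still leaving room for host-specific settings such as the working directory.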

lisa/tools/qemu_img.py (outdated, resolved)
Contributor

@cwize1 cwize1 left a comment

.

# The directory where lisa will store VM related files (such as disk images).
# This directory must already exist and the test user should have write permission
# to it.
lisa_working_dir: str = "/var/tmp"
Contributor

Is it possible to leave the default as the base VM image's directory for the local case?

Contributor Author

I guess it could be done if needed. Is it for backward compatibility?

Contributor

When running on a local dev box, it is nice to be able to easily see all the files created, particularly for cases where you keep the environment after the test completes/fails (e.g. while debugging).
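
One possible way to get that default, sketched as a hypothetical helper (`resolve_working_dir` is not part of the PR): an explicit runbook value wins, a remote host falls back to /var/tmp, and a local host falls back to the directory of the base image.

```python
import os
from typing import Optional


def resolve_working_dir(
    is_remote: bool,
    qcow2_path: str,
    configured_dir: Optional[str] = None,
) -> str:
    """Choose where VM related files are stored.

    Assumed precedence: explicit setting, then a per-location default.
    """
    if configured_dir:
        return configured_dir
    if is_remote:
        # The base image path is local, so it says nothing about the remote host.
        return "/var/tmp"
    # Local dev box: keep generated files next to the base image.
    return os.path.dirname(os.path.abspath(qcow2_path))
```

This would keep the files easy to find while debugging locally, without changing the remote default.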

Contributor

@cwize1 cwize1 left a comment

.

@squirrelsc squirrelsc merged commit 9e72070 into microsoft:main Mar 3, 2022
@anirudhrb anirudhrb deleted the qemu_remote_host_support branch December 22, 2023 05:18
4 participants