
Refactor proposal: replace schema.go with libvirtxml types #10844

Open
victortoso opened this issue Dec 6, 2023 · 16 comments

@victortoso
Member

Is your feature request related to a problem? Please describe:

The libvirt-go-xml-module (or libvirtxml for short) provides 100% coverage of Libvirt's XML schemas. We should be using it instead of incrementally duplicating that module in schema.go.

Describe the solution you'd like:

If agreed, we should stop introducing new types into schema.go and start using libvirtxml whenever possible. We should also start moving existing types over to the libvirtxml ones. This is a considerable amount of work, but it should remove the need for manual XML parsing, as that will all be covered by libvirtxml.

Additional context:

There are real technical issues motivating this request. See the migratableDomXML function, which forces us to parse the XML all over again.

@alicefr
Member

alicefr commented Dec 6, 2023

The function migratableDomXML is hard to extend when new fields need to be added or modified in the migration XML. An example of where this is needed is storage migration, where we need to add the slices section (see this RFE for more details).

@victortoso
Member Author

/cc @andreabolognani

@alicefr
Member

alicefr commented Dec 6, 2023

/assign @victortoso

@berrange
Contributor

berrange commented Dec 6, 2023

The libvirt-go-xml-module (or libvirtxml for short) provides 100% coverage to Libvirt XML schemas.

And this is validated by testing that the libvirtxml structs can losslessly roundtrip xml -> obj -> xml, on all 3500 XML files in libvirt.git, so we have pretty high confidence in its behaviour.
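The roundtrip validation described here can be sketched as follows. This is a simplified illustration with a stand-in type, not the actual libvirt.git test suite, which runs the check against all of its XML files:

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// Stand-in for a libvirtxml struct; the real types cover the full schema.
type Domain struct {
	XMLName xml.Name `xml:"domain"`
	Type    string   `xml:"type,attr"`
	Name    string   `xml:"name"`
}

// roundtrip parses doc into a Domain and re-serializes it. Lossless
// coverage means the output matches the input: xml -> obj -> xml.
func roundtrip(doc []byte) (bool, error) {
	var dom Domain
	if err := xml.Unmarshal(doc, &dom); err != nil {
		return false, err
	}
	out, err := xml.Marshal(&dom)
	if err != nil {
		return false, err
	}
	return bytes.Equal(bytes.TrimSpace(doc), out), nil
}

func main() {
	doc := []byte(`<domain type="kvm"><name>demo</name></domain>`)
	ok, err := roundtrip(doc)
	fmt.Println(ok, err)
}
```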

@victortoso
Member Author

victortoso commented Jan 12, 2024

Moving here the discussion raised by @vladikr in #10847 (comment).

At the moment, I wanted to understand how much more memory it would use in comparison to KubeVirt's schema.go. The simplest way to compare is to load the same XML into both data structs and compare the results.

I created a small program that reads XMLs from a folder and measures memory usage with Go's testing.Benchmark and also runtime.ReadMemStats (to see if there is a difference between the two).

I've fetched all XMLs from the libvirt git repo, removed all the ones that were not complete (e.g. did not start with <domain>), and ran the program with them. The resulting file is a bit big, as there were 2335 XMLs.

The average says that libvirtxml is 20% bigger than KubeVirt's schema.go.

I've also tested with fewer files (5) that are closer to KubeVirt's real use case: 4 files from virtwrap's tests and one obtained via virsh dumpxml after booting the VMI.

The average says libvirtxml is 17% bigger than KubeVirt's schema.go.

Taking a single example, the domain_x86_64, libvirtxml is 19% bigger, that is, libvirtxml uses 119KiB compared to schema.go's 93KiB.
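The runtime.ReadMemStats side of such a comparison can be sketched like this. This is a hypothetical reconstruction with a stand-in type, not the actual measurement program described above:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"runtime"
)

// Stand-in for whichever struct (libvirtxml or schema.go) is being measured.
type Domain struct {
	XMLName xml.Name `xml:"domain"`
	Name    string   `xml:"name"`
}

// allocBytes reports how many heap bytes unmarshalling doc allocates,
// by diffing runtime.MemStats.TotalAlloc around the parse.
func allocBytes(doc []byte) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	var dom Domain
	_ = xml.Unmarshal(doc, &dom)

	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	doc := []byte(`<domain><name>demo</name></domain>`)
	fmt.Println(allocBytes(doc) > 0)
}
```

Running the same function over both struct types on identical inputs gives the per-file KiB/alloc columns shown in the table below; testing.Benchmark can report the same via AllocedBytesPerOp.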

@berrange
Contributor

Taking a single example, the domain_x86_64, libvirtxml is 19% bigger, that is, libvirtxml uses 119KiB compared to schema.go's 93KiB.

Looking at the size of these in isolation gives an unhelpful impression of the impact.

Let's say kubevirt was managing a tiny VM where QEMU's RAM was a mere 100 MB (realistically, VMs are more likely to be 1 GB in size or greater), and we'll ignore everything like EDK2 RAM and other random QEMU memory usage.

A difference of 119 KiB - 93 KiB == 26 KiB. In the context of the 100 MB of QEMU guest RAM, this is an overhead of 0.02%, or 0.002% for a 1 GB VM. QEMU's RAM usage can vary by more than 26 KiB from minute to minute as it responds to guest actions.

IOW, in the big picture this difference is completely lost in the noise.

@andreabolognani
Contributor

@berrange I believe the idea is that this data is frequently sent back and forth between the various KubeVirt / Kubernetes components, so the additional traffic could be problematic. I don't know if 20% is enough to make a significant difference.

@victortoso thanks a lot for looking into this! I am confused by the results though. The example you have in the README, which I have replicated locally, is:

$ go run main.go --xml testdata/real-use-cases/vmi-pvc-clipboard.xml

unsafe.Sizeof() of each domain struct
 * libvirtxml: 520 bytes
 * kubevirt  : 920 bytes

compared:bytes is the percentage of libvirtxml in comparison to KubeVirt
e.g: Value: 111, means livirtxml's domain struct is 11% bigger than KubeVirt domain struct

                                         |                     | <------- libvirtxml ----> | <-------- kubevirt -----> |
                                         | compared: bytes (%) |  benchmark  |   runtime   |  benchmark  |   runtime   |
                                filename | benchmark | runtime | KiB | alloc | KiB | alloc | KiB | alloc | KiB | alloc |
                   vmi-pvc-clipboard.xml |       111 |     111 | 113 |  2970 | 113 |  2970 | 101 |  2867 | 101 |  2867 |
                                         |                     | <------- libvirtxml ----> | <-------- kubevirt -----> |
                                         | compared: bytes (%) |  benchmark  |   runtime   |  benchmark  |   runtime   |
                                filename | benchmark | runtime | KiB | alloc | KiB | alloc | KiB | alloc | KiB | alloc |

The average size of Libvirt : 113 KiB
The average size of Kubevirt: 101 KiB
In average, libvirtxml is 111 % compared to Kubevirt

If I'm reading that correctly, the actual struct size for libvirtxml is 57% the size, but somehow the amount of memory allocated while producing it is 111%? The first number intuitively doesn't make sense to me, as we are surely storing more information in the libvirtxml case, not half of it... So maybe we're just not implementing parsing as efficiently as possible, and there's room to improve that in a way that would benefit not just KubeVirt as a potential consumer of the API, but other existing consumers as well?

@victortoso
Member Author

If I'm reading that correctly, the actual struct size for libvirtxml is 57% the size,

Yes. libvirtxml uses a lot of pointers, while schema.go doesn't. For an empty struct, that means libvirtxml is lighter.

but somehow the amount of memory allocated while producing it is 111%?

After loading the Domain XML, libvirtxml uses a little bit more memory, which makes sense given that it doesn't discard anything while schema.go does.

The 111% is computed relative to schema.go's memory usage, so in reality libvirtxml uses 11% more than schema.go.

@berrange
Contributor

@berrange I believe the idea is that this data is frequently sent back and forth between the various KubeVirt / Kubernetes components, so the additional traffic could be problematic. I don't know if 20% is enough to make a significant difference.

For data transfer, surely the serialized document size is what matters, not the in memory structs for the parsed doc.

@andreabolognani
Contributor

Yes. libvirtxml uses a lot of pointers, while schema.go doesn't. For an empty struct, that means libvirtxml is lighter.

This seems to indicate that the value you're reporting only considers the "shallow" memory usage. If we want to make a fair comparison, we need to know what the "deep" memory usage is, that is, including all data pointed to through however many levels of indirection.

For data transfer, surely the serialized document size is what matters, not the in memory structs for the parsed doc.

Right. But if libvirtxml deserializes/serializes the original document more accurately, that ought to increase the amount of data that ultimately travels on the wire.

I think we really need @vladikr to clarify what numbers are relevant to his concerns in order to make a useful assessment of the impact switching to libvirtxml would have.

@alicefr
Member

alicefr commented Jan 16, 2024

We could reduce the data transferred by removing the data that isn't interesting for KubeVirt before serialization (by setting the pointers to null).
Today, the filtering is achieved implicitly because KubeVirt's structs don't define all the fields. IMO, it would be better to do this explicitly rather than losing data during marshaling.

@andreabolognani
Contributor

We could reduce the data transferred by removing the data that isn't interesting for KubeVirt before serialization (by setting the pointers to null).

That could be a neat idea! I would however make sure that unnecessary fields are emptied right after deserialization instead. This would ensure that the KubeVirt logic only ever deals with structs where fields have been emptied; doing it the other way around might result in accidental loss of information.

Of course we should only really consider doing this if the additional overhead is really found to be unbearable. If that's not the case, simplicity beats raw performance IMO.

@alicefr
Member

alicefr commented Jan 16, 2024

We could reduce the data transferred by removing the data that isn't interesting for KubeVirt before serialization (by setting the pointers to null).

That could be a neat idea! I would however make sure that unnecessary fields are emptied right after deserialization instead. This would ensure that the KubeVirt logic only ever deals with structs where fields have been emptied; doing it the other way around might result in accidental loss of information.

Of course we should only really consider doing this if the additional overhead is really found to be unbearable. If that's not the case, simplicity beats raw performance IMO.

Sure, my point is that we can solve the memory overhead if this is really a concern.

@EdDev
Member

EdDev commented Feb 9, 2024

I have not seen evidence that we are optimizing for memory usage too much, nor that we got feedback of it from the field. If there is such feedback, please do share it.

The traffic that carries this information is in-node. Could you please clarify how a few tens of kilobytes will have any effect?

I find the option to offload maintenance of the schema to a dedicated project valuable in the long run.
This suggestion would benefit both projects.

@victortoso
Member Author

I had a quick offline discussion with @vladikr about this yesterday and he raised another concern. Vladik, please correct me if I misunderstood.

At the moment, relying on a subset of libvirt's fields gives KubeVirt some control over which features we are using from libvirt. It makes it easier to check and supervise: it would be easy to spot a patch adding or changing the libvirt domain.

By moving to libvirtxml, there would be less control; it becomes a little harder to tell which features are being used.

Overall, schema.go is used internally and we set it based on KubeVirt's API.

I think this concern is still solvable if really needed, just not sure on how to tackle it yet.

@EdDev
Member

EdDev commented Feb 20, 2024

How about this: use both.

  • Target the custom one we use today to be used for virt-launcher <--> virt-handler communication.
  • Target the libvirtxml one to direct communication with libvirt.

virt-handler would not know about the full libvirt schema, only about the current custom one.
virt-launcher would know about both, using the custom one to send information to virt-handler.

I think this will address both concerns and will allow us to reduce the size of the custom spec we use today, something that was discussed in the past as well: reducing memory, decoupling from libvirt's config...
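The two-schema split above implies a downconversion at the virt-launcher boundary. A sketch of that boundary, with hypothetical stand-in types for both schemas:

```go
package main

import "fmt"

// Hypothetical stand-in for the full libvirtxml type used when talking
// directly to libvirt.
type LibvirtxmlDomain struct {
	Name     string
	Metadata string // the full schema carries fields virt-handler never needs
}

// Hypothetical stand-in for the trimmed custom type kept for
// virt-launcher <--> virt-handler communication.
type CustomDomain struct {
	Name string
}

// toCustom is the boundary conversion: virt-launcher knows both types
// and downconverts before sending information to virt-handler.
func toCustom(d LibvirtxmlDomain) CustomDomain {
	return CustomDomain{Name: d.Name}
}

func main() {
	full := LibvirtxmlDomain{Name: "demo", Metadata: "libvirt-only detail"}
	fmt.Printf("%+v\n", toCustom(full)) // prints {Name:demo}
}
```

The conversion function is also where the custom spec could be shrunk over time, since every field it forwards is explicit.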
