Performance degradation after fixing #95 #100
Why is that causing performance degradation? The underlying object is likely
Agreed. Though a better implementation of virtio-net could avoid the copies by doing
@bonzini We're looking at using
This doesn't work due to TAP API limitations.
I looked at this issue some more and I'd like to propose that it comes down to a limitation of the current vm-memory API. So either we need a
But the atomicity is not the reason for the performance decrease; the current code is faster than a stupid byte-by-byte copy. The reason performance got worse is that the system memcpy does not guarantee atomicity, but even if it did, the new code would likely be slower.
@bonzini My argument is that we don't need atomicity all the time, only for specific operations, which is why there need to be atomic and non-atomic versions of the API.
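To make that split concrete, here is a minimal sketch, not the actual vm-memory API (the helper names and the use of std atomics are illustrative assumptions), of how an atomic path for small device-visible fields could sit next to a plain bulk path where tearing is acceptable:

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Atomic path (hypothetical helper): virtio queue indices and similar fields
/// that the guest may read concurrently must never be observed half-written.
fn write_used_idx(idx: &AtomicU16, val: u16) {
    idx.store(val, Ordering::Release);
}

/// Non-atomic path (hypothetical helper): packet payloads and other bulk
/// buffers where no reader expects word-sized atomicity.
fn write_payload(dst: &mut [u8], src: &[u8]) {
    dst[..src.len()].copy_from_slice(src); // compiles down to memcpy
}
```

The point is only the separation of concerns: callers that need atomicity say so explicitly, and everyone else gets the fast bulk copy.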
I agree on this direction. So we have two classes of interfaces:
Looking at the disassembled code, calling try_access() for every guest memory access is really a little heavy. We should build a quick path for those accesses which never cross a region boundary.
No, that would be true if the non-atomic versions provided additional value. Right now the value would be speed, but that should not be the case with a properly optimized
@jiangliu It should not call
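For reference, the fast-path idea from this exchange could look roughly like the following sketch (the Region type and method name are made up for illustration; the real vm-memory regions carry more state):

```rust
/// Illustrative region type; not the real vm-memory GuestMemoryRegion.
struct Region {
    start: u64, // guest-physical start address
    len: u64,   // region length in bytes
}

impl Region {
    /// Returns the offset into this region if [addr, addr + count) lies
    /// entirely inside it; otherwise None, and the caller takes the slow,
    /// cross-region path (the generic try_access-style loop).
    fn fast_offset(&self, addr: u64, count: u64) -> Option<u64> {
        let access_end = addr.checked_add(count)?;
        let region_end = self.start.checked_add(self.len)?;
        if addr >= self.start && access_end <= region_end {
            Some(addr - self.start)
        } else {
            None
        }
    }
}
```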
Hi! What do you think about enforcing (via the
With this in place, we only have to check that a memory access falls within a valid guest physical address range before doing it in one go. Moreover, the valid ranges (potentially spanning multiple adjacent regions) can be precomputed every time the guest memory layout changes, to speed up future validations.
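A rough sketch of that precomputation, under the assumption of a simple sorted-ranges representation (none of these types exist in vm-memory):

```rust
/// Illustrative only: a precomputed, merged view of valid guest-physical
/// ranges, rebuilt whenever the memory layout changes.
struct GpaRange {
    start: u64, // inclusive
    end: u64,   // exclusive
}

/// Merge adjacent/overlapping regions into sorted, disjoint ranges.
fn merge_ranges(mut regions: Vec<GpaRange>) -> Vec<GpaRange> {
    regions.sort_by_key(|r| r.start);
    let mut merged: Vec<GpaRange> = Vec::new();
    for r in regions {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end {
                // Adjacent or overlapping: extend the previous range.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Validate an access with a single range check against the precomputed list.
fn access_is_valid(ranges: &[GpaRange], addr: u64, len: u64) -> bool {
    match addr.checked_add(len) {
        Some(end) => ranges.iter().any(|r| addr >= r.start && end <= r.end),
        None => false,
    }
}
```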
Hi again! Here goes another wall of text :D
Now is a good time to polish and clear up some aspects around the
For example,
It looks like the primitives we're looking for are a set of methods similar to what
In terms of validating assumptions, I wanted to start by asking which use cases we are targeting by allowing certain guest memory ranges not to be backed by memory on the host (i.e.
Currently we are using the vm-memory interfaces in three typical ways:
So it may help to
This is the solution we use with Cloud Hypervisor that mitigates the performance drop: we only use the slower alignment-checked write for copies <= the size of usize. This means the same API can be used for small updates that must be atomic (like those for updating virtio queue offsets) and for large bulk copies where there is no expectation of that behaviour.
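A minimal sketch of that dispatch, assuming a raw destination pointer into guest memory (the real implementation lives in the Cloud Hypervisor and vm-memory patches referenced in this thread and handles more cases):

```rust
use std::convert::TryInto;
use std::mem::size_of;
use std::ptr;

/// # Safety
/// `dst` must be valid for writes of `src.len()` bytes.
unsafe fn copy_to_guest(dst: *mut u8, src: &[u8]) {
    let len = src.len();
    // Small, naturally aligned update (e.g. a virtio used-ring index): a
    // single volatile store so the guest can never observe a torn value.
    if len <= size_of::<usize>() && matches!(len, 2 | 4 | 8) && (dst as usize) % len == 0 {
        match len {
            2 => ptr::write_volatile(dst as *mut u16, u16::from_ne_bytes(src.try_into().unwrap())),
            4 => ptr::write_volatile(dst as *mut u32, u32::from_ne_bytes(src.try_into().unwrap())),
            _ => ptr::write_volatile(dst as *mut u64, u64::from_ne_bytes(src.try_into().unwrap())), // len == 8
        }
        return;
    }
    // Bulk data (packet payloads etc.): no atomicity expectation, so let the
    // optimized system memcpy do the work.
    ptr::copy_nonoverlapping(src.as_ptr(), dst, len);
}
```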
I like @rbradford's solution very much, possibly extended to 16 bytes.
That's a cool implementation! I think we should clear up some things at the interface level as well; I opened #102 and would greatly appreciate it if people could take a look.
Where small objects are those objects that are less than the native data width for the platform. This ensures that volatile and alignment-safe reads/writes are used when updating structures that are sensitive to this, such as virtio devices where the spec requires writes to be atomic.
Fixes: cloud-hypervisor/cloud-hypervisor#1258
Fixes: rust-vmm#100
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When implementing the fix for #95, we introduced a performance degradation on (some?) glibc builds. On Firecracker, with iperf3, we observed a performance degradation of 5% for glibc builds.
On Cloud Hypervisor, the performance degradation is significantly worse. Observed impact is up to 50%. More details here: cloud-hypervisor/cloud-hypervisor#1258
Opening this issue so that we can decide on the next steps for fixing the performance degradation.
My 2 cents: I wouldn't want to introduce this fix only for x86_64 musl builds & aarch64 glibc & musl builds, because we cannot know for sure which glibc versions people are using out there, hence we cannot know for sure that glibc is doing the right thing (i.e. optimizing memcpy at a higher granularity than 1 byte).
I would say that the underlying problem, which is the reason for both the performance degradation and the bug, is that we lose type information about the object that needs to be written/read. I would rather like us to work on improving the interface so that type information doesn't need to be sort-of inferred (by checking alignments & reading/writing in the largest possible chunks). I would need to do some experiments before having a solution here.
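As an illustration of that direction, the existing Bytes trait already separates typed and untyped accesses; the sketch below assumes vm-memory with the backend-mmap feature and uses hypothetical guest addresses, so treat it as an example rather than a proposal:

```rust
use vm_memory::{Bytes, GuestAddress, GuestMemoryError, GuestMemoryMmap};

// Hypothetical addresses and values, purely for illustration.
fn update_device_state(
    mem: &GuestMemoryMmap,
    payload: &[u8],
) -> Result<(), GuestMemoryError> {
    // Typed write: the u16 keeps its type all the way down, so the backend can
    // pick a single, properly aligned 16-bit access.
    mem.write_obj(1u16, GuestAddress(0x1000))?;
    // Untyped bulk write: type information is already gone here, and a plain
    // copy is all that is expected.
    mem.write_slice(payload, GuestAddress(0x2000))?;
    Ok(())
}
```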
Another thing that I believe is of paramount importance at this moment is to add performance testing to vm-memory. Pretty much every other function in vm-memory is on the critical performance path. We should make sure not to introduce regressions here as we continue development.
CC: @sboeuf @rbradford @sameo @alexandruag @serban300 @bonzini