Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Networking metrics are being swapped #9129

Closed
feitnomore opened this issue Jan 31, 2023 · 9 comments
Closed

Networking metrics are being swapped #9129

feitnomore opened this issue Jan 31, 2023 · 9 comments
Labels

Comments

@feitnomore
Copy link
Contributor

What happened:
When using Prometheus to monitor VMI metrics, kubevirt_vmi_network_receive_bytes_total and kubevirt_vmi_network_transmit_bytes_total are being swapped. I've performed a simple test on a lab environment by downloading a simple Ubuntu image twice:

ubuntu-download

Each file is 3.89GB, with gives a total of ~7.5GB. This is the result I see in Prometheus:

prometheus-metrics

As you will notice, kubevirt_vmi_network_transmit_bytes_total went from 18 to 27GB while kubevirt_vmi_network_receive_bytes_total went from 2.81 to 2.88GB.

What you expected to happen:
As the VMI downloaded 7.5GB, the expectation was to see kubevirt_vmi_network_receive_bytes_total increase, and not the opposite.

How to reproduce it (as minimally and precisely as possible):
This can be reproduced by downloading any file inside a VMI an checking the metrics. I was able to reproduce on my LAB as well as using make cluster-up && make cluster-sync using the vm-cirros example provided with the code.

Additional context:
The problem seems to be related to this snippet:

        /* The TX/RX fields appear to be swapped here
         * because this is the host view. */
        if (STREQ(key, "rx_bytes")) {
            stats->tx_bytes = val;
        } else if (STREQ(key, "rx_packets")) {
            stats->tx_packets = val;
        } else if (STREQ(key, "rx_errors")) {
            stats->tx_errs = val;
        } else if (STREQ(key, "rx_dropped")) {
            stats->tx_drop = val;
        } else if (STREQ(key, "tx_bytes")) {
            stats->rx_bytes = val;
        } else if (STREQ(key, "tx_packets")) {
            stats->rx_packets = val;
        } else if (STREQ(key, "tx_errors")) {
            stats->rx_errs = val;
        } else if (STREQ(key, "tx_dropped")) {
            stats->rx_drop = val;
        } else {
            VIR_DEBUG("Unused ovs-vsctl stat key=%s val=%lld", key, val);
        }

I am not sure if we want to see the host perspective or the vmi perspective. The metric name suggests vmi. Maybe introduce a host metric with the actual values, and swap the vmi metrics? Also, while discussing this issue with @akalenyu, looks like the values are not always swapped.

Thanks

@andreabolognani
Copy link
Contributor

@feitnomore after some investigation with @akalenyu, I believe the behavior you're seeing is caused by the libvirt bug that's been addressed with libvirt commit 0862cb3c. The main branch of KubeVirt uses a version of libvirt that contains the fix, which the 0.58 release doesn't. Can you please try deploying KubeVirt from main and confirm that it behaves correctly in your environment? Thanks!

@feitnomore
Copy link
Contributor Author

I have no chance to test it again now, but as I explained on the initial comment, I've tested on the main like a week maybe 10 days ago:

How to reproduce it (as minimally and precisely as possible):
This can be reproduced by downloading any file inside a VMI an checking the metrics. I was able to reproduce on my LAB as well as using make cluster-up && make cluster-sync using the vm-cirros example provided with the code.

@akalenyu
Copy link
Contributor

akalenyu commented Feb 8, 2023

I have no chance to test it again now, but as I explained on the initial comment, I've tested on the main like a week maybe 10 days ago:

How to reproduce it (as minimally and precisely as possible): This can be reproduced by downloading any file inside a VMI an checking the metrics. I was able to reproduce on my LAB as well as using make cluster-up && make cluster-sync using the vm-cirros example provided with the code.

So apparently a libvirt bump happened recently by #8774 so that may explain why
you saw this fail on main. Right now I am failing to reproduce this issue on main

@feitnomore
Copy link
Contributor Author

Yep, I've tested again and now it looks good on main, after pulling the repo again.
Thanks!

@feitnomore feitnomore closed this as not planned Won't fix, can't repro, duplicate, stale Feb 8, 2023
@akalenyu
Copy link
Contributor

akalenyu commented Feb 8, 2023

/reopen
want to close as "completed"

@kubevirt-bot
Copy link
Contributor

@akalenyu: Reopened this issue.

In response to this:

/reopen
want to close as "completed"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot kubevirt-bot reopened this Feb 8, 2023
@akalenyu
Copy link
Contributor

akalenyu commented Feb 8, 2023

/close

@kubevirt-bot
Copy link
Contributor

@akalenyu: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@akalenyu
Copy link
Contributor

akalenyu commented Feb 8, 2023

/cc @sradco

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants