ena NIC msi_irq is empty on ubuntu kernel 5.19 - causing a severe throughput degradation for i3.4xlarge (and above?) #13560
Comments
@raphaelsc / @bhalevy my suspicion is that it's something related to compactions; it looks like the compactions are not "steady" during the entire write, and the effect is severe. @aleksbykov let's try to bisect it to a smaller range. |
The reactor stalls are relatively short. For example, on node1:
And the stalls look similar to #13160 |
It doesn't appear to be just an infinite loop on shard 0 or something, because the extra load happens both in the write phase and the read phase of the test, but not in between. |
I'll check if it happens locally. If not, it's probably a test setup issue. If yes, I'll bisect it. |
It doesn't happen on my PC with 1da0270. |
This is a kernel and/or i3.4xlarge regression (edit: I didn't test other instance types). The NIC doesn't report its IRQ numbers for some reason. I don't know what we can do about this. |
@roydahan Compactions being bursty is an effect, not a cause. Since shards other than 0 are very underloaded, they are able to devote most of their CPU to compactions (normally they would devote a small but smoothly growing fraction of their time), and complete them in a fast burst. |
@aleksbykov @michoecho please, send the following info from the node where you saw the above:
I strongly doubt there is any kernel issue here. Based on the perftune output there wasn't any IRQ detected, meaning the NIC in question wasn't exposing either MSI-X, or MSI, or INT#x vectors to the guest OS. |
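To see what the guest OS thinks the NIC exposes, a minimal sketch (the 00:05.0 slot below is only an example, not taken from this issue; find the real one with lspci first):

```bash
# Locate the NIC's PCI function, then dump its MSI-X capability as seen by the guest.
lspci | grep -i ethernet
# NOTE: 00:05.0 is only an example slot.
sudo lspci -s 00:05.0 -vv | grep -i -A1 'msi-x'
# "MSI-X: Enable+ Count=N" means vectors are exposed to the guest;
# "Enable-" or no MSI-X line at all matches what perftune reported here.
```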
Looks legit. How can I get access to that VM? |
It's not some particular VM. Just launch ami-0501eb17c8c79b6d2 (us-east-1) on i3.4xlarge. Edit: mine is already shut down, so I can't give you access to it to save you the effort. |
This statement is incorrect in general, @michoecho. The actual fraction of CPU (out of those 100%) used by compactions depends on the other running contexts and on their shares relative to the compaction ones. |
Yes. By "normally" I meant "in a regular run of this particular test", not in general. |
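For intuition only (the share values below are made up, not taken from this test): Seastar hands out a shard's CPU time to the scheduling groups that are currently runnable, roughly in proportion to their shares,

$$
\text{CPU fraction of compaction} \approx \frac{s_{\text{compaction}}}{\sum_{g \in \text{runnable}} s_g}.
$$

So with compaction and the statement group both runnable at equal shares, compaction gets about half of a busy shard, while on an almost idle shard it can take nearly 100% and finish its backlog in a short burst, which is the behaviour described above.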
Didn't we agree to remain on 5.15? (https://www.omgubuntu.co.uk/2022/01/ubuntu-22-04-lts-will-use-linux-5-15-kernel ) - how did you get 5.19? |
Latest successful run: Scylla version 5.3.0~dev-0.20230325.e8fb718e4ad4 with build-id 6eed28a1ac2addc02aceea60af4d6ee4acd56955 PASSED, ami-078e6867d914fbfb0.
First failed run: Scylla version 5.3.0~dev-0.20230328.c7131a05741d with build-id 6358d7ada913b1dfc96849ddb519b7a243afe0bd FAILED, ami-020e718640eafe444. Between them we don't have an AMI. For Scylla version 5.3.0~dev-0.20230328.c7131a05741d with build-id 6358d7ada913b1dfc96849ddb519b7a243afe0bd:
scyllaadm@perf-regression-latency-ubuntu-db-node-a292366c-1:~$ uname -a
@yaronkaikov - Ubuntu 22.04 LTS kernel should be 5.15, with 5.19 available only as an optional HWE (hardware enablement) kernel - but we've never moved to 5.19 explicitly - how come the AMI uses it? |
@vladzcloudius - do we have any tests for perftune that can catch such issues? |
Since we are using the latest image available, we don't pin the kernel version |
@mykaul I guess our AMIs use the |
@yaronkaikov - this might be the issue. We were supposed to keep using LTS. |
I just verified that the current |
Still, we need to report the problem, so it gets fixed before we're forced to move to a newer kernel. Also users may be using that newer kernel. |
Tested a smaller i4i instance (2xl) - still works:
Closed with scylladb/scylla-machine-image#443 |
It's not closed, in the sense that we do need/want to support that 5.19 kernel. |
So perhaps it is an Ubuntu issue (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2016991 ) after all? |
It's definitely not an Ubuntu issue. If it were, it would not work on i4i and with the older (Amazon!) kernel. It's quite obviously an Amazon kernel bug. They (Amazon) have to fix it. |
@yaronkaikov @mykaul Are you sure it's safe to not use Amazon's kernels that AFAIU come with Amazon's vanilla Ubuntu AMIs? Are you sure Ubuntu's LTS kernels are certified by Amazon? |
I think it's safe. I also slightly prefer having a single kernel version (as much as possible) across cloud providers (you could ask also about serverless - we haven't decided yet - https://github.com/scylladb/serverless-issues/issues/11 ). I could not find a 'certified by Amazon', but I assume it's because Ubuntu Pro might be. |
See here: they confirm that the "Ubuntu EC2 Variant" would work reliably. Also see this: https://ubuntu.com/blog/introducing-the-ubuntu-aws-rolling-kernel-2 Bottom line: vanilla LTS kernels are not safe on EC2 AMIs. There are many reasons for that: there are a few AWS-specific device drivers and Xen-related bits that are only relevant for EC2 users, and hence (I guess) they are maintained much faster in this kernel stream than in the mainline LTS stream. |
Where is your question coming from? Isn't the PR you linked using Amazon's LTS kernel (linux-aws-lts-22.04), not Ubuntu's vanilla LTS kernel? |
It comes from me missing this fact. ;) I assumed a vanilla LTS is used. Thanks for pointing this out, @michoecho. @yaronkaikov have we verified that a latest |
I have (#13560 (comment)), but a sanity check from someone else would be appreciated. |
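A quick way to run that sanity check on a node (a sketch; it relies on Ubuntu's convention of suffixing its AWS-tuned kernels with -aws):

```bash
# Ubuntu's AWS-tuned kernels carry an "-aws" suffix in the release string
# (e.g. 5.15.0-1031-aws); generic/HWE kernels do not.
uname -r
# List installed kernel image packages to see which flavour the AMI ships.
dpkg -l 'linux-image-*' | grep '^ii'
```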
First of all - great! |
I didn't, it's part of the |
This specific issue was fixed with 5.2.0-rc5, by ensuring we use kernel 5.15 which doesn't suffer from it (and is truly LTS, btw). There's a separate issue (in seastar, and elsewhere) to track the Linux kernel regression which should be fixed. Therefore, closing as completed. |
@mykaul there is nothing to fix in seastar in this context. cc @syuu1228 |
No, we don't. It's a bogus state of the kernel from our perspective.
I know (hence not re-opening this one). But what you probably want is to have some tracking for the next time we upgrade the kernel in the AMI. BTW, upgrading to the "latest LTS" is a risk too as I had explained some time ago. And to do this you need to pin the kernel. And when you upgrade you must make sure to upgrade SC installations to that kernel too. Tomer has recently performed an investigation about kernels' version in SC and it's nothing but terrible. We really have to take it under the control and upgrading to the "latest LTS" every time you build a new AMI is not helping us to get there...
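To make the "pin the kernel" step concrete, a minimal sketch (it assumes the linux-aws-lts-22.04 metapackage mentioned earlier; the real AMI build may pin the kernel differently):

```bash
# Put the kernel metapackage on hold so package upgrades don't pull in a
# newer (e.g. HWE 5.19) kernel behind our back.
sudo apt-mark hold linux-aws-lts-22.04
apt-mark showhold   # confirm the package is on hold
```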
I reopened the issue because in GCP we still have kernel 5.19 and we need to verify, for several instance types, whether there is an issue or not. @vladzcloudius we need your help with how to test it and identify, in the easiest way, where the IRQs are set correctly and where they are not. |
The issue does not exist simply because the ENA driver is AWS specific. |
The overload of shard 0 was an accidental effect, not a direct and predictable result of the bug. So checking for the bug via black box performance tests is a bad idea. (But of course if the issue is AWS-specific then there is no reason to do anything about it for GCP). |
Of course - we can look at the interrupt mapping, which was supposed to, but did not, happen. |
For this particular issue (a kernel bug) it shows up as an empty msi_irq directory for the NIC. When the kernel is working correctly, both in AWS and in GCP, this directory has to contain files named after the IRQs of the corresponding NIC, as I showed here: #13560 (comment) I believe all you need to verify in the context of this GH issue is that the above holds. As to other perftune.py testing - let's have a different context for that (GH issues?) and I'll be happy to help. |
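A minimal sketch of that verification (eth0 is an assumed interface name; substitute the node's actual NIC, e.g. ens5):

```bash
IFACE=eth0   # assumed interface name; use the node's real NIC
# On a healthy kernel this directory holds one file per MSI/MSI-X IRQ of the NIC;
# empty output is exactly the broken state this issue describes.
ls /sys/class/net/"$IFACE"/device/msi_irqs
# Cross-check the interrupt mapping mentioned above: on AWS the same IRQs
# should appear as ena queue rows in /proc/interrupts.
grep -i ena /proc/interrupts || echo "no ena interrupts registered"
```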
@vponomaryov verified that the IRQs are showing on GCP n1-highmem-8, n1-highmem-16, n1-highmem-32 |
Installation details
Scylla version (or git commit hash): 5.3.0~dev-0.20230415.1da02706ddb8 with build-id f7ac5cd90e63ace5065c583d6d1d9c381f39b5c2
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0501eb17c8c79b6d2
Performance latency test jobs:
Performance latency 1TB test run: prepare command to populate a dataset of size 1 TB with a c-s command:
All cassandra-stress commands run with cl=ALL.
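(The exact c-s command used by the job is not reproduced here; purely to illustrate the shape of such a prepare run with cl=ALL, a hypothetical invocation might look like:)

```bash
# Hypothetical example only: the node address, population size and thread count
# are placeholders, not the values used by the perf job above.
cassandra-stress write n=500000000 cl=ALL -mode native cql3 \
  -rate threads=200 -node 10.0.0.1
```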
Within 30 minutes after the prepare stress commands started, all nodes started reporting a lot of reactor stalls of about 6-20 ms and lsa-time records in the log. Decoded reactor stalls attached: reactor_stalls_decoded_nodes.zip
on monitoring: http://3.237.101.163:3000/d/sZoKwKP4k/scylla-enterprise-perf-regression-latency-shard-aware-1tb-test-scylla-per-server-metrics-nemesis-master?orgId=1&from=1681736652253&to=1681773381114
we see the following problems for compactions:
vs. the latest successful run for Scylla: 5.3.0~dev-0.20230316.5705df77a155
and the following for 'Writes currently blocked on dirty':
vs. the latest successful run for Scylla: 5.3.0~dev-0.20230316.5705df77a155
Latest successful run is for scylla: 5.3.0~dev-0.20230316.5705df77a155
job: https://jenkins.scylladb.com/view/New%20Performance%20Jobs/job/scylla-master/job/scylla-master-perf-regression-latency-shard-aware-1TB-test/14
Issue started appearing from: 5.3.0~dev-0.20230331.160c184d0b0f with build-id 13d781a3205d092514f6642c9787a566aba7c110
Latest successful run: 5.3.0~dev-0.20230316.5705df77a155
Possible Scylla commit which brought the issue: 472b155
DB logs: https://cloudius-jenkins-test.s3.amazonaws.com/540d1d32-9700-4c1c-aecc-168260285dd3/20230417_225030/db-cluster-540d1d32.tar.gz