[CI] IpFilteringUpdateTests testThatInvalidDynamicIpFilterConfigurationIsRejected failing #102349
Comments
Pinging @elastic/es-security (Team:Security)
The assessment is the same ( These tests started failing recently with the same cause. I'm wondering if this is somehow related to the recent migration to Buildkite?
The issue might have existed even before the migration to Buildkite. Wondering if it surfaced now because many jobs are now grouped together under one pipeline?
They still all run on independent hosts though. There's been no change to the number of tests that run on any given agent.
This issue is the same in nature as #102870 and #101615: we try to bind to a range of 100 ports and fail. Some observations:
After some searching, I think the issue is the same as the one described in docker/for-win#3171, where some high ports from the dynamic range are excluded on Windows. Attempting to bind to some of these ports would fail with
Potential workarounds/solutions:
Another issue, but not related to this problem:
@brianseeders does https://github.com/elastic/elasticsearch-infra/tree/master/buildkite-tools work for Windows agents?
It does now! I just got my local changes for it fixed up and pushed. @slobodanadamovic see: This will create a Windows instance in GCP for you, configured the same way as the Buildkite worker. It will SSH into the box for you as well. Then, you can:
First, I've had issue with creating an instance due to my username containing
Now, the
Any idea what I'm missing here?
Ok, now after I've executed manually SSH command ( Running
After that, I was able to manually hardcode these port ranges in the
I'm thinking of taking a naive approach and simply increasing the range to 500 ports (out of the 16384 in the dynamic range) in these tests.
I think the issue with having larger port ranges is that we increase the likelihood of collisions between tests. Do we know why those ports are being excluded? Is there something we can change on our CI agents such that it no longer reserves these ports?
Unfortunately, I wasn't able to track down why these ports were excluded, or by whom.
AFAIK this would not be an issue in these tests. Collisions are anticipated and handled as long as we can find at least 1 available TCP port. The binding is done with the first available port in the range. If we are unable to find a free port in this range, then we would fail with elasticsearch/server/src/main/java/org/elasticsearch/transport/TcpTransport.java Lines 490 to 506 in 64a7900
I would even argue that setting the full dynamic port range (49152 - 65535) would be the safest option here and would avoid these failures even with a high number of tests. The downside is that this is not optimal, since all these tests would start iterating from 49152 until they find the first available port. Hence, I think that increasing the range to 500 (and still selecting it randomly) would not be an issue here.
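For readers following along, here is a minimal sketch of the "bind to the first available port in the range" behaviour described above. It is not the code in TcpTransport.java; the class and method names are hypothetical and it uses plain java.net sockets on the loopback address purely for illustration.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; not the Elasticsearch implementation.
final class FirstFreePortBinder {

    /**
     * Tries every port in [from, to] in ascending order and returns a socket
     * bound to the first one that can be bound. Ports that are in use (or
     * excluded by the OS) are skipped. If no port can be bound, the whole
     * attempt fails with a BindException.
     */
    static ServerSocket bindFirstAvailable(int from, int to) throws IOException {
        IOException lastFailure = null;
        for (int port = from; port <= to; port++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
                return socket;
            } catch (IOException e) {
                socket.close();      // release the unbound socket before trying the next port
                lastFailure = e;
            }
        }
        BindException failure = new BindException("Failed to bind to any port in [" + from + "-" + to + "]");
        if (lastFailure != null) {
            failure.initCause(lastFailure);
        }
        throw failure;
    }

    public static void main(String[] args) throws IOException {
        // Two binds over the same range: the second one skips the port taken by the first.
        try (ServerSocket first = bindFirstAvailable(49152, 49251);
             ServerSocket second = bindFirstAvailable(49152, 49251)) {
            System.out.println("first=" + first.getLocalPort() + " second=" + second.getLocalPort());
        }
    }
}
```

With this shape, a collision between two tests only costs a few extra bind attempts. The failure mode seen in this issue corresponds to the loop running out of candidates, which is what happens when every port in the chosen range is unavailable.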
Fair enough. Can we limit this to only Windows? Since this issue seems to be restricted to that platform, and we run fewer concurrent tests on Windows vs Linux as well.
@brianseeders any thoughts on what might be reserving these ports on Windows? Have you observed this before?
No idea. I can help figure it out when I get back, though. You can also RDP into the machines if you open them in the GCP console and go to Connect, if that is helpful.
This PR increases client's port ranges for tests which are executed on Windows in order to avoid failures due to some port ranges being excluded from use. The larger ports range (300) is chosen based on the observation where a random consecutive range of 200 ports is excluded. Closes elastic#102349
…03894) This PR increases client's port ranges for tests which are executed on Windows in order to avoid failures due to some port ranges being excluded from use. The larger ports range (300) is chosen based on the observation where a random consecutive range of 200 ports can be excluded on Windows test workers. Closes #102349
…astic#103894) This PR increases client's port ranges for tests which are executed on Windows in order to avoid failures due to some port ranges being excluded from use. The larger ports range (300) is chosen based on the observation where a random consecutive range of 200 ports can be excluded on Windows test workers. Closes elastic#102349 (cherry picked from commit bdf5c7f) # Conflicts: # modules/transport-netty4/src/internalClusterTest/java/org/elasticsearch/transport/netty4/Netty4TransportMultiPortIntegrationIT.java # x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/transport/filter/IpFilteringIntegrationTests.java
…03894) (#103910) This PR increases client's port ranges for tests which are executed on Windows in order to avoid failures due to some port ranges being excluded from use. The larger ports range (300) is chosen based on the observation where a random consecutive range of 200 ports can be excluded on Windows test workers. Closes #102349
…ows (#103894) (#103914) * [Test] Use larger client ports range for tests running on Windows (#103894) This PR increases client's port ranges for tests which are executed on Windows in order to avoid failures due to some port ranges being excluded from use. The larger ports range (300) is chosen based on the observation where a random consecutive range of 200 ports can be excluded on Windows test workers. Closes #102349 (cherry picked from commit bdf5c7f) # Conflicts: # modules/transport-netty4/src/internalClusterTest/java/org/elasticsearch/transport/netty4/Netty4TransportMultiPortIntegrationIT.java # x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/transport/filter/IpFilteringIntegrationTests.java * Fix compilation error
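As a rough illustration of the approach in the commit message above (a wider, randomly placed client port range on Windows only), here is a small sketch. The class, method names, the non-Windows width of 100 (taken from the "range of 100 ports" mentioned earlier in the thread), and the exact bounds are assumptions for illustration, not the code from the PR.

```java
import java.util.Locale;
import java.util.Random;

// Hypothetical illustration only; names and bounds are assumptions, not the PR's code.
final class ClientPortRange {

    static final int DYNAMIC_MIN = 49152;   // start of the dynamic port range
    static final int DYNAMIC_MAX = 65535;   // end of the dynamic port range

    /** 300 ports on Windows, 100 elsewhere (assumed default). */
    static int rangeWidth(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? 300 : 100;
    }

    /** Picks a random window of {@code width} ports inside the dynamic range, as "start-end". */
    static String randomRange(Random random, int width) {
        int maxStart = DYNAMIC_MAX - width + 1;
        int start = DYNAMIC_MIN + random.nextInt(maxStart - DYNAMIC_MIN + 1);
        return start + "-" + (start + width - 1);
    }

    public static void main(String[] args) {
        int width = rangeWidth(System.getProperty("os.name"));
        System.out.println(randomRange(new Random(), width));
    }
}
```

The arithmetic behind the choice: if a single consecutive block of 200 ports is excluded, any window of 300 ports still contains at least 100 ports outside that block, so the test keeps roughly the same number of usable candidates it had before.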
Looks like these tests are trying to use an occupied port. We have various logic in our base test classes to ensure this doesn't happen, but it sounds like this test is doing something unique that's making this condition more likely. This happens across all tests in IpFilteringUpdateTests, and I see similar errors in SslMultiPortTests as well.
Build scan:
https://gradle-enterprise.elastic.co/s/jiktat3ht3ye6/tests/:x-pack:plugin:security:internalClusterTest/org.elasticsearch.xpack.security.transport.filter.IpFilteringUpdateTests/testThatInvalidDynamicIpFilterConfigurationIsRejected
Reproduction line:
Applicable branches:
main, 8.11, 7.17
Reproduces locally?:
Didn't try
Failure history:
https://es-delivery-stats.elastic.dev/app/dashboards#/view/dcec9e60-72ac-11ee-8f39-55975ded9e63?_g=(refreshInterval:(pause:!t,value:60000),time:(from:now-7d%2Fd,to:now))&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testThatInvalidDynamicIpFilterConfigurationIsRejected'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.xpack.security.transport.filter.IpFilteringUpdateTests'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium))))
Failure excerpt: