Stargate fails to start because of too many open files #1286
Huh, well that's a new one. We run some things in the GitHub Actions free tier as well without a problem, granted it is with fewer nodes. The odd thing is that file descriptors shouldn't be heavily consumed until the services start taking traffic. In a resource-constrained environment I'd expect the error you're seeing to occur under load rather than on startup. We can take a look at Dropwizard to see if there's anything we can tune. Although I wonder if your runner had a noisy neighbor?
We have only seen this 2 or 3 times so maybe it is noisy neighbors.
@jsanda @Miles-Garnsey Maybe what we can do is attempt these tests on the self-hosted runner we're going to set up, keep the file descriptor limits at the defaults, and see if we run into this problem there. That would help rule out the noisy-neighbor problem.
The file limit error has only happened a few times. We can run the test N times on a self-hosted runner without the error happening. That doesn't mean it won't happen, but it does give increased confidence. We have been deploying nodes with heaps configured as low as 256 MB and 384 MB. Surprisingly that works fine a lot of the time, but we hit issues too often. The issues are not limited to this open file limit error. The situation is like a game of Jenga :)
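Since the failures are hard to reproduce, it can help to record the runner's open-file limits at the start of each CI run, so that a flaky run can be correlated with an unusually low limit. This is a minimal diagnostic sketch (not part of Stargate or K8ssandra) using Python's standard `resource` module:

```python
import resource


def log_fd_limits():
    """Print and return the soft and hard open-file limits for this process.

    Logging this at the start of a CI job records the environment before
    flakes like "too many open files" occur, helping distinguish a
    genuinely low limit from noisy-neighbor resource pressure.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"RLIMIT_NOFILE: soft={soft} hard={hard}")
    return soft, hard


if __name__ == "__main__":
    log_fd_limits()
```

On the GitHub-hosted runners the soft limit is typically well above the defaults of a bare container, which is one reason the failure is surprising there.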
@dougwettlaufer @jsanda How about creating a small fix here by adding a configuration option? Doug, I think we don't need bundle watching by default, but this might be a breaking change. So we could also go with an opt-in. UPDATE: even simpler solution, let's have that option.
I'm just curious here, what is the bundle watching used for @ivansenic ? |
With OSGi you can replace bundles at runtime. Meaning you can drop a new version of a jar into the folder we are watching, and that specific bundle is updated to the new version while the application is running.
Makes sense. I guess I was more wondering whether that's something that is often done with Stargate? Just trying to gauge, for example, whether it's an option we'd want to see exposed through K8ssandra, or whether it would be sufficient just to turn it off by default when deployed through K8ssandra.
If you ask me, and you do 😄, I would say it should be turned off in Kubernetes. I mean, this is old tech, developed for monoliths, and this bundle reloading was a way to achieve something you would nowadays do in the cloud. You have a new version? No problem, deploy. In fact, that's the whole benefit of cloud-native development: you can deploy as many times as you want.
Right, watching the directory is to enable the hot-reload use case, which really doesn't apply in the cloud. How about we add the option, then?
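The thread above converges on making bundle watching opt-in. A minimal sketch of that gating logic, assuming a hypothetical `ENABLE_BUNDLE_WATCH` environment variable (the actual Stargate flag name is not settled in this thread):

```python
import os


def bundle_watch_enabled(env=None):
    """Decide whether to start the bundle-watching thread.

    ENABLE_BUNDLE_WATCH is a hypothetical opt-in switch: hot reload
    stays off unless the operator explicitly turns it on, which is the
    safe default for Kubernetes deployments.
    """
    if env is None:
        env = os.environ
    return env.get("ENABLE_BUNDLE_WATCH", "false").lower() in ("1", "true", "yes")
```

Defaulting to off avoids holding inotify/file handles open in environments where hot reload is never used, at the cost of being a behavior change for anyone relying on the old default.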
With a fix for stargate/stargate#1286 I think we might be able to reduce resource requirements in our test fixtures. This will be helpful for freeing up limiting resources in the GHA free runner.
* reduce on and off heap memory for C* and Stargate
* add comment explaining the usage of the custom image
I am running Stargate 1.0.31 in Kubernetes with K8ssandra. The Stargate image used is stargateio/stargate-3_11:v1.0.31. In one of our automated tests we have seen Stargate fail to start a few times with a resource limit error like this:
This is in a CI environment with limited CPU/memory resources. The test is running in the free tier runner in GitHub Actions. The runner VM has 2 CPUs and 7 GB memory. The particular test in which this failed had already deployed two Cassandra nodes and one Stargate node. This failure is from the second Stargate node.
I believe the open file limit on the VM is set to 65536. I don't think I am able to increase it. Maybe the solution is to run my tests in an environment with more resources, but it would be nice if Stargate could be less demanding, especially considering this happens on startup.
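One workaround worth checking before moving to a bigger runner: on Linux, an unprivileged process can raise its own *soft* open-file limit up to the hard limit without any special permissions, so a low soft limit inside the container is sometimes fixable from the process itself. A sketch using Python's `resource` module (whether this helps depends on whether the hard limit on the runner is actually higher than the soft one):

```python
import resource


def raise_nofile_soft_limit():
    """Raise this process's soft open-file limit to the hard limit.

    Only raising the hard limit requires elevated privileges; lifting
    the soft limit up to the existing hard limit is always allowed.
    Returns the (soft, hard) pair after the adjustment.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Skip the unlimited case: NOFILE cannot be set to RLIM_INFINITY on Linux.
    if hard != resource.RLIM_INFINITY and soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For a JVM process like Stargate the equivalent would be an `ulimit -n` adjustment in the container entrypoint, since limits are inherited by child processes.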