Cannot expose ambassador app, node had condition DiskPressure #129

Closed
camille-rodriguez opened this issue Nov 4, 2019 · 2 comments

Comments

@camille-rodriguez

Hi,
I deployed CDK on AWS using the bundle, and that went well. I then deployed Kubeflow on top of it, ran into some issues with the script (see my other bug), and worked around them manually. Now, when I try to expose the ambassador app to access the Kubeflow dashboard, the first ambassador-auth unit goes into this error: "The node was low on resource: ephemeral-storage. Container ambassador was using 93222, which exceeds its request of 0." All the other ambassador-auth units go into error with the message "Pod The node had condition: [DiskPressure]." It also looks like, because of this error, juju tried to scale out and spawned 8 more ambassador-auth units, which all ended up in the same state. The machines are used only for this test; nothing else runs on them.

How can I resolve this issue? I am unable to fully deploy Kubeflow at the moment.

My CDK status
image

juju kubeflow model status
image
image

kubectl status
image

Something else is weird too, and I do not know whether it is due to Kubeflow or juju (probably juju): I tried to scale down the number of ambassador-auth units, and it did the opposite.

$ juju scale-application ambassador-auth 2
ambassador-auth scaled to 2 units

gave that result... I have 17 units in error now o.o
image

@knkski
Contributor

knkski commented Nov 5, 2019

@camille-rodriguez: Can you try adding a CDK worker node with a decent amount (100G+) of disk space, then retrying the deploy? It looks like none of the worker nodes has enough disk space.
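
For anyone hitting the same error, the suggestion above could be sketched roughly as the two shell helpers below. This is only a sketch under assumptions: the function names are made up for illustration, the worker application is assumed to be called `kubernetes-worker` (the usual name in the CDK bundle, not confirmed in this thread), and the model name has to be supplied by the caller. The first helper lists each node with the status of its DiskPressure condition; the second asks Juju for a 100G root disk on future worker units and then adds one unit.

```shell
#!/bin/sh
# Hypothetical helpers. The application name "kubernetes-worker" and the
# 100G root disk are assumptions based on the suggestion in this thread.

# Print each node name with the status of its DiskPressure condition.
# "True" means the kubelet considers the node short on disk.
disk_pressure_report() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
}

# Request a larger root disk for worker units added from now on,
# then add one worker unit.
# $1: name of the Juju model that holds the CDK deployment.
add_big_worker() {
  model="$1"
  juju set-constraints -m "$model" kubernetes-worker root-disk=100G
  juju add-unit -m "$model" kubernetes-worker
}
```

After sourcing the file, `disk_pressure_report` can be run as-is, and `add_big_worker` takes the name of the CDK model as its only argument. The constraint only applies to units created afterwards, so existing workers keep their current disks.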

@camille-rodriguez
Author

I followed the workaround in the other issue I opened, and everything deployed fine this time. This bug was probably a side effect of the other issues I faced. Thank you!
