BUG: akash provider lease-shell stops working when pod gets restarted due to eviction #42
Comments
I thought I had looked at something similar in the past, let me see if I can find it.
Looks similar to issue akash-network/node#1480. Maybe this got reintroduced somehow?
Maybe the fix wasn't applied to the master branch previously.
Reproduced on master branch, will have to track this down to see what is going on.
I mean was it fixed on
It's hitting this line: https://github.com/ovrclk/akash/blob/master/provider/cluster/kube/client_exec.go#L100. I think we need to filter out the pods that have failed & whatnot before trying to run the command.
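A minimal sketch of the filtering idea, not the actual code in client_exec.go. It assumes the exec path has a list of pods with a status phase to pick from; the `Pod` type and the `runningPods` helper are hypothetical stand-ins so the example runs without the Kubernetes client libraries. The phase strings are the standard Kubernetes ones (an evicted pod is reported as Failed, and a pod shown as "Completed" has phase Succeeded).

```go
package main

import "fmt"

// PodPhase mirrors the phase strings Kubernetes reports in a pod's status.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
	PodUnknown   PodPhase = "Unknown"
)

// Pod is a hypothetical stand-in for the pod objects returned by a list call.
type Pod struct {
	Name  string
	Phase PodPhase
}

// runningPods (hypothetical helper) keeps only pods in the Running phase,
// preserving their original order, so a command is never attempted against
// an evicted or completed pod.
func runningPods(pods []Pod) []Pod {
	out := make([]Pod, 0, len(pods))
	for _, p := range pods {
		if p.Phase == PodRunning {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "web-old", Phase: PodFailed},  // e.g. left behind after an eviction
		{Name: "web-new", Phase: PodRunning}, // replacement pod
	}
	for _, p := range runningPods(pods) {
		fmt.Println(p.Name)
	}
}
```

With a filter like this in front of the exec call, an empty result can be reported as "no running pods" instead of attempting the command against a dead pod.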
@arno01 oh yeah, now that I look at this, are you sure the pod restarts? When I do this locally while watching the Kubernetes cluster, the pod moves to "completed". After a while the provider closes the lease because the containers aren't running. The Kubernetes pod has a restart policy of "Always", but apparently that doesn't mean anything of the sort.
I tried changing to "OnFailure" (since "Never" seems like a poor choice) but that gives me this error
@sacreman any suggestions here?
@hydrogen18 I've tested this again just now: TL;DR Looks like that issue is isolated to a single provider - Europlots.
Evidence (Lumen)
Evidence (Europlots)
I've asked the provider for
Given that the namespace is the same, there must be some issue on his side.
Evidence (Akash.Pro)
I'm confused that we can't seem to reproduce this across all providers uniformly at this point. Do we know if there are any differences in configuration between those?
There is a workaround per @boz, making it sev2.
There are new findings in https://github.com/ovrclk/engineering/issues/538
Reproducer
akash provider lease-shell stops working.
Version 0.16.4-rc0: akash provider & client are both of version 0.16.4-rc0.
lease-status after the eviction
lease-events logs