fix container stats endpoint response handling #183

Merged
merged 1 commit into from
Aug 31, 2022

Conversation

dermetfan
Contributor

@dermetfan dermetfan commented Jul 21, 2022

Podman 4.1.1 changed the stats endpoint's HTTP status for stopped containers from 404 to 200.

Fixes #182
Closes #187
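
A minimal sketch of the kind of response handling this change implies, not the code from this PR. It assumes that Podman 4.1.1 and later answer a stats request for a stopped container with status 200 and an empty body; the thread does not say what the body contains, so that part is an assumption. The names `decodeStats` and `ErrContainerNotRunning` are hypothetical, and the stats payload is decoded into a generic map so that no field names are implied.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// ErrContainerNotRunning is a hypothetical sentinel error used by this sketch.
var ErrContainerNotRunning = errors.New("container is not running")

// decodeStats interprets the status code and body of a stats response.
// It returns ErrContainerNotRunning for both the old and the new way
// Podman signals a stopped container.
func decodeStats(status int, body []byte) (map[string]interface{}, error) {
	switch {
	case status == http.StatusNotFound:
		// Before Podman 4.1.1: a stopped container answers 404.
		return nil, ErrContainerNotRunning
	case status != http.StatusOK:
		return nil, fmt.Errorf("unexpected stats status code: %d", status)
	case len(bytes.TrimSpace(body)) == 0:
		// ASSUMPTION: since Podman 4.1.1 a stopped container answers 200
		// with an empty body, so a 200 alone does not prove it is running.
		return nil, ErrContainerNotRunning
	}

	var stats map[string]interface{}
	if err := json.Unmarshal(body, &stats); err != nil {
		return nil, fmt.Errorf("decoding stats: %w", err)
	}
	return stats, nil
}
```

A caller would read the whole response body first and pass it in together with `res.StatusCode`; treating both signals as the same error lets the rest of the driver stay unchanged across Podman versions.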

@hashicorp-cla

hashicorp-cla commented Jul 21, 2022

CLA assistant check
All committers have signed the CLA.

api/container_stats.go (outdated review thread, resolved)
@DemonicTutor

Were you able to figure out the error log message? I still do not know where it comes from; if I execute the podman commands from the console, it's fine.

@jdoss
Contributor

jdoss commented Jul 21, 2022

I am not sure this is working. I pulled the PR and built it for testing on my test cluster and it is still leaving containers after jobs have been stopped.

Podman 4.1.1 changed the REST API HTTP status for stopped containers
from 404 to 200.

fix #182
@dermetfan
Contributor Author

@DemonicTutor I have not looked into the error message specifically but it says

cannot get cgroup path unless container […] is running: container is stopped

which, to me, sounds like Nomad is trying to read the cgroup of a stopped container because, due to #182, it thinks the container is still running when it really is already stopped. So that error might go away with this fix.

@zandeez

zandeez commented Aug 19, 2022

I can confirm this resolves the issue for me on Rocky Linux 9, Podman 4.1.1, Nomad 1.3.3.

@Procsiab
Contributor

Procsiab commented Aug 24, 2022

Hello there, I would like to give my feedback too on this PR: I merged it into the main branch at the commit 4efeb99, and the compiled driver is working as expected with Nomad 1.3.3 and Podman 4.2.0 (Fedora IoT 36.20220822.0).

Edit: I should point out that with the "official" 0.4.0 driver in the same environment, if I upload a new version of a particular job or stop a job, the Nomad client hangs after stopping the pre-existing allocation until I manually remove the stopped containers with the podman CLI.
This does not happen with the driver compiled as I described above.

@shoenig shoenig self-requested a review August 30, 2022 16:18
@shoenig
Member

shoenig commented Aug 31, 2022

Hi @dermetfan, thanks for the PR! I gave this a try and the functionality of stopping a container seems to work.

I did notice what looks like a similar side effect of Podman's breaking API change, but we can open another issue for that:

2022-08-31T09:36:59.652-0500 [WARN]  client.driver_mgr.nomad-driver-podman: Could not remove container: driver=podman @module=podman container=7968dcd1e3e2d0574251c1ca06b632792a720f46b7c8eb7bff425b8c852befa9 error="cannot delete container, status code: 200" timestamp=2022-08-31T09:36:59.651-0500
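
The warning above suggests the delete call treats a 200 response as a failure. As a hedged sketch only, assuming the driver previously accepted just 204 No Content as success (the thread does not confirm this), a status check that tolerates both codes could look like the following. The function name `checkDeleteStatus` is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// checkDeleteStatus reports whether a container-delete response should be
// treated as success. ASSUMPTION: older Podman versions answer 204 and
// newer ones may answer 200, so both are accepted here.
func checkDeleteStatus(status int) error {
	switch status {
	case http.StatusOK, http.StatusNoContent:
		return nil
	default:
		// Same wording as the message seen in the log line above.
		return fmt.Errorf("cannot delete container, status code: %d", status)
	}
}
```

Accepting a small set of success codes rather than a single one keeps the check robust if the API changes its status again, while any other code still surfaces as an error.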

@shoenig shoenig merged commit 9a3f94f into hashicorp:main Aug 31, 2022
lgfa29 added a commit that referenced this pull request Nov 15, 2022
@lgfa29 lgfa29 added this to the v0.4.1 milestone Nov 15, 2022
lgfa29 added a commit that referenced this pull request Nov 15, 2022
Development

Successfully merging this pull request may close these issues.

Nomad does not see stopping jobs with podman 4.1.1 (rest-api response change)