Better support for sidecar containers in batch jobs #25908

Closed
a-robinson opened this issue May 19, 2016 · 130 comments
Labels
area/batch
area/workload-api/job
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@a-robinson
Contributor

a-robinson commented May 19, 2016

Consider a Job with two containers in it -- one which does the work and then terminates, and another which isn't designed to ever explicitly exit but provides some sort of supporting functionality like log or metric collection.

What options exist for doing something like this? What options should exist?

Currently the Job will keep running as long as the second container keeps running, which means that the user has to modify the second container in some way to detect when the first one is done so that it can cleanly exit as well.

This question was asked on Stack Overflow a while ago with no better answer than to modify the second container to be more Kubernetes-aware, which isn't ideal. Another customer has recently brought this up to me as a pain point for them.

@kubernetes/goog-control-plane @erictune

@soltysh
Contributor

soltysh commented May 23, 2016

/sub

@erictune erictune added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Jul 7, 2016
@mingfang

Also, using a liveness probe as suggested in http://stackoverflow.com/questions/36208211/sidecar-containers-in-kubernetes-jobs doesn't work, since the pod will be considered failed and the overall job will not be considered successful.

@mingfang

How about we declare a job success probe, so that the Job can probe it to detect success instead of waiting for the pod to return 0?
Once the probe returns success, the pod can be terminated.

@erictune
Member

Can a probe run against a container that has already exited, or would there be a race where it is being torn down?

Another option is to designate certain exit codes as having special meaning.

Both "Success for the entire pod" or "failure for the entire pod" are both
useful.

This would need to be on the Pod object, so that is a big API change.

@mingfang

mingfang commented Sep 23, 2016

@erictune Good point; we can't probe an exited container.

Can we designate a particular container in the pod as the "completion" container so that when that container exits we can say the job is completed?

Sidecar containers tend to be long-lived, for things like log shipping and monitoring.
We can force-terminate them once the job is completed.

@soltysh
Contributor

soltysh commented Sep 26, 2016

Can we designate a particular container in the pod as the "completion" container so that when that container exits we can say the job is completed?

Have you looked into this doc, point 3 (described in detail here), where you basically don't set .spec.completions, and as soon as the first container finishes with a 0 exit code the job is done?

Sidecar containers tend to be long-lived, for things like log shipping and monitoring.
We can force-terminate them once the job is completed.

Personally, these look to me more like an RS rather than a Job, but that's my personal opinion, and most importantly I don't know the full details of your setup.

Generally, the following discussions, #17244 and #30243, touch on this topic as well.

@mingfang

@soltysh in the link you sent above, point 3 references pod completion, not container completion.

@erictune erictune added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 6, 2016
@erictune
Member

erictune commented Oct 6, 2016

The two containers can share an emptyDir, and the first container can write an "I'm exiting now" message to a file; the other can exit when it sees that message.
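
A minimal sketch of that pattern (the image names, /work/run-job, and /sidecar/start are placeholders):

containers:
  - name: main
    image: my-batch-image            # hypothetical
    command: ["/bin/sh", "-c"]
    args:
      - |
        # do the real work, then signal the sidecar via the shared volume
        /work/run-job
        touch /signal/done
    volumeMounts:
      - mountPath: /signal
        name: signal
  - name: sidecar
    image: my-sidecar-image          # hypothetical
    command: ["/bin/sh", "-c"]
    args:
      - |
        # run the sidecar process in the background, then poll for the
        # signal file and shut the process down once main has finished
        /sidecar/start &
        PID=$!
        while [ ! -f /signal/done ]; do sleep 1; done
        kill $PID
    volumeMounts:
      - mountPath: /signal
        name: signal
volumes:
  - name: signal
    emptyDir: {}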

@anshumanbh

@erictune I have a use case which I think falls in this bucket, and I am hoping you can guide me in the right direction, since there doesn't seem to be any officially recommended way to solve this problem.

I am using the client-go library to code everything below:

So, I have a job that basically runs a tool in a one-container pod. As soon as the tool finishes running, it is supposed to produce a results file. I can't seem to capture this results file, because as soon as the tool finishes running, the pod is deleted and I lose the results file.

I was able to capture this results file when I used hostPath as a VolumeSource, and since I am running minikube locally, the results file gets saved onto my workstation.

But I understand that's not recommended or ideal for production containers. So I used emptyDir as suggested above. But again, if I do that, I can't really capture the file, because it gets deleted along with the pod itself.

So, should I be solving my problem using the sidecar container pattern as well?

Basically, do what you suggested above: start 2 containers in the pod whenever the job starts. One container runs the job and, as soon as the job gets done, drops a message that gets picked up by the other container, which then grabs the result file and stores it somewhere?

I fail to understand why we would need 2 containers in the first place. Why can't the job container do all this by itself? That is, finish the job, save the results file somewhere, then access/read it and store it somewhere.

@soltysh
Contributor

soltysh commented Feb 14, 2017

@anshumanbh I'd suggest you:

  1. use persistent storage to save the result file (see the sketch after this list)
  2. use a hostPath mount, which is almost the same as 1, and you've already tried it
  3. upload the result file to a known remote location (S3, Google Drive, Dropbox), generally any kind of shared drive
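
For option 1, a rough sketch (assuming a PersistentVolumeClaim named results-pvc already exists; the image and paths are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: tool-run
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tool
          image: my-tool-image      # hypothetical
          command: ["/bin/sh", "-c", "/tool/run --out /results/result.json"]
          volumeMounts:
            - mountPath: /results
              name: results
      volumes:
        - name: results
          persistentVolumeClaim:
            claimName: results-pvc

The result file survives the pod because it lives on the claim rather than in the pod's own filesystem.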

@anshumanbh

@soltysh I don't want the file to be stored permanently. On every run, I just want to compare the result with the last result. So, the way I was thinking of doing this was committing to a GitHub repository on every run and then doing a diff to see what changed. In order to do that, I just need to store the result temporarily somewhere so that I can access it and send it to GitHub. Make sense?

@soltysh
Contributor

soltysh commented Feb 20, 2017

@anshumanbh perfectly clear, and still that doesn't fall into the category of a sidecar container. All you want to achieve is currently doable with what Jobs provide.

@anshumanbh

@soltysh so considering I want to go for option 3 from the list you suggested above, how would I go about implementing it?

The problem I am facing is that as soon as the job finishes, the container exits and I lose the file. If I don't have the file, how do I upload it to a shared drive like S3/Google Drive/Dropbox? I can't modify the job's code to automatically upload it somewhere before it quits, so unfortunately I would have to first run the job and then save the file somewhere.

@soltysh
Contributor

soltysh commented Feb 23, 2017

If you can't modify the job's code, you need to wrap it in such a way that you are able to upload the file. If what you're working with is already an image, just extend it with the copying code.
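
A sketch of such a wrapper as the container command (the tool path, the bucket name, and the presence of the aws CLI in the image are assumptions):

containers:
  - name: tool
    image: my-tool-plus-aws-cli     # hypothetical: the original tool image extended with an upload client
    command: ["/bin/sh", "-c"]
    args:
      - |
        # run the unmodified tool, then copy its output before the container exits
        /tool/run --out /tmp/result.json
        aws s3 cp /tmp/result.json s3://my-results-bucket/result-$(date +%s).json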

@anshumanbh

@soltysh yes, that makes sense. I could do that. However, the next question I have is - suppose I need to run multiple jobs (think about it as running different tools) and none of these tools have the uploading part built in them. So, now, I would have to build that wrapper and extend each one of those tools with the uploading part. Is there a way I can just write the wrapper/extension once and use it for all the tools?

Wouldn't the sidecar pattern fit in that case?

@soltysh
Contributor

soltysh commented Feb 23, 2017

Yeah, it could, although I'd try the multiple-containers-inside-the-same-pod pattern. In other words, your pod runs the job container and, alongside it, an additional one waiting for the output and uploading it. Not sure how feasible this is, but you can already give it a try.
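
A rough sketch of that shape, which also answers the write-the-wrapper-once question, since the uploader container is generic and the tool image stays unmodified (the image names and bucket are placeholders):

containers:
  - name: tool
    image: tool-a:latest            # any unmodified tool image
    command: ["/bin/sh", "-c", "/tool/run --out /out/result.json"]
    volumeMounts:
      - mountPath: /out
        name: out
  - name: uploader
    image: my-uploader:latest       # hypothetical reusable uploader image
    command: ["/bin/sh", "-c"]
    args:
      - |
        # wait for the tool's result to appear, upload it, then exit 0
        while [ ! -f /out/result.json ]; do sleep 1; done
        aws s3 cp /out/result.json s3://my-results-bucket/
    volumeMounts:
      - mountPath: /out
        name: out
volumes:
  - name: out
    emptyDir: {}

With restartPolicy: Never on the pod, the Job completes once both containers have exited 0.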

@jmillikin-stripe
Contributor

jmillikin-stripe commented Jun 14, 2017

Gentle ping -- sidecar awareness would make management of microservice proxies such as Envoy much more pleasant. Is there any progress to share?

The current state of things is that each container needs bundled tooling to coordinate lifetimes, which means we can't directly use upstream container images. It also significantly complicates the templates, as we have to inject extra argv and mount points.

An earlier suggestion was to designate some containers as a "completion" container. I would like to propose the opposite -- the ability to designate some containers as "sidecars". When the last non-sidecar container in a Pod terminates, the Pod should send TERM to the sidecars. This would be analogous to the "background thread" concept found in many threading libraries, e.g. Python's Thread.daemon.

Example config; when container main ends, the kubelet would kill envoy:

containers:
  - name: main
    image: gcr.io/some/image:latest
    command: ["/my-batch-job/bin/main", "--config=/config/my-job-config.yaml"]
  - name: envoy
    image: lyft/envoy:latest
    sidecar: true
    command: ["/usr/local/bin/envoy", "--config-path=/my-batch-job/etc/envoy.json"]

@jmillikin-stripe
Contributor

jmillikin-stripe commented Jun 14, 2017

For reference, here's the bash madness I'm using to simulate the desired sidecar behavior:

containers:
  - name: main
    image: gcr.io/some/image:latest
    command: ["/bin/bash", "-c"]
    args:
      - |
        # on any exit, signal the sidecar through the shared emptyDir
        trap "touch /tmp/pod/main-terminated" EXIT
        /my-batch-job/bin/main --config=/config/my-job-config.yaml
    volumeMounts:
      - mountPath: /tmp/pod
        name: tmp-pod
  - name: envoy
    image: gcr.io/our-envoy-plus-bash-image:latest
    command: ["/bin/bash", "-c"]
    args:
      - |
        # run envoy in the background and poll for main's signal file;
        # once the file appears, kill envoy and exit 0 so the Job can succeed
        /usr/local/bin/envoy --config-path=/my-batch-job/etc/envoy.json &
        CHILD_PID=$!
        (while true; do if [[ -f "/tmp/pod/main-terminated" ]]; then kill $CHILD_PID; fi; sleep 1; done) &
        wait $CHILD_PID
        if [[ -f "/tmp/pod/main-terminated" ]]; then exit 0; fi
    volumeMounts:
      - mountPath: /tmp/pod
        name: tmp-pod
        readOnly: true
volumes:
  - name: tmp-pod
    emptyDir: {}

@soltysh
Contributor

soltysh commented Aug 2, 2017

I would like to propose the opposite -- the ability to designate some containers as "sidecars". When the last non-sidecar container in a Pod terminates, the Pod should send TERM to the sidecars.

@jmillikin-stripe I like this idea, although I'm not sure it fits the principle of not treating some containers in a Pod differently or of not introducing dependencies between them. I'll defer to @erictune for the final call.

Although, have you checked #17244? Would this type of solution fit your use case? This is what @erictune mentioned a few comments before:

Another option is to designate certain exit codes as having special meaning.

@jmillikin-stripe
Contributor

@jmillikin-stripe I like this idea, although I'm not sure it fits the principle of not treating some containers in a Pod differently or of not introducing dependencies between them. I'll defer to @erictune for the final call.

I think Kubernetes may need to be flexible about the principle of not treating containers differently. We (Stripe) don't want to retrofit third-party code such as Envoy to have Lamprey-style lifecycle hooks, and trying to adopt an Envelope-style exec inversion would be much more complex than letting the kubelet terminate specific sidecars.

Although, have you checked #17244? Would this type of solution fit your use case? This is what @erictune mentioned a few comments before:

Another option is to designate certain exit codes as having special meaning.

I'm very strongly opposed to Kubernetes or Kubelet interpreting error codes at a finer granularity than "zero or non-zero". Borglet's use of exit code magic numbers was an unpleasant misfeature, and it would be much worse in Kubernetes where a particular container image could be either a "main" or "sidecar" in different Pods.

@msperl

msperl commented Aug 5, 2017

Maybe additional lifecycle hooks would be sufficient to solve this?

Could be:

  • PostStop: with a means to trigger lifecycle events on other containers in the pod (i.e. trigger a stop)
  • PeerStopped: a signal that a "peer" container in the pod has died, possibly with the exit code as an argument

This could also provide a means to define custom policies for restarting a container, or even for starting containers that are not started by default, to allow some daisy-chaining of containers (when container A finishes, then start container B).

@oxygen0211

oxygen0211 commented Sep 6, 2017

Also missing this. We run a job every 30 minutes that needs a VPN client for connectivity, and there seem to be a lot of use cases where this could be very useful (for example, anything that needs kubectl proxy). Currently I am using concurrencyPolicy: Replace in the CronJob spec as a workaround (rough sketch below), but of course this only works if (a) you can live without parallel job runs and (b) the job execution time is shorter than the scheduling interval.

EDIT: in my use case, it would be completely sufficient to have some property in the job spec to mark a container as the terminating one, and have the job monitor that one for exit status and kill the remaining ones.
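
The workaround, sketched with the current batch/v1 CronJob API (the schedule and image names are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: vpn-job
spec:
  schedule: "*/30 * * * *"
  concurrencyPolicy: Replace        # each new run replaces a previous, still-running one
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: work
              image: my-job-image   # hypothetical: exits when the work is done
            - name: vpn
              image: my-vpn-client  # hypothetical: never exits on its own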

@mvanholsteijn

mvanholsteijn commented Feb 7, 2021

Thank you for your response. How fast can you get the essential feature implemented and released?

I consider Kubernetes the father of the concept of running multiple containers together as a "pod". The simple fact that this concept breaks as soon as you run a Job or CronJob, and has been left broken for almost 5 years, is beyond my comprehension.

@dims
Member

dims commented Feb 7, 2021

please move discussions to the KEP:
kubernetes/enhancements#753

please see the latest updates to the KEP here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers

last update was here:
https://github.com/kubernetes/enhancements/pull/1980/files

/close

@dims dims closed this as completed Feb 7, 2021
@davi5e

davi5e commented Feb 7, 2021

For an alpha feature, shouldn't the first attempt solve the Job conundrum? Most if not all Jobs have one container per pod (before sidecars get inserted).

That would be a huge leap toward getting an idea of how to solve it for other workloads too, in my modest opinion.

@justinmchase

Can we get an update on this, @dims?

The initial KEP kubernetes/enhancements#753 that you linked to appears to have been closed with this message:

This KEP will not be progressing, Sig-node and others feel that this is not an incremental step in the right direction so they've gone back to the drawing board and will be coming up with some new KEPs that can hopefully solve all the use cases stated in this KEP as well as others.

Also this comment on your other thread about the "last update" says:

I think we should merge this but add a note saying this KEP has been rejected

I, like many others, just encountered this issue, spent a bunch of time trying to figure out whether there is a solution and, ultimately, a workaround, and would love to have an update.

Also, for what it's worth, I agree with @krancour's comment that a simple 90% solution is warranted at this point.

@adisky
Contributor

adisky commented Aug 18, 2021

@justinmchase and others: based on the above discussions and input from @dims, a simple solution has been proposed in kubernetes/enhancements#2872. It would be helpful to give feedback on the proposal.

@arianf

arianf commented Oct 26, 2022

I'd like to mention another workaround that I don't see called out here yet. Instead of using a shared volume to communicate between the sidecar/keystone containers, you could use shareProcessNamespace: true and kill the sidecar container's process from the keystone container (as the last thing your keystone container does before finishing).

@justinmchase

@arianf Can you share an example, even if pseudocode, of how to do this? Would it just be killing all processes other than your own?

@arianf

arianf commented Oct 31, 2022

Here is an example; I'm using nginx as the sidecar because it's easy to run without any configuration.

Note the caveat with this approach: you need to make sure the keystone container doesn't finish before the sidecar container has started up, or else it won't be able to kill the other process.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: shared-process-cronjob
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      template:
        spec:
          shareProcessNamespace: true  # all containers in the pod share one PID namespace
          restartPolicy: Never
          containers:
          - name: keystone
            image: busybox
            command:
              - /bin/sh
              - -c
              - >-
                echo "Starting work at $(date)";
                sleep 100;
                echo "Ending work at $(date)";
                pkill -SIGTERM nginx
          - name: sidecar
            image: nginx

You can see the pod was listed as "Completed":

$ k get pods                                        
shared-process-cronjob-1667249160-zmwl7         0/2     Completed          0          2m10s

And nginx's exit code was 0:

$ k describe pods/shared-process-cronjob-1667249160-zmwl7 | grep -A10 "sidecar:"           
  sidecar:
    Container ID:   containerd://2853be81e63f8ef78440f11a315f8cffce6264abc54ed17a12de9deae5ba4b1f
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:943c25b4b66b332184d5ba6bb18234273551593016c0e0ae906bab111548239f
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 31 Oct 2022 13:46:50 -0700
      Finished:     Mon, 31 Oct 2022 13:48:28 -0700

@taylorchu

We use an in-pod supervisor (https://github.com/taylorchu/wait-for) to solve the sidecar container issue. It is simple and can be used even if you don't use k8s.

@robert-gdv

@dims wrote on Feb 7, 2021:

please move discussions to the KEP: kubernetes/enhancements#753

please see the latest updates to the KEP here: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers

last update was here: https://github.com/kubernetes/enhancements/pull/1980/files

/close

If you read the thread, they had just come to the conclusion "We need an essential container flag". But then it gets closed and discussion is redirected to kubernetes/enhancements#753, an issue that was itself already closed (on Oct 21, 2020)?
Even all the links leading out of it are dead ends. This is disappointing.
It looks like 753 tried to boil the ocean, failed, and in the end we don't even get hot tea.

@dims
Member

dims commented Feb 6, 2023

Folks,

Please see kubernetes/enhancements#3761 (New KEP for sidecar containers)
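
For reference, that KEP models sidecars as init containers with restartPolicy: Always (alpha since Kubernetes 1.28). A minimal sketch of that shape, with placeholder image names:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-sidecar
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: logshipper
          image: my-log-shipper     # hypothetical sidecar image
          restartPolicy: Always     # marks this init container as a restartable sidecar
      containers:
        - name: main
          image: my-batch-image     # hypothetical; the pod can complete once this exits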

thanks,
Dims
