Better support for sidecar containers in batch jobs #25908

Closed
a-robinson opened this issue May 19, 2016 · 122 comments
@a-robinson (Member) commented May 19, 2016

Consider a Job with two containers in it -- one which does the work and then terminates, and another which isn't designed to ever explicitly exit but provides some sort of supporting functionality like log or metric collection.

What options exist for doing something like this? What options should exist?

Currently the Job will keep running as long as the second container keeps running, which means that the user has to modify the second container in some way to detect when the first one is done so that it can cleanly exit as well.

This question was asked on Stack Overflow a while ago with no better answer than to modify the second container to be more Kubernetes-aware, which isn't ideal. Another customer has recently brought this up to me as a pain point for them.

@kubernetes/goog-control-plane @erictune

@soltysh (Contributor) commented May 23, 2016

/sub

@erictune erictune added the sig/apps label Jul 7, 2016
@mingfang commented Sep 22, 2016

Also, using a liveness probe as suggested here http://stackoverflow.com/questions/36208211/sidecar-containers-in-kubernetes-jobs doesn't work, since the pod will be considered failed and the overall job will not be considered successful.

@mingfang commented Sep 22, 2016

How about we declare a job success probe, so that the Job can probe it to detect success instead of waiting for the pod to return 0?
Once the probe returns success, the pod can be terminated.

@erictune (Member) commented Sep 22, 2016

Can a probe run against a container that has already exited, or would there be a race while it is being torn down?

Another option is to designate certain exit codes as having special meaning. Both "success for the entire pod" and "failure for the entire pod" would be useful.

This would need to be on the Pod object, so that is a big API change.


@mingfang commented Sep 23, 2016

@erictune Good point; we can't probe an exited container.

Can we designate a particular container in the pod as the "completion" container so that when that container exits we can say the job is completed?

The sidecar containers tend to be long-lived, for things like log shipping and monitoring.
We can force terminate them once the job is completed.

@soltysh (Contributor) commented Sep 26, 2016

Can we designate a particular container in the pod as the "completion" container so that when that container exits we can say the job is completed?

Have you looked at this doc, point 3, described in detail here, where you basically don't set .spec.completions, and as soon as the first pod finishes with a 0 exit code the job is done?

The sidecar containers tend to be long-lived, for things like log shipping and monitoring.
We can force terminate them once the job is completed.

Personally, these look to me more like a ReplicaSet than a Job, but that's my personal opinion and, most importantly, I don't know the full details of your setup.

Generally, there are the following discussions, #17244 and #30243, which touch on this topic as well.
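For reference, the "don't set .spec.completions" option from point 3 might look roughly like the sketch below; the image and command are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: single-completion-job
spec:
  # .spec.completions intentionally left unset: the Job is treated as
  # complete as soon as one Pod terminates successfully.
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example/worker:latest   # placeholder image
        command: ["/bin/run-work"]     # placeholder command
```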

@mingfang commented Sep 26, 2016

@soltysh the link you sent above (point 3) references pod completion, not container completion.

@erictune erictune added the sig/node label Oct 6, 2016
@erictune (Member) commented Oct 6, 2016

The two containers can share an emptyDir; the first container can write an "I'm exiting now" message to a file, and the other can exit when it sees that message.
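This shared-file handshake can be tried outside Kubernetes. The sketch below simulates the two containers with two subshells sharing a temp directory, which stands in for the emptyDir volume:

```shell
workdir=$(mktemp -d)   # stands in for the shared emptyDir volume

# "Sidecar": polls for the tombstone file and exits once it appears.
( while [ ! -f "$workdir/main-terminated" ]; do sleep 0.2; done
  echo "sidecar: exiting" ) &
sidecar_pid=$!

# "Main": does its work, then writes the tombstone on exit via a trap.
( trap 'touch "$workdir/main-terminated"' EXIT
  echo "main: work finished" )

wait "$sidecar_pid"
echo "pod would now be complete"
```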

@anshumanbh commented Feb 10, 2017

@erictune I have a use case which I think falls in this bucket and I am hoping you could guide me in the right direction since there doesn't seem to be any official recommended way to solve this problem.

I am using the client-go library to code everything below:

So, I have a job that basically runs a tool in a one-container pod. As soon as the tool finishes running, it is supposed to produce a results file. I can't seem to capture this results file, because as soon as the tool finishes running, the pod is deleted and I lose the results file.

I was able to capture this results file if I used HostPath as a VolumeSource and since I am running minikube locally, the results file gets saved onto my workstation.

But I understand that's neither recommended nor ideal for production containers. So I used EmptyDir as suggested above. But again, if I do that, I can't really capture the file because it gets deleted along with the pod itself.

So, should I be solving my problem using the sidecar container pattern as well?

Basically, do what you suggested above. Start 2 containers in the pod whenever the job starts. 1 container runs the job and as soon as the job gets done, drops a message that gets picked up by the other container which then grabs the result file and stores it somewhere?

I fail to understand why we would need 2 containers in the first place. Why can't the job container do all this by itself? That is, finish the job, save the results file somewhere, access it/read it and store it somewhere.

@soltysh (Contributor) commented Feb 14, 2017

@anshumanbh I'd suggest you either:

  1. use persistent storage to save the result file,
  2. use a hostPath mount, which is almost the same as 1 (and you've already tried it), or
  3. upload the result file to a known remote location (S3, Google Drive, Dropbox), generally any kind of shared drive
@anshumanbh commented Feb 14, 2017

@soltysh I don't want the file to be stored permanently. On every run, I just want to compare the result with the last result. So the way I was thinking of doing this was committing to a GitHub repository on every run and then doing a diff to see what changed. In order to do that, I just need to store the result temporarily somewhere so that I can access it to send it to GitHub. Make sense?

@soltysh (Contributor) commented Feb 20, 2017

@anshumanbh perfectly clear, and still that doesn't fall into the category of a sidecar container. All you want to achieve is currently doable with what Jobs provide.

@anshumanbh commented Feb 22, 2017

@soltysh so considering I want to go for option 3 from the list you suggested above, how would I go about implementing it?

The problem I am facing is that as soon as the job finishes, the container exits and I lose the file. If I don't have the file, how do I upload it to a shared drive like S3/Google Drive/Dropbox? I can't modify the job's code to automatically upload it somewhere before it quits, so unfortunately I would have to first run the job and then save the file somewhere.

@soltysh (Contributor) commented Feb 23, 2017

If you can't modify the job's code, you need to wrap it in such a way that you're able to upload the file. If what you're working with is already an image, just extend it with the copying code.
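A generic wrapper entrypoint along those lines might look like the sketch below. Here `upload` is a stub standing in for whatever transfer step you use in practice (an `aws s3 cp`, a git push, a curl to a shared drive), and `echo` stands in for the unmodified tool:

```shell
dest=$(mktemp -d)                          # stands in for S3 / Google Drive / Dropbox
upload() { cp "$1" "$dest/result.txt"; }   # stub uploader for the demo

# Wrapper: run the unmodified tool, capture its output file, and upload it
# before the container would exit.
run_and_upload() {
  outfile=$(mktemp)
  "$@" > "$outfile"                        # the tool itself, unchanged
  upload "$outfile"
}

run_and_upload echo "scan complete"        # "echo" stands in for the real tool
cat "$dest/result.txt"
```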

@anshumanbh commented Feb 23, 2017

@soltysh yes, that makes sense. I could do that. However, the next question I have is - suppose I need to run multiple jobs (think about it as running different tools) and none of these tools have the uploading part built in them. So, now, I would have to build that wrapper and extend each one of those tools with the uploading part. Is there a way I can just write the wrapper/extension once and use it for all the tools?

Wouldn't the side car pattern fit in that case?

@soltysh (Contributor) commented Feb 23, 2017

Yeah, it could. Although I'd try the multiple-containers-inside-the-same-pod pattern first. In other words, your pod runs the job container and, alongside it, an additional one that waits for the output and uploads it. I'm not sure how feasible this is, but you can give it a try already.

@jmillikin-stripe (Contributor) commented Jun 14, 2017

Gentle ping -- sidecar awareness would make management of microservice proxies such as Envoy much more pleasant. Is there any progress to share?

The current state of things is that each container needs bundled tooling to coordinate lifetimes, which means we can't directly use upstream container images. It also significantly complicates the templates, as we have to inject extra argv and mount points.

An earlier suggestion was to designate some containers as a "completion" container. I would like to propose the opposite -- the ability to designate some containers as "sidecars". When the last non-sidecar container in a Pod terminates, the Pod should send TERM to the sidecars. This would be analogous to the "background thread" concept found in many threading libraries, e.g. Python's Thread.daemon.

Example config, when container main ends the kubelet would kill envoy:

containers:
  - name: main
    image: gcr.io/some/image:latest
    command: ["/my-batch-job/bin/main", "--config=/config/my-job-config.yaml"]
  - name: envoy
    image: lyft/envoy:latest
    sidecar: true
    command: ["/usr/local/bin/envoy", "--config-path=/my-batch-job/etc/envoy.json"]
@jmillikin-stripe (Contributor) commented Jun 14, 2017

For reference, here's the bash madness I'm using to simulate desired sidecar behavior:

containers:
  - name: main
    image: gcr.io/some/image:latest
    command: ["/bin/bash", "-c"]
    args:
      - |
        trap "touch /tmp/pod/main-terminated" EXIT
        /my-batch-job/bin/main --config=/config/my-job-config.yaml
    volumeMounts:
      - mountPath: /tmp/pod
        name: tmp-pod
  - name: envoy
    image: gcr.io/our-envoy-plus-bash-image:latest
    command: ["/bin/bash", "-c"]
    args:
      - |
        /usr/local/bin/envoy --config-path=/my-batch-job/etc/envoy.json &
        CHILD_PID=$!
        (while true; do if [[ -f "/tmp/pod/main-terminated" ]]; then kill $CHILD_PID; fi; sleep 1; done) &
        wait $CHILD_PID
        if [[ -f "/tmp/pod/main-terminated" ]]; then exit 0; fi
    volumeMounts:
      - mountPath: /tmp/pod
        name: tmp-pod
        readOnly: true
volumes:
  - name: tmp-pod
    emptyDir: {}
@soltysh (Contributor) commented Aug 2, 2017

I would like to propose the opposite -- the ability to designate some containers as "sidecars". When the last non-sidecar container in a Pod terminates, the Pod should send TERM to the sidecars.

@jmillikin-stripe I like this idea, although I'm not sure if this follows the principle of treating some containers differently in a Pod or introducing dependencies between them. I'll defer to @erictune for the final call.

Although, have you checked #17244, would this type of solution fit your use-case? This is what @erictune mentioned a few comments before:

Another option is to designate certain exit codes as having special meaning.

@jmillikin-stripe (Contributor) commented Aug 2, 2017

@jmillikin-stripe I like this idea, although I'm not sure if this follows the principle of treating some containers differently in a Pod or introducing dependencies between them. I'll defer to @erictune for the final call.

I think Kubernetes may need to be flexible about the principle of not treating containers differently. We (Stripe) don't want to retrofit third-party code such as Envoy to have Lamprey-style lifecycle hooks, and trying to adopt an Envelope-style exec inversion would be much more complex than letting the Kubelet terminate specific sidecars.

Although, have you checked #17244, would this type of solution fit your use-case? This is what @erictune mentioned a few comments before:

Another option is to designate certain exit codes as having special meaning.

I'm very strongly opposed to Kubernetes or Kubelet interpreting error codes at a finer granularity than "zero or non-zero". Borglet's use of exit code magic numbers was an unpleasant misfeature, and it would be much worse in Kubernetes where a particular container image could be either a "main" or "sidecar" in different Pods.

@msperl commented Aug 5, 2017

Maybe additional lifecycle hooks would be sufficient to solve this?

They could be:

  • PostStop: a means to trigger lifecycle events on other containers in the pod (i.e. trigger a stop)
  • PeerStopped: a signal that a "peer" container in the pod has died, possibly with the exit code as an argument

This could also provide a means to define custom policies for restarting a container, or even for starting containers that are not started by default, to allow some daisy-chaining of containers (when container a finishes, then start container b).

@oxygen0211 commented Sep 6, 2017

Also missing this. We run a job every 30 minutes that needs a VPN client for connectivity, and there seem to be a lot of use cases where this could be very useful (for example, anything that needs kubectl proxy). Currently I am using jobSpec.concurrencyPolicy: Replace as a workaround, but of course this only works if (a) you can live without parallel job runs and (b) the job execution time is shorter than the scheduling interval.

EDIT: in my use case, it would be completely sufficient to have some property in the job spec to mark a container as the terminating one, and have the job monitor that one for exit status and kill the remaining ones.
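For anyone reaching for the same workaround, the relevant part of the spec looks roughly like the sketch below (names and images are placeholders; note that concurrencyPolicy sits on the CronJob spec, not the Job spec):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vpn-batch-job
spec:
  schedule: "*/30 * * * *"
  concurrencyPolicy: Replace   # a new run replaces a previous, still-running one
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: example/worker:latest   # placeholder
          - name: vpn-client
            image: example/vpn:latest      # placeholder sidecar
```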

@nrmitchi commented Mar 12, 2020

@Datamance at this point the KEP to address this issue looks like it's indefinitely on hold.

I posted this comment a while back with my old solution. I'm not trying to push my own stuff here; it's just that the comment has been lost in the "100 more comments..." of GitHub, and I figured resurfacing it might be useful again.

@krancour (Member) commented Mar 12, 2020

@nrmitchi thanks for reposting that. I'm one who had overlooked it in the sea of comments and this looks like a fantastic near-term solution.

@ruiyang2015 commented Mar 12, 2020

We figured out a different approach. If you add the following to your Pod's containers:

    securityContext:
      capabilities:
        add:
        - SYS_PTRACE

then you will be able to find the PIDs of processes in the other containers. We run the following at the end of our main container:

    sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT $sql_proxy_pid
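The kill-by-name step can be sanity-checked locally. In the sketch below, a backgrounded sleep stands in for the proxy process; TERM is used instead of INT only because backgrounded children of a non-interactive shell ignore INT:

```shell
# Stand-in for the long-running sidecar process (e.g. cloud_sql_proxy):
sleep 300 &
sidecar_pid=$!

# Last step of the "main" container: find the sidecar by name and signal it.
proxy_pid=$(pgrep -x sleep) && kill -TERM $proxy_pid

sleep 1
if kill -0 "$sidecar_pid" 2>/dev/null; then state=running; else state=terminated; fi
echo "sidecar is $state"
```

In a real Pod this also requires a shared process namespace so pgrep can see the other container's processes.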

@nrmitchi commented Mar 12, 2020

@krancour glad it helped. If you look at the network in that repository there are a couple forks which are almost-definitely in a better place than my original one, and might be better to build off of/use.

IIRC the lemonade-hq fork had some useful additions.

@krancour (Member) commented Mar 12, 2020

@nrmitchi, I've been glancing at the code, but it may be quicker to just ask you...

Can you briefly comment on whatever pre-requisites may exist that aren't mentioned in the README?

For instance, do the images your sidecars are based on require any special awareness of this workaround? e.g. Do they need to listen on a specific port for a signal from the controller? Or perhaps must they include a certain shell (bash?)

@nrmitchi commented Mar 12, 2020

@krancour I'll preface my response with a note that this solution was written a couple years ago and my memory may be a bit rusty.

It was designed at the time so that the containers in question did not need to be aware of the workaround. We were primarily using third-party applications as sidecars (for example, I believe stripe/veneur was one) and did not want to fork or modify them.

The only requirement of the sidecars is that they properly listen for a SIGTERM signal and then shut down. I recall having some issues with third-party code running in sidecars that expected a different signal and had to be worked around, but really the controller should have just allowed the signal to be specified (i.e., SIGINT instead of SIGTERM).

They don't need to listen on any ports for a signal, since the controller uses an exec to signal the main process of the sidecar directly. IIRC, at the time that functionality was copied out of the Kubernetes code because it didn't exist in the client. I believe it exists in the official client now, and the controller should probably be updated.

@sbocinec commented Mar 12, 2020

We figured out a different approach. If you add the following to your Pod's containers:

    securityContext:
      capabilities:
        add:
        - SYS_PTRACE

then you will be able to find the PIDs of processes in the other containers. We run the following at the end of our main container:

    sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT $sql_proxy_pid

@ruiyang2015 thanks for this hack.
If anyone is implementing it, though, be sure to understand the implications of sharing a process namespace between the containers.

@krancour (Member) commented Mar 12, 2020

@nrmitchi

uses an exec to signal the main process of the sidecar directly

That is part of why I asked... I guess, specifically, I am wondering whether this fails for containers based on images that are built FROM scratch.

@nrmitchi commented Mar 12, 2020

@krancour Fair point. I never tested it with containers built off of scratch. Looking at the code (or my original version; this could have changed in the forks), it looks like it's going to be dependent on bash, but it should be possible to modify that.

@krancour (Member) commented Mar 12, 2020

it's going to be dependent on bash, but should be able to be modified

Sure, but as long as it's exec'ing, it's always going to depend on some binary being present in the container, and in a scratch container there's nothing there except whatever you put there explicitly. 🤷‍♂

Given that limitation, I can't use this for a use case where the containers that are running might be totally arbitrary and specified by a third party. Oh-- and then I have Windows containers in play, too.

I'll mention what I am going to settle on instead. It's probably too heavy-handed for most use cases, but I'm mentioning it in case someone else's use case is similar enough to mine to get away with this...

I can afford the luxury of simply deleting a pod whose "primary" container has exited, as long as I record the exit status first. So I'm going to end up writing a controller that will monitor some designated (via annotation) container for completion, record its success or failure in a datastore that already tracks "job" status, and then delete the pod entirely.

For good measure, I will probably put a little delay on the pod delete to maximize my central log aggregation's chances of getting the last few lines of the primary container's output before it's torpedoed.

Heavy-handed, but may work for some.

@nrmitchi commented Mar 13, 2020

@krancour Totally true. As is, the controller will not work for arbitrary use cases. Honestly, I never went back and attempted to abstract some of the implementation to support other cases, because I really thought that the previously mentioned KEP would have been merged, making the need for this functionality moot.

ianabc added a commit to pimsmath/hubtraf that referenced this issue Mar 21, 2020
This seems to be on its way to becoming a Kubernetes feature.
The issue is discussed at length in [this
issue](kubernetes/kubernetes#25908) and there
is an [open PR](kubernetes/enhancements#912)
which should help close it.

Signed-off-by: Ian Allison <iana@pims.math.ca>
@karlkfi (Contributor) commented Apr 26, 2020

Given that this issue is like 4 years old, the KEP hasn’t gone anywhere yet, and the state of the art is a hacky inline shell script replacing every entrypoint, I decided to codify the “standard” hack (tombstones in a shared volume) into a Go binary that can be easily baked into container images using a multi-stage build.

https://github.com/karlkfi/kubexit

There’s a few ways to use it:

  1. Bake it into your images
  2. Side load it using an init container and an ephemeral volume.
  3. Provision it on each node and side load it into containers using a host bind mount

Edit: v0.2.0 now supports “birth dependencies” (delayed start) and “death dependencies” (self-termination).

@vanzin commented Jul 9, 2020

Drive-by comment: this looks exactly like kubernetes/enhancements#753

@muru commented Jul 10, 2020

@vanzin as has been noted before, that KEP is on indefinite hold.

@andbuitra commented Dec 1, 2020

My use case for this is that Vault provides credentials for a CronJob. Once the task is done, the Vault sidecar is still running and the Job sits in a pending state, which triggers the monitoring system into thinking something is wrong. It's a shame what happened to the KEP.

@gonssal commented Jan 30, 2021

I just got hit by this issue when using the Cloud SQL Proxy, and I'm in disbelief that there isn't a way to tell a (Cron)Job which containers to take into account when deciding it has completed. I was expecting something like:

template:
  spec:
    targets:
    - container1
    - container2

But after reading the documentation: nothing. There's no way. Then I found this issue, along with some others about sidecars in Jobs, and again nothing, except yet another sidecar with kubexit or some manual process monitoring/killing.

How is this frozen and not a priority instead?

@mvanholsteijn commented Feb 6, 2021

Note that AWS ECS has had the essential property on containers since the inception of ECS. If a container is marked essential and it fails or stops for any reason, all other containers that are part of the task are stopped too.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html

@krancour (Member) commented Feb 6, 2021

@mvanholsteijn that's interesting.

Something that I think has gone on with this issue is that the perfect has become the enemy of the good.

There are a lot of people who don't need anything more than to be able to designate a container (or n containers?) as "the important container" among multiple containers in a pod. If we could do just that, then jobs, for instance, could be considered complete when the designated container(s) exit.

But the issue has turned into defining container startup and shutdown order, etc. That sounds like it would be a great feature, but my intuition says it's not what most of us following this issue need. Our needs are basic: a simple boolean field for marking a container "essential" is really all we require, and it would not likely interfere with future efforts to express container startup and shutdown order.

@mvanholsteijn commented Feb 7, 2021

Thank you for your response. How fast can you get the essential feature implemented and released?

I consider Kubernetes the father of the concept of running multiple containers together as a "pod". The simple fact that this concept breaks as soon as you run a Job or CronJob, and has been left broken for almost 5 years, is beyond my comprehension.

@dims (Member) commented Feb 7, 2021

@dims dims closed this Feb 7, 2021
Workloads automation moved this from Backlog to Done Feb 7, 2021
@davi5e commented Feb 7, 2021

For an alpha feature, shouldn't the first attempt solve the Job conundrum? Most, if not all, Jobs have one container per pod (before sidecars get injected).

That would be a huge leap into getting an idea of how to solve it for other workloads too, in my modest opinion.
