Custom BackoffLimit & concurrencyPolicy for SCDF Tasks are not passed to PODS while executing in Openshift environment #398

Closed
ilayaperumalg opened this issue Sep 21, 2020 · 20 comments
Assignees
Labels
status/in-progress Something is happening

Comments

@ilayaperumalg
Contributor

@Srkanna commented on Sat Sep 19 2020

I'm trying to set backoffLimit & concurrencyPolicy for batch jobs that are executed in an OpenShift environment via SCDF. Currently I'm setting these two at the global server config level. The resource limits and imagePullPolicy configurations are being passed to the CronJob, but backoffLimit and concurrencyPolicy are not.

I'm experiencing this in 2.6.1 and earlier versions as well. Below is the server-config.yaml.

  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              dev:
                limits:
                    memory: 1024Mi
                    cpu: 1
                entry-point-style: exec
                image-pull-policy: always
                backoffLimit: 1
                maxCrashLoopBackOffRestarts: 1
                concurrencyPolicy: forbid
  datasource:
    url: ${oracle-root-url}
    username: ${oracle-root-username}
    password: ${oracle-root-password}
    driver-class-name: oracle.jdbc.OracleDriver
    testOnBorrow: true
    validationQuery: "SELECT 1"
  flyway:
    enabled: false
  jpa:
    hibernate:
      use-new-id-generator-mappings: true

Neither backoffLimit nor maxCrashLoopBackOffRestarts is passed to the pod configuration. I still see pods being restarted 6 times instead of once after a failure. Below is the CronJob YAML, which I extracted from the OpenShift cluster console after creating the schedule in SCDF for a batch job.

kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: batchjob1
  namespace: dev-batch
  selfLink: /apis/batch/v1beta1/namespaces/dev-batch/cronjobs/batchjob1
  uid: bef709dc-fa3a-11ea-933e-001a4a1a0116
  resourceVersion: '144552724'
  creationTimestamp: '2020-09-19T05:41:20Z'
  labels:
    spring-cronjob-id: batchjob1
spec:
  schedule: '*/10 * * * *'
  concurrencyPolicy: Allow
  suspend: false
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
            - name: batchjob1
              image: >-
                docker-registry.default.svc:5000/batch/batch-job:0.0.4
              args:
                - '--spring.datasource.username=BATCH_APP'
                - '--spring.cloud.task.name=batchjob1'
                - >-
                  --spring.datasource.url=jdbc:oracle:thin:@URL
                - '--spring.datasource.driverClassName=oracle.jdbc.OracleDriver'
                - '--spring.datasource.password=password'
                - '--spring.batch.job.names=Job1'
              env:
                - name: SPRING_CLOUD_APPLICATION_GUID
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.uid
              resources:
                limits:
                  cpu: '1'
                  memory: 1Gi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              imagePullPolicy: Always
          restartPolicy: Never
          terminationGracePeriodSeconds: 30
          dnsPolicy: ClusterFirst
          serviceAccountName: default
          serviceAccount: default
          securityContext: {}
          schedulerName: default-scheduler
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
status: {}

Kindly let me know your inputs. @ilayaperumalg @sabbyanandan


@ilayaperumalg commented on Mon Sep 21 2020

Hi @Srkanna,

This looks like a bug. Moving this to Spring Cloud Deployer Kubernetes. Thanks for reporting.

@chrisjs
Contributor

chrisjs commented Sep 21, 2020

looks more like an unimplemented use case than a bug. also, concurrencyPolicy currently isn't a supported property. supported deployer properties can be found in the corresponding documentation, for example:

https://docs.spring.io/spring-cloud-dataflow/docs/2.6.1/reference/htmlsingle/#configuration-kubernetes-deployer

@chrisjs chrisjs added the type/help-needed Calling help label Sep 21, 2020
@Srkanna

Srkanna commented Sep 30, 2020

I can add the concurrencyPolicy directly to the cronjob.yaml created in the OpenShift environment when a scheduled task is submitted. However, I couldn't do the same for backoffLimit. Is there any workaround for setting the backoff limit?
@chrisjs @ilayaperumalg

@chrisjs
Contributor

chrisjs commented Sep 30, 2020

please provide

  • reproducible steps to create a scheduled job in the same way you are

  • the changes you are trying to make when editing object(s) directly

@Srkanna

Srkanna commented Oct 5, 2020

I cannot edit the original post as it wasn't created by me, so I'm posting the steps to reproduce here.

  1. SCDF installation & task creation:
    I followed the steps provided at https://dataflow.spring.io/docs/installation/kubernetes/kubectl. You can use the same batch job used in the documentation (Link here). Use the Kubernetes version of the documentation.

Once SCDF was deployed in the OpenShift environment, I imported the batch job application as a Docker image using a Docker URI instead of a Maven repo URL. I actually built my batch application in the OpenShift environment and used that Docker URI.

Now I can schedule the available batch job, and the request is submitted to the OpenShift environment. Below is the config-map used by the tasks.

  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              dev:
                limits:
                    memory: 1024Mi
                    cpu: 1
                entry-point-style: exec
                image-pull-policy: always
                backoffLimit: 1
                maxCrashLoopBackOffRestarts: 1
                concurrencyPolicy: forbid
  datasource:
    url: ${oracle-root-url}
    username: ${oracle-root-username}
    password: ${oracle-root-password}
    driver-class-name: oracle.jdbc.OracleDriver
    testOnBorrow: true
    validationQuery: "SELECT 1"
  flyway:
    enabled: false
  jpa:
    hibernate:
      use-new-id-generator-mappings: true

The image-pull-policy property is passed through to OpenShift, but these are not:

                backoffLimit: 1
                maxCrashLoopBackOffRestarts: 1
                concurrencyPolicy: forbid

I found that these are not passed through to OpenShift by going to cluster console -> Workloads -> CronJobs, then choosing the corresponding OpenShift project from the dropdown list at the top.
1)
image

image

In the image above I could edit the YAML for any job we scheduled in SCDF. It also accepts properties like concurrencyPolicy: forbid. However, I can't find any property for backoffLimit.

Our jobs run at very short intervals, mostly 2-3 minutes apart. If a job fails, the pod is created 6 times, and it takes more than 5 minutes for all of them to complete. In the meantime, the next scheduled execution also starts and fails, which creates another 6 pods. This exhausts the resources in no time.

So it would be great if there were a property to limit pod creation on failure.
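
To put rough numbers on the pile-up described above, here is a minimal Python sketch. It assumes the retry behaviour described in the Kubernetes Job documentation (delays of 10s, 20s, 40s, ... capped at six minutes, with a default backoffLimit of 6). The function names and the 180-second schedule interval are illustrative assumptions, and real timings also include pod start-up and run time, so treat the result as a lower bound rather than a measurement.

```python
import math


def backoff_delays(retries, base_seconds=10, cap_seconds=360):
    """Delay before each retry: doubles from base_seconds, capped at cap_seconds."""
    return [min(base_seconds * 2 ** i, cap_seconds) for i in range(retries)]


def overlapping_runs(retry_window_seconds, schedule_interval_seconds):
    """Scheduled runs (including the failing one) that start within the retry window."""
    return math.ceil(retry_window_seconds / schedule_interval_seconds)


# Assumed default backoffLimit of 6 versus the desired limit of 1.
default_window = sum(backoff_delays(6))  # 10+20+40+80+160+320 = 630 seconds
limited_window = sum(backoff_delays(1))  # 10 seconds

print(default_window, overlapping_runs(default_window, 180))
print(limited_window, overlapping_runs(limited_window, 180))
```

Under these assumptions the back-off delays alone add up to more than ten minutes, which is consistent with several scheduled runs stacking up before the first one gives up.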

@chrisjs
Contributor

chrisjs commented Oct 7, 2020

looks like there are a couple of things here.. i don't have openshift to test against but that's likely not a concern. i've made an attempt to reproduce what you're seeing and made the notes below, as well as opened some issues to track.

1 - the kubernetes property concurrencyPolicy is currently not implemented in the deployer. i have opened an enhancement issue to do so which is located at: #406. please feel free to contribute if you can.

2 - the deployer property maxCrashLoopBackOffRestarts is used only by the deployer for state checking when a container is in CrashLoopBackOff state. this is not a kubernetes property nor does it get set on any pod, job, etc. possibly there is a kubernetes property you would like to use that has similar intended functionality. if the desired property is not currently supported by the deployer, we can have that opened as a separate enhancement issue.

3 - in regards to backoffLimit there are two things at play here:

a) when a task is launched, a "bare pod" is used by default unless you set the deployer property createJob, which results in the task being run in a kubernetes Job rather than a bare pod. the backoffLimit property applies to Jobs, so you need to enable that; note createJob: true:

              dev:
                limits:
                    memory: 1024Mi
                    cpu: 1
                createJob: true
                backoffLimit: 1

while the above should work, in the current state, the backoffLimit property when set through the ConfigMap is not being passed through from data flow to the deployer properly. i have opened an issue for that located here: spring-cloud/spring-cloud-dataflow#4186

b) to work around backoffLimit not being passed correctly via the configmap, you can set a deployer property when you launch each task, for example:

task create --name t2 --definition "timestamp" 
task launch --name t2 --properties "deployer.timestamp.kubernetes.backoffLimit=1"

results in the following objects:

the job:

job.batch/t2-ddopmx83lp   0/1           45s        45s

the spawned pod:

pod/t2-ddopmx83lp-88b7q            1/1     Running   0          45s

when inspecting the job.batch/t2-ddopmx83lp object, you would then find the backoffLimit property set, ie:

   backoffLimit: 1

4 - when scheduling a task, setting the backoffLimit on a CronJob object is not currently implemented - i have opened an enhancement issue here: #407

I think only spring-cloud/spring-cloud-dataflow#4186 needs to be resolved to close this issue as it's the only "bug". the others are logged feature enhancements or incorrect property usage.
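
Point 3a can be sketched in a few lines of Python. This is an illustration only, not the deployer's actual code: `choose_task_object` is a hypothetical helper, and plain dicts stand in for the Kubernetes objects. It shows why, as described above, backoffLimit only takes effect when createJob is enabled: a bare pod has no field to carry it.

```python
def choose_task_object(deployer_props):
    """Return a simplified description of what a task launch would create.

    deployer_props is a dict of deployer properties such as
    {"createJob": True, "backoffLimit": 1}. Hypothetical helper for illustration.
    """
    if not deployer_props.get("createJob", False):
        # Default: a bare pod. backoffLimit is a Job field, so it is dropped here.
        return {"kind": "Pod"}
    job = {"kind": "Job", "spec": {}}
    if "backoffLimit" in deployer_props:
        job["spec"]["backoffLimit"] = int(deployer_props["backoffLimit"])
    return job


print(choose_task_object({"backoffLimit": 1}))
print(choose_task_object({"createJob": True, "backoffLimit": 1}))
```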

@szopal

szopal commented May 6, 2022

It seems it would be enough to add the following to the class KubernetesDeployerProperty:

private int backoffLimit = 0;

public int getBackoffLimit() {
	return backoffLimit;
}

public void setBackoffLimit(int backoffLimit) {
	this.backoffLimit = backoffLimit;
}

and in KubernetesScheduler:
cronJob.getSpec().getJobTemplate().getSpec().setBackoffLimit(properties.getBackoffLimit());

and it should work. Anyone?
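
The same idea can be expressed as a minimal Python sketch, with plain dicts standing in for the fabric8 CronJob model. That substitution is an assumption made to keep the example self-contained, and `apply_backoff_limit` is a hypothetical name. In the Kubernetes CronJob API the field lives at spec.jobTemplate.spec.backoffLimit, which is why it does not show up at the top level of the CronJob spec.

```python
def apply_backoff_limit(cron_job, backoff_limit):
    """Set spec.jobTemplate.spec.backoffLimit on a CronJob-shaped dict, in place."""
    if backoff_limit is None:
        return cron_job  # nothing configured: leave the cluster default untouched
    job_spec = (
        cron_job.setdefault("spec", {})
        .setdefault("jobTemplate", {})
        .setdefault("spec", {})
    )
    job_spec["backoffLimit"] = int(backoff_limit)
    return cron_job


cron_job = {
    "kind": "CronJob",
    "spec": {"schedule": "*/10 * * * *", "concurrencyPolicy": "Allow"},
}
apply_backoff_limit(cron_job, 1)
print(cron_job["spec"]["jobTemplate"]["spec"])
```

Unlike the Java snippet, which defaults the field to 0, this sketch leaves it unset when no value is given. Which default is preferable is a design choice.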

@saugion

saugion commented Mar 30, 2023

I'm not able to make this work. I tried to create a schedule from the SCDF UI, once with spring.cloud.deployer.kubernetes.backoffLimit=1 as an argument and once with deployer.kubernetes.backoffLimit=1 as a property, but in neither case was it taken into account.

Any suggestion? Tnx

@onobc
Contributor

onobc commented Mar 30, 2023

Hi @saulgiordani

In the "Launch Task" screen in the UI, do you see backoffLimit as an option in Deployment Platform -> Properties -> Edit ? If not, what options do you see in there?

The screenshot below is for the "local" (not "kubernetes") platform
Screen Shot 2023-03-30 at 09 16 49

Try instead using the property deployer.<your-app-name>.kubernetes.backoffLimit=<your-backoff-limit>.
The format is deployer.<app>.<platform>.<property-path>=<property-value>.
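
As a small illustration of that format, here is a hedged Python sketch that builds and parses such entries. Both helper names are hypothetical and are not part of SCDF. The parser also shows why a key such as deployer.kubernetes.backoffLimit does not fit the pattern: the app segment is missing.

```python
def deployer_property(app, platform, path, value):
    """Build one 'deployer.<app>.<platform>.<property-path>=<property-value>' entry."""
    for part in (app, platform, path):
        if not part:
            raise ValueError("app, platform and property path must be non-empty")
    return f"deployer.{app}.{platform}.{path}={value}"


def parse_deployer_property(entry):
    """Split an entry into (app, platform, property_path, value)."""
    key, _, value = entry.partition("=")
    parts = key.split(".", 3)
    if len(parts) != 4 or parts[0] != "deployer":
        raise ValueError(f"not of the form deployer.<app>.<platform>.<path>: {key}")
    _, app, platform, path = parts
    return app, platform, path, value


print(deployer_property("timestamp", "kubernetes", "backoffLimit", 1))
print(parse_deployer_property("deployer.timestamp.kubernetes.backoffLimit=1"))
```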

If you choose "Free text" in the "Launch Task -> Deployment Platform -> Properties" screen you can see how the UI sets the properties.

Screen Shot 2023-03-30 at 09 27 18

@saugion

saugion commented Mar 30, 2023

Hi @onobc, i'm trying to add the backoffLimit from the schedules view, not the launch view
Screenshot 2023-03-30 at 16 51 45

@onobc
Contributor

onobc commented Mar 30, 2023

Yeh, my bad on the screens @saulgiordani - you are in the scheduler.

Still, try my suggestion of instead using the property deployer.<your-app-name>.kubernetes.backoffLimit=<your-backoff-limit>.

Note, the scheduler properties ("spring.cloud.scheduler.kubernetes") are deprecated and have been replaced w/ the deployer properties ("spring.cloud.deployer.kubernetes") - although the code still handles both.
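
Handling both prefixes could look roughly like the Python sketch below. This is not the deployer's actual lookup code: the `lookup` helper is hypothetical, and the precedence shown (current prefix first, deprecated prefix as a fallback) is an assumption made for illustration.

```python
DEPLOYER_PREFIX = "spring.cloud.deployer.kubernetes."
SCHEDULER_PREFIX = "spring.cloud.scheduler.kubernetes."  # deprecated prefix


def lookup(props, name, default=None):
    """Look up `name` under the current prefix first, then the deprecated one."""
    for prefix in (DEPLOYER_PREFIX, SCHEDULER_PREFIX):
        key = prefix + name
        if key in props:
            return props[key]
    return default


only_old = {SCHEDULER_PREFIX + "backoffLimit": "0"}
both = {DEPLOYER_PREFIX + "backoffLimit": "1", SCHEDULER_PREFIX + "backoffLimit": "0"}

print(lookup(only_old, "backoffLimit"))
print(lookup(both, "backoffLimit"))
```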

@saugion

saugion commented Mar 31, 2023

Hi @onobc, i've tried with the following parameters in the properties text area:

  • scheduler.kubernetes.taskServiceAccountName=scdf-sa
  • deployer.my_app_name.kubernetes.backoffLimit=0 (also tried with deployer.my_app_name.kubernetes.backoff-limit=0 and scheduler.deployer.kubernetes.backoff-limit=0 with no luck)

The 1st property is applied correctly (if I use deployer instead of scheduler, it is NOT applied); the 2nd is not.

@imitbn

imitbn commented Apr 26, 2023

It seems that it's not possible. BackoffLimit is not exposed in KubernetesScheduler:
https://github.com/spring-cloud/spring-cloud-deployer/blob/main/spring-cloud-deployer-kubernetes/src/main/java/org/springframework/cloud/deployer/spi/kubernetes/KubernetesScheduler.java#L237
However, KubernetesTaskLauncher does expose it.

@onobc
Contributor

onobc commented Apr 26, 2023

Good catch @imitbn ,

I will add this to the KubernetesScheduler as well @saulgiordani .

@saugion

saugion commented Apr 26, 2023

> Good catch @imitbn ,
>
> I will add this to the KubernetesScheduler as well @saulgiordani .

Great, thanks!

@onobc
Contributor

onobc commented Apr 26, 2023

If all goes well, we can get it squeezed into 2.10.3 which is planned to release in a few days.

@onobc onobc self-assigned this Apr 26, 2023
@onobc onobc added status/in-progress Something is happening and removed type/help-needed Calling help labels Apr 26, 2023
@onobc
Contributor

onobc commented Apr 26, 2023

Closing this in favor of #407 as I think everything else besides that is done in this issue.

@onobc onobc closed this as completed Apr 26, 2023
@fgapito

fgapito commented May 29, 2023

Hi,

I still cannot schedule a task with backoffLimit = 0. I put this in my SCDF application.yml file:
image

I see that the OpenShift CronJobs are still generated without the backoffLimit property.

What else do I have to do to make it work?

Thank you.

f

@saugion

saugion commented May 29, 2023

Hi, the following is working fine for me:
apiVersion: batch/v1
kind: CronJob
metadata:
  creationTimestamp: "2023-03-30T14:02:08Z"
  generation: 2
  labels:
    spring-cronjob-id: ewd-conversor
  name: ewd-test
  namespace: x2
  resourceVersion: "1515110"
  uid: d240493a-84de-45ba-8de9-417849036152
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      backoffLimit: 1

If you want to set it through the scheduler, put this as property:
scheduler.kubernetes.cron.backoffLimit=0

and you will see that the generated CronJob definition includes the backoffLimit.

@fgapito

fgapito commented May 29, 2023

Thank you, but as far as I understood, scheduler.kubernetes.cron.backoffLimit has been deprecated, isn't that so?

Where should I put this scheduler.kubernetes.cron.backoffLimit=0?

EDIT: it works if I put that here:
image

@saugion

saugion commented May 29, 2023

That's the way; this behaviour was actually added in SCDF 2.10.3.


8 participants