Automatically generated server UUID in rundeck docker container breaks scheduled jobs #4181

Closed
wilreichert opened this Issue Nov 6, 2018 · 1 comment


wilreichert commented Nov 6, 2018

Describe the bug
The latest Rundeck Docker container hardcodes rundeck.clusterMode.enabled=true in rundeck-config.properties and automatically generates rundeck.server.uuid in framework.properties. If $HOME/server/data is mapped to persistent storage, then after the container is deleted and re-created, scheduled jobs no longer run, because each job maps to a server UUID that no longer exists.

My Rundeck detail

  • Rundeck version: 3.0.8
  • install type: rundeck official docker container
  • OS Name/version: host - ubuntu 16.04
  • DB Type/version: mysql

To Reproduce

  1. Start a containerized Rundeck:
       mkdir -p ~/tmp/rundeck
       docker run --name rundeck -p 4440:4440 -v $HOME/tmp/rundeck:/home/rundeck/server/data rundeck/rundeck:3.0.8
  2. Log in to http://127.0.0.1:4440/ and create a new project using the default values.
  3. In that project, create a job that executes something trivial like date, and schedule it to run every minute with a 0 * * ? * * * cron. Save it and confirm the job runs every minute.
  4. In another terminal, stop and delete the container:
       docker stop rundeck
       docker rm rundeck
  5. Re-run the original command to start a new Docker instance with the previously created project and job:
       docker run --name rundeck -p 4440:4440 -v $HOME/tmp/rundeck:/home/rundeck/server/data rundeck/rundeck:3.0.8
  6. Log in and browse to the job. You will see the time count down to zero, but the job will never run.
     This can be confirmed by going to http://127.0.0.1:4440/metrics/metrics?pretty=true and finding the scheduled jobs value, which will be zero as shown below.
       "rundeck.scheduler.quartz.scheduledJobs" : {
         "count" : 0
       }
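The metrics check in the last step can be scripted. The following is a minimal sketch, not an official Rundeck tool: scheduled_jobs_count is a hypothetical helper name, and it assumes the pretty-printed layout shown above, where the "count" line follows the "rundeck.scheduler.quartz.scheduledJobs" key.

```shell
#!/bin/sh
# scheduled_jobs_count (hypothetical helper): reads pretty-printed metrics
# JSON on stdin and prints the "count" value that follows the
# rundeck.scheduler.quartz.scheduledJobs key.
# Assumes one key per line, as produced with ?pretty=true.
scheduled_jobs_count() {
  awk '
    /"rundeck\.scheduler\.quartz\.scheduledJobs"/ { found = 1; next }
    found && /"count"/ {
      gsub(/[^0-9]/, "")   # keep only the digits on the "count" line
      print
      exit
    }
  '
}

# Example (not executed here; the endpoint may require an authenticated
# session or API token, which this sketch does not handle):
#   curl -s 'http://127.0.0.1:4440/metrics/metrics?pretty=true' | scheduled_jobs_count
```

A printed value of 0 while a job is saved with a schedule would match the symptom described in this report.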

No error is ever displayed in the logs making the situation extremely confusing.

If you disable scheduling on the job, save it, and then re-enable it, everything works as expected again.

This can be resolved in a number of ways.

  • Pass RUNDECK_SERVER_UUID to the container as an environment variable. This makes clustering a challenge, since each node requires a unique, consistent value.
  • Map a static framework.properties file into each cluster node. This file must also be unique for each node.
  • Disable cluster mode by default.
  • Don't map jobs directly to server UUIDs, or allow the job-to-server mapping to be updated dynamically on restart.
  • Log an exception when a job refers to a server UUID that no longer exists, or show a warning in the UI.
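The first option above could be sketched as follows. This is only an illustration under stated assumptions: persistent_uuid and start_rundeck are hypothetical helper names, and $HOME/tmp/rundeck-node1.uuid is an arbitrary host path chosen for the example. Only the RUNDECK_SERVER_UUID variable and the docker run arguments come from this report.

```shell
#!/bin/sh
# Sketch: keep one stable server UUID per Rundeck node across container
# re-creates by storing it in a file on the host.

# persistent_uuid FILE (hypothetical helper)
# Prints the UUID stored in FILE, generating and saving one on first use.
persistent_uuid() {
  uuid_file=$1
  if [ ! -s "$uuid_file" ]; then
    mkdir -p "$(dirname "$uuid_file")"
    if command -v uuidgen >/dev/null 2>&1; then
      uuidgen | tr 'A-F' 'a-f' > "$uuid_file"
    else
      # Fallback for Linux hosts without uuidgen.
      cat /proc/sys/kernel/random/uuid > "$uuid_file"
    fi
  fi
  cat "$uuid_file"
}

# start_rundeck (hypothetical wrapper): same command as in the
# reproduction steps, plus the persisted UUID. The UUID file path is an
# assumption; use a different file for each cluster node.
start_rundeck() {
  docker run --name rundeck -p 4440:4440 \
    -e RUNDECK_SERVER_UUID="$(persistent_uuid "$HOME/tmp/rundeck-node1.uuid")" \
    -v "$HOME/tmp/rundeck:/home/rundeck/server/data" \
    rundeck/rundeck:3.0.8
}
```

Calling start_rundeck after a docker stop and docker rm would then reuse the same UUID. Jobs that were scheduled under an earlier, auto-generated UUID may still need the disable and re-enable step described above.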

Regardless of how this is addressed, the documentation absolutely needs updating, or more people will become confused.

@gschueler gschueler added the bug label Nov 14, 2018

@ProTip ProTip self-assigned this Nov 26, 2018


ProTip commented Nov 26, 2018

Thanks for pointing this out @wilreichert. I believe the best workaround is indeed to set the UUID manually; however, as you mentioned, it has some drawbacks in a Kubernetes-like deployment. To run OSS (this project) in an HA deployment on Kubernetes, I might create two separate deployments so that a UUID can be persisted for each instance. This is not an issue with our paid support distributions due to features in the cluster plugin.

To reduce the surprise factor and ensure scheduling works properly out of the box, I will hard code a UUID default and add a notice to the top of the Docker image documentation explaining that this should be overridden for multi-instance implementations.

@ProTip ProTip added this to the 3.0.9 milestone Nov 27, 2018

ProTip added a commit that referenced this issue Nov 27, 2018

@ProTip ProTip closed this in 1fc3faa Nov 27, 2018
