Failure detection fixes #89

Merged
jhunt merged 3 commits into master from fail-fixes
Feb 4, 2016

@geofffranks
Contributor

See commit messages for details

Due to confusing WorkerUpdate struct usage, failed tasks
were being marked `FAILED` and then subsequently `STOPPED`, in
order to set the time of job completion. Because `STOPPED` also
sets the status to `done`, the failure was overwritten. `FAILED`
already sets the completion time to now. The `FAILED` op has been
updated to be more explicit about setting the completion time, and
moved to the end of the worker runloop, so a task ends in either
success or failure, never both.
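
The single terminal update described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Update` struct, the `Op` type, and the `runTask` and `status` helpers are hypothetical stand-ins, and only the `FAILED`/`STOPPED` op names and the `failed`/`done` statuses come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Op names the terminal operations mentioned above; the type itself is hypothetical.
type Op string

const (
	Stopped Op = "STOPPED" // success: status becomes "done"
	Failed  Op = "FAILED"  // failure: status becomes "failed"
)

// Update is an illustrative stand-in for WorkerUpdate, not its real definition.
type Update struct {
	Op        Op
	StoppedAt time.Time
}

// errPlugin is a sample failure used by the demo in main.
var errPlugin = errors.New("plugin exited non-zero")

// runTask runs work and returns exactly one terminal update.
// Both branches set the completion time explicitly, so a failed
// task never needs a follow-up STOPPED update.
func runTask(work func() error) Update {
	if err := work(); err != nil {
		return Update{Op: Failed, StoppedAt: time.Now()}
	}
	return Update{Op: Stopped, StoppedAt: time.Now()}
}

// status maps a terminal update to the task status it records.
func status(u Update) string {
	if u.Op == Failed {
		return "failed"
	}
	return "done"
}

func main() {
	bad := runTask(func() error { return errPlugin })
	good := runTask(func() error { return nil })
	fmt.Println(status(bad), status(good)) // prints "failed done"
}
```

Deciding the outcome once, at the end, means no later update can turn a `failed` task into `done`.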

Additionally, archives were not being invalidated, because the
invalidation logic was only triggered when creating the archive
record in the database failed. Invalidation now happens only if
the archive database record was successfully created.
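
As a rough sketch of the corrected ordering, assuming a toy in-memory store: the `DB` and `Archive` types, the `Broken` flag, and the `finish` helper below are all hypothetical and exist only to show that invalidation runs on the success path of record creation.

```go
package main

import (
	"errors"
	"fmt"
)

// Archive is a hypothetical stand-in for an archive database record.
type Archive struct {
	StoreKey string
	Status   string // "valid" or "invalid"
}

// DB is a toy in-memory store; Broken simulates a failing database.
type DB struct {
	Broken   bool
	Archives map[string]*Archive // keyed by task ID
}

var errDB = errors.New("database unavailable")

// CreateArchive inserts a record marked "valid", or fails if the DB is broken.
func (db *DB) CreateArchive(taskID, key string) error {
	if db.Broken {
		return errDB
	}
	db.Archives[taskID] = &Archive{StoreKey: key, Status: "valid"}
	return nil
}

// finish records a task's outcome and returns the resulting task status.
func finish(db *DB, taskID, key string, taskFailed bool) string {
	if err := db.CreateArchive(taskID, key); err != nil {
		// No record exists, so there is nothing to invalidate.
		return "failed"
	}
	if taskFailed {
		// The record exists but the task failed: mark it invalid.
		db.Archives[taskID].Status = "invalid"
		return "failed"
	}
	return "done"
}

func main() {
	db := &DB{Archives: map[string]*Archive{}}
	s1 := finish(db, "t1", "key-1", true)
	fmt.Println(s1, db.Archives["t1"].Status) // prints "failed invalid"
	s2 := finish(db, "t2", "key-2", false)
	fmt.Println(s2, db.Archives["t2"].Status) // prints "done valid"
}
```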

Fixes #79

If the worker cannot parse or detect a restore key, the task
now fails, and no archive record is created in the database.

Additionally, the database has been updated to reject
archives without restore keys.

Fixes #55
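
The restore-key check in this commit might look roughly like the sketch below. It assumes the task output is a JSON object with a `store_key` field (the field name appears in the test notes in this conversation); the `pluginOutput` type, the `restoreKey` function, and the error values are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// pluginOutput is an assumed shape for the task's JSON output.
type pluginOutput struct {
	StoreKey string `json:"store_key"`
}

// errNoRestoreKey signals output that parsed but carried no usable key.
var errNoRestoreKey = errors.New("no restore key in task output")

// restoreKey extracts the key from raw task output. Any error means the
// caller should fail the task and skip creating an archive record.
func restoreKey(raw string) (string, error) {
	var out pluginOutput
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return "", fmt.Errorf("unparseable task output: %w", err)
	}
	if out.StoreKey == "" {
		return "", errNoRestoreKey
	}
	return out.StoreKey, nil
}

func main() {
	samples := []string{`{"store_key":"abc123"}`, `not json`, `{"store_key":""}`}
	for _, raw := range samples {
		key, err := restoreKey(raw)
		fmt.Printf("key=%q ok=%t\n", key, err == nil)
	}
}
```

Checking the key in the worker and again in the database gives two layers of protection: the worker fails the task early, and the database rejects any record that slips through.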
@geofffranks geofffranks changed the title Fail fixes Failure detection fixes Feb 4, 2016
@geofffranks
Contributor Author

Tested on bosh-lite by:

  1. running a task that would fail, but produce parseable JSON output and a valid store_key
  2. running a task that would succeed, but produce unparseable JSON output
  3. running a task that would succeed, but have an empty store_key in the JSON output
  4. running a task that would succeed

Verified that:

  1. task was marked as 'failed' and archive record marked as 'invalid'
  2. task was marked as 'failed' and no archive record existed
  3. task was marked as 'failed' and no archive record existed
  4. task was marked as 'done' and archive record marked as 'valid'

jhunt added a commit that referenced this pull request Feb 4, 2016
@jhunt jhunt merged commit 814d2be into master Feb 4, 2016
@jhunt jhunt deleted the fail-fixes branch February 4, 2016 21:19