Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there any way to differentiate return codes for warnings vs diffs? #174

Closed
zylo47 opened this issue Oct 22, 2021 · 9 comments
Closed
Labels

Comments

@zylo47
Copy link

zylo47 commented Oct 22, 2021

Skeema appears to use the same return code of 1 for warnings and when differences are detected. We would like to have our pipeline that uses Skeema fail on warnings because we don't want the pipeline to proceed on most warnings.

Is there any way to split out diffs vs warnings? Is there any planned change to modify the return codes to separate these?

see exit.go

@evanelias
Copy link
Contributor

Are there specific warnings you're concerned about? Generally the intention in Skeema is that warnings are not a major problem, which is why they don't elevate the exit code to 2+ (fatal error).

With linter warnings, you can configure specific linter rules to be "error" as needed, so that use-case is already fully configurable.

With non-linter warnings, currently there isn't any way to make them fatal. For diff/push currently this category includes things like:

  • Table was skipped because it wasn't supported for diff (should be very rare for InnoDB tables at this point)
  • Dir was skipped entirely (config defined a schema but not a host, host-wrapper script returned no hosts, schema shellout returned no schema names, filesystem permissions error listing subdirs, etc)
  • .sql files contain statements that aren't parsed by Skeema, e.g. INSERTs
  • Summary messages after a fatal error, e.g. a warning that X additional operations were skipped due to a prior fatal error

There are a few others but the general theme is "something was skipped but it may not be terribly important". Some users' use-cases involve intentionally doing these things, which is why these aren't fatal or noisier. But if there are any specific warnings that should arguably be configurable to be fatal, I'm open to discussing.

Otherwise, one work-around is to just parse STDERR for "[WARN]", although that's admittedly kludgy.

@evanelias
Copy link
Contributor

Just following up on this, to better understand the use-case and see if anything could be made more configurable. Are there any specific warnings that you're concerned with?

@zylo47
Copy link
Author

zylo47 commented Oct 28, 2021

I will need some time to go through the execution history to get some examples

@evanelias
Copy link
Contributor

👍 No prob, will keep this open a while.

@zylo47
Copy link
Author

zylo47 commented Oct 29, 2021

I was only able to find one concrete example which correlates to your second bullet point above

[WARN] Skipping /tmp/skeema.DR9u/databases/my_database: no host defined for environment "my_environment"

The underlying problem is this: When I run my Skeema projects through a CI/CD pipeline I want to know when unexpected issues are occurring that will cause the project not to fully process. I can't accomplish this with Skeema's current exit code configuration because if it's configured to fail on error. The result is that some problems in the execution silently succeed in the pipeline. If the pipeline is configured to fail on warning and error then anytime a diff is found the pipeline will fail.

In my opinion, the diff should return a 0 exit status because diffs are simply an acceptable part of the execution process. This would allow the bullet points above to remain as warnings if that is needed and it would allow for the pipelines to fail on warning or error if that's the desired configuration.

I'm not sure if there was some underlying necessity to raise a warning on diff that I am missing other than informational purposes for detecting when changes would need to be deployed. If that is the reason, then it seems reasonable to use your suggestion above to parse STDERR for diff messages or possibly use a negative exit code value which shouldn't break current implementations. A negative exit code would actually solve everything fairly easily I think.

Please let me know if this seems like a realistic change or if you have any other suggestions to deal with this situation.

Thank you!

@evanelias
Copy link
Contributor

no host defined for environment "my_environment"

If this is coming up accidentally, I'd imagine this would only happen when testing out a new environment, reconfiguring a CI/CD pipeline, etc right? That's a fairly rare case, and it should be easy to catch by eyeballing the first run (or using skeema diff my_environment once manually before automating it in CI/CD) so it doesn't seem necessary to have a configuration option to make this fatal.

Nor should this one always be fatal btw; some users intentionally configure their subdirs and environments in a way that they always hit this. e.g. similar to the technique in this FAQ entry, but with host instead of schema.

If the pipeline is configured to fail on warning and error then anytime a diff is found the pipeline will fail.

You could just wrap skeema diff in a small script or bash one-liner. For example:

skeema diff || if [ $? -gt 1 ]; then (exit 2); fi

I'm not sure if there was some underlying necessity to raise a warning on diff that I am missing other than informational purposes for detecting when changes would need to be deployed.

skeema diff follows the exit code conventions of GNU diff: 0 on no differences, 1 on differences found, 2+ on fatal error.

This exit code behavior enables automation for use-cases like these:

  • Detecting environmental drift: are there any unexpected changes in staging, prod, etc that shouldn't be there according to the repo?
  • Detecting shard drift: within a sharded environment, do any shards unexpectedly differ in their schema definitions?
  • Determining whether or not a commit or branch contains schema changes (useful in repos that contain application code as well as schema definitions)
  • Double-checking another operation: some users like to run a skeema diff right after a push, pull, or out-of-band/manual change (like a RENAME) to ensure the repo and db truly match. Skeema's integration test suite also uses this technique thoroughly, to confirm commands had the intended end-to-end effect.

Bigger picture, the root of this is that the Skeema CLI was designed to attempt to handle both by-hand and automated pipeline invocations, but pipeline here means in the generic UNIX pipeline sense, not specifically CI/CD. Using the CLI in CI/CD can definitely be a bit awkward at times. I had always intended to build separate tools/systems/products that were specifically designed for CI/CD, and spent a lot of time on this, starting with Skeema Cloud Linter. But the vast majority of Skeema users were completely uninterested in even trying this or providing any feedback, even when the product was completely free during a 15-month beta period. So I'm very hesitant to revisit building CI/CD specific solutions for now.

re: negative exit codes, it's an interesting idea, but in my experience these generally aren't ever used in programs on UNIX-like OSes (even though many OSes do allow them).

@zylo47
Copy link
Author

zylo47 commented Oct 29, 2021

Thank you for the thorough reply. Based on the information you provided above, it sounds like the answer to the question of "can Skeema fail on warning in my CI/CD pipeline" is yes (with caveats). The implementation would look like this:

skeema diff (fail on exit code -gt 1) > skeema push (fail on exit code -gt 1) > skeema diff (fail on exit code -gt 0)

The second skeema diff (this could also be a second skeema push) operation would capture the warnings I may experience without the diff exit codes causing a failure status. Following this pattern would result in late failure detection but accomplishes what I want. It would have the added benefit of validating that the skeema push operation resulted in the database reflecting the desired state from the project code.

Thanks again, I think I'm good on this one!

@evanelias
Copy link
Contributor

evanelias commented Oct 29, 2021

That'll get you closer but you may need to tweak it further. It really depends on what you're aiming to do with that initial skeema diff, and which specific cases you want to abort on.

One problem is that some types of warnings don't affect the exit code of either skeema diff or skeema push, at all:

  • linter warnings
  • unparseable statements in *.sql files: usually this means INSERTs that are tracked alongside the table definitions, but not supported at all by Skeema yet... but in some cases it means someone typo'ed a critical keyword like CREATE or TABLE. (other SQL syntax errors/typos generally won't cause this to trip; Skeema just needs to parse enough to tell what type of object it is, and the object's name)
  • dir skipped because it doesn't map to any hosts (the situation you encountered earlier) or schema names in this environment

Most of the other "something was skipped" cases bump the exit code to 2+. Of the ones that do impact exit code, in terms of log level, some of those are ERROR and some are WARN; there's not a 1:1 match between log level and exit code. This is admittedly inconsistent, as it grew organically. So far the general prevailing principle has been "if some legit use-case may involve hitting this situation regularly, it's a warning and doesn't impact the exit code". I'm definitely open to making this more consistent in a future release, it just hasn't been high-pri compared to other pressing needs.

The other thing to be aware of is that diff and push have different exit code=1 behavior from each other: see the exit code descriptions in the help text for diff and push. More specifically, regarding exit code 1 in particular:

  • skeema push (without --dry-run) only returns exit code 1 if at least one table had a modified definition in its .sql file but was unsupported for diff operations, and aside from that, nothing fatal happened.
  • skeema diff (or equivalently skeema push --dry-run) only returns exit code 1 when differences were found, and nothing fatal happened.

You probably want to fail the CI/CD pipeline if skeema push returns nonzero.

@evanelias
Copy link
Contributor

Closing, but feel free to reopen (or open separate issues) if you hit cases where the inconsistency between log level and exit code is problematic, situations where there's a clear case for making the behavior more configurable, etc.

Maybe it could be useful to eventually have an option or subcommand that validates your skeema configuration for a given environment, to catch those "no host defined" or "no schema defined" cases explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants