Is there any way to differentiate return codes for warnings vs diffs? #174

zylo47 · 2021-10-22T19:41:58Z

Skeema appears to use the same return code of 1 for warnings and when differences are detected. We would like to have our pipeline that uses Skeema fail on warnings because we don't want the pipeline to proceed on most warnings.

Is there any way to split out diffs vs warnings? Is there any planned change to modify the return codes to separate these?

see exit.go

evanelias · 2021-10-22T22:07:53Z

Are there specific warnings you're concerned about? Generally the intention in Skeema is that warnings are not a major problem, which is why they don't elevate the exit code to 2+ (fatal error).

With linter warnings, you can configure specific linter rules to be "error" as needed, so that use-case is already fully configurable.

With non-linter warnings, currently there isn't any way to make them fatal. For diff/push currently this category includes things like:

Table was skipped because it wasn't supported for diff (should be very rare for InnoDB tables at this point)
Dir was skipped entirely (config defined a schema but not a host, host-wrapper script returned no hosts, schema shellout returned no schema names, filesystem permissions error listing subdirs, etc)
.sql files contain statements that aren't parsed by Skeema, e.g. INSERTs
Summary messages after a fatal error, e.g. a warning that X additional operations were skipped due to a prior fatal error

There are a few others but the general theme is "something was skipped but it may not be terribly important". Some users' use-cases involve intentionally doing these things, which is why these aren't fatal or noisier. But if there are any specific warnings that should arguably be configurable to be fatal, I'm open to discussing.

Otherwise, one work-around is to just parse STDERR for "[WARN]", although that's admittedly kludgy.

evanelias · 2021-10-27T14:35:24Z

Just following up on this, to better understand the use-case and see if anything could be made more configurable. Are there any specific warnings that you're concerned with?

zylo47 · 2021-10-28T04:14:08Z

I will need some time to go through the execution history to get some examples

evanelias · 2021-10-28T15:12:57Z

👍 No prob, will keep this open a while.

zylo47 · 2021-10-29T14:56:10Z

I was only able to find one concrete example which correlates to your second bullet point above

[WARN] Skipping /tmp/skeema.DR9u/databases/my_database: no host defined for environment "my_environment"

The underlying problem is this: When I run my Skeema projects through a CI/CD pipeline I want to know when unexpected issues are occurring that will cause the project not to fully process. I can't accomplish this with Skeema's current exit code configuration because if it's configured to fail on error. The result is that some problems in the execution silently succeed in the pipeline. If the pipeline is configured to fail on warning and error then anytime a diff is found the pipeline will fail.

In my opinion, the diff should return a 0 exit status because diffs are simply an acceptable part of the execution process. This would allow the bullet points above to remain as warnings if that is needed and it would allow for the pipelines to fail on warning or error if that's the desired configuration.

I'm not sure if there was some underlying necessity to raise a warning on diff that I am missing other than informational purposes for detecting when changes would need to be deployed. If that is the reason, then it seems reasonable to use your suggestion above to parse STDERR for diff messages or possibly use a negative exit code value which shouldn't break current implementations. A negative exit code would actually solve everything fairly easily I think.

Please let me know if this seems like a realistic change or if you have any other suggestions to deal with this situation.

Thank you!

evanelias · 2021-10-29T16:32:14Z

no host defined for environment "my_environment"

If this is coming up accidentally, I'd imagine this would only happen when testing out a new environment, reconfiguring a CI/CD pipeline, etc right? That's a fairly rare case, and it should be easy to catch by eyeballing the first run (or using skeema diff my_environment once manually before automating it in CI/CD) so it doesn't seem necessary to have a configuration option to make this fatal.

Nor should this one always be fatal btw; some users intentionally configure their subdirs and environments in a way that they always hit this. e.g. similar to the technique in this FAQ entry, but with host instead of schema.

If the pipeline is configured to fail on warning and error then anytime a diff is found the pipeline will fail.

You could just wrap skeema diff in a small script or bash one-liner. For example:

skeema diff || if [ $? -gt 1 ]; then (exit 2); fi

I'm not sure if there was some underlying necessity to raise a warning on diff that I am missing other than informational purposes for detecting when changes would need to be deployed.

skeema diff follows the exit code conventions of GNU diff: 0 on no differences, 1 on differences found, 2+ on fatal error.

This exit code behavior enables automation for use-cases like these:

Detecting environmental drift: are there any unexpected changes in staging, prod, etc that shouldn't be there according to the repo?
Detecting shard drift: within a sharded environment, do any shards unexpectedly differ in their schema definitions?
Determining whether or not a commit or branch contains schema changes (useful in repos that contain application code as well as schema definitions)
Double-checking another operation: some users like to run a skeema diff right after a push, pull, or out-of-band/manual change (like a RENAME) to ensure the repo and db truly match. Skeema's integration test suite also uses this technique thoroughly, to confirm commands had the intended end-to-end effect.

Bigger picture, the root of this is that the Skeema CLI was designed to attempt to handle both by-hand and automated pipeline invocations, but pipeline here means in the generic UNIX pipeline sense, not specifically CI/CD. Using the CLI in CI/CD can definitely be a bit awkward at times. I had always intended to build separate tools/systems/products that were specifically designed for CI/CD, and spent a lot of time on this, starting with Skeema Cloud Linter. But the vast majority of Skeema users were completely uninterested in even trying this or providing any feedback, even when the product was completely free during a 15-month beta period. So I'm very hesitant to revisit building CI/CD specific solutions for now.

re: negative exit codes, it's an interesting idea, but in my experience these generally aren't ever used in programs on UNIX-like OSes (even though many OSes do allow them).

zylo47 · 2021-10-29T18:53:03Z

Thank you for the thorough reply. Based on the information you provided above, it sounds like the answer to the question of "can Skeema fail on warning in my CI/CD pipeline" is yes (with caveats). The implementation would look like this:

skeema diff (fail on exit code -gt 1) > skeema push (fail on exit code -gt 1) > skeema diff (fail on exit code -gt 0)

The second skeema diff (this could also be a second skeema push) operation would capture the warnings I may experience without the diff exit codes causing a failure status. Following this pattern would result in late failure detection but accomplishes what I want. It would have the added benefit of validating that the skeema push operation resulted in the database reflecting the desired state from the project code.

Thanks again, I think I'm good on this one!

evanelias · 2021-10-29T19:47:59Z

That'll get you closer but you may need to tweak it further. It really depends on what you're aiming to do with that initial skeema diff, and which specific cases you want to abort on.

One problem is that some types of warnings don't affect the exit code of either skeema diff or skeema push, at all:

linter warnings
unparseable statements in *.sql files: usually this means INSERTs that are tracked alongside the table definitions, but not supported at all by Skeema yet... but in some cases it means someone typo'ed a critical keyword like CREATE or TABLE. (other SQL syntax errors/typos generally won't cause this to trip; Skeema just needs to parse enough to tell what type of object it is, and the object's name)
dir skipped because it doesn't map to any hosts (the situation you encountered earlier) or schema names in this environment

Most of the other "something was skipped" cases bump the exit code to 2+. Of the ones that do impact exit code, in terms of log level, some of those are ERROR and some are WARN; there's not a 1:1 match between log level and exit code. This is admittedly inconsistent, as it grew organically. So far the general prevailing principle has been "if some legit use-case may involve hitting this situation regularly, it's a warning and doesn't impact the exit code". I'm definitely open to making this more consistent in a future release, it just hasn't been high-pri compared to other pressing needs.

The other thing to be aware of is that diff and push have different exit code=1 behavior from each other: see the exit code descriptions in the help text for diff and push. More specifically, regarding exit code 1 in particular:

skeema push (without --dry-run) only returns exit code 1 if at least one table had a modified definition in its .sql file but was unsupported for diff operations, and aside from that, nothing fatal happened.
skeema diff (or equivalently skeema push --dry-run) only returns exit code 1 when differences were found, and nothing fatal happened.

You probably want to fail the CI/CD pipeline if skeema push returns nonzero.

evanelias · 2021-11-09T15:44:03Z

Closing, but feel free to reopen (or open separate issues) if you hit cases where the inconsistency between log level and exit code is problematic, situations where there's a clear case for making the behavior more configurable, etc.

Maybe it could be useful to eventually have an option or subcommand that validates your skeema configuration for a given environment, to catch those "no host defined" or "no schema defined" cases explicitly.

zylo47 added the question label Oct 22, 2021

evanelias closed this as completed Nov 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is there any way to differentiate return codes for warnings vs diffs? #174

Is there any way to differentiate return codes for warnings vs diffs? #174

zylo47 commented Oct 22, 2021

evanelias commented Oct 22, 2021

evanelias commented Oct 27, 2021

zylo47 commented Oct 28, 2021

evanelias commented Oct 28, 2021

zylo47 commented Oct 29, 2021

evanelias commented Oct 29, 2021

zylo47 commented Oct 29, 2021 •

edited

Loading

evanelias commented Oct 29, 2021 •

edited

Loading

evanelias commented Nov 9, 2021

Is there any way to differentiate return codes for warnings vs diffs? #174

Is there any way to differentiate return codes for warnings vs diffs? #174

Comments

zylo47 commented Oct 22, 2021

evanelias commented Oct 22, 2021

evanelias commented Oct 27, 2021

zylo47 commented Oct 28, 2021

evanelias commented Oct 28, 2021

zylo47 commented Oct 29, 2021

evanelias commented Oct 29, 2021

zylo47 commented Oct 29, 2021 • edited Loading

evanelias commented Oct 29, 2021 • edited Loading

evanelias commented Nov 9, 2021

zylo47 commented Oct 29, 2021 •

edited

Loading

evanelias commented Oct 29, 2021 •

edited

Loading