compiled dbt test output could be more helpful #517
Not sure if this belongs here or in a separate issue, but it'd be great if the test output printed out in the order of the DAG (and grouped by the model they reference). So the tests for the
This might not be applicable to all use cases, but it could also be useful to write the dbt output (with appropriate "refs") to a "Test Results" folder. I can see this being helpful if a user wanted to:
Some great thoughts on this Discourse post that I wanted to expand on.
To me, the fundamental issue is that the query used to test is often different from the query needed to debug (I think "debug" is a more accurate term for this than "audit").
This means the dbt user has to take a lot of steps to get to the bottom of a test failure:
This is especially problematic when debugging a CI failure, where the compiled test queries aren't directly available to the user unless they have saved them as artifacts of the build.
It would be great if dbt did all of this and presented the test failures clearly!
@drewbanin, you suggested implementing a separate debug query from the test query, either as a CTE or as a comment. I think this is a great idea. However, this only eliminates step 3 above. Why not set this up so dbt runs the debug query on test failure, returns the first 2-3 results to the command line, and logs the full result to an audit table (re: #903)?
Something like this?
I second @joshtemple's idea of having a debug query that gets kicked off when a test fails. It would be especially useful with time-based data tests. I have several tests that only look at the last 24 hours' data, so it's a pain to convert the test SQL into debug SQL AND fix the dates so they cover the same 24-hour period as the failed test (which might've been a few days ago). A debug query kicking off right after a failed test would be able to look at the exact same time period and retain the results.
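To make the proposal above concrete, here is a minimal sketch of the "run a debug query on test failure" flow, using Python's built-in `sqlite3` as a stand-in warehouse. None of this is dbt's API: `run_test_with_debug`, the `orders` table, the audit table name, and the two SQL strings are hypothetical names chosen for illustration. The time window is passed as bound parameters fixed when the test runs, so the debug query covers exactly the same period as the failed test.

```python
import sqlite3

# Hypothetical uniqueness test on orders.id, limited to a fixed time window.
TEST_SQL = """
    SELECT COUNT(*) FROM (
        SELECT id FROM orders
        WHERE created_at >= ? AND created_at < ?
        GROUP BY id HAVING COUNT(*) > 1
    )
"""

# Hypothetical debug query: the offending ids and how often each one occurs.
DEBUG_SQL = """
    SELECT id, COUNT(*) AS n FROM orders
    WHERE created_at >= ? AND created_at < ?
    GROUP BY id HAVING COUNT(*) > 1
    ORDER BY id
"""


def run_test_with_debug(conn, test_sql, debug_sql, audit_table, params=(), preview_rows=3):
    """Run a test; on failure, run the debug query, print a preview, persist all rows."""
    failures = conn.execute(test_sql, params).fetchone()[0]
    if failures == 0:
        return []

    # Same bound parameters as the test, so both queries see the same window.
    cur = conn.execute(debug_sql, params)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()

    # Persist the full debug result. audit_table is assumed to be a trusted
    # identifier, since table names cannot be bound as parameters.
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{audit_table}"')
    conn.execute(f'CREATE TABLE "{audit_table}" ({col_list})')
    conn.executemany(f'INSERT INTO "{audit_table}" VALUES ({placeholders})', rows)

    # Show only the first few rows on the command line.
    print(f"FAIL {failures} (full result in {audit_table})")
    for row in rows[:preview_rows]:
        print("  ", dict(zip(columns, row)))
    return rows
```

A caller would pass the window explicitly, for example `run_test_with_debug(conn, TEST_SQL, DEBUG_SQL, "audit_unique_orders_id", ("2018-11-13 00:00:00", "2018-11-14 00:00:00"))`. Freezing the window at test time is the design choice that matters here: rerunning the debug SQL days later with a relative "last 24 hours" filter would look at different rows.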
@joshtemple I'm relatively happy to have dbt run a query here and print the results to stdout, though I do think it could get kind of messy for some types of (mostly custom) tests. The example shown here (uniqueness) makes a ton of sense, but it's less clear to me what the output should be for
I still love the idea of persisting test failures in tables, but I'm unsure about the mechanisms that dbt should employ to persist & manage these test results. Databases like BigQuery support table expiration, but on other databases, dbt will need to clean up these test tables to avoid making a really big mess of things.
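One possible cleanup mechanism for databases without table expiration, sketched in Python with `sqlite3` as a stand-in: record each audit table in a small registry and drop anything older than a retention period. This is an assumption-laden illustration, not something dbt does; the registry table name, `register_audit_table`, `drop_expired_audit_tables`, and the seven-day default are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

REGISTRY = "test_audit_registry"  # hypothetical bookkeeping table


def _ensure_registry(conn):
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {REGISTRY} "
        "(table_name TEXT PRIMARY KEY, created_at TEXT)"
    )


def register_audit_table(conn, table_name, created_at):
    """Record when an audit table was written so it can be expired later."""
    _ensure_registry(conn)
    conn.execute(
        f"INSERT OR REPLACE INTO {REGISTRY} VALUES (?, ?)",
        (table_name, created_at.isoformat()),
    )


def drop_expired_audit_tables(conn, now, retention=timedelta(days=7)):
    """Drop audit tables older than `retention`; return the names dropped."""
    _ensure_registry(conn)
    # ISO-8601 strings in one consistent format sort chronologically.
    cutoff = (now - retention).isoformat()
    expired = [
        row[0]
        for row in conn.execute(
            f"SELECT table_name FROM {REGISTRY} WHERE created_at < ? ORDER BY table_name",
            (cutoff,),
        )
    ]
    for name in expired:
        # Names come from the registry, which is assumed to be trusted.
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.execute(f"DELETE FROM {REGISTRY} WHERE table_name = ?", (name,))
    return expired
```

Running `drop_expired_audit_tables(conn, datetime.now())` at the start of each test invocation would keep the schema bounded without a separate scheduled job, at the cost of one extra query per run.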
Some very practical (and tractable) questions for us to consider:
Curious about your collective thoughts on these questions, or any other ideas/questions that are conjured up as you think about this one!