Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.Sign up
Run tests after model creation, persisting updates only if the tests pass #1054
There have been a bunch of suggestions int Slack to run tests during model creation, and only persist changes if the tests pass. Here is a sampling of such suggestions:
Who will this benefit?
A paradigm like this would help ensure that errant transformations are caught before they are deployed to production.
Things to think about:
There's lots more to discuss here too, so please drop your thoughts in the comments below!
I think this is a great idea. There are a few inter-related issues and features which I'll try to start picking apart. This would be a pretty major architecture change, so I think we will all benefit if we can get clear on the requirements, priorities, and streams of work.
I feel strongly that DBT should be running in pseudo-transactions(henceforth "transactions") when it's running in production (i.e., there are downstream users querying the transformations). Here's why: in the middle of a DBT run, end users could receive invalid query results if two relations have become inconsistent.
Imagine you're calculating an order rate among users as the share of users that have an order. If this query gets execute after the
In this world, end users will never be querying data that are part old and part new.
DBT Run Modes
As I've alluding to above, you probably do not want this transaction-like behavior when you're developing. To support this, I'd recommend implementing DBT run "modes" that allow users to control how DBT behaves with respect to handling this transaction-like behavior.
To start, I'd propose the following modes (not mutually exclusive):
Note: I'm not at all committed to these names and I'm sure we can come up with better ones.
Once the above concepts are implemented, we can add some configuration so that tests can cause "errors" which may or may not halt a DBT run depending upon which run mode we're in.
One could imagine another configuration flag that indicates whether or not tests should be counted as "errors" for the purpose of a dbt run.
The last feature, which I expect people will want, would be the ability to configure tests with severity levels. So we could introduce "log", "warn", and "error" types (or whatever) for tests, then we could update the configuration options so that different test failure levels can be handled differently. In my mind, this feature comes last.
Shooting blind here, but would it be possible to use tagging under the hood somehow to implement this? After running
This would also help in cases where you run successfully, test and find a few models failing tests, fix the issues with the models, and want to only re-run the models that ran successfully but didn't test successfully. In that case you could just run with tag
hey @michael Kaminsky -- this is a pretty significant departure from how dbt currently works. I just took another scan through here, and I think your description of an all-or-nothing build is most closely aligned with how I think this should work. I think there are still some unanswered questions here, and I think the first step in making headway on this issue is probably making sure that we have most/all of our bases covered. Some questions I have:
Great questions @drewbanin ! Going to table bigquery for now and come back to it (I've never used it, so am fuzzy on a bunch of important details that we'd need to solve it).
How should incremental models work in this paradigm?
Great question, and something I hadn't thought of yet. As I've conceptualized it, there are two issues:
Easy, but sort of defeats the purpose of this whole exercise as it doesn't ensure consistency during your builds
There will be some performance penalty for models that are downstream of the incremental model, but it's unclear how much (depends on how big the change is and how the data are distributed).
This will be slow (unsure how slow??) for large tables, though notably not nearly as slow as rebuilding the whole table -- we're basically just making redshift to do a lot of extra disk I/O for nothing. Might actually be an okay compromise, but would want to test it with someone who has bigger-data to get a sense of performance impact.
How do we interleave tests with models?
I don't think we need to implement this yet. I think we should table and come back to it when we want to implement the testing part of this (phase 2).
At what “level” are the old/new relations swapped? Do we rename whole schemas/datasets? Or the individual database relations?
The way I use dbt, I could / would do it at the schema level (dbt-managed schemas are wholly separate from non-dbt managed schemas). From what I've gathered in the slack channel, not everyone follows this rule so we'd probably have to do everything at the relation / model level. LMK if I'm missing something here, though! I'm thinking this would be trivial as long as we keep track of where everything is in the staging schema and where everything "belongs" but let me know if I'm just not thinking of something.
@mikekaminsky I really like your
I'm always super wary of modifying dbt's materialization code, as we've done seemingly innocuous but wholly incorrect things in the past. There's a whole list of things to know like:
There is more to this list, but I just wanted to sketch out the kind of thinking that goes into a feature like this. These constraints are individually workable, but together, make it challenging to design a feature that 1) operates as intended and 2) uses each database to its fullest potential.
In the past, we've built features that weren't implemented on certain databases due to limitations of the database itself. BigQuery, for instance, only got
Last, I agree that we should tackle the model building part of this first, then add testing into the mix separately. That change has more to do with how dbt's graph selector includes nodes in a given run, and should "just work" once this "atomic swap" paradigm is implemented.
PS I think we'll need to make dbt drop these new relations somehow. A failed run should stick around long enough to debug the problem, and then should be easy to clean up with a dbt command IMO.
Lots of good thoughts, @drewbanin!
I'd be interested in moving this forward without bigquery support (especially because of the draconian ddl limits!!)
You definitely have a better perspective on where / how this can go wrong given the diversity of implementations you've seen. I've only really seen a handful (and all of them with redshift), so I just haven't seen these issues crop up. I'll defer to you if this is worth moving forward on!
Regarding having DBT clean up, it sort of depends on how you implement it. We could have DBT drop the old, failed relations, or we could also just let those get blown away when the next successful run happens (if we build everything in a dedicated staging schema like
Another comment incoming on late-binding views ...
When building views in atomic mode, we need to make sure they are not late-binding. Let's imagine we have
In atomic mode, we run
At the end of the run, we run
So the view would need to be "regular" in order for this to work. Since we use late-binding views to get around the
@mikekaminsky I totally buy the premise but I'm unsure about that implementation. For one, relations on Redshift can't be renamed across schemas :)
Will keep this in mind, but I think that we'll likely be best served by late binding views in the general case. Redshift (and Postgres) are really the only databases out there that support bound views, so I'd be hesitant to leverage them as a core part of this strategy.