Hamilton has a feature that is similar to partition-based delta loading. There, a source node returns an iterator, and subsequent tasks are called separately for each element of the iterator. Multiple iterator inputs are zipped, and aggregation tasks can consume an iterator and return non-iterated outputs: https://hamilton.dagworks.io/en/latest/concepts/parallel-task/
This issue describes some end goals for delta loading and table partitioning support. For the delta load, we would want to run a query that is based on the previous state of the delta load. This is described in "Row wise delta". There is also the possibility to use partitioned tables to manage transformations of data only in the relevant partition. This is described in "Partitioned Tables". "Use Cases" describes a number of examples that use various combinations of the row wise and partitioned loads.
Row wise delta
Here, a conditional where clause is added to a query depending on an input delta_load_date. If it is None, the query loads the full source table. If it is set, the query loads only the latest data. The task needs to return this value so that the state of the delta load can be stored. Moreover, the output table is not recreated in the delta load but rather appended to.
Partitioned Tables
Given a partitioning specification (e.g., partitioning a table by year), this task not only creates a partitioned table but also manages cache invalidation per partition. For example, if only the 2023 data is updated, then only the 2023 partition requires updating.
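To make the per-partition invalidation idea concrete, here is a minimal sketch in plain Python. It is not tied to any library API: the helper names (`partition_by_year`, `fingerprint`, `refresh`), the in-memory `cache` dict, and the hash-based change detection are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def partition_by_year(rows):
    """Group rows by their 'year' column (the assumed partition key)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["year"]].append(row)
    return parts


def fingerprint(rows):
    """Order-independent hash of a partition's input rows."""
    payload = repr(sorted(sorted(r.items()) for r in rows)).encode()
    return hashlib.sha256(payload).hexdigest()


def total_value(part):
    """Example transformation: sum the 'value' column of one partition."""
    return sum(r["value"] for r in part)


def refresh(rows, cache, transform):
    """Recompute only the partitions whose input fingerprint changed."""
    recomputed = []
    for year, part in partition_by_year(rows).items():
        fp = fingerprint(part)
        cached_fp, _ = cache.get(year, (None, None))
        if cached_fp != fp:
            cache[year] = (fp, transform(part))
            recomputed.append(year)
    return sorted(recomputed)


cache = {}  # hypothetical cache: year -> (input fingerprint, output)
rows = [{"year": 2022, "value": 1}, {"year": 2023, "value": 2}]
first = refresh(rows, cache, total_value)   # every partition is new
rows.append({"year": 2023, "value": 5})     # only 2023 data changes
second = refresh(rows, cache, total_value)  # only 2023 is recomputed
```

In this sketch the second call touches only the 2023 partition, while the cached 2022 output is reused unchanged.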
Use Cases
Use case 1 - Row wise delta load with a manual update of inputs
When the cache is valid, only "new" rows from a source table should be loaded and these should be appended to the output table.
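A rough sketch of this use case, using only the standard-library sqlite3 module. The table names (`source`, `target`), the `updated_at` column, and the `delta_load` function signature are assumptions for illustration and not part of any existing API; only the `delta_load_date` input and the append-instead-of-recreate behaviour come from the description above.

```python
import sqlite3


def delta_load(conn, delta_load_date=None):
    """Load rows from `source` into `target` and return the new load state."""
    query = "SELECT id, updated_at FROM source"
    params = ()
    if delta_load_date is None:
        # Full load: start from an empty output table.
        conn.execute("DROP TABLE IF EXISTS target")
    else:
        # Delta load: the conditional where clause keeps only newer rows.
        query += " WHERE updated_at > ?"
        params = (delta_load_date,)
    conn.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER, updated_at TEXT)")
    rows = conn.execute(query, params).fetchall()
    # Append to the output table instead of recreating it.
    conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
    # The returned date is the state to store for the next run.
    new_date = max((r[1] for r in rows), default=delta_load_date)
    return new_date, len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [(1, "2023-01-01"), (2, "2023-02-01")],
)

state, n_full = delta_load(conn)            # first run: full load
conn.execute("INSERT INTO source VALUES (3, '2023-03-01')")
state, n_delta = delta_load(conn, state)    # second run: only the new row
n_target = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

The dates are stored as ISO 8601 strings so that string comparison matches chronological order; a real source table may need a different change-tracking column.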
Use case 2 - Row wise delta load to partitioned output
Use case 3 - table to table with partitions managed separately
Use case 4 - partitioned table to partitioned table