Skip to content

Conversation

@aosingh
Copy link
Member

@aosingh aosingh commented Jul 27, 2022

In dbt snapshots, rows that are deleted from the source query are not invalidated by default.
With the config option invalidate_hard_deletes=True, dbt can track rows that no longer exist.

  {{ config(
      check_cols='all', 
      unique_key='id', 
      strategy='check',
      target_database=database, 
      target_schema=schema,
      invalidate_hard_deletes=True
  )}}

The fix was to change a line in the snapshot macro because SQL generated raised ORA-00905: missing keyword

The following line

LEFT JOIN deletes_source_data AS source_data 
      ON snapshotted_data.dbt_unique_key = source_data.dbt_unique_key

is changed to

LEFT JOIN deletes_source_data source_data 
      ON snapshotted_data.dbt_unique_key = source_data.dbt_unique_key

Also added a new test case in the suite to test invalidation of hard deletes in dbt snapshots

- Fixed macro in snapshot.sql
- Added a test case to test invalidation of hard deletes in snapshots
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 27, 2022
@thbaby
Copy link

thbaby commented Jul 27, 2022

I need a bit more information to review this transaction. What is a "snapshot delete"? What does "rows that are deleted from the snapshot query" mean?

@aosingh
Copy link
Member Author

aosingh commented Jul 27, 2022

Thanks @thbaby

dbt snapshots implement type 2 Slowly Changing Dimension over mutable source tables to identify how a table row changes over time.

When you run dbt snapshots command

  • On the first run: dbt will create the initial snapshot table — this will be the result set of your select statement, with additional columns including dbt_valid_from and dbt_valid_to. All records will have a dbt_valid_to = null.

  • On subsequent runs: dbt will check which records have changed or if any new records have been created:

    • The dbt_valid_to column will be updated to the current snapshot time for any existing records that have changed (updated or deleted)
    • The updated record and any new records will be inserted into the snapshot table. These records will now have dbt_valid_to = null

dbt snapshot command first creates a global temporary table to identify the rows which have changed i.e. inserts, updates or deletes

Then it uses the following statement to merge using the temporary table.

MERGE INTO dbt_test.cc_all_snapshot d
    USING o$pt_cc_all_snapshot182811 s
        ON (s.dbt_scd_id = d.dbt_scd_id)
    WHEN MATCHED
        THEN UPDATE
        SET dbt_valid_to = s.dbt_valid_to
        WHERE d.dbt_valid_to IS NULL
            AND s.dbt_change_type IN ('update', 'delete')
    WHEN NOT MATCHED
        THEN INSERT (d.id, d.name, d.some_date, d.dbt_updated_at, d.dbt_valid_from, d.dbt_valid_to, d.dbt_scd_id)
        VALUES (s.id, s.name, s.some_date, s.dbt_updated_at, s.dbt_valid_from, s.dbt_valid_to, s.dbt_scd_id)
        WHERE s.dbt_change_type = 'insert'

For details of the SQL code generated at each step, you can also refer to a documented example in our dbt test project.

Let me know if you have any questions or feedback

Copy link

@ramapemmaraju ramapemmaraju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

@aosingh aosingh merged commit a1f1d4d into main Aug 3, 2022
@aosingh aosingh deleted the feature/snapshot_config branch August 3, 2022 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants