Concurrent transactions from get_delete_and_insert_queries fail in BigQuery #1109

Closed
github-christophe-oudar opened this issue Aug 22, 2023 · 1 comment
Labels
Bug Something isn't working Triage 👀

github-christophe-oudar commented Aug 22, 2023

Describe the bug
The transaction used by get_delete_and_insert_queries was introduced in this commit.

However, as the BigQuery documentation states:

If a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are cancelled. Conflicting DML statements that run outside of a transaction are queued to run later, subject to queuing limits.

Therefore, if we want to keep the delete part, we have to either:

  • use an external system to limit concurrency, so that only one dbt run can execute that transaction at a time
  • remove the transaction

To Reproduce
Steps to reproduce the behavior:

  1. Set up a dbt project using Elementary
  2. Create a simple model
  3. Run the model twice concurrently and hope that both runs finish around the same time
  4. See the error raised for the concurrent transaction on dbt_models, whose query looks like:
/* {"app": "dbt", "dbt_version": "1.6.0", "profile_name": "dbt", "target_name": "prod", "connection_name": "master"} */

    begin transaction;

        delete from `gcp_project`.`elementary`.`dbt_models`
        where
            metadata_hash is null
            or metadata_hash in (select metadata_hash from `gcp_project`.`elementary`.`dbt_models__tmp_table_XXXX`);

        insert into `gcp_project`.`elementary`.`dbt_models` select * from `gcp_project`.`elementary`.`dbt_models__tmp_table_YYY`;

    commit;

Expected behavior
I would expect to be able to run two dbt run commands concurrently without either of them failing when both happen to reach the Elementary post hook that executes the get_delete_and_insert_queries macro query at the same time.

Environment:

  • edr Version: 0.9.2
  • dbt package Version: 0.9.0

Additional context
Slack thread regarding other people affected: https://elementary-community.slack.com/archives/C02CTC89LAX/p1689186975724739

Potential workaround
Override the macro default__get_delete_and_insert_queries with a version that does not use a transaction:

{% macro default__get_delete_and_insert_queries(relation, insert_relation, delete_relation, delete_column_key) %}
    {% set query %}
        {% if delete_relation %}
            delete from {{ relation }}
            where
            {{ delete_column_key }} is null
            or {{ delete_column_key }} in (select {{ delete_column_key }} from {{ delete_relation }});
        {% endif %}
        {% if insert_relation %}
            insert into {{ relation }} select * from {{ insert_relation }};
        {% endif %}
    {% endset %}
    {% do return([query]) %}
{% endmacro %}

However, I'm not sure of all the consequences of removing the transaction. Since Spark doesn't have transactions, I assume it mostly works, but it's not "safe"?
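
For an override like the one above to be picked up, dbt has to search the root project before the package when it dispatches the macro. Below is a minimal sketch of the dbt_project.yml setting. It assumes that Elementary resolves get_delete_and_insert_queries through adapter.dispatch under the elementary macro namespace (an assumption, not confirmed in this thread), and my_project is a placeholder for the root project's name:

```yaml
# dbt_project.yml of the root project (my_project is a hypothetical name)
dispatch:
  - macro_namespace: elementary   # assumed namespace used by the package's adapter.dispatch call
    # Look in the root project first, then fall back to the package itself
    search_order: ['my_project', 'elementary']
```

With this in place, a default__get_delete_and_insert_queries macro defined in the root project's macros directory should take precedence over the packaged one, provided the dispatch assumption holds.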

@github-christophe-oudar github-christophe-oudar added Bug Something isn't working Triage 👀 labels Aug 22, 2023
@haritamar (Collaborator)

Hi @github-christophe-oudar!
Thanks for opening this issue, and sorry for the very delayed response. We are currently reviewing our open issues and improving our process there.
Since the issue has been open for over 3 months, I'm going to go ahead and close it. Please feel free to re-open it if it is still relevant.

Thanks!
Itamar
