bug: activate_version deletes rows inserted in earlier batches (and deletes despite hard_delete not being set) #2103
Comments
One thing I was unsure of is whether it's expected for activate_version to be called multiple times during EL. Is there a chance the tap is the one misbehaving? (Although the default would still be wrong in that case.)
Hi @msg555! Thanks for logging this and for the detailed investigation into the issue. This does seem like a significant problem that could lead to unexpected data loss.
Yeah, I think that's right.
I'm trying to think about what users would experience if we changed the default to not hard delete. Would they just start seeing that data is now upserted/soft-deleted instead of removed, and would that cause issues for downstream data modeling? I still think we should apply your suggested patch, but we should call out this change in the release notes of both the SDK and downstream targets.
I think there would be at most one activate_version message for each stream in the tap, but I could be wrong. What's the tap in question? cc @pnadolny13 Curious if you've seen this problem come up with target-snowflake. And fwiw target-postgres seems to do the right thing.
Ah, apologies if I've identified the wrong repository. Perhaps the code you linked that's directly in target-snowflake is what is relevant in my case; it still seems to use the <= rather than < operator. Perhaps I should create a new issue over there.
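To make the difference between the two operators concrete, here is a minimal sketch using an in-memory SQLite table. It is not the SDK or target-snowflake code: the table name `demo`, the column name `table_version`, and the helper `rows_left_after_delete` are all assumptions made up for illustration. It assumes rows loaded in the current sync carry the new version number and rows left over from a previous sync carry an older one.

```python
import sqlite3


def rows_left_after_delete(operator: str) -> int:
    """Count rows that survive a version-based delete (illustrative only)."""
    if operator not in ("<", "<="):
        raise ValueError("operator must be '<' or '<='")

    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one version column stamped on every loaded row.
    conn.execute(
        "CREATE TABLE demo (id INTEGER PRIMARY KEY, table_version INTEGER)"
    )
    old_version, new_version = 1, 2

    # Three stale rows left behind by a previous sync.
    conn.executemany(
        "INSERT INTO demo (id, table_version) VALUES (?, ?)",
        [(i, old_version) for i in range(3)],
    )
    # Five rows already written by earlier batches of the current sync.
    conn.executemany(
        "INSERT INTO demo (id, table_version) VALUES (?, ?)",
        [(i, new_version) for i in range(3, 8)],
    )

    # The operator is validated above, so interpolating it is safe here.
    conn.execute(
        f"DELETE FROM demo WHERE table_version {operator} ?",
        (new_version,),
    )
    count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    conn.close()
    return count


print(rows_left_after_delete("<="))  # current-sync rows are deleted too
print(rows_left_after_delete("<"))   # only the stale rows are deleted
```

Under these assumptions, `<=` also removes the rows that were just loaded with the new version, while `<` removes only rows from older versions.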
The tap is pipelinewise-tap-mysql==1.5.6:
- name: tap-mysql
  variant: transferwise
  pip_url: pipelinewise-tap-mysql~=1.5.0
  config:
    host: ${TAP_MYSQL_HOST}
    port: ${TAP_MYSQL_PORT}
    user: ${TAP_MYSQL_USER}
    password: ${TAP_MYSQL_PASSWORD}
    filter_dbs: my_db
    session_sqls:
    - SET @@session.max_statement_time=0
    - SET @@session.net_read_timeout=3600
    - SET @@session.net_write_timeout=3600
    - SET @@session.wait_timeout=28800
    - SET @@session.innodb_lock_wait_timeout=3600
  select:
  ...
The
Thanks! The tap does seem to emit a single message per stream: https://github.com/transferwise/pipelinewise-tap-mysql/blob/572e08a3576702895e2a9edae188773ec9d7a096/tap_mysql/sync_strategies/full_table.py#L137-L138
On second thought, an issue and PR are probably needed for target-snowflake, since failing tests are blocking bumping the SDK: MeltanoLabs/target-snowflake#105
@edgarrmondragon I haven't noticed, to be honest. I don't use activate version for anything really. I agree though that seeing hard_delete defaulting to true seems weird. I also don't think In terms of breaking changes, I also agree that it's better to break someone by getting them back to the behavior they expect vs leaving the bug.
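As a rough illustration of why the default matters when hard_delete is not set at all, the sketch below compares a default of true with a default of false against an empty config. It is not the SDK implementation: the table `t`, the columns `version` and `deleted_at`, and the helpers `activate_version` and `totals_after_activation` are hypothetical names chosen for this example.

```python
import sqlite3


def activate_version(conn, new_version, config, default_hard_delete):
    """Hypothetical helper: remove or mark rows older than new_version."""
    if config.get("hard_delete", default_hard_delete):
        # Hard delete: stale rows are physically removed.
        conn.execute("DELETE FROM t WHERE version < ?", (new_version,))
    else:
        # Soft delete: stale rows are kept but stamped as deleted.
        conn.execute(
            "UPDATE t SET deleted_at = CURRENT_TIMESTAMP "
            "WHERE version < ? AND deleted_at IS NULL",
            (new_version,),
        )


def totals_after_activation(default_hard_delete):
    """Return (total rows, rows marked deleted) for an empty config."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, version INTEGER, deleted_at TEXT)"
    )
    # Two rows from an old version, one row from the new version.
    conn.executemany(
        "INSERT INTO t (id, version) VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2)],
    )
    # The user never set hard_delete, so only the default decides.
    activate_version(conn, 2, {}, default_hard_delete)
    total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    marked = conn.execute(
        "SELECT COUNT(*) FROM t WHERE deleted_at IS NOT NULL"
    ).fetchone()[0]
    conn.close()
    return total, marked


print(totals_after_activation(True))   # stale rows removed
print(totals_after_activation(False))  # stale rows kept and marked
```

With a default of true, a user who never mentioned hard_delete in their config still loses the stale rows; with a default of false, those rows stay in the table and are only marked.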
Singer SDK Version
0.30.0
Is this a regression?
Python Version
3.9
Bug scope
Targets (data type handling, batching, SQL object generation, etc.)
Operating System
Linux
Description
I was attempting to transition to using meltanolabs-target-snowflake (version 0.5.1) from the pipelinewise variant.
When I run EL from a tap_mysql (pipelinewise-tap-mysql) to this target on a table that has 21k rows, I find that after EL completes the destination table only has 1k rows instead. If I turn off hard_delete I instead end up with the full 21k rows. From some investigation it appears that this code snippet is the problem:
sdk/singer_sdk/sinks/sql.py
Lines 381 to 389 in 299acc0
I see two problems here:
Proposed patch might look like
Loader config
Code
No response