Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Standard configurable load methods #1893

Merged
merged 13 commits into from
Sep 13, 2023
Merged

Conversation

pnadolny13
Copy link
Contributor

@pnadolny13 pnadolny13 commented Aug 3, 2023

Closes #1783

I was attempting to add this overwrite feature to target-pinecone MeltanoLabs/target-pinecone#4 so I decided it would probably best to draft it in the SDK while I was at it.

  • Adds a standard config load_method for: append-only, upsert, overwrite
  • Implements the overwrite workflow in SQL targets
  • By default overwrite isnt supported until the target developer activates it. Is this what we want or should it default to True?
  • I wasnt sure if the truncate logic should go in setup of the SQLSink class or prepare_table. For now I chose prepare_table
  • Ideally a target would throw a nice NotImplementedError if the user choses one of the load methods that arent supported but I'm not totally sure how to do that as of right now. It might require more thought.

@edgarrmondragon let me know what you think overall and also specifically with the naming of everything.

cc @tayloramurphy in case you have opinions on naming or default behavior.

@codecov
Copy link

codecov bot commented Aug 3, 2023

Codecov Report

Merging #1893 (1307a37) into main (fca070e) will increase coverage by 0.03%.
Report is 1 commits behind head on main.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1893      +/-   ##
==========================================
+ Coverage   87.01%   87.04%   +0.03%     
==========================================
  Files          59       59              
  Lines        5097     5109      +12     
  Branches      826      827       +1     
==========================================
+ Hits         4435     4447      +12     
  Misses        465      465              
  Partials      197      197              
Files Changed Coverage Δ
singer_sdk/connectors/sql.py 82.83% <100.00%> (+0.34%) ⬆️
singer_sdk/helpers/capabilities.py 92.42% <100.00%> (+0.62%) ⬆️
singer_sdk/target_base.py 92.33% <100.00%> (+0.02%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tayloramurphy
Copy link
Collaborator

@pnadolny13 I like overwrite! It matches Snowflake too https://docs.snowflake.com/en/sql-reference/sql/insert#optional-parameters

Copy link
Collaborator

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnadolny13 I like the approach 👌. Perhaps we can add a test for the SQLite sample.

@pnadolny13 pnadolny13 marked this pull request as ready for review August 8, 2023 18:17
@pnadolny13
Copy link
Contributor Author

@edgarrmondragon I added a test for the sqlite sample, and refactored a little. I couldnt get custom query parameterization to work for some reason so I used f strings and had to add noqa flags for ruff to pass.

@pnadolny13
Copy link
Contributor Author

pnadolny13 commented Sep 7, 2023

@edgarrmondragon I swapped the logic in this to use a drop/recreate flow instead of truncating. I started to implement the rename + rollback workflow but I wasnt able to find anywhere to easily put logic to rollback if an exception is thrown. I tried using the clean_up method which worked when the sync was successful to remove the backup table before exiting but I couldnt figure out how to easily catch exceptions and revert to the backup table prior to exiting. I create a draft PR to this branch with the code I was tinkering with.

@edgarrmondragon edgarrmondragon changed the title feat: standard configurable load methods feat: Standard configurable load methods Sep 12, 2023
@edgarrmondragon
Copy link
Collaborator

@pnadolny13 I think I was probably overstating the risk of data loss caused by pipeline failures when syncing in overwrite mode. If the user is OK with overriding existing data, then they're probably also OK-ish with losing data if the pipeline fails, as long as they can load data on the next run.

@edgarrmondragon
Copy link
Collaborator

@edgarrmondragon I swapped the logic in this to use a drop/recreate flow instead of truncating. I started to implement the rename + rollback workflow but I wasnt able to find anywhere to easily put logic to rollback if an exception is thrown. I tried using the clean_up method which worked when the sync was successful to remove the backup table before exiting but I couldnt figure out how to easily catch exceptions and revert to the backup table prior to exiting. I create a draft PR to this branch with the code I was tinkering with.

@pnadolny13 Let'd punt the rollback support and create an issue for it. We'll worry about it if users are annoyed by losing data 🙂

@pnadolny13 pnadolny13 merged commit 10b61d2 into main Sep 13, 2023
26 checks passed
@pnadolny13 pnadolny13 deleted the standardize_load_methods branch September 13, 2023 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: SQL targets should allow configurable insert vs upsert behavior by default
3 participants