Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Direct INSERT in Final Table #35

Open
flarco opened this issue Oct 25, 2023 Discussed in #33 · 4 comments
Open

Allow Direct INSERT in Final Table #35

flarco opened this issue Oct 25, 2023 Discussed in #33 · 4 comments
Labels
enhancement New feature or request

Comments

@flarco
Copy link
Collaborator

flarco commented Oct 25, 2023

This would only apply to snapshot mode, or incremental mode with only update_key (no primary_key).

See #33 (comment) for more details.

Changes needed in WriteToDb:

  • Add environment variable to capture setting. Something like SLING_ALLOW_DIRECT_INSERT=TRUE.
  • Add error handling, if mode is not snapshot or incremental mode with only update_key.
    • Add config method to determine if direct insert is allowed: DirectInsertAllowed()
  • Ensure PreSQL is executed before direct insert. Currently, is executed after successful insert into temp table
  • Clean up and reorg WriteToDb a bit for easier read
@flarco flarco added the enhancement New feature or request label Oct 25, 2023
@temminks
Copy link

Hi, I'd like to work on that. Do you see any new dependencies with other open issues?

@flarco
Copy link
Collaborator Author

flarco commented Jan 30, 2024

Hey @temminks, thanks for volunteering. As for your question, no, there aren't any pending dependencies that would block this.

This one is a bit complex, as it involves breaking into pieces the WriteToDB function. It's one of the more involved pieces of the Sling codebase, since there are specific sequences of transactions that occur in order (depending on the mode), such as:

  • drop temp table if exists
  • create temp table
  • insert into temp table
  • validate temp table
  • validate columns names
  • execute pre-sql
  • create final table if not exists
  • insert into final table
  • execute post-sql
  • drop temp table

For this new feature, we'd need to modify the sequence depending on whether SLING_ALLOW_DIRECT_INSERT is enabled, to just do:

  • validate columns names
  • execute pre-sql
  • create final table if not exists
  • insert into final table
  • execute post-sql

So I figure it's a good opportunity to break up/clean up WriteToDB into a few small functions (as it is a bit of a mess imho). The small functions could be used/re-used in a neater way.

Do you have any thoughts on how you'd approach this?

@temminks
Copy link

temminks commented Feb 8, 2024

Hi @flarco, I was looking into that function the other day as I was a bit surprise that I have to grant CREATE privileges on a schema to the user that write a table with Sling, instead of just using TEMPORARY (both in PostgreSQL). This is probably to make sure that the implementation works across databases where TEMPORARY is not a privilege.

I actually tried to play around with pre-sql to create a temporary table there and pass that to table_tmp but this did not work as Sling was not satisfied with the temporary table not having a schema.

One thing I'm not sure of right now is how you would read the environment variable: do you want to do this inside the function, just using os.Getenv() or should this be part of the Config struct?

As for an approach, the steps

  • drop temp table if exists
  • create temp table
  • insert into temp table
  • validate temp table
  • AddCleanupTask

appear to be one logical unit: These might become smaller functions themselves but in the end WriteToDB should call a function createTempTable() if SLING_ALLOW_DIRECT_INSERT=TRUE. I'll have to see where it makes sense to extract smaller functions to make the code more readable or simply to be able to test a step.

The step validate columns names consists of adjustments of column types and adding columns. That adding part could be tricky: you add new columns based on the temporary table. We should be able to use the results from the data sampling, though, so this must not be part of createTempTable().

Executing pre-sql doesn't have to be changed, I think. This is just running the statement and does not depend on the temporary table.

I think I'll have to start playing around a bit before I see how this changes the actual part where you write to the target table.

@gu-xie
Copy link

gu-xie commented Feb 22, 2024

+1 on this as we have a restriction wherein we cannot create tables in the target database.
Table creation / modification is managed via CICD for deployment.

Would value a method to avoid creating a temp table and perform a direct insert/truncate insert.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants