Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add --ddl-file option and support offload to an existing empty table #149

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

nj1973
Copy link
Collaborator

@nj1973 nj1973 commented Apr 11, 2024

DDL file
This PR adds a new option --ddl-file that will write CREATE TABLE DDL to a file and exit the Offload command without copying data. The option supports writing the file to local storage or cloud storage. Based on comments on this PR the DDL file also includes DDL to support the --create-backend-db option.

The basic use case is output the DDL for a table that has not yet been offloaded at all. We've plugged it into the existing flow at the part where table DDL is generated, therefore and SQL commands to create backend databases (--create-backend-database) or drop existing tables (--reset-backend-table) will still be visible in the output. Any command that runs in execute mode will be downgraded to non-execute mode.

You'll notice in the file changes that there are a lot of changes related to execute and dry_run. That is because I needed to move the execute flag from OrchestrationConfig to OffloadOperation. I'd tried to do this previously but was thwarted by Incremental Update. Now that Incremental Update has been removed I was able to make this change. I needed the change because --ddl-file will switch a command from execute=True to execute=False and changing it in config was the wrong thing to do.

Offload to existing table
This PR also adds support for Offloading to a pre-created table - but only if the table is empty and has no Offload metadata.

If the table has data then the normal --reset-backend-table option is required.

If the table is empty but has metadata (for example a sysadmin may have truncated the backend table) then the new --reuse-backend-table option is required.

@nj1973 nj1973 linked an issue Apr 11, 2024 that may be closed by this pull request
@nj1973 nj1973 marked this pull request as draft April 12, 2024 08:48
@nj1973 nj1973 marked this pull request as ready for review April 25, 2024 16:18
@nj1973 nj1973 requested a review from abb9979 April 25, 2024 16:18
@nj1973 nj1973 changed the title feat: Add --ddl-file option feat: Add --ddl-file option and support offload to an existing empty table Apr 25, 2024
src/goe/offload/offload_constants.py Outdated Show resolved Hide resolved
src/goe/offload/operation/ddl_file.py Show resolved Hide resolved
src/goe/config/option_descriptions.py Outdated Show resolved Hide resolved
@abb9979
Copy link
Contributor

abb9979 commented May 10, 2024

Are the trailing spaces in a DDL statement necessary?

image

@abb9979
Copy link
Contributor

abb9979 commented May 10, 2024

One of my tests was to re-use an existing DDL file.

Option error: 'DDL path already exists: /tmp/t3.sql'

Makes me wonder whether there should be a --reuse-ddl-file option?

@nj1973
Copy link
Collaborator Author

nj1973 commented May 10, 2024

Are the trailing spaces in a DDL statement necessary?

Not necessary no, I've not changed the SQL text generation, only redirected it to a file. I suppose we should look to address this though because I wouldn't like trailing spaces in code in my repo.

@nj1973
Copy link
Collaborator Author

nj1973 commented May 10, 2024

One of my tests was to re-use an existing DDL file.

Option error: 'DDL path already exists: /tmp/t3.sql'

Makes me wonder whether there should be a --reuse-ddl-file option?

I don't think so, I think that is one option too many. We already have too many. We should either abort as above or overwrite the file. I chose abort because it is safer.

@abb9979
Copy link
Contributor

abb9979 commented May 10, 2024

Data type modifications are not reflected in the DDL file. For example --variable-string-columns=created ignored and DDL file still has this column as DATETIME.

@abb9979
Copy link
Contributor

abb9979 commented May 13, 2024

Using the --reuse-backend-table and --reset-backend-table options together gives the latter precedence and drops the backend table. I think this combination should be a message to the user to choose the appropriate option only.

@abb9979
Copy link
Contributor

abb9979 commented May 13, 2024

Screen output is minimal and shows nothing for short-circuited (i.e. prevented) offloads. Includes, for example, trying to --reuse-backend-table when it is populated.

@abb9979
Copy link
Contributor

abb9979 commented May 13, 2024

If a pre-existing backend table has a different column type to the frontend table, the mismatch is raised, even if an offload override option is used to bring them into alignment.

@nj1973
Copy link
Collaborator Author

nj1973 commented May 22, 2024

Are the trailing spaces in a DDL statement necessary?

These have been removed.

@nj1973
Copy link
Collaborator Author

nj1973 commented May 23, 2024

Using the --reuse-backend-table and --reset-backend-table options together gives the latter precedence and drops the backend table. I think this combination should be a message to the user to choose the appropriate option only.

Fixed, thanks

@nj1973
Copy link
Collaborator Author

nj1973 commented May 23, 2024

Screen output is minimal and shows nothing for short-circuited (i.e. prevented) offloads. Includes, for example, trying to --reuse-backend-table when it is populated.

Fixed, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decouple table creation and data loading
2 participants