DeltaTable Specifications #42
Merged
# Conflicts: # src/spetlr/configurator/configurator.py # src/spetlr/delta/delta_handle.py # tests/cluster/delta/test_delta_class.py
@LauJohansson I addressed all your comments. Thanks for the thorough review. I left the conversations unresolved; you can close them yourself if you agree.
LauJohansson
approved these changes
Sep 8, 2023
YouTube link with an introduction to this PR: https://youtu.be/X57AWD0OsZA
Please approve this PR if you think that
DeltaTableSpec
Abstract
The `DeltaTableSpec` class contains all information about a delta table that can be given in a `CREATE TABLE` statement. The class can be initialized in pure python, or by parsing a `CREATE TABLE` statement. In addition, the class is able to lift all necessary information from the spark catalog that fully describes the table. Using these two channels, 1. from code and 2. from disk, the class can make statements about the degree of agreement between the two. Crucially, the class can formulate the `ALTER TABLE` statements that are necessary to bring the table in spark into alignment with the specification from code. This is its primary function.
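To make the idea of deriving `ALTER TABLE` statements from a comparison concrete, here is a minimal sketch. It is not the implementation in this PR: `ToySpec` and `toy_alter_statements` are hypothetical names invented for illustration, and the sketch only handles added columns and changed table properties, with no spark session involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySpec:
    """Hypothetical stand-in for a table specification; not the spetlr class."""

    name: str
    columns: Dict[str, str]  # column name -> SQL type
    tblproperties: Dict[str, str] = field(default_factory=dict)


def toy_alter_statements(desired: ToySpec, actual: ToySpec) -> List[str]:
    """Return ALTER TABLE statements that add what `actual` lacks compared to `desired`."""
    statements = []

    # Columns present in the specification from code but missing in storage.
    missing = [
        f"{col} {typ}"
        for col, typ in desired.columns.items()
        if col not in actual.columns
    ]
    if missing:
        statements.append(
            f"ALTER TABLE {desired.name} ADD COLUMNS ({', '.join(missing)})"
        )

    # Table properties that are absent or carry a different value in storage.
    changed = {
        key: value
        for key, value in desired.tblproperties.items()
        if actual.tblproperties.get(key) != value
    }
    if changed:
        props = ", ".join(f"'{k}' = '{v}'" for k, v in changed.items())
        statements.append(f"ALTER TABLE {desired.name} SET TBLPROPERTIES ({props})")

    return statements


from_code = ToySpec("db.tbl", {"a": "int", "b": "string"}, {"owner": "etl"})
from_disk = ToySpec("db.tbl", {"a": "int"})
for statement in toy_alter_statements(from_code, from_disk):
    print(statement)
```

When the two specifications already agree, the function returns an empty list, which mirrors the notion of the two channels being in agreement.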
Introduction
Taking a step back from the mechanisms of spark, one could argue that there are several competing descriptions that all describe a delta table to some degree, among them the `CREATE TABLE` statement.
In order to enable more dynamic analysis of their mutual (dis-)agreements, these have been extended with the `DeltaTableSpec` class, which can exist, for example, as python code: `DeltaTableSpec(name="...", schema=...)`
The class has methods that enable going back and forth between each of these forms:
- `__init__` and `repr(tbl)` are guaranteed to be mutual inverses. The result of `eval(repr(tbl))` compares equal to the original object.
- `DeltaTableSpec.from_sql(str)` will create an instance from sql code.
- `tbl.get_create_sql()` will return a fully formed create statement, guaranteed to be the inverse of the above.
- `DeltaTableSpec.from_path(str)` and `DeltaTableSpec.from_name(str)` will read all table details from spark.
- `tbl.make_storage_match()` will execute the necessary create sql statement to make the result of the `from_name` call compare equal to the specification in `tbl`.
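The round-trip guarantees listed above can be illustrated with a toy class. This is only a sketch under simplifying assumptions: `MiniSpec` is a hypothetical name, its parser only understands trivial `CREATE TABLE name (col type, ...)` statements, and it is in no way the parsing logic of `DeltaTableSpec`.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MiniSpec:
    """Hypothetical toy specification; illustrates round trips only."""

    name: str
    columns: Tuple[Tuple[str, str], ...]  # (column name, SQL type) pairs

    def get_create_sql(self) -> str:
        cols = ", ".join(f"{col} {typ}" for col, typ in self.columns)
        return f"CREATE TABLE {self.name} ({cols})"

    @classmethod
    def from_sql(cls, sql: str) -> "MiniSpec":
        # Toy parser: assumes one-word types and no commas inside type names.
        match = re.fullmatch(
            r"\s*CREATE TABLE\s+(\S+)\s*\((.*)\)\s*",
            sql,
            flags=re.IGNORECASE | re.DOTALL,
        )
        if match is None:
            raise ValueError("not a simple CREATE TABLE statement")
        name, body = match.groups()
        columns = tuple(
            tuple(part.split()) for part in body.split(",") if part.strip()
        )
        return cls(name=name, columns=columns)


tbl = MiniSpec("db.tbl", (("a", "int"), ("b", "string")))

# The dataclass-generated repr is valid python, so eval restores an equal object.
assert eval(repr(tbl)) == tbl

# Generating sql and parsing it again yields an equal object.
assert MiniSpec.from_sql(tbl.get_create_sql()) == tbl
```

Using a frozen dataclass gives a generated `__repr__` and `__eq__` for free, which is one simple way to obtain the "mutual inverses" property in a toy setting.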
Reference
For a detailed reference, please see the docstrings of each method on the class.
Documentation like the above is being produced, but I really want to get this out into people's hands after working on it for more than 4 months.