New manifest structure and support for a BQ manifest. #295
Made a first pass at a review. I recommend deviating slightly less from the existing class structure.
So far, this looks great! I really like what I've seen in the demo DB :)
def __init__(self, location: Location) -> None:
    super().__init__(Location(location[5:]))

TABLE_SCHEMA = [
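The `location[5:]` slice above drops the first five characters of the location. Assuming BQ manifest locations are written as `bq://<project>.<dataset>.<table>` (the `bq://` prefix is exactly five characters, but this URI format is an assumption the PR does not confirm), a minimal sketch that validates the prefix before stripping it could look like this. `Location` is modeled as a `NewType` over `str`, and `BQManifestSketch` and `table_id` are hypothetical names.

```python
from typing import NewType

# Assumption: Location is a thin alias over str.
Location = NewType('Location', str)

# Assumption: BQ manifest locations start with this 5-character prefix.
BQ_PREFIX = 'bq://'


class BQManifestSketch:
    """Hypothetical stand-in for the BQ manifest's constructor logic."""

    def __init__(self, location: Location) -> None:
        # Fail loudly instead of silently chopping 5 characters off an
        # unexpected location such as 'gs://bucket/manifest.json'.
        if not location.startswith(BQ_PREFIX):
            raise ValueError(
                f'Expected a location starting with {BQ_PREFIX!r}, got {location!r}.'
            )
        # Equivalent to location[5:], but tied to the actual prefix length.
        self.table_id = Location(location[len(BQ_PREFIX):])


manifest = BQManifestSketch(Location('bq://my-project.my_dataset.manifest'))
print(manifest.table_id)  # my-project.my_dataset.manifest
```

Deriving the slice from `len(BQ_PREFIX)` keeps the magic number and the prefix in sync if the URI scheme ever changes.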
@fredzyda I could use a second pair of eyes on the schema definition.
@TylerTCF may have some opinions, too.
Before we land this change, I was thinking Tyler could build a prototype dashboard out of sample data you produce. That should be a good test of the schema.
I'll check with Tyler one last time. Otherwise, I think we're good to go.
…from BQ Manifest Client.
Pass the manifest instance across pipeline steps instead of a dict. Improve the mock manifest client and use it in tests. Remove None from the Status Enum and set its default value to None in the DownloadStatus dataclass.
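The last part of that commit describes dropping a `None` member from the `Status` enum and using Python's own `None` as the dataclass default instead. A minimal sketch of that shape follows; the enum member names and the `location` field are illustrative assumptions, not the PR's actual definitions.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class Status(enum.Enum):
    # Hypothetical members; note there is no NONE member any more.
    SCHEDULED = 'scheduled'
    IN_PROGRESS = 'in-progress'
    SUCCESS = 'success'
    FAILURE = 'failure'


@dataclass
class DownloadStatus:
    location: Optional[str] = None
    # "No status yet" is expressed with Python's None, not an enum member.
    status: Optional[Status] = None


pending = DownloadStatus(location='gs://bucket/era5.nc')
print(pending.status is None)  # True
```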
Instead of creating a separate transaction for each stage, we now just set the stage using set_stage().
…art time of the stage.
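The `set_stage()` change can be pictured as updating one status record in place rather than writing a new transaction per stage. The sketch below assumes an in-memory store keyed by location, a hypothetical `Stage` enum, and a hypothetical `stage_start_time` field (the truncated commit message above suggests a stage start time is recorded, but its real name and type are guesses).

```python
import datetime
import enum
from dataclasses import dataclass
from typing import Dict, Optional


class Stage(enum.Enum):
    # Hypothetical pipeline stages.
    FETCH = 'fetch'
    DOWNLOAD = 'download'
    UPLOAD = 'upload'


@dataclass
class DownloadStatus:
    location: str
    stage: Optional[Stage] = None
    stage_start_time: Optional[datetime.datetime] = None


class InMemoryManifest:
    """Hypothetical manifest that keeps exactly one record per download location."""

    def __init__(self) -> None:
        self.records: Dict[str, DownloadStatus] = {}

    def set_stage(self, location: str, stage: Stage) -> None:
        # Reuse the existing record instead of opening a new one per stage.
        status = self.records.setdefault(location, DownloadStatus(location))
        status.stage = stage
        status.stage_start_time = datetime.datetime.now(datetime.timezone.utc)


manifest = InMemoryManifest()
manifest.set_stage('gs://bucket/era5.nc', Stage.FETCH)
manifest.set_stage('gs://bucket/era5.nc', Stage.DOWNLOAD)
print(len(manifest.records), manifest.records['gs://bucket/era5.nc'].stage)  # 1 Stage.DOWNLOAD
```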
More feedback. Overall it's looking good!
if parsed_gcs_path.scheme != 'gs' or parsed_gcs_path.netloc == '':
    new_status.size = LocalSystemFileSizeStrategy().get_file_size(path)
else:
    new_status.size = GCSBlobSizeStrategy().get_file_size(parsed_gcs_path)
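For reference, here is a self-contained sketch of the dispatch in that snippet, assuming `parsed_gcs_path` comes from `urllib.parse.urlparse(path)`. The two strategy classes reuse the names from the diff, but their bodies are assumptions: the local one calls `os.path.getsize`, and the GCS one uses the `google-cloud-storage` client (`Client().bucket(...).get_blob(...)`), imported lazily so local paths work without that package installed. `is_gcs_path` and `file_size` are hypothetical helper names.

```python
import os
from urllib.parse import ParseResult, urlparse


class LocalSystemFileSizeStrategy:
    def get_file_size(self, path: str) -> int:
        return os.path.getsize(path)


class GCSBlobSizeStrategy:
    def get_file_size(self, parsed_gcs_path: ParseResult) -> int:
        # Lazy import: google-cloud-storage is only needed for gs:// paths.
        from google.cloud import storage

        bucket = storage.Client().bucket(parsed_gcs_path.netloc)
        blob = bucket.get_blob(parsed_gcs_path.path.lstrip('/'))
        if blob is None:
            raise FileNotFoundError(parsed_gcs_path.geturl())
        return blob.size


def is_gcs_path(parsed: ParseResult) -> bool:
    # The diff's local-file condition, negated: a gs:// scheme with a bucket name.
    return parsed.scheme == 'gs' and parsed.netloc != ''


def file_size(path: str) -> int:
    parsed_gcs_path = urlparse(path)
    if not is_gcs_path(parsed_gcs_path):
        return LocalSystemFileSizeStrategy().get_file_size(path)
    return GCSBlobSizeStrategy().get_file_size(parsed_gcs_path)
```

Note that `gs:///object.nc` (no bucket) parses with an empty `netloc`, so it falls through to the local branch, exactly as the original condition does.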
This is much cleaner. However, could we apply something like the strategy pattern to the manifest subclass itself? For example, the GCS Manifest could have a clac_file_size() method that gets overridden. I see no need to create a separate class hierarchy; this one should do fine.
For example, our _read and _update calls use the strategy pattern, in my book.
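One way to read this suggestion: keep the single manifest hierarchy, give the base class a default file-size method, and let subclasses override it next to their `_update` hook. The sketch below is illustrative only; the method name `_file_size`, the helper `record_size`, and the classes `MockManifest` and `GCSManifest` are hypothetical, and the GCS override again assumes the `google-cloud-storage` client.

```python
import abc
import os
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class DownloadStatus:
    location: str
    size: Optional[int] = None


class Manifest(abc.ABC):
    def record_size(self, status: DownloadStatus) -> None:
        # The subclass decides both how to size the file and how to persist it.
        status.size = self._file_size(status.location)
        self._update(status)

    def _file_size(self, path: str) -> int:
        # Default: treat the path as a local file.
        return os.path.getsize(path)

    @abc.abstractmethod
    def _update(self, status: DownloadStatus) -> None:
        """Persist the status record (storage-specific)."""


class MockManifest(Manifest):
    def __init__(self) -> None:
        self.records: List[DownloadStatus] = []

    def _update(self, status: DownloadStatus) -> None:
        self.records.append(status)


class GCSManifest(Manifest):
    def _file_size(self, path: str) -> int:
        parsed = urlparse(path)
        if parsed.scheme != 'gs' or parsed.netloc == '':
            return super()._file_size(path)
        from google.cloud import storage  # Lazy: only needed for gs:// paths.
        blob = storage.Client().bucket(parsed.netloc).get_blob(parsed.path.lstrip('/'))
        if blob is None:
            raise FileNotFoundError(path)
        return blob.size

    def _update(self, status: DownloadStatus) -> None:
        # Writing to the GCS manifest is out of scope for this sketch.
        raise NotImplementedError
```

Strictly speaking, a base-class method that subclasses override is closer to the template-method pattern than to a standalone strategy object, but it achieves what the comment asks for: no second class hierarchy.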
I changed the implementation of getting the file size a bit. What do you think?
Moved the 'config_name' property to the top level of the config dataclass, plus a few minor changes.
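As an illustration of that refactor, a config dataclass with `config_name` as a top-level field might look like the sketch below. Everything except the `config_name` name is an assumption: the `Config` class, the `parameters` and `selection` fields, and the idea that the name arrives inside a nested `parameters` mapping are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Config:
    # Top-level field: callers read config.config_name directly.
    config_name: str = ''
    # Hypothetical nested sections for the remaining settings.
    parameters: Dict[str, Any] = field(default_factory=dict)
    selection: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> 'Config':
        params = dict(raw.get('parameters', {}))
        # Hypothetical layout: lift config_name out of 'parameters' if it is there,
        # otherwise fall back to a top-level key.
        name = params.pop('config_name', raw.get('config_name', ''))
        return cls(config_name=name, parameters=params,
                   selection=dict(raw.get('selection', {})))


config = Config.from_dict({'parameters': {'config_name': 'era5', 'dataset': 'x'},
                           'selection': {'year': '2020'}})
print(config.config_name)  # era5
```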
I have a few critical points of feedback. Once these are fixed, I'm happy to land these changes. Great work, Rahul!
Thanks, Alex, for reviewing.
Fixes #149 and #60.