New manifest structure and added support for BQ manifest. #295

mahrsee1997 · 2023-02-09T22:54:22Z

Fixes #149 and #60.

weather_dl/README.md

alxmrs

Made a first review pass. I recommend slightly less deviation from the existing class structure.

So far, this looks great! I really like what I've seen in the demo DB :)

weather_dl/README.md

weather_dl/download_pipeline/clients.py

weather_dl/download_pipeline/fetcher.py

weather_dl/download_pipeline/clients.py

weather_dl/download_pipeline/manifest.py

alxmrs · 2023-02-10T00:56:54Z

weather_dl/download_pipeline/manifest.py

+
+    def __init__(self, location: Location) -> None:
+        super().__init__(Location(location[5:]))
+        TABLE_SCHEMA = [


@fredzyda I could use a second pair of eyes on the schema definition.

@TylerTCF may have some opinions, too.

Before we land this change, I was thinking Tyler could build a prototype dashboard out of sample data you produce. That should be a good test of the schema.

I'll check with Tyler one last time. Otherwise, I think we're good to go.
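To make a schema review (and a prototype dashboard) easier, the manifest schema can be kept as plain data in BigQuery's JSON schema format (`name` / `type` / `mode` / `description`) and sanity-checked before any table is created. This is a minimal sketch. The field list below is hypothetical and is not the `TABLE_SCHEMA` from this PR, and `schema_problems()` / `missing_required()` are illustrative helpers, not part of weather-dl.

```python
import typing as t

# Hypothetical manifest schema in BigQuery's JSON schema format
# (name / type / mode / description). Field names are illustrative only.
TABLE_SCHEMA: t.List[t.Dict[str, str]] = [
    {'name': 'config_name', 'type': 'STRING', 'mode': 'REQUIRED',
     'description': 'Name of the config file that requested the download.'},
    {'name': 'location', 'type': 'STRING', 'mode': 'REQUIRED',
     'description': 'Destination path of the downloaded file.'},
    {'name': 'stage', 'type': 'STRING', 'mode': 'NULLABLE',
     'description': 'Pipeline stage currently being executed.'},
    {'name': 'status', 'type': 'STRING', 'mode': 'NULLABLE',
     'description': 'Status of the current stage.'},
    {'name': 'size', 'type': 'INTEGER', 'mode': 'NULLABLE',
     'description': 'Size of the downloaded file, in bytes.'},
    {'name': 'stage_start_time', 'type': 'TIMESTAMP', 'mode': 'NULLABLE',
     'description': 'When the current stage started.'},
]

# A subset of BigQuery's column types and modes, enough for this check.
_TYPES = {'STRING', 'INTEGER', 'FLOAT', 'BOOLEAN', 'TIMESTAMP', 'JSON'}
_MODES = {'NULLABLE', 'REQUIRED', 'REPEATED'}


def schema_problems(schema: t.List[t.Dict[str, str]]) -> t.List[str]:
    """Returns human-readable problems; an empty list means the schema looks sane."""
    problems: t.List[str] = []
    seen: t.Set[str] = set()
    for field in schema:
        name = field.get('name', '')
        if not name or name in seen:
            problems.append(f'missing or duplicate name: {name!r}')
        seen.add(name)
        if field.get('type') not in _TYPES:
            problems.append(f'{name}: unknown type {field.get("type")!r}')
        # BigQuery treats an omitted mode as NULLABLE.
        if field.get('mode', 'NULLABLE') not in _MODES:
            problems.append(f'{name}: unknown mode {field.get("mode")!r}')
        if not field.get('description'):
            problems.append(f'{name}: missing description')
    return problems


def missing_required(schema: t.List[t.Dict[str, str]],
                     row: t.Dict[str, t.Any]) -> t.List[str]:
    """Names of REQUIRED columns that are absent or None in a manifest row."""
    return [f['name'] for f in schema
            if f.get('mode') == 'REQUIRED' and row.get(f['name']) is None]
```

Keeping the schema as data means the same list can feed table creation, documentation, and sample rows for a dashboard prototype.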

weather_dl/download_pipeline/manifest.py


alxmrs

More feedback. Overall it's looking good!

weather_dl/download_pipeline/manifest.py

weather_dl/download_pipeline/clients.py

weather_dl/download_pipeline/manifest.py

alxmrs · 2023-02-14T23:09:59Z

weather_dl/download_pipeline/manifest.py

+        if parsed_gcs_path.scheme != 'gs' or parsed_gcs_path.netloc == '':
+            new_status.size = LocalSystemFileSizeStrategy().get_file_size(path)
+        else:
+            new_status.size = GCSBlobSizeStrategy().get_file_size(parsed_gcs_path)


This is much cleaner. However, could we apply something like the strategy pattern on the manifest subclass itself? For example, the GCS manifest could have a calc_file_size() method that gets overridden. I see no need to create a separate class hierarchy; this one should do fine.

For example, our _read and _update calls use the strategy pattern in my book.
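The reviewer's suggestion can be sketched as follows: instead of separate `LocalSystemFileSizeStrategy` / `GCSBlobSizeStrategy` classes, the manifest base class exposes an overridable `calc_file_size()` hook, the same way `_read()` and `_update()` vary per backend. This is a minimal sketch under assumptions: `Manifest`, `LocalManifest`, `MockManifest`, and `record_size()` are hypothetical stand-ins, and a real GCS subclass would override `calc_file_size()` with a Cloud Storage lookup, which is omitted here.

```python
import abc
import os
import typing as t


class Manifest(abc.ABC):
    """Hypothetical base class; only the file-size pieces are shown."""

    def record_size(self, location: str, path: str) -> int:
        # Shared logic stays in the base class; only the size lookup varies.
        size = self.calc_file_size(path)
        self._update(location, {'size': size})
        return size

    def calc_file_size(self, path: str) -> int:
        # Default behaviour: the file lives on the local file system.
        if not os.path.exists(path):
            raise FileNotFoundError(f'{path!r} does not exist.')
        return os.path.getsize(path)

    @abc.abstractmethod
    def _update(self, location: str, fields: t.Dict[str, t.Any]) -> None:
        """Backend-specific write, the same kind of hook as _read()/_update()."""


class LocalManifest(Manifest):
    """Keeps rows in memory and relies on the default calc_file_size()."""

    def __init__(self) -> None:
        self.rows: t.Dict[str, t.Dict[str, t.Any]] = {}

    def _update(self, location: str, fields: t.Dict[str, t.Any]) -> None:
        self.rows.setdefault(location, {}).update(fields)


class MockManifest(LocalManifest):
    """Overrides calc_file_size(), e.g. for tests that should not touch storage."""

    def __init__(self, fake_sizes: t.Dict[str, int]) -> None:
        super().__init__()
        self.fake_sizes = fake_sizes

    def calc_file_size(self, path: str) -> int:
        return self.fake_sizes[path]
```

The design choice here is a template method rather than a separate strategy hierarchy: callers keep using one manifest object, and each subclass decides how sizes are measured.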

I changed the implementation a bit with respect to getting the file size. What do you think?

alxmrs · 2023-02-14T23:12:34Z

weather_dl/download_pipeline/manifest.py


weather_dl/download_pipeline/manifest.py

weather_dl/download_pipeline/parsers.py

Moved the 'config_name' property to the top level of the config dataclass, plus a few minor changes.

weather_dl/download_pipeline/manifest.py

weather_dl/download_pipeline/pipeline.py

weather_dl/download_pipeline/util.py

alxmrs

I have a few critical points of feedback. Once these are fixed, I'm happy to land these changes. Great work, Rahul!

mahrsee1997 · 2023-02-22T10:18:01Z

Thanks Alex for reviewing.

mahrsee1997 added 2 commits February 9, 2023 22:40

New manifest structure and added support for BQ manifest.

4ce2e63

Merge branch 'main' into dl-bq-manifest

987ba94

mahrsee1997 requested a review from alxmrs February 9, 2023 22:54

alxmrs reviewed Feb 9, 2023


weather_dl/README.md (outdated, resolved)

alxmrs reviewed Feb 10, 2023


weather_dl/download_pipeline/manifest.py (outdated, resolved)

mahrsee1997 added 9 commits February 10, 2023 10:48

Simplified Enum class names & style nit: fluent interface

aa69563

Added description for BQ Schema fields.

b2e4e1f

Implemented strategy pattern for getting the file size & remove lock from BQ Manifest Client.

3509000

Changes as follows:

b132048

Pass manifest instance across pipeline steps instead of dict. Improve mock manifest client & make use of it in tests. Remove None from Status Enum & set its default value to None in DownloadStatus dataclass.

Minor fix in from_dict() class method w.r.t. Stage.

fbfdb4a
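The two commits above (dropping `None` from the `Status` enum in favour of a `None` default on `DownloadStatus`, and the `from_dict()` fix for `Stage`) suggest a round trip between enum-typed fields and the plain strings stored in a manifest row. A minimal sketch, assuming hypothetical `Stage`/`Status` members and a reduced field set; the real dataclass in weather-dl has more fields and may convert values differently.

```python
import dataclasses
import enum
import typing as t


class Stage(enum.Enum):
    # Hypothetical stage names, for illustration only.
    FETCH = 'fetch'
    DOWNLOAD = 'download'
    UPLOAD = 'upload'


class Status(enum.Enum):
    # No NONE member: "no status yet" is expressed as a None field value.
    IN_PROGRESS = 'in-progress'
    SUCCESS = 'success'
    FAILURE = 'failure'


@dataclasses.dataclass
class DownloadStatus:
    config_name: str = ''
    location: str = ''
    stage: t.Optional[Stage] = None
    status: t.Optional[Status] = None
    size: t.Optional[int] = None

    @classmethod
    def from_dict(cls, row: t.Dict[str, t.Any]) -> 'DownloadStatus':
        # Drop keys the dataclass does not know about (e.g. extra table columns).
        known = {f.name for f in dataclasses.fields(cls)}
        kwargs = {k: v for k, v in row.items() if k in known}
        # Stored rows hold plain strings; map them back to enum members.
        # Enum lookup also accepts a member, so converted values pass through.
        if kwargs.get('stage') is not None:
            kwargs['stage'] = Stage(kwargs['stage'])
        if kwargs.get('status') is not None:
            kwargs['status'] = Status(kwargs['status'])
        return cls(**kwargs)

    def to_dict(self) -> t.Dict[str, t.Any]:
        row = dataclasses.asdict(self)
        # Store enum members by value so the row is JSON/BigQuery friendly.
        for key in ('stage', 'status'):
            if row[key] is not None:
                row[key] = row[key].value
        return row
```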

Added config file name in manifest.

ed0a15a

Added validation for non-existent file in LocalSystemFileSizeStrategy() method.

dc364fb

Implemented set_stage() method in manifest system.

dc8c30f

Instead of creating a separate transaction for each stage, now we will just set the stage using set_stage().
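The idea of one transaction per download, with set_stage() marking progress inside it, can be sketched with an in-memory manifest. This is a minimal sketch, not the weather-dl implementation: `InMemoryManifest`, the `Stage` members, and the `stage_start_time` field are hypothetical. It also assumes the stage start time is captured before the write, in the spirit of reducing the impact of `_read()`/`_update()` on recorded start times.

```python
import contextlib
import dataclasses
import datetime
import enum
import typing as t


class Stage(enum.Enum):
    # Hypothetical stage names, for illustration only.
    FETCH = 'fetch'
    DOWNLOAD = 'download'
    UPLOAD = 'upload'


class Status(enum.Enum):
    IN_PROGRESS = 'in-progress'
    SUCCESS = 'success'
    FAILURE = 'failure'


@dataclasses.dataclass
class DownloadStatus:
    location: str
    stage: t.Optional[Stage] = None
    status: t.Optional[Status] = None
    stage_start_time: t.Optional[str] = None


class InMemoryManifest:
    """Hypothetical manifest keeping one row per download location."""

    def __init__(self) -> None:
        self.rows: t.Dict[str, DownloadStatus] = {}
        self.current: t.Optional[DownloadStatus] = None

    def _read(self, location: str) -> DownloadStatus:
        row = self.rows.get(location)
        return dataclasses.replace(row) if row is not None else DownloadStatus(location)

    def _update(self, status: DownloadStatus) -> None:
        self.rows[status.location] = dataclasses.replace(status)

    @contextlib.contextmanager
    def transact(self, location: str) -> t.Iterator['InMemoryManifest']:
        # Entering the transaction only reads the row; nothing is written yet.
        self.current = self._read(location)
        try:
            yield self
        except Exception:
            self.current.status = Status.FAILURE
            self._update(self.current)
            raise
        self.current.status = Status.SUCCESS
        self._update(self.current)

    def set_stage(self, stage: Stage) -> None:
        # Capture the start time before writing, so the recorded start is not
        # delayed by the _update() round trip.
        start = datetime.datetime.now(datetime.timezone.utc).isoformat()
        assert self.current is not None, 'set_stage() must be called inside transact().'
        self.current.stage = stage
        self.current.status = Status.IN_PROGRESS
        self.current.stage_start_time = start
        self._update(self.current)
```

Usage: `with manifest.transact(location): manifest.set_stage(Stage.FETCH) ... manifest.set_stage(Stage.DOWNLOAD)`. Each download keeps a single row that moves through the stages, rather than one transaction (and row) per stage.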

Attempt to reduce the impact of _read() and _update() calls on the start time of the stage.

245e1aa

mahrsee1997 requested a review from alxmrs February 14, 2023 04:25

mahrsee1997 added 2 commits February 14, 2023 09:34

Avoid making an update call in manifest.transact().

f50b37a

Attempt to fix GCS Manifest.

e1dbd67

alxmrs reviewed Feb 14, 2023


mahrsee1997 added 4 commits February 15, 2023 08:58

Changes as follows:

0379ad4

Moved the 'config_name' property to the top level of the config dataclass, plus a few minor changes.

Implementation changes w.r.t. GCS manifest.

b4d62fc

Made changes w.r.t. getting the file size for the manifest.

0312a03

Fix GCS Manifest lock.

6c945b8

alxmrs reviewed Feb 15, 2023


weather_dl/download_pipeline/manifest.py (resolved)

alxmrs reviewed Feb 15, 2023


weather_dl/download_pipeline/manifest.py (resolved)

mahrsee1997 added 5 commits February 16, 2023 12:03

Protection against SQL injection.

82bfdbc

Include area in manifest schema.

15e9b38

Moved down the BQ manifest note in README.md

bae619c

Merge branch 'main' into dl-bq-manifest

5eedf12

Changed the type of 'area' field in BQ schema.

2230050

mahrsee1997 requested a review from alxmrs February 21, 2023 17:03

mahrsee1997 added 2 commits February 21, 2023 17:12

Remove support for GCS manifest.

280fe1f

Incorporate new manifest system in ECMWFPublicClient.

b2db4cc

alxmrs reviewed Feb 21, 2023


weather_dl/download_pipeline/manifest.py (resolved)

alxmrs reviewed Feb 21, 2023


weather_dl/download_pipeline/manifest.py (outdated, resolved)

alxmrs reviewed Feb 21, 2023


weather_dl/download_pipeline/manifest.py (resolved)

alxmrs reviewed Feb 21, 2023


weather_dl/download_pipeline/pipeline.py (resolved)

alxmrs reviewed Feb 21, 2023


weather_dl/download_pipeline/util.py (resolved)

alxmrs approved these changes Feb 21, 2023


mahrsee1997 added 3 commits February 22, 2023 07:52

Sanitize BQ manifest _read() inputs when building the SELECT query.

a430b90
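The SQL-injection and input-sanitization commits above amount to two rules for the `_read()` SELECT: a table reference cannot be bound as a query parameter, so it must be validated before it is formatted into the SQL text, while user-controlled values should never be formatted in at all. A minimal sketch, assuming manifest locations look like `bq://project.dataset.table` (consistent with the `location[5:]` slice in the diff); `parse_table()` and `build_read_query()` are hypothetical helpers, and the identifier patterns are deliberately stricter than BigQuery's own naming rules. With the real `google-cloud-bigquery` client, the returned value would be passed as a `ScalarQueryParameter` in `QueryJobConfig(query_parameters=[...])`.

```python
import re
import typing as t

BQ_PREFIX = 'bq://'  # Assumed scheme prefix; matches the location[5:] slice.

# Deliberately conservative patterns (stricter than BigQuery's own rules).
_PROJECT = re.compile(r'[a-z][a-z0-9-]{4,28}[a-z0-9]')
_NAME = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def parse_table(location: str) -> t.Tuple[str, str, str]:
    """Splits 'bq://project.dataset.table' and validates every identifier."""
    if not location.startswith(BQ_PREFIX):
        raise ValueError(f'Expected a {BQ_PREFIX!r} location, got {location!r}.')
    parts = location[len(BQ_PREFIX):].split('.')
    if len(parts) != 3:
        raise ValueError(f'Expected project.dataset.table, got {location!r}.')
    project, dataset, table = parts
    if not (_PROJECT.fullmatch(project) and _NAME.fullmatch(dataset)
            and _NAME.fullmatch(table)):
        raise ValueError(f'Invalid table reference: {location!r}.')
    return project, dataset, table


def build_read_query(manifest_location: str,
                     file_location: str) -> t.Tuple[str, t.Dict[str, str]]:
    """Returns the SQL text plus named parameters for a hypothetical _read()."""
    project, dataset, table = parse_table(manifest_location)
    # The validated table reference is formatted in; the user-controlled
    # value is only referenced as @location and passed separately.
    query = (f'SELECT * FROM `{project}.{dataset}.{table}` '
             'WHERE location = @location')
    return query, {'location': file_location}
```

Because the file location never becomes part of the SQL string, a value such as `a.nc' OR '1'='1` is compared literally instead of changing the query.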

Bumped weather-dl version to 0.1.13.

c6461e5

Updated unit tests in util_test.py (weather-dl).

7ebeea1

mahrsee1997 merged commit 197d815 into main Feb 22, 2023

mahrsee1997 deleted the dl-bq-manifest branch February 22, 2023 10:18

This was referenced Feb 22, 2023

weather-dl: Implement BigQuery manifest #60

Closed

Fix gcs manifest #214

Closed

