Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added labels and tags fetching for BigQuery #93

Merged
merged 2 commits into from
Nov 19, 2020

Conversation

manuelzander
Copy link
Contributor

  • Using datacatalog_v1 Python client to fetch tags for a given BQ table
  • Added the tags data to TableMetadata and ColumnMetadata where appropriate
  • Also added labels to TableMetadata

@rsyi rsyi marked this pull request as draft November 11, 2020 19:31
Copy link
Owner

@rsyi rsyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM already, @manuelzander. Thanks for tackling this. :) Let me know when it's ready to come out of Draft, and I'll run my own local tests and merge.

if "tags" in tags_dict:
for tag in tags_dict["tags"]:
if not "column" in tag:
table_tag = tag
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I didn't know you could do not x in y. I always write it x not in y. Learning something new every day :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change it to make it clearer for everybody :)

def test_table_part_of_table_date_range(self, mock_build):
mock_build.return_value = MockBigQueryClient(
@patch("whale.extractor.base_bigquery_extractor.datacatalog_v1")
def test_table_part_of_table_date_range(self, mock_datacatalogue, mock_bigquery):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BRITISH ENGLISH!? UNACCEPTABLE!
Just kidding, please don't change it. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh haha my bad, I mixed it up 😂 Google style...

@manuelzander
Copy link
Contributor Author

LGTM already, @manuelzander. Thanks for tackling this. :) Let me know when it's ready to come out of Draft, and I'll run my own local tests and merge.

@rsyi Are you happy with labels being stored as dict e.g. {"label_1": "this is a label", ...}, and tags as Tag structure, see official JSON representation?
@biomunky FYI

@rsyi
Copy link
Owner

rsyi commented Nov 12, 2020

Yeah I was wondering about this too, but it seemed like bigquery tags are more like weird, nested json annotations than they are plain string tags, so we can't just throw away everything but the field names...

I think this is fine for now. Another option (though a bit more work) could be to create a separate BigqueryTagExtractor class that dumps this data either into a BigqueryTableTag or BigqueryColumnTag model, but feel free to punt on that if it feels like overkill. I'm not one for over-abstracting when it's unlikely the abstraction will ever serve a second purpose in the future. :) @manuelzander let me know what you think!

@manuelzander
Copy link
Contributor Author

@rsyi I've just added an additional test (no tags for a given BQ table).
I'm happy with this to be merged now. Having something like a BigqueryTagExtractor could be useful in case there would be Spanner tags as well - so maybe rather a GoogleCloudTagExtractor - BUT apparently Spanner does currently not support tags natively, so we don't really need this for now 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants