Skip to content

TDL-16008 add new search types #39

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 11, 2023
Merged

Conversation

kethan1122
Copy link
Contributor

@kethan1122 kethan1122 commented Apr 4, 2023

Description of change

This PR adds support for new search type news, googleNews and discover doc

I thought this is quite simple to add these search_types by appending them to sub_types attribute defined in abstract.py but there are few exceptions for below streams:

  1. performance_report_device: This stream doesn't support discover search_type since Requests for Discover cannot be grouped by device
  2. performance_report_query: This stream doesn't support discover and googleNews search_types since Query is an invalid argument while grouping data for discover and googleNews
  3. performance_report_custom: This stream doesn't support discover and googleNews search_types when user selects discover and query fields for this stream. So, these are being filtered out during making payload for this stream.

Note: This change will be released as major release because of two reasons

  1. This change would spike up the records extraction and might impact the standard customers and also increase their DWH costs.
  2. We are planning for GA release which requires a minimum of 1.x version.

Manual QA steps

  • Ran both discover and sync to make sure extractions ran successfully

Risks

Rollback steps

  • revert this branch

CHANGELOG.md Outdated
@@ -1,5 +1,8 @@
# Changelog

## 1.0.0
* Add support for new search types news, googleNews, discover [#39](https://github.com/singer-io/tap-google-search-console/pull/39)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also list (as another point here) that we're major version changing for general release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@@ -89,7 +89,7 @@ class IncrementalTableStream(BaseStream, ABC):
forced_replication_method = "INCREMENTAL"
replication_key = "date"
pagination = "body"
sub_types = ["web", "image", "video"]
sub_types = ["web", "image", "video", "news", "googleNews", "discover"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to alphabetize these for long-term maintainability? If there's not a specific reason, I think we should given that it's used in multiple places in performance_reports as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Excluding discover sub_type since Requests for Discover cannot be grouped by device.
# this also overwrites the sub_types attribute declared for class IncrementalTableStream
# in abstract.py
sub_types = ["web", "image", "video", "news", "googleNews"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous sub_types comment

# grouping data for discover and googleNews
# this also overwrites the sub_types attribute declared for class IncrementalTableStream
# in abstract.py
sub_types = ["web", "image", "video", "news"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous sub_types comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 169 to 172
# Skip the extraction for type discover
# Requests for Discover cannot be grouped by device
if sub_type == "discover" and "device" in self.body_params["dimensions"]:
self.body_params["dimensions"].remove("device")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing. We remove device and don't appear to skip extraction of discover type; is there something I'm missing? Is there documentation that shows discovery can't be grouped by device, and if so, can we add that to the relevant Jira or the code comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, updated the comment. I couldn't find any documentation on webmasters but these exceptions were found while testing. I have seen few issues reported on stackoverflow for the same

Comment on lines 169 to 172
# Skip the extraction for type discover
# Requests for Discover cannot be grouped by device
if sub_type == "discover" and "device" in self.body_params["dimensions"]:
self.body_params["dimensions"].remove("device")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this removing a selected field for the purpose of the request? If so, I think we should at a minimum write a log line that indicates that we're doing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't removing the selected field. So, for custom reports we consider all the dimensions(page, country, device, query, date) where as for performance_report_device we consider only device as dimension.

If user selects the device field in performance_report_custom and while replicating data for discover sub_type we end up encountering the error Requests for Discover cannot be grouped by device, to avoid this we simple remove device dimension from dimensions_list for discover sub_type. and this dimension_list is passed as param in API request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very similar to not considering discover sub_type for performance_report_device stream here

@kethan1122 kethan1122 requested a review from dsprayberry April 4, 2023 16:51
@kethan1122 kethan1122 merged commit 6f98c43 into master Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants