Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync all fields #18

Merged
merged 16 commits into from Nov 12, 2020
Merged

Sync all fields #18

merged 16 commits into from Nov 12, 2020

Conversation

luandy64
Copy link
Contributor

@luandy64 luandy64 commented Nov 11, 2020

Description of change

LinkedIn made a change to their API to limit the number of fields we can sync. This PR allows you to sync all fields for the ad_analytics_by_campaign stream.

The current implementation of sync_endpoint() has 3 jobs:

  • Get a page of records from the API
  • transform_json()
  • process_records()

This PR breaks ad_analytics_by_campaign out of that flow, but tries to mirror it as closely as possible.

The only change is to the "Get a page of records from the API" step. Depending on the number of fields selected, we make multiple requests to get the data and then merge these requests into the format that transform_json() expects.

Manual QA steps

  • See Add date windowing #17
  • Ran the tap selecting the same fields as the last test in Add date windowing #17, ensured this version produced the same records
  • Ran the tap selecting all fields, ensured the fields from the previous sync look the same
  • There are some unit tests for the new helper functions

Risks

  • Low because we compared the outputted records to what the previous version of the tap produced
  • One thing came up in testing: Both syncs saw cost_in_local_currency but got different values 49.02123 | 49.02
    • Rounding is out of our control here
  • More concerning is that a more recent sync saw a lower number: Both syncs saw approximate_unique_impressions but got different values 3639 | 3638

Rollback steps

  • revert this branch

# we mirror that here
transformed_data = transform_json({data_key: list(raw_records.values())},
stream_name)[data_key]
if not transformed_data:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check earlier if raw records is empty


return total_records, max_bookmark
def get_next_url(data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially reuse this method in their code too...

@luandy64 luandy64 merged commit 4d1ba03 into add-date-windowing Nov 12, 2020
@luandy64 luandy64 deleted the sync-all-fields branch November 12, 2020 19:35
luandy64 added a commit that referenced this pull request Nov 12, 2020
* Remove formatting a string

* Add new function to date window the ad analytics sync

* Use new sync function

* Reduce the number of times we write the schema to once per selected stream

* Update windowing logic to work for X days

* Make date_window_size configurable

* Make pylint happy

* Update and clarify a comment

* Sync all fields (#18)

* Remove restriction on the number of fields you can select

* Add a new function that mirrors the functionality of sync_endpoint

* Add function to chunk fields, add unit test

* Add unit tests for get_next_url

* Add shift_sync_window, add unit test for shift_sync_window

* Add merge_responses, add unit tests for merge_responses

* WIP: requesting <20 fields works as expected

* WIP: get bookmarks working

* Make pylint happy

* Clean up comments

* Refactor start_date to last_datetime for clairity

* Finish refactor

* Fix date time parsing bug from refactor

* Make pivot and pivot_value automatic for ad_analytics_by_campaign

* Add optional param as a test seam

* Add some tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants