Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix landings stream to be incremental #58

Merged
merged 3 commits into from
May 17, 2022

Conversation

loeakaodas
Copy link
Contributor

@loeakaodas loeakaodas commented Feb 8, 2022

Description of change

Manual QA steps

  • ran tap locally
  • ran against singer-check-tap and target-stitch --dry-run
  • ran tap-tester locally
  • tests passing in circle

Risks

  • minimal, the stream will now be incremental

Rollback steps

  • revert this branch

LOGGER.info('All landings query')

page_count = 2
sort = 'landed_at,asc'
sort = 'submitted_at,asc' # sorting isn't supported with `after` and defaults to submitted_at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that submitted_at and landed_at are always the same? If not, then shouldn't the replication key be submitted_at?

Copy link
Contributor Author

@loeakaodas loeakaodas Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are definitely not the same, at least in the test data submitted_at is always greater than landed_at. I mimicked the setup of the answers stream for making the landings stream incremental and used landed_at as the replication key. This was because both streams use get_form_responses method from http.py to call the API.

IDS.LANDINGS: {"replication_method": "INCREMENTAL", "replication_keys": ["landed_at"]},
IDS.ANSWERS: {"replication_method": "INCREMENTAL", "replication_keys": ["landed_at"]},

I'm not sure why landed_at was chosen for the answers stream, but maybe there was a case where it would be null if a form isn't submitted, possibly? The answers schema doesn't even have submitted_at as one of it's field.

To keep it consistent with the tap's behavior on the answers stream I think we should leave it as is, but I did test it with tap-tester locally and all tests were passing if I switched the landings stream to use submitted_at as the replication key.

A more robust solution would be to switch both the landings and answers stream to use submitted_at as the replication key given the API limitations when using the after query param. This would require a schema change for the answers stream and might run into data issues if submitted_at can be null.

@MarcosPala
Copy link

Any status on this?

@loeakaodas
Copy link
Contributor Author

Any status on this?

@KrisPersonal @luandy64 could this be merged? it fixes an issues for clients in Stitch and is passing tests.

@luandy64 luandy64 merged commit 5dd3da1 into master May 17, 2022
@luandy64 luandy64 deleted the fix-landings-stream-to-be-incremental branch May 17, 2022 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stitch/Typeform connexion turn into full-synchronization
4 participants