New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update bookmarking for interrupted sync #166
Conversation
5df807f
to
3d91ae9
Compare
3d91ae9
to
0156a80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look okay, but the tests could be improved I think
tap_shopify/streams/base.py
Outdated
updated_at_max = last_sync_interrupted_at or updated_at_min + \ | ||
datetime.timedelta(days=date_window_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style suggestion:
https://pep8.org/#break-before-or-after-binary-operator
updated_at_max = last_sync_interrupted_at or updated_at_min + \ | |
datetime.timedelta(days=date_window_size) | |
updated_at_max = (last_sync_interrupted_at | |
or updated_at_min + datetime.timedelta(days=date_window_size)) |
The preview doesn't look great, but the line is 91 characters long, so it should be fine
@mock.patch("singer.utils.now", return_value=strptime_to_utc(NOW_TIME)) | ||
def test_since_id_and_updated_at_max_deleted(self, mock_now, mock_get_query_params, mock_update_bookmark, mock_get_bookmark, mock_call_api): | ||
"""Verify after successful sync since_id and updated_at_max keys are deleted from the state""" | ||
mock_call_api.return_value = [CUSTOMER_1, CUSTOMER_2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this return value set here, but some of the other patches are set above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pattern was duplicated from the existing test for conistency, will refactor the code at both the places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion on consistency is only copy it when it's good. Otherwise, we should improve the code when we touch it
"since_id": 15, | ||
"updated_at_max": "2021-04-26T00:00:00.000000Z"}}} | ||
|
||
customers = list(CUSTOMER_OBJECT.get_objects()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we don't use customers
after this. Can this just be
CUSTOMER_OBJECT.get_objects()
@mock.patch("singer.utils.now", return_value=strptime_to_utc(NOW_TIME)) | ||
def test_interrupted_sync(self, mock_now, mock_get_query_params, mock_update_bookmark, mock_get_bookmark, mock_call_api): | ||
"""Verify if sync is interrrupted twice in a row then since_id and updated_at_max keys are not deleted from the state""" | ||
mock_call_api.side_effect = Exception("Dummy exception...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it.
try: | ||
customers = list(CUSTOMER_OBJECT.get_objects()) | ||
except: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we should catch our dummy exception because in the current implementation, if an exception is thrown or not, we handle it the same.
How do we distinguish a syntax error in the test from the dummy exception, for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added dummy exception to distinguish dummy exception from code exception and used it as a side effect.
def test_interrupted_sync(self, mock_now, mock_get_query_params, mock_update_bookmark, mock_get_bookmark, mock_call_api): | ||
"""Verify if sync is interrrupted twice in a row then since_id and updated_at_max keys are not deleted from the state""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems poorly implemented.
- We set the state
- We mock all of the code that would alter the state
- We assert the state is what we expect it to be
Shouldn't we start with an empty state, mock the API call so that state looks like
{"bookmarks": {
"currently_sync_stream": 'customers',
"customers": {
"updated_at": "2021-03-27T00:00:00.000000Z",
"since_id": 15,
"updated_at_max": "2021-04-26T00:00:00.000000Z"}}}
then force the tap to throw an exception, and then assert the state has a since_id
and an updated_at_max
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code implementation, we are poping since_id
and updated_at_max
. This test is verifying that we are not poping these intermediate bookmarks too early before we successfully sync till updated_at_max
in the bookmark. Early deletion of intermitent bookmarks will cause the undesired results in case sync is interrupted.
cda0f9a
to
7fa474d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one style change. It's up to you if you want to include it
Co-authored-by: Andy Lu <andy@stitchdata.com>
* update bookmarking for interrupted sync * add unit tests * minor fix * fix reviews * fix style suggestion Co-authored-by: Andy Lu <andy@stitchdata.com> * bump version 1.7.3 --------- Co-authored-by: RushiT0122 <rtodkar@stitchdata-talend.com> Co-authored-by: Andy Lu <andy@stitchdata.com>
Description of change
Consider we have following records to sync from the source,
(order_id | replication_key):
Before sync starts,
Scenario
On completion of next sync,
In this scenario, on resuming sync updated record id#7 with (replication_key=109) will not sync with current implementation.
Solution:
Add
updated_at_max
value along withsince_id
and on resuming interrupted sync set window between (bookmark, updated_at_max).On completion of next sync,
since_id=8
from the bookmarkThis solution will make sure on resuming the interrupted sync, record id#7 (replication_key=109) will be synced.
QA steps
Risks
Rollback steps