Tdl 25859/handle s3 files race condition #67
Merged
rdeshmukh15 merged 18 commits into master on Nov 20, 2024
RushiT0122
requested changes
Aug 9, 2024
Comment on lines +43 to +46
    if s3_file['last_modified'] < sync_start_time:
        state = singer.write_bookmark(state, table_name, 'modified_since', s3_file['last_modified'].isoformat())
    else:
        state = singer.write_bookmark(state, table_name, 'modified_since', sync_start_time.isoformat())
Contributor
Please add a unit test for this change.
RushiT0122
requested changes
Nov 18, 2024
Contributor
RushiT0122
left a comment
Left some suggestions in-line.
Comment on lines +35 to +38
    # Case when file is newer than sync_start_time
    {
        "file_last_modified": datetime(2024, 8, 14, 12, 0, 0),
        "sync_start_time": datetime(2024, 8, 14, 12, 0, 0),
Contributor
Fix the comment and indentation.
Suggested change
-   # Case when file is newer than sync_start_time
-   {
-       "file_last_modified": datetime(2024, 8, 14, 12, 0, 0),
-       "sync_start_time": datetime(2024, 8, 14, 12, 0, 0),
+   # Case when file is the same as sync_start_time
+   {
+       "file_last_modified": datetime(2024, 8, 14, 12, 0, 0),
+       "sync_start_time": datetime(2024, 8, 14, 12, 0, 0),
Comment on lines +78 to +79
    now = datetime.now()
    sync_start_time = singer_utils.strptime_with_tz(now.strftime("%Y-%m-%dT%H:%M:%SZ"))
Contributor
The code below would improve readability.
Suggested change
-   now = datetime.now()
-   sync_start_time = singer_utils.strptime_with_tz(now.strftime("%Y-%m-%dT%H:%M:%SZ"))
+   now_str = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
+   sync_start_time = singer_utils.strptime_with_tz(now_str)
    Depending on whether the last_modified date is earlier or later than sync_start_time,
    the bookmark will either be updated to the file's last_modified or the sync_start_time.
    """
    test_cases = [
Contributor
Separate out test scenarios rather than combining all scenarios in one test. Optionally you can parameterize it if there is a repetition of code. This makes it easier to identify which specific scenario fails if a test does not pass.
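As a rough illustration of this suggestion, the sketch below gives each scenario its own test method, so a failure names the scenario directly. It is not the PR's actual test code: `choose_bookmark` is a hypothetical stand-in that only mirrors the `if`/`else` branch under review, and the class and method names are made up for the example.

```python
import unittest
from datetime import datetime


def choose_bookmark(last_modified, sync_start_time):
    """Hypothetical helper mirroring the reviewed branch: keep the file's
    last_modified only when it is strictly older than sync_start_time."""
    if last_modified < sync_start_time:
        return last_modified
    return sync_start_time


class TestChooseBookmark(unittest.TestCase):
    SYNC_START = datetime(2024, 8, 14, 12, 0, 0)

    def test_file_older_than_sync_start(self):
        older = datetime(2024, 8, 13, 12, 0, 0)
        # An older file keeps its own timestamp as the bookmark.
        self.assertEqual(choose_bookmark(older, self.SYNC_START), older)

    def test_file_same_as_sync_start(self):
        same = datetime(2024, 8, 14, 12, 0, 0)
        # Equal timestamps take the else branch, so the bookmark is sync_start_time.
        self.assertEqual(choose_bookmark(same, self.SYNC_START), self.SYNC_START)

    def test_file_newer_than_sync_start(self):
        newer = datetime(2024, 8, 15, 12, 0, 0)
        # A file modified after the sync started is capped at sync_start_time.
        self.assertEqual(choose_bookmark(newer, self.SYNC_START), self.SYNC_START)
```

Saved as a module, this can be run with `python -m unittest`. If the three methods feel repetitive, the same cases could instead be driven from a table with `subTest` or a parameterized decorator.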
RushiT0122
approved these changes
Nov 20, 2024
Description of change
This PR addresses an S3 file race condition: when the last_modified timestamps of S3 files are updated while an extraction is in progress, the bookmark time can advance beyond the current execution time.
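To make the failure mode concrete, here is a minimal sketch with made-up data. It assumes, purely for illustration, that a bookmark is derived from the newest last_modified seen during a run; the diff shown in this conversation does not confirm the pre-fix behaviour. The function names and file entries are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def naive_bookmark(files):
    """Assumed pre-fix style: bookmark the newest last_modified seen."""
    return max(f["last_modified"] for f in files)


def capped_bookmark(files, sync_start_time):
    """Never let the bookmark move past the moment the sync started."""
    return min(naive_bookmark(files), sync_start_time)


sync_start_time = datetime(2024, 8, 14, 12, 0, 0, tzinfo=timezone.utc)
files = [
    {"key": "a.csv", "last_modified": sync_start_time - timedelta(hours=1)},
    # Rewritten five minutes into the extraction.
    {"key": "b.csv", "last_modified": sync_start_time + timedelta(minutes=5)},
]

print(naive_bookmark(files).isoformat())                    # 2024-08-14T12:05:00+00:00
print(capped_bookmark(files, sync_start_time).isoformat())  # 2024-08-14T12:00:00+00:00
```

The uncapped value lands after the sync start, which is the "bookmark beyond the current execution time" situation described above; the capped value stays at the sync start.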
Resolution:
- Capture sync_start_time at the beginning of the extraction.
- Check whether a file's last_modified is greater than the sync_start_time.
- If it is, store sync_start_time as the bookmark in the state.
- Otherwise, store the file's last_modified timestamp in the state file.
Manual QA steps
Risks
Rollback steps