New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TDL-22816 allow discovery of commitless repos #187
TDL-22816 allow discovery of commitless repos #187
Conversation
Love to see the moment I'm searching for an issue and the PR's just been opened! 🚀 |
The tap syncs from the start_date/bookmark until now() and we don't have new test data on the test repo. This causes tests to fail after a while. The ability to shift the bookmark, on a per-stream basis, was already put in place so we use it to make the state work with the data we have. `Commits` only has two days of data, so we need to put the start date before the first day and the bookmark on the second day in order for the second sync to pull less data --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
--------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
--------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
…uests` --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
This was initially an optional param, but this is the only test to use the function, so we can update its signature everywhere --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just comments, not blocking, but wanted to get them down while I was looking.
tap_github/client.py
Outdated
@@ -105,6 +105,10 @@ def raise_for_error(resp, source, stream, client, should_skip_404): | |||
except JSONDecodeError: | |||
response_json = {} | |||
|
|||
if stream == "commits" and response_json.get("message") == "Git Repository is empty.": | |||
LOGGER.warning("Encountered an empty git repository") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there value in this log line? As-is, it seems unactionable, and just potentially confusing in the logs if the tap can continue successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the warning is a little dramatic. I think having the log line is valuable to know that this situation occurs.
The alternative is seeing that some configured repo didn't produce any commit records, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think logger.info would be useful because it can help support point to lack of discrepancy
calculated_state_as_datetime = state_as_datetime - datetime.timedelta(days=days, hours=hours, minutes=minutes) | ||
|
||
start_date_as_datetime = dateutil.parser.parse(start_date) | ||
calculated_state_as_datetime = start_date_as_datetime + datetime.timedelta(days=days, hours=hours, minutes=minutes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this change the expectations in strange ways? Going off of the start date instead of the first sync's final bookmark?
It's a bit tough to follow, but it seems like this was attempting to always re-sync some data on the second sync by starting from the first sync's successful bookmarking and then looking back X days. This now looks forward from start_date, which possibly gets data, but does it still live up to the existing expectations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check with QA about this.
My understanding of the bookmark tests has been that it verifies that the tap can sync from any bookmark. And the test use a bookmark that excludes at least one record to show that it works
* Whitespace clean up --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com> * Params get their own lines --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com> * Comments go above code --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com> * Fix long lines --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com> * Update logger to info [skip-ci] --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com> --------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
--------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
--------- Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
Description of change
This PR handles Github's error response when we request the commits from an empty repository.
The change is scoped specifically to the
Commits
stream because this is the only stream I've observed this behavior for.Manual QA steps
Risks
Rollback steps