Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TDL-22816 allow discovery of commitless repos #187

Merged
merged 13 commits into from May 9, 2023

Conversation

luandy64
Copy link
Contributor

Description of change

This PR handles Github's error response when we request the commits from an empty repository.

The change is scoped specifically to the Commits stream because this is the only stream I've observed this behavior for.

Manual QA steps

  • Ran the tap in discovery mode and sync mode

Risks

  • Low. This is a blocker that's easy to reproduce and this change allows the tap to run.

Rollback steps

  • revert this branch

@riordan
Copy link

riordan commented Apr 25, 2023

Love to see the moment I'm searching for an issue and the PR's just been opened! 🚀

@luandy64 luandy64 changed the title Tdl 22816 allow discovery of commitless repos TDL-22816 allow discovery of commitless repos May 3, 2023
luandy64 and others added 5 commits May 3, 2023 14:43
The tap syncs from the start_date/bookmark until now() and we don't have
new test data on the test repo. This causes tests to fail after a while.
The ability to shift the bookmark, on a per-stream basis, was already put
in place so we use it to make the state work with the data we have.

`Commits` only has two days of data, so we need to put the start date
before the first day and the bookmark on the second day in order for the
second sync to pull less data

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
…uests`

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
This was initially an optional param, but this is the only test to use the
function, so we can update its signature everywhere

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
@luandy64 luandy64 mentioned this pull request May 3, 2023
Copy link
Contributor

@dmosorast dmosorast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just comments, not blocking, but wanted to get them down while I was looking.

@@ -105,6 +105,10 @@ def raise_for_error(resp, source, stream, client, should_skip_404):
except JSONDecodeError:
response_json = {}

if stream == "commits" and response_json.get("message") == "Git Repository is empty.":
LOGGER.warning("Encountered an empty git repository")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there value in this log line? As-is, it seems unactionable, and just potentially confusing in the logs if the tap can continue successfully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the warning is a little dramatic. I think having the log line is valuable to know that this situation occurs.

The alternative is seeing that some configured repo didn't produce any commit records, I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think logger.info would be useful because it can help support point to lack of discrepancy

calculated_state_as_datetime = state_as_datetime - datetime.timedelta(days=days, hours=hours, minutes=minutes)

start_date_as_datetime = dateutil.parser.parse(start_date)
calculated_state_as_datetime = start_date_as_datetime + datetime.timedelta(days=days, hours=hours, minutes=minutes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this change the expectations in strange ways? Going off of the start date instead of the first sync's final bookmark?

It's a bit tough to follow, but it seems like this was attempting to always re-sync some data on the second sync by starting from the first sync's successful bookmarking and then looking back X days. This now looks forward from start_date, which possibly gets data, but does it still live up to the existing expectations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check with QA about this.

My understanding of the bookmark tests has been that it verifies that the tap can sync from any bookmark. And the test use a bookmark that excludes at least one record to show that it works

luandy64 and others added 3 commits May 4, 2023 10:40
* Whitespace clean up

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>

* Params get their own lines

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>

* Comments go above code

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>

* Fix long lines

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>

* Update logger to info [skip-ci]

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>

---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
---------

Co-authored-by: Dylan Sprayberry <28106103+dsprayberry@users.noreply.github.com>
@luandy64 luandy64 merged commit 099df74 into master May 9, 2023
1 check passed
@luandy64 luandy64 deleted the tdl-22816-allow-discovery-of-commitless-repos branch May 9, 2023 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants