Cleanup GitHub Stats #2400
Conversation
app/models/maintenance_stats/queries/github/commit_count_query.rb (two outdated review threads, resolved)
```ruby
resp = @client.contributor_stats(params[:full_name])
if @client.last_response.status == 202
  sleep 1
```
4 seconds of total possible sleep time sounds like a lot, although I'm not sure what this is waiting on?
This request to GitHub will often return a 202 instead of the actual data, to let GitHub process the information needed for the request. This is there to retry the call after a short wait; not ideal. It would be better if this was handled in the Octokit client.
Maybe use a simple increasing backoff?
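A minimal sketch of that suggestion, assuming an Octokit-style client; the retry cap, the 0.5s base delay, and the method name are invented for illustration, not from the PR:

```ruby
MAX_ATTEMPTS = 4 # assumed cap, not from the PR

# GitHub's statistics endpoints answer 202 while the stats are being
# computed in the background; retry with a linearly increasing delay
# instead of a fixed sleep.
def fetch_contributor_stats(client, full_name)
  MAX_ATTEMPTS.times do |attempt|
    resp = client.contributor_stats(full_name)
    return resp unless client.last_response.status == 202

    sleep((attempt + 1) * 0.5) # 0.5s, 1.0s, 1.5s, ...
  end
  nil # stats still not ready after the retries
end
```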
```diff
 context "repository with no commits" do
   let(:repository) { create(:repository, full_name: 'buddhamagnet/heidigoodchild') }
   let(:query_results) do
-    VCR.use_cassette('github/empty_repository', :match_requests_on => [:body]) do
+    VCR.use_cassette('github/empty_repository', :match_requests_on => [:method, :uri, :body, :query]) do
```
Is there anything special that needs to be done to re-record these cassettes? If so, now might be good time to document it for posterity.
Where would be a good spot for information like that?
In the past I've just plopped it in comments near the VCR calls
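One lightweight option, following the suggestion above, is a comment right next to the cassette. The cassette path, token variable, and steps here are made up for illustration:

```ruby
# Re-recording this cassette:
#   1. delete the existing spec/vcr_cassettes/github/empty_repository.yml
#   2. export a valid API token (e.g. GITHUB_KEY) so real requests succeed
#   3. run this spec once; VCR records a fresh cassette from the live API
VCR.use_cassette('github/empty_repository', match_requests_on: [:method, :uri, :body, :query]) do
  # ...
end
```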
…s instead of class variables
…w to handle VCR changes for maintenance stat queries

force-pushed from dd0e863 to fea6391
Looking good, but I had a few questions about RepoReleasesQuery
```ruby
result.data.repository&.releases.nodes.each do |release|
  publish_date = DateTime.parse(release.published_at)
  if publish_date > end_date
```
Is this backwards? I feel like it should be `publish_date < end_date`. Also, should it be `<=` so `end_date` is inclusive?
I'll rename these variables. The releases go from newest to oldest, and `end_date` is the earliest date we care about: to get all releases from the last year, `end_date` would be one year ago. So if the current release's date is after `end_date`, add it to the list and keep going to the next release.
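The walk described here can be sketched as follows; the hash-based release shape and the helper name are illustrative assumptions, not code from the PR:

```ruby
require 'date'

# Collect releases newer than `cutoff`, assuming `releases` is ordered
# newest-to-oldest (as in the GraphQL query). Once we hit a release at
# or before the cutoff, every later element is older still, so we stop.
def releases_since(releases, cutoff)
  collected = []
  releases.each do |release|
    published = DateTime.parse(release[:published_at])
    break if published <= cutoff

    collected << release
  end
  collected
end
```

For a one-year window, `cutoff` would be one year before now, matching the `end_date` described above.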
Ooh, makes sense. The name was confusing, for sure.
…_date and handle errors within the query class as well as just send back release data instead of the entire graphql result
```ruby
  :v4
end

def query(params: {})
```
Made changes through here to make this a little clearer and to actually gather and return release data correctly. It now takes the publish date into account and exits the loop correctly. It also sends back just the release data instead of the entire graphQL response, and I've updated the stat to take that into account.
```ruby
resp = @client.contributor_stats(params[:full_name])
if @client.last_response.status == 202
  sleep (count + 1) * 0.5
```
Updated this to be a quicker second try and a small backoff after that.
Thoughts on ditching the `+ 1` to retry after 500ms the first time?
`count` would be `0` the first time through; that's why I added the `+ 1`. I can redo `count` to start at 1 to remove this.
Whoops, that's fine!
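For reference, the zero-based `count` with the `+ 1` and a one-based `count` without it produce identical delays:

```ruby
# delays for the first three retries with the PR's zero-based count and the +1
with_plus_one = (0..2).map { |count| (count + 1) * 0.5 }

# the same delays with a count that starts at 1 instead
one_based = (1..3).map { |count| count * 0.5 }
```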
```ruby
    0
  end
end.sum
@results.select { |contributor| contributed?(contributor, weeks_ago) }.size
```
Re-arranged this section to try and be a little clearer.
```ruby
end

def contributed?(contributor, weeks_ago)
  # todo: verify weeks are actually within the time period?
```
Right now it is assuming the last week in the array is the latest week. I might go back and rewrite this to actually check the timestamps of the week instead.
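A timestamp-based check could look like the sketch below. It assumes the week entries carry GitHub's contributor-stats field names (`w` for the week's start as a Unix timestamp, `c` for commits) but uses plain hashes, and the method signature is an assumption:

```ruby
# True if the contributor has any commits in a week that actually falls
# within the last `weeks_ago` weeks, judged by the week's own `w`
# timestamp rather than its position in the array.
def contributed?(contributor, weeks_ago, now: Time.now)
  cutoff = now.to_i - weeks_ago * 7 * 24 * 60 * 60
  contributor[:weeks].any? { |week| week[:w] >= cutoff && week[:c] > 0 }
end
```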
```ruby
{
  return {} if @results.nil?

  last_week_releases = @results.select { |release| DateTime.parse(release.published_at) > @now - 1.week }.count
```
This has been rewritten to just read an array of releases instead of the entire graphQL result set.
```ruby
last_two_month_releases = @results.data.repository.releases.nodes.select { |release| release.published_at > @now - 60 }.count
last_year_releases = @results.data.repository.releases.nodes.select { |release| release.published_at > @now - 365 }.count
{
  return {} if @results.nil?
```
This is a slight change. Before, we were sending back `0` releases, but our original stance was to return nothing for a stat if we didn't have the data for it. I think this is more correct.
```diff
@@ -24,7 +26,7 @@ def get_stats
   private

   def last_release_date
-    @results.data.repository.releases.nodes.first&.published_at
+    @results&.map { |node| DateTime.parse(node.published_at) }.first&.strftime('%FT%TZ')
```
The `strftime` is to keep the publish date in the same format as it was previously.
`.iso8601` is there if you'd rather use that than `strftime`.
`iso8601` was returning something slightly different from what GitHub sends back. I'm trying to keep the formatting the same as the stats that are already in the database.
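The formatting difference is the UTC suffix: Ruby's `DateTime#iso8601` renders the zero offset as `+00:00`, while GitHub's timestamps end in `Z`, which `strftime('%FT%TZ')` reproduces:

```ruby
require 'date'

published = DateTime.parse('2019-04-16T20:20:00Z')

published.iso8601            # => "2019-04-16T20:20:00+00:00"
published.strftime('%FT%TZ') # => "2019-04-16T20:20:00Z"
```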
Couple more comments, but nothing blocking - this is looking nice.
```ruby
cursor = result.data.repository.releases.page_info.end_cursor

# initialize releases if we have not started gathering data
releases ||= []
```
On line 38, `releases = []` would make this unnecessary, I think? Unless it's important to return nil?
I'm using `nil` to indicate no data was returned from the API. At this line we have a response from GitHub, so I want to initialize to an empty array unless this is after the first page of information.
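The nil-versus-empty distinction being described, sketched with an assumed helper over a list of already-fetched pages:

```ruby
# `releases` stays nil until we have at least one API response, so a
# caller can distinguish "the API never answered" (nil) from "the API
# answered and there are no releases" ([]).
def collect_release_pages(pages)
  releases = nil
  pages.each do |page_nodes|
    releases ||= [] # first response seen: switch from nil to an array
    releases.concat(page_nodes)
  end
  releases
end
```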
I wanted to get the GitHub stats fixed up a bit.

In this PR:
- `Github` module
- `repository_host`
- `download_issues` and `download_pull_requests`
- use the database to gather those stats