Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup GitHub Stats #2400

Merged
merged 13 commits into from
Jul 16, 2019
Merged

Cleanup GitHub Stats #2400

merged 13 commits into from
Jul 16, 2019

Conversation

mikeyoung85
Copy link
Contributor

I wanted to get the Github stats fixed up a bit.

In this PR:

  • moved query and stat classes into Github module
  • removed some unused or duplicate stats
  • used repository_host download_issues and download_pull_requests and use the database to gather those stats
  • redid specs with a smaller repository to cut down on amount of data in VCR cassettes


resp = @client.contributor_stats(params[:full_name])
if @client.last_response.status == 202
sleep 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 seconds of total possible sleep time sounds like a lot, although I'm not sure what this is waiting on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This request to GitHub will often return a 202 instead of the actual data to let GitHub process the information needed for the request. This is there to retry the call after a short wait, not ideal. It would be better if this was handled in the Octokit client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a simple increasing backoff?

app/models/repository_host/github.rb Show resolved Hide resolved
end
end

context "repository with no commits" do
let(:repository) { create(:repository, full_name: 'buddhamagnet/heidigoodchild') }
let(:query_results) do
VCR.use_cassette('github/empty_repository', :match_requests_on => [:body]) do
VCR.use_cassette('github/empty_repository', :match_requests_on => [:method, :uri, :body, :query]) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything special that needs to be done to re-record these cassettes? If so, now might be good time to document it for posterity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would be a good spot for information like that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past I've just plopped it in comments near the VCR calls

@mikeyoung85 mikeyoung85 force-pushed the mikeyoung85/cleanup-github-stats branch from dd0e863 to fea6391 Compare July 10, 2019 20:20
Copy link
Contributor

@kbarrette kbarrette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, but I had a few questions about RepoReleasesQuery


result.data.repository&.releases.nodes.each do |release|
publish_date = DateTime.parse(release.published_at)
if publish_date > end_date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this backwards? I feel like it should be publish_date < end_date

Also, should it be <= so end_date is inclusive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename these variables. The releases are going from newest to oldest and end_date is the earliest date we would care about. Like get all releases from the last year, end_date would be one year ago. So if the current release's date is after end_date then add it to the list and keep going to the next release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, makes sense. The name was confusing, for sure.

…_date and handle errors within the query class as well as just send back release data instead of the entire graphql result
:v4
end

def query(params: {})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made changes through here to make this a little clearer and to actually gather and return release data correctly. Now it actually takes into account the publish date and exits the loop correctly. It also is just sending back the release data now instead of the entire graphQL response and I've updated the stat to take that into account.


resp = @client.contributor_stats(params[:full_name])
if @client.last_response.status == 202
sleep (count + 1) * 0.5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to be a quicker second try and a small backoff after that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on ditching the + 1 to retry after 500ms the first time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count would be 0 the first time through that's why I added the +1. I can redo count to start at 1 to remove this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, that's fine!

0
end
end.sum
@results.select{ |contributor| contributed?(contributor, weeks_ago) }.size
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-arranged this section to try and be a little clearer.

end

def contributed?(contributor, weeks_ago)
# todo: verify weeks are actually within the time period?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it is assuming the last week in the array is the latest week. I might go back and rewrite this to actually check the timestamps of the week instead.

{
return {} if @results.nil?

last_week_releases = @results.select {|release| DateTime.parse(release.published_at) > @now - 1.week}.count
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been rewritten to use just to read an array of releases instead of the entire graphQL result set.

last_two_month_releases = @results.data.repository.releases.nodes.select {|release| release.published_at > @now - 60}.count
last_year_releases = @results.data.repository.releases.nodes.select {|release| release.published_at > @now - 365}.count
{
return {} if @results.nil?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slight change. Before we were sending back 0 releases, but our original stance was to return nothing for a stat if we didn't have the data for it. I think this is more correct.

@@ -24,7 +26,7 @@ def get_stats
private

def last_release_date
@results.data.repository.releases.nodes.first&.published_at
@results&.map { |node| DateTime.parse(node.published_at) }.first&.strftime('%FT%TZ')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strftime is to keep the publish date in the same format as what it was previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.iso8601 is there if you'd rather use that than strftime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iso8601 was returning something slightly different than what GitHub is sending back. I'm trying to keep the formatting the same as the stats that are already in the database.

Copy link
Contributor

@kbarrette kbarrette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more comments, but nothing blocking - this is looking nice.

cursor = result.data.repository.releases.page_info.end_cursor

# initalize releases if we have not started gathering data
releases ||= []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 38 releases = [] would make this unnecessary, I think? Unless it's important to return nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using nil to indicate no data was returned from the API. At this line we have a response from GitHub so I want to initialize to an empty array unless this is after the first page of information.


resp = @client.contributor_stats(params[:full_name])
if @client.last_response.status == 202
sleep (count + 1) * 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on ditching the + 1 to retry after 500ms the first time?

@@ -24,7 +26,7 @@ def get_stats
private

def last_release_date
@results.data.repository.releases.nodes.first&.published_at
@results&.map { |node| DateTime.parse(node.published_at) }.first&.strftime('%FT%TZ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.iso8601 is there if you'd rather use that than strftime

@mikeyoung85 mikeyoung85 merged commit f2d09ba into master Jul 16, 2019
@mikeyoung85 mikeyoung85 deleted the mikeyoung85/cleanup-github-stats branch July 16, 2019 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants