Add checkpoint feature to find_inactive_members.rb #268

Open
wants to merge 7 commits into
base: master
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
access_token
api/ruby/find-inactive-members/data
.DS_Store
.bundle
22 changes: 22 additions & 0 deletions api/ruby/find-inactive-members/README.md
@@ -49,3 +49,25 @@ Members are defined as inactive if they **have not performed** any of the follow
- Merged or pushed commits into the default branch
- Opened an Issue or Pull Request
- Commented on an Issue or Pull Request

## Checkpoint

The script sometimes fails while processing a large organization, because of either a timeout or a connection error.
To deal with that, the `--checkpoint` flag stores Octokit API responses as JSON files in `data/`, so that a re-run first tries the stored JSON files instead of calling the GitHub API from the beginning.

```
ruby find_inactive_members.rb [-cehv] -o ORGANIZATION -d DATE --checkpoint
```
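
The load-or-fetch idea behind the flag can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this change: `with_checkpoint` is a hypothetical helper, and the block stands in for an Octokit call whose result can be serialized to JSON.

```ruby
require "json"
require "fileutils"

# Hypothetical helper: return the parsed JSON stored at `path` if the file
# exists and parses; otherwise run the block, store its result, and return it.
def with_checkpoint(path)
  if File.exist?(path)
    begin
      # JSON.parse returns string keys by default; symbolize_names: true
      # yields symbol keys such as :login.
      return JSON.parse(File.read(path), symbolize_names: true)
    rescue JSON::ParserError
      # A truncated or corrupt file is treated as a cache miss.
    end
  end

  json = JSON.generate(yield)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, json)
  # Round-trip through JSON so a fresh result has the same shape as a cached one.
  JSON.parse(json, symbolize_names: true)
end
```

With this helper, a call such as `with_checkpoint("data/members.json") { fetch_members }` would only run the block when no usable file is present (`fetch_members` is a placeholder name). Parsing the freshly generated JSON again keeps the hash keys identical on the first run and on later runs.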

The script can also be run in a loop, since a re-run with `--checkpoint` is idempotent (stop the loop with Ctrl-C once a run completes):

```
while true; do ruby find_inactive_members.rb -o ORGANIZATION -d DATE --checkpoint; sleep 10; done
```
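
A bounded alternative to an endless shell loop is to retry inside Ruby. The sketch below is only an illustration: `retry_with_delay` is a hypothetical helper that is not part of the script, and rescuing `StandardError` is an assumption about what the failing call raises.

```ruby
# Hypothetical helper: run the block until it succeeds, at most `attempts`
# times, sleeping `delay` seconds after each failed attempt.
def retry_with_delay(attempts: 5, delay: 10)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e
    # Give up and re-raise the last error once the attempts are exhausted.
    raise if tries >= attempts
    warn "Attempt #{tries} failed (#{e.class}: #{e.message}); retrying in #{delay}s"
    sleep delay
    retry
  end
end
```

The block receives the current attempt number and its return value is passed through on success. Combined with checkpoint files, each retry would pick up from the responses already stored rather than starting over.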

To reset the checkpoints, delete the stored files:

```
rm -rf data/*.json
rm -rf data/activities/*.json
```
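
The same reset can be done from Ruby, for example on a system without `rm`. This is a sketch only: `reset_checkpoints` is a hypothetical helper, and the directory layout (`data/` and `data/activities/`) is taken from the commands above.

```ruby
require "fileutils"

# Hypothetical helper: delete the checkpoint JSON files under `dir`
# (top level and `activities/`) and return the paths that were removed.
def reset_checkpoints(dir = "data")
  files = Dir.glob(File.join(dir, "*.json")) +
          Dir.glob(File.join(dir, "activities", "*.json"))
  FileUtils.rm_f(files)
  files
end
```

Only files ending in `.json` are matched, so anything else kept in `data/` is left in place.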
161 changes: 128 additions & 33 deletions api/ruby/find-inactive-members/find_inactive_members.rb
@@ -1,7 +1,9 @@
require "csv"
require "octokit"
require 'optparse'
require 'optparse/date'
require "optparse"
require "optparse/date"
require "json"
require "fileutils"

class InactiveMemberSearch
attr_accessor :organization, :members, :repositories, :date, :unrecognized_authors
@@ -22,6 +24,11 @@ def initialize(options={})
options[:date].nil?
)

if options[:checkpoint]
@checkpoint = true
FileUtils.mkdir_p "data/activities"
end

@date = options[:date]
@organization = options[:organization]
@email = options[:email]
@@ -79,23 +86,38 @@ def member_email(login)
end

def organization_members
# get all organization members and place into an array of hashes
# get all organization members and place into an array of hashes
members = load_file("data/members.json")
if members == nil
info "Collecting organization members from API\n"
members = @client.organization_members(@organization)
info "Organization members collected, saving data...\n"
save_file("data/members.json", members.map(&:to_h))
end

info "Finding #{@organization} members "
@members = @client.organization_members(@organization).collect do |m|
email =
@members = members.collect do |m|
email =
{
login: m["login"],
email: member_email(m[:login]),
active: false
}
end

info "#{@members.length} members found.\n"
end

def organization_repositories
repositories = load_file("data/repositories.json")
if repositories == nil
repositories = @client.organization_repositories(@organization)
save_file("data/repositories.json", repositories.map(&:to_h))
end

info "Gathering a list of repositories..."
# get all repos in the organization and place into a hash
@repositories = @client.organization_repositories(@organization).collect do |repo|
@repositories = repositories.collect do |repo|
repo["full_name"]
end
info "#{@repositories.length} repositories discovered\n"
@@ -115,13 +137,19 @@ def commit_activity(repo)
# get all commits after specified date and iterate
info "...commits"
begin
@client.commits_since(repo, @date).each do |commit|
file_name = "data/activities/" + sanitize_filename(repo) + "-commits_since.json"
commits_since = load_file(file_name, :symbolize_names => true)
if commits_since == nil
commits_since = @client.commits_since(repo, @date).map(&:to_h)
save_file(file_name, commits_since)
end
commits_since.each do |commit|
# if committer is a member of the org and not active, make active
if commit["author"].nil?
if commit[:author].nil?
add_unrecognized_author(commit[:commit][:author])
next
end
if t = @members.find {|member| member[:login] == commit["author"]["login"] && member[:active] == false }
if t = @members.find {|member| member[:login] == commit[:author][:login] && member[:active] == false }
make_active(t[:login])
end
end
@@ -136,16 +164,20 @@ def commit_activity(repo)
def issue_activity(repo, date=@date)
# get all issues after specified date and iterate
info "...Issues"
begin
@client.list_issues(repo, { :since => date }).each do |issue|
# if there's no user (ghost user?) then skip this // THIS NEEDS BETTER VALIDATION
if issue["user"].nil?
next
end
# if creator is a member of the org and not active, make active
if t = @members.find {|member| member[:login] == issue["user"]["login"] && member[:active] == false }
make_active(t[:login])
end
file_name = "data/activities/" + sanitize_filename(repo) + "-list_issues.json"
list_issues = load_file(file_name, :symbolize_names => true)
if list_issues == nil
list_issues = @client.list_issues(repo, { :since => date }).map(&:to_h)
save_file(file_name, list_issues)
end
list_issues.each do |issue|
# if there's no user (ghost user?) then skip this // THIS NEEDS BETTER VALIDATION
if issue[:user].nil?
next
end
# if creator is a member of the org and not active, make active
if t = @members.find {|member| member[:login] == issue[:user][:login] && member[:active] == false }
make_active(t[:login])
end
rescue Octokit::NotFound
#API responds with a 404 (instead of an empty set) when repo is a private fork for security advisories
@@ -156,16 +188,20 @@ def issue_activity(repo, date=@date)
def issue_comment_activity(repo, date=@date)
# get all issue comments after specified date and iterate
info "...Issue comments"
begin
@client.issues_comments(repo, { :since => date }).each do |comment|
# if there's no user (ghost user?) then skip this // THIS NEEDS BETTER VALIDATION
if comment["user"].nil?
next
end
# if commenter is a member of the org and not active, make active
if t = @members.find {|member| member[:login] == comment["user"]["login"] && member[:active] == false }
make_active(t[:login])
end
file_name = "data/activities/" + sanitize_filename(repo) + "-issues_comments.json"
issues_comments = load_file(file_name, :symbolize_names => true)
if issues_comments == nil
issues_comments = @client.issues_comments(repo, { :since => date }).map(&:to_h)
save_file(file_name, issues_comments)
end
issues_comments.each do |comment|
# if there's no user (ghost user?) then skip this // THIS NEEDS BETTER VALIDATION
if comment[:user].nil?
next
end
# if commenter is a member of the org and not active, make active
if t = @members.find {|member| member[:login] == comment[:user][:login] && member[:active] == false }
make_active(t[:login])
end
rescue Octokit::NotFound
#API responds with a 404 (instead of an empty set) when repo is a private fork for security advisories
@@ -176,19 +212,25 @@ def issue_comment_activity(repo, date=@date)
def pr_activity(repo, date=@date)
# get all pull request comments after specified date and iterate
info "...Pull Request comments"
@client.pull_requests_comments(repo, { :since => date }).each do |comment|
file_name = "data/activities/" + sanitize_filename(repo) + "-pull_requests_comments.json"
pull_requests_comments = load_file(file_name, :symbolize_names => true)
if pull_requests_comments == nil
pull_requests_comments = @client.pull_requests_comments(repo, { :since => date }).map(&:to_h)
save_file(file_name, pull_requests_comments)
end
pull_requests_comments.each do |comment|
# if there's no user (ghost user?) then skip this // THIS NEEDS BETTER VALIDATION
if comment["user"].nil?
if comment[:user].nil?
next
end
# if commenter is a member of the org and not active, make active
if t = @members.find {|member| member[:login] == comment["user"]["login"] && member[:active] == false }
if t = @members.find {|member| member[:login] == comment[:user][:login] && member[:active] == false }
make_active(t[:login])
end
end
end

def member_activity
def member_activity
@repos_completed = 0
# print update to terminal
info "Analyzing activity for #{@members.length} members and #{@repositories.length} repos for #{@organization}\n"
@@ -234,6 +276,49 @@ def member_activity
end
end
end

  def load_file(filename, parseOptions = {})
    if @checkpoint != true
      return nil
    end
    begin
      # info "Trying to load #{filename}.\n"
      data = JSON.parse(File.read(filename), parseOptions)
      # info "File #{filename} has been loaded.\n"
      return data
    rescue => exception
      # info "Cannot load #{filename}\n"
      # info "#{exception}\n"
      return nil
    end
  end

  def save_file(filename, data)
    if @checkpoint != true
      return
    end
    info "Saving data to #{filename}\n"
    File.open(filename, 'w') do |f|
      f.write(JSON.generate(data))
    end
    info "File saved to #{filename}\n"
  end

  def sanitize_filename(filename)
    # Split the name when finding a period which is preceded by some
    # character, and is followed by some character other than a period,
    # if there is no following period that is followed by something
    # other than a period (yeah, confusing, I know)
    fn = filename.split /(?<=.)\.(?=[^.])(?!.*\.[^.])/m

    # We now have one or two parts (depending on whether we could find
    # a suitable period). For each of these parts, replace any unwanted
    # sequence of characters with an underscore
    fn.map! { |s| s.gsub /[^a-z0-9\-]+/i, '_' }

    # Finally, join the parts with a period and return the result
    return fn.join '.'
  end
end

options = {}
@@ -261,6 +346,10 @@ def member_activity
options[:verbose] = v
end

opts.on('--checkpoint', "Enable file checkpoints so that a failed run can be resumed") do |c|
options[:checkpoint] = c
end

opts.on('-h', '--help', "Display this help") do |h|
puts opts
exit 0
@@ -278,6 +367,12 @@ def member_activity
Octokit.configure do |kit|
kit.auto_paginate = true
kit.middleware = stack if @debug
kit.connection_options = {
Contributor


Curious why you added this configuration?

Author

@asendia asendia Apr 20, 2020


Hi @spier, thank you for taking the time to review my PR. IIRC, the default values for both `request.open_timeout` and `timeout` are 60s. I lowered them because each retry took a long time whenever a request failed.
I used this script together with a bash loop (I don't remember the exact command):

while true; do ruby find_inactive_members.rb; sleep 30; done

request: {
open_timeout: 10,
timeout: 10,
}
}
end

options[:client] = Octokit::Client.new