Add Right To Know stats #5

Closed
wants to merge 27 commits into from
27 commits
c21ae82
Split up scraper into separate files, to improve maintainability
auxesis Aug 15, 2017
ae598a8
Update class names to match scraper
auxesis Aug 15, 2017
cd829a2
Make it easier to see if the run worked or not
auxesis Aug 15, 2017
9127de2
Provide stub text to be overridden, so Jacaranda::Runner works standa…
auxesis Aug 15, 2017
2e5ed94
Add tests for run method
auxesis Aug 15, 2017
d082969
Add Delorean for time based tests
auxesis Aug 15, 2017
2b04e98
Test posted_in_last_fortnight by changing the clock
auxesis Aug 15, 2017
999163e
Add #posts that filters based on runner type
auxesis Aug 15, 2017
6e505f1
Generate unique URLs
auxesis Aug 15, 2017
1a59f7d
Clean database after each test
auxesis Aug 15, 2017
d25cc09
Handle Slack post failures more gracefully
auxesis Aug 15, 2017
d58e0c8
Style fixes, per Rubocop
auxesis Aug 15, 2017
0ba4933
Style improvements courtesy Rubocop
auxesis Aug 15, 2017
34de330
Add tests for Right To Know scraper
auxesis Aug 15, 2017
edab6d7
Merge branch 'refactor' into righttoknow
auxesis Aug 15, 2017
6d4265c
Merge branch 'master' into righttoknow
auxesis Aug 15, 2017
b0526d8
Remove traces of GitHub access tokens, as they're no longer needed
auxesis Aug 15, 2017
427525e
Prepend all output with the name of the scraper
auxesis Aug 15, 2017
6840d96
Display some helper text before the run
auxesis Aug 15, 2017
b797486
Print what the runner will do before doing anything
auxesis Aug 15, 2017
965adfc
Add documentation explaining how to add new runners
auxesis Aug 15, 2017
27eb30c
Fix method description
auxesis Aug 15, 2017
b4ebe9d
Add ability to filter which runners to run
auxesis Aug 15, 2017
3ed18ba
Refactor tests per feedback from @equivalentideas
auxesis Aug 15, 2017
70cc5e5
Set up and tear down database for every test
auxesis Aug 15, 2017
43d6456
Refactor to use subject syntax
auxesis Aug 15, 2017
25b41c8
Fix merge conflict from master
auxesis Aug 16, 2017
1 change: 0 additions & 1 deletion .env.example
Original file line number Diff line number Diff line change
@@ -1,3 +1,2 @@
MORPH_SLACK_CHANNEL_WEBHOOK_URL="https://hooks.slack.com/services/XXXXXXXXXXXXX"
MORPH_GITHUB_OAUTH_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
MORPH_LIVE_MODE="false"
1 change: 1 addition & 0 deletions .rubocop.yml
@@ -5,3 +5,4 @@ Metrics/LineLength:
Metrics/BlockLength:
  Exclude:
    - 'spec/unit_spec.rb'
    - 'spec/runner_spec.rb'
1 change: 1 addition & 0 deletions Gemfile
@@ -18,6 +18,7 @@ group :development do
end

group :test do
  gem 'delorean'
  gem 'faker'
  gem 'rspec'
  gem 'rubocop'
4 changes: 4 additions & 0 deletions Gemfile.lock
@@ -18,9 +18,12 @@ GEM
      tzinfo (~> 1.1)
    addressable (2.4.0)
    ast (2.3.0)
    chronic (0.10.2)
    coderay (1.1.0)
    crack (0.4.3)
      safe_yaml (~> 1.0.0)
    delorean (2.1.0)
      chronic
    diff-lcs (1.3)
    domain_name (0.5.24)
      unf (>= 0.0.5, < 1.0.0)
@@ -119,6 +122,7 @@ PLATFORMS

DEPENDENCIES
  activesupport
  delorean
  dotenv
  faker
  json
106 changes: 103 additions & 3 deletions README.md
@@ -54,17 +54,15 @@ bundle exec rspec

## Usage

-This scraper requires three environment variables:
This scraper requires these environment variables:

-* `MORPH_GITHUB_OAUTH_ACCESS_TOKEN` to talk to the GitHub API. You must generate a [personal access token](https://github.com/settings/tokens) with the `repo` permission.
* `MORPH_SLACK_CHANNEL_WEBHOOK_URL` to post the message to a channel in Slack. You can get a URL by adding an _Incoming Webhook_ custom integration in your Slack org.
* `MORPH_LIVE_MODE` determines whether the scraper actually posts to the Slack channel `#townsquare` and saves to the database.

When developing locally, you can add these environment variables to a [`.env` file](https://github.com/bkeepers/dotenv) so the scraper loads them when it runs:

``` bash
MORPH_SLACK_CHANNEL_WEBHOOK_URL="https://hooks.slack.com/services/XXXXXXXXXXXXX"
MORPH_GITHUB_OAUTH_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
MORPH_LIVE_MODE=false
```

@@ -82,6 +80,108 @@ You can also run this as a scraper on [Morph](https://morph.io).

To get started [see the documentation](https://morph.io/documentation)

## Contributing

### Adding new runners to Jacaranda

Jacaranda has a very simple model for adding new runners.

Runners pull information from one or more sources and post a message into Slack.

To add a new runner, open up `scraper.rb` and define the following class:

``` ruby
module Jacaranda
  # A new runner for my new service
  class MyNewRunner < Runner
    class << self
      def build
        [
          'My text here.'
        ]
      end
    end
  end
end
```

Then run the scraper:

``` bash
MORPH_LIVE_MODE=false bundle exec ruby scraper.rb --runners MyNewRunner
```

Review comment (Collaborator): This implies you need the `--runners` option to run your runner, which makes it unclear how this could run on morph.io. I think a sentence explaining that would be helpful.

Reply (Contributor Author): Done.

You'll see output something like this:

```
These are the runners we will execute:

MyNewRunner

[MyNewRunner] We have not posted an update during this fortnight.
[MyNewRunner] Not posting to Slack.
[MyNewRunner] Not recording the message in the database.

> My text here.
```

That's it.

While displaying some static text is a nice start, you'll want to call out to your app to pull information in:

``` ruby
module Jacaranda
  # A new runner for my new service
  class MyNewRunner < Runner
    class << self
      def build
        [
          MyApp.status_text(period: last_fortnight),
        ]
      end
    end
  end
end
```

We're using a built-in helper method called `last_fortnight` to give us a date range for the last fortnight:

``` ruby
last_fortnight # => [ Mon, 31 Jul 2017, Tue, 01 Aug 2017, ... ]
```

We pass this as the `period` parameter to the `status_text` method on the `MyApp` class.
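
For reference, a helper like `last_fortnight` could be built with plain Ruby dates. The sketch below is not Jacaranda's implementation: the method name `fortnight_before` is hypothetical, and it assumes the fortnight is the two full Monday-to-Sunday weeks before the current week. That assumption is consistent with the sample output above starting on Monday 31 July 2017, but is otherwise a guess.

```ruby
require 'date'

# Hypothetical stand-in for Jacaranda's built-in `last_fortnight` helper.
# Assumption: the fortnight is the two full Monday-to-Sunday weeks before
# the week containing `today`.
def fortnight_before(today = Date.today)
  # Date#wday is 0 for Sunday, so shift it to make Monday the start of the week
  monday_this_week = today - ((today.wday + 6) % 7)
  start = monday_this_week - 14
  finish = monday_this_week - 1
  (start..finish).to_a
end

period = fortnight_before(Date.new(2017, 8, 15))
puts period.first # 2017-07-31
puts period.last  # 2017-08-13
puts period.size  # 14
```

Returning an array of dates means callers can use `period.first` and `period.last` as range boundaries, or iterate over every day.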

Neither the `MyApp` class nor the `status_text` method exists yet. Let's add them:

``` ruby
# lib/myapp.rb

# MyApp stats from MyApp
class MyApp
  class << self
    def status_text(period:)
      [
        ':tada:',
        count('requests:new', period: period),
        'new requests were made through My App in the last fortnight.'
      ].join(' ')
    end

    # `period:` is a keyword argument, matching the call in status_text
    def count(query, period:)
      start = period.first
      finish = period.last

      # Make a call out to your service here ...
    end
  end
end
```

The `status_text` method is very simple – it accepts a time period it needs to produce text for, and returns a string of text.

Typically a `count` method hits some endpoint or scrapes some pages, then generates aggregate statistics. The exact implementation is up to you! Check out the `PlanningAlerts` and `RightToKnow` classes to see some more complex use cases.
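
As an illustration of the `count` step, here is a minimal sketch that sums daily figures over the period. `MyAppStats`, `daily_rows` and the row keys are all hypothetical names made up for this example; a real runner would replace `daily_rows` with an HTTP call or a scrape of your own service.

```ruby
require 'date'

# Hypothetical example: aggregates daily stats for a period.
class MyAppStats
  class << self
    # Sums `attribute` across every row whose date falls within the period.
    def count(attribute, period:, rows: daily_rows)
      dates = period.map(&:to_s)
      rows.select { |row| dates.include?(row['date']) }
          .sum { |row| row[attribute] }
    end

    private

    # Stand-in for a call to your service; returns canned data here.
    def daily_rows
      [
        { 'date' => '2017-07-31', 'new_requests' => 3 },
        { 'date' => '2017-08-01', 'new_requests' => 5 },
        { 'date' => '2017-08-20', 'new_requests' => 7 }
      ]
    end
  end
end

period = (Date.new(2017, 7, 31)..Date.new(2017, 8, 13)).to_a
puts MyAppStats.count('new_requests', period: period) # 8
```

Passing the rows in as an optional keyword argument keeps the aggregation testable without touching the network.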

## Image credit

The Jacaranda Slack avatar is cropped from a [photograph of the Jacaranda trees on Gowrie St, Newtown, Sydney by Flickr user murry](https://www.flickr.com/photos/hopeless128/15808564051/in/photolist-aCSCXw-q8S). Thanks murry for making it available under a creative commons license.
28 changes: 28 additions & 0 deletions lib/github.rb
@@ -0,0 +1,28 @@
# frozen_string_literal: true

require 'octokit'

# PlanningAlerts contributor stats from GitHub
class GitHub
  class << self
    # Returns a sentence about commits in the period, or nil if there were none.
    def commits_text(period:)
      puts 'Collect information from GitHub'
      # Fetch once, so the GitHub API is only queried a single time
      count = commits_count(period: period)
      return if count.zero?

      "You shipped #{count} commits in the same period."
    end

    private

    def commits_count(period:)
      github = Octokit::Client.new
      github.auto_paginate = true
      repo = 'openaustralia/planningalerts'
      params = { since: period.first, until: period.last }
      commits = github.commits(repo, params)
      commits.size
    end
  end
end
110 changes: 110 additions & 0 deletions lib/planningalerts.rb
@@ -0,0 +1,110 @@
# frozen_string_literal: true

require 'mechanize'
require 'rest-client'
require 'json'
require 'active_support/all'

# Duck punches
class Numeric
  def percent_of(n)
    to_f / n.to_f * 100.0
  end
end

# PlanningAlerts stats from PlanningAlerts
class PlanningAlerts
  class << self
    def new_subscribers_text(period:)
      before_period = determine_period_before(period)

      puts 'Collect new subscriber information from PlanningAlerts'
      period_count = count('new_alert_subscribers', period: period)
      period_before_count = count('new_alert_subscribers', period: before_period)

      [
        period_count,
        'people signed up for PlanningAlerts last fortnight :revolving_hearts:',
        change_sentence(period_count, period_before_count)
      ].join(' ')
    end

    def new_unsubscribers_text(period:)
      before_period = determine_period_before(period)

      puts 'Collect new unsubscriber information from PlanningAlerts'
      period_count = count('emails_completely_unsubscribed', period: period)
      period_before_count = count('emails_completely_unsubscribed', period: before_period)

      [
        period_count,
        'people left.',
        change_sentence(period_count, period_before_count)
      ].join(' ')
    end

    def total_subscribers_text
      puts 'Collect total subscribers information from PlanningAlerts'
      number = total_planningalerts_subscribers.round(-2)
      format = { precision: 0, delimiter: ',' }
      [
        'There are now',
        ActiveSupport::NumberHelper.number_to_rounded(number, format),
        'PlanningAlerts subscribers! :star2:'
      ].join(' ')
    end

    private

    def determine_period_before(period)
      (period.first.advance(weeks: -2)..period.last.advance(weeks: -2)).to_a
    end

    def total_planningalerts_subscribers
      # Memoize if we have fetched the data before
      return @subscribers_count if @subscribers_count
      # Otherwise pull the data from the PlanningAlerts website
      page = Mechanize.new.get('https://www.planningalerts.org.au/performance')
      @subscribers_count = page.at('#content h2').text.split(' ').first.to_i
    end

    def percentage_change_in_words(change)
      [
        change.to_s.delete('-') + '%',
        (change.positive? ? 'more' : 'less')
      ].join(' ')
    end

    def change_sentence(last_fortnight, fortnight_before_last)
      percentage_change_from_fortnight_before = last_fortnight.percent_of(fortnight_before_last) - 100
      percentage_change_from_fortnight_before = percentage_change_from_fortnight_before.round(1).floor

      [
        'That’s',
        percentage_change_in_words(percentage_change_from_fortnight_before),
        'than the fortnight before.'
      ].join(' ')
    end

    def subscribers_data
      # Memoize if we have fetched the data before
      return @subscribers_data if @subscribers_data
      # Otherwise fetch the data, with a _long_ timeout.
      url = 'https://www.planningalerts.org.au/performance/alerts.json'
      response = RestClient::Request.execute(method: :get, url: url, timeout: 300)
      @subscribers_data = JSON.parse(response)
    end

    def count(attribute, period:)
      count = 0

      period.each do |date|
        subscribers_data.each do |row|
          count += row[attribute] if row['date'] == date.to_s
        end
      end

      count
    end
  end
end
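
To make the arithmetic in `change_sentence` concrete, the sketch below repeats the same steps as standalone functions. The names `percentage_change` and `change_in_words` are hypothetical and not part of this PR. The `nil` return for a zero baseline is an added assumption: in the code above, a zero count for the earlier fortnight leads to a floating-point division by zero.

```ruby
# Hypothetical standalone version of the change calculation above.
# Returns the whole-number percentage change from `previous` to `current`,
# or nil when there is no baseline to compare against (an added guard).
def percentage_change(current, previous)
  return nil if previous.zero?

  change = current.to_f / previous.to_f * 100.0 - 100
  # Same rounding as change_sentence: one decimal place, then floor
  change.round(1).floor
end

# Mirrors percentage_change_in_words: drop the sign, pick 'more' or 'less'
def change_in_words(change)
  "#{change.abs}% #{change.positive? ? 'more' : 'less'}"
end

puts change_in_words(percentage_change(150, 100)) # 50% more
puts change_in_words(percentage_change(50, 100))  # 50% less
puts percentage_change(4, 3)                      # 33
puts percentage_change(2, 3)                      # -34
```

Two properties of this rounding are worth knowing when reading the Slack messages. Because `floor` rounds towards negative infinity, a drop of 33.3% is reported as 34% less while a rise of 33.3% is reported as 33% more. And since only strictly positive changes count as "more", no change at all reads as "0% less".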
50 changes: 50 additions & 0 deletions lib/righttoknow.rb
@@ -0,0 +1,50 @@
# frozen_string_literal: true

require 'mechanize'

# RightToKnow stats from RightToKnow
class RightToKnow
  class << self
    def count(query, period:)
      start = period.first.strftime('%D')
      finish = period.last.strftime('%D')
      base = "https://www.righttoknow.org.au/search/#{query}%20#{start}..#{finish}.html"
      agent = Mechanize.new
      # TODO: This iterates through pages looking for one with trustworthy
      # results. It's guessing that the page number of the last page
      # of results is no greater than 10. This is based on Right To Know's current usage,
      # with a lot of padding built in. Currently the 3rd page is the last.
      # Remove this logic and just get the results, once
      # https://github.com/openaustralia/righttoknow/issues/673 is fixed.
      10.downto(1) do |n|
        page = agent.get("#{base}?page=#{n}")
        return page.at('.foi_results').text.split.last if page.at('.foi_results')
      end
      # No page had results, so there is no count to report
      nil
    end

    def new_requests_text(period:)
      [
        ':saxophone:',
        count('variety:sent', period: period),
        'new requests were made through Right To Know last fortnight.'
      ].join(' ')
    end

    def annotations_text(period:)
      [
        ':heartbeat:',
        'Our contributors helped people with',
        count('variety:comment', period: period),
        'annotations.'
      ].join(' ')
    end

    def success_text(period:)
      [
        ':trophy:',
        count('status:successful', period: period),
        'requests were marked successful!'
      ].join(' ')
    end
  end
end
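
The page-walking workaround in `count` can be exercised without the network by injecting the page fetcher. This is only a sketch under that assumption: `last_result_count` and the `fetch` lambda are hypothetical, the lambda stands in for `page.at('.foi_results')` by returning the results text or `nil`, and the canned page text is invented for the example rather than copied from Right To Know.

```ruby
# Hypothetical, network-free version of the workaround above.
# `fetch` takes a page number and returns the results text, or nil when
# that page has no results element.
def last_result_count(fetch, max_pages: 10)
  max_pages.downto(1) do |n|
    text = fetch.call(n)
    # The last whitespace-separated token is taken as the count
    return text.split.last if text
  end
  nil # no page had results
end

# Canned pages (made-up text): only pages 1 to 3 exist
pages = {
  1 => 'Requests 1 to 25 of about 60',
  2 => 'Requests 26 to 50 of about 60',
  3 => 'Requests 51 to 57 of 57'
}
fetch = ->(n) { pages[n] }

puts last_result_count(fetch) # 57
```

Walking from the highest page number down means the first hit is the last page of results, which is the one the TODO comment describes as trustworthy. Note that the count comes back as a string, as it does in `RightToKnow.count`.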