Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Participation stats #3447

Merged
merged 23 commits into from Apr 13, 2017
Merged

Participation stats #3447

merged 23 commits into from Apr 13, 2017

Conversation

@nilbus
Copy link

@nilbus nilbus commented Apr 3, 2017

This is for analyzing the results of the experiment in exercism/discussions#123.

Implements #3445; see also the preliminary discussion there.

Goal: Measure review participation quantity and quantity (length) before, during, and after gamification is applied.

This branch contains DB migrations.

WIP screenshot: screenshot

Todo:

  • Convert stats.js to Coffeescript
  • Use use Plotly only on .comments-chart and future friends
  • Determine data format needed for comment length plot, and build a prototype
  • Migrate: Create crc32 function
  • Ruby: Look up and generate stats JSON in the format needed by Plotly
  • Extract data from the JS, and feed in via data attribute
  • Add comment length stats using box & whiskers plots
  • Exclude unqualified users
    • who participated in experiment discussions
    • who joined exercism after the experiment analysis period started (can't compare withdrawal to nonexistent earlier data)
    • who opted out to view the experiment stats
  • Exclude the last day of stats, which is in progress
  • Add a simple route integration test
  • Report performance metrics
  • Optimize performance (only as needed):
    • Test with 1 million comments (currently 300k in production)
    • Query less
      • Cache results, expire daily
      • Append new day's results to existing set, rather than recalculating everything daily
    • Optimize query
      • Store crc32(username) % 100 in users table
      • [ ] Create derived column created_date, updated via trigger
      • Index by created_date to eliminate sort on query
      • Create derived column body_length, updated via trigger
@nilbus nilbus force-pushed the nilbus:participation-stats branch 2 times, most recently from da0c4ba to 6f1d8df Apr 4, 2017
nilbus added 2 commits Apr 4, 2017
This replaces the /stats redirect to the first track.

See exercism/discussions#123

Feature flag: participation_stats

This introduces a new plotting library, Plotly.js.
See discussion in #3445.

The migration introduces postgres to the crc32 hashing function, so it
can determine which users are in the experiment group and not.

This branch will close #3445.
List control first, so experimental has the brighter color.
@nilbus nilbus force-pushed the nilbus:participation-stats branch 3 times, most recently from 2061cb0 to 2344195 Apr 8, 2017
nilbus added 4 commits Apr 8, 2017
The new Track Stats header provides for a better empty state.

We don't particularly want people to opt out of participation tracking.
I intentionally leave unsaid that this is how you can see the
statistics, so as not to reward curiosity during the experiment.
@nilbus nilbus force-pushed the nilbus:participation-stats branch from 2344195 to 87fc2e6 Apr 8, 2017
@nilbus nilbus force-pushed the nilbus:participation-stats branch from be640f2 to 4bcb2b2 Apr 9, 2017
@nilbus
Copy link
Author

@nilbus nilbus commented Apr 9, 2017

Release notes:

  • Feature flag: participation_stats
  • Run migrations
  • This PR introduces application metric reporting to Hosted Graphite using the metric exercism_io.stats.experiment.query.time. Under the Heroku app's resources tab, add the Hosted Graphite add-on (under Find more add-ons), and metrics will begin reporting. Please add me as a team member on the Hosted Graphite dashboard (linked from the Heroku Resources page). In environments without HOSTEDGRAPHITE_APIKEY set, metrics reporting is a no-op.
@nilbus
Copy link
Author

@nilbus nilbus commented Apr 9, 2017

Forgot to mention… Hosted Graphite's free developer tier should be sufficient for us.

class CreateFunctionCrc32 < ActiveRecord::Migration
def up
ActiveRecord::Base.connection.execute <<~SQL
CREATE FUNCTION crc32(text_string text) RETURNS bigint AS $$

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

Should we have a check for NULL in this function just in case?

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Yes, otherwise it starts an infinite loop. Fixed in 8ee996d. Thanks!

Otherwise, crc32(NULL) starts an infinite loop.

Thanks @bernardoamc
new.increment(metric_name)
end

def initialize(api_key: ENV['HOSTEDGRAPHITE_APIKEY'], host: 'carbon.hostedgraphite.com', port: 2003)

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

It seems this class simply doesn't work unless we have a valid api_key. Should we be explicit and raise when we don't have this attribute?

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

No, it was an intentional choice to make Metrics no-op when the API key is not present. Consider that most everyone working with this project in development will not have an API key. I don't want to force a dummy key to be provided or an account to be set up. In most cases, developers won't care to look at the metrics anyway.

Now that you mention this though, I just thought of something… instead of doing nothing, Metrics can just print to stdout. That would be even better for development. 👍

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

Cool, thanks for the context! <3

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Added in 1e23ff2.

where('comments.created_at < ?', end_date).
group('created_date').
order('created_date')
relation = filter_experiment_group(relation, experiment_group)

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

experiment_group is accessible anywhere in the class, we don't need to use it as a parameter here. :)

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Fixed in ff9a080.

}
if gamification_markers
result.merge!(
gamification_start_date: GAMIFICATION_START_DATE,

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

We could move this to a constant and do something like: result = result.merge(GAMIFICATION_MARKERS) if gamification_markers.present?

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Less garbage collection for the win! 🐎

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Done in 334c2f6.

end

def results
result = {

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

Is the order important here? date[0] refer to daily_review_count[0] which refer to daily_review_count[0] and so on? If so, we could create a struct like: DailyReview = Struct.new(:date, :count, :length) and instantiate it when iterating through the results.

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

The order is important. The trick is that these values get converted immediately to JSON for storage in a data attribute for the Plotly, and this is the format that it expects. In other circumstances, I agree that would make more sense.

See where this is used in stats.coffee:40.


attr_reader :alice, :opted_out
def setup
super

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

Why do we need the super here?

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

DBCleaner starts a transaction in the superclass. When overriding setup, not calling super means that the transaction doesn't start, and any records created get left in the database after each test run.

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

I see! Maybe it would be a good case to use prepend.. not the scope of this PR though. :)

class MetricsTest < MiniTest::Test
def test_reports_without_error
metrics = Metrics.new(api_key: 'test', host: 'localhost')
UDPSocket.any_instance.expects(:send)

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

Should we use UDPSocket.any_instance.expects(:send).twice here?

metrics = Metrics.new(api_key: 'test', host: 'localhost')
UDPSocket.any_instance.expects(:send).twice
metrics.time('test.time') { 1 }
metrics.increment('test.count')

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Sure. I wasn't sure if that was a thing in Minitest. 👍

This comment has been minimized.

@bernardoamc

bernardoamc Apr 10, 2017
Member

I think this is provided by mocha, it has a bunch of interesting expectations. :)

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Right, that's what I meant. 😄 My mind was just going to "not rspec".

This comment has been minimized.

@nilbus

nilbus Apr 10, 2017
Author

Changed in fc7b1bd.

@kytrinyx
Copy link
Member

@kytrinyx kytrinyx commented Apr 10, 2017

This looks great @nilbus. Let me know if/when you're ready to merge (I see a few unchecked boxes).

nilbus added 2 commits Apr 11, 2017
Spread both comments and users over 4 years, with quantity increasing
exponentially toward the present.
@nilbus
Copy link
Author

@nilbus nilbus commented Apr 11, 2017

Performance tests complete. With 50k users and 1M comments, the statistics query for each experiment group (2) takes under 200ms. I chose not to make a few other optimizations that brought the query time down to 100ms, because the complexity of pre-calculated columns that would need to be updated using triggers isn't worth the performance gain. I may revisit these though for long-term stats. With sub-second request times and no obvious existing cache mechanism, I'm also not concerned with application-level caching for this low-traffic page.

This is ready to go! (release notes)

@nilbus
Copy link
Author

@nilbus nilbus commented Apr 11, 2017

Thanks for the thorough review @bernardoamc! 🚀

- Withdrawal date matches actual date.
- Pre gamification period matches withdrawal period duration.
@kytrinyx kytrinyx merged commit 76aebfa into exercism:master Apr 13, 2017
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.3%) to 85.774%
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants