New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Participation stats #3447
Participation stats #3447
Conversation
Metrics/MethodLength is apparently loose enough now.
da0c4ba
to
6f1d8df
Compare
This replaces the /stats redirect to the first track. See exercism/discussions#123 Feature flag: participation_stats This introduces a new plotting library, Plotly.js. See discussion in exercism#3445. The migration introduces postgres to the crc32 hashing function, so it can determine which users are in the experiment group and not. This branch will close exercism#3445.
List control first, so experimental has the brighter color.
2061cb0
to
2344195
Compare
The new Track Stats header provides for a better empty state. We don't particularly want people to opt out of participation tracking. I intentionally leave unsaid that this is how you can see the statistics, so as not to reward curiosity during the experiment.
2344195
to
87fc2e6
Compare
be640f2
to
4bcb2b2
Compare
Release notes:
|
Forgot to mention… Hosted Graphite's free developer tier should be sufficient for us. |
class CreateFunctionCrc32 < ActiveRecord::Migration | ||
def up | ||
ActiveRecord::Base.connection.execute <<~SQL | ||
CREATE FUNCTION crc32(text_string text) RETURNS bigint AS $$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a check for NULL
in this function just in case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, otherwise it starts an infinite loop. Fixed in 8ee996d. Thanks!
Otherwise, crc32(NULL) starts an infinite loop. Thanks @bernardoamc
new.increment(metric_name) | ||
end | ||
|
||
def initialize(api_key: ENV['HOSTEDGRAPHITE_APIKEY'], host: 'carbon.hostedgraphite.com', port: 2003) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this class simply doesn't work unless we have a valid api_key
. Should we be explicit and raise when we don't have this attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it was an intentional choice to make Metrics
no-op when the API key is not present. Consider that most everyone working with this project in development will not have an API key. I don't want to force a dummy key to be provided or an account to be set up. In most cases, developers won't care to look at the metrics anyway.
Now that you mention this though, I just thought of something… instead of doing nothing, Metrics
can just print to stdout. That would be even better for development. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for the context! <3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 1e23ff2.
lib/exercism/participation_stats.rb
Outdated
where('comments.created_at < ?', end_date). | ||
group('created_date'). | ||
order('created_date') | ||
relation = filter_experiment_group(relation, experiment_group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
experiment_group
is accessible anywhere in the class, we don't need to use it as a parameter here. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ff9a080.
lib/exercism/participation_stats.rb
Outdated
} | ||
if gamification_markers | ||
result.merge!( | ||
gamification_start_date: GAMIFICATION_START_DATE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could move this to a constant and do something like: result = result.merge(GAMIFICATION_MARKERS) if gamification_markers.present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less garbage collection for the win! 🐎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 334c2f6.
end | ||
|
||
def results | ||
result = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the order important here? date[0]
refer to daily_review_count[0]
which refer to daily_review_count[0]
and so on? If so, we could create a struct like: DailyReview = Struct.new(:date, :count, :length)
and instantiate it when iterating through the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order is important. The trick is that these values get converted immediately to JSON for storage in a data attribute for the Plotly, and this is the format that it expects. In other circumstances, I agree that would make more sense.
See where this is used in stats.coffee:40.
|
||
attr_reader :alice, :opted_out | ||
def setup | ||
super |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the super
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DBCleaner
starts a transaction in the superclass. When overriding setup
, not calling super
means that the transaction doesn't start, and any records created get left in the database after each test run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! Maybe it would be a good case to use prepend
.. not the scope of this PR though. :)
test/exercism/metrics_test.rb
Outdated
class MetricsTest < MiniTest::Test | ||
def test_reports_without_error | ||
metrics = Metrics.new(api_key: 'test', host: 'localhost') | ||
UDPSocket.any_instance.expects(:send) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use UDPSocket.any_instance.expects(:send).twice
here?
metrics = Metrics.new(api_key: 'test', host: 'localhost')
UDPSocket.any_instance.expects(:send).twice
metrics.time('test.time') { 1 }
metrics.increment('test.count')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I wasn't sure if that was a thing in Minitest. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is provided by mocha
, it has a bunch of interesting expectations. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's what I meant. 😄 My mind was just going to "not rspec".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in fc7b1bd.
This will reduce garbage collection slightly and cleans up the code some. Thanks @bernardoamc.
Generating 1 million comments takes about 1 minute on my machine.
This looks great @nilbus. Let me know if/when you're ready to merge (I see a few unchecked boxes). |
Spread both comments and users over 4 years, with quantity increasing exponentially toward the present.
From 1300ms to 130ms.
Performance tests complete. With 50k users and 1M comments, the statistics query for each experiment group (2) takes under 200ms. I chose not to make a few other optimizations that brought the query time down to 100ms, because the complexity of pre-calculated columns that would need to be updated using triggers isn't worth the performance gain. I may revisit these though for long-term stats. With sub-second request times and no obvious existing cache mechanism, I'm also not concerned with application-level caching for this low-traffic page. This is ready to go! (release notes) |
Thanks for the thorough review @bernardoamc! 🚀 |
- Withdrawal date matches actual date. - Pre gamification period matches withdrawal period duration.
This is for analyzing the results of the experiment in exercism/discussions#123.
Implements #3445; see also the preliminary discussion there.
Goal: Measure review participation quantity and quantity (length) before, during, and after gamification is applied.
This branch contains DB migrations.
WIP screenshot:
Todo:
Query less- Cache results, expire daily
- Append new day's results to existing set, rather than recalculating everything daily
Optimize query[ ] Create derived column created_date, updated via triggerCreate derived column body_length, updated via trigger