-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: keep cached user stats [BD-38] #361
Conversation
Thanks for the pull request, @xitij2000! I've created BLENDED-1044 to keep track of it in Jira. More details are on the BD-38 project page. When this pull request is ready, tag your edX technical lead. |
05d1b81
to
4645d16
Compare
api/users.rb
Outdated
@@ -11,11 +11,54 @@ | |||
end | |||
end | |||
|
|||
get "#{APIPREFIX}/users/:course_id/stats2" do |course_id| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a temporary name to allow testing both endpoints. This will replace stats.
4645d16
to
e601a31
Compare
spec/api/user_spec.rb
Outdated
# Sort the map entries using the default sort | ||
expected_result = expected_result.values.sort_by { |val| [val["threads"], val["responses"], val["replies"]] }.reverse | ||
|
||
get "/api/v1/users/#{course_id}/stats" | ||
get "/api/v1/users/#{course_id}/stats2" | ||
expect(last_response.status).to eq(200) | ||
res = parse(last_response.body) | ||
expect(res).to eq expected_result | ||
expect(res["user_stats"]).to eq expected_result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new API now returns a list of objects sorted by activity but other than that the results should be the same as before.
I'll add more tests for checking that adding comments, removing comments, etc keep the counts updated.
@mtyaka Could you do a review of this PR? |
api/users.rb
Outdated
data = paginated_stats.to_a.map { |user_stats| { | ||
:username => user_stats["username"] | ||
}.merge( | ||
user_stats["course_stats"].first.except("_id", "course_id") | ||
) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flatten the nesting, and remove the ids from each document since the id is unneeded, and the course_id is common.
api/users.rb
Outdated
|
||
paginated_stats = User | ||
.where("course_stats.course_id" => course_id) | ||
.only(:username, :'course_stats.$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only return the username field, and the course_stats
entry that matches the course_id
condition.
models/comment.rb
Outdated
data = Content.collection.aggregate( | ||
[ | ||
# Match all content in the course | ||
{ "$match" => { :course_id => course_id, :author_id => user.external_id } }, | ||
# Keep a count of flags for each entry | ||
{ | ||
"$set" => { | ||
# Just using $ne with null will return true if the field is absent | ||
# So we first fall all absent fields to null, then check if it's null, | ||
# that way we match for the absence of the field or value = null | ||
:is_reply => { "$ne" => [{ "$ifNull" => ["$parent_id", nil] }, nil] } | ||
} | ||
}, | ||
{ | ||
"$group" => { | ||
# Here we're grouping items by the type (comment or thread), and the user, and whether the comment is a reply. | ||
# For threads is_reply will always be false. | ||
:_id => { :type => "$_type", :is_reply => "$is_reply" }, | ||
# This will just count each group, so we get a breakdown of how many comments and threads a user has created. | ||
:count => { "$sum" => 1 }, | ||
# These two will sum up the active and inactive reports in each category | ||
# i.e. reported threads, reported comments, reported replies | ||
# The way this works is (starting from inside out), we take the size of the abuse_flaggers list, we compare | ||
# it to 0 using $cmp. If it's greater than zero then $cmp results in 1 otherwise 0. | ||
# So we're summing up 1 for each abuse_flagger array that has entries, and 0 for the rest. This gives us a | ||
# count of | ||
:active_flags => { "$sum" => { "$cmp" => [{ "$size" => "$abuse_flaggers" }, 0] } }, | ||
:inactive_flags => { "$sum" => { "$cmp" => [{ "$size" => "$historical_abuse_flaggers" }, 0] } }, | ||
} | ||
} | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the same query, but scoped to a single user, so it's a much smaller set of data to go through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good @xitij2000! I posted a few comments, but they are mostly nits. The only larger concern is that it looks like we're treating empty arrays as falsey in some places, but empty arrays are truthy in ruby.
I didn't do any manual testnig. Do we have a sandbox where I could test it out?
api/users.rb
Outdated
per_page = (params["per_page"] || DEFAULT_PER_PAGE).to_i | ||
per_page = DEFAULT_PER_PAGE if per_page <= 0 | ||
|
||
# There are two sorts available, activity t sor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is incomplete.
api/users.rb
Outdated
.paginate(:page => page, :per_page => per_page) | ||
total_count = paginated_stats.total_entries | ||
|
||
data = paginated_stats.to_a.map { |user_stats| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Not sure if there's a edx style guide for ruby code, but the common way to format blocks in ruby is to use do |x| ... end
for multi-line blocks and { |x| ... }
for single line blocks.
models/comment.rb
Outdated
stats.inactive_flags = inactive_flags | ||
stats.save | ||
stats | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: no newline at end of file
models/user.rb
Outdated
def mark_as_read(thread) | ||
reconnect_mongo_primary | ||
read_state = read_states.find_or_create_by(course_id: thread.course_id) | ||
read_state.last_read_times[thread.id.to_s] = Time.now.utc | ||
read_state.save | ||
end | ||
|
||
def stats_for_course(course_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used anywhere. Can we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think the place where it was used got refactored out, will remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks great @xitij2000! The three unless
have to be changed to if
s, which I think is the reason why tests are failing.
Do we have a sandbox where I could test this manually?
74536f7
to
4cbce7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xitij2000 I found a subtle bug in the flag_as_abuse
and un_flag_as_abuse
methods.
However those don't explain the failing tests related to this method. The errors look really baffling, I've been trying to figure out why they happen without luck :/
Actually the first two failures might be related to that, but the "handles removing flags" error makes no sense to me. |
I tried isolating changes since the last passing tests and it seems that the errors are due to a change from after_save to after_create. Not sure why, but looking into it. |
OK, I figured it out. When using after_save, the stats are updated after the flags are added in the test data, however since the flags are added without triggering the update mechanism they are not updated when you switch to after_create. I've now updated the tests so that we run an initial update of course stats, and then perform the rest of the tests. This fixes teh issues. |
Ah, that makes sense. Code looks good to me 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented on a couple potential performance concerns still left, but otherwise this is looking a lot better than the previous solution.
6c4a408
to
f3aceb3
Compare
Nice work @xitij2000! 👍
|
f74602e
to
ee29683
Compare
34ddac4
to
5dcbe0a
Compare
In a previous approach course stats were being generated on the fly for counts of user posts, responses and flags, however this would result in overly large amounts of data in the case of larger courses. This commit instead maintains these counts in the database so that this data can be fetched using a simpler, paginated and sortable query.
5dcbe0a
to
7a5b8f7
Compare
@xitij2000 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
In a previous PR (https://github.com/edx/cs_comments_service/pull/352) we introduced a new API that would generate user stats for a course on-the-fly. The query would return data about all the users in a course who'd participated in discussions. In some cases, this could be in hundreds of thousands or millions of users, and result in dozens if not hundreds of MBs of data.
This data also needed to be sorted, and paginated. Due to the nature of the query, paginating this data would not reduce the performance implications of the query, since it'd still need to query all the course data to build the stats. That said, it would reduce the final amount of data involved.
This PR implements a different mechanism for the same. It will keep track of user counts across courses, by incrementing the counts each time a user creates a post, comment etc. It will also track deletes, and when content is marked/unmarked as abusive. The first time such an increment/decrement happens it will auto-backfill the data to keep counts accurate. However, other than that it will not automatically backfill data for users yet.