feat: keep cached user stats [BD-38] #361

xitij2000 · 2021-12-10T13:41:55Z

In a previous PR (https://github.com/edx/cs_comments_service/pull/352) we introduced a new API that generates user stats for a course on the fly. The query returns data about every user in the course who has participated in discussions. In some cases this could mean hundreds of thousands or even millions of users, resulting in dozens if not hundreds of MBs of data.

This data also needed to be sorted and paginated. Due to the nature of the query, pagination would not reduce its performance cost, since it would still need to go through all of the course's content to build the stats. It would, however, reduce the amount of data returned.

This PR implements a different mechanism for the same purpose. It keeps track of each user's counts per course, incrementing them each time the user creates a post, comment, etc. It also tracks deletions, and content being marked or unmarked as abusive. The first time such an increment/decrement happens, it automatically backfills the data so the counts are accurate. Other than that, it does not yet automatically backfill data for users.
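A minimal in-memory sketch of the counting idea described above, for illustration only. `CourseStats`, `StatsStore`, and the backfill block are hypothetical names, not the PR's models. The sketch assumes the triggering change is already persisted when the counter is adjusted, so the first-touch backfill already includes it and must not be incremented again.

```ruby
# Hypothetical names throughout: CourseStats and StatsStore are not the PR's models.
CourseStats = Struct.new(:threads, :responses, :replies, :active_flags, :inactive_flags) do
  def self.zero
    new(0, 0, 0, 0, 0)
  end
end

class StatsStore
  # backfill: a block (username, course_id) -> CourseStats that computes the
  # counts from scratch from existing content.
  def initialize(&backfill)
    @stats = {} # { [username, course_id] => CourseStats }
    @backfill = backfill
  end

  # Apply +delta (create, flag) or -delta (delete, unflag) to one counter.
  def adjust(username, course_id, field, delta)
    key = [username, course_id]
    if @stats.key?(key)
      @stats[key][field] += delta
    else
      # First touch: backfill instead of incrementing. Assumes the triggering
      # change is already persisted, so the backfill already counts it.
      @stats[key] = @backfill.call(username, course_id)
    end
    @stats[key]
  end

  def get(username, course_id)
    @stats[[username, course_id]]
  end
end

# Usage with a hypothetical in-memory list of threads.
threads = [{ author: "alice", course: "c1" }]
store = StatsStore.new do |user, course|
  CourseStats.zero.tap { |s| s.threads = threads.count { |t| t[:author] == user && t[:course] == course } }
end

threads << { author: "alice", course: "c1" } # a new thread is saved...
store.adjust("alice", "c1", :threads, 1)     # ...first touch backfills: 2 threads
threads << { author: "alice", course: "c1" }
store.adjust("alice", "c1", :threads, 1)     # later changes just increment: 3
```

Backfilling lazily on first touch keeps writes cheap afterwards, but users who never trigger an update keep no stored stats, which matches the caveat above about not yet backfilling everyone.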

openedx-webhooks · 2021-12-10T13:42:01Z

Thanks for the pull request, @xitij2000! I've created BLENDED-1044 to keep track of it in Jira. More details are on the BD-38 project page.

When this pull request is ready, tag your edX technical lead.

xitij2000 · 2021-12-13T07:24:30Z

api/users.rb

@@ -11,11 +11,54 @@
  end
 end

+get "#{APIPREFIX}/users/:course_id/stats2" do |course_id|


This is a temporary name that lets us test both endpoints side by side. It will eventually replace stats.

xitij2000 · 2021-12-13T07:29:08Z

spec/api/user_spec.rb

+        # Sort the map entries using the default sort
+        expected_result = expected_result.values.sort_by  { |val| [val["threads"], val["responses"], val["replies"]] }.reverse

-        get "/api/v1/users/#{course_id}/stats"
+        get "/api/v1/users/#{course_id}/stats2"
        expect(last_response.status).to eq(200)
        res = parse(last_response.body)
-        expect(res).to eq expected_result
+        expect(res["user_stats"]).to eq expected_result


The new API now returns a list of objects sorted by activity, but otherwise the results should be the same as before.
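For reference, the expected result in the test orders users by a `[threads, responses, replies]` tuple, most active first. A small standalone illustration with made-up rows:

```ruby
# Hypothetical user stats rows, shaped like the entries in "user_stats".
rows = [
  { "username" => "a", "threads" => 1, "responses" => 5, "replies" => 0 },
  { "username" => "b", "threads" => 3, "responses" => 0, "replies" => 2 },
  { "username" => "c", "threads" => 1, "responses" => 5, "replies" => 4 }
]

# Sort ascending by the [threads, responses, replies] tuple, then reverse to
# put the most active users first. Arrays compare element by element.
by_activity = rows.sort_by { |r| [r["threads"], r["responses"], r["replies"]] }.reverse

by_activity.map { |r| r["username"] } # => ["b", "c", "a"]
```

Threads decide first, then responses, then replies. Ruby's sort_by is not stable, so users with identical tuples have no guaranteed relative order.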

I'll add more tests to check that adding comments, removing comments, etc. keeps the counts updated.

xitij2000 · 2021-12-21T08:15:33Z

@mtyaka Could you do a review of this PR?

xitij2000 · 2021-12-21T08:35:17Z

api/users.rb

+  data = paginated_stats.to_a.map { |user_stats| {
+    :username => user_stats["username"]
+  }.merge(
+    user_stats["course_stats"].first.except("_id", "course_id")
+  ) }


Flatten the nesting, and remove the IDs from each document: the _id is unneeded, and the course_id is common to every entry.
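A standalone sketch of that flattening on a plain hash, assuming a document shaped like the projected User (the values are made up). It uses string keys throughout and Hash#reject; Hash#except, as used in the diff, is available in Ruby 3.0+ and in ActiveSupport.

```ruby
# Hypothetical document: a username plus the single matching course_stats entry.
user_stats = {
  "_id" => "u1",
  "username" => "alice",
  "course_stats" => [
    { "_id" => "s1", "course_id" => "course-v1:X+Y+Z", "threads" => 2, "responses" => 1 }
  ]
}

# Drop the entry's own _id and the course_id (identical for every row),
# then merge the remaining counts next to the username.
stats_entry = user_stats["course_stats"].first.reject { |key, _| %w[_id course_id].include?(key) }
row = { "username" => user_stats["username"] }.merge(stats_entry)
# row now holds only "username", "threads" and "responses".
```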

xitij2000 · 2021-12-21T08:36:28Z

api/users.rb

+
+  paginated_stats = User
+                      .where("course_stats.course_id" => course_id)
+                      .only(:username, :'course_stats.$')


Only return the username field and the course_stats entry that matched the course_id condition (the positional $ projection returns only the first matching array element).
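To make the projection's semantics concrete, here is a hypothetical plain-Ruby helper that mimics what the positional `$` projection returns for one matching document. It is an illustration only, not how Mongoid or MongoDB evaluates the query.

```ruby
# Plain-Ruby emulation (hypothetical helper) of
#   User.where("course_stats.course_id" => course_id)
#       .only(:username, :"course_stats.$")
# for one document that matched the query. The positional "$" projection keeps
# only the first course_stats element that satisfied the query condition.
# (MongoDB also returns _id unless it is explicitly excluded; omitted here.)
def project_matching_course(user, course_id)
  first_match = user["course_stats"].find { |s| s["course_id"] == course_id }
  { "username" => user["username"], "course_stats" => [first_match].compact }
end

user = {
  "username" => "alice",
  "email" => "alice@example.com",
  "course_stats" => [
    { "course_id" => "c1", "threads" => 4 },
    { "course_id" => "c2", "threads" => 1 }
  ]
}

projected = project_matching_course(user, "c2")
# projected keeps "username" and a one-element "course_stats" for c2; "email" is dropped.
```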

xitij2000 · 2021-12-21T08:53:38Z

models/comment.rb

+  data = Content.collection.aggregate(
+    [
+      # Match all content in the course
+      { "$match" => { :course_id => course_id, :author_id => user.external_id } },
+      # Keep a count of flags for each entry
+      {
+        "$set" => {
+          # Just using $ne with null will return true if the field is absent,
+          # so we first default absent fields to null with $ifNull, then compare with null.
+          # That way both a missing field and an explicit null give is_reply = false.
+          :is_reply => { "$ne" => [{ "$ifNull" => ["$parent_id", nil] }, nil] }
+        }
+      },
+      {
+        "$group" => {
+          # Here we're grouping items by type (comment or thread) and whether the comment is a reply.
+          # For threads is_reply will always be false.
+          :_id => { :type => "$_type", :is_reply => "$is_reply" },
+          # This will just count each group, so we get a breakdown of how many comments and threads a user has created.
+          :count => { "$sum" => 1 },
+          # These two will sum up the active and inactive reports in each category
+          # i.e. reported threads, reported comments, reported replies
+          # The way this works is (starting from inside out), we take the size of the abuse_flaggers list, we compare
+          # it to 0 using $cmp. If it's greater than zero then $cmp results in 1 otherwise 0.
+          # So we're summing up 1 for each abuse_flaggers array that has entries, and 0 for the rest. This gives us a
+          # count of flagged items in each group.
+          :active_flags => { "$sum" => { "$cmp" => [{ "$size" => "$abuse_flaggers" }, 0] } },
+          :inactive_flags => { "$sum" => { "$cmp" => [{ "$size" => "$historical_abuse_flaggers" }, 0] } },
+        }
+      }
+    ])


We use the same query, but scoped to a single user, so it's a much smaller set of data to go through.
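The pipeline can be mirrored in plain Ruby to see what each stage computes. This is an illustrative sketch over in-memory hashes; the document shapes and the `_type` values ("CommentThread", "Comment") are assumptions, and `user_content_breakdown` is a hypothetical name.

```ruby
# In-memory sketch of the per-user pipeline: $match on course and author,
# derive is_reply, then $group by { type, is_reply } with a count and the
# number of items that have any (historical) abuse flaggers.
def user_content_breakdown(docs, course_id, author_id)
  # $match: only this author's content in this course.
  mine = docs.select { |d| d["course_id"] == course_id && d["author_id"] == author_id }

  # $set + $group _id: nil? is true for both a missing key and an explicit nil,
  # which is what the $ifNull + $ne combination achieves in the pipeline.
  groups = mine.group_by { |d| { "type" => d["_type"], "is_reply" => !d["parent_id"].nil? } }

  groups.transform_values do |group|
    {
      "count" => group.size,
      # Items with a non-empty flagger list (the $cmp/$size trick). fetch
      # defaults to [] here; in the pipeline, $size would error on a missing field.
      "active_flags" => group.count { |d| !d.fetch("abuse_flaggers", []).empty? },
      "inactive_flags" => group.count { |d| !d.fetch("historical_abuse_flaggers", []).empty? }
    }
  end
end
```

Each key of the returned hash is one `{ type, is_reply }` pair, matching the `_id` of the `$group` stage.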

mtyaka

The code looks good @xitij2000! I posted a few comments, but they are mostly nits. The only larger concern is that it looks like we're treating empty arrays as falsey in some places, but empty arrays are truthy in Ruby.
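A minimal illustration of that pitfall (the variable name is only illustrative):

```ruby
abuse_flaggers = []

# Only nil and false are falsey in Ruby; [] (like 0 and "") is truthy,
# so a bare truthiness check treats "no flaggers" as "flagged".
flagged_by_truthiness = abuse_flaggers ? true : false

# Test the contents instead.
flagged = !abuse_flaggers.empty?
```

The same applies to `unless abuse_flaggers`: the branch never runs for an empty list, so such checks need `.empty?` (or `.any?` when the elements are never nil or false).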

I didn't do any manual testing. Do we have a sandbox where I could test it out?

mtyaka · 2021-12-21T16:09:32Z

api/users.rb

+  per_page = (params["per_page"] || DEFAULT_PER_PAGE).to_i
+  per_page = DEFAULT_PER_PAGE if per_page <= 0
+
+  # There are two sorts available, activity t sor


Comment is incomplete.

lib/helpers.rb

mtyaka · 2021-12-21T16:25:34Z

api/users.rb

+                      .paginate(:page => page, :per_page => per_page)
+  total_count = paginated_stats.total_entries
+
+  data = paginated_stats.to_a.map { |user_stats| {


Nit: Not sure if there's an edX style guide for Ruby code, but the common way to format blocks in Ruby is to use do |x| ... end for multi-line blocks and { |x| ... } for single-line blocks.
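A short example of the convention (names are illustrative):

```ruby
usernames = %w[alice bob]

# Single-line block: braces.
upcased = usernames.map { |name| name.upcase }

# Multi-line block: do ... end.
rows = usernames.map do |name|
  {
    username: name,
    length: name.length
  }
end
```

The two forms also differ in precedence: braces bind to the nearest method call, while do ... end binds to the outermost one, so `puts usernames.map do |n| n.upcase end` passes the block to puts rather than to map.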

mtyaka · 2021-12-21T16:52:50Z

models/comment.rb

+  stats.inactive_flags = inactive_flags
+  stats.save
+  stats
+end


Nit: no newline at end of file

mtyaka · 2021-12-21T16:58:15Z

models/user.rb

  def mark_as_read(thread)
    reconnect_mongo_primary
    read_state = read_states.find_or_create_by(course_id: thread.course_id)
    read_state.last_read_times[thread.id.to_s] = Time.now.utc
    read_state.save
  end

+  def stats_for_course(course_id)


This doesn't seem to be used anywhere. Can we remove it?

Sure, I think the place where it was used got refactored out. Will remove.

spec/api/user_spec.rb

mtyaka

The code looks great @xitij2000! The three unless statements have to be changed to ifs, which I think is why the tests are failing.

Do we have a sandbox where I could test this manually?

lib/helpers.rb

mtyaka

@xitij2000 I found a subtle bug in the flag_as_abuse and un_flag_as_abuse methods.

However those don't explain the failing tests related to this method. The errors look really baffling, I've been trying to figure out why they happen without luck :/

lib/helpers.rb

mtyaka · 2021-12-24T09:04:12Z

> However those don't explain the failing tests related to this method. The errors look really baffling, I've been trying to figure out why they happen without luck :/

Actually the first two failures might be related to that, but the "handles removing flags" error makes no sense to me.

xitij2000 · 2021-12-27T06:02:33Z

> Actually the first two failures might be related to that, but the "handles removing flags" error makes no sense to me.

I tried isolating changes since the last passing tests and it seems that the errors are due to a change from after_save to after_create. Not sure why, but looking into it.

xitij2000 · 2021-12-27T08:18:11Z

> I tried isolating changes since the last passing tests and it seems that the errors are due to a change from after_save to after_create. Not sure why, but looking into it.

OK, I figured it out. With after_save, the stats are recalculated every time the test data is saved, including after the flags are added. However, since the test data adds the flags without triggering the update mechanism, the stats no longer pick them up once you switch to after_create.

I've now updated the tests so that they run an initial update of the course stats before performing the rest of the checks. This fixes the issues.
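A toy model of why switching callbacks broke the tests. `ToyComment` is a hypothetical class, not Mongoid; it only mimics the difference between a callback that runs on every save and one that runs only on the first save.

```ruby
# Toy model, not Mongoid: a stats callback that runs either after every save
# (after_save) or only after the first save (after_create).
class ToyComment
  attr_reader :abuse_flaggers, :recorded_flags

  def initialize(callback)
    @callback = callback # :after_save or :after_create
    @persisted = false
    @abuse_flaggers = []
    @recorded_flags = nil
  end

  def save
    created = !@persisted
    @persisted = true
    update_stats if @callback == :after_save || created
    true
  end

  private

  # Stand-in for the course stats update: snapshot the current flag count.
  def update_stats
    @recorded_flags = abuse_flaggers.size
  end
end

# Mimic the old test data: create the content, then add a flag directly and
# save again without going through the flag/unflag helpers.
results = [:after_save, :after_create].map do |callback|
  comment = ToyComment.new(callback)
  comment.save
  comment.abuse_flaggers << "other-user"
  comment.save
  [callback, comment.recorded_flags]
end.to_h

# results[:after_save] is 1 because the second save re-ran the update;
# results[:after_create] is 0 because the flag was never counted.
```

Running an explicit stats update after the test data is set up, as described above, makes both cases agree.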

mtyaka · 2021-12-27T11:03:47Z

When using after_save, the stats are updated after the flags are added in the test data, however since the flags are added without triggering the update mechanism they are not updated when you switch to after_create.

Ah, that makes sense.

Code looks good to me 👍

felipetrz

I commented on a couple of potential performance concerns that are still left, but otherwise this is looking a lot better than the previous solution.

models/user.rb

mtyaka · 2021-12-30T08:51:56Z

Nice work @xitij2000! 👍

- I tested this on the https://discussions.sandbox.opencraft.hosting/ sandbox. I made a few posts, added a few replies, and flagged/unflagged a few other users' posts. I verified that the stats API endpoint returned correct results after each change.
- I read through the code
- ~~I checked for accessibility issues~~ N/A
- Includes documentation: includes inline comments.

In a previous approach, course stats for counts of user posts, responses and flags were generated on the fly; however, this resulted in overly large amounts of data for larger courses. This commit instead maintains these counts in the database so that the data can be fetched using a simpler, paginated and sortable query.

openedx-webhooks · 2022-01-25T12:45:05Z

@xitij2000 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

openedx-webhooks added the blended (PR is managed through 2U's blended development program) and waiting on author (PR author needs to resolve review requests, answer questions, fix tests, etc.) labels Dec 10, 2021

This was referenced Dec 10, 2021

feat: Add a new user API for discussions [BD-38] [TNL-8795] openedx/edx-platform#29287

Merged

feat: Add user course stats API [BD-38] [TNL-8795] [BB-4970] #352

Merged

xitij2000 force-pushed the kshitij/cached-user-stats branch from 05d1b81 to 4645d16 on December 13, 2021 07:21

xitij2000 commented Dec 13, 2021

xitij2000 force-pushed the kshitij/cached-user-stats branch from 4645d16 to e601a31 on December 13, 2021 07:25

xitij2000 commented Dec 13, 2021

xitij2000 commented Dec 21, 2021

awaisdar001 requested a review from ormsbee December 21, 2021 09:45

mtyaka requested changes Dec 21, 2021

mtyaka requested changes Dec 23, 2021

xitij2000 force-pushed the kshitij/cached-user-stats branch from 74536f7 to 4cbce7e on December 24, 2021 05:53

mtyaka reviewed Dec 24, 2021

xitij2000 marked this pull request as ready for review December 27, 2021 08:53

openedx-webhooks added needs triage and removed waiting on author (PR author needs to resolve review requests, answer questions, fix tests, etc.) labels Dec 27, 2021

felipetrz suggested changes Dec 27, 2021

xitij2000 force-pushed the kshitij/cached-user-stats branch from 6c4a408 to f3aceb3 on December 28, 2021 10:14

asadazam93 approved these changes Jan 10, 2022

xitij2000 force-pushed the kshitij/cached-user-stats branch from f74602e to ee29683 on January 18, 2022 04:02

asadazam93 requested a review from mtyaka January 19, 2022 06:17

mtyaka approved these changes Jan 19, 2022

xitij2000 force-pushed the kshitij/cached-user-stats branch from 34ddac4 to 5dcbe0a on January 25, 2022 10:09

xitij2000 force-pushed the kshitij/cached-user-stats branch from 5dcbe0a to 7a5b8f7 on January 25, 2022 12:09

xitij2000 merged commit 8103fc5 into master Jan 25, 2022

xitij2000 deleted the kshitij/cached-user-stats branch January 25, 2022 12:45

openedx-webhooks added merged and removed needs triage labels Jan 25, 2022