Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content metadata optimizations #8492

Conversation

rtibbles
Copy link
Member

@rtibbles rtibbles commented Oct 11, 2021

Summary

  • Adds BigIntegerField bitmask columns for every 64 allowed labels in each new categorical metadata column
  • Partitions the labels between the columns and assigns a bit value to each based on position in the list of labels represented in that column
  • Adds a has_labels method to the ContentNode queryset that does bitwise matching of one or more labels for a specific categorical metadata field
  • Updates API endpoints to use this - seems to give a roughly 50x speedup over LIKE lookups
  • Adds upgrade and annotation logic for setting these bitmasks on upgrade and at channel import time
  • Uses these bitmasks for the 'available label' queries also
  • Adds a new 'ancestors' JSON field onto ContentNodes (but not on the importable schema)
  • Adds function to annotate the ancestors fields for ContentNodes in a channel
  • Adds an upgrade step to apply to all existing channels
  • Adds to channel import to run this step after channel import
  • Removes ancestor querying logic from API endpoints and just loads this value directly instead, this saves a query that was particularly poorly performing for search results, due to the need to lookup ancestors across multiple tree locations - eliminating this allows search results to return in ~6ms on my dev machine, if the cache for available labels is hot.

References

Follow up to #8488

Reviewer guidance

Any areas of the code where more comments might be useful?
Any way to optimize the available labels code further?
Still needs tests.


Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@rtibbles rtibbles added the work-in-progress Not ready for review label Oct 11, 2021
@rtibbles rtibbles added this to the 0.15.0 milestone Oct 11, 2021
@rtibbles rtibbles force-pushed the content_metadata_optimizations branch from 8b93dc6 to 4776c0f Compare October 12, 2021 20:35
@marcellamaki marcellamaki changed the base branch from develop to release-v0.15.x October 12, 2021 22:18
@rtibbles rtibbles force-pushed the content_metadata_optimizations branch from c28f468 to 2d7833d Compare October 16, 2021 15:55
Copy link
Member

@jredrejo jredrejo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's impossible to test this PR using the frontend as https://github.com/learningequality/kolibri/blob/release-v0.15.x/kolibri/plugins/learn/assets/src/views/EmbeddedSidePanel/ActivityButtonsGroup.vue#L56 is broken (plugin_data.learningActivities is undefined)
is there a known workaround (appart from trying to debug and solve it) ?

kolibri/core/content/models.py Show resolved Hide resolved
@rtibbles
Copy link
Member Author

Sorry @jredrejo - looks like I need to map from snake case to camelCase after my last commit, will update.

@jredrejo
Copy link
Member

Sorry @jredrejo - looks like I need to map from snake case to camelCase after my last commit, will update.

thanks, let me know when I can test it

@rtibbles
Copy link
Member Author

Have updated so that it can be tested - have not made any updates as yet changing the integer fields.

Copy link
Member

@jredrejo jredrejo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me (there's an odd linting error that has not been introduced in this commit and it's already resolved in the v0.15.x branch) so rebasing should solve it.
I have tested it as much as I have been able and searching, upgrades & migrations work. Performance is hard to compare without a previous reference but the app works fine on my laptop.

There are a good amount of frontend errors that seem to be related to things this PR is not affecting, but just in case, this is the list I have seen:

  • Continue missing icon errors at Sub.svgIconComponent
  • http://localhost:8080/es-es/learn/ page keeps in blank with the loading animation forever until one tab is clicked
  • No way to clean the search filters
  • vue-router.esm.js:2062 Uncaught (in promise) Error: Avoided redundant navigation to current location: "/library?languages=es". error after searching filtering by language or channel

I think all these problems are related to different PR or issues, but I have doubts with the latest one. If this PR is not causing this issue, we can approve it

@rtibbles
Copy link
Member Author

Yeah, I don't think any of these are caused by this PR.

@rtibbles rtibbles force-pushed the content_metadata_optimizations branch from a6446ad to cdf29f9 Compare October 26, 2021 22:39
@rtibbles rtibbles marked this pull request as ready for review October 26, 2021 22:39
@rtibbles rtibbles added TODO: needs review Waiting for review and removed work-in-progress Not ready for review labels Oct 26, 2021
@rtibbles rtibbles marked this pull request as draft October 27, 2021 21:27
@rtibbles
Copy link
Member Author

Reverting to draft here until I can update the bit generation to come stably and consistently from le-utils.

@rtibbles rtibbles marked this pull request as ready for review October 28, 2021 23:54
Update annotation to fix bug.
@rtibbles rtibbles force-pushed the content_metadata_optimizations branch from 0aadfb4 to e2088c4 Compare October 29, 2021 01:47
@rtibbles rtibbles merged commit 50f8b24 into learningequality:release-v0.15.x Oct 29, 2021
@rtibbles rtibbles deleted the content_metadata_optimizations branch October 29, 2021 02:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO: needs review Waiting for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants