Content metadata optimizations #8492

rtibbles · 2021-10-11T23:31:10Z

Summary

Adds BigIntegerField bitmask columns for every 64 allowed labels in each new categorical metadata column
Partitions the labels between the columns and assigns a bit value to each based on position in the list of labels represented in that column
Adds a has_labels method to the ContentNode queryset that does bitwise matching of one or more labels for a specific categorical metadata field
Updates API endpoints to use this - seems to give a roughly 50x speedup over LIKE lookups
Adds upgrade and annotation logic for setting these bitmasks on upgrade and at channel import time
Uses these bitmasks for the 'available label' queries also
Adds a new 'ancestors' JSON field onto ContentNodes (but not on the importable schema)
Adds function to annotate the ancestors fields for ContentNodes in a channel
Adds an upgrade step to apply to all existing channels
Adds to channel import to run this step after channel import
Removes ancestor querying logic from API endpoints and just loads this value directly instead, this saves a query that was particularly poorly performing for search results, due to the need to lookup ancestors across multiple tree locations - eliminating this allows search results to return in ~6ms on my dev machine, if the cache for available labels is hot.

References

Follow up to #8488

Reviewer guidance

Any areas of the code where more comments might be useful?
Any way to optimize the available labels code further?
Still needs tests.

Testing checklist

Contributor has fully tested the PR manually
If there are any front-end changes, before/after screenshots are included
Critical user journeys are covered by Gherkin stories
Critical and brittle code paths are covered by unit tests

PR process

PR has the correct target branch and milestone
PR has 'needs review' or 'work-in-progress' label
If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
If this is an important user-facing change, PR or related issue has a 'changelog' label
If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

Automated test coverage is satisfactory
PR is fully functional
PR has been tested for accessibility regressions
External dependency files were updated if necessary (yarn and pip)
Documentation is updated
Contributor is in AUTHORS.md

jredrejo

It's impossible to test this PR using the frontend as https://github.com/learningequality/kolibri/blob/release-v0.15.x/kolibri/plugins/learn/assets/src/views/EmbeddedSidePanel/ActivityButtonsGroup.vue#L56 is broken (plugin_data.learningActivities is undefined)
is there a known workaround (appart from trying to debug and solve it) ?

kolibri/core/content/migrations/0029_metadata_bitmasks.py

kolibri/core/content/models.py

rtibbles · 2021-10-19T15:55:15Z

Sorry @jredrejo - looks like I need to map from snake case to camelCase after my last commit, will update.

jredrejo · 2021-10-19T18:10:36Z

Sorry @jredrejo - looks like I need to map from snake case to camelCase after my last commit, will update.

thanks, let me know when I can test it

rtibbles · 2021-10-22T14:14:15Z

Have updated so that it can be tested - have not made any updates as yet changing the integer fields.

jredrejo

Code looks good to me (there's an odd linting error that has not been introduced in this commit and it's already resolved in the v0.15.x branch) so rebasing should solve it.
I have tested it as much as I have been able and searching, upgrades & migrations work. Performance is hard to compare without a previous reference but the app works fine on my laptop.

There are a good amount of frontend errors that seem to be related to things this PR is not affecting, but just in case, this is the list I have seen:

Continue missing icon errors at Sub.svgIconComponent
http://localhost:8080/es-es/learn/ page keeps in blank with the loading animation forever until one tab is clicked
No way to clean the search filters
vue-router.esm.js:2062 Uncaught (in promise) Error: Avoided redundant navigation to current location: "/library?languages=es". error after searching filtering by language or channel

I think all these problems are related to different PR or issues, but I have doubts with the latest one. If this PR is not causing this issue, we can approve it

rtibbles · 2021-10-26T22:38:49Z

Yeah, I don't think any of these are caused by this PR.

…ormance.

rtibbles · 2021-10-27T21:28:31Z

Reverting to draft here until I can update the bit generation to come stably and consistently from le-utils.

used for generating bits.

Update annotation to fix bug.

rtibbles added the work-in-progress Not ready for review label Oct 11, 2021

rtibbles added this to the 0.15.0 milestone Oct 11, 2021

rtibbles requested review from bjester and jredrejo October 11, 2021 23:31

rtibbles force-pushed the content_metadata_optimizations branch from 8b93dc6 to 4776c0f Compare October 12, 2021 20:35

marcellamaki changed the base branch from develop to release-v0.15.x October 12, 2021 22:18

rtibbles force-pushed the content_metadata_optimizations branch from c28f468 to 2d7833d Compare October 16, 2021 15:55

jredrejo reviewed Oct 19, 2021

View reviewed changes

kolibri/core/content/migrations/0029_metadata_bitmasks.py Show resolved Hide resolved

kolibri/core/content/models.py Show resolved Hide resolved

jredrejo reviewed Oct 26, 2021

View reviewed changes

rtibbles added 7 commits October 26, 2021 15:39

Use bitmasks to improve query performance for labels.

ded9d6d

Annotate ancestors onto content nodes to improve contentnode API perf…

81d09c3

…ormance.

Fix/add tests and behaviour for ancestor annotation and bitmask search.

193de39

Remove wrapper functions to reduce cache calls.

6c0b687

Make generated ancestor JSON robust to malformed topic trees.

a803a89

Properly map metadata labels.

98a6188

Coerce uuid to hex string for ancestor annotation.

cdf29f9

rtibbles force-pushed the content_metadata_optimizations branch from a6446ad to cdf29f9 Compare October 26, 2021 22:39

rtibbles marked this pull request as ready for review October 26, 2021 22:39

rtibbles added TODO: needs review Waiting for review and removed work-in-progress Not ready for review labels Oct 26, 2021

jredrejo approved these changes Oct 27, 2021

View reviewed changes

rtibbles marked this pull request as draft October 27, 2021 21:27

Update le-utils version so that we have a stable ordering of the lists

4837cde

used for generating bits.

rtibbles marked this pull request as ready for review October 28, 2021 23:54

Add tests for searching by all fields.

e2088c4

Update annotation to fix bug.

rtibbles force-pushed the content_metadata_optimizations branch from 0aadfb4 to e2088c4 Compare October 29, 2021 01:47

rtibbles merged commit 50f8b24 into learningequality:release-v0.15.x Oct 29, 2021

rtibbles deleted the content_metadata_optimizations branch October 29, 2021 02:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Content metadata optimizations #8492

Content metadata optimizations #8492

rtibbles commented Oct 11, 2021 •

edited

jredrejo left a comment

rtibbles commented Oct 19, 2021

jredrejo commented Oct 19, 2021

rtibbles commented Oct 22, 2021

jredrejo left a comment

rtibbles commented Oct 26, 2021

rtibbles commented Oct 27, 2021

Content metadata optimizations #8492

Content metadata optimizations #8492

Conversation

rtibbles commented Oct 11, 2021 • edited

Summary

References

Reviewer guidance

Testing checklist

PR process

Reviewer checklist

jredrejo left a comment

Choose a reason for hiding this comment

rtibbles commented Oct 19, 2021

jredrejo commented Oct 19, 2021

rtibbles commented Oct 22, 2021

jredrejo left a comment

Choose a reason for hiding this comment

rtibbles commented Oct 26, 2021

rtibbles commented Oct 27, 2021

rtibbles commented Oct 11, 2021 •

edited