Adds Taxonomy, Tag, ObjectTag models and APIs #57

pomegranited · 2023-06-13T08:10:07Z

Description

Implements the Taxonomy, Tag, and ObjectTag models, APIs, and permissions described in the discovery document for the tagging app.

Supporting information

Relates to openedx/modular-learning#63

Testing instructions

Manual testing is possible but not very useful, since we don't yet have views for these models and APIs.

Ensure that all models and methods from the discovery doc are implemented.
Ensure that the tests cover the expected behavior of the models, APIs and permissions as described in the discovery doc.

Other information

The import_tags, export_tags and autocomplete_tags will be added by a subsequents PRs

SystemTaxonomy will be added by a subsequent PR as a subclass of Taxonomy.

ContentTaxonomy and ContentTag will subclass these models, and will be implemented in edx-platform openedx.features.content_tagging.

To do before merge

use multi-collation field types added by Adjust models to properly support MySQL #51
squash migrations into a single 0001_initial.py
squash and merge this change (don't just merge)

openedx-webhooks · 2023-06-13T08:10:11Z

Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

supporting documentation
Open edX discussion forum threads
timeline information ("this must be merged by XX date", and why that is)
partner information ("this is a course on edx.org")
any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

bradenmacdonald

Very cool :) I am liking this approach.

openedx_tagging/core/tagging/models.py

ChrisChV · 2023-06-16T19:07:05Z

Other information

SystemTaxonomy will be added by a subsquent PR as a subclass of Taxonomy.

ContentTaxonomy and ContentTag will subclass these models, and will be implemented in edx-platform openedx.features.tagging.

Also the import_tags, export_tags and autocomplete_tags will be added by a subsequents PRs

ChrisChV · 2023-06-16T19:07:51Z

@pomegranited Great work! Looks good 👍

pomegranited · 2023-06-15T02:31:17Z

docs/decisions/0009-tagging-administrators.rst

@@ -26,6 +26,8 @@ But because permissions #2 + #3 require access to the edx-platform CMS model `Co

 Per `OEP-9`_, ``openedx_tagging`` will allow applications to use the standard Django API to query permissions, for example: ``user.has_perm('openedx_tagging.edit_taxonomy', taxonomy)``, and the appropriate permissions will be applied in that application's context.

+These rules will be enforced in the tagging `views`_, not the API or models, so that external code using this library need not have a logged-in user in order to call the API. So please use with care.


@ChrisChV CC @ormsbee @bradenmacdonald

Does this seem reasonable?

@pomegranited It's reasonable to me. Maybe it is good to add this also as a comment in the API so that it is taken into account when using it

openedx_tagging/core/tagging/models.py

pomegranited · 2023-06-19T05:47:20Z

openedx_tagging/core/tagging/migrations/0001_initial.py


    operations = [
        migrations.CreateModel(
-            name='TagContent',
+            name="TagContent",


JV TODO: Since we don't need this table at all, omit this and merge all the migrations down to this 0001_initial.py file.

openedx_tagging/core/tagging/models.py

bradenmacdonald · 2023-06-20T22:47:48Z

openedx_tagging/core/tagging/api.py

+    Reconciles all stored ObjectTag entries with any changes made to their associated taxonomies and tags.
+    """
+    num_changed = 0
+    for object_tag in ObjectTag.objects.all():


If there are a lot of ObjecTag rows, this has the downside of loading them all from the database and then converting them all into python objects before iterating over them. It seems like using for object_tag in ObjectTag.objects.all().iterator() is better:

For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.

However, it's still not as good as I hoped:

MySQL doesn’t support streaming results, hence the Python database driver loads the entire result set into memory.

What?! Yet another point for postgres...

@bradenmacdonald Cool, I didn't know about that! Will fix this.

MySQL doesn’t support streaming results, hence the Python database driver loads the entire result set into memory.

What?! Yet another point for postgres...

My favourite related comment so far came from this issue: https://code.djangoproject.com/ticket/2495#comment:14

After eight years, and considering that the fix is far from easy, could we file this under "use a better database" and close the ticket?

openedx_tagging/core/tagging/models.py

pomegranited · 2023-06-21T02:35:21Z

@ChrisChV My apologies for my rebase on main.. I've rolled it back and did a merge with main instead, so you should still be able to change the base branch of your open PRs to this one to ease reviews there.

But I'll need to do a rebase before I merge here so I can squash those migrations. Awaiting word on whether we need a uuid field before I do that.

openedx_tagging/core/tagging/models.py

ormsbee · 2023-06-21T03:42:47Z

openedx_tagging/core/tagging/api.py

+    Reconciles all stored ObjectTag entries with any changes made to their associated taxonomies and tags.
+    """
+    num_changed = 0
+    for object_tag in ObjectTag.objects.all().iterator():


I realize I've been a bit in-and-out of the taxonomy/tagging model discussions, so please pardon me if you folks have already gone over this:

I wonder if this could be less expensive if we make one more layer of model indirection between the Object and the Taxonomy's Tag. Right now, it seems like we have the global representation of a Tag, and a per-tagged-thing copy in case the global thing goes away. We need that because the Taxonomies+Tags ownership and lifecycle is different from ObjectTags. The process of reconciliation is between per-object copies and the taxonomy's canonical Tag.

But do we really need that many copies? Can't we keep a more normalized representation as long as we create a copy that follows the same ownership as the ObjectTags in question?

What if we kept a copy of the Taxonomy at the layer of the importable/exportable thing (e.g. library, course, learning package)? Take a course for example. Let's say:

ObjectTags don't have local copies of the name.

All the ObjectTags for course content is against course-level Taxonomies.

Some of those are CourseTaxonomies that are entirely owned and internal to the course.

Some of those are CourseTaxonomies that serve as links-with-copies to other Taxonomies (CourseTaxonomies controlled by other courses, OrgTaxonomies, SystemTaxonomies).

Inconsistencies within the Course on import are errors (they can be ignored and the values dropped, or they can stop the process, but there's no reconciliation).

Reconciliation for updated values happens between Taxonomies.

That would significantly cut down on the number of rows that would have to be iterated over during any sort of reconciliation process. Would that be helpful?

Hi @ormsbee , we haven't gone over an option like this, so thank you for raising it! And you're right, it is storage-heavy to store copies of the name:value on each ObjectTag, which will store the most rows of any model in this package.

But, this does seem more complicate, so do we care that much about storage space?

Right now, it seems like we have the global representation of a Tag, and a per-tagged-thing copy in case the global thing goes away.

We have three uses for the ObjectTag._value field:

What you said ^

We can import tagged content without having to have a Taxonomy with these Tags in place before importing.

Free-text tags can be anything the content author wants, and are not hierarchical.
So we don't store them as Tags, we just create an ObjectTag with _value="free text value" and tag=None. This also avoids race conditions; we don't have to worry about creating both a Tag and an ObjectTag when free-text tagging an object.

(ObjectTag._name is also in the model for reasons 1 & 2.)

But there's a lot of dancing going on around this _value field, which likely means it's trying to do too much.

If I understand you correctly, you're suggesting something like:

Taxonomy: canonical source of a taxonomy's tags and metadata. Taxonomy Admins create these, import/export tags.

ContentTaxonomy: references a source Taxonomy`, and keeps a copy of its Tag tree, e.g. as ContentTaxonomyTag

ContentTag: references a ContentTaxonomy and a ContentTaxonomyTag.

ContentTaxonomy objects know how to reconcile changes made to its source Taxonomy or the source Taxonomy's Tag without losing ContentTag data.

So this deals with point #1 above, but how should we handle points #2 and #3?

My assumption was that reconciliation would be quite a rare process. If that's the case, I would strongly lean toward the simpler version, even if it uses a bit more space for storage. (Tags are usually quite small so I don't think the space is an issue.)

Even if it's not that rare, I think there's a simpler approach: If it's ObjectTag.objects.all() that's the problem, I suggest changing the reconciliation API to work on a single taxonomy or a single course at a time, rather than all tags of all objects. So for example it could take a course ID as its argument, and then do the reconciliation based on ObjectTag.objects.filter(object_type=..., object_id__startswith="block-v1:the+course+ID@"). Then you get much more efficient (and targeted) reconciliation at appropriate times (e.g. after a course import), and the data model is simple and easy to reason about.

What about taxonomy import? It seems like you might need to iterate over all the ObjectTags in that case, to see if any of them should be "upgraded" to new values in the taxonomy based on their name and value. But I think you can actually do a much more targeted iteration by only iterating the ObjectTag that have either (taxonomy_id is None) or (taxonomy_id==imported_taxonomy_id and tag_id is None). Which is again much much faster than iterating all the ObjectTags, because I expect that the vast majority of ObjectTags will be filtered out by that criteria (because they're linked to a taxonomy and/or a tag).

@bradenmacdonald

If that's the case, I would strongly lean toward the simpler version, even if it uses a bit more space for storage.

That's my thinking too, and hence this PR.. but I am open to other options @ormsbee .

Even if it's not that rare, I think there's a simpler approach: If it's ObjectTag.objects.all() that's the problem, I suggest changing the reconciliation API to work on a single taxonomy or a single course at a time, rather than all tags of all objects.

👍 to adding filter options to the resync_tags method.

What about taxonomy import?

Perfect use of this, yes! We can use that as part of the import taxonomy API.

Thank you!

My apologies: I thought I replied to this yesterday but I never submitted my comment.

@bradenmacdonald: Yes, my concern was much more about the number of records that need to be updated (and any potential locking), and not the space usage.

@pomegranited:

We can import tagged content without having to have a Taxonomy with these Tags in place before importing.

Yeah, what I was proposing assumes that we'd have local Taxonomies/Tags in place for import, because those taxonomies can be locally controlled at the scope the import is happening.

Free-text tags can be anything the content author wants, and are not hierarchical.
So we don't store them as Tags, we just create an ObjectTag with _value="free text value" and tag=None. This also avoids race conditions; we don't have to worry about creating both a Tag and an ObjectTag when free-text tagging an object.

I'm fuzzy on this part. It doesn't have to block this PR, but don't we have to worry about race conditions where we're making the same free text value ObjectTag for the same object multiple times? It seems like it would benefit from being a separate model altogether, so we could put a uniqueness constraints around the freeform tag and the thing being tagged, independent of a taxonomy?

To be clear: I'm fine with the proposal to keep the general shape of the data structures now and do filtering during reconciliation. We can always add in the local-taxonomy-copy thing later if it turns out we want it. The questions I'm asking now are just to improve my understanding.

It seems like it would benefit from being a separate model altogether, so we could put a uniqueness constraints around the freeform tag and the thing being tagged, independent of a taxonomy?

I'll give it some thought.. originally when I was doing the discovery, I thought we'd end up having to subclass all of these base classes, which would lead to lots and lots of classes, e.g. SystemTaxonomy, SystemTaxonomyTag, SystemTaxonomyObjectTag, ... And so I was trying to reduce the number of classes used at the base here. But now that I've gotten into the ContentTaxonomy implementation, I didn't even need to create a ContentTag class, because all of the ObjectTag validation is handled by the taxonomy. We'll still need a FreeFormTaxonomy of some sort, even if it has no tags, because that model is where the namespace for the object tags are stored.

Unfortunately we're running short on time for this phase 1, and so I'd rather not tweak it too much more until we actually have a frontend that's using it. But if I find that the frontend is having to do a lot of `if freeform then do this else do something else" before calling the backend, then separating them out will make much more sense.

openedx_tagging/core/tagging/models.py

pomegranited · 2023-06-22T07:28:09Z

Thank you for all your reviews and advice, @ChrisChV @ormsbee and @bradenmacdonald !

I think this is ready to merge now -- if you agree, could someone please squash+merge this? I haven't squashed and rebased the commits here to keep the conversations working, but I can do that if it's preferred.

ormsbee · 2023-06-22T13:17:27Z

@pomegranited: Could you please resolve merge conflicts? (I think it's fine for you to squash.)

Also: * Adds compile-requirements make target and updates (without upgrading) the requirements, which pulled in some other requirements, since it's been a while since they were compiled. * Adds ddt as a test dependency. * Replaces the placeholder TagContent class, and squashes migrations down to a single file * use the openedx_learning.lib multi-collation char and text fields

Also: * Adds rules requirement and app settings to enable it * Adds mock to test requirements, so we can test system taxonomy rules * ADR: Clarifies that rules will be enforced in the views, not the model or APIs

Tests are failing in openedx on sqlite.

ormsbee · 2023-06-27T02:02:46Z

@pomegranited: is this okay for me to rebase+merge now?

pomegranited · 2023-06-27T03:50:15Z

@ormsbee Yes please! Sorry for the delay, I didn't think you were even up :)

openedx-webhooks · 2023-06-27T04:10:08Z

@pomegranited 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

ormsbee · 2023-06-27T04:10:13Z

I'm a parent. I keep weird hours sometimes. 😉

openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Jun 13, 2023

pomegranited force-pushed the jill/taxonomy-api branch 7 times, most recently from a4910b3 to 12ce0d2 Compare June 15, 2023 02:25

pomegranited marked this pull request as ready for review June 15, 2023 02:26

bradenmacdonald reviewed Jun 15, 2023

View reviewed changes

openedx_tagging/core/tagging/models.py Outdated Show resolved Hide resolved

pomegranited force-pushed the jill/taxonomy-api branch from d499752 to eaee7e4 Compare June 19, 2023 07:30

pomegranited commented Jun 19, 2023

View reviewed changes

ChrisChV mentioned this pull request Jun 19, 2023

import/export Taxonomy API functions #58

Closed

pomegranited mentioned this pull request Jun 20, 2023

feat: adds content tagging app, models, and API openedx/edx-platform#32518

Closed

ChrisChV mentioned this pull request Jun 20, 2023

Autocomplete tags API function #59

Merged

bradenmacdonald reviewed Jun 20, 2023

View reviewed changes

pomegranited commented Jun 21, 2023

View reviewed changes

openedx_tagging/core/tagging/models.py Show resolved Hide resolved

pomegranited force-pushed the jill/taxonomy-api branch 2 times, most recently from c37f9f4 to 25d4a62 Compare June 21, 2023 02:32

ormsbee reviewed Jun 21, 2023

View reviewed changes

openedx_tagging/core/tagging/models.py Outdated Show resolved Hide resolved

ormsbee reviewed Jun 21, 2023

View reviewed changes

openedx_tagging/core/tagging/models.py Show resolved Hide resolved

ormsbee reviewed Jun 21, 2023

View reviewed changes

openedx_tagging/core/tagging/models.py Show resolved Hide resolved

ormsbee reviewed Jun 21, 2023

View reviewed changes

ChrisChV mentioned this pull request Jun 21, 2023

import/export Taxonomy API functions open-craft/openedx-learning#1

Closed

bradenmacdonald reviewed Jun 21, 2023

View reviewed changes

openedx_tagging/core/tagging/models.py Outdated Show resolved Hide resolved

pomegranited force-pushed the jill/taxonomy-api branch from 1d01dfc to 80060df Compare June 22, 2023 07:41

build: added openedx_tagging to build and test config files

72ce457

pomegranited force-pushed the jill/taxonomy-api branch from 198bea7 to 1551cea Compare June 25, 2023 02:40

pomegranited added 2 commits June 26, 2023 17:05

feat: adds django-rules based permissions for tagging app

ac388bb

Also: * Adds rules requirement and app settings to enable it * Adds mock to test requirements, so we can test system taxonomy rules * ADR: Clarifies that rules will be enforced in the views, not the model or APIs

pomegranited force-pushed the jill/taxonomy-api branch from 1551cea to ac388bb Compare June 26, 2023 07:35

fix: remove db_collation check

a1a90d2

Tests are failing in openedx on sqlite.

ormsbee approved these changes Jun 27, 2023

View reviewed changes

ormsbee merged commit 821a53c into openedx:main Jun 27, 2023
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds Taxonomy, Tag, ObjectTag models and APIs #57

Adds Taxonomy, Tag, ObjectTag models and APIs #57

pomegranited commented Jun 13, 2023 •

edited

Loading

openedx-webhooks commented Jun 13, 2023 •

edited

Loading

bradenmacdonald left a comment

ChrisChV commented Jun 16, 2023

ChrisChV commented Jun 16, 2023

pomegranited Jun 15, 2023

ChrisChV Jun 20, 2023

pomegranited Jun 19, 2023

bradenmacdonald Jun 20, 2023

pomegranited Jun 21, 2023

pomegranited commented Jun 21, 2023

ormsbee Jun 21, 2023 •

edited

Loading

pomegranited Jun 21, 2023

bradenmacdonald Jun 21, 2023

pomegranited Jun 22, 2023 •

edited

Loading

ormsbee Jun 22, 2023

ormsbee Jun 22, 2023 •

edited

Loading

pomegranited Jun 26, 2023

pomegranited commented Jun 22, 2023

ormsbee commented Jun 22, 2023

ormsbee commented Jun 27, 2023

pomegranited commented Jun 27, 2023

openedx-webhooks commented Jun 27, 2023

ormsbee commented Jun 27, 2023

		@@ -26,6 +26,8 @@ But because permissions #2 + #3 require access to the edx-platform CMS model `Co

		Per `OEP-9`_, ``openedx_tagging`` will allow applications to use the standard Django API to query permissions, for example: ``user.has_perm('openedx_tagging.edit_taxonomy', taxonomy)``, and the appropriate permissions will be applied in that application's context.

		These rules will be enforced in the tagging `views`_, not the API or models, so that external code using this library need not have a logged-in user in order to call the API. So please use with care.

Adds Taxonomy, Tag, ObjectTag models and APIs #57

Adds Taxonomy, Tag, ObjectTag models and APIs #57

Conversation

pomegranited commented Jun 13, 2023 • edited Loading

Description

Supporting information

Testing instructions

Other information

To do before merge

openedx-webhooks commented Jun 13, 2023 • edited Loading

bradenmacdonald left a comment

Choose a reason for hiding this comment

ChrisChV commented Jun 16, 2023

ChrisChV commented Jun 16, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pomegranited commented Jun 21, 2023

ormsbee Jun 21, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pomegranited Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ormsbee Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pomegranited commented Jun 22, 2023

ormsbee commented Jun 22, 2023

ormsbee commented Jun 27, 2023

pomegranited commented Jun 27, 2023

openedx-webhooks commented Jun 27, 2023

ormsbee commented Jun 27, 2023

pomegranited commented Jun 13, 2023 •

edited

Loading

openedx-webhooks commented Jun 13, 2023 •

edited

Loading

ormsbee Jun 21, 2023 •

edited

Loading

pomegranited Jun 22, 2023 •

edited

Loading

ormsbee Jun 22, 2023 •

edited

Loading