Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System defined taxonomies #67

Merged
merged 28 commits into from
Aug 2, 2023

Conversation

ChrisChV
Copy link
Contributor

@ChrisChV ChrisChV commented Jul 21, 2023

Description

Subclasses related to build System-defined taxonomies:

  • Docs updated to reflect the new architecture of taxonomies.
  • models.py moved to models/base.py
  • tag_ref setter to allow override how to set a tag on an object_tag
  • object_tag_class property added to Taxonomy
  • Base system-defined subclasses added.
  • Language tags fixture.
  • Taxonomy.system_defined updated to be a hardcoded property.

Testing instructions

Manual testing is possible but not very useful, since we don't yet have views for these models and APIs.

But you can check that the added tests cover the expected polymorphism behavior supported by this change, and that tests are still covering 100% of the openedx_tagging module.

Other information

Relates to openedx/modular-learning#77

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Jul 21, 2023
@openedx-webhooks
Copy link

Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

ChrisChV and others added 15 commits July 21, 2023 10:38
* Adds Taxonomy.object_tag_class property, which can be overridden by
  Taxonomy subclasses to provide their own ObjectTag subclasses.
* Replaces the Taxonomy._set_tag method with an ObjectTag.tag_ref
  property setter, so that ObjectTag subclasses can dynamically create
  Tags as needed during tag_object()
* Moves the tag_class_model and tag_class_value properties to a new
  ModelObjectTag abstract class, so the ModelSystemDefinedTaxonomy can
  be simpler.
* Updates tests for coverage
removing the need for _get_tags_query_set
@ChrisChV ChrisChV force-pushed the chris/system-defined-taxonomies branch from 3d23976 to 0c74bb4 Compare July 21, 2023 15:40
- Taxonomy.system_defined updated to be a hardcoded property.
- TaxonomyAdmin updated to avoid edit and delete system taxonomies.
- TagAdmin updated to avoid edit and delete tags from system taxonomies.
- TagAmdin updated to avoid create Tags on system taxonomies.
Copy link
Contributor

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This is looking great @ChrisChV !

I left some suggestions and a nits, but if you could address them, I think it's ready for upstream review. Feel free to merge open-craft#8 or close and rewrite as you see fit?

  • I tested this with the PR test instructions, and by ensuring code coverage for automated tests.
  • I read through the code and tests
  • I checked for accessibility issues N/A
  • Includes documentation
  • Commit structure follows OEP-0051

openedx_tagging/core/tagging/admin.py Outdated Show resolved Hide resolved
"""
Returns True if the instance is valid
"""
return super()._check_tag(object_tag) and self._check_instance(object_tag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think about post-MVP:

I totally agree that for a ModelObjectTag to be fully valid, the instance it refers to needs to exist.
But I'm worried about the extra database call required to check this -- e.g if we decide to hide "invalid" object tags from learners, then we could be calling this method a lot.

So I don't think this needs changing right now, but when we know more about how we want to use tags in the LMS, we might decide to change this down the track.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During day-to-day operations, does it even matter if there are ModelObjectTags that refer to non-existent objects?

Certainly when displaying tags for a particular (existing) content object, I think we want to be able to hide invalid tags whose taxonomy doesn't exist or doesn't match. But if there are a bunch of extra ObjectTags in the database pointing to deleted content, does it really matter? We can just have some kind of cleanup script that runs once in a while, to delete them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I thought this was talking about tagging non-existing content but it's talking about verifying if the Tag is valid (e.g. an author tag on an XBlock is invalid if the author has been deleted). Yeah we do want to check that.

openedx_tagging/core/tagging/models/base.py Show resolved Hide resolved
to ensure that we're using the correct Taxonomy and ObjectTag class types
@pomegranited
Copy link
Contributor

Thanks for applying those changes @ChrisChV ! I think this is ready for @ormsbee and/or @bradenmacdonald 's eyes now.

Copy link
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably won't get time for a full review today, but I started reading through the code.

@@ -431,6 +447,7 @@ class ObjectTag(models.Model):
id = models.BigAutoField(primary_key=True)
object_id = case_insensitive_char_field(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this, but why is object_id case insensitive? Shouldn't it be case sensitive? abc and ABC are usually different IDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok.. I think that came from a misunderstanding about content UsageKeys.. I thought they were case insensitive.

@ChrisChV Do you mind to fix this here? Sorry there's been so many little mistakes like this..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald Are you ok with the other case-insensitive char fields defined here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix that in a small, focused, separate PR. Just add it to the list you're making. I don't want to muddy the waters or delay this PR any longer than I already have, and smaller focused PRs are usually better.

I am not sure about _name but I guess it makes sense to be case-insensitive.
If _value holds IDs it should be case-sensitive I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return True


class ModelObjectTag(ObjectTag):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this interact with ContentObjectTag?

I don't really like that sometimes the type of taxonomy (system-defined or not) determines which subclass of ObjectTag to use (ModelObjectTag), and other times the type of content (XBlock or Course) determines it (ContentObjectTag). I know that they're both driven by the taxonomy actually, since ContentTaxnonomy is a type of Taxonomy too, but it seems like there's some conceptual confusion. (I'm not asking for changes just yet, but trying to understand what the idea is here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. maybe ModelObjectTag's functionality should be in a mixin instead, so that we can have ModelContentObjectTags..

But that probably also means that the SystemTaxonomy / ModelSystemTaxonomy functionality should be in a mixin too, same reason.. is this getting to complicated?

I really want to do some refactoring/simplification of the Taxonomy vs ObjectTag distinctions we ended up with.. but we don't have time for it right now. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#. Tags are created on the fly when new ObjectTags are added.
#. Tag.external_id we store an identifier from the instance (eg. User.pk).
#. Tag.value we store a human readable representation of the instance (eg. User.username).
#. Resync the tags to re-fetch the value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I definitely like the approach described here ^^

But what I'm still caught up on is ModelObjectTag. Do we need to subclass ObjectTag and push this logic into it? From reading the approach described here, and from what we discussed, I assumed we'd happily just use the normal ObjectTag or even ContentObjectTag. Because we're creating Tag instances as needed, and the regular ObjectTag already works correctly with Tag instances in Taxonomies.

Of course you need some logic to:

  • list available tags even before they've been created as Tag instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show.
  • auto-create Tag instances when tagging an object, if they don't yet exist. Sure, but this logic could live anywhere - in the edx-platform code that's applying the User and Language tags, in the tag_object() API, in the Taxonomy.tag_object() API - I don't think it is necessary to put this into an ObjectTag subclass, and doing so creates a lot of complexity.
    • Think MVP, and what's the simplest solution? Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all. In fact I almost wonder if the User Taxonomy is just a regular closed taxonomy that's not editable, and the platform code can create Tags in it just-in-time... What other functionality do we need here?
  • auto-delete Tag instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes any Tag and ObjectTags associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with everything you've said here, except for:

Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all.

I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

And I think it belongs in this openedx-learning library, because it's not content-specific and could be useful when tagging other types of objects, like people or forum posts. However, if it makes more sense for this to live in edx-platform for the MVP, I'm OK with that, we can always move it into openedx-learning later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list available tags even before they've been created as Tag instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show.

That's true except for language, it's the only system-defined taxonomy that can be "edited" in a content by the content author.

I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

I agree with Jill too, in addition to the auto creation, it checks that the instance (User, Organization) exists to see if the tag is valid.

auto-delete Tag instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes any Tag and ObjectTags associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.

I agree, the simplest thing is to check if the user or organization exists.

To clarify: the ModelTaxonomy is not used for languages, it is only used for the taxonomy of users and organizations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

👍🏻 Sure, good point. That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.

That's true except for language, it's the only system-defined taxonomy that can be "edited" in a content by the content author.

Oh OK. I've asked for clarification on that because I heard the opposite. But in any case, it seems we agree that "list available tags" should live in the Taxonomy. (I think it should be similar to what you implemented in LanguageTaxonomy.get_tags() -> List[Tag] but we'll probably eventually need an API that returns tag strings/IDs, not Tag objects because they may not exist yet. Imagine doing an auto-complete to tag users and there are 1,000,000 users in the system - we don't want to create Tag objects for each user unless they're actually used as a tag. Or maybe that would be a separate search_available_tags API. In any case, future PR.)

I agree with Jill too, in addition to the auto creation, it checks that the instance (User, Organization) exists to see if the tag is valid.

That's fine. I do think however that belongs in the Taxonomy, not the ObjectTag itself. Because it's mostly a question of validating the User/Org then creating the corresponding Tag instance. The ObjectTag can validate itself with the relation to the Tag and doesn't need any special logic to handle ongoing validation related to the User/Org. If necessary, the Taxonomy can have some code to keep the Tag instances in sync with the User/Org models, but I think we can leave that out of scope for now to keep it simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.
That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.

👍

Or maybe that would be a separate search_available_tags API. In any case, future PR.)

Yep, agreed -- and note we don't need to search model tags in MVP (User and Organization system taxonomies are our only model system taxonomies, and neither are editable by the content authors).

I do think however that belongs in the Taxonomy, not the ObjectTag itself...but I think we can leave that out of scope for now to keep it simple.

Cool, thanks for your clarifications here @bradenmacdonald . I'll note this on my "cleanup" task: openedx/modular-learning#85

@@ -119,16 +138,16 @@ class TestModelTagTaxonomy(TestTagTaxonomyMixin, TestCase):

def test_system_defined(self):
assert not self.taxonomy.system_defined
assert self.system_taxonomy.system_defined
assert self.system_taxonomy.cast().system_defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to remember this .cast is likely to be a problem in the future.

My recommendation (for you to consider in the future!) is that you add a decorator like @auto_cast to the base class properties like .system_defined that will detect when the developer forgot to .cast() and either auto-cast or throw an exception. Probably the latter to be more predictable.

Copy link
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so: I don't think there's a good justification for ModelObjectTag and I'd like to see you either prove that it's important to the design despite the complexity it adds, or just remove it. I think everything else we've landed on makes sense.

You can merge this now and remove ModelObjectTag soon in a separate PR, or remove it now, or justify why we want to keep it.

@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Jul 26, 2023

One thing to think about for the future:

I think that our conceptual model may be missing a key piece: a model to track how a Taxonomy should be used.

Right now, there's no place to say "I want to use the Language Taxonomy to tag Content Objects belonging to X organization" or "I want users from org X to use OrgXTaxonomy to tag Course objects belonging to the organization" or "I want users from org Y to use the SpokenLanguage taxonomy to tag learners".

If we had a TaxonomyUseCase (?) model, it could specify who, what, why, where -> which org/users can use which taxonomy for which purpose to tag what type of objects using which ObjectTag subclass.

That would solve the weird problem now where if you use LanguageSystemTaxonomy to tag an XBlock, it gets a plain ObjectTag, but if you use an org's closed taxonomy to tag the same XBlock, it gets a ContentObjectTag. I think both should be using ContentObjectTag if they're both tagging content objects, but currently you have to specify the ObjectTag subclass at the Taxonomy level, not as part of defining a particular use case for the taxonomy.

Anyhow, I'm not saying to do it this way or to launch a major refactor. Just keep this idea in mind while we're iterating the models in the future, in case it resonates with you or inspires something.

@ChrisChV
Copy link
Contributor Author

You can merge this now and remove ModelObjectTag soon in a separate PR, or remove it now, or justify why we want to keep it.

@bradenmacdonald I also agree that it should be removed. @pomegranited added it to do it in a separate PR: #67 (comment)

That would solve the weird problem now where if you use LanguageSystemTaxonomy to tag an XBlock, it gets a plain ObjectTag, but if you use an org's closed taxonomy to tag the same XBlock, it gets a ContentObjectTag. I think both should be using ContentObjectTag if they're both tagging content objects, but currently you have to specify the ObjectTag subclass at the Taxonomy level, not as part of defining a particular use case for the taxonomy.

We are currently solving this by creating subclasses for tagging content. But yes, I agree that there must be a simpler way to do it.

@bradenmacdonald
Copy link
Contributor

We are currently solving this by creating subclasses for tagging content. But yes, I agree that there must be a simpler way to do it.

@ChrisChV OK, somehow I really didn't understand that until now. So to tag content with the (built-in, system defined) language Taxonomy, we cast it to ContentLanguageTaxonomy which is still the same taxonomy, but we now have a different "view" of that Taxonomy which is now using ContentObjectTag to tag objects? Is that right? I guess that works, and I'm fine with that for now, but it seems like a strange use of subclassing to me, as there's nothing really different about ContentLanguageTaxonomy vs LanguageTaxonomy except for what type of ObjectTag it applies, and that only affects the add_tag action, not what the overall Taxonomy is. I thought that just using tag_content_object() would be enough to apply a ContentObjectTag instead of an ObjectTag.

Like if you had a Person class and the person is signing autographs, and now you want the person to sign autographs using blue ink instead of black ink, it makes much more sense to say Person.sign_autograph(book_to_sign, using_ink=Ink.Blue) than to say Person.cast_to(BlueInkUsingPerson).sign_autograph(book_to_sign).

I have already approved this PR, but I just noticed there is no test that actually applies some system-defined tags to content objects. Shouldn't we add that in?

@ChrisChV
Copy link
Contributor Author

@bradenmacdonald Thanks for your feedback!

OK, somehow I really didn't understand that until now. So to tag content with the (built-in, system defined) language Taxonomy, we cast it to ContentLanguageTaxonomy which is still the same taxonomy, but we now have a different "view" of that Taxonomy which is now using ContentObjectTag to tag objects? Is that right?

Yes, it is.

I have already approved this PR, but I just noticed there is no test that actually applies some system-defined tags to content objects. Shouldn't we add that in?

Yes, sorry, I added the system defined to the api test, that includes tag objects: 034726d

@pomegranited
Copy link
Contributor

Sorry @ChrisChV one last change request: could you bump the version to "0.1.1"? We'll tag it after merging to produce a new release.

@pomegranited pomegranited merged commit 04225e7 into openedx:main Aug 2, 2023
6 checks passed
@pomegranited pomegranited deleted the chris/system-defined-taxonomies branch August 2, 2023 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

4 participants