System defined taxonomies #67

Merged
merged 28 commits into from
Aug 2, 2023
Changes from 22 commits
Commits
ac21e1c
feat: System-defined taxonomies
ChrisChV Jul 3, 2023
55636f7
feat: Language and Author taxonomies
ChrisChV Jul 5, 2023
327bb58
feat: Generic system defined object tags and Language object tag added
ChrisChV Jul 7, 2023
58fc1e8
chore: get_tags_query_set added to LanguageObjectTag
ChrisChV Jul 10, 2023
9e7ed4e
chore: Adding _validate_taxonomy function to all system defined objec…
ChrisChV Jul 10, 2023
899d61f
chore: Updating system_defined_taxonomy_id
ChrisChV Jul 11, 2023
1e68687
refactor: consolidates ObjectTag validation
ChrisChV Jul 12, 2023
9e5d854
feat: System defined taxonomies subclasses
ChrisChV Jul 19, 2023
9b72fec
style: linting and formatting
pomegranited Jul 19, 2023
69eeece
refactor: use negative numbers as primary keys for system taxonomies …
pomegranited Jul 19, 2023
2a0b4b6
refactor: use ObjectTag subclasses where possible
pomegranited Jul 19, 2023
de3cc3e
refactor: LanguageTaxonomy overrides get_tags
pomegranited Jul 19, 2023
437ae98
docs: System taxonomy creation doc updated with Dynamic tags approach
ChrisChV Jul 20, 2023
8ca16ea
style: Updating comments
ChrisChV Jul 20, 2023
0c74bb4
style: Separating the models into base and system defined
ChrisChV Jul 20, 2023
9a38047
fix: Update language fixture to negative pk
ChrisChV Jul 21, 2023
1847248
feat: Updates on Taxonomy and Tag admins
ChrisChV Jul 21, 2023
eaaba78
feat: Instance validations on ModelSystemDefinedTaxonomy
ChrisChV Jul 21, 2023
bfc532a
feat: use Taxonomy.cast and ObjectTag.cast in rules
pomegranited Jul 24, 2023
c54cdda
fix: adds unique_together
pomegranited Jul 24, 2023
4f51683
fix: indexes
pomegranited Jul 24, 2023
a29d227
fix: Model pk validation on Model taxonomy
ChrisChV Jul 24, 2023
a34d401
style: comments and style
ChrisChV Jul 27, 2023
4b56ec8
feat: Creating language taxonomy on fixtures
ChrisChV Jul 31, 2023
034726d
test: Added system defined to api tests
ChrisChV Jul 31, 2023
8f1cac1
Merge branch 'main' into chris/system-defined-taxonomies
ChrisChV Jul 31, 2023
cde323e
test: Fixing tests after merge with main
ChrisChV Aug 1, 2023
8a9d241
chore: Package version update
ChrisChV Aug 1, 2023
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -3,4 +3,4 @@ include LICENSE.txt
 include README.rst
 include requirements/base.in
 recursive-include openedx_learning *.html *.png *.gif *.js *.css *.jpg *.jpeg *.svg *.py
-recursive-include openedx_tagging *.html *.png *.gif *.js *.css *.jpg *.jpeg *.svg *.py
+recursive-include openedx_tagging *.html *.png *.gif *.js *.css *.jpg *.jpeg *.svg *.py *.yaml
27 changes: 15 additions & 12 deletions docs/decisions/0012-system-taxonomy-creation.rst
@@ -18,7 +18,8 @@ System Tag lists and validation
 Each System-defined Taxonomy will have its own ``ObjectTag`` subclass which is used for tag validation (e.g. ``LanguageObjectTag``, ``OrganizationObjectTag``).
 Each subclass can overwrite ``get_tags``; to configure the valid tags, and ``is_valid``; to check if a list of tags are valid. Both functions are implemented on the ``ObjectTag`` base class, but can be overwritten to handle special cases.

-We need to create an instance of each System-defined Taxonomy's ObjectTag in a fixture. This instances will be used on different APIs.
+We need to create an instance of each System-defined Taxonomy in a fixture. With their respective characteristics and subclasses.
+The ``pk`` of these instances must be negative so as not to affect the auto-incremented ``pk`` of Taxonomies.

 Later, we need to create content-side ObjectTags that live on ``openedx.features.content_tagging`` for each content and taxonomy to be used (eg. ``CourseLanguageObjectTag``, ``CourseOrganizationObjectTag``).
 This new class is used to configure the automatic content tagging. You can read the `document number 0013`_ to see this configuration.
@@ -30,26 +31,28 @@ We have two ways to handle Tags creation and validation for System-defined Taxon

 **Hardcoded by fixtures/migrations**

-#. If the tags don't change over the time, you can create all on a fixture (e.g Languages).
+#. If the tags don't change over the time, you can create all on a fixture (e.g Languages).
+The ``pk`` of these instances must be negative.
 #. If the tags change over the time, you can create all on a migration. If you edit, delete, or add new tags, you should also do it in a migration.

-**Free-form tags**
+**Dynamic tags**

-This taxonomy depends on a core data model, but simplifies the creation of Tags by allowing free-form tags,
-but we can validate the tags using the ``validate_object_tag`` method. For example we can put the ``AuthorSystemTaxonomy`` associated with
-the ``User`` model and use the ``ID`` field as tags. Also we can validate if an ``User`` still exists or has been deleted over time.
+Closed Taxonomies that depends on a core data model. Ex. AuthorTaxonomy with Users as Tags

+#. Tags are created on the fly when new ObjectTags are added.
+#. Tag.external_id we store an identifier from the instance (eg. User.pk).
+#. Tag.value we store a human readable representation of the instance (eg. User.username).
+#. Resync the tags to re-fetch the value.
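
The four "Dynamic tags" steps above can be sketched roughly as follows. This is an in-memory illustration only, not the models added in this PR: the `User`, `Tag`, and `AuthorTaxonomy` classes here are plain dataclasses standing in for the Django models, and the method names `tag_for` and `resync` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class User:
    """Stand-in for the core data model (hypothetical, not Django's User)."""
    pk: int
    username: str


@dataclass
class Tag:
    external_id: str  # identifier of the source instance, e.g. str(User.pk)
    value: str        # human-readable representation, e.g. User.username


@dataclass
class AuthorTaxonomy:
    """In-memory stand-in for a closed taxonomy backed by User instances."""
    users: Dict[int, User]
    tags: Dict[str, Tag] = field(default_factory=dict)

    def tag_for(self, user_pk: int) -> Tag:
        """Return the Tag for a user, creating it on first use."""
        user = self.users.get(user_pk)
        if user is None:
            raise ValueError(f"No user with pk={user_pk}")
        external_id = str(user.pk)
        if external_id not in self.tags:
            self.tags[external_id] = Tag(external_id=external_id, value=user.username)
        return self.tags[external_id]

    def resync(self) -> None:
        """Re-fetch each Tag's value from the instance it points to."""
        for tag in self.tags.values():
            user = self.users.get(int(tag.external_id))
            if user is not None:
                tag.value = user.username


users = {1: User(1, "alice"), 2: User(2, "bob")}
taxonomy = AuthorTaxonomy(users=users)
tag = taxonomy.tag_for(1)             # Tag created on the fly
users[1].username = "alice_renamed"   # upstream instance changes
taxonomy.resync()                     # Tag.value follows the instance
```

Only user 1 gets a Tag here; user 2 exists upstream but is never materialized as a Tag because nothing was tagged with it.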

Contributor

OK, I definitely like the approach described here ^^

But what I'm still caught up on is ModelObjectTag. Do we need to subclass ObjectTag and push this logic into it? From reading the approach described here, and from what we discussed, I assumed we'd happily just use the normal ObjectTag or even ContentObjectTag. Because we're creating Tag instances as needed, and the regular ObjectTag already works correctly with Tag instances in Taxonomies.

Of course you need some logic to:

  • list available tags even before they've been created as Tag instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show.
  • auto-create Tag instances when tagging an object, if they don't yet exist. Sure, but this logic could live anywhere - in the edx-platform code that's applying the User and Language tags, in the tag_object() API, in the Taxonomy.tag_object() API - I don't think it is necessary to put this into an ObjectTag subclass, and doing so creates a lot of complexity.
    • Think MVP, and what's the simplest solution? Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all. In fact I almost wonder if the User Taxonomy is just a regular closed taxonomy that's not editable, and the platform code can create Tags in it just-in-time... What other functionality do we need here?
  • auto-delete Tag instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes any Tag and ObjectTags associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.

Contributor @pomegranited, Jul 26, 2023

I agree with everything you've said here, except for:

> Just auto-create the tag in the same place you apply the tag to some content. No special behavior needed in this repo at all.

I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

And I think it belongs in this openedx-learning library, because it's not content-specific and could be useful when tagging other types of objects, like people or forum posts. However, if it makes more sense for this to live in edx-platform for the MVP, I'm OK with that, we can always move it into openedx-learning later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> list available tags even before they've been created as Tag instances: e.g. list all known users or languages → this belongs in the Taxonomy subclass but I don't think it's even necessary for this project, because we never need to display the "available" tags of system taxonomies to users. The tags will get auto-created by the system and the system already knows what user/language is in use. There's no UI and no dropdown menu to show.

That's true except for language: it's the only system-defined taxonomy that can be "edited" on a piece of content by the content author.

> I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

I agree with Jill too. In addition to the auto-creation, it checks that the instance (User, Organization) exists to see if the tag is valid.

> auto-delete Tag instances when the upstream object is deleted: we don't really need to worry about this for now, as languages won't be deleted and user deletion is very rare. In fact, I don't think user deletion even really happens; it's just retirement/anonymization that happens per GDPR. So in the future we could add a signal listener that detects "user retirement" and then deletes any Tag and ObjectTags associated with the user, but I would leave that out of scope for now. We don't even really know how/where such user tags will be used yet.

I agree, the simplest thing is to check if the user or organization exists.

To clarify: the ModelTaxonomy is not used for languages; it is only used for the taxonomies of users and organizations.

Contributor

> I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.

👍🏻 Sure, good point. That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.

> That's true except for language, it's the only system-defined taxonomy that can be "edited" in a content by the content author.

Oh OK. I've asked for clarification on that because I heard the opposite. But in any case, it seems we agree that "list available tags" should live in the Taxonomy. (I think it should be similar to what you implemented in LanguageTaxonomy.get_tags() -> List[Tag] but we'll probably eventually need an API that returns tag strings/IDs, not Tag objects because they may not exist yet. Imagine doing an auto-complete to tag users and there are 1,000,000 users in the system - we don't want to create Tag objects for each user unless they're actually used as a tag. Or maybe that would be a separate search_available_tags API. In any case, future PR.)
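
A rough sketch of the "return strings, not Tag objects" idea for a possible `search_available_tags` API. The function signature, the `iter_usernames` helper, and the prefix-matching rule are all assumptions made for illustration; no such API exists in this PR.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_usernames(total: int) -> Iterator[str]:
    """Hypothetical stand-in for a lazy query over a large User table."""
    for i in range(total):
        yield f"user{i}"


def search_available_tags(candidates: Iterable[str], prefix: str, limit: int = 10) -> List[str]:
    """Return up to `limit` candidate tag values starting with `prefix`.

    Only plain strings are returned; no Tag objects are created here.
    """
    matches = (name for name in candidates if name.startswith(prefix))
    # islice stops consuming the generator once `limit` matches are found.
    return list(islice(matches, limit))


results = search_available_tags(iter_usernames(1_000_000), "user99", limit=3)
print(results)  # ['user99', 'user990', 'user991']
```

Because both the candidate source and the filter are lazy, an auto-complete over a million users only walks as far as it needs to, and a Tag would only be created once a value is actually applied to an object.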

> I agree with Jill too, in addition to the auto creation, it checks that the instance (User, Organization) exists to see if the tag is valid.

That's fine. I do think however that belongs in the Taxonomy, not the ObjectTag itself. Because it's mostly a question of validating the User/Org then creating the corresponding Tag instance. The ObjectTag can validate itself with the relation to the Tag and doesn't need any special logic to handle ongoing validation related to the User/Org. If necessary, the Taxonomy can have some code to keep the Tag instances in sync with the User/Org models, but I think we can leave that out of scope for now to keep it simple.
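
One way to picture the split discussed in this thread (instance validation and Tag auto-creation in a Taxonomy mixin, with the ObjectTag validating itself only through its Tag) is the sketch below. It uses plain Python classes rather than Django models, and `ModelTagAutoCreateMixin`, `lookup_value`, and the in-memory `tags` dict are hypothetical names, not code from this PR.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Tag:
    external_id: str
    value: str


class Taxonomy:
    """Minimal in-memory taxonomy: knows only the Tags created so far."""

    def __init__(self) -> None:
        self.tags: Dict[str, Tag] = {}

    def tag_object(self, object_id: str, external_id: str) -> "ObjectTag":
        tag = self.tags.get(external_id)
        if tag is None:
            raise ValueError(f"Unknown tag: {external_id}")
        return ObjectTag(object_id=object_id, taxonomy=self, tag=tag)


class ModelTagAutoCreateMixin:
    """Hypothetical mixin: check the source instance exists, then create its Tag just in time."""

    def lookup_value(self, external_id: str) -> Optional[str]:
        raise NotImplementedError

    def tag_object(self, object_id: str, external_id: str) -> "ObjectTag":
        value = self.lookup_value(external_id)
        if value is None:
            raise ValueError(f"No instance with id: {external_id}")
        self.tags.setdefault(external_id, Tag(external_id, value))
        return super().tag_object(object_id, external_id)


class OrganizationTaxonomy(ModelTagAutoCreateMixin, Taxonomy):
    def __init__(self, organizations: Dict[str, str]) -> None:
        super().__init__()
        self.organizations = organizations  # stand-in for the Organization table

    def lookup_value(self, external_id: str) -> Optional[str]:
        return self.organizations.get(external_id)


@dataclass
class ObjectTag:
    object_id: str
    taxonomy: Taxonomy
    tag: Tag

    def is_valid(self) -> bool:
        # Valid as long as its Tag still belongs to the taxonomy; no User/Org lookup here.
        return self.taxonomy.tags.get(self.tag.external_id) == self.tag


taxonomy = OrganizationTaxonomy({"org1": "Example Org"})
object_tag = taxonomy.tag_object("course-v1:demo", "org1")
```

Written as a mixin, the auto-creation step can be combined with any Taxonomy subclass, which is the property the "ContentTaxonomy vs regular Taxonomy" point above depends on.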

Contributor @pomegranited, Jul 26, 2023

> > I think providing a ModelTaxonomy (or mixin) that handles Tag auto-creation is broadly useful.
>
> That's totally reasonable, and that sounds like a mixin to me. After all, in the platform if we keep the "ContentTaxonomy" vs "regular Taxonomy" distinction, it would have to be a mixin, to allow Content tagging with auto-creation of tags.

👍

> Or maybe that would be a separate search_available_tags API. In any case, future PR.)

Yep, agreed -- and note we don't need to search model tags in MVP (User and Organization system taxonomies are our only model system taxonomies, and neither is editable by content authors).

> I do think however that belongs in the Taxonomy, not the ObjectTag itself...but I think we can leave that out of scope for now to keep it simple.

Cool, thanks for your clarifications here @bradenmacdonald. I'll note this on my "cleanup" task: openedx/modular-learning#85



 Rejected Options
 -----------------

-Tags created by Auto-generated from the codebase
+Free-form tags
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Taxonomies that depend on a core data model could create a Tag for each eligible object.
-Maintaining this dynamic list of available Tags is cumbersome: we'd need triggers for creation, editing, and deletion.
-And if it's a large list of objects (e.g. Users), then copying that list into the Tag table is overkill.
-It is better to dynamically generate the list of available Tags, and/or dynamically validate a submitted object tag than
-to store the options in the database.
+Open Taxonomy that depends on a core data model, but simplifies the creation of Tags by allowing free-form tags,

+Rejected because it has been seen that using dynamic tags provides more functionality and more advantages.

 .. _document number 0013: https://github.com/openedx/openedx-learning/blob/main/docs/decisions/0013-system-taxonomy-auto-tagging.rst
3 changes: 2 additions & 1 deletion openedx_tagging/core/tagging/api.py
@@ -106,8 +106,9 @@ def get_object_tags(
     Pass valid_only=False when displaying tags to content authors, so they can see invalid tags too.
     Invalid tags will (probably) be hidden from learners.
     """
+    ObjectTagClass = taxonomy.object_tag_class if taxonomy else ObjectTag
     tags = (
-        ObjectTag.objects.filter(
+        ObjectTagClass.objects.filter(
             object_id=object_id,
         )
         .select_related("tag", "taxonomy")
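
A minimal sketch of what the class-selection change in this hunk achieves, using plain Python classes instead of Django models. Only the expression `taxonomy.object_tag_class if taxonomy else ObjectTag` comes from the diff; the `LanguageObjectTag.VALID` set, `resolve_object_tag_class`, and `load_object_tags` are hypothetical helpers for illustration.

```python
from typing import List, Optional, Tuple, Type


class ObjectTag:
    """Stand-in for the base ObjectTag: accepts any value."""

    def __init__(self, object_id: str, value: str) -> None:
        self.object_id = object_id
        self.value = value

    def is_valid(self) -> bool:
        return True


class LanguageObjectTag(ObjectTag):
    """Stand-in subclass with its own validation rule (assumed for this sketch)."""

    VALID = {"en", "es"}

    def is_valid(self) -> bool:
        return self.value in self.VALID


class Taxonomy:
    object_tag_class: Type[ObjectTag] = ObjectTag


class LanguageTaxonomy(Taxonomy):
    object_tag_class = LanguageObjectTag


def resolve_object_tag_class(taxonomy: Optional[Taxonomy]) -> Type[ObjectTag]:
    # Same expression as the added line in the diff.
    return taxonomy.object_tag_class if taxonomy else ObjectTag


def load_object_tags(rows: List[Tuple[str, str]], taxonomy: Optional[Taxonomy] = None) -> List[ObjectTag]:
    """Build object tags with the class chosen by the taxonomy (or the base class)."""
    cls = resolve_object_tag_class(taxonomy)
    return [cls(object_id, value) for object_id, value in rows]


rows = [("course-1", "en"), ("course-1", "xx")]
with_taxonomy = [t.is_valid() for t in load_object_tags(rows, LanguageTaxonomy())]
without_taxonomy = [t.is_valid() for t in load_object_tags(rows)]
print(with_taxonomy, without_taxonomy)  # [True, False] [True, True]
```

Loading through the taxonomy's own class means subclass-specific validation applies to the returned objects, while calls that pass no taxonomy keep the previous base-class behavior.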