Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import/export Taxonomy API functions #58

Closed

Conversation

ChrisChV
Copy link
Contributor

@ChrisChV ChrisChV commented Jun 19, 2023

Description

Implements import_tags and export_tags functions

Supporting information

Merge after #57.

Closes openedx/modular-learning#64

Testing instructions

  • Ensure that the tests cover the expected behaviour

@openedx-webhooks
Copy link

Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Jun 19, 2023
Copy link
Contributor

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good @ChrisChV ! I added one comment about test coverage -- do you have time to address that as part of this task? Would be great if we could keep the coverage for this module at 100%.

But what's here LGTM: 👍

  • I tested this by running the tests
  • I read through the code and tests with care
  • I checked for accessibility issues N/A
  • Includes documentation N/A

@@ -205,3 +211,124 @@ def test_tag_object_invalid_tag(self):
"course",
)
assert "Invalid object tag for taxonomy" in str(exc.exception)

def test_import_tags_csv(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some lines missing from the test coverage for import and export -- could you add tests for them too?

openedx_tagging/core/tagging/api.py                                                 145     11     64      9    90%   185, 191, 206, 212->224, 215, 221-222, 224->exit, 287-289, 304, 310, 340->exit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks!

@ChrisChV ChrisChV force-pushed the chris/taxonomy-import-export branch 2 times, most recently from 442cb64 to c2e5099 Compare June 20, 2023 17:04
@mphilbrick211 mphilbrick211 added the needs test run Author's first PR to this repository, awaiting test authorization from Axim label Jun 20, 2023
@bradenmacdonald
Copy link
Contributor

@ChrisChV Until @pomegranited 's PR is merged, can you please change the "merge target" of this PR from main to her branch? Then it will only show the changes from your new commit. After her PR merges you can change it back to main.

if taxonomy.allow_free_text:
raise ValueError(
_(
f"Invalid taxonomy ({taxonomy.id}): You can't import free-from tags taxonomies"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is a bit confusing. I'd suggest: "You cannot import into a free-form taxonomy."

)
)
tags_data = list(csv_reader)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using if format not in TaxonomyDataFormat.__members__.values(): above and assuming here that there are only two valid options, it's better to use

elif format == TaxonomyDataFormat.JSON:
    ...
else:
    raise ValueError(_(f"Invalid format: {format}"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald I think this is valid on the import. In the export I have left the verification at the beginning before loading the tags

if tag_id not in new_tags and tag_id not in updated_tags:
try:
# Update tag
tag_instance = Tag.objects.get(external_id=tag_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to use taxonomy=taxonomy here, or you can have cross-taxonomy bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow.. I should have seen that!

# if there is no parent in the data import
tag_instance.parent = None
updated_tags.append(tag_id)
except ObjectDoesNotExist:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use Tag.DoesNotExist instead of the generic base class.

return tag_instance
else:
# Returns the created/updated tag from history
return Tag.objects.get(external_id=tag_id)
Copy link
Contributor

@bradenmacdonald bradenmacdonald Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, needs taxonomy=taxonomy. The same external_id can appear in multiple taxonomies.

BTW if you add something like this to the Taxonomy class:

@property 
def tags(self):
    return Tag.objects.filter(taxonomy=self)

then this code can become return taxonomy.tags.get(external_id=tag_id) 😎

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already taxonomy.tag_set, so I don't think we need another property?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, duh... not sure why I forgot that. But yes, use that :)

@ChrisChV ChrisChV force-pushed the chris/taxonomy-import-export branch from a10c6f6 to c0e5a48 Compare June 21, 2023 16:57
@ChrisChV
Copy link
Contributor Author

ChrisChV commented Jun 21, 2023

@ChrisChV Until @pomegranited 's PR is merged, can you please change the "merge target" of this PR from main to her branch? Then it will only show the changes from your new commit. After her PR merges you can change it back to main.

@bradenmacdonald I have not found the way to change openedx/openedx-learning to open-craft/openedx-learning. I created this open-craft#1 on open-craft/openedx-learning. Do yo know a way to change the "merge target" between repos?
Personally, in these cases I do not like to have two PRs since you can lose the thread of the reviews

@bradenmacdonald
Copy link
Contributor

@ChrisChV Oh, I see. In that case, what you'd have to do is push Jill's branch to the openedx fork if you can; if not, just leave it how you have it and link to the commit. That's fine.

@ChrisChV ChrisChV force-pushed the chris/taxonomy-import-export branch from c0e5a48 to ebda011 Compare June 21, 2023 18:29
@ChrisChV
Copy link
Contributor Author

@ormsbee This is ready for your review

@ChrisChV ChrisChV force-pushed the chris/taxonomy-import-export branch from ebda011 to 455e783 Compare June 22, 2023 20:55
@e0d e0d removed the needs test run Author's first PR to this repository, awaiting test authorization from Axim label Jun 26, 2023
@ChrisChV ChrisChV force-pushed the chris/taxonomy-import-export branch from a043b41 to e5b1698 Compare June 27, 2023 21:50
@ChrisChV
Copy link
Contributor Author

@ormsbee I merged the changes of #57. Now the review is easier and it's ready

Copy link
Contributor

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisChV I found a couple of nits, but otherwise good 👍

  • I tested this by running the tests locally and ensuring 100% coverage.
  • I read through the code and tests, and ensured that the tests cover our use cases.
  • I checked for accessibility issues N/A
  • Includes documentation
  • Commit structure follows OEP-0051

from django.utils.translation import gettext_lazy as _

from .models import ObjectTag, Tag, Taxonomy

csv_fields = ['id', 'name', 'parent_id', 'parent_name']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would like to make it clear that this is a constant, and it's not part of the externally-exportable python api:

Suggested change
csv_fields = ['id', 'name', 'parent_id', 'parent_name']
_CSV_FIELDS = ['id', 'name', 'parent_id', 'parent_name']

tagging_api.resync_object_tags([object_tag])
object_tag = ObjectTag.objects.get(object_id=object_id)
self.assertEqual(object_tag.tag.value, 'Bacteria')
self.assertEqual(object_tag._value, 'Bacteria')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: pylint doesn't like this, but I agree it's necessary for the test. So could you please add this here and in the two other mentions below?

Suggested change
self.assertEqual(object_tag._value, 'Bacteria')
self.assertEqual(object_tag._value, 'Bacteria') # pylint: disable=protected-access

if replace:
taxonomy.tag_set.exclude(external_id__in=updated_tags).delete()

resync_object_tags(ObjectTag.objects.filter(taxonomy=taxonomy))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a large operation.. so I don't think it should sit under the same atomic operation. Can you bump it out a level?

Suggested change
resync_object_tags(ObjectTag.objects.filter(taxonomy=taxonomy))
resync_object_tags(ObjectTag.objects.filter(taxonomy=taxonomy))

json_result = {
'name': taxonomy.name,
'description': taxonomy.description,
'tags': result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
'tags': result
'tags': result,

Copy link
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started writing per-line reviews, but then I thought it best to step back a bit.

Import is going to be one of those places where the complexity is going to grow a lot as time goes on. We already know that we're going to have to support a variety of modifications in the future–consolidating tags, renaming tags, etc.

We're clearly not going to implement that functionality yet. But the fact that this is going to grow a lot means that we should be really disciplined and explicit about the stages of data transformation that we have here. We also have to keep the code as painfully simple and easy to test as possible.

What I expect to see as a top level import function would be something fairly short that that calls out to other classes/functions to implement something like the following steps, where the output of one step is the input into the next:

  1. Parse the raw input into a DSL of actions (e.g. CreateOrUpdate, Delete, Rename, etc.).
  2. Validate that the parsed set of actions is internally consistent (e.g. we're not adding a Tag as a parent of itself).
  3. Validate that the actions make sense against the Taxonomy data being operated on (requires reading data form our Taxonomy).
  4. Create a plan for the operations we'll need to do (because preview functionality is really nice to have when there's no revert button).
  5. Execute that plan.

Each of those steps should be separately testable. We should be able to have a CSVParser and a JSONParser and have the inputs and outputs of their tests only be the raw data and the actions they generate.

We should be able to have a set of actions, and test the function that validates them.

When we chain things together, we should be able to set up a database state and a list of DSL statements and test the plan that we generate.

Right now, all of these concerns are mixed together in the import_tags. This will make it harder to debug as time goes on, and encourage the use of side-effects to effectively "message" later parts of the pipeline with dicts built earlier on.

These may not seem like immediate problems, but if we make it so that the easiest way to add new features is to add a couple dozen lines to import_tags and make a new 50 line inner function there, things are going to get very confusing very quickly.


updated_tags = []

def create_update_tag(tag):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to avoid inner functions this large and complex. The create_update_tag function has interesting logic but it's harder to test separately because it's nested in here.

@ChrisChV
Copy link
Contributor Author

Closed in favor of #64

@ChrisChV ChrisChV closed this Jul 14, 2023
@openedx-webhooks
Copy link

@ChrisChV Even though your pull request wasn’t merged, please take a moment to answer a two question survey so we can improve your experience in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[Tagging] Import/export tags API methods
7 participants