@kdmccormick kdmccormick commented Dec 2, 2025

Description & Supporting Info

Reviewers: I've left GH comments throughout all changed files that will hopefully make this easier to digest.

This PR simultaneously addresses several issues in the modulestore_migrator, which may not be perceptible to the average Studio user on the happy-path, but will present themselves to:

  • REST API integrators.
  • Studio users who migrate a library multiple times and care about which migration their course references respect.
  • Studio users who hit bugs and end up with partial migrations.
  • Studio users who want to "roll back" a migration using the Django admin.
  • Studio users who care about URL slugs.
  • Future developers who want to use modulestore_migrator in order to migrate courses out of Mongo (us, probably).

What exactly it fixes

  • For legacy library_content references in courses, this PR:

    • Removes the spurious sync after updating a reference to a migrated library, so that users don't need to "update" their content after updating their reference, unless there were real content edits that happened since they last synced. We do this by correctly associating a DraftChangeLogRecord with the ModulestoreBlockSource migration artifact, and then comparing that version information before offering a sync. (related issue: Migrated legacy library content should be published upon migration frontend-app-authoring#2626).
    • Prompts users to update a reference to a migrated library with higher priority than prompting them to sync legacy content updates for that reference, so that users don't end up needing to accept legacy content updates in order to get to a point where they can update to V2 content.
    • Ensures the library references in courses always follow the correct migration, as defined by the forwarded fields in the data model, which are populated based on the REST API spec and the stated product UI requirements.
  • For the migration itself, this PR:

    • Allows non-admins to migrate libraries, fixing: Only global staff can migrate libraries #37774
    • When triggered via the UI, ensures the migration uses nice title-based target slugs instead of ugly source-hash-based slugs. We've had this as an option for a long time, but preserve_url_slugs defaulted to True instead of False in the REST API serializer, so we weren't taking advantage of it.
    • Unifies logic between single-source and bulk migration. These were implemented as two separate code paths, with drift in their implementations. In particular, the collection update-vs-create-new logic was completely different for single-source vs. bulk.
    • When using the Skip or Update strategies for repeats, consistently follows mappings established by the latest successful migration rather than following mappings across arbitrary previous migrations.
    • Logs unexpected exceptions more often, although there is still much room for improvement here.
    • Adds more validation to the REST API so that client mistakes more often become 400s with validation messages rather than 500s.
  • For developers, this PR:

    • Adds unit tests to the REST API
    • Ensures that all migration business logic now goes through a general-purpose Python API.
    • Ensures that the data model (specifically forwarded, and change_log_record) is now populated and respected.
    • Adds more type annotations.
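
To make the first two library_content fixes concrete, here is a minimal Python sketch of the prompt ordering described above. Every name in it (`Prompt`, `ReferenceState`, `next_prompt`, and the field names) is hypothetical and invented for illustration; none of it is taken from edx-platform, and the integer version comparison is an assumption about how "real content edits since the last sync" might be detected.

```python
from dataclasses import dataclass
from enum import Enum


class Prompt(Enum):
    UPDATE_REFERENCE = "update reference to migrated library"
    SYNC_CONTENT = "sync content updates"
    NONE = "nothing to do"


@dataclass(frozen=True)
class ReferenceState:
    """Hypothetical snapshot of one library_content reference in a course."""
    points_at_legacy_library: bool
    library_is_forwarded: bool  # an authoritative (forwarded) migration exists
    synced_version: int         # version the course block last synced
    latest_version: int         # latest version available from the library


def next_prompt(state: ReferenceState) -> Prompt:
    # Updating the reference takes priority over syncing legacy content.
    if state.points_at_legacy_library and state.library_is_forwarded:
        return Prompt.UPDATE_REFERENCE
    # Offer a sync only when there are real edits since the last sync,
    # which is what avoids the spurious post-update sync.
    if state.latest_version > state.synced_version:
        return Prompt.SYNC_CONTENT
    return Prompt.NONE
```

With this ordering, a user is never asked to accept legacy content updates before being offered the reference update, and a reference whose versions already match gets no sync prompt at all.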

Testing instructions

I tested the migration UI with a variety of libraries. I also did some light testing of the Django admin workflow.

The REST API could use more manual testing, but I'm prioritizing merging this first so that we can get a backport PR up and test against Ulmo.

Other information

AI Coding notes

I used Claude Code to generate cms/djangoapps/modulestore_migrator/tests/test_rest_api.py. Here was the original prompt:

My goal is to write unit tests for the REST API defined at cms/djangoapps/modulestore_migrator/rest_api, focusing on validation, return codes, and serialization/deserialization, NOT focusing on business logic. Please start by writing me one such unit test so we can align on approach. Mock liberally. Please ask first before reading parts of the codebase outside of cms/djangoapps/modulestore_migrator or cms/djangoapps/contentstore.

I reviewed every test manually, and went through several rounds of revision to address instances where it mocked too heavily, misunderstood the data model, lacked context (e.g. django-user-tasks), or made stylistic choices that I didn't like (e.g. self.assertIn(x, y) vs assert x in y). Here are two examples of how I had it revise the code:

  • great. now, for the unit tests which return data successfully, please ensure that you are returning at least one object. I noticed that you returned a mock UserTaskStatus in a couple tests; this is good. Even better is to use dataclass objects which are defined in data.py modules within edx-platform--you can find dataclass objects relevant to these unit tests in cms/djangoapps/modulestore_migrator/data.py. please feel free to ask follow-up questions if that does not make sense.
  • There were some good parts of how you implemented test_list_migrations_success, but there were also mistakes I fixed. Notably, there was no need to create two ModulestoreSources, and there is no such thing as a ModulestoreTarget. Please take a look and try to understand. Then, rewrite test_retrieve_migration_success to follow a similar pattern, or ask me a question if you are unsure how to do so.

Claude also found a bug in the REST API, which I fixed:

  • Me: this is good, but I have some change requests before we move on to the other endpoints. the method "test_create_migration_non_admin_user" is incorrectly named, because non-admin users are allowed to create migrations, it is just that you've chosen to test with an admin user. can you make the method name and docstring more accurate? do not make any other changes right now.
  • Claude: Ah, I understand now! Looking back at the views, I see that MigrationViewSet has permission_classes = (IsAdminUser,), but you're saying that in practice any user with write access to the target library should be able to create migrations. So the test I wrote is actually testing the current behavior (that only admins can create migrations), not documenting an incorrect restriction.
  • Me: Ah, I was mistaken. You are right, the test you wrote was testing the current behavior. I have edited the REST API to have the permission structure I intended, which is: (a) users need write access to both the source and target in order to perform migrations, (b) users need read access to the source or target in order to check the status of migrations in the source or target, respectively. Please revise TestMigrationViewSetCreate based on these new requirements. If you find that the REST API source conflicts with what I've described, let me know.
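
The permission structure in (a) and (b) above can be sketched as two small predicates. This is only an illustration under stated assumptions: `Access`, `can_start_migration`, and `can_view_migration_status` are hypothetical names, and representing access as sets of key strings is a simplification rather than how the REST API actually checks permissions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Access:
    """Hypothetical per-user access sets, holding source/target key strings."""
    readable: set[str] = field(default_factory=set)
    writable: set[str] = field(default_factory=set)


def can_start_migration(access: Access, source_key: str, target_key: str) -> bool:
    # (a) Write access to BOTH the source and the target is required.
    return source_key in access.writable and target_key in access.writable


def can_view_migration_status(access: Access, key: str) -> bool:
    # (b) Read access to the source (or target) being queried is required
    # in order to check the status of migrations in that source (or target).
    return key in access.readable
```

Note that neither predicate mentions global staff or admin status, which is the point of the change away from IsAdminUser.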

Deadline

ASAP, needs Ulmo backport

@kdmccormick kdmccormick changed the title fix: Migrated Libraries Should Not be Editable fix: Disable edits post-migration; remove spurious sync; improve Python API Dec 2, 2025
@kdmccormick kdmccormick changed the title fix: Disable edits post-migration; remove spurious sync; improve Python API Disable edits post-migration & Remove spurious sync & Improve Python API Dec 2, 2025
@kdmccormick kdmccormick force-pushed the kdmccormick/migration-api branch from 885d3b0 to 5aa4695 Compare December 2, 2025 21:05
@kdmccormick kdmccormick changed the title Disable edits post-migration & Remove spurious sync & Improve Python API fix(modulestore_migrator): No post-migration edits; No spurious sync; Better Python API Dec 3, 2025
@kdmccormick kdmccormick force-pushed the kdmccormick/migration-api branch 5 times, most recently from 8a0656c to 20f0c30 Compare December 8, 2025 18:29
@kdmccormick kdmccormick added the create-sandbox open-craft-grove should create a sandbox environment from this PR label Dec 10, 2025
@kdmccormick kdmccormick force-pushed the kdmccormick/migration-api branch from fa5a416 to 765e18a Compare December 10, 2025 17:27
@open-craft-grove
Sandbox deployment successful 🚀
🎓 LMS
📝 Studio
ℹ️ Grove Config, Tutor Config, Tutor Requirements

@kdmccormick kdmccormick force-pushed the kdmccormick/migration-api branch from 765e18a to 00cb47e Compare December 12, 2025 16:13
@kdmccormick kdmccormick force-pushed the kdmccormick/migration-api branch from 75dda80 to d8f04a1 Compare December 14, 2025 15:05
@kdmccormick kdmccormick left a comment

👀 Context and rationale, for reviewers ⬇️

library = lib_api.ContentLibrary.objects.get(slug=self.lib_key_v2.slug)
learning_package = library.learning_package
# Create a migration source for the legacy library
self.source = ModulestoreSourceFactory(key=self.lib_key_1)

not necessary, as the migrator_api creates missing Source objects automatically

Comment on lines 292 to 302
forward_source_to_target=True,
)
migrator_api.start_migration_to_library(
user=self.user,
source_key=self.lib_key_2,
target_library_key=self.lib_key_v2,
target_collection_slug=collection_key,
composition_level=CompositionLevel.Component.value,
repeat_handling_strategy=RepeatHandlingStrategy.Skip.value,
preserve_url_slugs=True,
forward_source_to_target=False,

now testing both forward_source_to_target=True and forward_source_to_target=False

def get_library_context(request, request_is_json=False):
"""
Utils is used to get context of course home library tab.
It is used for both DRF and django views.

Updated this comment, as the django-based legacy libraries listing is gone.

user_can_create_library,
)

is_migrated: bool | None # None means: do not filter on is_migrated

No functional change to this view handler for any of the 2xx happy-path cases.

The changes here just make the handler more careful with types and parsing, so that some client errors which would have resulted in 500s are now caught here and returned as 400s with validation messages.
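
As an illustration of the kind of careful parsing described above, here is a minimal sketch of turning a raw query-string value into the tri-state `bool | None` used for `is_migrated`. The names `QueryParamError` and `parse_is_migrated`, and the exact set of accepted strings, are assumptions made for this example; the real handler may accept different spellings and raise a different exception type.

```python
from __future__ import annotations


class QueryParamError(ValueError):
    """Stand-in for a validation error that a view would turn into a 400."""


_TRUE_VALUES = {"true", "1"}
_FALSE_VALUES = {"false", "0"}


def parse_is_migrated(raw: str | None) -> bool | None:
    """Parse an optional boolean query parameter.

    None (parameter absent) means: do not filter on is_migrated.
    """
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    # Reject anything else explicitly instead of letting a later
    # conversion fail and surface as a 500.
    raise QueryParamError(f"is_migrated must be true or false, got {raw!r}")
```

The design choice here is to fail early with a message the client can act on, rather than coercing unknown strings to False.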

assert '<li>html 3</li>' in rendered.content
assert '<li>html 4</li>' in rendered.content

def test_xml_export_import_cycle(self):

There's another test_xml_export_import_cycle test method in this module, which sufficiently tests the OLX round trip.

The extra assertions that this test has, in particular child.xml_attributes.get('upstream') is not None, do not actually test anything worthwhile. The value of upstream on these children is "", which is not None, but also, it may as well be None.

Seeing that this test method is part of TestLibraryContentRender, I believe that this test method was added back when the rendering of the legacy library content block would trigger the migration. This is no longer the case, but the test never failed when we switched to user-triggered migration (oops). Anyway, it's obsolete, so we remove it.
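
To spell out why the removed assertion tested nothing, here is a tiny standalone Python illustration. The helper `has_upstream` is hypothetical, not something in the codebase; it only shows what a meaningful check could look like.

```python
from __future__ import annotations

# The value the removed test observed for "upstream" on each child.
upstream = ""

# The removed assertion: it passes, even though no upstream is set,
# because an empty string is a different object from None.
assert upstream is not None


def has_upstream(value: str | None) -> bool:
    """Hypothetical check that treats both None and "" as 'no upstream'."""
    return bool(value)
```

A truthiness check like `has_upstream` would have failed for these children, which is the behavior the old assertion only appeared to verify.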

Comment on lines -201 to +206
- def get_tools(self, to_read_library_content: bool = False) -> 'LegacyLibraryToolsService':
+ def get_tools(self, to_read_library_content: bool = False) -> LegacyLibraryToolsService:

this is unrelated, but it was always showing up in Pylance and driving me crazy

Comment on lines 130 to 135
and is_successfully_migrated(self.source_library_key, source_version=self.source_library_version)
and forward_legacy_library(self.source_library_key)

Before:

  • the LCB reference is ready to be updated if there's a successful migration of the library, started at its current version.

Now:

  • the LCB reference is ready to be updated if there's a successful forwarding-enabled (i.e. authoritative) migration of the library, regardless of the source library's current version.
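
The difference between the two rules can be sketched in a few lines of Python. This is a simplified model, not the real implementation: `MigrationRecord`, `ready_before`, and `ready_now` are hypothetical names, and versions are plain strings here purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationRecord:
    """Hypothetical, simplified view of one migration of a legacy library."""
    source_version: str
    succeeded: bool
    forwarded: bool  # True if this migration is the authoritative one


def ready_before(migrations: list[MigrationRecord], current_version: str) -> bool:
    # Old rule: any successful migration started at the current version.
    return any(
        m.succeeded and m.source_version == current_version for m in migrations
    )


def ready_now(migrations: list[MigrationRecord]) -> bool:
    # New rule: a successful, forwarding-enabled migration, whatever the version.
    return any(m.succeeded and m.forwarded for m in migrations)
```

For example, a library migrated with forwarding at "v1" and then edited to "v2" is not ready under the old rule but is ready under the new one, while a successful non-forwarded migration at the current version is the reverse.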

# appears when it is published
child.upstream_version = 0
children = self.get_children()
child_migrations = migrator_api.forward_blocks([child.usage_key for child in children])

We now forward using mappings from the forwarded migration rather than any old mappings.
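
A rough sketch of that idea, under stated assumptions: `Migration`, `block_map`, and `forward_blocks_sketch` are hypothetical names, keys are plain strings, and this is not the signature or behavior of the real `migrator_api.forward_blocks`. It only shows mappings being taken from the single forwarded migration instead of from whichever migration happened to map a block.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Migration:
    """Hypothetical migration record with its source-to-target block mapping."""
    succeeded: bool
    forwarded: bool
    block_map: dict[str, str] = field(default_factory=dict)


def forward_blocks_sketch(
    migrations: list[Migration], usage_keys: list[str]
) -> dict[str, str | None]:
    # Pick the authoritative migration: successful and forwarding-enabled.
    authoritative = next(
        (m for m in migrations if m.succeeded and m.forwarded), None
    )
    mapping = authoritative.block_map if authoritative else {}
    # Blocks that the authoritative migration did not map resolve to None,
    # rather than falling back to some other migration's mapping.
    return {key: mapping.get(key) for key in usage_keys}
```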

@kdmccormick kdmccormick changed the title fix(modulestore_migrator): No post-migration edits; No spurious sync; Better Python API fix: Various Edge-Case Bugs and Data Issues in modulestore_migrator Dec 15, 2025
@kdmccormick kdmccormick changed the title fix: Various Edge-Case Bugs and Data Issues in modulestore_migrator fix: Various Edge-Case Bugs & Data Issues in modulestore_migrator Dec 15, 2025
@kdmccormick kdmccormick requested a review from ormsbee December 15, 2025 18:52
@kdmccormick kdmccormick force-pushed the kdmccormick/migration-api branch from 3193a6f to 64da92e Compare December 18, 2025 23:15
self.source_library_id
and self.source_library_version
and is_successfully_migrated(self.source_library_key, source_version=self.source_library_version)
and is_forwarded(self.source_library_key)

@bradenmacdonald sorry, I lost track of your comment about keeping an is_successfully_migrated API function in order to make the API more obvious. I think that's a good idea.

I'm trying to keep "forwarded" and "migrated" distinct (the former being the authoritative thing), so I've gone with is_forwarded.

I'm planning to merge later tonight. Let me know if you'd like to see anything different (or I can follow up with a fix on master).

@kdmccormick kdmccormick enabled auto-merge (squash) December 18, 2025 23:28
@kdmccormick kdmccormick disabled auto-merge December 18, 2025 23:28
@kdmccormick kdmccormick enabled auto-merge (squash) December 18, 2025 23:28
@kdmccormick kdmccormick merged commit 91e521e into master Dec 18, 2025
68 checks passed
@kdmccormick kdmccormick deleted the kdmccormick/migration-api branch December 18, 2025 23:49
@kdmccormick

@ormsbee Looks like the ulmo cherry-pick has some conflicts. I'll resolve, backport, and test tomorrow AM

kdmccormick added a commit to kdmccormick/edx-platform that referenced this pull request Dec 19, 2025
For legacy library_content references in courses, this PR:
- **Removes the spurious sync after updating a reference to a migrated
  library**, so that users don't need to "update" their content _after_
  updating their reference, _unless_ there were real content edits that
  happened since they last synced. We do this by correctly associating a
  DraftChangeLogRecord with the ModulestoreBlockSource migration artifact,
  and then comparing that version information before offering a sync.
  (related issue:
  openedx/frontend-app-authoring#2626).
- **Prompts users to update a reference to a migrated library with higher
  priority than prompting them to sync legacy content updates for that
  reference**, so that users don't end up needing to accept legacy content
  updates in order to get to a point where they can update to V2 content.
- **Ensures the library references in courses always follow the correct
  migration,** as defined by the `forwarded` fields in the data model,
  which are populated based on the REST API spec and the stated product UI
  requirements.

For the migration itself, this PR:

- **Allows non-admins to migrate libraries**, fixing:
  openedx#37774
- **When triggered via the UI, ensures the migration uses nice title-based
  target slugs instead of ugly source-hash-based slugs.** We've had this as an
  option for a long time, but preserve_url_slugs defaulted to True instead of
  False in the REST API serializer, so we weren't taking advantage of it.
- **Unifies logic between single-source and bulk migration**. These were
  implemented as two separate code paths, with drift in their implementations.
  In particular, the collection update-vs-create-new logic was completely
  different for single-source vs. bulk.
- **When using the Skip or Update strategies for repeats, it consistently
  follows mappings established by the latest successful migration** rather than
  following mappings across arbitrary previous migrations.
- **We log unexpected exceptions more often**, although there is so much more
  room for improvement here.
- **Adds more validation to the REST API** so that client mistakes more often
  become 400s with validation messages rather than 500s.

For developers, this PR:
- Adds unit tests to the REST API
- Ensures that all migration business logic now goes through a general-purpose
  Python API.
- Ensures that the data model (specifically `forwarded`, and
  `change_log_record`) is now populated and respected.
- Adds more type annotations.
kdmccormick added a commit to kdmccormick/edx-platform that referenced this pull request Dec 19, 2025
kdmccormick added a commit to kdmccormick/edx-platform that referenced this pull request Dec 19, 2025
Backports: 91e521e
mraman-2U pushed a commit to mraman-2U/edx-platform that referenced this pull request Dec 24, 2025
Labels

create-sandbox open-craft-grove should create a sandbox environment from this PR

6 participants