
fix: make studio reindex more robust, provide better logging [FC-0117]#38498

Merged
bradenmacdonald merged 2 commits into openedx:master from open-craft:braden/fix-reindex-bugs
May 5, 2026

Conversation

@bradenmacdonald
Contributor

@bradenmacdonald bradenmacdonald commented Apr 30, 2026

Description

Part of #36868; also fixes a bug reported on the forum.

This PR:

  1. Improves the reindex_studio management command so that it will gracefully log and continue when it tries to index a deleted course, rather than crashing.
    • The list of course IDs is retrieved from CourseOverview, but if there is a CourseOverview entry that no longer exists in modulestore, the whole reindex process would crash.
  2. Addresses a warning seen when running reindex_studio:
    UnorderedObjectListWarning: Pagination may yield inconsistent results with an unordered object_list:
    <class 'openedx.core.djangoapps.content.course_overviews.models.CourseOverview'> QuerySet.
    
  3. Improves the logging when resuming an interrupted reindex. In such cases, it would say something like "Found 28 courses, 0 libraries." even when you have more courses or libraries than that, which was confusing. It now explicitly says "Resuming incremental index - NNN courses/libraries already indexed."
  4. Fixes a bug where a library could be marked as indexed before its containers were fully indexed.
  5. Improves formatting and verbosity of the logging.

Logging before: (note the numbers are quite confusing - are there 34 or 11 or 4 different things to index? Several levels of progress are all mixed together)

1/34. Now indexing blocks in library lib:OpenCraftX:NEWTEST
0/11. Now indexing collections in library lib:OpenCraftX:NEWTEST
11/11 collections indexed for library lib:OpenCraftX:NEWTEST
0/4. Now indexing containers in library lib:OpenCraftX:NEWTEST
4/4 containers indexed for library lib:OpenCraftX:NEWTEST
2/34. Now indexing blocks in library lib:BradenX:552
0/0. Now indexing collections in library lib:BradenX:552
0/0 collections indexed for library lib:BradenX:552
0/14. Now indexing containers in library lib:BradenX:552
14/14 containers indexed for library lib:BradenX:552
3/34. Now indexing blocks in library lib:OpenCraftX:RO
0/0. Now indexing collections in library lib:OpenCraftX:RO
0/0 collections indexed for library lib:OpenCraftX:RO
0/0. Now indexing containers in library lib:OpenCraftX:RO
0/0 containers indexed for library lib:OpenCraftX:RO
4/34. Now indexing blocks in library lib:OpenCraftX:BNTL

Logging after: (the overall progress is always shown at the beginning, like 1/34, and sub-tasks state their progress differently.)

1/34. Now indexing blocks in library lib:OpenCraftX:NEWTEST
Now indexing 11 collections in library lib:OpenCraftX:NEWTEST
Indexed 11/11 collections in library lib:OpenCraftX:NEWTEST
Now indexing 4 containers in library lib:OpenCraftX:NEWTEST
Indexed 4/4 containers in library lib:OpenCraftX:NEWTEST
2/34. Now indexing blocks in library lib:BradenX:552
Indexed 0/0 collections in library lib:BradenX:552
Now indexing 14 containers in library lib:BradenX:552
Indexed 14/14 containers in library lib:BradenX:552
3/34. Now indexing blocks in library lib:OpenCraftX:RO
Indexed 0/0 collections in library lib:OpenCraftX:RO
Indexed 0/0 containers in library lib:OpenCraftX:RO
4/34. Now indexing blocks in library lib:OpenCraftX:BNTL
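
The "after" format above could be produced by helpers along these lines. This is only a sketch inferred from the sample output: `overall_progress` and `subtask_messages` are hypothetical names, not functions from the PR, and the sketch assumes every item in a sub-task is indexed successfully.

```python
def overall_progress(n: int, total: int, lib_key: str) -> str:
    """Top-level progress line; only this level uses the leading N/TOTAL prefix."""
    return f"{n}/{total}. Now indexing blocks in library {lib_key}"


def subtask_messages(kind: str, count: int, lib_key: str) -> list[str]:
    """Messages for one sub-task, where kind is e.g. "collections" or "containers"."""
    messages = []
    if count:
        # The sample output omits the "Now indexing" line when there is nothing to index.
        messages.append(f"Now indexing {count} {kind} in library {lib_key}")
    messages.append(f"Indexed {count}/{count} {kind} in library {lib_key}")
    return messages
```

Keeping the N/TOTAL prefix for the top level only is what makes the overall progress easy to pick out from the sub-task lines.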

Testing instructions

  1. First, don't check out this branch yet.
  2. Open tutor dev exec cms bash and ./manage.py cms shell and use the following commands to create an invalid CourseOverview entry:
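    from opaque_keys.edx.locator import CourseLocator
    from openedx.core.djangoapps.content.course_overviews.models import CourseOverview
    co = CourseOverview.objects.all()[10]  # Any existing entry works; use a smaller index if you have fewer courses.
    co.id = CourseLocator('non', 'existent', 'course', None, None)
    co.save()  # Saves a copy without affecting the original.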
    
  3. Run ./manage.py cms reindex_studio. It should fail with the error reported on the forum (AttributeError: 'NoneType' object has no attribute 'has_children').
  4. Run ./manage.py cms reindex_studio again. Note the confusing numbering, such as reporting no libraries to index even if you have some.
  5. Check out this PR and repeat steps 3-4. Confirm they're fixed.
  6. Delete that invalid CourseOverview entry that we created in step 2 so it doesn't cause any more trouble. 🐒

Deadline

No particular deadline.

@bradenmacdonald bradenmacdonald added the FC Relates to an Axim Funded Contribution project label Apr 30, 2026
@openedx-webhooks openedx-webhooks added open-source-contribution PR author is not from Axim or 2U core contributor PR author is a Core Contributor (who may or may not have write access to this repo). labels Apr 30, 2026
@openedx-webhooks

Thanks for the pull request, @bradenmacdonald!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.


@github-project-automation github-project-automation Bot moved this to Needs Triage in Contributions Apr 30, 2026
@farhaanbukhsh farhaanbukhsh self-requested a review May 2, 2026 09:24
Member

@farhaanbukhsh farhaanbukhsh left a comment

👍

I have a small nit, but overall the PR looks good to me.

✅ I tested this:

without the code:

2026-05-04 06:53:26,341 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] tasks.py:206 - Starting incremental Studio search index population...
2026-05-04 06:53:26,343 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:584 - Counting libraries...
2026-05-04 06:53:26,348 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:596 - Counting courses...
2026-05-04 06:53:26,349 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:605 - Found 2 courses, 0 libraries.
2026-05-04 06:53:26,349 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:616 - Indexing libraries...
2026-05-04 06:53:26,350 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:732 - Indexing courses...
2026-05-04 06:53:26,350 WARNING 198 [py.warnings] [user None] [ip None] warnings.py:112 - /openedx/edx-platform/openedx/core/djangoapps/content/search/api.py:735: UnorderedObjectListWarning: Pagination may yield inconsistent results with an unordered object_list: <class 'openedx.core.djangoapps.content.course_overviews.models.CourseOverview'> QuerySet.
  paginator = Paginator(CourseOverview.objects.only("id", "display_name"), 1000)

2026-05-04 06:53:26,351 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:738 - 1/2. Now indexing course Open edX Demo Course (course-v1:non+existent+course)
2026-05-04 06:53:26,370 ERROR 198 [celery_utils.logged_task] [user None] [ip None] logged_task.py:48 - [0e9f35f9-939a-4b7e-ba56-ca63d963582c] failed due to Traceback (most recent call last):
  File "/openedx/venv/lib/python3.12/site-packages/celery/app/trace.py", line 585, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/openedx/venv/lib/python3.12/site-packages/celery/app/autoretry.py", line 38, in run
    return task._orig_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openedx/venv/lib/python3.12/site-packages/edx_django_utils/monitoring/internal/code_owner/utils.py", line 195, in new_function
    return wrapped_function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/tasks.py", line 209, in rebuild_index_incremental
    api.rebuild_index(status_cb=log.info, incremental=True)
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 744, in rebuild_index
    course_docs = index_course(course.id, index_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 564, in index_course
    _recurse_children(course, add_with_children)
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 300, in _recurse_children
    if block.has_children:
       ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'has_children'

2026-05-04 06:53:26,370 ERROR 198 [celery.app.trace] [user None] [ip None] trace.py:309 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[0e9f35f9-939a-4b7e-ba56-ca63d963582c] raised unexpected: AttributeError("'NoneType' object has no attribute 'has_children'")
Traceback (most recent call last):
  File "/openedx/venv/lib/python3.12/site-packages/celery/app/trace.py", line 585, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/openedx/venv/lib/python3.12/site-packages/celery/app/autoretry.py", line 38, in run
    return task._orig_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openedx/venv/lib/python3.12/site-packages/edx_django_utils/monitoring/internal/code_owner/utils.py", line 195, in new_function
    return wrapped_function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/tasks.py", line 209, in rebuild_index_incremental
    api.rebuild_index(status_cb=log.info, incremental=True)
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 744, in rebuild_index
    course_docs = index_course(course.id, index_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 564, in index_course
    _recurse_children(course, add_with_children)
  File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 300, in _recurse_children
    if block.has_children:
       ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'has_children'
2026-05-04 06:53:26,371 INFO 198 [celery_utils.logged_task] [user None] [ip None] logged_task.py:25 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[0e9f35f9-939a-4b7e-ba56-ca63d963582c] submitted with arguments (), {}
Indexing complete!

using the PR

2026-05-04 07:49:26,385 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] tasks.py:206 - Starting incremental Studio search index population...
2026-05-04 07:49:26,388 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:595 - Counting libraries...
2026-05-04 07:49:26,394 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:609 - Counting courses...
2026-05-04 07:49:26,400 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:618 - Found 2 courses, 0 libraries.
2026-05-04 07:49:26,400 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:629 - Indexing libraries...
2026-05-04 07:49:26,400 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:747 - Indexing courses...
2026-05-04 07:49:26,402 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:753 - 1/2. Now indexing course Open edX Demo Course (course-v1:non+existent+course)
2026-05-04 07:49:26,413 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:564 - Error: course course-v1:non+existent+course does not seem to exist! It may have been incompletely deleted.
2026-05-04 07:49:26,418 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:753 - 2/2. Now indexing course Open edX Demo Course (course-v1:OpenedX+DemoX+DemoCourse)
2026-05-04 07:49:27,911 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:766 - Done! 412 blocks indexed across 2 courses, collections and libraries.
2026-05-04 07:49:27,912 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] tasks.py:219 - Incremental Studio search index population complete.
2026-05-04 07:49:27,930 INFO 270 [celery.app.trace] [user None] [ip None] trace.py:128 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[6c0968db-8e24-4a6f-81db-09b140d88c7b] succeeded in 1.5453137090080418s: None
2026-05-04 07:49:27,930 INFO 270 [celery_utils.logged_task] [user None] [ip None] logged_task.py:25 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[6c0968db-8e24-4a6f-81db-09b140d88c7b] submitted with arguments (), {}
Indexing complete!

✅ I read through the code
❌ I checked for accessibility issues
❌ Includes documentation

course_key: CourseKey,
index_name: str | None = None,
status_cb: Callable[[str], None] | None = None,
) -> list:
Member

Nit: can we improve this while we are at it? Instead of list, we could use list[dict]. It's not strictly in scope; I'm just aiming for some extra cleanup. A TypedDict for the fields would be best, but even this small improvement helps. What do you say?

Contributor Author

Sure, I'll change it to list[dict]. Normally I'd agree about using TypedDict, but in this particular case our code isn't really using the return value other than checking its length, so I don't think it's worth spending too much time on.
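
For readers weighing the two options, here is a small sketch of the difference. The `BlockDoc` fields and both function names are hypothetical, chosen for illustration only; the real document fields are not shown in this thread.

```python
from typing import TypedDict


class BlockDoc(TypedDict):
    """Hypothetical document shape, for illustration only."""

    id: str
    display_name: str


def index_course_loose(names: list[str]) -> list[dict]:
    """list[dict]: tells callers they get dicts, but nothing about the keys."""
    return [{"id": f"block-{i}", "display_name": name} for i, name in enumerate(names)]


def index_course_strict(names: list[str]) -> list[BlockDoc]:
    """list[BlockDoc]: a type checker can also verify the keys and value types."""
    return [BlockDoc(id=f"block-{i}", display_name=name) for i, name in enumerate(names)]
```

At runtime both return plain dicts, so a caller that only checks `len()` of the result behaves the same with either annotation.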

@farhaanbukhsh
Member

@bradenmacdonald, this is unrelated, but do you think we should check that block is not None before accessing block.has_children? I'm not too adamant about it, since if this is an anomaly, the code should scream.

@bradenmacdonald
Contributor Author

@farhaanbukhsh Yeah, my thinking is more that the function should never be called with block=None in the first place, so I'm keeping the checks in the calling function.

@bradenmacdonald bradenmacdonald enabled auto-merge (squash) May 4, 2026 23:55
@bradenmacdonald bradenmacdonald merged commit 2fd5857 into openedx:master May 5, 2026
41 checks passed
@bradenmacdonald bradenmacdonald deleted the braden/fix-reindex-bugs branch May 5, 2026 00:13
@github-project-automation github-project-automation Bot moved this from Needs Triage to Done in Contributions May 5, 2026
