fix: make studio reindex more robust, provide better logging [FC-0117]#38498
Thanks for the pull request, @bradenmacdonald! Once you've gone through the following steps, feel free to tag the maintainers in a comment and let them know that your changes are ready for engineering review.
🔘 Get product approval
If you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide context
To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green build
If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.
Details
Where can I find more information?
If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:
When can I expect my changes to be merged?
Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR.
👍
I have a small nit, but overall the PR looks good to me.
✅ I tested this:
without the code:
2026-05-04 06:53:26,341 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] tasks.py:206 - Starting incremental Studio search index population...
2026-05-04 06:53:26,343 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:584 - Counting libraries...
2026-05-04 06:53:26,348 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:596 - Counting courses...
2026-05-04 06:53:26,349 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:605 - Found 2 courses, 0 libraries.
2026-05-04 06:53:26,349 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:616 - Indexing libraries...
2026-05-04 06:53:26,350 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:732 - Indexing courses...
2026-05-04 06:53:26,350 WARNING 198 [py.warnings] [user None] [ip None] warnings.py:112 - /openedx/edx-platform/openedx/core/djangoapps/content/search/api.py:735: UnorderedObjectListWarning: Pagination may yield inconsistent results with an unordered object_list: <class 'openedx.core.djangoapps.content.course_overviews.models.CourseOverview'> QuerySet.
paginator = Paginator(CourseOverview.objects.only("id", "display_name"), 1000)
2026-05-04 06:53:26,351 INFO 198 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:738 - 1/2. Now indexing course Open edX Demo Course (course-v1:non+existent+course)
2026-05-04 06:53:26,370 ERROR 198 [celery_utils.logged_task] [user None] [ip None] logged_task.py:48 - [0e9f35f9-939a-4b7e-ba56-ca63d963582c] failed due to Traceback (most recent call last):
File "/openedx/venv/lib/python3.12/site-packages/celery/app/trace.py", line 585, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/openedx/venv/lib/python3.12/site-packages/celery/app/autoretry.py", line 38, in run
return task._orig_run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openedx/venv/lib/python3.12/site-packages/edx_django_utils/monitoring/internal/code_owner/utils.py", line 195, in new_function
return wrapped_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/tasks.py", line 209, in rebuild_index_incremental
api.rebuild_index(status_cb=log.info, incremental=True)
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 744, in rebuild_index
course_docs = index_course(course.id, index_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 564, in index_course
_recurse_children(course, add_with_children)
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 300, in _recurse_children
if block.has_children:
^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'has_children'
2026-05-04 06:53:26,370 ERROR 198 [celery.app.trace] [user None] [ip None] trace.py:309 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[0e9f35f9-939a-4b7e-ba56-ca63d963582c] raised unexpected: AttributeError("'NoneType' object has no attribute 'has_children'")
Traceback (most recent call last):
File "/openedx/venv/lib/python3.12/site-packages/celery/app/trace.py", line 585, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/openedx/venv/lib/python3.12/site-packages/celery/app/autoretry.py", line 38, in run
return task._orig_run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openedx/venv/lib/python3.12/site-packages/edx_django_utils/monitoring/internal/code_owner/utils.py", line 195, in new_function
return wrapped_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/tasks.py", line 209, in rebuild_index_incremental
api.rebuild_index(status_cb=log.info, incremental=True)
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 744, in rebuild_index
course_docs = index_course(course.id, index_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 564, in index_course
_recurse_children(course, add_with_children)
File "/openedx/edx-platform/openedx/core/djangoapps/content/search/api.py", line 300, in _recurse_children
if block.has_children:
^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'has_children'
2026-05-04 06:53:26,371 INFO 198 [celery_utils.logged_task] [user None] [ip None] logged_task.py:25 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[0e9f35f9-939a-4b7e-ba56-ca63d963582c] submitted with arguments (), {}
Indexing complete!
using the PR:
2026-05-04 07:49:26,385 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] tasks.py:206 - Starting incremental Studio search index population...
2026-05-04 07:49:26,388 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:595 - Counting libraries...
2026-05-04 07:49:26,394 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:609 - Counting courses...
2026-05-04 07:49:26,400 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:618 - Found 2 courses, 0 libraries.
2026-05-04 07:49:26,400 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:629 - Indexing libraries...
2026-05-04 07:49:26,400 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:747 - Indexing courses...
2026-05-04 07:49:26,402 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:753 - 1/2. Now indexing course Open edX Demo Course (course-v1:non+existent+course)
2026-05-04 07:49:26,413 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:564 - Error: course course-v1:non+existent+course does not seem to exist! It may have been incompletely deleted.
2026-05-04 07:49:26,418 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:753 - 2/2. Now indexing course Open edX Demo Course (course-v1:OpenedX+DemoX+DemoCourse)
2026-05-04 07:49:27,911 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] api.py:766 - Done! 412 blocks indexed across 2 courses, collections and libraries.
2026-05-04 07:49:27,912 INFO 270 [openedx.core.djangoapps.content.search.tasks] [user None] [ip None] tasks.py:219 - Incremental Studio search index population complete.
2026-05-04 07:49:27,930 INFO 270 [celery.app.trace] [user None] [ip None] trace.py:128 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[6c0968db-8e24-4a6f-81db-09b140d88c7b] succeeded in 1.5453137090080418s: None
2026-05-04 07:49:27,930 INFO 270 [celery_utils.logged_task] [user None] [ip None] logged_task.py:25 - Task openedx.core.djangoapps.content.search.tasks.rebuild_index_incremental[6c0968db-8e24-4a6f-81db-09b140d88c7b] submitted with arguments (), {}
Indexing complete!
✅ I read through the code
❌ I checked for accessibility issues
❌ Includes documentation
    course_key: CourseKey,
    index_name: str | None = None,
    status_cb: Callable[[str], None] | None = None,
) -> list:
Nit: can we improve this while we're at it? Instead of list, we could use list[dict]. It's not in scope; I'm just aiming for some extra cleanup. A TypedDict for the fields would be best, but we can at least make a small improvement. What do you say?
Sure, I'll change it to list[dict]. Normally I'd agree about using TypedDict, but in this particular case our code isn't really using the return value other than checking its length, so I don't think it's worth spending too much time on.
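For readers following the annotation discussion, here is a minimal sketch of the two options. Only the parameter names and the list[dict] return type come from this thread; the function body, the SearchDoc fields, and the sample values are hypothetical, and CourseKey is replaced by str so the sketch has no dependencies.

```python
from __future__ import annotations

from typing import Callable, TypedDict


class SearchDoc(TypedDict):
    """Hypothetical document shape; the real index fields are not shown in this thread."""

    id: str
    display_name: str


def index_course(
    course_key: str,  # stand-in for CourseKey (assumption, keeps the sketch dependency-free)
    index_name: str | None = None,
    status_cb: Callable[[str], None] | None = None,
) -> list[dict]:
    """Return the documents that would be indexed for one course."""
    # A SearchDoc is a plain dict at runtime, so it satisfies list[dict].
    doc: SearchDoc = {"id": f"{course_key}:block-1", "display_name": "Example block"}
    if status_cb is not None:
        status_cb(f"Indexed 1 block for {course_key}")
    return [doc]
```

Because the caller described here only checks the length of the result, list[dict] already documents the intent; switching the return type to list[SearchDoc] would only pay off if callers started reading individual fields.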
@bradenmacdonald, this is unrelated, but do you think
@farhaanbukhsh Yeah, my thinking is more that the function should never be called with
Description
Part of #36868; also fixes a bug reported on the forum.
This PR:
Fixes the reindex_studio management command so that it will gracefully log and continue when it tries to index a deleted course, rather than crashing.
The list of courses to index comes from CourseOverview, but if there is a CourseOverview entry that no longer exists in modulestore, the whole reindex process would crash.
Improves the logging of reindex_studio:
Logging before: (note the numbers are quite confusing - are there 34 or 11 or 4 different things to index? Several levels of progress are all mixed together)
Logging after: (the overall progress is always shown at the beginning, like 1/34, and sub-tasks state their progress differently.)
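The "log and continue" behaviour described above can be sketched roughly as follows. This is not the code from the PR: the dictionaries standing in for CourseOverview and modulestore, and the simplified function signatures, are assumptions made for illustration. The message wording mirrors the log lines quoted in the review.

```python
from typing import Callable


def index_course(
    course_key: str,
    store: dict[str, list[str]],  # hypothetical stand-in for modulestore
    status_cb: Callable[[str], None],
) -> list[dict]:
    """Return one document per block, or an empty list if the course is missing."""
    blocks = store.get(course_key)
    if blocks is None:
        # Report the problem and return instead of raising, so the caller can continue.
        status_cb(
            f"Error: course {course_key} does not seem to exist! "
            "It may have been incompletely deleted."
        )
        return []
    return [{"id": block_id} for block_id in blocks]


def rebuild_index(
    overviews: dict[str, str],  # hypothetical stand-in for CourseOverview: key -> display name
    store: dict[str, list[str]],
    status_cb: Callable[[str], None] = print,
) -> int:
    """Index every listed course, reporting overall progress first (for example 1/2)."""
    total = len(overviews)
    num_blocks = 0
    for i, (course_key, name) in enumerate(overviews.items(), start=1):
        status_cb(f"{i}/{total}. Now indexing course {name} ({course_key})")
        num_blocks += len(index_course(course_key, store, status_cb))
    status_cb(f"Done! {num_blocks} blocks indexed across {total} courses.")
    return num_blocks
```

Returning an empty list for a missing course keeps the caller's loop simple: it only needs the length of the result, which matches how the return value is used according to the review thread.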
Testing instructions
tutor dev exec cms bash and ./manage.py cms shell and use the following commands to create an invalid CourseOverview entry:
./manage.py cms reindex_studio. It should fail with the error reported on the forum (AttributeError: 'NoneType' object has no attribute 'has_children')
./manage.py cms reindex_studio again. Note the confusing numbering, like no libraries to index even if you have some.
CourseOverview entry that we created in step 2 so it doesn't cause any more trouble. 🐒
Deadline
No particular deadline.