shanbady
Contributor

@shanbady shanbady commented Sep 9, 2025

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/8436

Description (What does it do?)

This PR makes it so:

  • published HTML/wiki content and assignment pages are included when ingesting a Canvas course.
  • the contents of the tutorbot folder are explicitly filtered out, even if they are somehow listed in files_meta.xml (out of an abundance of caution).
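
For illustration, the tutorbot filter described above could look roughly like the sketch below. It is not the code from this PR: the helper names are hypothetical, and the folder location is taken from the web_resources/ai/tutor path mentioned under Additional Context.

```python
from pathlib import PurePosixPath

# Assumed location of the tutorbot folder inside the course export.
TUTORBOT_PREFIX = PurePosixPath("web_resources/ai/tutor")


def is_tutorbot_path(source_path: str) -> bool:
    """Return True if the path is the tutorbot folder or anything beneath it."""
    path = PurePosixPath(source_path.lstrip("/"))
    return path == TUTORBOT_PREFIX or TUTORBOT_PREFIX in path.parents


def filter_manifest_paths(paths):
    """Drop tutorbot files from a list of manifest paths, keeping order."""
    return [p for p in paths if not is_tutorbot_path(p)]
```

Comparing whole path components rather than using a plain string prefix keeps a sibling such as web_resources/ai/tutorial.html from being filtered by accident.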

How can this be tested?

  1. check out main
  2. restart celery
  3. ingest a canvas course: python manage.py backpopulate_canvas_courses --canvas-ids 34062 --overwrite
  4. view the list of files that were ingested: LearningResource.objects.filter(readable_id__istartswith="34062").first().runs.first().content_files.all().values_list("source_path", flat=True)
  5. note that it is empty for this particular course.
  6. do the same for another course with published files:
    python manage.py backpopulate_canvas_courses --canvas-ids 14566 --overwrite
    LearningResource.objects.filter(readable_id__istartswith="14566").first().runs.first().content_files.all().values_list("source_path", flat=True)
  7. note the files and file types
  8. check out this branch
  9. restart celery
  10. re-run the backpopulate commands for each course
  11. note that published HTML content exists for both courses
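
To make the before/after comparison in these steps easier to eyeball, a small helper along the following lines might help. It is only a sketch: the function names are hypothetical, and it assumes the source_path values from the queries above have been copied into plain Python lists on each branch.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_file_types(source_paths):
    """Count ingested files by lowercase extension ('' when there is none)."""
    return Counter(PurePosixPath(p).suffix.lower() for p in source_paths)


def newly_ingested(before, after):
    """Return paths present after the change but not before, sorted."""
    return sorted(set(after) - set(before))
```

In a Django shell, `before` would be `list(...)` of the values_list query run on main and `after` the same query run on this branch; with the change applied, `.html` entries would be expected to show up in the summary.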

Additional Context

  • This PR also includes an extra check to filter out web_resources/ai/tutor in case it inadvertently ends up in the files manifest.
  • Files that have an assignment_settings.xml also get an extra check for whether they are visible only to instructors.
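
As a rough sketch of what an instructor-only check on assignment_settings.xml might look like: the conversation does not show which element the PR inspects, so the element name is passed in as a parameter and the one in the sample is a made-up stand-in. The namespace URI is also an assumption and should be checked against a real export.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for Canvas export XML; verify against a real export.
NAMESPACES = {"cccv1p0": "http://canvas.instructure.com/xsd/cccv1p0"}

# Invented sample document; the flag element name is hypothetical.
SAMPLE = """<assignment xmlns="http://canvas.instructure.com/xsd/cccv1p0">
  <title>Essay 1</title>
  <hypothetical_instructor_only>true</hypothetical_instructor_only>
</assignment>"""


def is_instructor_only(settings_xml: str, flag_tag: str) -> bool:
    """Return True when the element named flag_tag holds the text 'true'."""
    root = ET.fromstring(settings_xml)
    element = root.find(f"cccv1p0:{flag_tag}", NAMESPACES)
    return element is not None and (element.text or "").strip().lower() == "true"
```

A missing element is treated as "not instructor-only" here; whether that default is the right one for ingestion is a design choice this sketch does not settle.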

@shanbady shanbady added the Needs Review label Sep 9, 2025
@shanbady shanbady marked this pull request as ready for review September 9, 2025 23:51
@mbertrand mbertrand self-assigned this Sep 10, 2025
Member

@mbertrand mbertrand left a comment

LGTM 👍

  item_keys = ["course_id", "root_account_id", "canvas_domain", "root_account_name"]
  for key in item_keys:
-     element = root.find(f"ns:{key}", namespaces)
+     element = root.find(f"cccv1p0:{key}", NAMESPACES)
Member

Just curious why only cccv1p0 is used here (and elsewhere in this code) and not imscp, lom, and lomimscc?
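
For context on the question above (the thread does not record an answer), the sketch below shows how prefixed lookups behave with ElementTree: a lookup only matches elements whose namespace is the one bound to the prefix used. The sample document and both namespace URIs are assumptions made for illustration, and `read_items` is a hypothetical helper, not code from this PR.

```python
import xml.etree.ElementTree as ET

# Assumed prefix-to-URI map; only the prefixes needed for the demo are listed.
NAMESPACES = {
    "cccv1p0": "http://canvas.instructure.com/xsd/cccv1p0",
    "imscp": "http://www.imsglobal.org/xsd/imsccv1p1/imscp_v1p1",
}

# Invented sample whose elements all live in the cccv1p0 namespace.
SAMPLE = """<context_info xmlns="http://canvas.instructure.com/xsd/cccv1p0">
  <course_id>34062</course_id>
  <canvas_domain>canvas.example.edu</canvas_domain>
</context_info>"""


def read_items(xml_text, item_keys, prefix):
    """Collect the text of each direct child found under the given prefix."""
    root = ET.fromstring(xml_text)
    found = {}
    for key in item_keys:
        element = root.find(f"{prefix}:{key}", NAMESPACES)
        if element is not None:
            found[key] = element.text
    return found
```

With this sample, looking up the keys under the cccv1p0 prefix finds both elements, while the same keys under imscp find nothing, since no element in the document belongs to that namespace.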

@mbertrand mbertrand added the Waiting on author label and removed the Needs Review label Sep 10, 2025
@shanbady shanbady merged commit d281ead into main Sep 10, 2025
13 checks passed
@shanbady shanbady deleted the shanbady/ingest-canvas-wiki-content branch September 10, 2025 13:43
@odlbot odlbot mentioned this pull request Sep 10, 2025
9 tasks