ingest canvas html content #2502

shanbady · 2025-09-09T18:07:21Z

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/8436

Description (What does it do?)

This PR makes it so that:

we include published HTML/wiki pages and assignment pages when ingesting a Canvas course;
we explicitly filter out the contents of the tutorbot folder, even if it somehow appears in files_meta.xml (out of an abundance of caution).
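The tutorbot exclusion amounts to a path filter applied after the list of candidate files is built. Below is a minimal sketch. It assumes that source paths are relative POSIX-style paths and that the folder lives at web_resources/ai/tutor, as stated under Additional Context. is_tutor_path and filter_ingestable are hypothetical names, not the functions in learning_resources/etl/canvas.py.

```python
from pathlib import PurePosixPath

# Assumed location of the tutorbot assets inside a Canvas export (from the PR text).
TUTOR_PREFIX = PurePosixPath("web_resources/ai/tutor")


def is_tutor_path(source_path: str) -> bool:
    """Return True if source_path is the tutor folder itself or anything inside it."""
    path = PurePosixPath(source_path)
    # Compare whole path components, not raw string prefixes.
    return path == TUTOR_PREFIX or TUTOR_PREFIX in path.parents


def filter_ingestable(paths):
    """Drop tutorbot files even if the files manifest happens to list them."""
    return [p for p in paths if not is_tutor_path(p)]
```

Comparing path components rather than string prefixes keeps a sibling such as web_resources/ai/tutorial.html from being dropped by accident.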

How can this be tested?

Check out main.
Restart Celery.
Ingest a Canvas course: python manage.py backpopulate_canvas_courses --canvas-ids 34062 --overwrite
View the list of files that were ingested: LearningResource.objects.filter(readable_id__istartswith="34062").first().runs.first().content_files.all().values_list("source_path", flat=True)
Note that the list is empty for this particular course.
Do the same for another course that has published files:
python manage.py backpopulate_canvas_courses --canvas-ids 14566 --overwrite

LearningResource.objects.filter(readable_id__istartswith="14566").first().runs.first().content_files.all().values_list("source_path", flat=True)

Note the files and filetypes.
Check out this branch.
Restart Celery.
Re-run the backpopulate commands for each course.
Note that published HTML content now exists for both courses.
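When comparing the files and filetypes before and after switching branches, it can help to tally the source_path values by extension. The following is a small sketch. summarize_filetypes is a hypothetical helper. In the Django shell, the values_list(...) result from the queries above can be passed to it directly.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_filetypes(source_paths):
    """Count ingested files by lower-cased extension ('' means no extension)."""
    return Counter(PurePosixPath(p).suffix.lower() for p in source_paths)
```

Running it on both branches makes the newly ingested HTML pages show up as a change in the .html count.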

Additional Context

This PR also adds an extra check that filters out web_resources/ai/tutor, in case it ever ends up in the files manifest inadvertently.
Items that include an assignment_settings.xml get an additional check for whether they are visible only to instructors.
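Here is a sketch of the kind of workflow-state check described for assignment_settings.xml. It assumes three things: the file declares the Canvas cccv1p0 namespace (http://canvas.instructure.com/xsd/cccv1p0), it carries a workflow_state child, and a published assignment has the literal value published. The exact conditions the PR checks, including instructor-only visibility, are not reproduced here. is_student_visible is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace map; the diff in this PR refers to a "cccv1p0" prefix.
NAMESPACES = {"cccv1p0": "http://canvas.instructure.com/xsd/cccv1p0"}


def is_student_visible(settings_xml: str) -> bool:
    """Return True only if the assignment's workflow_state is 'published'.

    Assumption: <workflow_state> is a direct child of the root element and is in
    the cccv1p0 namespace. A missing or empty element counts as not visible.
    """
    root = ET.fromstring(settings_xml)
    state = root.findtext("cccv1p0:workflow_state", default="", namespaces=NAMESPACES)
    return state.strip() == "published"
```

Treating a missing element as not visible errs on the side of excluding content, which matches the cautious intent of the PR.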

mbertrand

LGTM 👍

mbertrand · 2025-09-10T13:36:00Z

learning_resources/etl/canvas.py

    item_keys = ["course_id", "root_account_id", "canvas_domain", "root_account_name"]
    for key in item_keys:
-        element = root.find(f"ns:{key}", namespaces)
+        element = root.find(f"cccv1p0:{key}", NAMESPACES)


Just curious: why is only cccv1p0 used here (and elsewhere in this code), and not imscp, lom, and lomimscc?
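Some background on the reviewer's question: with ElementTree, the prefix in a find() path is only a local alias that the namespaces map resolves to a URI. A lookup therefore matches an element only if that element actually lives in the mapped namespace. The sketch assumes the Canvas settings file being read declares http://canvas.instructure.com/xsd/cccv1p0 as its default namespace. Presumably the imscp, lom, and lomimscc entries would matter only for documents, such as an IMS manifest, whose elements use those namespaces. read_course_keys is a hypothetical helper built around the loop from the diff.

```python
import xml.etree.ElementTree as ET

# Assumed namespace map; only the cccv1p0 entry is exercised in this sketch.
NAMESPACES = {"cccv1p0": "http://canvas.instructure.com/xsd/cccv1p0"}
ITEM_KEYS = ["course_id", "root_account_id", "canvas_domain", "root_account_name"]


def read_course_keys(xml_text: str) -> dict:
    """Collect the text of each ITEM_KEYS child that is in the cccv1p0 namespace."""
    root = ET.fromstring(xml_text)
    values = {}
    for key in ITEM_KEYS:
        # The "cccv1p0:" prefix is resolved through NAMESPACES to the full URI.
        element = root.find(f"cccv1p0:{key}", NAMESPACES)
        if element is not None:
            values[key] = element.text
    return values
```

An element without that namespace, such as <course_id> in a document with no default namespace, is not matched. That is why the prefix has to correspond to the namespace the file really uses.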

shanbady added 8 commits September 9, 2025 08:52

initial working version of gathering assignments and pages · 6e0815f

test fix · 20064af

check metadata for visibility to students vs authors · 3ada4b0

type hints · aca1bc5

remove typehint · 09c2dda

consolidation of namespaces · d2f277e

explicitely exclude tutor folder no matter what and added tests · 9f128d4

fix test · f65ce82

shanbady added the Needs Review An open Pull Request that is ready for review label Sep 9, 2025

shanbady marked this pull request as ready for review September 9, 2025 23:51

more conditionals for assignment workflow state · 48b7027

mbertrand self-assigned this Sep 10, 2025

mbertrand approved these changes Sep 10, 2025


mbertrand added Waiting on author and removed Needs Review An open Pull Request that is ready for review labels Sep 10, 2025

shanbady merged commit d281ead into main Sep 10, 2025
13 checks passed

shanbady deleted the shanbady/ingest-canvas-wiki-content branch September 10, 2025 13:43

odlbot mentioned this pull request Sep 10, 2025

Release 0.42.3 #2504

Merged

9 tasks
