Skip to content

Conversation

@mbertrand
Copy link
Member

@mbertrand mbertrand commented Oct 12, 2023

What are the relevant tickets?

Closes #132

Description (What does it do?)

  • Sets readable_id for OCW courses to be <course_id>+<semester>_<year>
  • Sets etl_source value for OCW courses
  • Adds a migration to update above values for existing OCW courses

How can this be tested?

  • Start with the main branch and try running the following (copy OCW_* values from RC to your .env file first):
    ./manage.py backpopulate_ocw_data  --course-name 21g-104-chinese-iv-regular-spring-2004
    ./manage.py backpopulate_ocw_data  --course-name 21g-104-chinese-iv-regular-spring-2006
    ./manage.py backpopulate_ocw_data  --course-name 21g-104-chinese-iv-regular-spring-2018
    
  • You will end up with one LearningResource that has 3 runs. This is actually not what is desired for OCW courses, because each run of a course has its own separate site url.
  • Switch to this branch and restart containers. A new migration should convert the readable_id for the LearningResource above to include the semester and year of one of the 3 runs, and the other 2 runs should be deleted.
  • Run the 3 backpopulate commands again. When finished, there should be 3 LearningResource objects with 1 run each and readable_ids of 21G.104+spring_2004, 21G.104+spring_2006, and 21G.104+spring_2018

@mbertrand mbertrand changed the title Fix readable_id and etl_source for OCW courses Fix readable_id and run for OCW courses Oct 12, 2023
@codecov
Copy link

codecov bot commented Oct 12, 2023

Codecov Report

Merging #133 (3ea379c) into main (228fe0e) will increase coverage by 9.41%.
Report is 2 commits behind head on main.
The diff coverage is 100.00%.

❗ Current head 3ea379c differs from pull request most recent head e2a5d06. Consider uploading reports for the commit e2a5d06 to get more accurate results

@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
+ Coverage   79.45%   88.86%   +9.41%     
==========================================
  Files         334      201     -133     
  Lines       14522     9581    -4941     
  Branches     2318     1437     -881     
==========================================
- Hits        11538     8514    -3024     
+ Misses       2708      891    -1817     
+ Partials      276      176     -100     
Files Coverage Δ
learning_resources/admin.py 100.00% <100.00%> (ø)
learning_resources/etl/ocw.py 91.73% <100.00%> (+0.13%) ⬆️

... and 134 files with indirect coverage changes

@mbertrand mbertrand changed the title Fix readable_id and run for OCW courses Update readable_id and run for OCW courses Oct 12, 2023
@mbertrand mbertrand added Needs Review An open Pull Request that is ready for review and removed Work in Progress labels Oct 12, 2023
@ChristopherChudzicki ChristopherChudzicki self-assigned this Oct 13, 2023
Copy link
Contributor

@ChristopherChudzicki ChristopherChudzicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense and is working as described 👍

I do wonder whether it might be better to have separate "primary" and "extra" course number fields. If primary vs extra is a meaningful distinction (I suspect it is), it seems better to not rely on the ordering of an array to signify that distinction.

If we do decide to keep at the course numbers in a single array, maybe course_numbers (rather than extra_ would be a better name for the field.

@ChristopherChudzicki ChristopherChudzicki added Waiting on author and removed Needs Review An open Pull Request that is ready for review labels Oct 13, 2023
@mbertrand mbertrand added Needs Review An open Pull Request that is ready for review and removed Waiting on author labels Oct 13, 2023
Copy link
Contributor

@ChristopherChudzicki ChristopherChudzicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbertrand What I describe below might have happened this morning, and I just missed it. I'm unsure.

When I looked at this this afternoon, I:

  1. unapplied migration 0020
  2. Deletes all my courses with LearningResource.objects.all().filter(resource_type="course").delete()
  3. Checked out main, and started following instructions.

After running

./manage.py backpopulate_ocw_data  --course-name 21g-104-chinese-iv-regular-spring-2004
./manage.py backpopulate_ocw_data  --course-name 21g-104-chinese-iv-regular-spring-2006
./manage.py backpopulate_ocw_data  --course-name 21g-104-chinese-iv-regular-spring-2018

then switching to this branch and migrating, the course I had left was:

{
    "id": 2395,
    "topics": [
        {
            "id": 24,
            "name": "Humanities"
        },
        {
            "id": 25,
            "name": "Language"
        },
        {
            "id": 26,
            "name": "Chinese"
        }
    ],
    "offered_by": "OCW",
    "resource_content_tags": [
        "Presentation Assignments"
    ],
    "departments": [
        {
            "department_id": "21G",
            "name": "Global Studies and Languages"
        }
    ],
    "certification": null,
    "prices": [],
    "course": {
        "extra_course_numbers": []
    },
    "learning_path": null,
    "podcast": null,
    "podcast_episode": null,
    "runs": [
        {
            "id": 61,
            "instructors": [
                {
                    "id": 3,
                    "first_name": "Tong",
                    "last_name": "Chen",
                    "full_name": "Tong Chen"
                }
            ],
            "image": {
                "id": 1241,
                "url": "https://ocw.mit.edu/courses/21g-104-chinese-iv-regular-spring-2004/7c6c7552f7d4b485e14a8341634d1db4_21g-104s04.jpg",
                "description": "Image of a Chinese garden. Courtesy of the Library of Congress",
                "alt": "A black and white photo of a stream with a bridge, and a pagoda."
            },
            "run_id": "5ac72131a1b756a6e1e9f45ea7c30be9",
            "title": "Chinese IV (Regular)",
            "description": "This is the continuing instruction in spoken and written Chinese, with particular emphasis on consolidating basic conversational skills and improving reading confidence and depth.\n\nUpon completion of the course, students should be able to speak Chinese with some fluency on basic conversational topics, achieve a basic level of reading competence within simplified and traditional characters learned plus common compounds, and be able to write short compositions.\n",
            "full_description": null,
            "last_modified": "2023-10-09T14:08:05Z",
            "published": false,
            "languages": null,
            "url": "https://ocw.mit.edu/courses/21g-104-chinese-iv-regular-spring-2004",
            "level": "Undergraduate",
            "slug": "courses/21g-104-chinese-iv-regular-spring-2004",
            "availability": "Current",
            "semester": "Spring",
            "year": 2004,
            "start_date": null,
            "end_date": null,
            "enrollment_start": null,
            "enrollment_end": null,
            "prices": null,
            "checksum": null
        }
    ],
    "image": {
        "id": 1242,
        "url": "https://ocw.mit.edu/courses/21g-104-chinese-iv-regular-spring-2018/25e56cde2e0039019d36f4fe161b288b_21g-104s18.jpg",
        "description": "",
        "alt": "A picture at a Chinese temple with a lot of red lanterns with a blue sky background."
    },
    "learning_path_parents": [],
    "user_list_parents": [],
    "program": null,
    "readable_id": "21G.104+spring_2004",
    "title": "Chinese IV (Regular)",
    "description": "Together with [21G.103 Chinese III](/courses/21g-103-chinese-iii-regular-fall-2018/), this course forms the intermediate level of what constitutes a four-term foundation in Mandarin. Upon completion of Chinese III and IV, students should be able to speak Chinese with fluency on everyday topics, reach a literacy level of 750 characters (approximately 1200 common words written in both traditional and simplified characters), read materials written in simple standard written Chinese, and produce both orally and in writing short compositions on everyday topics. Throughout the course we will address issues of how cultural differences inform and are informed by different linguistic contexts and practices.\n",
    "full_description": null,
    "last_modified": "2023-10-09T14:18:15Z",
    "published": true,
    "languages": null,
    "url": "https://ocw.mit.edu/courses/21g-104-chinese-iv-regular-spring-2018",
    "resource_type": "course",
    "professional": false,
    "platform": "ocw"
}

Issue: Compare the readable_id and url fields: one references 2004, the other references 2018.

Note: If we I delete the course and run backpopulate again, everything looks good. So it looks like just a migration issue.

Question: Do we need this migration? We could just run backpopulate with an overwrite option, right?

@mbertrand
Copy link
Member Author

@ChristopherChudzicki I repeated the steps and could not reproduce that issue - each course was linked to the correct run. Did you restart containers after switching branches? Celery tasks will keep running on whatever branch the container started on.

Without the migration, we will end up with an extra, obsolete dupe resource (21G.104, 21G.104+spring_2004, 21G.104+spring_2006, 21G.104+spring_2018). And would have to reimport all the OCW courses (could delete all then reimport to avoid the dupe issue), but that takes awhile, the migration is intended as a shortcut.

@ChristopherChudzicki
Copy link
Contributor

@mbertrand I may not have. I'll check again.

Copy link
Contributor

@ChristopherChudzicki ChristopherChudzicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbertrand Picking the run based on LearningResource.url fixed the migration for me 👍.

@mbertrand mbertrand merged commit 4166fda into main Oct 16, 2023
@mbertrand mbertrand deleted the mb/ocw_readable_id branch January 16, 2024 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs Review An open Pull Request that is ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OCW resource readable_id needs to include more than course number

3 participants