@mbertrand mbertrand commented Jun 13, 2024

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/4567

Description (What does it do?)

Uses a different method of comparing tarfiles that appears to distinguish changed archives from identical ones more reliably.
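
The description does not spell out the comparison method, so the following is only a sketch of one common approach, not necessarily what this PR implements: hash the names and contents of the files inside the archive instead of the raw archive bytes, so that differences in gzip or tar timestamps alone do not register as a change. The names `tar_content_checksum` and `build_archive` are hypothetical and exist only for this illustration.

```python
import gzip
import hashlib
import io
import tarfile


def tar_content_checksum(archive_bytes: bytes) -> str:
    """Hash member names and file contents, ignoring timestamps and ordering.

    Hypothetical helper for illustration; not taken from the PR.
    """
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        # Sort by name so member order inside the archive does not matter.
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            digest.update(member.name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(str(member.size).encode("ascii"))
            digest.update(b"\0")
            digest.update(tar.extractfile(member).read())
    return digest.hexdigest()


def build_archive(files: dict[str, bytes], mtime: int) -> bytes:
    """Build a small in-memory .tar.gz for the demo (hypothetical helper)."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=mtime) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                info.mtime = mtime
                tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Same contents, different timestamps: raw bytes differ, content checksum matches.
day_one = build_archive({"course/course.xml": b"<course/>"}, mtime=1)
day_two = build_archive({"course/course.xml": b"<course/>"}, mtime=2)
print(day_one == day_two)
print(tar_content_checksum(day_one) == tar_content_checksum(day_two))
```

A checksum over the raw `.tar.gz` bytes would treat `day_one` and `day_two` as different, which is the kind of false positive a content-based comparison avoids.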

How can this be tested?

  • Set .env variables for AWS_* and MITX_ONLINE_* using RC values
  • Run ./manage.py backpopulate_mitxonline_data if you haven't done so before.
  • In a Django shell (./manage.py shell), run:

from learning_resources.models import LearningResourceRun
from learning_resources.etl.edx_shared import sync_edx_course_files

run = LearningResourceRun.objects.get(run_id="course-v1:MITxT+8.01.4x+1T2024")
sync_edx_course_files(
    "mitxonline", 
    [run.learning_resource.id], 
    ["20231213/courses/course-v1:MITxT+8.01.4x+1T2024.tar.gz"]
)

This should take a little while, and a number of contentfiles should be imported. At one point you will see the error "Could not read verticals from path /tmp/course-v1:MITxT+8j2tzjuox/course"; that is expected and can be ignored.

Now run sync_edx_course_files again, changing the last argument to the next day's archive:

sync_edx_course_files(
    "mitxonline", 
    [run.learning_resource.id], 
    ["20231214/courses/course-v1:MITxT+8.01.4x+1T2024.tar.gz"]
)

It should finish much more quickly, and you should see the log message "Checksums match for 20231214/courses/course-v1:MITxT+8.01.4x+1T2024.tar.gz, skipping".

Run again, with a more recent archive that has changes:

sync_edx_course_files(
    "mitxonline", 
    [run.learning_resource.id], 
    ["20240613/courses/course-v1:MITxT+8.01.4x+1T2024.tar.gz"]
)

It should take a little while again, and this time you should not see the "Checksums match" log message.

@mbertrand mbertrand changed the title from "Mb/tarfile checksum" to "Contentfile archive comparison fix" Jun 13, 2024
@mbertrand mbertrand force-pushed the mb/tarfile_checksum branch from 350f3b0 to eec5d55 June 13, 2024 17:46
@mbertrand mbertrand added the "Needs Review" label (an open Pull Request that is ready for review) Jun 13, 2024
@jkachel jkachel requested review from abeglova and jkachel and removed request for abeglova June 17, 2024 15:28
@jkachel jkachel self-assigned this Jun 17, 2024

@jkachel jkachel left a comment

LGTM 👍

@mbertrand mbertrand merged commit 330738a into main Jun 17, 2024
@odlbot odlbot mentioned this pull request Jun 18, 2024
13 tasks
@rhysyngsun rhysyngsun deleted the mb/tarfile_checksum branch February 7, 2025 20:30