Contentfile archive comparison fix #1078
Merged
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/4567
Description (What does it do?)
Uses a different method of comparing tarfiles, one that seems more accurate at distinguishing changed archives from identical ones.
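The description does not spell out the new comparison method. As a hedged illustration only: one common way to make identical archives compare equal is to hash each regular member's name and contents rather than the raw .tar.gz bytes, because tar headers and the gzip header both embed timestamps that change on every export. The function name archive_content_checksum and the choice of SHA-256 are assumptions for this sketch, not the code merged in #1078.

```python
import hashlib
import tarfile


def archive_content_checksum(path: str) -> str:
    """Hash member names and file contents of a tar archive, ignoring metadata.

    Hypothetical helper (not the PR's implementation). Member mtimes, owners,
    permissions, and the gzip header timestamp are left out, so two archives
    built from the same files at different times give the same checksum.
    Directories and other non-regular members are skipped entirely.
    """
    digest = hashlib.sha256()
    with tarfile.open(path, "r:*") as archive:
        # Sort by name so the checksum does not depend on the order in
        # which files were added to the archive.
        members = sorted(archive.getmembers(), key=lambda m: m.name)
        for member in members:
            if not member.isfile():
                continue
            # Include name and size with separators so that adjacent
            # name/content pairs cannot run together and collide.
            digest.update(f"{member.name}\0{member.size}\0".encode("utf-8"))
            with archive.extractfile(member) as fileobj:
                for chunk in iter(lambda: fileobj.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

With this approach, a daily export whose files are unchanged yields the same checksum even though the archive file itself differs byte for byte, while any edit to a file's contents or name changes the checksum.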
How can this be tested?
1. Set your AWS_* and MITX_ONLINE_* values using RC values.
2. Run ./manage.py backpopulate_mitxonline_data if you haven't done so before.

This should take a little while, and a number of contentfiles should be imported. At one point you will see the error "Could not read verticals from path /tmp/course-v1:MITxT+8j2tzjuox/course"; that is expected.
Now run sync_edx_course_files again, changing the last argument value to the next day's archive. It should finish much more quickly, and you should see the log message "Checksums match for 20231214/courses/course-v1:MITxT+8.01.4x+1T2024.tar.gz, skipping".
Run it again with a more recent archive that contains changes. This run should again take a little while, and the "Checksums match" log message above should not appear.