-
Notifications
You must be signed in to change notification settings - Fork 3
Professional Education ETL Pipeline #1210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works great! Back to you with a request for some variable name changes and a question
learning_resources/etl/loaders.py
Outdated
| **{unique_field: unique_value}, | ||
| platform=platform, | ||
| resource_type=resource_type, | ||
| published=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused by this logic. Why is the delete necessary? And does that mean that courses that are imported as unpublished, such as mitpe courses with no future runs in the logic below are deleted and recreated every run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code should be run only when the readable_id is not the unique_field value, and in that case if there may be multiple unpublished resources with the same value for unique_field that need to be removed to avoid database duplicate errors.
Good catch on the mitpe courses with no future runs though, those probably should just be skipped altogether instead of (re)created as unpublished. Not much point in importing unpublished resources.
learning_resources/etl/mitpe.py
Outdated
| img_metadata = get_related_object(img_url) | ||
| if img_metadata: | ||
| field_image = img_metadata["relationships"]["field_media_image"] | ||
| img_url_2 = field_image["links"]["related"]["href"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be called image_data_url or something like that?
learning_resources/etl/mitpe.py
Outdated
| Returns: | ||
| list of dict: list containing topic dicts with a name attribute | ||
| """ | ||
| topic_data = resource_data["relationships"]["field_course_topics"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be
topics_exist = resource_data["relationships"]["field_course_topics"]["data"]
instead of reusing the variable name topic_data to mean different things
|
@abeglova Peter and I are going to have a meeting at some point with the Professional Education API developer to discuss possible enhancements like clearly delineating programs vs courses. So I'm going to label this as "Blocked" until after that meeting. |
️✅ There are no secrets present in this pull request anymore.If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request. |
d48bc1a to
9a5299a
Compare
580fcc4 to
f3495f4
Compare
45537cf to
ddcb055
Compare
023533a to
41770bb
Compare
41770bb to
c66c795
Compare
c66c795 to
52421c4
Compare
2fff71a to
ab55004
Compare
f9cb43c to
70d6e5e
Compare
70d6e5e to
a24d970
Compare
abeglova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
work great one minor fix required
learning_resources/etl/mitpe.py
Outdated
|
|
||
| log = logging.getLogger(__name__) | ||
|
|
||
| BASE_URL = "https://professional.mit.edu/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicates settings.MITPE_BASE_API_URL. You should get rid of either the constant or the setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, at first the value was different while the API was in development (professional-dev.mit.edu) but now they're the same and that's unlikely to ever change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a settings.MITPE_BASE_URL, going to use that here instead and remove both the above constant BASE_URL as well as settings.MITPE_BASE_API_URL
7c420db to
a1927f1
Compare
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/4731
Related
ol-infrastructurePR: mitodl/ol-infrastructure#2536Description (What does it do?)
Screenshots (if appropriate):
How can this be tested?
Set the following values in your backend .env file:
Run
./manage.py backpopulate_prolearn_dataWhen it is done, there should be roughly >130 learning resources with
etl_source=prolearnandoffered_by=mitpe. They will all be courses and not programs. http://api.open.odl.local:8063/admin/learning_resources/learningresource/?o=2&offered_by__code__exact=mitpeIf you don't already have a populated featured learning path for Professional Education, run
./manage.py populate_featured_liststo create oneSwitch to this branch.
Set
MITPE_API_ENABLED=Truein your backend .env file, restart containers.Run
./manage.py backpopulate_mitpe_datafollowed by./manage.py backpopulate_prolearn_data. When it is done, most/all the same courses returned above should now haveetl_source=mitpeand have newreadable_idvalues. Some that were courses in prolearn should be unpublished, and created as new programs instead of courses. There should not be any published dupes from different sources (prolearn and mitpe). Some will be unpublished now due to changed criteria (past enrollment end date or past end date).If you go to
http://open.odl.local:8062/learningpathsand open the Professional Education featured learning path, it will be empty. Run the following to reassign the old featured courses from prolearn to their new equivalents./manage.py update_index --all