@mbertrand
Member

@mbertrand mbertrand commented Sep 9, 2024

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/5389

Description (What does it do?)

Adds an ETL pipeline for ingesting MIT edX programs from an API endpoint

Screenshots (if appropriate):

Screenshot 2024-09-09 150424

How can this be tested?

  • Set the following settings in your backend env (same values as RC if not specified here):
    EDX_API_ACCESS_TOKEN_URL=https://api.edx.org/oauth2/v1/access_token
    EDX_API_CLIENT_ID=
    EDX_API_CLIENT_SECRET=
    EDX_API_URL=https://api.edx.org/catalog/v1/catalogs/10/courses
    EDX_PROGRAMS_API_URL=https://discovery.edx.org/api/v1/programs/
    
  • Run ./manage.py backpopulate_edx_data
  • Go to http://open.odl.local:8062/search/?offered_by=mitx&resource_category=program&platform=edx&sortby=new
  • The 3 new programs at the top should be "Future Energy Systems", "Computational Thinking using Python", and "Circuits and Electronics"
  • The price shown for each program should equal the amount shown on the program's URL page, before any discount. It is unclear where the discount comes from; it does not appear to be present in the API output.
  • Each should have topics and instructors based on the child courses.
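
To illustrate the last check, here is a minimal sketch of how a program's topics or instructors could be derived from its child courses. It is not the code from this PR: the helper name `collect_from_courses`, the `courses` key, and the assumption that topics and instructors are plain strings are all hypothetical.

```python
def collect_from_courses(program: dict, key: str) -> list:
    """Return the unique values found under `key` across all child courses.

    Hypothetical helper: assumes each program has a "courses" list and that
    each course stores hashable values (e.g. strings) under `key`.
    First-seen order is preserved so the output is deterministic.
    """
    seen = set()
    result = []
    for course in program.get("courses", []):
        for item in course.get(key, []):
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


example_program = {
    "title": "Example Program",
    "courses": [
        {"topics": ["Energy", "Engineering"], "instructors": ["A. Smith"]},
        {"topics": ["Engineering", "Physics"], "instructors": ["A. Smith", "B. Jones"]},
    ],
}

program_topics = collect_from_courses(example_program, "topics")
program_instructors = collect_from_courses(example_program, "instructors")
```

Deduplicating while keeping first-seen order avoids repeated topics when several child courses share a subject.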

Additional Context

The programs API endpoint also returns micromasters programs, which are currently ignored in favor of the data from the micromasters API endpoint.
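
A rough sketch of the kind of filter this implies is shown below. The `type` field and its `"MicroMasters"` value are assumptions about the API payload, and the function names are hypothetical; the actual check in this PR may differ.

```python
def is_micromasters(program: dict) -> bool:
    """Return True if the program looks like a MicroMasters program.

    Assumption: the payload carries a "type" string; this is illustrative only.
    """
    return (program.get("type") or "").strip().lower() == "micromasters"


def exclude_micromasters(programs: list) -> list:
    """Drop MicroMasters programs so they can come from their own endpoint."""
    return [program for program in programs if not is_micromasters(program)]


sample_programs = [
    {"title": "Program A", "type": "XSeries"},
    {"title": "Program B", "type": "MicroMasters"},
    {"title": "Program C", "type": None},
]

kept = exclude_micromasters(sample_programs)
```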

@mbertrand mbertrand added the Needs Review label (an open Pull Request that is ready for review) Sep 9, 2024
Contributor

@rhysyngsun rhysyngsun left a comment

Code review

Comment on lines 21 to 27
Helper function to determine if a course is an MIT course
Args:
course (dict): The JSON object representing the course with all its course runs
Returns:
bool: indicates whether the course is owned by MIT

Docstring needs to be updated from copy/paste.

# use the OpenEdx factory to create our extract and transform funcs
extract, _transform = openedx_extract_transform_factory(get_open_edx_config)

# modified transform function that filters the course list to ones that pass the _is_mit_course() predicate # noqa: E501

Update comment referencing wrong function

Comment on lines 375 to 381
Transform a course run into the normalized data structure
Args:
config (OpenEdxConfiguration): configuration for the openedx backend
Returns:
dict: the tranformed course run data

Copy/paste typos

return dates


def _add_course_prices(program):

I'd suggest changing the name of this function since _add_course_prices sounds like it'd add the course prices to the program object being passed in. Replacing "add" with "sum" is probably good enough.

programs = pipelines.mit_edx_programs_etl(api_datafile)
clear_search_cache()
return len(courses)
return len(courses + programs)

Nit: it's probably a bit better to take len(courses) + len(programs) so that you're not generating an entirely new list in memory just to take the len() of it.
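
A small sketch of the difference, using made-up stand-in lists rather than the real pipeline output:

```python
# Stand-ins for the lists returned by the course and program pipelines.
courses = [{"id": 1}, {"id": 2}]
programs = [{"id": 10}]

# Concatenating builds a temporary list holding every element, only to count it.
total_via_concat = len(courses + programs)

# Adding the two lengths gives the same count without allocating a new list.
total_via_sum = len(courses) + len(programs)
```

Both expressions give the same number; the second simply skips the intermediate list, which matters more as the lists grow.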

@mbertrand
Member Author

@rhysyngsun ready for another look

Contributor

@rhysyngsun rhysyngsun left a comment

LGTM

@mbertrand mbertrand merged commit 61c6441 into main Sep 11, 2024
@odlbot odlbot mentioned this pull request Sep 12, 2024
17 tasks
@rhysyngsun rhysyngsun deleted the mb/mit_edx_programs branch February 7, 2025 20:35