Refactor course number fields for Course model, serializer, search index #138

@mbertrand

Description/Context

Initially, the LearningResource.readable_id field was thought to be a suitable place to store the primary course number for all courses. However, this did not account for multiple OCW courses sharing the same primary course number. The readable_id values for OCW therefore had to be modified, but the primary course number still needs to be stored somewhere for search purposes.

In addition, filtering by course number and department requires the serializers and indexed documents to be structured in a way that supports it.

Plan/Design

Model changes

Option A:

Replace the Course.extra_course_numbers ArrayField with a Course.course_numbers JSONField. The format of the data stored here could look something like this:

[
    {
        "value": "8.370",
        "listing_type": "canonical",
        "department": "Philosophy"
    },
    {
        "value": "2.111",
        "listing_type": "cross-listed",
        "department": "Chemistry"
    },
    {
        "value": "MITx.LASER+R01",
        "listing_type": "cross-listed",
        "department": null
    },
    {
        "value": "18.435",
        "listing_type": "cross-listed",
        "department": "Social Science"
    }
]
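As a rough illustration of how consumers might read the Option A structure, the sketch below pulls the primary and cross-listed numbers out of such a list. The helper names (primary_course_number, cross_listed_numbers) are hypothetical and not part of the codebase; the sample data is copied from the example above.

```python
from typing import Any, Dict, List, Optional

# Sample value for the proposed Course.course_numbers JSONField (Option A),
# copied from the example in this issue.
course_numbers: List[Dict[str, Any]] = [
    {"value": "8.370", "listing_type": "canonical", "department": "Philosophy"},
    {"value": "2.111", "listing_type": "cross-listed", "department": "Chemistry"},
    {"value": "MITx.LASER+R01", "listing_type": "cross-listed", "department": None},
    {"value": "18.435", "listing_type": "cross-listed", "department": "Social Science"},
]


def primary_course_number(numbers: List[Dict[str, Any]]) -> Optional[str]:
    """Return the value of the first canonical entry, or None if there is none."""
    for entry in numbers:
        if entry.get("listing_type") == "canonical":
            return entry["value"]
    return None


def cross_listed_numbers(numbers: List[Dict[str, Any]]) -> List[str]:
    """Return the values of all cross-listed entries, preserving their order."""
    return [e["value"] for e in numbers if e.get("listing_type") == "cross-listed"]
```

With this shape, the primary number is found by listing_type instead of by position, so the order of entries in the JSON does not matter.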

The department info above is somewhat redundant: OCW course numbers begin with the department ID, and the "departments" field in the API serializer will provide both the department ID and name. It could therefore be dropped.

Option B:

Leave the current extra_course_numbers field as is and add a primary_course_number field. This seems simpler, and perhaps more intuitive to API users, if both fields are included unchanged in the serializer. The department, if any, could still be deduced from the course numbers.
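A minimal sketch of how the department might be deduced under Option B, assuming the department ID is whatever precedes the first "." in a course number. The DEPARTMENT_NAMES mapping and all function names here are hypothetical stand-ins for whatever lookup the project actually uses; only the "21H" to "History" pairing comes from the sample data in this issue.

```python
from typing import Dict, List, Optional

# Illustrative subset only; a hypothetical stand-in for the real department lookup.
DEPARTMENT_NAMES: Dict[str, str] = {"21H": "History"}


def department_id(course_number: str) -> Optional[str]:
    """Return the prefix before the first '.', or None when there is no '.'."""
    prefix, sep, _ = course_number.partition(".")
    return prefix if sep else None


def department_name(course_number: str) -> Optional[str]:
    """Return the department name for a course number, or None if unknown."""
    dept_id = department_id(course_number)
    return DEPARTMENT_NAMES.get(dept_id) if dept_id is not None else None


def all_course_numbers(primary: str, extras: List[str]) -> List[str]:
    """Combine primary_course_number and extra_course_numbers, primary first."""
    return [primary] + [n for n in extras if n != primary]
```

Numbers such as "MITx.LASER+R01" have a prefix that is not a known department, so department_name returns None for them, which matches the null department in the Option A example.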

Serializer changes

The API CourseSerializer will need to be changed to reflect the model changes above.

Search index changes

The search index will require a department_course_numbers field as described here in order to search and sort by all course numbers with or without a department filter applied. This field currently looks like this:

"department_course_numbers": [
    {
        "coursenum": "21H.983J",
        "department": "History",
        "primary": true,
        "sort_coursenum": "21H.983J"
    },
    {
        "coursenum": "21H.109J,WGS.310J,WGS.303J",
        "department": "History",
        "primary": false,
        "sort_coursenum": "21H.109J,WGS.310J,WGS.303J"
    }
],

This is not a field we want to include in the API serializer, nor in the response data sent back in search results, yet it must be included in the data sent to populate the OpenSearch index.

There are a couple of ways this could be done:

  • Subclass LearningResourceSerializer (as OSLearningResourceSerializer) to add the above field when appropriate (for resources of type course), and use this subclass to populate the search index.
  • Provide a single root function that:
    • serializes the resource using the same serializer as the APIs,
    • generates an object that represents the extra properties to merge into the original, and
    • does a deep merge of those two objects.
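The second approach could be sketched roughly as follows. This is not the project's implementation: a plain dict stands in for the API serializer output, and the names deep_merge, extra_index_properties and serialize_for_index are hypothetical. It also assumes a "resource_type" key, a nested "course" object, and Option A style course_numbers, and it simply copies coursenum into sort_coursenum as in the sample above.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], extra: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `extra` recursively merged into `base`.

    Nested dicts are merged key by key; for any other value, `extra` wins.
    Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def extra_index_properties(api_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build index-only properties; empty for anything that is not a course."""
    if api_data.get("resource_type") != "course":  # assumed field name
        return {}
    course = api_data.get("course") or {}
    numbers = [
        {
            "coursenum": n["value"],
            "department": n.get("department"),
            "primary": n.get("listing_type") == "canonical",
            "sort_coursenum": n["value"],  # assumption: same as coursenum
        }
        for n in course.get("course_numbers", [])
    ]
    return {"course": {"department_course_numbers": numbers}}


def serialize_for_index(api_data: Dict[str, Any]) -> Dict[str, Any]:
    """Root function: API representation plus index-only extras, deep-merged."""
    return deep_merge(api_data, extra_index_properties(api_data))
```

Because deep_merge copies its input, the API representation itself never gains the index-only field, which keeps the two outputs from drifting into each other.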

Finally, to remove this field from the search results, an exclude pattern for the _source field can be added:

{
    "_source": {
        "includes": [ "*" ],
        "excludes": [ "*.course.department_course_numbers" ]
    },
    "query" : {
        "term" : { "department" : "Physics" }
    }
}
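One way to apply the same exclusion to every search request is a small wrapper around the query body, sketched below. The function and constant names are hypothetical; the include and exclude patterns are copied unchanged from the example above, and the sketch only builds the request body without contacting a cluster.

```python
from typing import Any, Dict

# Patterns copied from the example in this issue.
SOURCE_FILTER: Dict[str, Any] = {
    "includes": ["*"],
    "excludes": ["*.course.department_course_numbers"],
}


def with_source_excludes(query_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the search body with the _source filter applied.

    The caller's dict is left untouched; any existing _source key is replaced.
    """
    body = dict(query_body)
    body["_source"] = {key: list(patterns) for key, patterns in SOURCE_FILTER.items()}
    return body
```

For example, with_source_excludes({"query": {"term": {"department": "Physics"}}}) produces the same body as the request shown above.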

Labels: enhancement