OEP-0026: Real-time Events (xAPI/Caliper) #73

Merged
merged 1 commit into master from arch/xapi-oep on Jan 19, 2019


@nasthagiri
Member

commented Jul 12, 2018

Open edX Proposal for Real-time Events, primarily for Adaptive Learning capabilities.

Review Period
November 29, 2018 -> December 20, 2018

Arbiter
@brianhw

Motivation
Emerging use cases require notifying external systems of LMS events in real time and in standardized formats. Supporting real-time events is a natural evolution of Open edX's eventing and API capabilities, and extends their impact on connecting users, organizations, and learning services.

@nasthagiri nasthagiri force-pushed the arch/xapi-oep branch from fc1fab0 to ff21a32 Jul 12, 2018

@jmaupetit

left a comment

Thanks for helping us bootstrap the design of this project. I've initiated two threads about the Open edX User UUID and the router layer.

oeps/oep-0026-arch-xapi-integration.rst

* User restriction - certain consumers can access all events for certain users.
* Site restriction - certain consumers are limited to accessing events of certain sites.
* Activity type restriction - certain consumers can access only certain types of events.
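The three restrictions listed in this snippet could be expressed as a small per-consumer policy check. The following is only a sketch under assumptions: `ConsumerPolicy`, `is_allowed`, and the event keys `user_id`, `site`, and `event_type` are hypothetical names chosen for illustration and do not come from the OEP.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ConsumerPolicy:
    """Hypothetical per-consumer restrictions; None means 'no restriction'."""
    allowed_users: Optional[Set[str]] = None
    allowed_sites: Optional[Set[str]] = None
    allowed_event_types: Optional[Set[str]] = None


def is_allowed(policy: ConsumerPolicy, event: dict) -> bool:
    """Return True only if the event passes every restriction in the policy."""
    checks = (
        (policy.allowed_users, event.get("user_id")),
        (policy.allowed_sites, event.get("site")),
        (policy.allowed_event_types, event.get("event_type")),
    )
    return all(allowed is None or value in allowed for allowed, value in checks)
```

Treating each restriction as an optional allow-list keeps the three checks independent, so a consumer can be limited by site only, by event type only, or by any combination.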

@jmaupetit

jmaupetit Jul 12, 2018

Do you think it must be a layer of the app? IMO it must be a separate component acting as a proxy to forward the right event to the right consumer with appropriate permissions.

@nasthagiri

nasthagiri Jul 13, 2018

Author Member

To be clear, when I use the term "layer" here, I simply mean a separable logical part of the plugin. I do not mean to imply a specific way of how it is implemented.

Would it be clearer if I use the term "component" instead of "layer"?

@jmaupetit

jmaupetit Jul 16, 2018

IMO yes. Thanks!

@nasthagiri

nasthagiri Jul 16, 2018

Author Member

done. thanks.

@nasthagiri nasthagiri requested a review from brianhw Jul 13, 2018

@jmaupetit

commented Jul 16, 2018

Thanks for adding the list of events @nasthagiri 🎉

@ormsbee

Member

commented Jul 16, 2018

Sorry I'm late to the party, but what were the performance concerns about having a new table with a UUID in it?

@jmaupetit

commented Jul 16, 2018

I might be wrong, but don't you think that this relationship could increase the cost of requests against the auth_user model?

@ormsbee

Member

commented Jul 16, 2018

@jmaupetit: I don't think it will really make that much of a difference. But since we'd expect the user:UUID association to never change, if it ever does become an issue, we can throw caching at it pretty easily.

@sampaccoud

commented Jul 16, 2018

Adding a new column to a huge table should not require downtime if it is done in several steps:

  • adding the column as optional (requires lock but very fast)
  • adding a default
  • running a data migration on the existing records
  • making the column required (requires lock but very fast)
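As a rough illustration of the steps above, here is a sketch using Python's built-in `sqlite3` module rather than MySQL, so locking behavior is not represented. The column name `uuid`, the batch size, and the `backfill` helper are assumptions made for the example. The "adding a default" step is skipped because a per-row UUID cannot be a constant default, and the final "required" step is left as a comment because it is engine-specific.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO auth_user (username) VALUES (?)",
    [(f"user{i}",) for i in range(10)],
)

# Step 1: add the column as optional (nullable), so existing rows stay valid.
conn.execute("ALTER TABLE auth_user ADD COLUMN uuid TEXT")


def backfill(conn, batch_size=3):
    """Data migration: fill existing rows in small batches to keep each transaction short."""
    while True:
        rows = conn.execute(
            "SELECT id FROM auth_user WHERE uuid IS NULL LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "UPDATE auth_user SET uuid = ? WHERE id = ?",
            [(str(uuid.uuid4()), row[0]) for row in rows],
        )
        conn.commit()


backfill(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM auth_user WHERE uuid IS NULL"
).fetchone()[0]
# Final step (making the column required) is engine-specific and omitted here;
# it is only safe once `remaining` is zero.
```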
@ormsbee

Member

commented Jul 16, 2018

I haven't been a participant in a DB-related outage RCA in a while, but my understanding is that we've suffered long locks on table migrations, even for adding an optional field, on the order of 1 min per million rows. @jibsheet or @feanil might have more recent edX operational perspective to offer.

(Updated comment because I couldn't remember what version of MySQL we were running at any given time, and the devops folks would know better than my fuzzy recollection.)

@ormsbee

Member

commented Jul 16, 2018

Migrations have been one of the leading causes of operational issues on edx.org. There are definitely ways to make MySQL add a column without downtime, but I think it's easier to just make a separate table and join it to auth_user.
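For comparison, the separate-table approach might look like the sketch below. It uses `sqlite3` for illustration only; the table name `user_external_id`, its columns, and the `get_or_create_uuid` helper are hypothetical and not part of any existing Open edX schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    -- Hypothetical side table; auth_user itself is never altered.
    CREATE TABLE user_external_id (
        user_id INTEGER PRIMARY KEY REFERENCES auth_user(id),
        external_uuid TEXT NOT NULL UNIQUE
    );
    INSERT INTO auth_user (username) VALUES ('alice'), ('bob');
    """
)


def get_or_create_uuid(conn, user_id):
    """Return the user's UUID, creating the side-table row on first use."""
    row = conn.execute(
        "SELECT external_uuid FROM user_external_id WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row:
        return row[0]
    new_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO user_external_id (user_id, external_uuid) VALUES (?, ?)",
        (user_id, new_id),
    )
    return new_id


alice_uuid = get_or_create_uuid(conn, 1)

# The join that replaces reading a column directly off auth_user.
joined = conn.execute(
    """
    SELECT u.username, x.external_uuid
    FROM auth_user AS u
    JOIN user_external_id AS x ON x.user_id = u.id
    """
).fetchall()
```

Because rows are created lazily, no backfill of the existing user table is needed; the cost is one extra join (or a cached lookup) whenever the UUID is read.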

@AnnaCampus

commented Jul 17, 2018

Hi, so sorry again for joining you in the middle of the discussion. I just learned from Roi today about the great progress that has been made here with the spec. I'll review the whole spec later on, but I want to echo @ormsbee's concern about adding a column vs. a new table. In my experience, adding a new column to an existing table can be tricky and may require a more complex upgrade process.

@nasthagiri

Member Author

commented Jul 17, 2018

@AnnaCampus Thank you for chiming in. I agree with you. We recently ran into unforeseen migration issues when we tried to replace an out-of-the-box OAuth Application class with our own custom class. Apparently, others on StackOverflow had faced similar issues. We eventually backed out of that approach and embraced an alternative data model change. The StackOverflow articles referred to similar issues when replacing Django's user model with a custom one (in a live production environment).

While I also agree with @sampaccoud and @jmaupetit on the merit of updating the user table, I believe there is a lot else to be done with this initiative. For example, we need to enhance the platform to expose good URNs for users, courses, and enrollments. We may also need to update existing Open edX events to include additional data that we're not already including.

Given this, the current proposal of sending a hash of the LMS user-id should be sufficient for our immediate needs. (Apparently, hash-of-user-id is also the value that we send with our edX data packages. @brianhw can correct me.) So we can look into customizing the user table at a later point. For now, let's focus our efforts on the very many other enhancements to our platform as required by this feature.

@sampaccoud

commented Jul 17, 2018

The concern with adding a new table is that we would make a product design decision based on operational issues.

It may make sense for edx.org (and a few others like us maybe), but it's hard to justify for the mass of all the other users of Open edX. Even if it does not impact performance, it complicates the design and the code, and will generate more technical debt.

So +1 for @nasthagiri's pragmatic approach.

@ormsbee

Member

commented Jul 17, 2018

I guess as long as we're sure we don't need to do reverse-lookups to map the anonymous ID back to a real user, the hash works for me too. Even more so if we specifically do not want the ability to do a reverse lookup, for privacy reasons.

@sampaccoud: FWIW, even from a code perspective, I would favor a separate model over tacking on an extra field to the user model. Anonymous user mapping, whether more generally or for xAPI in particular, is distinct enough where I think it belongs in its own space. User models can become a catch-all space for all kinds of random things, and I especially wouldn't want to tack it onto the standard Django contrib auth_user model, given it's owned by an entirely separate space. Granted, we could make a custom user model, point it to the same table via config, and do a little fudging to make that transition work smoothly -- but I think all that combines to more technical debt and confusion than just making a separate model and joining it. It's not as compact, but it's pretty simple.

@jmaupetit

commented Jul 18, 2018

@nasthagiri one last remark: given that the sha256 hash of the user ID is a "long-term temporary" solution (:trollface:), can we explicitly mention that this hash will at least be salted, and generated with bcrypt or similar?
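To make the salting request concrete, here is a minimal sketch. It swaps in HMAC-SHA256 from the Python standard library instead of bcrypt, since bcrypt is a third-party package and, when called with a freshly generated salt each time, would not yield the same identifier for the same user across events. The function name `anonymize_user_id` and the idea of a single deployment-wide secret salt are assumptions, not something the OEP specifies.

```python
import hashlib
import hmac


def anonymize_user_id(user_id: int, secret_salt: bytes) -> str:
    """Derive a stable, hard-to-reverse identifier from an LMS user id.

    The same (user_id, secret_salt) pair always yields the same value, so
    events from one learner can still be correlated by a consumer.
    """
    message = str(user_id).encode("utf-8")
    return hmac.new(secret_salt, message, hashlib.sha256).hexdigest()
```

Without the secret salt, anyone who knows that user ids are small sequential integers could hash every candidate id and rebuild the mapping, which is the concern behind asking for salting.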

+ `edx.problem.hint.demandhint_displayed <http://edx.readthedocs.io/projects/devdata/en/latest/internal_data_formats/tracking_logs/student_event_types.html#edx-problem-hint-demandhint-displayed>`_.
Whenever a learner requests a hint to a problem.

- **Video events**

@Colin-Fredericks

Colin-Fredericks Jul 18, 2018

There are a few video-related events missing from the list. The ones I can think of right now are changing the transcript language, changing the video speed, changing the volume, opening/closing the captions or the transcript, downloading the videos, and downloading the transcripts. We might not need all of those for this iteration, but I think getting the language could be important.

@bradenmacdonald

Member

commented Jul 20, 2018

Very nice proposal!

What about grade-producing XBlocks other than capa? For example, Drag and Drop, or even ORA2. Such XBlocks don't emit any of the "Problem interaction events" listed herein, but they do emit standard grade and/or completion events and could be considered of importance for adaptive learning.

Should the translator provide an xAPI event for generic XBlock grade events?

@Colin-Fredericks

commented Jul 23, 2018

@bradenmacdonald I feel like there needs to be some standard equivalent of the problem_check event and the show_answer event for every grade-returning xblock. Otherwise, new graded xblocks will end up needing to have their own unique events incorporated one-by-one into a translator or into every adaptive engine.

If there's no equivalent to the hint event or the submit event that seems like less of a big deal to me, but I also don't 100% understand why the submit and check events are separate.

@nasthagiri

Member Author

commented Jul 23, 2018

@bradenmacdonald @Colin-Fredericks

The edx.grades.problem.submitted event is fired for all scorable xBlocks - so that includes Drag and Drop, ORA, CAPA, etc. Since it is fired by the grading infrastructure, it is generic and doesn't include any problem-specific data. Does this address your questions?

The problem_check and show_answer events are legacy events that CAPA fires and we continue to support. Those seem aligned to xAPI's "problem interaction events" and hence I thought about aligning them.

Note that Caliper includes different profiles, including a Grading Profile and Assessment Profile, that may be better suited to our events - but we will probably need to support both xAPI and Caliper in the long run.

I agree that completion events will be super useful to adaptive engines. I would very much like to add them. I did not see them listed, however, in our docs. We should create these events if we don't already have them. I had listed edx.problem.completed and edx.video.completed in the spreadsheet as possible future things we will need.

@Colin-Fredericks

commented Jul 23, 2018

@bradenmacdonald @nasthagiri

Thanks for the details. Having problem-specific data would be incredibly useful. One of the glaring gaps in our (HarvardX's) current adaptive approach is that we can't tell what answer students actually gave to a problem, which really limits us. For instance, a particular wrong answer might indicate that a learner is having a particular issue. If we had information about which answer was given, we could target that issue specifically. Without knowing why a learner got something wrong, we're more in the dark and can't adapt as well.

As far as problem completion events, I just want to caution people that problems may end up firing multiple completion events, so don't treat a completion event as "we can stop collecting data from this problem now."

@nasthagiri

Member Author

commented Jul 23, 2018

Yes, agree that knowing the learner's answer is very important. In the spreadsheet, I included a response field in the results section of the problem_check (answered) event. Currently, I have it as returning submission[answer], assuming that's the best value from that event.

For other non-CAPA problems, we may need to enhance the Scorable interface to provide additional information in the edx.grades.problem.submitted event. I'd rather not just blindly include the student-state data from CoursewareStudentModule as I don't view it as a public interface that we want to expose to adaptive engines. Thoughts?
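A sketch of how the response field for problem_check might be populated is shown below. The input event shape (`event.submission`, `success`, `grade`, `max_grade`, `problem_id`) is a simplified assumption about the tracking event, `translate_problem_check` is a hypothetical helper, and the home page URL is a placeholder; none of this is the translator proposed in the spreadsheet.

```python
import json


def translate_problem_check(event: dict, anonymous_id: str) -> dict:
    """Map a simplified problem_check event to an xAPI-style statement dict."""
    payload = event["event"]
    # Collect the learner's answer for each input in the problem.
    answers = {key: sub["answer"] for key, sub in payload["submission"].items()}
    return {
        "actor": {
            "objectType": "Agent",
            "account": {"homePage": "https://openedx.example.com", "name": anonymous_id},
        },
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/answered",
            "display": {"en-US": "answered"},
        },
        # A real translator would map the problem id to a proper IRI.
        "object": {"objectType": "Activity", "id": payload["problem_id"]},
        "result": {
            "success": payload["success"] == "correct",
            "score": {"raw": payload["grade"], "max": payload["max_grade"]},
            # xAPI's result.response is a string, so serialize the answers.
            "response": json.dumps(answers, sort_keys=True),
        },
    }
```

Only the submitted answers are copied into the statement; the rest of the stored student state is deliberately left out, in line with not exposing CoursewareStudentModule data as a public interface.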

Example
^^^^^^^

Here is an example of an **Actor** JSON value that we would generate:

@roishillo

roishillo Jul 30, 2018

When there is a guest user (we have some open activities that can be reached by guests), we send "null" as the user id. Let me know if that works for you, or if you would prefer another solution.

@nasthagiri

nasthagiri Nov 21, 2018

Author Member

Interesting. We don't have that situation ourselves today - although we do plan to support "anonymous access to courseware". Though, wouldn't you want to distinguish events from one anonymous user from another anonymous user?

::

"context": {
"registration": "openedx.org/enrollments/enrollment-v1:<anonymized-enrollment-id>",

@roishillo

roishillo Jul 30, 2018

We are using the registration to determine the student's sessions.

"context": {
"registration": "openedx.org/enrollments/enrollment-v1:<anonymized-enrollment-id>",
"contextActivities": {
"parent": [{

@roishillo

roishillo Jul 30, 2018

One of the challenges with the parent is the array: analyzing arrays is compute-intensive. We keep the most important data in the extension. We want to keep the object's actual parent (the unit or the sequence) in the parent field, but it will be hard to extract that later.

@nasthagiri

nasthagiri Nov 21, 2018

Author Member

So in which field do you send the "course" information for the event? Did you take a look at the specific event types in the spreadsheet? Currently, we proposed sending only 1 value in the parent - rather than a list. But the parent is typically either the course or the parent module.

for specific Activities, Verbs, Contexts, etc used by Open edX need to be contractually
maintained.

Router

@roishillo

roishillo Jul 30, 2018

What queue are you going to use?
Maybe the first consumer of the queue is the LRS.

@nasthagiri

nasthagiri Nov 21, 2018

Author Member

Please see the updated version of the spec - as this has changed thinking about scalability concerns.

@bryanlandia

commented Oct 25, 2018

Hey, FYI, a bit off-topic in terms of a PR review, but this seems to be where most xAPI discussion is happening, and I think it might be helpful to look at another style of implementation....

A customer of ours needed an xAPI solution ASAP so I started with ADLNet's archived repo for converting tracking log events to xAPI statements. It's a separate python process that monitors the tail of the tracking log and queues/publishes xAPI statements. We have it running successfully in a test/staging environment with LearningLocker as an LRS, covering course enrollment/unenrollment, certificate-based course completion, problem answers/attempts, link/section/tab navigation, and most video events. Our next phase of work will be toward making it production-ready.

Our fork is here: https://github.com/appsembler/edx-xapi-bridge. I've used the Open edX Google Spreadsheet for the choice of verbs, activities, results, etc. as much as possible and made use of the TinCanPython library for making Statement classes, LRS client, etc. There's also an Ansible role, currently on a Ficus branch, for installing it.
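For readers unfamiliar with the log-tailing style described in this comment, the core loop can be sketched roughly as follows. This is not code from edx-xapi-bridge; `read_new_events` is a hypothetical helper, and it assumes the tracking log holds one JSON object per line.

```python
import json


def read_new_events(path: str, offset: int = 0):
    """Parse JSON lines appended to `path` since byte `offset`.

    Returns (events, new_offset). A partially written trailing line is left
    in place so the next call can pick it up once it is complete.
    """
    events = []
    with open(path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            offset = log.tell()
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events, offset
```

A bridge process would call this periodically, translate each returned event, publish it to the LRS, and persist the offset so that a restart does not resend old events.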

@nasthagiri

Member Author

commented Nov 9, 2018

Hey everyone, I'm going to spend some time focusing on this OEP again. There is interest in moving this initiative forward from many different angles and by multiple development groups in the Open edX community. My plan is to:

  • Generalize the OEP so it is not specific to just xAPI - but still include the specifics of xAPI as an example (perhaps moving xAPI details to a sub-document).
  • Respond to and update the OEP based on later comments from @roishillo, @sampaccoud, and @bryanlandia that I missed earlier.
  • Send the OEP out for review.

@nasthagiri nasthagiri force-pushed the arch/xapi-oep branch from 43d8d4f to d904ba8 Nov 21, 2018

@nasthagiri

Member Author

commented Nov 21, 2018

@bryanlandia Thanks for the info. It's good to know about the progress on that effort. For this particular OEP, since our focus is on a real-time eventing API, we chose not to integrate as a downstream client of tracking log persistence. It would be great to hear your feedback on the mapping of Open edX events to xAPI events - perhaps we can share notes and even consolidate on a common solution as a community.

@nasthagiri

Member Author

commented Nov 21, 2018

All, I have finally pushed an update to this OEP that generalizes this feature, adds use cases, and responds to outstanding comments.

I am looking for someone from the Open edX community who is willing to help as an Arbiter on this OEP. Let me know (via Slack) if you would be interested in taking on this role. Thanks.

@nasthagiri nasthagiri changed the title OEP-0026: xAPI Integration OEP-0026: Real-time Events Nov 21, 2018

@nasthagiri nasthagiri force-pushed the arch/xapi-oep branch from ea31b72 to e657474 Nov 21, 2018

@nasthagiri nasthagiri force-pushed the arch/xapi-oep branch 2 times, most recently from b7f9dbb to 79f27e3 Nov 29, 2018

@nasthagiri

Member Author

commented Dec 12, 2018

Here are the notes (thanks @robrap) from an internal review of this PR.

Action Items:

  1. Update the eventing subsystem diagram and separate filter (with access checks) from the router component.
  2. Check-in with @mulby on event-tracking’s pluggability model and see whether a new processor makes better sense than a new backend.
  3. See whether Canvas has an xAPI schema defined.

Notes:

  • Reviewed use cases.
  • xAPI details to be reviewed by the community.
  • Is this used for data-synchronization internally?
    • No - meant for integrations with third parties.
    • Could be a stand-in solution, but meant to be out-of-scope.
    • OEP is ok as-is, but Steve noted that Data Engineering may look at batching vs streaming.
  • Is there a plan to nail down the technical details to deal with events at scale?
    • Delivery guarantees? Identifier questions.
      • Hoping that Translator/Validator are separated out enough that as long as there are no bottlenecks, this can keep the system scalable.
      • Looking at Kinesis and Kafka for data synchronization. Hopefully we can make the routing more interchangeable. Makes sense to not have this as part of the OEP.
        • What parts of these decisions should be shareable across other similar types of problems?
        • Event-driven APIs should be something someone could experiment with. For example, two micro-frontends with back-ends communicating. Might have public vs. private services across services. Maybe third-party public APIs.
        • We’ve got very few success stories for asynchronous communication.
    • “Tracking backend” is like the Router, and “Processors” like the Translator/Validator, you might have different Processors.
      • Very similar to our current eventing. Take event, fork and transform it. Segment is called the “Tracking backend”, but is the router.
    • Discussions of different technologies:
      • Maybe celery is good enough for now for data sync.
      • Kafka provides data-replay, not just speed.
  • Routing vs processing:
    • Can I translate my events afterwards?
    • Where does filtering come in?
      • Admin/router - course restrictions, future restrictions.
      • Maybe access control layer should be separate from router. Filtering before routing. Third-parties could have their own anti-corruption layer and formatting.
  • Does there need to be replay capabilities of events?
    • good question for product.
    • replay events for training purposes for adaptive learning -> via an LRS as shown on the xAPI sub-page.
  • Requires a new ID for users.
    • Should this be persisted? Just hash the lms userid to not leak it.
    • What about integration of events from other sources. In the event itself, you’d have the source. Source + anonymous id would be unique.
  • A past edX employee was adamant that we shouldn’t use xAPI for event log for some reason. Concerns about choosing the verbs, etc. for xAPI. Does community have experience?
    • Got useful feedback from community about details of events that weren’t needed and could be dropped for a smaller payload.
    • Caliper is more opinionated around schema, which would be a good thing for us regarding integrations.
    • Look at Canvas xAPI format. Adaptive engines - what do they expect?
    • The validators are a good idea.
  • Maybe make it more explicit that we could introduce a new verb and not be blocked on recommending it gets added.
  • Future: may want to re-run events against a store like 40+ terabytes in S3.
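One way to picture the separation discussed in these notes (processors for filtering and translation, with the router as the final backend) is the sketch below. All names here (`run_pipeline`, `access_filter`, `translate`, `Router`) and the event keys are hypothetical; this is not the event-tracking library's actual API.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Processor = Callable[[Event], Optional[Event]]  # return None to drop the event


def run_pipeline(event: Event, processors: List[Processor]) -> Optional[Event]:
    """Pass the event through each processor in order; stop if one drops it."""
    for process in processors:
        event = process(event)
        if event is None:
            return None
    return event


def access_filter(allowed_courses: set) -> Processor:
    """Access check kept separate from routing, as the first action item suggests."""
    def _filter(event: Event) -> Optional[Event]:
        return event if event.get("course_id") in allowed_courses else None
    return _filter


def translate(event: Event) -> Event:
    """Stand-in translator: rename fields into a hypothetical external shape."""
    return {"verb": event["event_type"], "course": event["course_id"]}


class Router:
    """Stand-in for the 'tracking backend': delivers processed events to consumers."""

    def __init__(self) -> None:
        self.delivered: List[Event] = []

    def send(self, event: Event) -> None:
        self.delivered.append(event)


router = Router()
processors = [access_filter({"course-v1:demo"}), translate]
for raw in [
    {"event_type": "play_video", "course_id": "course-v1:demo"},
    {"event_type": "play_video", "course_id": "course-v1:other"},
]:
    processed = run_pipeline(raw, processors)
    if processed is not None:
        router.send(processed)
```

Filtering before translation means events a consumer may not see are dropped before any formatting work is done, and the router can be swapped (Celery, Kinesis, Kafka) without touching the processors.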

@nasthagiri nasthagiri force-pushed the arch/xapi-oep branch from 79f27e3 to 7bdbd7d Jan 16, 2019

@nasthagiri

Member Author

commented Jan 16, 2019

@mulby and @brianhw I finally got a chance to update the OEP based on our notes from several weeks ago. Can you both review the latest version? If all is good, I can merge this with the "Provisional" OEP status.

@mulby This commit in particular relates to your input and feedback.

Thanks!

@brianhw
Member

left a comment

LGTM. 👍

@nasthagiri nasthagiri force-pushed the arch/xapi-oep branch from 2ae42de to 09d4f65 Jan 19, 2019

@nasthagiri nasthagiri changed the title OEP-0026: Real-time Events OEP-0026: Real-time Events (xAPI/Caliper) Jan 19, 2019

@nasthagiri

Member Author

commented Jan 19, 2019

Thank you all for your input and review on this OEP. I am now merging it with "Provisional" status. We are looking forward to having the community help us with the implementation efforts.

@nasthagiri nasthagiri merged commit 59f545d into master Jan 19, 2019

@nasthagiri nasthagiri deleted the arch/xapi-oep branch Jan 19, 2019
