-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OEP-0026: Real-time Events (xAPI/Caliper) #73
Conversation
fc1fab0
to
ff21a32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping us bootstrapping the design of this project. I've initiated two threads about the Open edX User UUID and the router layer.
|
||
* User restriction - certain consumers can access all events for certain users. | ||
* Site restriction - certain consumers are limited to accessing events of certain sites. | ||
* Activity type restriction - certain consumers can access only certain types of events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it must be a layer of the app? IMO it must be a separated component acting as a proxy to forward the right event to the right consumer with appropriate permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, when I use the term "layer" here, I simply mean a separable logical part of the plugin. I do not mean to imply a specific way of how it is implemented.
Would it be clearer if I use the term "component" instead of "layer"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO yes. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. thanks.
Thanks for having add the list of events @nasthagiri 🎉 |
Sorry I'm late to the party, but what were the performance concerns about having a new table with a UUID in it? |
I might be wrong, but don't you think that this relationship could increase requests performance over the |
@jmaupetit: I don't think it will really make that much of a difference. But since we'd expect the user:UUID association to never change, if it ever does become an issue, we can throw caching at it pretty easily. |
Adding a new column on a huge table should not require down time if it is done in several steps:
|
I haven't been a participant in a DB-related outage RCA in a while, but my understanding is that we've suffered long locks on table migrations, even for adding an optional field, on the order of 1 min per million rows. @jibsheet or @feanil might have more recent edX operational perspective to offer. (Updated comment because I couldn't remember what version of MySQL we were running at any given time, and the devops folks would know better than my fuzzy recollection.) |
Migrations have been one of the leading cause of operational issues on edx.org. There are definitely ways to make MySQL add a column without downtime, but I think it's easier to just make a separate table and join it to |
Hi, so sorry again for joining you in the middle of the discussion. I just understood from Roi today the great progress that was done here with the Spec. I'll review the whole spec later on but I want to join to @ormsbee concern about the addition of a column vs new table. From my previous experience addition of new column to existing table might by tricky and might need a more complex upgrade process |
@AnnaCampus Thank you for chiming in. I agree with you. We recently ran into unforeseen migration issues when we tried to replace an out-of-box OAuth Application class to our own custom class. Apparently, others on StackOverflow had faced similar issues. We eventually backed out of that approach and embraced an alternative data model change. The StackOverflow articles referred to similar issues when replacing Django's user model with a custom one (in a live production environment). While I also agree with @sampaccoud and @jmaupetit on the merit of updating the user table, I believe there is a lot else to be done with this initiative. For example, we need to enhance the platform to expose good URNs for users, courses, and enrollments. We may also need to update existing openedX events to include additional data that we're not already including. Given this, the current proposal of sending a hash of the LMS user-id should be sufficient for our immediate needs. (Apparently, hash-of-user-id is also the value that we send with our edX data packages. @brianhw can correct me.) So we can look into customizing the user table at a later point. For now, let's focus our efforts on the very many other enhancements to our platform as required by this feature. |
The concern with adding a new table is that we would make a product design decision based on operational issues. It may make sense for edx.org (and a few others like us maybe), but it's hard to justify for the mass of all the other users of Open edX. If not impacting performances, it complicates the design, the code and will generate more technical debt. So +1 for @nasthagiri's pragmatic approach. |
I guess as long as we're sure we don't need to do reverse-lookups to map the anonymous ID back to a real user, the hash works for me too. Even more so if we specifically do not want the ability to do a reverse lookup, for privacy reasons. @sampaccoud: FWIW, even from a code perspective, I would favor a separate model over tacking on an extra field to the user model. Anonymous user mapping, whether more generally or for xAPI in particular, is distinct enough where I think it belongs in its own space. User models can become a catch-all space for all kinds of random things, and I especially wouldn't want to tack it onto the standard Django contrib auth_user model, given it's owned by an entirely separate space. Granted, we could make a custom user model, point it to the same table via config, and do a little fudging to make that transition work smoothly -- but I think all that combines to more technical debt and confusion than just making a separate model and joining it. It's not as compact, but it's pretty simple. |
@nasthagiri one last remark: in the perspective of a sha256 hash of the user ID "long-term temporary" solution (:trollface:), can we explicitly mention that this hash will be at least salted? and generated with |
+ `edx.problem.hint.demandhint_displayed <http://edx.readthedocs.io/projects/devdata/en/latest/internal_data_formats/tracking_logs/student_event_types.html#edx-problem-hint-demandhint-displayed>`_. | ||
Whenever a learner requests a hint to a problem. | ||
|
||
- **Video events** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few video-related events missing from the list. The ones I can think of right now are changing the transcript language, changing the video speed, changing the volume, opening/closing the captions or the transcript, downloading the videos, and downloading the transcripts. We might not need all of those for this iteration, but I think getting the language could be important.
Very nice proposal! What about grade-producing XBlocks other than capa? For example, Drag and Drop, or even ORA2. Such XBlocks don't emit any of the "Problem interaction events" listed herein, but they do emit standard grade and/or completion events and could be considered of importance for adaptive learning. Should the translator provide an xAPI event for generic XBlock grade events? |
@bradenmacdonald I feel like there needs to be some standard equivalent of the problem_check event and the show_answer event for every grade-returning xblock. Otherwise, new graded xblocks will end up needing to have their own unique events incorporated one-by-one into a translator or into every adaptive engine. If there's no equivalent to the hint event or the submit event that seems like less of a big deal to me, but I also don't 100% understand why the submit and check events are separate. |
@bradenmacdonald @Colin-Fredericks The The Note that Caliper includes different profiles, including a Grading Profile and Assessment Profile, that may be better suitable for our events - but we will probably need to support both xAPI and Caliper in the long-run. I agree that completion events will be super useful to adaptive engines. I would very much like to add them. I did not see them listed, however, in our docs. We should create these events if we don't already have them. I had listed |
Thanks for the details. Having problem-specific data would be incredibly useful. One of the glaring gaps in our (HarvardX's) current adaptive approach is that we can't tell what answer students actually gave to a problem, which really limits us. For instance, a particular wrong answer might indicate that a learner is having a particular issue. If we had information about which answer was given, we could target that issue specifically. Without knowing why a learner got something wrong, we're more in the dark and can't adapt as well. As far as problem completion events, I just want to caution people that problems may end up firing multiple completion events, so don't treat a completion event as "we can stop collecting data from this problem now." |
Yes, agree that knowing the learner's answer is very important. In the spreadsheet, I included a For other non-CAPA problems, we may need to enhance the |
Example | ||
^^^^^^^ | ||
|
||
Here is an example of an **Actor** JSON value that we would generate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there is a guest user (we have some open activities that can be reached by guess) we are sending "null" as the user id. let me know if it works for you - or if you want to have another solution,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. We don't have that situation ourselves today - although we do plan to support "anonymous access to courseware". Though, wouldn't you want to distinguish events from one anonymous user from another anonymous user?
:: | ||
|
||
"context": { | ||
"registration": "openedx.org/enrollments/enrollment-v1:<anonymized-enrollment-id>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using the registration to determine the student's sessions.
"context": { | ||
"registration": "openedx.org/enrollments/enrollment-v1:<anonymized-enrollment-id>", | ||
"contextActivities": { | ||
“parent”: [{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the challenges with the parent is the array. analyze array is compute intensive. We keep the most important data in the extension. we want to keep in the parent the actual parent of the object (the unit or the sequence) but it will be hard to extract that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in which field do you send the "course" information for the event? Did you take a look at the specific event types in the spreadsheet? Currently, we proposed sending only 1 value in the parent - rather than a list. But the parent is typically either the course or the parent module.
for specific Activities, Verbs, Contexts, etc used by Open edX need to be contractually | ||
maintained. | ||
|
||
Router |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What queue are you going to use?
maybe the first consumer of the queue is the LRS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the updated version of the spec - as this has changed thinking about scalability concerns.
Hey, FYI, a bit off-topic in terms of a PR review, but this seems to be where most xAPI discussion is happening, and I think this might be helpful to look at another style of implementation.... A customer of ours needed an xAPI solution ASAP so I started with ADLNet's archived repo for converting tracking log events to xAPI statements. It's a separate python process that monitors the Our fork is here https://github.com/appsembler/edx-xapi-bridge. I've used the Open edX Google Spreadsheet for the choice of verbs, activities, results, etc. as much as possible and made use of the TinCanPython library for making Statement classes, LRS client, etc.. There's also an Ansible role, currently on a Ficus branch, for installing it. |
Hey everyone, I'm going to spend some time focusing on this OEP again. There is interest in moving this initiative forward from many different angles and by multiple development groups in the open edX community. My plan is to:
|
43d8d4f
to
d904ba8
Compare
@bryanlandia Thanks for the info. It's good to know about the progress on that effort. For this particular OEP, since our focus is on a real-time eventing API, we chose not to integrate as a downstream client of tracking log persistence. It would be great to hear your feedback on the mapping of Open edX events to xAPI events - perhaps we can share notes and even consolidate on a common solution as a community. |
All, I have finally pushed an update to this OEP that generalizes this feature, adds use cases, and responds to outstanding comments. I am looking for someone from the Open edX community who is willing to help as an Arbiter on this OEP. Let me know (via Slack) if you would be interested in taking on this role. Thanks. |
ea31b72
to
e657474
Compare
b7f9dbb
to
79f27e3
Compare
Here are the notes (thanks @robrap) from an internal review of this PR. Action Items:
Notes:
|
79f27e3
to
7bdbd7d
Compare
@mulby and @brianhw I finally got a chance to update the OEP based on our notes from several weeks ago. Can you both review the latest version? If all is good, I can merge this with the "Provisional" OEP status. @mulby This commit in particular relates to your input and feedback. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
2ae42de
to
09d4f65
Compare
Thank you all for your input and review on this OEP. I am now merging it with "Provisional" status. We are looking forward to having the community help us with the implementation efforts. |
Open edX Proposal for Real-time Events, primarily for Adaptive Learning capabilities.
Review Period
November 29, 2018 -> December 20, 2018
Arbiter
@brianhw
Motivation
To satisfy emerging use cases that require notifying external systems of LMS events in real-time in standardized formats, supporting real-time events is a natural evolution of Open edX's eventing and API capabilities and its impact on connecting users, organizations, and learning services.