Create a JupyterHub "exchange" service to replace the exchange directory #659

Open
jhamrick opened this issue Jan 15, 2017 · 26 comments

@jhamrick
Member

I would eventually like to replace the nbgrader exchange directory with a more robust solution, namely, a JupyterHub "exchange" service that manages released assignments, submissions, and feedback. This has the drawback that people won't be able to use nbgrader's file management capabilities unless they are using JupyterHub. In practice, though, I don't think anyone is using nbgrader's file management capabilities without JupyterHub anyway (if I am wrong about this, someone should correct me!). Here's how I imagine this working.

@lgpage @minrk @ellisonbg @willingc @dsblank I would appreciate any feedback you have on this proposal!

Permissions

All authentication will be handled by JupyterHub, which will tell the service who the current user is and what group(s) they are a part of. The exchange service will handle any number of courses on the same machine, and for each course, will require that there are two groups specified: one for instructors (who are allowed to release, fetch, submit, and collect assignments and return feedback) and one for students (who are allowed to fetch and submit assignments and download feedback). This would be configured something like this:

c.ExchangeApp.groups = {
    'course1': dict(instructors='instructors_course1', students='students_course1'),
    'course2': dict(instructors='instructors_course2', students='students_course2'),
    # ... one entry per additional course
}
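To make the permission model concrete, here is a minimal sketch of how a service might check a request against that group mapping. The function name, the action names, and the `GROUPS` constant are all hypothetical; only the group layout and the instructor/student split come from the proposal above.

```python
# Hypothetical sketch: none of these names exist in nbgrader or JupyterHub.
# Action sets are transcribed from the Permissions paragraph above.
INSTRUCTOR_ACTIONS = {"release", "fetch", "submit", "collect", "return_feedback"}
STUDENT_ACTIONS = {"fetch", "submit", "download_feedback"}

# Same shape as the proposed c.ExchangeApp.groups setting.
GROUPS = {
    "course1": {"instructors": "instructors_course1", "students": "students_course1"},
    "course2": {"instructors": "instructors_course2", "students": "students_course2"},
}


def is_allowed(user_groups, course_id, action, groups=GROUPS):
    """Return True if a user belonging to `user_groups` may perform `action`."""
    course = groups.get(course_id)
    if course is None:
        # Unknown course: deny by default.
        return False
    if course["instructors"] in user_groups and action in INSTRUCTOR_ACTIONS:
        return True
    if course["students"] in user_groups and action in STUDENT_ACTIONS:
        return True
    return False
```

For example, `is_allowed({"students_course1"}, "course1", "release")` is false, while the same call with `"fetch"` is true.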

API

The exchange service will define a REST API that the nbgrader commands (release, fetch, submit, collect, etc.) can access.

/api/assignments

  • GET /api/assignments/<course_id> -- list all assignments for a course (students+instructors)

/api/assignment

  • GET /api/assignment/<course_id>/<assignment_id> -- download a copy of an assignment (students+instructors)
  • POST /api/assignment/<course_id>/<assignment_id> -- release an assignment (instructors only)

/api/submissions

  • GET /api/submissions/<course_id>/<assignment_id> -- list all submissions for an assignment from all students (instructors only)
  • GET /api/submissions/<course_id>/<assignment_id>/<student_id> -- list all submissions for an assignment from a particular student (instructors+students, though students are restricted to only viewing their own submissions)

/api/submission

  • POST /api/submission/<course_id>/<assignment_id>/<student_id> -- submit a copy of an assignment (students+instructors)
  • GET /api/submission/<course_id>/<assignment_id>/<student_id> -- download a student's submitted assignment (instructors only)

/api/feedback

  • POST /api/feedback/<course_id>/<assignment_id>/<student_id> -- upload feedback on a student's assignment (instructors only)
  • GET /api/feedback/<course_id>/<assignment_id>/<student_id> -- download feedback on a student's assignment (instructors+students, though students are restricted to only viewing their own feedback)
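The endpoint list above can be sketched as a small routing table plus a URL builder. This is only an illustration of the proposed layout: `ROUTES`, `build_url`, and `may_access` are hypothetical names, and the rule labels ("all", "own", "instructors") are an assumed encoding of the access notes in the list.

```python
from urllib.parse import quote

# (method, resource, has_student_id) -> who may call it.
# "all": students and instructors; "own": students only for their own
# student_id; "instructors": instructors only.
ROUTES = {
    ("GET", "assignments", False): "all",
    ("GET", "assignment", False): "all",
    ("POST", "assignment", False): "instructors",
    ("GET", "submissions", False): "instructors",
    ("GET", "submissions", True): "own",
    # The proposal lists this as students+instructors without an
    # own-submission restriction, so it is transcribed as "all" here.
    ("POST", "submission", True): "all",
    ("GET", "submission", True): "instructors",
    ("POST", "feedback", True): "instructors",
    ("GET", "feedback", True): "own",
}


def build_url(resource, course_id, assignment_id=None, student_id=None):
    """Build a path such as /api/feedback/<course_id>/<assignment_id>/<student_id>."""
    if student_id is not None and assignment_id is None:
        raise ValueError("student_id requires assignment_id")
    parts = ["api", resource, course_id, assignment_id, student_id]
    return "/" + "/".join(quote(p, safe="") for p in parts if p is not None)


def may_access(role, requester, method, resource, student_id=None):
    """Check a request against ROUTES; role is 'instructors' or 'students'."""
    rule = ROUTES.get((method, resource, student_id is not None))
    if rule is None:
        return False
    if role == "instructors":
        return True
    if role != "students":
        return False
    if rule == "all":
        return True
    if rule == "own":
        return requester == student_id
    return False
```

With this table, a student can read their own feedback but not another student's, and cannot list all submissions for an assignment.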

Exchange implementation

Under the hood, the exchange service will continue to store files directly on the filesystem, but they will all have the same permissions (read and write only for the user running the exchange service). I think this is a better option than using a database because we don't really need any fancy relational features here, and it also makes it easier for instructors to inspect files in the exchange manually. If someone feels strongly that a database should be used, I could be convinced otherwise, though.

Regardless, I do want to implement some form of checksumming, because I have noticed that in the current implementation, when the system is under heavy load, submissions are occasionally incomplete or corrupted (e.g. missing timestamp.txt or something).
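One way the checksumming could work, as a rough sketch: record a SHA-256 digest per file when a submission is made, then compare against that manifest on the receiving side. The function names and the choice of SHA-256 are assumptions for illustration, not part of the proposal.

```python
import hashlib
from pathlib import Path


def build_manifest(directory):
    """Map each file's relative path to the SHA-256 hex digest of its contents."""
    root = Path(directory)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify(directory, manifest):
    """Return the sorted paths from `manifest` that are missing or changed."""
    current = build_manifest(directory)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A submission whose timestamp.txt went missing in transit would then show up as `["timestamp.txt"]` from `verify`, and the service could reject it instead of storing a partial copy.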

Existing nbgrader apps

The existing nbgrader apps will be reworked to make requests to the exchange API rather than copying to and from the exchange directory.

One thing I am not quite sure of is how the command line apps get properly authenticated, because the authentication is normally happening in the browser, not the command line. I see two possible solutions:

  • One solution to this is to say that these commands can only be used through the server extension, and then have that extension pass the authentication information to the command line apps. This is probably the easiest but then it means you can't just run the commands from the command line anymore.
  • The other solution is to require somehow that users re-authenticate from the command line. I am not really sure how to do this in a general way that handles all the forms of authentication that JupyterHub uses. Maybe @minrk can weigh in on the feasibility of this, but from what I know about how this works it doesn't seem like a particularly feasible option to me?
@ellisonbg
Contributor

ellisonbg commented Jan 15, 2017 via email

@jhamrick
Member Author

No, I don't think so...

@jhamrick
Member Author

Here is a link to it: https://github.com/jupyterhub/hubshare/blob/master/specification.md

That will definitely be really nice, and will make some of this stuff unnecessary for sure. Do you know what the timeline is for that?

@ellisonbg
Contributor

ellisonbg commented Jan 15, 2017 via email

@minrk
Member

minrk commented Feb 6, 2017

One thing I am not quite sure of is how the command line apps get properly authenticated

I'd use authentication tokens for this. HubAuth did recently get support for API tokens in the Authorization header, not just cookies. You can store these in a file once generated. I think we do need to have a page for requesting a new token in the Hub UI to complete the loop, though. It would look like:

  1. request an API token (this is fiddly right now, but I'll add a page for it)
  2. save that somewhere like ~/.nbgrader/token
  3. CLI apps look for this file, use it in Authorization header. If not present, point to Hub page where they can get one.

Spawners could request and install this token at launch, to make it easy to do it from the single-user-server terminal.
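The CLI side of those three steps might look roughly like the sketch below. The helper names and the `token_page` argument are hypothetical; the `~/.nbgrader/token` location is the one suggested above, and the `Authorization: token <value>` header form is the one JupyterHub accepts for API tokens.

```python
from pathlib import Path

# Location suggested in the steps above; treat it as a convention, not a fixed API.
DEFAULT_TOKEN_PATH = Path.home() / ".nbgrader" / "token"


def load_token(token_path=DEFAULT_TOKEN_PATH):
    """Return the saved token, or None if the file is absent or empty."""
    path = Path(token_path)
    if not path.is_file():
        return None
    token = path.read_text().strip()
    return token or None


def auth_headers(token_path=DEFAULT_TOKEN_PATH, token_page="the Hub's token page"):
    """Build request headers, or tell the user where to get a token."""
    token = load_token(token_path)
    if token is None:
        raise RuntimeError(
            f"No API token found at {token_path}; request one from {token_page}."
        )
    return {"Authorization": f"token {token}"}
```

A command such as `nbgrader fetch` could pass the returned dictionary as the headers of every request it makes to the exchange service.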

@ellisonbg
Contributor

ellisonbg commented Feb 7, 2017 via email

@minrk
Member

minrk commented Feb 8, 2017

Page for a token: jupyterhub/jupyterhub#971

@bbhopesh

bbhopesh commented Oct 27, 2017

Is someone already working on implementing this? I'd like to contribute if another person is needed for this task or some part of it.

@jhamrick
Member Author

@minrk would be the person to ask to see what the current status of HubShare is. All the development for that will happen at https://github.com/jupyterhub/hubshare so that is a good repo to subscribe to if you're interested in contributing to it!

@perllaghu
Contributor

I'm interested in this, however have two complications to throw into the mix:

  1. In our environment, the jupyterhub service spawns notebooks onto different VMs in a swarm. This means that persistent storage is done via NFS (the Notebooks tree view does not use the ContentsManager plugin), so each notebook runs as the same user (and we use Docker labels to distinguish things for accounting)

  2. We connect via LTI - so course ID & role details are not stored in the hub anywhere... so having a configuration dictionary is..... problematic.

(but I'm going to subscribe to hubshare too)

@jhamrick
Member Author

@perllaghu The idea with the HubShare service will be to alleviate the issue with your first point. The main idea is that HubShare will have control over some sort of file store for the exchange, with permissions determined based on JupyterHub users rather than local process users. This means you could definitely launch notebooks as all the same user (as long as the JupyterHub users are different) and HubShare would appropriately manage access to the exchange.

Is there a way you can programmatically get the course id and role through LTI? e.g. if you can get the course id through an environment variable, then I think that shouldn't end up being a problem.
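As a sketch of the environment-variable idea: the spawner could export the course id and role obtained at login, and nbgrader could read them at startup. The variable names `NBGRADER_COURSE_ID` and `NBGRADER_ROLE` are made up for this example, and defaulting to the student role is an assumption chosen to fail toward fewer privileges.

```python
import os


def course_context(environ=None):
    """Read (course_id, role) from the environment.

    NBGRADER_COURSE_ID and NBGRADER_ROLE are hypothetical variable names.
    """
    env = os.environ if environ is None else environ
    course_id = env.get("NBGRADER_COURSE_ID")
    if not course_id:
        raise KeyError("NBGRADER_COURSE_ID is not set")
    # Assumed default: least-privileged role when none is given.
    role = env.get("NBGRADER_ROLE", "student")
    return course_id, role
```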

@perllaghu
Contributor

Absolutely - this is the whole point of LTI: it does the authentication/authorisation part (basically OAuth), and gives you course, user, and role - so you know if the user is an instructor (gets FormGrader) or a student (gets Assignment)

@perllaghu
Contributor

Just to let people know - We've started a [currently private] version of this.....

Not using HubShare, as that's too generic, and doesn't have the authentication/authorisation stuff in there.

I hope to be able to persuade people to make it generic enough to work for our [Kubernetes behind a proxy server] environment as well as more generic setups.

@nthiery
Contributor

nthiery commented Nov 12, 2019

Hi @perllaghu

Just to let people know - We've started a [currently private] version of this.....
Not using HubShare, as that's too generic, and doesn't have the authentication/authorisation stuff in there.

I hope to be able to persuade people to make it generic enough to work for our [Kubernetes behind a proxy server] environment as well as more generic setups.

Has there been progress on this front? We would be interested for multiple courses next spring, where it gets annoying to have to tweak the JupyterHub configuration for each new course.

Thanks in advance!

@perllaghu
Contributor

Yes there is..... I'm in the wrong place to give you the Pull Request number for this - but it's hopefully not far.

@BertR
Contributor

BertR commented Nov 12, 2019

Hi @nthiery , this is the pull request: #1238
@lzach has been working on the documentation on how to write an exchange plugin
and we're also planning to push our own implementation to a public GitHub repository.

@nthiery
Contributor

nthiery commented Nov 13, 2019

Thanks for the quick feedback! I am interested in beta testing whenever this is out.

@nthiery
Contributor

nthiery commented Feb 20, 2020

Hi @BertR,

and we're also planning to push our own implementation to a public GitHub repository.

I am looking forward to it! Has there been progress on this side?

@BertR
Contributor

BertR commented Mar 6, 2020

Yes! Today @perllaghu pushed our exchange to https://github.com/edina/nbexchange
very rough around the edges, but we will clean it up and add some examples of how it can be used.

@nthiery
Contributor

nthiery commented Mar 10, 2020

Ah ah! Will check this out today! Thank you.

@perllaghu
Contributor

Be delighted with any critique/observations...

@nthiery
Contributor

nthiery commented Mar 10, 2020 via email

@nthiery
Contributor

nthiery commented Mar 10, 2020 via email

@perllaghu
Contributor

There are two different things going on here:

  1. The user has a current course thing (defined by an LTI connection, jupyterhub config, or some other means)
  2. The exchange can be asked for details on a course, any course. The exchange needs to check that the user making the request has access to the course being asked about - which may not be the current course.... this allows the exchange to handle users subscribed to multiple courses
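To illustrate the second point, here is a minimal sketch of a check that looks at the course named in the request rather than the user's current course. The `memberships` mapping (course id to role, for one user) is a hypothetical data shape, not how nbexchange actually stores subscriptions.

```python
def can_access(memberships, course_id, required_role=None):
    """Check access to the *requested* course, whatever the current course is.

    `memberships` maps course_id -> role for a single user (assumed shape).
    """
    role = memberships.get(course_id)
    if role is None:
        # Not subscribed to the requested course at all.
        return False
    return required_role is None or role == required_role
```

A user who is a student in course1 and an instructor in course2 passes `can_access(m, "course2", "instructor")` even while course1 is their current course.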

@perllaghu
Contributor

Currently, the exchange service does not provide a means to share nbgrader's grade database among several instructors, right?

Correct - the instructor's nbgrader database (the one used by formgrader) is still the sqlite database in the instructor's home directory.

I can definitely see a piece of work to move the formgrader database to a central database - which would immediately allow multiple instructors to manage a single course.... but there are a whole raft of things to work through for that:

  • Where do released and generated notebooks live?
  • Where's the [auto]grading done?
  • Is there a distinction between Instructor and TeachingAssistant?

@perllaghu
Contributor

I believe we can close this.

In response to:

Currently, the exchange service does not provide a means to share nbgrader's grade database among several instructors, right?

We have the following solution:

Each course gets its own database in a central database server, and a directory on a central FileStore server.

When an instructor starts their notebook server in our system, the database URL is calculated & set for that course, and the directory in the central FileStore is mounted.
We also set c.CourseDirectory.root to a path specific for that course.

Thus all instructors have access to the same database, and the same course files: source, release, submitted, autograded, feedback.

.... Oh, and we found it useful to set c.CourseDirectory.directory_structure = '{nbgrader_step}/{assignment_id}/{student_id}' - but that's just us....
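A rough sketch of how those per-course values could be derived when an instructor's server starts. The database host and the FileStore mount point are placeholders, and `course_settings` is a hypothetical helper; only the three `CourseDirectory` option names and the directory structure string come from the description above.

```python
def course_settings(course_id, db_server="postgresql://dbhost", filestore="/mnt/filestore"):
    """Compute per-course nbgrader settings (db_server and filestore are placeholders)."""
    return {
        # One database per course on the central database server.
        "CourseDirectory.db_url": f"{db_server}/{course_id}",
        # One directory per course on the mounted central FileStore.
        "CourseDirectory.root": f"{filestore}/{course_id}",
        # The layout quoted above; nbgrader fills in the placeholders itself.
        "CourseDirectory.directory_structure": "{nbgrader_step}/{assignment_id}/{student_id}",
    }
```

In an nbgrader_config.py, each key would be assigned on the config object, for example `c.CourseDirectory.root = settings["CourseDirectory.root"]`.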

[How 10 markers manage the 200 submissions is not in the solution: they all see the same dataset]
