Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add pyodide support for jupyter-lite for files opened via HTTP #868

Merged
merged 4 commits into from
Apr 3, 2023

Conversation

ioanaif
Copy link
Collaborator

@ioanaif ioanaif commented Mar 31, 2023

No description provided.

@ioanaif ioanaif linked an issue Mar 31, 2023 that may be closed by this pull request
Comment on lines 567 to 573
try:
shell = get_ipython().__class__.__name__
# True if running in a jupyter lite notebook
# False if running in a jupyter noteboook ('ZMQInteractiveShell') or IPython ('TerminalInteractiveShell')
return shell == "Interpreter"
except NameError:
return False # Python interpreter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should use

Suggested change
try:
shell = get_ipython().__class__.__name__
# True if running in a jupyter lite notebook
# False if running in a jupyter noteboook ('ZMQInteractiveShell') or IPython ('TerminalInteractiveShell')
return shell == "Interpreter"
except NameError:
return False # Python interpreter
return sys.platform == 'emscripten'

This would not detect IPython (or notebooks), but I think we specifically care about the emscripten part more than IPython?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have the same result, the function would only return True if it is running in a jupyter lite environment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion, I swapped out the function with only this check as it will be less code.

@ioanaif ioanaif changed the title feat: add pyodide support for jupyter-lite feat: add pyodide support for jupyter-lite for files opened via HTTP Mar 31, 2023
@ioanaif ioanaif requested a review from jpivarski March 31, 2023 10:02
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly minimal change, and looks like a clean switch based on sys.platform == "emscripten". (I don't know how general that is, though you've already talked about it with @agoose77.)

This only modifies the HTTPSource, and it does so internally, in a way that users can't inspect. However, the choice it's making is between a single background thread (CPython) and a non-thread (Pyodide), which are roughly equivalent, just a concession for Pyodide not supporting threads.

What about when the HTTP server doesn't support multi-part GET requests? In that case, HTTPSource falls back to MultithreadedHTTPSource (the _fallback attribute of HTTPSource). That launches $n$ workers to distribute the load of requesting a lot of byte ranges for TBaskets when an HTTP server can only respond to one contiguous byte range per request. Presumably, that would have to be serialized into one ResourceTrivialExecutor and become much slower, or maybe async could be used, but that's a more radical change to the MultithreadedSource, and should be a separate PR, if at all.

This happens automatically when an HTTP server doesn't support multi-part GET, but it can also be forced manually by setting the http_handler in uproot.open.

We're also not addressing local files, but I saw that you tried and had some troubles with this. As I understand from @agoose77, that part of Pyodide is in flux and maybe we should wait for it to settle down. I highly doubt that memory-mapping will work (the default), and the alternative is MultithreadedFileSource. The MultithreadedHTTPSource and MultithreadedFileSource (and MultithreadedXRootDSource) share a lot of code through a superclass, so converting it to use async or writing an alternative that uses async could happen in one place.

An async alternative to the multithreaded sources would be welcome in CPython as well. What the multithreaded sources are doing is essentially building async functions by hand (so it would work in Python 2). @nsmith- built an Uproot subset based entirely on Python 3 async, and if I remember right, it worked better in high-latency network environments. Oh, I just noticed that the Python asyncio package is not available for Pyodide. That's odd—that's how I would have expected a JavaScript-like environment to work. (JavaScript is very asyncy!)

None of these things are holding back this PR. I'm just pointing out that if people are using this on Jupyter-Lite, there's a chance we'll see these other issues eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pyodide support for running in jupyter-lite browser-based notebooks
3 participants