feat: add pyodide support for jupyter-lite for files opened via HTTP #868

ioanaif · 2023-03-31T08:30:54Z

No description provided.

agoose77 · 2023-03-31T08:43:50Z

src/uproot/source/http.py

+        try:
+            shell = get_ipython().__class__.__name__
+            # True if running in a jupyter lite notebook
+            # False if running in a jupyter notebook ('ZMQInteractiveShell') or IPython ('TerminalInteractiveShell')
+            return shell == "Interpreter"
+        except NameError:
+            return False  # Python interpreter


I wonder whether we should use

Suggested change

-        try:
-            shell = get_ipython().__class__.__name__
-            # True if running in a jupyter lite notebook
-            # False if running in a jupyter notebook ('ZMQInteractiveShell') or IPython ('TerminalInteractiveShell')
-            return shell == "Interpreter"
-        except NameError:
-            return False  # Python interpreter
+        return sys.platform == 'emscripten'

This would not detect IPython (or notebooks), but I think we specifically care about the emscripten part more than IPython?

It would have the same result: the function would only return True if it is running in a jupyter lite environment.

Thank you for the suggestion. I replaced the function body with just this check, since it is less code.
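For reference, the check agreed on in this thread can be sketched as a small standalone function. This is a minimal illustration, not the code merged in the PR: the function name `running_in_emscripten` and its `platform` override argument are hypothetical and exist only to make the check easy to exercise outside a browser.

```python
import sys


def running_in_emscripten(platform=None):
    """Return True when the interpreter was built for Emscripten (e.g. Pyodide)."""
    # `platform` is a hypothetical override used only to make the check testable;
    # by default the real sys.platform value is inspected.
    if platform is None:
        platform = sys.platform
    return platform == "emscripten"


print(running_in_emscripten())
```

Unlike the `get_ipython()` approach, this does not distinguish a notebook from a plain interpreter; it only reports whether the platform is Emscripten, which is the property the thread above settles on.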


jpivarski

This is a fairly minimal change, and looks like a clean switch based on sys.platform == "emscripten". (I don't know how general that is, though you've already talked about it with @agoose77.)

This only modifies the HTTPSource, and it does so internally, in a way that users can't inspect. However, the choice it's making is between a single background thread (CPython) and a non-thread (Pyodide), which are roughly equivalent, just a concession for Pyodide not supporting threads.
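The choice described above, a single background thread on CPython versus no thread on Pyodide, can be illustrated with a rough sketch built only on the standard library. `InlineExecutor` and `make_executor` are hypothetical names invented for this example; they are not Uproot classes and do not reproduce how `HTTPSource` is implemented.

```python
import sys
from concurrent.futures import Future, ThreadPoolExecutor


class InlineExecutor:
    """Hypothetical non-threaded executor: runs each task immediately in the caller."""

    def submit(self, fn, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as err:
            # Deliver the error through the future, as a thread pool would.
            future.set_exception(err)
        return future

    def shutdown(self, wait=True):
        pass  # nothing to clean up: no threads were started


def make_executor(platform=sys.platform):
    # Emscripten builds cannot start threads, so run tasks inline there.
    if platform == "emscripten":
        return InlineExecutor()
    return ThreadPoolExecutor(max_workers=1)  # a single background thread elsewhere


executor = make_executor()
future = executor.submit(sum, [1, 2, 3])
print(future.result())
executor.shutdown()
```

Because both executors hand back a `Future`, calling code does not need to know which one it received, which is why the two options can be treated as roughly equivalent.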

What about when the HTTP server doesn't support multi-part GET requests? In that case, HTTPSource falls back to MultithreadedHTTPSource (the _fallback attribute of HTTPSource). That launches $n$ workers to distribute the load of requesting a lot of byte ranges for TBaskets when an HTTP server can only respond to one contiguous byte range per request. Presumably, that would have to be serialized into one ResourceTrivialExecutor and become much slower, or maybe async could be used, but that's a more radical change to the MultithreadedSource, and should be a separate PR, if at all.

This happens automatically when an HTTP server doesn't support multi-part GET, but it can also be forced manually by setting the http_handler in uproot.open.
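To make the difference between the two request styles concrete, here is a small sketch of the HTTP `Range` headers involved. The helper names are hypothetical, and the `(start, stop)` pairs are assumed to use an exclusive `stop`; only the header syntax itself (`bytes=first-last`, inclusive, comma-separated for multiple ranges) comes from the HTTP standard.

```python
def multipart_range_header(ranges):
    """Build one Range header covering every (start, stop) pair; stop is exclusive."""
    # HTTP byte ranges are inclusive on both ends, hence stop - 1.
    parts = ",".join(f"{start}-{stop - 1}" for start, stop in ranges)
    return {"Range": f"bytes={parts}"}


def single_range_headers(ranges):
    """Fallback: one header (and so one request) per contiguous byte range."""
    return [{"Range": f"bytes={start}-{stop - 1}"} for start, stop in ranges]


ranges = [(0, 100), (300, 450), (1000, 1024)]
print(multipart_range_header(ranges))  # a single multi-part request
print(single_range_headers(ranges))    # one request per range
```

With a server that only answers one range per request, the number of round trips grows with the number of byte ranges, which is why running them one after another without worker threads would be noticeably slower.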

We're also not addressing local files, but I saw that you tried and had some trouble with this. As I understand from @agoose77, that part of Pyodide is in flux and maybe we should wait for it to settle down. I highly doubt that memory-mapping will work (the default), and the alternative is MultithreadedFileSource. The MultithreadedHTTPSource and MultithreadedFileSource (and MultithreadedXRootDSource) share a lot of code through a superclass, so converting it to use async or writing an alternative that uses async could happen in one place.

An async alternative to the multithreaded sources would be welcome in CPython as well. What the multithreaded sources are doing is essentially building async functions by hand (so it would work in Python 2). @nsmith- built an Uproot subset based entirely on Python 3 async, and if I remember right, it worked better in high-latency network environments. Oh, I just noticed that the Python asyncio package is not available for Pyodide. That's odd—that's how I would have expected a JavaScript-like environment to work. (JavaScript is very asyncy!)
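As a rough illustration of what an async alternative could look like in CPython, the sketch below requests several byte ranges concurrently with `asyncio.gather`. Everything here is hypothetical: `REMOTE` is an in-memory stand-in for a remote file, `fetch_range` simulates latency with `asyncio.sleep` rather than making a network request, and none of it reflects Uproot's sources or the async-based subset mentioned above.

```python
import asyncio

# Hypothetical in-memory "remote file"; a real source would issue network requests.
REMOTE = bytes(range(256)) * 4


async def fetch_range(start, stop, latency=0.01):
    """Pretend to request bytes [start, stop) from a high-latency server."""
    await asyncio.sleep(latency)  # stands in for the network round trip
    return REMOTE[start:stop]


async def fetch_all(ranges):
    # All requests are in flight at once, so the total wait is roughly one
    # latency period rather than one per range, and no worker threads are needed.
    return await asyncio.gather(*(fetch_range(a, b) for a, b in ranges))


chunks = asyncio.run(fetch_all([(0, 4), (10, 12), (256, 259)]))
print(chunks)
```

`asyncio.gather` returns results in the order the ranges were passed in, so the caller can match each chunk to its byte range without extra bookkeeping.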

None of these things are holding back this PR. I'm just pointing out that if people are using this on Jupyter-Lite, there's a chance we'll see these other issues eventually.

feat: add pyodide support for jupyter-lite

0fef4e3

ioanaif linked an issue Mar 31, 2023 that may be closed by this pull request

Pyodide support for running in jupyter-lite browser-based notebooks #854

Closed

agoose77 reviewed Mar 31, 2023


ioanaif and others added 3 commits March 31, 2023 10:57

Use sys.platform to check if the code is run in a jupyter lite env.

07da4c8

[pre-commit.ci] auto fixes from pre-commit.com hooks

498e11d

for more information, see https://pre-commit.ci

Add comment to explain the check for platform=emscripten.

a7f3e5d

ioanaif changed the title ~~feat: add pyodide support for jupyter-lite~~ feat: add pyodide support for jupyter-lite for files opened via HTTP Mar 31, 2023

ioanaif requested a review from jpivarski March 31, 2023 10:02

jpivarski approved these changes Mar 31, 2023


ioanaif merged commit 04e7e2b into main Apr 3, 2023

ioanaif deleted the ioanaif/add-support-for-jupyter-lite-854 branch April 3, 2023 07:29

ioanaif mentioned this pull request Jul 27, 2023

Pyodide support for running in jupyter-lite browser-based notebooks #854

Closed

agoose77 mentioned this pull request Sep 13, 2023

fix: add minimimal emscripten support via non-HTTP sources #956

Merged
