Add Tiled data sources to browse Bluesky runs #4106

Open
padraic-shafer opened this issue Apr 10, 2024 · 8 comments

Comments

@padraic-shafer

There has been a recent burst of interest and activity in integrating bluesky/tiled data sources into silx (and by extension into pyMCA). I'm summarizing some of those discussions here to get feedback from silx developers.

Background

Several light sources are looking into using pyMCA as a browser of data collected during Bluesky runs. From discussions with @linupi @vasole @t20100 it was suggested that modifying silx to accept a Tiled data catalog would be an elegant way to do this for pyMCA, silx-view, and any other apps depending on silx.

@t20100 has started a proof-of-concept branch that shows a pathway for adapting a Tiled Container to a HDF5-like interface.
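As a rough illustration of what "adapting a container to an HDF5-like interface" can mean, here is a minimal sketch that wraps a nested mapping in group and dataset objects with h5py-style item access. Plain dicts and lists stand in for Tiled nodes; the class names `GroupAdapter` and `DatasetAdapter` are hypothetical and are not taken from the proof-of-concept branch.

```python
from collections.abc import Mapping


class DatasetAdapter:
    """Leaf node exposing h5py-like `shape` and `[()]` access (hypothetical)."""

    def __init__(self, name, values):
        self.name = name
        self._values = list(values)

    @property
    def shape(self):
        return (len(self._values),)

    def __getitem__(self, key):
        # h5py convention: dataset[()] reads the whole dataset.
        if key == ():
            return list(self._values)
        return self._values[key]


class GroupAdapter(Mapping):
    """Read-only container node that behaves like an h5py Group (hypothetical)."""

    def __init__(self, name, node):
        self.name = name
        self._node = node  # stand-in for a Tiled container

    def __getitem__(self, key):
        # Accept "a/b/c" style paths, resolving one level at a time.
        head, _, rest = key.strip("/").partition("/")
        child = self._node[head]
        child_name = f"{self.name.rstrip('/')}/{head}"
        if isinstance(child, Mapping):
            wrapped = GroupAdapter(child_name, child)
        else:
            wrapped = DatasetAdapter(child_name, child)
        if rest:
            if not isinstance(wrapped, GroupAdapter):
                raise KeyError(key)
            return wrapped[rest]
        return wrapped

    def __iter__(self):
        return iter(self._node)

    def __len__(self):
        return len(self._node)


# A table is represented as a group of 1-dimensional columns.
run = {"primary": {"motor": [0.0, 0.5, 1.0], "det": [10, 12, 9]}}
root = GroupAdapter("/", run)
det = root["primary/det"]
```

Code written against `root` only needs mapping-style access plus `shape` and indexing on the leaves, which is the part of the HDF5 abstraction a viewer typically relies on.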


Preliminary scope (to be refined)

Discussion on 2024-03-26 @whs92 @danielballan @AbbyGi @vshekar @padraic-shafer [...missing handles for more BESSY-II participants]

During a chat between several developers at NSLS-II and BESSY-II, we recognized a common interest in using pyMCA as a "bluesky-supported" visual explorer of Tiled datasets for beamline experimenters. We identified several preliminary goals for a development sprint.

  1. Connect to a tiled server over HTTP -- Accept a URL; handle Auth
  2. Browse contents, with ability to filter and sort
    • Should identify bluesky runs
    • Will likely need a per-endstation configuration of metadata "projections" (flattened subset of important metadata)
  3. View baseline data for selected run(s)
  4. Plot scan data using existing plot tools
    • Use hinted data by default
    • User can assign "any" channel to a plot axis
  5. "Live plot" of data being captured
    • More than one bluesky run may be active at once (nested scans)
    • Initially target a polling loop of ~1 second
    • Leave a path open to tiled-stream / websocket
    • Must be able to resume viewing a scan-in-progress if client restarts
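To make the idea of per-endstation metadata "projections" in item 2 a bit more concrete, here is a minimal sketch that flattens selected nested metadata fields into one row per run. The dotted-path format, the `project` and `is_bluesky_run` helpers, and the sample metadata are all assumptions for illustration. In particular, treating "has a `start` document with a `uid`" as the marker of a Bluesky run is a guess, not something settled in the discussion.

```python
def project(metadata, projection):
    """Flatten selected nested metadata into {column: value}.

    `projection` maps a column name to a dotted path into `metadata`.
    Missing fields become None instead of raising.
    """
    row = {}
    for column, path in projection.items():
        value = metadata
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                value = None
                break
            value = value[part]
        row[column] = value
    return row


def is_bluesky_run(metadata):
    # Assumption: a Bluesky run carries a "start" document with a "uid".
    start = metadata.get("start")
    return isinstance(start, dict) and "uid" in start


# Hypothetical per-endstation configuration.
PROJECTION = {
    "scan_id": "start.scan_id",
    "plan": "start.plan_name",
    "exit_status": "stop.exit_status",
}

sample = {
    "start": {"uid": "abc123", "scan_id": 42, "plan_name": "scan"},
    "stop": {"exit_status": "success"},
}
row = project(sample, PROJECTION)
in_progress = project({"start": {"uid": "def456", "scan_id": 43}}, PROJECTION)
```

A run that is still in progress has no stop document yet, so its `exit_status` column comes back as `None` rather than causing an error.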

Refined goals

Discussion on 2024-04-09 @whs92 @danielballan @AbbyGi @vshekar @padraic-shafer

  1. Use an isolated "Open" dialog or similar entrypoint that can cope with paginated access to large Catalogs, generating a smaller dataset (a tiled client with filters applied) that can be passed down to the rest of the silx/PyMca stack. This can also hand down authentication state.
  2. Fit Tiled nodes into HDF5 abstraction up to some limit (~1000). Tabular data from Tiled is just a Group of 1-dimensional arrays.
  3. Focus on 'primary' and 'baseline' streams to start, with an eye on "tab per stream" and whether that fits.
  4. Have a switch for polling live data. (This can later be refactored to use websockets, once Tiled supports that.)
  5. Ensure HTTP I/O does not lock up or crash the app.
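The polling switch in item 4 could look roughly like the sketch below: a poller that remembers how many rows it has already seen and only asks for newer ones. The `LivePoller` class and the `fetch_rows(offset)` callable are hypothetical; a list stands in for the server, and in a GUI `poll_once` would be driven by a timer of about one second.

```python
class LivePoller:
    """Poll a growing table, fetching only rows not seen yet (hypothetical)."""

    def __init__(self, fetch_rows, start_at=0):
        self._fetch_rows = fetch_rows  # callable(offset) -> list of new rows
        self.offset = start_at         # number of rows already delivered
        self.enabled = False           # the "switch" for live polling

    def poll_once(self):
        if not self.enabled:
            return []
        rows = self._fetch_rows(self.offset)
        self.offset += len(rows)
        return rows


# A list stands in for the server-side stream of a run in progress.
server_rows = [1, 2, 3]


def fetch(offset):
    return server_rows[offset:]


poller = LivePoller(fetch)
while_disabled = poller.poll_once()   # switch is off, nothing is fetched
poller.enabled = True
first_batch = poller.poll_once()
server_rows.append(4)                 # the scan produces another point
second_batch = poller.poll_once()

# A restarted client has no saved offset, so it starts from 0 and
# catches up on everything recorded so far.
restarted = LivePoller(fetch)
restarted.enabled = True
catch_up = restarted.poll_once()
```

Because the only state is a row offset, swapping the timer-driven `poll_once` for a websocket callback later would not change how consumers receive rows.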
@padraic-shafer
Author

@vasole @t20100 @linupi
We haven't yet been able to find a suitable time for all of us to meet live, and it sounds like it might be a couple of weeks until that's possible, so: what do you think about this approach? Do you foresee particular difficulties or incompatibilities in fitting this into the architecture of silx?

@t20100
Member

t20100 commented Apr 11, 2024

Hi,

Thanks for the summary!

For the silx part, it makes sense to me and the proof-of-concept was very simple to implement. However, I still have a shallow understanding of tiled.
For now my main concern would be point 5 "Ensure HTTP I/O does not lock up or crash the app." since the hdf5-like API and silx view are built around synchronous access to the data, and I'm not convinced this is easy to change.

@t20100
Member

t20100 commented Apr 11, 2024

BTW, you might want to have a look at h5web, a web-based HDF5 data viewer my colleagues @axelboc and @loichuder developed and maintain. It is available as a JupyterLab extension, a VSCode extension and powers HDF5 online viewing of the ESRF "data portal" and the https://myhdf5.hdfgroup.org/ online viewer (thanks to h5wasm).
This again aims at supporting HDF5 files, but access to the data is abstracted through Providers (for now there are three: one for the HDF Group's HSDS server, one for h5wasm, and one for h5grove, our small server tailored for h5web), so there may be a way to adapt it to tiled.
As opposed to silx view, it's natively asynchronous.

@danielballan

Thanks @t20100. I agree that the blocking I/O sounds like the hard part. We may have to live with synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues.
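One way to read "timeouts return control to the user" is sketched below: the blocking call runs in a worker thread and the caller waits for a bounded time only. This is a minimal sketch using the standard library; `call_with_timeout` and `slow_read` are hypothetical names, and `slow_read` merely simulates a stalled connection.

```python
import concurrent.futures
import time

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def call_with_timeout(func, *args, timeout=5.0, default=None):
    """Run a blocking call, but stop waiting for it after `timeout` seconds."""
    future = _executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The worker thread keeps running; we only stop waiting for it.
        return default


def slow_read():
    # Stand-in for an HTTP request to an unresponsive server.
    time.sleep(0.5)
    return "data"


timed_out = call_with_timeout(slow_read, timeout=0.05, default="timed out")
completed = call_with_timeout(lambda: 42)
```

Note that the caller still blocks for up to `timeout` seconds, and the abandoned request is not cancelled, so the HTTP client should be given its own connect and read timeouts as well.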

Adding an h5web Provider for Tiled is also interesting. This has been on our radar since we opened an Issue in Tiled in September 2021. It might be about time to do it. One perhaps unique capability this could add is the ability to view specfiles, TIFFs, and other formats, which Tiled can serve through a unified HDF5-ish abstraction.

I think PyMca is serving a particular cluster of requirements though, so we would pursue this in addition to PyMca integration.

@t20100
Member

t20100 commented Apr 12, 2024

"We may have to live with a synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues."

Sounds good to me.

@t20100
Member

t20100 commented May 3, 2024

I just made some updates to the silx branch with basic tiled support and opened PR #4121.

Compared to the previous poc version:

  • The tiled: prefix no longer works (changed to tiled- to avoid URL parsing issues). This is not compatible with the current support in pymca (Recover spech5 functionality vasole/pymca#1074)
  • However, this prefix is no longer needed and should be removed IMO
  • There is a way to limit the number of retrieved entries per container.
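For readers wondering how a per-container limit interacts with paginated access, here is a minimal sketch that is not taken from PR #4121. A generator stands in for paginated HTTP requests, and `itertools.islice` stops consuming it once the limit is reached, so later pages are never requested. The names `iter_pages` and `limited_keys` are hypothetical.

```python
from itertools import islice


def iter_pages(total, page_size, log):
    """Stand-in for paginated HTTP requests; records each page it fetches."""
    for start in range(0, total, page_size):
        log.append(start)  # one simulated request per page
        stop = min(start + page_size, total)
        yield from (f"entry_{i}" for i in range(start, stop))


def limited_keys(key_iter, max_entries):
    """Return at most `max_entries` keys, consuming the iterator lazily."""
    return list(islice(key_iter, max_entries))


requested_pages = []
keys = limited_keys(iter_pages(10_000, 100, requested_pages), max_entries=250)
```

With 10,000 entries, a page size of 100 and a limit of 250, only the first three pages are requested.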

Feedback welcome!

@vasole
Member

vasole commented May 4, 2024

Just a comment: if the prefix is removed, it would also simplify things on the PyMca side, because I had already planned to handle URLs exclusively via the silx abstraction.

@t20100
Member

t20100 commented May 13, 2024

tiled- prefix removed.
Also reworked TiledDataset to inherit directly from commonh5.Dataset and added a tile cache.
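For context, a tile cache of this kind can be as simple as a small least-recently-used store keyed by tile index. The sketch below is a generic illustration and not the implementation in the silx branch; `TileCache`, `fetch_tile` and `max_tiles` are hypothetical names.

```python
from collections import OrderedDict


class TileCache:
    """Least-recently-used cache of tiles keyed by tile index (hypothetical)."""

    def __init__(self, fetch_tile, max_tiles=64):
        self._fetch_tile = fetch_tile  # callable(index) -> tile data
        self._max_tiles = max_tiles
        self._tiles = OrderedDict()

    def get(self, index):
        if index in self._tiles:
            # Cache hit: mark the tile as most recently used.
            self._tiles.move_to_end(index)
            return self._tiles[index]
        tile = self._fetch_tile(index)
        self._tiles[index] = tile
        if len(self._tiles) > self._max_tiles:
            # Evict the least recently used tile.
            self._tiles.popitem(last=False)
        return tile


requests = []


def fetch_tile(index):
    requests.append(index)  # stand-in for one HTTP request
    return f"tile-{index}"


cache = TileCache(fetch_tile, max_tiles=2)
cache.get(0)
cache.get(1)
cache.get(0)          # hit, no new request
cache.get(2)          # evicts tile 1, the least recently used
refetched = cache.get(1)
```

Repeated reads of the same region are served from memory, while `max_tiles` bounds memory use when a user scrolls through a large dataset.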
