Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Reading DIALS refl Files #78

Open
kmdalton opened this issue Aug 18, 2021 · 4 comments
Open

Add Support for Reading DIALS refl Files #78

kmdalton opened this issue Aug 18, 2021 · 4 comments
Assignees
Labels
API Interface Decisions MTZDtypes Issues related to custom dtypes wishlist New feature or request for the future

Comments

@kmdalton
Copy link
Member

As we discussed extensively on the DIALS Slack channel, it is now relatively easy to parse DIALS .refl files without cctbx/DIALS. Newer versions of DIALS encode reflection tables using msgpack which seems a relatively innocuous dependency to add.

To this end @ndevenish has built a parser that decodes refl tables using numpy. It's nearly complete but may be missing column types. We can find a full list of types in this block. It should be easy to build this into the rs.io submodule as I've done here for example.

There remains the issue of DIALS reflection tables potentially containing some fairly exotic objects (shoeboxes, vectors, matrices). The safest (sadly slowest) thing to do for a first pass is to just default them to objects. We can think about clever solutions later.

Parsing legacy pickle based reflection tables is an open question. For the time being, I think we just can't support them. @ndevenish suggests looking here for clues though.

@JBGreisman, let's chat about this early next week and get it up and running. I think this is already mostly there!

@kmdalton kmdalton added the enhancement Improvement to existing feature label Aug 18, 2021
@kmdalton
Copy link
Member Author

@PrinceWalnut , you may want to help here too.

@ndevenish
Copy link

I've updated it to handle the missing types, and a basic pytest test (it can be run refl_loader.py --write-test inside a cctbx environment to write it's test file). std::string is also supposed to be handled by the msgpack writer but is rather broken - dials/dials#1858 - so am pretty sure that's not "in the wild" anywhere.

@kmdalton
Copy link
Member Author

wacky. we'll certainly be skipping any string columns for now at least...

@JBGreisman
Copy link
Member

This looks like a pretty solid start. There are a few columns that can be stored in a DataFrame/DataSet, but cannot be written to an MTZ file. I think a decision we will have to make is whether those columns should be skipped, or whether they should be parsed and included in the DataSet.

It is always possible to add new dtypes if that would improve the behavior of anything (for strings, pandas already has us covered: StringDtype). It is also possible to add multidimensional numpy arrays as columns, but they seem to really end up more as "lists of arrays"

@JBGreisman JBGreisman added API Interface Decisions wishlist New feature or request for the future MTZDtypes Issues related to custom dtypes and removed enhancement Improvement to existing feature labels Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Interface Decisions MTZDtypes Issues related to custom dtypes wishlist New feature or request for the future
Projects
None yet
Development

No branches or pull requests

4 participants