Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uproot.open(filename, "w") and uproot.open(filename, mode="w") fail #758

Open
HDembinski opened this issue Oct 18, 2022 · 5 comments
Open
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged

Comments

@HDembinski
Copy link
Member

HDembinski commented Oct 18, 2022

uproot 4.3.7

When opening a root file for writing, the call to uproot.open(filename, mode="w") fails with this error message

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [15], line 2
      1 with uproot.open("output.root", mode="w") as f:
----> 2     f.mktree("tree", {
      3         "px": "var * float32",
      4     })
      5     f["tree"].show()

AttributeError: 'ReadOnlyDirectory' object has no attribute 'mktree'

It looks like mode="w" is ignored. This is not pythonic. By providing uproot.open you are suggesting that you mimic the interface of the open command, which is used in several places in the Python stdlib (builtins open, gzip.open, lzma.open, ...). But then you should also mimics the signature of the stdlib version as much as possible. Calling uproot.open with mode="w" should forward to uproot.recreate.

uproot.open(filename, "w") fails with a different error, because the next positional argument after path is not mode, but object_cache. To fix this, mode should be added as the second argument after path or uproot.open should reject positional arguments after path with the signature open(path, *, object_cache=...)

@HDembinski HDembinski added the bug (unverified) The problem described would be a bug, but needs to be triaged label Oct 18, 2022
@HDembinski
Copy link
Member Author

HDembinski commented Oct 18, 2022

I labeled this as a bug from the API design perspective. At the very least, if you do not accept the keyword mode, then uproot.open should raise an error for every unknown keyword that is passed as **options.

@Moelf
Copy link
Collaborator

Moelf commented Oct 18, 2022

You're looking for up.recreate?

@jpivarski
Copy link
Member

I labeled this as a bug from the API design perspective. At the very least, if you do not accept the keyword mode, then uproot.open should raise an error for every unknown keyword that is passed as **options.

This is true. Unhandled options ought to raise an error, and I think it was originally designed to do that (always popping that dict and dealing with len != 0 at the end). What happened here is unintended behavior.


On handling mode, we have to make choices between being Python-like and being ROOT-like: see also the handling of cycle numbers in TDirectories. For opening files, there's such a vast difference between opening a ROOT file for reading and opening a ROOT file for writing, that we took the ROOT-like names for these, update and recreate, and made them separate functions instead of arguments to a single function.

To clarify "such a vast difference between opening a ROOT file for reading and opening a ROOT file for writing," I specifically mean ROOT files because of the way that they act as filesystems in a file. (The same is true of HDF5, and h5py also does not follow Python's open semantics; not exactly, anyway.) The "I" and the "O" implementations in Uproot have almost zero overlap, and so driving them with the same object would be inviting bugs: the object would have to remember whether it's a reader or a writer and never get these functionalities crossed. Thus, we have ReadOnlyDirectory and WritableDirectory, etc.

Although now that I think about it, uproot.open is just a function, not a class constructor. We could have it return different object types, depending on the value of a mode parameter:

  • mode="r" (default) would do what it currently does, returning a ReadOnlyDirectory,
  • mode="w" would effectively call uproot.recreate, returning a WritableDirectory, and
  • mode="a" (or mode="r+w"?) would effectively call uproot.update. However, I don't know whether this should be "a" or "r+w" because we're neither talking about appending nor overwriting: the access is higher-level than the stream of bytes (or Unicode code-points) that open usually returns.

On the third hand, I'm thinking this wouldn't be a good interface after all. It looks like Python's open, but uproot.open takes a lot more arguments and most of those arguments have no meaning for writable files. Adding this way of calling uproot.open to get uproot.recreate introduces ambiguity about what to do with the http_handler and such (writable files are not writable over a network).

Since readers and writers have such different feature sets, creating them with different functions more clearly separates the ways you can configure those different feature sets.

But maybe, in addition to raising errors on unrecognized options, we should recognize mode and return a helpful error message: like, if the user passes "w", we can say to use uproot.recreate. A user unfamiliar with ROOT wouldn't have guessed that name.

@HDembinski
Copy link
Member Author

HDembinski commented Oct 19, 2022

On the third hand

:-)

I'm thinking this wouldn't be a good interface after all. It looks like Python's open, but uproot.open takes a lot more arguments and most of those arguments have no meaning for writable files. Adding this way of calling uproot.open to get uproot.recreate introduces ambiguity about what to do with the http_handler and such (writable files are not writable over a network).

I can only disagree from my naive user perspective.

We want to avoid surprise, no? uproot.open looks at first glance like the familiar builtin open. Yes, I am ignoring all the complicated keywords that it has, which I never needed so far. You are telling me, that the familiar pattern that I learned - I can use open to read and write files with mode="r" and mode="w" - only works halfway. Now I have to remember that this familiar pattern does not work with uproot. The pattern is broken and that's extra mental load for me.

The Zen of Python says: There should be one-- and preferably only one --obvious way to do it. For a ROOT user, it may be obvious to call recreate, but using "recreate" for writing was already artificial and pattern breaking in the ROOT days. Using open for reading and writing was already established standard in C and C++ when ROOT was written, see e.g. https://cplusplus.com/reference/cstdio/fopen. Unfortunately, the ROOT authors ignored all established standards when designing ROOT.

In general, I believe that design should be guided - but not dictated - by implementation. The implementation follows the design, not the other way round. I think it is fair to raise an error in uproot.open when users try to pass options that don't work for writing files. I haven't looked into the details, maybe you are right and mixing all the options of recreate into open would be bad, but maybe you don't have to do that. Maybe you can have a basic uproot.open that works like the builtin open and covers 99% of simple use cases and a read or load (strawman name) which has all customization options specific for reading files. In the Python stdlib, they often offer an expert-level function with many options and full control, and a simple interface for the masses.

If that still does not convince you, then please use the signature uproot.open(path, *, ...) with the star so that positional arguments after path are an error and raise an error when the user passes mode as a keyword.

@jpivarski
Copy link
Member

On Uproot's open/recreate split: I believe this is a case that was guided by implementation. The separation between read-code and write-code is pretty stark; arguably, they could have been two libraries. (Uproot and Downroot?) Maybe the original problem was naming the entry function open, though Uproot existed for several years before it had any writing capabilities at all, or any plans to add them.

On ROOT's original TFile: I'm not convinced that a function that opens a filesystem-like database should follow the semantics of a byte- or codepoint-level open function; they're operating on very different levels of abstraction. Looking around at some libraries, I see that quite a few of them do, though mode="a" is a strange thing to think about when the interface is an unsorted set of key-value pairs. Maybe it means you can't del objects, but can add new ones.

If that still does not convince you, then please use the signature uproot.open(path, *, ...) with the star so that positional arguments after path are an error and raise an error when the user passes mode as a keyword.

We should consider marking some arguments as keyword-only across the whole codebase. It wasn't done before because it was written to be compatible with Python 2, but that support was removed a long time ago. Turning arguments into keyword-only arguments is a backward-incompatible change, but that's allowed before the release of version 5.0.0 in December.

And we should find out why extra arguments are not raising errors. I haven't forgotten that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged
Projects
None yet
Development

No branches or pull requests

3 participants