Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add dask_to_root #1085

Merged
merged 28 commits into from Jan 23, 2024
Merged

feat: add dask_to_root #1085

merged 28 commits into from Jan 23, 2024

Conversation

zbilodea
Copy link
Collaborator

No description provided.

@lgray
Copy link
Contributor

lgray commented Jan 19, 2024

Just a stylistic comment but I think the function would be better situated within uproot as uproot.dask.to_root, thoughts?

I think it would match dask-awkward and awkward better in that regard.

@jpivarski

@jpivarski
Copy link
Member

You mean the function would be an attribute of another function, uproot.dask?

I've gone back and forth on that—there used to be an ak.to_parquet.dataset for converting already-written Parquet files into a dataset (the directory with metadata convention). But by not being in the same level of hierarchy as the rest of the ak.* functions, it was hard to discover.

I understand the logic of writing-from-Dask being associated with reading-from-Dask. Uproot doesn't already have the flat bundle of functions that Awkward Array has, so there's less to maintain. But still, locating functions as attributes on other functions is unusual enough in Python as to be an antipattern—if someone is looking for it, they're unlikely to think of looking there.

Perhaps it should just be a naming convention: uproot.dask_write? From a typing point of view, that only turns a dot into an underscore, but from an organizational point of view, there's no nesting of functions. (I don't think it needs "to ROOT" because this is Uproot—it's only ever going to read from or write to ROOT.)

tests/test_dask_to_root.py Outdated Show resolved Hide resolved
pyproject.toml Outdated
"typing_extensions>=4.1.0; python_version < \"3.11\""
"typing_extensions>=4.1.0; python_version < \"3.11\"",
"dask >=2023.04.0",
"dask-awkward>=2023.12.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dask shouldn't be a top level dependency of the uproot package? @jpivarski

I suppose below we should import dask within dask_write such that it complains when dask is not available, but doesn't impede more standard functioning of uproot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct: dask and dask-awkward should not be strict dependencies for Uproot. These two lines should be removed.

@lgray
Copy link
Contributor

lgray commented Jan 19, 2024

@jpivarski ok that naming scheme makes sense, and I am fine with it!

@jpivarski
Copy link
Member

In Uproot and Awkward, name the tests by PR number:

tests/test_1085_dask_to_root.py

@lgray
Copy link
Contributor

lgray commented Jan 19, 2024

OK I checked it with nanoevents, repartitioning, concatenating, all using a distributed Client and it seems to be robust. It touches data correctly and fills the output TTree well.

Looks good to me :-)

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good; just be sure to

  • remove version.py from this PR, now that it's auto-generated
  • remove dask and dask-awkward from the list of dependencies

After that, request a review from me and this will be mergeable.

src/uproot/version.py Outdated Show resolved Hide resolved
pyproject.toml Outdated
"typing_extensions>=4.1.0; python_version < \"3.11\""
"typing_extensions>=4.1.0; python_version < \"3.11\"",
"dask >=2023.04.0",
"dask-awkward>=2023.12.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct: dask and dask-awkward should not be strict dependencies for Uproot. These two lines should be removed.

tests/test_dask_to_root.py Outdated Show resolved Hide resolved
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove the 7 output files before merging:

src/uproot/my-output/*.root

But, after they're removed (as well as the if __name__ == "__main__" part), this is ready to merge!

tests/test_1085_dask_write.py Outdated Show resolved Hide resolved
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind; I was looking at the wrong diffs. The temporary output files have already been removed.

tests/test_1085_dask_write.py Outdated Show resolved Hide resolved
zbilodea and others added 2 commits January 22, 2024 21:18
Co-authored-by: Jim Pivarski <jpivarski@users.noreply.github.com>
Co-authored-by: Jim Pivarski <jpivarski@users.noreply.github.com>
@zbilodea zbilodea merged commit 24c8487 into main Jan 23, 2024
21 checks passed
@zbilodea zbilodea deleted the feat-add-dask-to-root branch January 23, 2024 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants