Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: rename util functions #2316

Merged
merged 15 commits into from
Mar 16, 2023
Merged

refactor: rename util functions #2316

merged 15 commits into from
Mar 16, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Mar 16, 2023

My long-term plan (as discussed with Jim) is to move towards free symbols for function usage over ak._ imports.

The benefits of this are:

  • Easier use of type hints & improved readability
  • Separation of concerns / decoupling of code (otherwise imports break!)

The concrete motivations for this PR are that ak._util couples a lot of modules right now, making refactoring more fiddly than it needs to be.

Although the L1/L2 APIs will remain "REPL-friendly" (don't require many imports), the L3 API can be import-only. Gradually we are moving towards this approach, with this PR making a big jump. Eventually I'd like all of awkward to use imports internally.

I've also renamed functions to better reflect their purpose.

@agoose77 agoose77 requested a review from jpivarski March 16, 2023 12:29
@agoose77
Copy link
Collaborator Author

agoose77 commented Mar 16, 2023

@jpivarski to review this, I would benefit from the following considerations:

  • are you happy with the new filenames?
  • are you OK with the move towards imports?
  • would you prefer any additional organisational structure?
    • Right now, I've added a flat hierarchy of files. We could nest them if needed.
  • Are you OK with the fact that L2 operations modules contain some L3 symbols in the global namespace? Users can only see these if they do evaluate e.g. ak.operations.ak_fill_none.cpu, i.e. not from ak.cpuorak.operations.cpu`. If you're against this (which already happens), we could hide these inside the function definitions. 1

Footnotes

  1. personally I think it's already enough trouble to get to these symbols that it's not worth the extra hassle for us

@agoose77
Copy link
Collaborator Author

agoose77 commented Mar 16, 2023

@henryiii could I get a second opinion on the use of from ak.operations.ak_xxx import * here? I think it's better to control the exports in a single place (with __all__) rather than __all__ and having the ak.operations module also define these names with from ak.operations.ak_xxx import xxx.

Are there any significant downsides to this? Ruff and other linters already can't see ak.xxx symbols because they're already from awkward.operations import * imported. This just means our internal ak.operations functions are also not visible. However, in the long run I'm planning on making awkward itself use from mod import name for all names, not just L3 ones.

@agoose77
Copy link
Collaborator Author

@jpivarski with this PR, I'm introducing a new rule:

  • any symbol used by > 1 module should have a "public" name, i.e. direct_content vs _direct_content
  • L1/L2 modules should only contain public symbols that are exported

Under this rule, I plan to move the _merge_parameters functions out of form.py into some private module where they can become public symbols. I think this helps readability.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarize, the biggest thing that happened here is that _util.py has exploded into new, well-named, hidden modules, and functions that are only used in one module are defined in that module.

That's great!

I'm also happy that _util.py still exists. When we need to add a new helper function quickly (and know that it's not used in only one module), we'll need to find a place to put it and it's likely that it won't quite fit one of these names. _util.py can be a staging area for new helper functions, which may find another home later. Very general functions that are needed in ~all modules should continue to live in _util.py, because it's at one of the roots of the dependency DAG.

  • are you happy with the new filenames?

Yes, they're very clear.

  • are you OK with the move towards imports?

Yes, for any L3/L4, we can do that. It will make the dependencies between modules more clear.

  • would you prefer any additional organisational structure?
  • Right now, I've added a flat hierarchy of files. We could nest them if needed.

Flatness is usually easier to deal with when searching for a particular file. It only gets to be a problem when the list is very long.

  • Are you OK with the fact that L2 operations modules contain some L3 symbols in the global namespace? Users can only see these if they do evaluate e.g. ak.operations.ak_fill_none.cpu, i.e. not from ak.cpuorak.operations.cpu`. If you're against this (which already happens), we could hide these inside the function definitions.

It would be better to hide them, but maybe we should do so by hiding the operations module instead:

src/awkward/__init__.py
src/awkward/_operations/
src/awkward/_operations/ak_fill_none.py

and then the __init__.py does

from awkward._operations.ak_fill_none import *

with

__all__ = ("fill_none",)

in ak_fill_none.py.

This can be behind a deprecation cycle by adding

src/awkward/operations/
src/awkward/operations/__init__.py

that does

warn(deprecation_warning("this breaks in 2.3!"))

from awkward._operations import *

Then (unless someone uses hidden modules), there would be one place to get a function like ak.fill_none from—no ambiguity. Some tests don't respect that rule because they were written early, but that would be easy to find-and-replace.

@agoose77
Copy link
Collaborator Author

agoose77 commented Mar 16, 2023

It would be better to hide them, but maybe we should do so by hiding the operations module instead:

I like this idea.

_util.py can be a staging area for new helper functions,

Agreed!!

@agoose77 agoose77 enabled auto-merge (squash) March 16, 2023 22:45
@codecov
Copy link

codecov bot commented Mar 16, 2023

Codecov Report

Merging #2316 (63af956) into main (70d6bae) will increase coverage by 0.51%.
The diff coverage is 94.04%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_connect/rdataframe/from_rdataframe.py 0.00% <0.00%> (ø)
src/awkward/contents/unmaskedarray.py 72.35% <60.00%> (+0.11%) ⬆️
src/awkward/operations/ak_nan_to_none.py 38.09% <60.00%> (+10.31%) ⬆️
src/awkward/_regularize.py 63.04% <63.04%> (ø)
src/awkward/operations/ak_from_cupy.py 66.66% <66.66%> (+16.66%) ⬆️
src/awkward/forms/form.py 83.27% <73.68%> (-0.82%) ⬇️
src/awkward/contents/emptyarray.py 75.00% <75.00%> (+0.12%) ⬆️
src/awkward/contents/indexedarray.py 78.11% <80.00%> (+0.05%) ⬆️
src/awkward/operations/ak_unzip.py 96.15% <80.00%> (+0.50%) ⬆️
src/awkward/_layout.py 83.50% <83.50%> (ø)
... and 140 more

@agoose77 agoose77 temporarily deployed to docs-preview March 16, 2023 22:59 — with GitHub Actions Inactive
@agoose77 agoose77 merged commit 7ea0183 into main Mar 16, 2023
@agoose77 agoose77 deleted the agoose77/refactor-split-util branch March 16, 2023 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants