-
-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Discussion: Use of masked arrays internally #31008
Copy link
Copy link
Open
Labels
Description
Chatting with @timhoffm in #31004 (comment), I'd like to more widely discuss our use of masked arrays in internal code. My position is that they add a lot of maintenance headache and fragility, and we would be best off excising them from the codebase altogether.
Currently we have 185 counts of np.ma in the core library. Most of them are pass-through checks, or are immediately .filled() with nans.
Problems
- Internal APIs do not usually document whether they expect or can handle masked arrays. It's easy to miss needing to handle the masked array case, contributors may not be familiar with them, and numpy functions still work so we can fail silently
- Lots of code clutter and specific paths needed to handle masked arrays
- Poor test coverage of masked array inputs/outputs
- Performance degradation from passing around extra data and not being able to rely on more efficient array operations.
Note also thata = np.ma.masked_array(a, mask=mask).filled()is slower thana = np.where(mask, np.nan, a)is slower thana[mask] = np.nan.
Proposal
- Replace all instances of masked arrays with float arrays filled with NaNs, or in instances where the underlying data needs to be preserved (eg when clipping) then use an explicitly defined boolean mask variable
- Sanitize all user inputs to pure numpy arrays and do not return masked arrays
- Document a dev policy to minimize internal use of masked arrays, and to never return them
Migration issues
cbook.safe_masked_invalid()is used a few places and causes some other functions to return a masked array. I haven't traced this all the way through to see where those come out- All
colors.Normalizesubclasses preserve input masks in their output (this is the only docstring I see with a masked array output documented for the public API)
Deprecation warnings might be difficult to raise without spamming regular uses of these functions, since we can't tell when users use the mask on returned arrays. We may need to accept some silent breakage and just make the release notes clear.
Reactions are currently unavailable