Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: ENH: Add nd-support to trim_zeros #15181

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

lagru
Copy link
Contributor

@lagru lagru commented Dec 26, 2019

Add support for trimming nd-arrays with trim_zeros while preserving the old behavior for 1D input. The new parameter axis can specify a single dimension to be trimmed (reducing all other dimensions to the envelope of absolute values). If None or multiple values are specified, all or the selected dimensions are trimmed iteratively.

Additionally provide the atol, rtol and return_lengths parameters. The first two control what is considered a "zero" to be trimmed, the latter provides the user with the on how much was trimmed.

I recently needed this behavior myself and think this makes the function applicable to more use cases.

This is currently marked as work in progress, as I wasn't sure about the new API. I tried to preserve the old API but I think it's worth having a discussion about refactoring and additional API changes. See also the review-comments below.

Add support for trimming nd-arrays with trim_zeros while preserving the
old behavior for 1D input. The new parameter `axis` can specify a single
dimension to be trimmed (reducing all other dimensions to the envelope
of absolute values). If None or multiple values are specified, all or
the selected dimensions are trimmed iteratively.

This should make the function applicable to more use cases.

Additionally provide the `atol`, `rtol` and `return_lengths` parameters.
The first two control what is considered  a "zero" to be trimmed, the
latter provides the user with the on how much was trimmed.
def trim_zeros(
filt,
trim='fb',
axis=-1,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the proper default value here. None (trim all dimensions), 0 or 1 (trim first/last dimension) are candidates that would preserve the old behavior.

numpy/lib/function_base.py Outdated Show resolved Hide resolved
numpy/lib/function_base.py Outdated Show resolved Hide resolved
start = stop = absolutes.shape[current_axis]
if "f" not in trim:
# except when only the backside is to be sliced
stop = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what you mean here - are you saying to slice leaving only the front, or slice removing the front?

When asked to slice both ends of an all zero array, I think I'd expect either:

  • Only the back to be sliced
  • The front to be sliced if 'fb' is passed, and the back to be sliced if 'bf' is passed

Copy link
Contributor Author

@lagru lagru Dec 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I prefer the latter behavior and have modified the code accordingly.

The old function was really liberal with what it accepted as input for the trim parameter (e.g. only checking "f" in trim). I think restricting what is allowed to be passed in would make that parameter easier to understand and document.

Edit: But I've left it unrestricted for now.

In case the user passes in an all-zero array and string to `trim` that
starts with "b", the last dimension should be sliced first. In all other
cases the first dimension takes precedence.
It's easy to emulate this behavior by assigning zeros appropriately
beforehand.
@eric-wieser
Copy link
Member

An alternate spelling with vectorization:

nonzero = np.argwhere(filt)
if len(nonzero) == 0:
    if trim.startswith("b"):
        start = stop = np.zeros(filt.ndim)
    else:
        start = stop = np.array(filt.shape)
else:
    start = nonzero.min(axis=0)
    stop = nonzero.max(axis=0)

# TODO: make a function that just returns `start` and `stop`?

sl = tuple(
    slice(a, b) if ax in axis else slice(None)
    for ax, (a, b) in enumerate(zip(start, stop))
)
return filt[sl]

as it's own function. trim_zeros uses its output to newly support the
nd-case.
@lagru
Copy link
Contributor Author

lagru commented Jan 13, 2020

@eric-wieser Concerning the discussion in #15181 (comment) and your other suggestions: You seem to be in favor of adding a new function (e.g. arg_trim_zeros) that returns indices to the zero area. I've pushed a draft so that we can discuss that API...

@lagru
Copy link
Contributor Author

lagru commented Feb 8, 2020

@eric-wieser what do you think of the currently proposed API? I'll add missing tests if you're happy with it and give a notice (required I think?) on the mailing list.

An alternative to arg_trim_zeros would be to remove it but document its implementation as a lightweight example inside trim_zeros's Note section. It basically amounts to "use nonzero and call max, min on the appropriate dimension of the output".

Base automatically changed from master to main March 4, 2021 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Awaiting a code review
Development

Successfully merging this pull request may close these issues.

None yet

3 participants