ENH: add `inverse` parameter to numpy.in1d(). #3005

Merged
merged 1 commit into from Apr 9, 2013

Conversation

Projects
None yet
3 participants
@jphalip
Contributor

jphalip commented Feb 20, 2013

This is equivalent to doing a 'not in' operation in a way that is faster than doing np.invert(in1d(a, b)).

If you think that adding a new notin1d() function would be better than adding this new inverse parameter to in1d(), let me know and I can update the patch (although that will probably require duplicating a lot of code from in1d()).

Thanks!

@charris

This comment has been minimized.

Show comment Hide comment
@charris

charris Mar 1, 2013

Owner

Looks generally good, but inverse seems a bit off. Maybe complement would be a better choice.

This needs to be discussed on the list because it adds an argument, but I may have missed that.

Owner

charris commented Mar 1, 2013

Looks generally good, but inverse seems a bit off. Maybe complement would be a better choice.

This needs to be discussed on the list because it adds an argument, but I may have missed that.

@njsmith

This comment has been minimized.

Show comment Hide comment
@njsmith

njsmith Mar 1, 2013

Owner

Is this actually faster than doing
c = in1d(a, b)
invert(c, out=c)
? How much?
On 1 Mar 2013 03:30, "Charles Harris" notifications@github.com wrote:

Looks generally good, but inverse seems a bit off. Maybe complement would
be a better choice.

This needs to be discussed on the list because it adds an argument, but I
may have missed that.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/3005#issuecomment-14271624
.

Owner

njsmith commented Mar 1, 2013

Is this actually faster than doing
c = in1d(a, b)
invert(c, out=c)
? How much?
On 1 Mar 2013 03:30, "Charles Harris" notifications@github.com wrote:

Looks generally good, but inverse seems a bit off. Maybe complement would
be a better choice.

This needs to be discussed on the list because it adds an argument, but I
may have missed that.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/3005#issuecomment-14271624
.

@jphalip

This comment has been minimized.

Show comment Hide comment
@jphalip

jphalip Mar 6, 2013

Contributor

@njsmith I've written the following benchmark:

import timeit

setup = """
import numpy as np
a = np.array([2, 3, 6, 23, 34, 57, 68, 24, 35, 56, 96, 36, 47, 58, 42, 63, 89, 70])
b = np.array([5, 2, 57, 35, 36, 47, 67, 28, 67, 79, 53, 64, 42, 63, 95, 82])
"""
print min(timeit.Timer('c = np.in1d(a, b); np.invert(c, out=c)', setup=setup).repeat(10, 10000))
print min(timeit.Timer('c = np.in1d(a, b, inverse=True)', setup=setup).repeat(10, 10000))

The results:

$ python numpy-test.py 
1.13440489769
0.816557884216

When running the same script before applying the patch (and without the last line of the script), the result is:

$ python numpy-test.py 
1.13118910789

So my suggested patch does make the overall operation slightly faster (by avoiding to run the invert() function) without altering the performance of the previous behavior of the in1d() function. Do you consider this speed increase significant?

Contributor

jphalip commented Mar 6, 2013

@njsmith I've written the following benchmark:

import timeit

setup = """
import numpy as np
a = np.array([2, 3, 6, 23, 34, 57, 68, 24, 35, 56, 96, 36, 47, 58, 42, 63, 89, 70])
b = np.array([5, 2, 57, 35, 36, 47, 67, 28, 67, 79, 53, 64, 42, 63, 95, 82])
"""
print min(timeit.Timer('c = np.in1d(a, b); np.invert(c, out=c)', setup=setup).repeat(10, 10000))
print min(timeit.Timer('c = np.in1d(a, b, inverse=True)', setup=setup).repeat(10, 10000))

The results:

$ python numpy-test.py 
1.13440489769
0.816557884216

When running the same script before applying the patch (and without the last line of the script), the result is:

$ python numpy-test.py 
1.13118910789

So my suggested patch does make the overall operation slightly faster (by avoiding to run the invert() function) without altering the performance of the previous behavior of the in1d() function. Do you consider this speed increase significant?

@charris

This comment has been minimized.

Show comment Hide comment
@charris

charris Apr 3, 2013

Owner

I still don't like the inverse name, I think a verb, 'invert' or 'complement', would be better, or maybe an command return_inverse like in unique.

Owner

charris commented Apr 3, 2013

I still don't like the inverse name, I think a verb, 'invert' or 'complement', would be better, or maybe an command return_inverse like in unique.

@jphalip

This comment has been minimized.

Show comment Hide comment
@jphalip

jphalip Apr 8, 2013

Contributor

This parameter doesn't have the same behavior as unique's return_inverse, so to avoid any confusion I've opted for using the verb invert instead, like you've suggested. What do you think?

Thanks!

Contributor

jphalip commented Apr 8, 2013

This parameter doesn't have the same behavior as unique's return_inverse, so to avoid any confusion I've opted for using the verb invert instead, like you've suggested. What do you think?

Thanks!

@charris

View changes

numpy/lib/arraysetops.py
+ If True, the values in the returned array are inverted (that is,
+ False where an element of `ar1` is in `ar2` and True otherwise).
+ Default is False. ``np.in1d(a, b, invert=True)`` is equivalent
+ to (but is faster than) ``np.invert(in1d(a, b))``.

This comment has been minimized.

Show comment Hide comment
@charris

charris Apr 8, 2013

Owner

With blank lines above and below, put

        .. versionadded:: 1.8.0

If you grep the sources you can find lots of examples.

@charris

charris Apr 8, 2013

Owner

With blank lines above and below, put

        .. versionadded:: 1.8.0

If you grep the sources you can find lots of examples.

@charris

This comment has been minimized.

Show comment Hide comment
@charris

charris Apr 8, 2013

Owner

Looks good to me. Do the versionadded bit I commented on and add a note in the doc/release/1.8.0-notes.rst file that the option has been added. You can find examples in other release notes if you need them. Then this can go in.

Owner

charris commented Apr 8, 2013

Looks good to me. Do the versionadded bit I commented on and add a note in the doc/release/1.8.0-notes.rst file that the option has been added. You can find examples in other release notes if you need them. Then this can go in.

@jphalip

This comment has been minimized.

Show comment Hide comment
@jphalip

jphalip Apr 8, 2013

Contributor

Great, thanks for the feedback! I've just added the directive to the doc.

Contributor

jphalip commented Apr 8, 2013

Great, thanks for the feedback! I've just added the directive to the doc.

@charris

View changes

numpy/lib/arraysetops.py
@@ -295,6 +295,14 @@ def in1d(ar1, ar2, assume_unique=False):
If True, the input arrays are both assumed to be unique, which
can speed up the calculation. Default is False.
+ .. versionadded:: 1.8.0

This comment has been minimized.

Show comment Hide comment
@charris

charris Apr 8, 2013

Owner

Needs to be moved after the new argument and with an extra indent so it lines up with the explanation. See for instance the documentation of the axis parameter of polyder in numpy/polynomial/polynomial.

@charris

charris Apr 8, 2013

Owner

Needs to be moved after the new argument and with an extra indent so it lines up with the explanation. See for instance the documentation of the axis parameter of polyder in numpy/polynomial/polynomial.

@jphalip

This comment has been minimized.

Show comment Hide comment
@jphalip

jphalip Apr 9, 2013

Contributor

Here's another shot. Let me know if that works. Thanks!

Contributor

jphalip commented Apr 9, 2013

Here's another shot. Let me know if that works. Thanks!

@charris

This comment has been minimized.

Show comment Hide comment
@charris

charris Apr 9, 2013

Owner

Looks good. Are you going to add to the release notes?

Owner

charris commented Apr 9, 2013

Looks good. Are you going to add to the release notes?

@jphalip

This comment has been minimized.

Show comment Hide comment
@jphalip

jphalip Apr 9, 2013

Contributor

Release notes added!

Contributor

jphalip commented Apr 9, 2013

Release notes added!

@charris

This comment has been minimized.

Show comment Hide comment
@charris

charris Apr 9, 2013

Owner

Notes look good, but release notes have changed format in master. I can fix things, or you can try a rebase master and push origin --force <branch name>. You will probably get a merge error in the rebase and need to fix things. Which way do you want to go?

Owner

charris commented Apr 9, 2013

Notes look good, but release notes have changed format in master. I can fix things, or you can try a rebase master and push origin --force <branch name>. You will probably get a merge error in the rebase and need to fix things. Which way do you want to go?

@jphalip

This comment has been minimized.

Show comment Hide comment
@jphalip

jphalip Apr 9, 2013

Contributor

Ok, I'll try to rebase on origin/master and then squash all commits into one for cleanliness. Hang tight!

Contributor

jphalip commented Apr 9, 2013

Ok, I'll try to rebase on origin/master and then squash all commits into one for cleanliness. Hang tight!

@jphalip

This comment has been minimized.

Show comment Hide comment
@jphalip

jphalip Apr 9, 2013

Contributor

Ok, rebasing and squashing are done!

Contributor

jphalip commented Apr 9, 2013

Ok, rebasing and squashing are done!

charris added a commit that referenced this pull request Apr 9, 2013

Merge pull request #3005 from jphalip/in1d-inverse
ENH: add `inverse` parameter to numpy.in1d().

@charris charris merged commit 230db77 into numpy:master Apr 9, 2013

1 check passed

default The Travis build passed
Details
@charris

This comment has been minimized.

Show comment Hide comment
@charris

charris Apr 9, 2013

Owner

Pushed. Thanks.

Owner

charris commented Apr 9, 2013

Pushed. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment