New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in1d speed up (Trac #1603) #2199
Comments
Attachment added by trac user nhmc on 2010-09-02: timings.png |
Attachment added by trac user nhmc on 2010-09-02: setmember.py |
Attachment added by trac user nhmc on 2010-09-02: patch.diff |
@rc wrote on 2010-09-03 Hi Neil, the patch looks good to me (very nice timing plot!), thanks for writing it. However I cannot try to run it or apply it right now as I am just installing myself on a stay abroad. I may be able to do it next week, but somebody else could commit it IMHO faster :). |
@rc wrote on 2010-09-21 ok, so I have tested it now, all tests pass. I just wonder, whether the test_in1d() tests in numpy/lib/tests/test_arraysetops.py test both branches, i.e. the old branch and the new one for the case when len(ar2) is very small? |
trac user nhmc wrote on 2010-09-22 Yes the tests should be updated. The masked array version of in1d could also be changed. I'll try to take a look at these over the weekend. |
Attachment added by trac user nhmc on 2010-11-06: patch2.diff |
trac user nhmc wrote on 2010-11-06 Ok, I've attached a 2nd patch, patch2.diff, that adds new tests which check both paths through the new in1d(). All tests are ok for me on Python 3.1 and 2.6 after applying the patch. I've left the masked array versions of in1d unchanged. Someone more familiar with the masked array code could tweak those if they'd like to spend time on it. |
Original ticket http://projects.scipy.org/numpy/ticket/1603 on 2010-09-02 by trac user nhmc, assigned to unknown.
Robert Kern mentioned that in1d(ar1, ar2) can be slow for the case when len(ar2) is very small:
http://article.gmane.org/gmane.comp.python.numeric.general/40077
I've attached a script that compares timings for the existing version of in1d and the kern_in function described in the thread above; I use this to work out which algorithm is fastest for given lengths of ar1 and ar2 (see also the attached plot).
Also attached is a patch that changes in1d to use the kern_in algorithm when it results in a speed up. I think the speedup, which can be > 10x for very large ar1 and very small ar2, is worth the minor increase in code complexity.
The text was updated successfully, but these errors were encountered: