
ENH: linear interpolation of complex values in lib.interp #6872

Merged: 1 commit merged into numpy:master on May 13, 2016

Conversation

@pec27 (Contributor) commented Dec 21, 2015

I wanted to be able to do linear interpolation between complex values, i.e.

y = np.interp(mu, theta, m)

rather than

y = np.interp(mu, theta, m.real) + 1.0j*np.interp(mu, theta, m.imag)

This could be achieved with just a wrapper around the above, detecting the complex case, but I thought I would reduce the doubled cache misses by covering the complex case in compiled_base.c.
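
For concreteness, a small self-contained check of the equivalence described above; this is not part of the PR itself, it assumes a numpy version that includes this change (1.12 or later), and it uses made-up sample data:

import numpy as np

# Illustrative data: points on the unit circle, sampled at 50 angles.
theta = np.linspace(0.0, 2 * np.pi, 50)
m = np.cos(theta) + 1.0j * np.sin(theta)
mu = np.linspace(0.1, 6.0, 200)

# Single-call complex interpolation (what this PR enables)...
y_complex = np.interp(mu, theta, m)

# ...agrees with interpolating the real and imaginary parts separately.
y_split = np.interp(mu, theta, m.real) + 1.0j * np.interp(mu, theta, m.imag)

assert np.allclose(y_complex, y_split)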

This is my first pull request, please be gentle :-).

@pec27 (Contributor, Author) commented Dec 22, 2015

Choosing complex vs. real interpolation via cast attempts on fp (using PyArray_ContiguousFromAny) was not catching the cast error and was failing on some of the builds. I switched to passing a flag into compiled_interp to choose the behaviour.

@charris added this to the 1.12.0 release milestone (Apr 16, 2016)
@ahaldane (Member) commented May 7, 2016

Thanks @pec27 for the PR, and sorry it's taken a while to get to it.

It looks like this gives a speedup of about 2x over the simple version you pasted, which is pretty good for a function I suspect people call repeatedly. I tested the PR with:

N = 100000
mu = np.arange(N) + np.random.rand(N)
theta = np.arange(N)
m = np.random.rand(N) + 1.0j*np.random.rand(N)
%timeit np.interp(mu, theta, m)
100 loops, best of 3: 2.7 ms per loop
%timeit np.interp(mu, theta, m.real) + 1.0j*np.interp(mu, theta, m.imag)
100 loops, best of 3: 5.61 ms per loop

The code looks correct; however, arr_interp seems a bit too big, and it feels like there is duplicated code. Plus, this code only generalizes for DOUBLE and CDOUBLE, but misses FLOAT16, FLOAT32 and the CFLOAT equivalents.

What I think would be the "ideal" version of this PR is a little more complicated than what you have done, but I don't think it would be too hard to do if you are interested, and I am happy to give pointers. It involves using numpy's "repeat" preprocessing, which is essentially a big macro system: it makes repeated copies of a block of code while varying the data types.

To get an idea of what this means, look at PR #7464, which generalizes arr_bincount using repeat, or at the use of repeat in the file numpy/core/src/multiarray/arraytypes.c.src.

So essentially your work would be to take the two code paths you've already written, but instead of splitting them up using if statements you would split them up using some combination of switch and repeat (I think).

@pec27 (Contributor, Author) commented May 7, 2016

Thanks @ahaldane for looking over this! Suggestions on improving code re-use are always welcome; I imagine I may have missed some tricks with the numpy type system. Having said that, my intention with this PR was to extend interpolation from the reals to the complex field whilst introducing no regressions, i.e. the interpolation precision is still float64 if you are doing interpolation with real values (which is also the precision used for the components of the complex numbers).

I can see the use-case of allowing different precisions, e.g. float16, float32, etc. for interpolation, but users were potentially relying both on the accuracy and on the return type being float64. There is also potential ambiguity about what to do if, say, x.dtype is float32 and fp.dtype is float16 (or even one or both being an integer type: should integer interpolation of an integer be an integer? I think not, but others may disagree). Given the potential regressions, perhaps handling different precisions should go in a separate PR?

@ahaldane (Member) commented May 7, 2016

That's a good point; users might be expecting double for the result. I think the ambiguity problems you mentioned can be avoided, but I'll drop that idea for now.

In the case where we only support double and cdouble, I would still like to think about whether there is a tidier way of writing this up. I feel like arr_interp has become long and overly nested.

There are two alternate possibilities I am thinking about (besides leaving it as it is):

  1. Have two functions, arr_interp and arr_interp_complex which are very similar, but use different types and slope calculations. There would be a bunch of duplicate code, but since it's just one function maybe it's OK. Anyway the code is already mostly divided into two big if-statements.
  2. Use repeat to duplicate the code, but only repeating over DOUBLE and CDOUBLE. On the plus side we reduce code duplication and can easily extend to more types later, on the downside code using repeat can look a bit spaghetti-ish and there's more mental overhead to it.

Or maybe there is a better way of tidying it using appropriate helper functions?

I am leaning towards option 1 now; it is probably the easiest to do. Once it is written up in that form, it shouldn't be too much work if we ever want to adapt or generalize it in the future. Does that seem reasonable?

@pec27 (Contributor, Author) commented May 10, 2016

Option 1 is pretty straightforward; as you point out, the code is basically just switched in one big if statement, so two functions is probably a little easier to read. I'll update the PR and you can see what you think. If we had demand for a third case then I'd definitely prefer something like repeat, but as it stands, perfect is probably the enemy of good and all that.

@ahaldane (Member) commented:

Yeah, I'm happier with how this looks. Let's go with two functions.

Some things to fix:

  • Always use spaces instead of tabs for indentation. It looks to me like, apart from that, your code is in the correct style (discussed in the numpy C style guide and the links to the style-guide PEPs in HOWTO DOCUMENT).
  • At some point it would be good to clean up the commits by squashing/reordering as you see fit, and the commit messages should be structured according to the commit guidelines. That is, the first line should start with "ENH:" or similar, then a blank line, then a short description.
  • Just for ease of reading, it would be nice to leave arr_interp completely unchanged, unless you think the changes make it cleaner.
  • In the python interp function, I think we might get rid of some of the nested-if structure by first doing
if np.iscomplexobj(fp):
    interp_func = compiled_interp_complex
    input_dtype = np.complex128
else:
    interp_func = compiled_interp
    input_dtype = np.float64

and then using interp_func and input_dtype appropriately in the rest of the function (a rough sketch follows at the end of this comment). Incidentally, I am not sure why the call to fp = np.asarray(fp, dtype=np.complex128) is necessary; perhaps we can remove it and avoid needing input_dtype at all.

After that I think we are pretty much done.
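
A rough sketch of how that dispatch could be used in the rest of the Python-level wrapper; this is not the merged numpy source: the function name interp_sketch is made up, period handling is omitted, and the internal imports follow the diff discussed later in this thread.

import numpy as np
from numpy.core.multiarray import interp as compiled_interp
from numpy.core.multiarray import interp_complex as compiled_interp_complex

def interp_sketch(x, xp, fp, left=None, right=None):
    # Choose the compiled routine and the working precision up front.
    if np.iscomplexobj(fp):
        interp_func = compiled_interp_complex
        input_dtype = np.complex128
    else:
        interp_func = compiled_interp
        input_dtype = np.float64

    # Casting here keeps the precision decision in one place; whether the
    # explicit dtype is needed at all is the open question raised above.
    fp = np.asarray(fp, dtype=input_dtype)
    return interp_func(x, xp, fp, left, right)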

@mhvk (Contributor) commented May 10, 2016

The cast to an array is needed to ensure that the sorting with an index array a bit further down works. I must admit I also do not understand why the dtype needs to be given, though.

p.s. How I wish all these np.asarray calls were removed or at least turned into np.asanyarray; they make subclasses so hard to get right, whereas if the code had just relied on duck typing it would "just work". Anyway, that's outside the scope of this PR (which is great!).

@charris added the "56 - Needs Release Note" label (needs an entry in doc/release/upcoming_changes) on May 10, 2016
@charris (Member) commented May 10, 2016

This looks to be getting towards the end. Could you add a note to doc/release/1.12.0-notes.rst?

@ahaldane (Member) commented May 10, 2016

@mhvk @pec27 Maybe at the very top of the function we can just do fp = np.asarray(fp). That's also a slight optimization since it avoids the cost of the call to asarray inside iscomplexobj for non-array inputs.
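
A minimal sketch of just that suggestion, assuming the same internal compiled_interp / compiled_interp_complex names as in the diff further down and a made-up helper name; this is hypothetical, not the actual numpy source.

import numpy as np
from numpy.core.multiarray import interp as compiled_interp
from numpy.core.multiarray import interp_complex as compiled_interp_complex

def _choose_interp_func(fp):
    # Convert once up front; iscomplexobj then sees an ndarray, so it does
    # not call asarray again for non-array inputs, and the later index-array
    # sorting mentioned by @mhvk still works on the converted array.
    fp = np.asarray(fp)
    if np.iscomplexobj(fp):
        return fp, compiled_interp_complex
    return fp, compiled_interp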

Inline review comment (Member) on the code context:

    Py_XDECREF(ax);
    Py_XDECREF(af);
    return NULL;
}

Also, add a blank line here between functions.

@pec27 (Contributor, Author) commented May 11, 2016

Thanks @ahaldane @mhvk, I think I got all those changes in. @ahaldane: I'm not sure about squashing commits; is that still OK given they have already been pushed to GitHub? Commit d371eb01509a3e632554c89ebe9323619a122449 was supposed to update and squash them, but then of course I had to merge back in...

@charris is the release note request for me? Do I have access to that file?

@ahaldane (Member) commented May 11, 2016

@pec27 you're right that changing history (e.g. by squashing commits) is a bad idea if anyone has already pulled your code, but in this case nobody has. When making a PR it is acceptable to rewrite the history of your copy before it is merged into our copy.

Here's how I usually do it: After I've finished modifying the code, I do

git rebase -i HEAD~6

to squash/amend the commits I want. You need to change the "6" to the number of previous commits you want to rebase over. After you squash it will ask you to rewrite the commit messages (which you should do). Then I do

git push https://github.com/ahaldane/numpy --force

to push to my GitHub fork; --force is necessary to overwrite the old history. See about-git-rebase or look at the git docs for more info, and feel free to ask if you have trouble.

@ahaldane (Member) commented May 11, 2016

Also, @charris indeed did ask you to add a release note to that file (it's in the numpy file tree). I think it would go under "new features", and you can just copy the style of the other features already there.

Also, I don't think you've removed all the tabs yet, and it would be nice to undo any changes to arr_interp - right now it looks like you've moved some code around in that function.

@pec27 (Contributor, Author) commented May 11, 2016

Ah, I think I removed all the tabs and then merged them back in. Doh. I will fix and recommit.

@ahaldane (Member) commented:

Oh and one more comment: Choosing how to squash/amend your commits is up to your judgement in general. In this case the changes are self contained enough that I would be satisfied if you simply squashed everything into a single commit, for example.

@pec27 (Contributor, Author) commented May 11, 2016

Thanks for the tips on rebasing. About doc/release/1.12.0-notes.rst: I guess I checked out before that file was created; should I create a new file with just my release note?

@pv (Member) commented May 11, 2016 via email

@charris (Member) commented May 11, 2016

@pec27 To get the notes file, make sure your master branch is up to date, then check out your branch and run git rebase master. That should replay your commits on top of current master and bring in the file for editing.

@@ -24,7 +24,7 @@
 from numpy.lib.twodim_base import diag
 from .utils import deprecate
 from numpy.core.multiarray import _insert, add_docstring
-from numpy.core.multiarray import digitize, bincount, interp as compiled_interp
+from numpy.core.multiarray import digitize, bincount, interp as compiled_interp, interp_complex as compiled_interp_complex
Review comment from @ahaldane (Member):

This line is too long; it needs to be shortened like:

from numpy.core.multiarray import (
    digitize, bincount, interp as compiled_interp, 
    interp_complex as compiled_interp_complex
    )

@ahaldane (Member) commented May 11, 2016

@pv thanks, I didn't know that. @pec27, if you don't squash yourself, I can use that.

Besides the line note I just added and the release note, it looks good to me.

@pec27 (Contributor, Author) commented May 12, 2016

Hi @charris, @ahaldane, I rebased onto master but GitHub doesn't seem happy: it thinks I am now trying to PR 693 commits. I'm not quite sure how to tell my GitHub fork that I have rebased from upstream. Any advice from wiser heads is welcome! If it helps, I have so few changes that I should be able to re-apply everything manually...

@charris (Member) commented May 12, 2016

Hmm, I have no idea what you did, although it looks like you did a merge in there somewhere. The first thing to do is to recover to something sane. For that, run git reflog, which will give you a list of hashes associated with the actions you have taken. Find a good spot to return to, and then run git reset --hard <hash>. You can also use HEAD@{n} instead of the hash, whichever you prefer. That will get you back to an earlier starting point.

@pec27 (Contributor, Author) commented May 12, 2016

Thanks @charris, that gets me back to somewhere sane. I'll try the rebase again tomorrow with fresh eyes.

@pec27 force-pushed the complex-interp branch 2 times, most recently from ccfeb6d to eaaa7e8 (May 12, 2016 20:32)
@pec27 force-pushed the complex-interp branch 3 times, most recently from 333ee7c to 67295a0 (May 12, 2016 20:57)
Squashed commit message:

lib.interp function now allows interpolation of complex fp with
complex128 precision (i.e. equivalent to lib.interp on the real and
imaginary parts). Tests are added for the non-periodic and periodic
cases.
@pec27 (Contributor, Author) commented May 12, 2016

The rebase went smoothly today; I blame bit rot :-). I think I have covered everything (release note, squashed commit, numpy style conventions, etc.). @ahaldane, could you take a look to see if it is pull-able?

@ahaldane (Member) commented:

Looks nice and clean to me now!

Thanks a lot for the PR @pec27, I'm going to go ahead and merge.

@ahaldane merged commit f1a3631 into numpy:master on May 13, 2016