
Add constant function to ufunc? #13577

Closed
hanzckernel opened this issue May 16, 2019 · 7 comments

@hanzckernel

hanzckernel commented May 16, 2019

I was wondering if there is a reason that the constant function is not included among the built-in universal functions. (Or am I just blind?)

This causes a problem when we want a function to broadcast and keep the dimensions of its inputs, because functions like lambda x: 5 and lambda x, y: y return a result that does not have the broadcast shape.

Currently this can only be achieved with np.vectorize or np.frompyfunc, neither of which is very efficient. np.full or np.full_like certainly does the trick, but then one needs to pass arguments like x.shape, which becomes a problem when these constant or projection functions are rewritten later.

With a built-in constant function, keepdims could default to False, and those who want to force broadcasting could set it to True.

If leaving the constant function out of the ufuncs was a deliberate design decision, can someone enlighten me as to why?
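
The shape loss described in this comment can be seen in a minimal sketch. The array x and the names const and vconst are illustrative assumptions, not code from the thread:

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)

# A plain Python constant function ignores the shape of its input.
const = lambda x: 5
print(np.shape(const(x)))        # ()

# np.vectorize keeps the input shape, but calls the Python function per element.
vconst = np.vectorize(const)
print(vconst(x).shape)           # (2, 3)

# np.full_like keeps the shape without a Python-level loop,
# but the constant has to be passed separately from the function.
print(np.full_like(x, 5).shape)  # (2, 3)
```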

@mhvk
Contributor

mhvk commented May 16, 2019

@hanzckernel - I'm a bit confused about what you want to achieve precisely. In what way is full_like insufficient?

@hanzckernel
Author

hanzckernel commented May 16, 2019

@mhvk Sorry for not making it clear. The problem arises when others may change my function from a constant function to a non-constant one (or the other way around), which creates a great mess. For instance, if I have:

def f(x, y): return 5 * x

there might be cases where f is x-homogeneous, and when I then need to use the result of f(arr_x, arr_y) for other things (say, to fill an array), this leads to a dimension mismatch.

I actually encountered this problem while writing methods for a class, where I do not know whether a given method is a function like f(x, y) above. Moreover, those who inherit from my class in a subclass might override the methods with some g(x) or h(y). As I feed the results into sparse matrices, this problem might go unnoticed until a much later phase.

As mentioned, np.vectorize forces the dimensions to carry through, as other universal functions do, and it causes no problem when inheriting in a subclass (by creating a metaclass). But np.vectorize is not efficient, and I really hope the ufuncs can include a constant function for the sake of efficiency.

Also, from a math perspective, since the ufuncs already include all the commonplace functions, I don't see why the constant function should be left out.

@seberg
Member

seberg commented May 16, 2019

np.broadcast_arrays(...)[1].copy("K") would be pretty much exactly that (although it will not cast to a common dtype)? The issue I see with the constant function is that we would need multiple versions for both arguments?
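
A rough sketch of what this suggestion does, assuming a 2-D float array x and a scalar value standing in for the output of a constant function (both names are illustrative):

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)
value = 5  # stands in for the result of a constant function

# Index 1 of the result is `value` broadcast to the shape of x.
# It is a view with zero strides, so no data is duplicated yet.
view = np.broadcast_arrays(x, value)[1]
print(view.shape, view.strides)  # (2, 3) (0, 0)

# copy("K") materialises the view as a real array with its own memory.
out = view.copy("K")
print(out.shape)                 # (2, 3)

# As noted above, there is no cast to a common dtype:
# `out` keeps the integer dtype of `value`, not the float dtype of x.
print(out.dtype.kind)            # i
```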

@hanzckernel
Author

hanzckernel commented May 16, 2019

@seberg Thanks a lot! That seems to be the best solution I have, short of writing a custom ufunc from scratch.

After digging a bit into performance:

import numpy as np
# f is assumed to be defined elsewhere as a one-argument function, e.g. a constant one
arr = np.arange(1, 2, 0.0001).reshape(10, -1)
def master_f(x): return np.broadcast_arrays(x, f(x))[1].copy('K')
def master_f_nocopy(x): return np.broadcast_arrays(x, f(x))[1]
%timeit arr + 1               # IPython magic; this takes about 10 microseconds
%timeit master_f(arr)         # this takes about 40 microseconds
%timeit master_f_nocopy(arr)  # this takes about 20 microseconds

But it is still a great improvement over plain np.vectorize.

Follow-up question: is there any reason this broadcasting is not built into np.vectorize? As for the dtype, I suppose it makes more sense to use the dtype of the function's output? Are there other problems regarding multiple versions for the arguments? I hope you can explain a bit.

Nonetheless, I shall consider this case closed. My thanks again.

@hanzckernel
Author

@seberg And just for completeness, np.broadcast_arrays(*args, f(*args))[-1] would be more generic, as f might take multiple arguments.
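
The generic form could be wrapped up roughly as follows. The decorator name force_broadcast and the functions g and h are hypothetical and only serve to illustrate the expression above:

```python
import numpy as np

def force_broadcast(f):
    """Hypothetical decorator: return f's result broadcast against all inputs."""
    def wrapper(*args):
        # The last entry is f's output, broadcast to the common shape of
        # all inputs and the output itself; copy("K") gives it its own memory.
        return np.broadcast_arrays(*args, f(*args))[-1].copy("K")
    return wrapper

@force_broadcast
def g(x, y):
    return 5 * x  # ignores y

@force_broadcast
def h(x, y):
    return 5      # ignores both arguments

x = np.arange(3.0)           # shape (3,)
y = np.arange(2.0)[:, None]  # shape (2, 1)
print(g(x, y).shape)  # (2, 3)
print(h(x, y).shape)  # (2, 3)
```

Without the decorator, g would return shape (3,) and h a bare scalar, so a subclass that swaps one for the other would change the output shape.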

@seberg
Member

seberg commented May 16, 2019

np.vectorize should broadcast the inputs, since it creates a ufunc. broadcast_arrays will have a lot of Python overhead. You could probably get around it if you really wanted (e.g. using np.nditer magic directly), but it will not be as readable. I had the [1] just because I was not sure you always want the last argument.

You could use np.result_type() or similar to approximate the ufunc behaviour for the output dtype if you care about it...

Frankly, I am not sure that answered all your questions?
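
One way the np.result_type() idea could look in practice is sketched below. The helper name broadcast_result is an assumption made for illustration, and it only approximates how a ufunc would choose its output dtype:

```python
import numpy as np

def broadcast_result(f, *args):
    """Hypothetical helper: broadcast f(*args) to the inputs' shape and common dtype."""
    result = np.asarray(f(*args))
    # Common shape of all inputs and the function's output.
    shape = np.broadcast(*args, result).shape
    # Approximate the ufunc-style output dtype from inputs and output.
    dtype = np.result_type(*args, result)
    out = np.empty(shape, dtype=dtype)
    out[...] = result  # the assignment broadcasts and casts
    return out

x = np.arange(6.0).reshape(2, 3)
out = broadcast_result(lambda x: 5, x)
print(out.shape, out.dtype)  # (2, 3) float64
```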

@hanzckernel
Author

@seberg Thanks, that's really helpful. My original intention was merely to write a thin wrapper for my functions to force them to broadcast, while maintaining readability. That was why I dug into ufuncs in the first place, as my first thought was to create a subclass to override the _numpy_ufunc_reduce behaviour, as the numpy wiki suggests.

But after all it was just me being ignorant.

Moreover, though broadcast_arrays is more Pythonic, it turns out to be more efficient than np.nditer, the latter of which I implemented following the recipe in the documentation.


3 participants