Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add one-hot special function or support broadcasting in signal.unit_impulse #20442

Closed
NeilGirdhar opened this issue Apr 10, 2024 · 9 comments

Comments

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Apr 10, 2024

The one-hot function is a useful function in machine learning. It would be really useful to have it in scipy.special, in particular, because of SciPy's commitment to support the Array API. Adding the function to scipy would mean that users of the Array API, and users of various machine learning libraries (pytorch, tensorflow, jax) could use the function.

I think this function is special because it is one of the only special functions not provided by scipy that I use in my Efax library, the others being abs_square (open issue in NumPy), softplus (being added to SciPy) and inverse_softplus (not proposed).

@NeilGirdhar NeilGirdhar added the enhancement A new feature or improvement label Apr 10, 2024
@NeilGirdhar NeilGirdhar changed the title ENH: Add one_hot ENH: Add the one-hot special function Apr 10, 2024
@ilayn
Copy link
Member

ilayn commented Apr 10, 2024

I don't want to re-ignite the discussion that we had over linalg.kron #20077 but this seems to me a better fit for libraries implementing themselves for two reasons;

  • Array creation and manipulations are fundamentally much more performant than us non-natively manipulating indices at Cython or Python level. For example NumPy, JAX and so on can do this during the array creation at the baremetal level.

  • Very often this is a sesqui-line operation or something like that

    inds, num_classes = [2, 6, 8], 10
    A = np.zeros([len(inds), num_classes])
    A[np.arange(len(inds)), inds] = 1

    and due to this fancy indexing (any high level indexing actually), this is bound to be thousands times slower than array libraries marking entries by 1. upon creation

Hence not quite sure why things are coming SciPy way. Array API in my understanding is not meant for adding all missing functions to SciPy but this should probably go into Array API spec. (That was my argument for kron too).

Also if we accept it, it should go to linalg.special_matrices

@NeilGirdhar
Copy link
Contributor Author

Array creation and manipulations are fundamentally much more performant than us non-natively manipulating indices at Cython or Python level.

I'm not sure if it was clear from the issue text, but the reason I'm proposing it for scipy is because of the Array API. Once it takes hold, for my library to be Array API compliant, it can no longer use the Jax library code. I will be forced to either write my own functions or use an Array API library. Since I'm already using SciPy, and since SciPy intends to port most of its special functions to the Array API, I thought it made sense to ask for the function to be added here.

A = np.zeros([len(inds), num_classes])
A[np.arange(len(inds)), inds] = 1

As far as I can tell, "fancy indexing" is not yet supported by the Array API. See discussions:
data-apis/array-api#177
data-apis/array-api#410
data-apis/array-api#669

Array API in my understanding is not meant for adding all missing functions to SciPy but this should probably go into Array API spec.

The Array API people have consistently said that these kinds of library functions belong outside the Array API. That may change one day, but for now, I didn't think it was worth even trying to get them to accept this.

Also if we accept it, it should go to linalg.special_matrices

Okay interesting, didn't know about this, thanks. Now that I think about it, I could do this with unit_impulse if unit-impulse supported broadcasting in idx--not sure if that would be more popular.

@ilayn
Copy link
Member

ilayn commented Apr 10, 2024

Once it takes hold, for my library to be Array API compliant, it can no longer use the Jax library code.

No it does. You can use exactly what we are doing with first getting the array namespace and check if it has linalg in it and push it there. SciPy is not the lynchpin of anything. This is something Array API folks really need to make it crystal clear in my opinion. SciPy should not (and could not) be the only library providing anything that is missing from the spec.

As far as I can tell, "fancy indexing" is not yet supported by the Array API.

Then honestly, I don't know how we can implement anything on our side without indexing. There must be more to that discussion but let's wait for other opinions first before I appear like I represent all SciPy folks.

@ev-br
Copy link
Member

ev-br commented Apr 10, 2024

This is something Array API folks really need to make it crystal clear in my opinion. SciPy should not (and could not) be the only library providing anything that is missing from the spec.

+1.
When it's in the spec, scipy may implement it, not the other way around.

Until it is the array API spec, we'd need a different argument as to why it's worth including in scipy.

@NeilGirdhar
Copy link
Contributor Author

No it does. You can use exactly what we are doing with first getting the array namespace and check if it has linalg in it and push it there.

Sorry, I'm not sure what you mean here. I can't check for one_hot in the namespace because it won't be there. I guess I guess I could switch on is_jax_array and other related functions, and then call the appropriate function. But that switch is exactly what I was hoping would be in SciPy.

When it's in the spec, scipy may implement it,

Sorry, I think we're talking past each other here. If it were in the spec, I wouldn't need SciPy to implement it since it would be in the Array API namespace, and I could call it there. Or else do you expect a special function extension to be specified one day?

Already, SciPy has a broad variety of useful functions (including the special functions) that are not in the spec, but will support the Array API. This suggestion is about adding one more useful function. As for why it's useful, it comes up quite a bit in machine learning.


What about making unit_impulse support broadcasting, and making it work with the Array API?

@ilayn
Copy link
Member

ilayn commented Apr 10, 2024

Yes, let me be a bit more specific. So this Array API is made for all supporting libraries right? Not just SciPy. That means all participating libraries should have some sort of a mechanism to distinguish the namespace. We do it typically with code like this (taken from #19260) ;

    xp = array_namespace(a, b)

    # Our own implementation if it is a NumPy array
    if is_numpy(xp):
        return _solve(a, b, lower=lower, overwrite_a=overwrite_a,
                      overwrite_b=overwrite_b, assume_a=assume_a, transposed=transposed)
    unsupported_args = {
        'lower': lower,
        'assume_a': assume_a != 'gen',
        'transposed': transposed
    }
    # something else is handled here 
    if any(unsupported_args.values()):
        xp_unsupported_args(unsupported_args)
    if hasattr(xp, 'linalg'):
        dtype = xp.result_type(a, b, xp.float32)
        a = xp.astype(a, dtype)
        b = xp.astype(b, dtype)
        return xp.linalg.solve(a, b)  # Array agnostic linalg.solve whatever the array type is 
    # object library does not support linalg then back to numpy
    a = np.asarray(a)
    b = np.asarray(b)
    return xp.asarray(_solve(a, b))

This is a mechanism all participating libraries have to do apparently because otherwise it's just SciPy that can do this. That is what we mean. So we are trying to support Array API but if something is not there, it shouldn't be automatically, "it should go to SciPy". That is I am trying to convey.

@lucascolley
Copy link
Member

lucascolley commented Apr 10, 2024

Or else do you expect a special function extension to be specified one day?

See data-apis/array-api#725.

I don't think that the question of whether this could be included in SciPy is unreasonable. For context @NeilGirdhar , see #19404. @steppi and @izaid are working on making the current implementations in scipy.special more portable (mainly for CUDA right now) via C++, with a vision to breaking the module out into a separate package which could see wider use. The goal (as far as I understand) would be for a subset of this module to materialise in the spec as an extension module, and array libraries would have a reference implementation in C++ which they could use or take inspiration from.


Already, SciPy has a broad variety of useful functions (including the special functions) that are not in the spec, but will support the Array API. This suggestion is about adding one more useful function.

The concern from SciPy's side is valid - we can't have "one more useful function" turn into "one more useful function" x 50 unfortunately.


This is a mechanism all participating libraries have to do apparently because otherwise it's just SciPy that can do this. That is what we mean. So we are trying to support Array API but if something is not there, it shouldn't be automatically, "it should go to SciPy". That is I am trying to convey.

Indeed, @ilayn is correct here that from the array library interoperability perspective, it doesn't matter whether such a function belongs in SciPy or in a downstream package. The real question is whether this best belongs within the scope of scipy.linalg.special_matrices, within the scope of scipy.special as it undergoes its current transformation, or downstream (or perhaps upstream, in an extension of the array API spec).

@lucascolley lucascolley changed the title ENH: Add the one-hot special function ENH: add one-hot special function or support broadcasting in signal.unit_impulse Apr 10, 2024
@NeilGirdhar
Copy link
Contributor Author

Makes sense. Didn't know about the special functions extension! But I guess from this discussion, this wouldn't belong there. Not sure where it belongs.

@izaid
Copy link
Contributor

izaid commented Apr 10, 2024

Makes sense. Didn't know about the special functions extension! But I guess from this discussion, this wouldn't belong there. Not sure where it belongs.

Yeah, it's a good question. I think this is really something that should be implemented at the Python level, essentially using what @ilayn put above. Pretty sure it could be done with the array API as it is now, including broadcasting. But where to put it, who knows!

@lucascolley lucascolley reopened this May 25, 2024
@lucascolley lucascolley closed this as not planned Won't fix, can't repro, duplicate, stale May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants
@izaid @NeilGirdhar @ilayn @ev-br @lucascolley and others