Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

np.pad(..., mode="reflect") on an empty array goes into an infinite loop #9560

Closed
BT123 opened this issue Aug 15, 2017 · 13 comments
Closed

np.pad(..., mode="reflect") on an empty array goes into an infinite loop #9560

BT123 opened this issue Aug 15, 2017 · 13 comments
Labels

Comments

@BT123
Copy link

BT123 commented Aug 15, 2017

When I test librosa library for speech recognition, I found a bug in librosa.stft and librosa.pad_center function which used numpy.pad function.

In my test, it will stick into infinite loop ,once accepted a empty list or ndarray, the numpy.pad function will stick into infinite loop.

Although the comment says that the function need a rank 1 array, but I think it still need to check the input array to avoid the above problem.

Look forward to your reply!

@njsmith
Copy link
Member

njsmith commented Aug 15, 2017

Can you give an example of a numpy.pad call that produces an infinite loop? Ideally something we can run by cutting and pasting :-)

@BT123
Copy link
Author

BT123 commented Aug 15, 2017

like this :
In [1]: import numpy as np

In [2]: mm = np.ndarray(0)

In [3]: mm
Out[3]: array([], dtype=float64)

In [4]: np.pad(mm,100,'reflect')

@BT123
Copy link
Author

BT123 commented Aug 15, 2017

And I print the parameters to a text file, so that we can easily confirm it.

This is the part of the text file:
iter_b: -1
iter_a: -1
before: 101
after: 101
safe_pad: -3
safe_0: -1

iter_b: -3
iter_a: -3
before: 104
after: 104
safe_pad: -9
safe_0: -1

iter_b: -9
iter_a: -9
before: 113
after: 113
safe_pad: -27
safe_0: -1

iter_b: -27
iter_a: -27
before: 140
after: 140
safe_pad: -81
safe_0: -1

iter_b: -81
iter_a: -81
before: 221
after: 221
safe_pad: -243
safe_0: -1

iter_b: -243
iter_a: -243
before: 464
after: 464
safe_pad: -729
safe_0: -1

In the text file, the safe_0 is the initial value of the safe_pad, and others are easily to know which parameters they represent.

@njsmith
Copy link
Member

njsmith commented Aug 15, 2017

When given a zero-dimensional array (also known as a scalar), pad seems to return the input unchanged:

In [1]: np.pad(np.array(0), 100, "reflect")
Out[1]: array(0)

The docs say that pad adds padding along each axis, and a scalar has zero axes, so returning the input unchanged looks correct to me.

Maybe this is a bug in how librosa is expecting pad to work? @bmcfee?

@BT123
Copy link
Author

BT123 commented Aug 15, 2017

The bug is happened when give a empty list or numpy.ndarray

ll=[]
or
mm=numpy.ndarray(0)

@njsmith
Copy link
Member

njsmith commented Aug 15, 2017

Whoops, I misread, sorry. Yeah, I can confirm that np.pad([], 100, 'reflect') never returns. It initializes safe_pad to the array's size - 1, which makes it negative, and then everything falls apart.

I think what's going on is that if you have, say, an array with 10 elements and you ask for 20 elements of reflected padding... it can't actually do that directly, because there are only 10 elements there to reflect. (And in fact you really only have 9, because the reflection of [1, 2, 3] is [1, 2, 3, 2, 1], you don't repeat the edge element.) So it tries to first reflect 9, to get an array with 19 elements, and then it tries again to reflect the remaining 11 elements, and this time it succeeds.

If you have an empty array, though, then... what does reflected padding even mean? There aren't any elements to reflect. I think the only thing to do in this case is to detect it and raise an error.

@BT123
Copy link
Author

BT123 commented Aug 15, 2017

Yeah, I read the comment and learn more deeply from your explaination!

When I use librosa.load function to read an audio file, it return an empty array in some cases, and this will cause the python process stuck but don't tell me the wrong place. And it takes some hours to find it.

Thank you!

@njsmith njsmith changed the title missing input validation np.pad(..., mode="reflect") on an empty array goes into an infinite loop Aug 15, 2017
@bmcfee
Copy link

bmcfee commented Aug 15, 2017

Maybe this is a bug in how librosa is expecting pad to work? @bmcfee?

Interesting. If it's a bug in librosa, I think it's going to be down to not checking for non-empty input, rather than violating expectations of np.pad output. I hadn't considered what happens when calling pad on scalars, but your explanation (iterating over an empty list of axes) makes sense.

I'll tack the array shape check onto our audio validation issue librosa/librosa#612 , which should result in early failure for these cases.

@BT123
Copy link
Author

BT123 commented Aug 15, 2017

Yes, I find this bug when I tested librosa library last week.

When I use librosa.load function to load an special constructed audio file, it returns a empty array. And in the stft function, it will pass the check of the util.valid_audio function and return true.

Thanks for your reply! @bmcfee

@sandrotosi
Copy link
Contributor

hey there! do you plan to cut a point release with this fix? it has currently assigned a CVE (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-12852) so it seems like the right way to go with an upstream release instead of just patching it for debian

@njsmith
Copy link
Member

njsmith commented Sep 5, 2017

@sandrotosi: AFAIK "assigned a CVE" just means "someone filed for a CVE number", it isn't a judgment of severity. Does Debian consider this high priority? It's theoretically possible that there's some piece of network-facing code out there that calls np.pad on untrusted input, but I'm not aware of any...?

@sandrotosi
Copy link
Contributor

i know i know, but still someone took the time to request a CVE and open a request to get this fixed in Debian - nothing urgent, i think the risk is rather low, still i would consider to include the fix in any upcoming release

@rgommers
Copy link
Member

Because I was pointed at this today, let me add a comment on the status here. Debian classifier this as Urgency: unimportant with comment "negligible security impact" (see here).

In general, I'd say we consider CVEs like this invalid. If you can call np.pad([]) (or any arbitrary function call really) and it causes an infinite loop, then you can also directly cause an infinite loop in Python (for _ in range(1000000): do_something_expensive(). So a DDoS is nonsense here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants