
BUG: Fix segfault in np.random.shuffle for arrays of different length strings #7719

Merged 1 commit on Jun 10, 2016

Conversation

simongibbons (Contributor)

np.random.shuffle will allocate a buffer based on the size of the first
element of an array of strings. If the first element is smaller than
another in the array this buffer will overflow, causing a segfault
when garbage is collected.

This fixes the issue by ensuring the buffer is allocated based upon
the dtype of the array rather than just that of the first element.

Fixes #7710
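The dtype mismatch behind the bug can be illustrated directly (a minimal sketch, not part of the PR; the two-element array is a made-up example):

```python
import numpy as np

# Hypothetical array whose first string is shorter than the second.
x = np.array(['a', 'abcdef'])

# A buffer modeled on the first element only holds one character...
print(np.empty_like(x[0]).dtype)   # <U1
# ...while one modeled on a length-1 slice keeps the array's full dtype.
print(np.empty_like(x[:1]).dtype)  # <U6
```

Copying a 6-character string into a buffer sized for `<U1` is the overflow the PR description refers to.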

rkern (Member) commented Jun 9, 2016

Good catch. LGTM.

seberg (Member) commented Jun 9, 2016

Maybe it would be better to use arr[0, ...] instead? That avoids the round trip through potential scalars.

simongibbons (Contributor, Author)

@seberg you're not wrong, a quick benchmark makes it look like that is ~6x faster!

mhvk (Contributor) commented Jun 9, 2016

I'd use arr[:1] instead, which I think is a bit clearer (the ... suggests trailing dimensions that may not be there, so to me it takes a bit more brain-twisting to see why this is useful). But really this is nitpicking!

jaimefrio (Member)

What we should have for sure is a comment in the code explaining why this is needed.

simongibbons (Contributor, Author)

@jaimefrio Yes, I agree: while the code was self-explanatory when using the dtype argument, it now certainly needs a comment.

@mhvk I think that version is easier on the old grey matter as well!

@@ -5064,7 +5064,8 @@ cdef class RandomState:
         x_ptr = <char*><size_t>x.ctypes.data
         stride = x.strides[0]
         itemsize = x.dtype.itemsize
-        buf = np.empty_like(x[0]) # GC'd at function exit
+        # Ensure that buf has the same dtype as x
+        buf = np.empty_like(x[:1]) # GC'd at function exit
Member:

And arr[:1] leaves a dimension that has no meaning ;) (then doing buf[0, ...] below would make somewhat more sense ;), at least to me). Anyway, I don't care too much.
I don't like the comment though; it explains nothing unless you already know the right question. Rather explain that x[0] would be a scalar, and that this may also lose the dtype information, e.g. for string types. (It actually gets even worse: x[0] could be another array, and things likely currently blow up; that could be a good test.)

Member:

OK, I think the object cases were probably already all fine (though I am not 100% sure, but oh well...)

Member:

Indexing speaks more loudly to you than to the rest of us ;) Agree about the comment, more explanation is needed.

charris (Member) Jun 9, 2016:

Something like

# The scalar x[0] has a dtype just big enough to hold that string,
# so use x[:1] to get a dtype big enough to hold any string in x.

Contributor:

Maybe add at start: "If x is one-dimensional, x[0] would be a scalar with a dtype just ..."

Member:

Probably a reference counting bug.

Member:

Ah, of course, the view will incref its elements, then they get shuffled under its feet and the decref goes to the wrong one. Easiest solution is likely to just avoid that 1D optimization for object arrays by checking arr.dtype.hasobject first.
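The hasobject check mentioned here can be sketched as follows (an illustration of the NumPy dtype attribute, not the PR's code):

```python
import numpy as np

# dtype.hasobject reports whether the dtype holds (or contains) Python
# object references, which is what makes the raw memcpy path unsafe:
# a view over such data increfs elements that are then shuffled away.
print(np.dtype(object).hasobject)    # True
print(np.dtype('<U6').hasobject)     # False
print(np.dtype(np.int64).hasobject)  # False
```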

Member:

Whoa, well spotted; that definitely needs a comment as well.

charris (Member) Jun 10, 2016:

Bit subtle, but buf is left containing a reference to some object in x, which is decrefed when the buffer is destroyed. Furthermore, buf begins containing a reference to None, so there is also a memory leak. The only other fix I can see is to define the buffer as int8 with length itemsize, or simply use malloc and free, which avoids all that refcounting business.

Member:

Ah, that is maybe more elegant. Should probably make sure the raw shuffle does not release the GIL then (no idea if it does).
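The int8-buffer idea suggested above can be sketched like this (a minimal illustration with a made-up array, not the merged Cython code):

```python
import numpy as np

# Hypothetical array; its dtype is <U6, so itemsize is 6 * 4 = 24 bytes.
x = np.array(['a', 'abcdef'])

# A plain byte buffer sized to one full element: no scalar-dtype
# truncation, and no object references for the GC to miscount.
buf = np.empty(x.dtype.itemsize, dtype=np.int8)
print(buf.nbytes)  # 24
```

Because int8 has no object semantics, destroying the buffer never decrefs anything, which sidesteps both the overflow and the refcounting bug.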

@charris charris added this to the 1.11.1 release milestone Jun 9, 2016
np.random.shuffle will allocate a buffer based on the size of the first
element of an array of strings. If the first element is smaller than
another in the array this buffer will overflow, causing a segfault
when garbage is collected.

Additionally, if the array contains objects then one would be left
in the buffer and have its refcount erroneously decremented on
function exit, causing that object to be deallocated too early.

To fix this we change the buffer to be an array of int8 of the
size of the array's dtype, which sidesteps both issues.

Fixes numpy#7710
simongibbons (Contributor, Author)

@charris Yes you're right that was a subtle one to track down. Very nice spot. I also like your solution, it's very elegant.

I've implemented that now and added a test for the object case.

In the multidimensional case all of the swaps are done in Python rather than with memcpys, so the object left in the buffer there is correctly refcounted, and the buffer was picking up the correct dtype anyway. I've therefore left that part of the code unchanged.

@charris charris merged commit 5056c4c into numpy:master Jun 10, 2016
@simongibbons simongibbons deleted the segfault_permutation branch June 10, 2016 14:08
@charris charris removed this from the 1.11.1 release milestone Jun 10, 2016
charris (Member) commented Jun 10, 2016

Thanks @simongibbons .

seberg (Member) commented Jun 10, 2016

Thanks!

Successfully merging this pull request may close these issues:

numpy 1.11. Segfault: numpy.random.permutation on list of long strings