Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

memory error with strings in structured arrays #473

Closed
jakevdp opened this Issue Oct 1, 2012 · 6 comments

Comments

Projects
None yet
4 participants
Contributor

jakevdp commented Oct 1, 2012

The following code writes to memory blocks beyond the limits of the array, resulting in memory errors in both released and dev version of numpy (I've tried 1.6.2 and 1.8.0.dev):

In [1]: import numpy as np

In [2]: dtype = [('I', int), ('S', str)]

In [3]: x = np.zeros(3, dtype=dtype)

In [4]: x['S'] = ['a', 'b', 'c']

The problem (I believe) is that passing str to dtype results in a zero-length string, and this is not checked on assignment.

glimmercn added a commit to glimmercn/numpy that referenced this issue Apr 4, 2013

BUG: solve issue #2649.
The issue is similar to #473, multipliaction of a matrix and a
column-vector returns a row-vector. That solution is to convert 1d
array, list, tuple to matrix before multiplying. What causes #2649 is that
matrix will use dot function which is inherited from ndarray. My solution is
to use __mul__ function instead. It will throw an error when user try to
multiply a squre matrix and a 1d array.

glimmercn added a commit to glimmercn/numpy that referenced this issue Apr 5, 2013

BUG: fix 2649.
The issue is caused by dot function inherited from ndarray. We can use
__mul__ instead, which has solved the 1-D vector problem in #473. One
test is added.
Contributor

embray commented Oct 8, 2015

By taking the example in this issue a little further we demonstrate the root of this issue. For some reason when you call np.zeros (and probably other functions) with a zero-width string type, it replaces the dtype with a one-width string (in the case of unicode this means 'U1', so 4 bytes):

>>> x = np.zeros(3, dtype='S0')
>>> x
array([b'', b'', b''], 
      dtype='|S1')

Unexpected, but whatever. If backward compatibility is a concern I wouldn't suggest changing this. However, when using a structured dtype that includes a zero width string no such conversion occurs:

>>> dt = np.dtype([('I', 'int64'), ('S', bytes)])
>>> z = np.zeros(3, dtype=dt)
>>> z
array([(0, b''), (0, b''), (0, b'')], 
      dtype=[('I', '<i8'), ('S', 'S')])
>>> z.itemsize
8

But when accessing the 'S' field, the array returned again promotes the dtype to 'S1':

>>> z['S']
array([b'', b'', b'Q'], 
      dtype='|S1')

This is definitely a bug and should be changed.

Owner

charris commented Oct 8, 2015

Looks like there will be a 1.10.1 real soon now on account of the reported problems with msvc 2008, so it would be good to get a fix in. I assume this impacts astropy?

Contributor

embray commented Oct 8, 2015

@charris Fortunately this impacts astropy only tangentially. I've never run into it myself. It could in principle though. Honestly #2196 is more impactful in that it's pretty inconvenient. Though I can imagine a case where this bug would bite too in certain FITS files (albeit obscure).

So I'll investigate how difficult this is to fix, and it would be great to get a fix in 1.10.1 if I can come up with a reasonable patch. But it's not that important either.

Member

ahaldane commented Oct 8, 2015

Well, this bug is involved in quite a few segfault issues which would be nice to fix.

@embray, that sounds like the problem, nice find. Do you happen to know where the type conversion happens? I think I looked for it once but didn't find it.

Contributor

embray commented Oct 8, 2015

It's in PyArray_NewFromDescr_int. There are a few different solutions to this I'm thinking of. Will try one and see how it pans out...

embray added a commit to embray/numpy that referenced this issue May 31, 2016

BUG: Fix numerous bugs related to zero-width string arrays (#473, #1901
…, #2196, #2585, #4955)

PyArray_NewFromDescr normally does not allow zero-width string dtypes (or rather, it automatically converts them to 1-width strings).  This affects any code that uses PyArray_NewFromDescr, which is a lot.  So we extend PyArray_NewFromDescr_int to allow disabling this functionality in a couple narrow cases where it's appropriate--one is when extracting a field from a structured array that has a zero-width string dtype (which, intentionally or not, has been allowed).  The other, which is related, is returning a view of an array that has a zero-width string dtype.  This shouldn't otherwise break or change any existing behavior.

Remove roadblocks to creating structured dtypes with zero-width fields using dict-based constructors (this was possible by other means such as the list-based constructor--with the previous fix in particular it should not be necessary to block this anymore).

Adds tests based on the tests cases given in the issues this is fixing.

Fix a bug with array to zero-width array assignment revealed by the tests.  I am slightly concerned that the blunt-force check for this in raw_array_assign_array may be masking a bug somewhere else though.
Member

ahaldane commented Jun 2, 2016

Fixed by #6430

@ahaldane ahaldane closed this Jun 2, 2016

embray added a commit to embray/numpy that referenced this issue Dec 12, 2016

BUG: Fix numerous bugs related to zero-width string arrays (#473, #1901
…, #2196, #2585, #4955)

PyArray_NewFromDescr normally does not allow zero-width string dtypes (or rather, it automatically converts them to 1-width strings).  This affects any code that uses PyArray_NewFromDescr, which is a lot.  So we extend PyArray_NewFromDescr_int to allow disabling this functionality in a couple narrow cases where it's appropriate--one is when extracting a field from a structured array that has a zero-width string dtype (which, intentionally or not, has been allowed).  The other, which is related, is returning a view of an array that has a zero-width string dtype.  This shouldn't otherwise break or change any existing behavior.

Remove roadblocks to creating structured dtypes with zero-width fields using dict-based constructors (this was possible by other means such as the list-based constructor--with the previous fix in particular it should not be necessary to block this anymore).

Adds tests based on the tests cases given in the issues this is fixing.

Fix a bug with array to zero-width array assignment revealed by the tests.  I am slightly concerned that the blunt-force check for this in raw_array_assign_array may be masking a bug somewhere else though.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment