Wrong dtype for unicode field in np.rec.fromarrays() #4201

Closed
taldcroft opened this issue Jan 14, 2014 · 5 comments

Comments

@taldcroft

In Python 2 or 3 with NumPy 1.7.1, there seems to be a problem with np.rec.fromarrays creating a dtype format that is a factor of 4 too large:

In [5]: a = np.array([u'xyz'])

In [6]: a.dtype
Out[6]: dtype('<U3')

In [7]: a2 = np.rec.fromarrays([a], names=['a'])

In [8]: a2.dtype
Out[8]: dtype([('a', '<U12')])

In [9]: a3 = np.rec.fromarrays([a2['a']], names=['a'])

In [10]: a3.dtype
Out[10]: dtype([('a', '<U48')])

It looks like the problem is here, where itemsize is 12 (bytes) for a 3-character unicode string with UCS-4 encoding, but the result is used as a character count:

            if issubclass(obj.dtype.type, nt.flexible):
                formats += repr(obj.itemsize)
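A minimal sketch of the mismatch, assuming a current NumPy. `buggy_format` and `fixed_format` are hypothetical helpers: the first re-creates the quoted logic (type letter followed by `repr(itemsize)`), the second shows one possible fix using `dtype.str`, which already expresses unicode lengths in characters. This is an illustration, not necessarily the fix that was eventually merged.

```python
import numpy as np

def buggy_format(arr):
    # Hypothetical re-creation of the quoted 1.7.1 logic: type letter
    # followed by repr(itemsize). itemsize is in BYTES, but 'U<n>'
    # expects a CHARACTER count.
    return arr.dtype.char + repr(arr.itemsize)

def fixed_format(arr):
    # One possible fix: let NumPy render the type string, which
    # counts unicode lengths in characters.
    return arr.dtype.str

a = np.array([u'xyz'])
print(buggy_format(a))   # U12
print(fixed_format(a))   # <U3 (on a little-endian machine)

# Feeding the buggy format back in grows the field 4x per round trip.
fmt = buggy_format(a)
for _ in range(2):
    fmt = buggy_format(np.empty(1, dtype=fmt))
    print(fmt)           # U48, then U192
```

The second round trip reproduces the `<U48` seen in the session above.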
@charris
Member

charris commented Feb 24, 2014

Hah. NumPy unicode is 4 bytes wide per character; I'll bet that is where the factor of 4 comes from.
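To make the arithmetic concrete, here is a small check, assuming a current NumPy: a `U<n>` dtype occupies 4 * n bytes, so dividing `itemsize` by the size of one character recovers the length that the format string should have used.

```python
import numpy as np

# Bytes taken by a single unicode character (UCS-4): 4.
char_bytes = np.dtype('U1').itemsize

for n in (3, 12, 48):
    dt = np.dtype('U%d' % n)
    # itemsize is in bytes; the character count is itemsize / char_bytes.
    print(dt, dt.itemsize, dt.itemsize // char_bytes)
```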

@charris
Member

charris commented Feb 24, 2014

I'm guessing this is an easy fix. If not, it will be a pretty hard fix.

@taldcroft
Author

😄

@embray
Contributor

embray commented Jun 16, 2015

Ah, somehow missed this one before, but this was fixed by #5251. This issue can be closed.
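A quick regression check, assuming a NumPy release that includes #5251 (any modern version): repeating the original session, the field should keep its 3-character width on every pass instead of growing.

```python
import numpy as np

a = np.array([u'xyz'])
a2 = np.rec.fromarrays([a], names=['a'])
a3 = np.rec.fromarrays([a2['a']], names=['a'])

# With the fix, the unicode field stays 'U3' after each round trip.
print(a2.dtype, a3.dtype)
```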

@eric-wieser
Member

Thanks @embray for noting that

4 participants