Inconsistent integer conversion from strings (Trac #736) #1334

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 9 comments

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/736 on 2008-04-12 by @stefanv, assigned to @teoliphant.

Pauli Virtanen noticed the following behaviour (mentioned as part of #1317 discussion):

In [8]: np.array([('123',), ('456',)], dtype=[('num', '<i8')])
Out[8]: 
array([(123L,), (456L,)], 
      dtype=[('num', '<i8')])

vs.

In [9]: np.array([('123',), ('456',)], dtype=[('num', '<i4')])

TypeError: expected a readable buffer object

I believe this is related to the following inconsistent integer-from-string conversions:

In [27]: np.int32('12')
Out[27]: 12

In [28]: np.int64('12')
Out[28]: array([1, 2], dtype=int64)
@numpy-gitbot

@stefanv wrote on 2008-04-12

Upon closing this ticket, please activate the test disabled in r5026.

@numpy-gitbot

@charris wrote on 2008-04-13

There is more to this problem:

In [5]: int8('1')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/charris/<ipython console> in <module>()

ValueError: setting an array element with a sequence.

In [6]: int16('1')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/charris/<ipython console> in <module>()

ValueError: setting an array element with a sequence.

In [7]: int32('1')
Out[7]: 1

In [8]: int64('1')
Out[8]: array([1], dtype=int64)

In [9]: uint8('1')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/charris/<ipython console> in <module>()

ValueError: setting an array element with a sequence.

In [10]: uint16('1')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/charris/<ipython console> in <module>()

ValueError: setting an array element with a sequence.

In [11]: uint32('1')
Out[11]: array([1], dtype=uint32)

In [12]: uint64('1')
Out[12]: array([1], dtype=uint64)

The ones that return arrays go through numpy routines that call PyNumber_Long on the string. I suspect int32 is a subtype of the Python int, so it returns a number. We also have

In [10]: array('11',dtype=uint64)
Out[10]: array([1, 1], dtype=uint64)

In [11]: array('11',dtype=int8)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/charris/<ipython console> in <module>()

ValueError: setting an array element with a sequence.

In [12]: array('11').astype(int8)
Out[12]: array(11, dtype=int8)

In other words, we have an inconsistent mess on our hands. I think we need to decide what the behavior of all these functions should be, along with their float and complex counterparts, when presented with strings, and what to do when the strings are out of range. These problems arise in the setitem functions in arraytypes.inc.src and, I suspect, in the array function itself, which treats the astype method and the dtype keyword differently when strings are passed in.
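
One way to keep an eye on this kind of inconsistency is a small check that feeds the same string to every fixed-width integer type and compares the kind of result each one gives. The sketch below is only an illustration: the helper names (`convert_all`, `classify`, `is_consistent`) are hypothetical and not part of numpy, and what it reports depends on the installed numpy version, so it is not expected to reproduce the 2008 output shown above.

```python
import numpy as np

# Names of the fixed-width integer scalar types discussed in this ticket.
INTEGER_TYPE_NAMES = ("int8", "int16", "int32", "int64",
                      "uint8", "uint16", "uint32", "uint64")


def convert_all(text):
    """Call every integer scalar type on `text`.

    Returns a dict mapping the type name to the value produced,
    or to the exception instance if the call raised.
    """
    results = {}
    for name in INTEGER_TYPE_NAMES:
        scalar_type = getattr(np, name)
        try:
            results[name] = scalar_type(text)
        except (TypeError, ValueError) as exc:
            results[name] = exc
    return results


def classify(value):
    """Label an outcome as 'error', 'scalar' (0-d) or 'array' (1-d or more)."""
    if isinstance(value, Exception):
        return "error"
    return "scalar" if np.ndim(value) == 0 else "array"


def is_consistent(results):
    """True when every type produced the same kind of outcome."""
    return len({classify(value) for value in results.values()}) == 1


if __name__ == "__main__":
    outcomes = convert_all("12")
    for name, value in outcomes.items():
        print(name, classify(value), repr(value))
    print("consistent:", is_consistent(outcomes))
```

Running it against a given numpy build shows at a glance whether any type still returns an array or raises where the others return a scalar.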

@numpy-gitbot

@charris wrote on 2008-04-13

I've fixed the conversions so that all integer types convert strings. All except int32 also have the too many items problem.

In [28]: np.int64('12')
Out[28]: array([1, 2], dtype=int64)

The root of this seems to lie in the array creation function.

@numpy-gitbot

@charris wrote on 2008-04-13

Travis,

I think you are the best person to finish off this bug.

@numpy-gitbot

@huard wrote on 2008-04-16

Replying to [comment:3 charris]:
I think this broke the timeseries scikit. On import, I get the following error:

/usr/local/lib64/python2.5/site-packages/scikits/timeseries/__init__.py in <module>()
     15 import tdates
     16 from tdates import *
---> 17 import tseries
     18 from tseries import *
     19 import trecords

/usr/local/lib64/python2.5/site-packages/scikits/timeseries/tseries.py in <module>()
   1113                       hard_mask=hard_mask,)
   1114 
-> 1115 tsmasked = TimeSeries(masked,dates=DateArray(Date('D',1)))
   1116 
   1117 ##### --------------------------------------------------------------------------

/usr/local/lib64/python2.5/site-packages/scikits/timeseries/tdates.pyc in __new__(cls, dates, freq, copy)
    179             _freq = check_freq(freq)
    180         # Get the dates ..........
--> 181         _dates = np.array(dates, copy=copy, dtype=int_, subok=1)
    182         if _dates.ndim == 0:
    183             _dates.shape = (1,)

TypeError: long() argument must be a string or a number, not 'timeseries.Date'

But

int_(dates)
1

Is it possible you added strict type checking?

@numpy-gitbot

@charris wrote on 2008-04-16

The routine effectively calls PyNumber_Long where it used to call PyInt_AsLong. My guess is that PyNumber_Long doesn't recognize timeseries.Date as a long. Is timeseries.Date a subtype of long/int? What does long(timeseries.Date) do? I suspect adding a long method will fix the problem.

@numpy-gitbot

@charris wrote on 2008-04-16

Here's the problem in timeseries/src/c_dates.c

2246        (unaryfunc)DateObject___int__,       /* nb_int */
2247        (unaryfunc)0,                        /* nb_long */
2248        (unaryfunc)DateObject___float__,     /* nb_float */

Note the missing conversion to long. Should be easy to fix. Who needs to be notified?
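
The effect of an empty conversion slot can be sketched at the Python level. The example below is a Python 3 analogue, not the timeseries code: Python 3 has a single int type, so there is no separate long slot, and the two classes are hypothetical stand-ins for a date object. The point it illustrates is the same, though: an object that defines one numeric conversion but not the one the caller asks for is rejected with a TypeError.

```python
class DateWithoutInt:
    """Hypothetical date-like value that only defines float conversion."""

    def __init__(self, ordinal):
        self.ordinal = ordinal

    def __float__(self):
        return float(self.ordinal)


class DateWithInt(DateWithoutInt):
    """Same value with the integer conversion filled in."""

    def __int__(self):
        return self.ordinal


def try_int(obj):
    """Return int(obj), or the TypeError raised when no conversion exists."""
    try:
        return int(obj)
    except TypeError as exc:
        return exc


if __name__ == "__main__":
    print(repr(try_int(DateWithoutInt(1))))  # a TypeError instance
    print(repr(try_int(DateWithInt(1))))     # an integer
```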

@numpy-gitbot

@huard wrote on 2008-04-16

I notified Pierre G.M. this morning and he just made the change. Everything seems to work now. Thanks.

@numpy-gitbot

@charris wrote on 2008-04-25

Fixed in r5080. The original example looks like it came from a 64-bit OS, where <i8 was the same as the Python int, which always worked.
