
COMPAT: infer larger than uint64 to object dtype #18584

Closed

qiaobz opened this issue Dec 1, 2017 · 13 comments
Labels: Dtype Conversions (Unexpected or buggy dtype conversions), IO SQL (to_sql, read_sql, read_sql_query)
Milestone: 0.22.0

Comments

qiaobz commented Dec 1, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

querySql = "select col1 from Table1"  # col1 is NUMBER(20) in the Oracle database, e.g. 20124400100030425447
df = pd.read_sql(querySql, db)  # db is the Oracle connection

Problem description

raise "OverflowError: long too big to convert", it may be the uint64 problem?

In [1]: import cx_Oracle

In [2]: db = cx_Oracle.connect(user='TEST', password='0000', dsn='localhost:1521/orcl')

In [3]: import pandas as pd

In [4]: df = pd.read_sql('select col1 from TABLE1',db)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-4-318bf64d0a9b> in <module>()
----> 1 df = pd.read_sql('select col1 from TABLE1',db)

C:\Program Files\Anaconda\lib\site-packages\pandas\io\sql.pyc in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    398             sql, index_col=index_col, params=params,
    399             coerce_float=coerce_float, parse_dates=parse_dates,
--> 400             chunksize=chunksize)
    401
    402     try:

C:\Program Files\Anaconda\lib\site-packages\pandas\io\sql.pyc in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
   1456             frame = _wrap_result(data, columns, index_col=index_col,
   1457                                  coerce_float=coerce_float,
-> 1458                                  parse_dates=parse_dates)
   1459             return frame
   1460

C:\Program Files\Anaconda\lib\site-packages\pandas\io\sql.pyc in _wrap_result(data, columns, index_col, coerce_float, parse_dates)
    155
    156     frame = DataFrame.from_records(data, columns=columns,
--> 157                                    coerce_float=coerce_float)
    158
    159     _parse_date_columns(frame, parse_dates)

C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.pyc in from_records(cls, data, index, exclude, columns, coerce_float, nrows)
   1140         else:
   1141             arrays, arr_columns = _to_arrays(data, columns,
-> 1142                                              coerce_float=coerce_float)
   1143
   1144             arr_columns = _ensure_index(arr_columns)

C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.pyc in _to_arrays(data, columns, coerce_float, dtype)
   6249     if isinstance(data[0], (list, tuple)):
   6250         return _list_to_arrays(data, columns, coerce_float=coerce_float,
-> 6251                                dtype=dtype)
   6252     elif isinstance(data[0], collections.Mapping):
   6253         return _list_of_dict_to_arrays(data, columns,

C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.pyc in _list_to_arrays(data, columns, coerce_float, dtype)
   6328         content = list(lib.to_object_array(data).T)
   6329     return _convert_object_array(content, columns, dtype=dtype,
-> 6330                                  coerce_float=coerce_float)
   6331
   6332

C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.pyc in _convert_object_array(content, columns, coerce_float, dtype)
   6394         return arr
   6395
-> 6396     arrays = [convert(arr) for arr in content]
   6397
   6398     return arrays, columns

C:\Program Files\Anaconda\lib\site-packages\pandas\core\frame.pyc in convert(arr)
   6390     def convert(arr):
   6391         if dtype != object and dtype != np.object:
-> 6392             arr = lib.maybe_convert_objects(arr, try_float=coerce_float)
   6393             arr = maybe_cast_to_datetime(arr, dtype)
   6394         return arr

pandas/_libs/src\inference.pyx in pandas._libs.lib.maybe_convert_objects()

OverflowError: long too big to convert

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

gfyoung added the "IO SQL (to_sql, read_sql, read_sql_query)" label Dec 1, 2017
gfyoung (Member) commented Dec 1, 2017

@qiaobz : Thanks for reporting this! Unfortunately, this is a little hard to reproduce. Could you provide a reproducible example by creating a small table in memory (using Python's sqlite3 library) that causes this error to surface?

qiaobz (Author) commented Dec 2, 2017

Sorry for troubling you.

  • When using sqlite3, there is no problem:

import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('''CREATE TABLE "TABLE1" ("COL1" NUMBER(20,0))''')
cur.execute('''INSERT INTO "TABLE1"("COL1") VALUES ("20124400100030425447")''')
pd.read_sql("select COL1 FROM TABLE1", conn)

and the output is (no error):

           COL1
0  2.012440e+19
  • Reproduce the problem:
    • Oracle database: create the table and insert the data.
    My config:

    user: Test
    password: 0000
    dsn: localhost:1521/orcl

    CREATE TABLE "TEST"."TABLE1" ("COL1" NUMBER(20,0))
    INSERT INTO "TABLE1"("COL1") VALUES ('20124400100030425447')

    • Use cx_Oracle and pandas:

    import pandas as pd
    import cx_Oracle
    db = cx_Oracle.connect(user='TEST', password='0000', dsn='localhost:1521/orcl')
    querysql = "SELECT COL1 FROM TABLE1"
    df = pd.read_sql(querysql, db)

    which then raises "OverflowError: long too big to convert".

gfyoung (Member) commented Dec 2, 2017

@qiaobz : Interesting... can you provide a stack trace for when you try to read from the Oracle DB? If so, can you also add it to your original issue? That will be useful for anyone in the future.

qiaobz (Author) commented Dec 2, 2017

My fault; I've added it to the original issue above.


gfyoung (Member) commented Dec 2, 2017

Hmm, that's very strange. Here's what I suggest you do:

  1. Create a new conda environment with pandas installed. Make sure you can reproduce the error.
  2. Try installing the most up-to-date version of pandas (we're at 0.21, but you are using 0.20.1) and see if the error still persists.
  3. Locate the line where you initialize the returned DataFrame. That corresponds to this block in your stack trace (though the file path will differ because you will be in a different conda environment):
C:\Program Files\Anaconda\lib\site-packages\pandas\io\sql.pyc in _wrap_result(data, columns, index_col, coerce_float, parse_dates)
    155
    156     frame = DataFrame.from_records(data, columns=columns,
--> 157                                    coerce_float=coerce_float)
    158
    159     _parse_date_columns(frame, parse_dates)

Add a line above it to print the parameters passed to DataFrame.from_records, then rerun the code. I would be curious to see what the Oracle DB is giving you compared to the in-memory version. For example, a minimal sketch of that instrumentation, editing the _wrap_result block quoted above:
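
# in pandas/io/sql.py, inside _wrap_result, just above the from_records call:
print(repr(data))       # the raw rows the DB driver handed back
print(repr(columns))    # the column names
frame = DataFrame.from_records(data, columns=columns,
                               coerce_float=coerce_float)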

qiaobz (Author) commented Dec 2, 2017

Thanks, I'll try a new conda environment, but the pandas in this issue is already 0.21.0; you can see it in the INSTALLED VERSIONS (pd.show_versions() output) above.
BTW, I found that 20124400100030425447 is beyond the range of uint64, while a value like 10124420010000167598 causes no error.
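
A quick check (a minimal sketch) confirms which of the two values exceeds the uint64 bound:

import numpy as np

print(np.iinfo(np.uint64).max)                          # 18446744073709551615
print(20124400100030425447 > np.iinfo(np.uint64).max)   # True  -> overflows
print(10124420010000167598 > np.iinfo(np.uint64).max)   # False -> fits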

gfyoung (Member) commented Dec 2, 2017

BTW, I found that 20124400100030425447 is beyond the range of uint64

Right, which is why I was asking you what value the Oracle DB is returning compared to the in-memory version. Were you able to figure that out by following my instructions?

qiaobz (Author) commented Dec 2, 2017

The parameters passed to DataFrame.from_records:

  • Oracle DB
    [(20124400100030425447L,)]
  • in-memory version
    [(2.0124400100030427e+19,)]

Hmm, so I over-complicated the problem: sqlite3 hands pandas a lossy float, while cx_Oracle returns the exact Python long, which does not fit in uint64. We can reproduce the error easily:

import pandas as pd
df = pd.DataFrame([(20124400100030425447L,)])  # Python 2 long literal

which then raises the error.

jreback (Contributor) commented Dec 2, 2017

If you really need integers this large, then you have to explicitly set them to object dtype to avoid the conversions:

In [4]: df = pd.DataFrame([(20124400100030425447,)], dtype=object)

In [5]: df
Out[5]: 
                      0
0  20124400100030425447

In [6]: df.dtypes
Out[6]: 
0    object
dtype: object

In [7]: np.iinfo(np.int64).max
Out[7]: 9223372036854775807

jreback closed this as completed Dec 2, 2017
jreback added the "Dtype Conversions (Unexpected or buggy dtype conversions)" label Dec 2, 2017
jreback added this to the "won't fix" milestone Dec 2, 2017
jorisvandenbossche (Member) commented:

Shouldn't we infer this? Numpy does:

In [9]: np.array([20124400100030425447])
Out[9]: array([20124400100030425447], dtype=object)
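
For contrast, a minimal sketch of both behaviors side by side (assuming pandas 0.22+, where the fix discussed below landed; on 0.21 the pandas line raises the OverflowError above):

import numpy as np
import pandas as pd

vals = [20124400100030425447]      # > uint64 max

print(np.array(vals).dtype)        # object: numpy falls back gracefully
print(pd.Series(vals).dtype)       # object on pandas 0.22+; OverflowError before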

gfyoung (Member) commented Dec 3, 2017

Shouldn't we infer this?

I don't see why not. Is there any historical reason why we didn't do it?

qiaobz (Author) commented Dec 3, 2017

Two small questions:

  • there is no dtype parameter for read_sql()
  • pd.DataFrame(cur.fetchall(), dtype=object) (where cur is db.cursor()) sets all columns to object dtype, not just the oversized one (a per-column workaround is sketched below)
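
A possible workaround (a minimal sketch, not an official API): fetch through the cursor yourself, build the frame with object dtype so nothing overflows, then downcast only the columns that actually fit int64. The db connection and TABLE1 names are the ones from this thread:

import pandas as pd

cur = db.cursor()                        # db is the cx_Oracle connection above
cur.execute("SELECT COL1 FROM TABLE1")
rows = cur.fetchall()
columns = [d[0] for d in cur.description]

# object dtype keeps oversized integers as exact Python longs
df = pd.DataFrame(rows, columns=columns, dtype=object)

for col in df.columns:
    try:
        df[col] = df[col].astype("int64")    # succeeds only if every value fits
    except (OverflowError, ValueError, TypeError):
        pass                                 # too large or non-numeric: keep object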

jreback (Contributor) commented Dec 3, 2017

Yeah, this is a bit of an unhandled case in maybe_convert_objects.

@gfyoung would you have a look?

jreback reopened this Dec 3, 2017
jreback modified the milestones: "won't fix" → "Next Major Release" Dec 3, 2017
jreback added the "Compat (pandas objects compatibility with NumPy or Python functions)" and "Difficulty Intermediate" labels Dec 3, 2017
jreback changed the title from 'read_sql() Raise "OverflowError: long too big to convert"' to "COMPAT: infer larger than uint64 to object dtype" Dec 3, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 4, 2017

For integers larger than what uint64 can handle, we gracefully default to the object dtype instead of overflowing.

Closes pandas-dev#18584.
gfyoung modified the milestones: Next Major Release → 0.22.0 Dec 4, 2017
jorisvandenbossche pushed a commit that referenced this issue Dec 5, 2017

For integers larger than what uint64 can handle, we gracefully default to the object dtype instead of overflowing. For integers smaller than what int64 can handle, we gracefully default to the object dtype instead of overflowing.

Closes gh-18584.
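
A minimal sketch of the post-fix inference described above (assuming pandas 0.22+):

import pandas as pd

print(pd.Series([20124400100030425447]).dtype)   # object (above uint64 max)
print(pd.Series([-20124400100030425447]).dtype)  # object (below int64 min)
print(pd.Series([1, 2, 3]).dtype)                # int64, unchanged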
jorisvandenbossche removed the "Compat (pandas objects compatibility with NumPy or Python functions)" label Dec 5, 2017