
BUG: read_csv is inconsistent with large exponent #62740

@Alvaro-Kothe

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import io

import pandas as pd

n_digits = [11, 12, 13]
header = "h1,h2,h3"
engines = {"c": [None, "high", "round_trip"], "python": [None], "pyarrow": [None]}


def show(title, exp_sign):
    # One data row such as 10e99999999999,10e999999999999,... (exp_sign is "" or "-")
    data = ",".join(f"10e{exp_sign}" + ("9" * nd) for nd in n_digits)
    print(title)
    print("%-20s %-30s %-30s %-30s" % ("type", "11 digits", "12 digits", "13 digits"))
    for engine, float_precisions in engines.items():
        for float_precision in float_precisions:
            buf = io.StringIO(header + "\n" + data)
            df = pd.read_csv(buf, engine=engine, float_precision=float_precision)
            d11, d12, d13 = df.iloc[0, :]
            print("%-20s %-30s %-30s %-30s" % (f"{engine}-{float_precision}", d11, d12, d13))


show("POSITIVE EXPONENT", "")
show("\nNEGATIVE EXPONENT", "-")

Issue Description

This issue is related to #62617 and #38794.

I am raising it because the c and python engines overflow when parsing floats with very large exponents. Depending on the number of exponent digits, they may segfault (see #62617), return 0.0 (which is wrong when the exponent is positive), or leave the value as a string.

The output of the example above is:

POSITIVE EXPONENT
type                 11 digits                      12 digits                      13 digits
c-None               10e99999999999                 0.0                            10e9999999999999
c-high               10e99999999999                 0.0                            10e9999999999999
c-round_trip         10e99999999999                 10e999999999999                10e9999999999999
python-None          10e99999999999                 0.0                            10e9999999999999
pyarrow-None         inf                            inf                            inf

NEGATIVE EXPONENT
type                 11 digits                      12 digits                      13 digits
c-None               0.0                            10e-999999999999               0.0
c-high               0.0                            10e-999999999999               0.0
c-round_trip         0.0                            0.0                            0.0
python-None          0.0                            10e-999999999999               0.0
pyarrow-None         0.0                            0.0                            0.0

Only the pyarrow engine is consistent: it returns inf for every positive exponent and 0.0 for every negative one.
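The c-None, c-high and python-None rows alternate between a string and 0.0 as the digit count grows, with the roles swapped for negative exponents. That pattern is consistent with the exponent digits being accumulated as `n = n * 10 + d` in a 32-bit signed integer that wraps around on overflow (formally undefined behavior in C, but what typically happens in practice). The sketch below is a hypothesis only, not taken from the pandas tokenizer source: `wrap_int32`, `wrapped_exponent` and `predicted_cell` are illustrative names, and it assumes that a value overflowing to inf makes the column fall back to strings.

```python
import math


def wrap_int32(value: int) -> int:
    """Map an arbitrary Python int onto a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def wrapped_exponent(exp_digits: str, negative: bool) -> int:
    """Hypothetical model: exponent digits accumulated as n = n * 10 + d in a 32-bit int."""
    n = 0
    for ch in exp_digits:
        n = wrap_int32(n * 10 + int(ch))
    return wrap_int32(-n) if negative else n


def predicted_cell(exp_digits: str, negative: bool) -> str:
    """Predict what the c/python engines show for the token 10e[-]<exp_digits>."""
    exp = wrapped_exponent(exp_digits, negative)
    # Python's own parser returns inf or 0.0 for out-of-range exponents
    value = float(f"10e{exp}")
    # Assumption: a result that overflows to inf makes the column fall back to strings
    return "string" if math.isinf(value) else repr(value)


for negative in (False, True):
    print("NEGATIVE EXPONENT" if negative else "POSITIVE EXPONENT")
    for nd in (11, 12, 13):
        digits = "9" * nd
        exp = wrapped_exponent(digits, negative)
        print(f"{nd} digits: exponent {exp} -> {predicted_cell(digits, negative)}")
```

Under this model, 12 nines wrap to the exponent -727379969, so a positive exponent underflows to 0.0, while 11 and 13 nines stay positive and overflow. The predicted rows are string, 0.0, string (positive) and 0.0, string, 0.0 (negative), which match the c-None, c-high and python-None rows above. The model does not account for the c-round_trip row.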

Expected Behavior

The c and python engines should either keep the value as a string whenever an overflow occurs, or assign the correct float value (inf, -inf, or 0.0).
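For reference, Python's built-in `float()` already produces the values the second option describes for these inputs: inf for the positive exponents, 0.0 for the negative ones, and -inf when the mantissa itself is negative. A minimal sketch; `expected_row` is a hypothetical helper for illustration, not part of pandas.

```python
import math

N_DIGITS = [11, 12, 13]


def expected_row(sign: str, exp_sign: str) -> list[float]:
    """Parse tokens like '-10e99999999999' with Python's float() as a reference."""
    tokens = [f"{sign}10e{exp_sign}{'9' * nd}" for nd in N_DIGITS]
    return [float(tok) for tok in tokens]


print(expected_row("", ""))   # [inf, inf, inf]
print(expected_row("", "-"))  # [0.0, 0.0, 0.0]
print(expected_row("-", ""))  # [-inf, -inf, -inf]
```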

Installed Versions

INSTALLED VERSIONS

commit : f7447cc
python : 3.13.7
python-bits : 64
OS : Linux
OS-release : 6.16.12-200.fc42.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Sun Oct 12 16:31:16 UTC 2025
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : pt_BR.UTF-8
LOCALE : pt_BR.UTF-8

pandas : 3.0.0.dev0+2555.gf7447cc05e.dirty
numpy : 2.3.4
dateutil : 2.9.0.post0
pip : 25.2
Cython : 3.1.4
sphinx : 8.2.3
IPython : 9.6.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.14.2
bottleneck : 1.6.0
fastparquet : 2024.11.0
fsspec : 2025.9.0
html5lib : 1.1
hypothesis : 6.141.1
gcsfs : 2025.9.0
jinja2 : 3.1.6
lxml.etree : 6.0.2
matplotlib : 3.10.7
numba : 0.62.1
numexpr : 2.14.1
odfpy : None
openpyxl : 3.1.5
psycopg2 : 2.9.11
pymysql : 1.4.6
pyarrow : 21.0.0
pyiceberg : 0.10.0
pyreadstat : 1.3.1
pytest : 8.4.2
python-calamine : None
pytz : 2025.2
pyxlsb : 1.0.10
s3fs : 2025.9.0
scipy : 1.16.2
sqlalchemy : 2.0.44
tables : 3.10.2
tabulate : 0.9.0
xarray : 2025.10.1
xlrd : 2.0.2
xlsxwriter : 3.2.9
zstandard : 0.25.0
qtpy : None
pyqt5 : None
