read_json & read_ndjson do not support scientific notation with + symbol #5687

Closed
2 tasks done
StijnKas opened this issue Nov 30, 2022 · 4 comments
Labels
bug (Something isn't working) · python (Related to Python Polars)

StijnKas commented Nov 30, 2022

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

read_json and read_ndjson currently fail on numbers in scientific notation when the exponent carries an explicit + sign (e.g. 1.1e+10). Exponents with a - sign are parsed correctly.

Reproducible example

#imports
import polars as pl
from io import StringIO

#works
pl.read_ndjson(StringIO('{"Value":1.1e-10}'))

shape: (1, 1)
┌────────────┐
│ Value      │
│ ---        │
│ f64        │
╞════════════╡
│ 1.1000e-10 │
└────────────┘

#does not work
pl.read_ndjson(StringIO('{"Value":1.1e+10}'))


thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ExternalFormat("MissingComa(43)")', /Users/runner/work/polars/polars/polars/polars-io/src/ndjson_core/ndjson.rs:161:90
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/var/folders/bn/mzxdq739615fffnwsnzkjkgjfplznp/T/ipykernel_52657/1504185214.py in <cell line: 1>()
----> 1 pl.read_ndjson(StringIO('{"Value":1.1e+10}'))

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/io.py in read_ndjson(file)
   1025 
   1026     """
-> 1027     return DataFrame._read_ndjson(file)
   1028 
   1029 

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/internals/dataframe/frame.py in _read_ndjson(cls, file)
    835 
    836         self = cls.__new__(cls)
--> 837         self._df = PyDataFrame.read_ndjson(file)
    838         return self
    839 

PanicException: called `Result::unwrap()` on an `Err` value: ExternalFormat("MissingComa(43)")

Expected behavior

I would expect the readers to accept the plus sign in the exponent, since it is allowed by the official JSON spec: https://www.json.org/json-en.html.
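As a quick sanity check of that expectation, here is a minimal sketch using only Python's standard-library `json` module (not Polars). It assumes nothing beyond the inputs from the example above; the helper name `parse_values` is made up for illustration. It shows that a spec-compliant parser accepts the exponent with `+`, with `-`, and with no sign at all.

```python
import json

def parse_values(texts):
    """Parse each JSON document and return the number stored under "Value"."""
    return [json.loads(text)["Value"] for text in texts]

# The JSON number grammar allows an optional sign in the exponent:
# exponent = ("e" | "E") ["+" | "-"] digits
samples = ['{"Value":1.1e-10}', '{"Value":1.1e+10}', '{"Value":1.1E10}']
values = parse_values(samples)

for text, value in zip(samples, values):
    print(text, "->", value)
```

The second and third inputs denote the same number, so a reader that handles one but not the other is deviating from the grammar.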

Installed versions

---Version info---
Polars: 0.15.1
Index type: UInt32
Platform: macOS-12.6.1-x86_64-i386-64bit
Python: 3.10.7 (v3.10.7:6cc6b13308, Sep  5 2022, 14:02:52) [Clang 13.0.0 (clang-1300.0.29.30)]
---Optional dependencies---
pyarrow: 10.0.1
pandas: 1.2.5
numpy: 1.23.3
fsspec: 2022.8.2
connectorx: 0.3.0
xlsx2csv: 0.8
matplotlib: 3.6.1
@StijnKas added the bug and python labels Nov 30, 2022
@StijnKas changed the title from "read_json & read_ndjson do not support scientific notation" to "read_json & read_ndjson do not support scientific notation with + symbol" Nov 30, 2022
@universalmind303 self-assigned this Nov 30, 2022
@universalmind303
Collaborator

I can take a look at this one. I expect it is an upstream issue with simd-json.

@universalmind303
Collaborator

The root cause is actually in the schema inference step, which uses json-deserializer.

@universalmind303
Collaborator

Once arrow2 is updated to the latest json-deserializer version, this should be resolved. See jorgecarleitao/arrow2#1321.
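Until a fixed release is available, one possible stopgap (a sketch, not something proposed in this thread) is to parse the NDJSON lines with Python's standard-library `json` module and build the frame from the resulting dicts. The helper name `ndjson_rows` is hypothetical, and this path is likely slower than the native reader.

```python
import json
from io import StringIO

def ndjson_rows(source):
    """Parse newline-delimited JSON with the standard library.

    `source` is any iterable of text lines (e.g. an open file or StringIO).
    Blank lines are skipped; each remaining line must be one JSON object.
    """
    return [json.loads(line) for line in source if line.strip()]

rows = ndjson_rows(StringIO('{"Value":1.1e-10}\n{"Value":1.1e+10}\n'))
print(rows)

# Assumption: with polars installed, the rows can then be turned into a
# DataFrame, for example via pl.from_dicts(rows).
```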

@StijnKas
Author

Closed by #5781

StijnKas added a commit to pegasystems/pega-datascientist-tools that referenced this issue Dec 12, 2022
…ng (#63)

With pola-rs/polars#5687 closed, I've updated all import logic for pdstools to use polars over pyarrow. Since we make heavy use of ndjson, this issue was a blocker. All imports should now be much faster.

Additionally, I've added some basic logging to the file import logic.


* Fill SnapshotTime with nan if missing

* Changed all import logic to Polars from pyarrow

* Updated ValueFinder to use correct keyword for polars instead of pyarrow

Co-authored-by: Stijn Kas <stijn.kas@pega.com>