This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved performance of deserializing JSON (2x) #1024

Merged
merged 2 commits into from Jun 5, 2022

@jorgecarleitao jorgecarleitao commented May 30, 2022

This PR replaces serde_json with json_deserializer. Reading JSON strings becomes about 2x faster (-57% time) and booleans about 30% faster, while floats become about 10% slower; integers are unchanged within noise (p = 0.20):

read i32 2^20           time:   [68.065 ms 68.256 ms 68.444 ms]                          
                        change: [-0.8660% +1.5241% +3.8068%] (p = 0.20 > 0.05)

read f64 2^20           time:   [90.453 ms 90.682 ms 90.904 ms]                          
                        change: [+10.496% +10.879% +11.269%] (p = 0.00 < 0.05)

read utf8 2^20          time:   [51.573 ms 51.699 ms 51.825 ms]                           
                        change: [-56.883% -56.714% -56.550%] (p = 0.00 < 0.05)

read bool 2^20          time:   [33.284 ms 33.483 ms 33.837 ms]                           
                        change: [-30.126% -29.623% -28.852%] (p = 0.00 < 0.05)
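For reference, the `change:` lines report the relative change in running time, so a speedup factor follows as `1 / (1 + change)`. A minimal sketch of that arithmetic, using only the point estimates printed above; the function names `speedup`, `reported_changes` and `summary` are made up for this illustration and are not part of the PR:

```rust
/// Converts a relative change in running time (in percent, as printed in the
/// `change:` lines) into a speed factor. Values above 1.0 mean faster.
fn speedup(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}

/// Point estimates (the middle value) of the four `change:` lines.
fn reported_changes() -> [(&'static str, f64); 4] {
    [
        ("i32", 1.5241),
        ("f64", 10.879),
        ("utf8", -56.714),
        ("bool", -29.623),
    ]
}

/// Builds one summary line per benchmark, e.g. "read utf8: 2.31x".
fn summary() -> Vec<String> {
    reported_changes()
        .iter()
        .map(|(name, change)| format!("read {}: {:.2}x", name, speedup(*change)))
        .collect()
}
```

Under this reading, `speedup(-56.714)` is roughly 2.31, which is where the "2x" for strings comes from, and `speedup(10.879)` is roughly 0.90, i.e. floats run at about 90% of the previous speed.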

Note that there is an API change: the deserializer now requires a json_deserializer::Value instead of a serde_json::Value. json_deserializer is re-exported under arrow2::io::json::read::json_deserializer for convenience and version matching.

codecov bot commented May 30, 2022

Codecov Report

Merging #1024 (68f0e51) into main (7014e28) will increase coverage by 0.03%.
The diff coverage is 56.25%.

@@            Coverage Diff             @@
##             main    #1024      +/-   ##
==========================================
+ Coverage   81.39%   81.43%   +0.03%     
==========================================
  Files         362      363       +1     
  Lines       34527    34910     +383     
==========================================
+ Hits        28103    28428     +325     
- Misses       6424     6482      +58     
| Impacted Files | Coverage Δ |
|---|---|
| src/io/avro/read/decompress.rs | 86.20% <ø> (ø) |
| src/io/avro/write/compress.rs | 100.00% <ø> (ø) |
| src/io/json/write/mod.rs | 92.85% <ø> (ø) |
| src/io/json_integration/mod.rs | 72.72% <0.00%> (-27.28%) ⬇️ |
| src/io/json/read/deserialize.rs | 72.62% <50.50%> (-6.67%) ⬇️ |
| src/io/json/write/utf8.rs | 56.89% <56.89%> (ø) |
| src/io/json/mod.rs | 100.00% <100.00%> (ø) |
| src/io/json/read/infer_schema.rs | 95.48% <100.00%> (ø) |
| src/io/json/write/serialize.rs | 92.73% <100.00%> (ø) |
| src/io/ndjson/read/deserialize.rs | 100.00% <100.00%> (ø) |

... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
