Space leak in Data.Aeson.Parser
#37
Hi Herbert, I was wondering, have you also tried strictifying and unpacking the fields of the constructors of `Value`?

```diff
diff --git a/Data/Aeson/Types/Internal.hs b/Data/Aeson/Types/Internal.hs
index 67946e9..14653d5 100644
--- a/Data/Aeson/Types/Internal.hs
+++ b/Data/Aeson/Types/Internal.hs
@@ -207,10 +207,10 @@ type Object = Map Text Value
 type Array = Vector Value
 
 -- | A JSON value represented as a Haskell value.
-data Value = Object Object
-           | Array Array
-           | String Text
-           | Number Number
+data Value = Object !Object
+           | Array {-# UNPACK #-} !Array
+           | String {-# UNPACK #-} !Text
+           | Number !Number
            | Bool !Bool
            | Null
              deriving (Eq, Show, Typeable, Data)
```

(Of course, unpacking fields does not always improve performance: fields sometimes have to be reboxed, which in fact hurts performance. This has to be benchmarked.)
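To illustrate what the bang annotations in the diff above change: a strict field is forced to weak head normal form when the constructor is applied, while a lazy field silently stores an unevaluated thunk. A minimal sketch with hypothetical types (not aeson's actual API):

```haskell
-- LazyBox stores its field as-is, possibly as a thunk.
-- StrictBox (note the '!') forces its field when the constructor
-- application itself is evaluated. Strictness annotations in data
-- declarations are standard Haskell; no extension is needed.
data LazyBox   = LazyBox   Int
data StrictBox = StrictBox !Int

main :: IO ()
main = do
  -- The division by zero is never evaluated: 'seq' only forces the
  -- constructor, and the lazy field keeps holding the thunk.
  let lb = LazyBox (1 `div` 0)
  lb `seq` putStrLn "LazyBox constructed without evaluating its field"

  -- With a strict field, forcing the constructor forces the field too,
  -- so 'StrictBox (1 `div` 0) `seq` ...' would throw here instead.
  let sb = StrictBox 42
  sb `seq` print (case sb of StrictBox n -> n)
```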
Good point. I remember faintly having tried that, but I don't remember the results anymore. Unboxing the …
I have not found that @basvandijk's change actually fixes this problem. It does make a difference to the amount of memory allocated, but it's small. I ran with a 182KB sample JSON file that @hvr gave me a few months ago.
Lazy fields ran in 4.30 seconds.
Strict fields ran in 4.16 seconds.
I should take a moment to describe why I have resisted making `Value` strict.

When we're parsing, we don't know which fields in a `Value` will actually be demanded by the consumer. Granted, the downside of the current somewhat-lazy approach is that it causes many thunks to be allocated. If the consumer really does need all the fields, then we've failed to avoid any work; instead, we've caused ourselves extra work and intermediate allocation due to the thunking.

I have not yet been able to come up with a satisfactory one-size-fits-all solution. I am quite fond of the additional performance we get by deferring complete evaluation, but I am unhappy about the additional allocation we perform when the deferred evaluation is not needed. The closest I can think of to a good solution is to have two parsing modules, one that's fully strict and one that's partly lazy. If you have any better ideas, please let me know :-)
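The tradeoff described above can be made concrete with a toy tree in place of aeson's `Value` (names here are illustrative, not the real API): a consumer that demands only part of a lazily built structure never pays for the rest, whereas a fully strict parse corresponds to evaluating everything up front.

```haskell
import Control.DeepSeq (NFData (..))

-- A stand-in for a lazily constructed parse tree.
data Tree = Leaf Int | Node Tree Tree

-- Fully evaluating a Tree (what a strict parse would effectively do).
instance NFData Tree where
  rnf (Leaf n)   = rnf n
  rnf (Node l r) = rnf l `seq` rnf r

-- A consumer that only needs the leftmost leaf.
leftmost :: Tree -> Int
leftmost (Leaf n)   = n
leftmost (Node l _) = leftmost l

-- Right-hand subtrees are bottom, standing in for work we hope to skip.
build :: Int -> Tree
build 0 = Leaf 0
build n = Node (build (n - 1)) (error "never demanded")

main :: IO ()
main = do
  let t = build 3
  -- Partial consumption succeeds: the error thunks are never forced.
  print (leftmost t)
  -- A fully strict parse would amount to 't `deepseq` ...',
  -- which here would hit the error thunks and throw.
```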
IMHO, that's a legitimate solution, i.e. to let the user choose, if there really isn't a one-size-fits-all parser :-) I'm wondering what percentage of the JSON parse tree (if that can be roughly estimated at all) actually has to be demanded for the strict parser to become more or less efficient than the lazy one.
This changeset builds on top of dab1302, and just strictifies the monadic `return`s used in the attoparsec `Parser`s monad to avoid leaving thunks behind during parsing. This addresses issue haskell#37
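The "strictified `return`" idea from the changeset above can be sketched in isolation: `return x` wraps the unevaluated expression `x` as a thunk, while `return $! x` forces it to weak head normal form first, so no thunk is left behind in the result. Shown here with a generic monad rather than attoparsec's actual internals:

```haskell
-- Both functions have the same type; only evaluation order differs.
sumLazy, sumStrict :: Monad m => Int -> Int -> m Int
sumLazy   a b = return (a + b)     -- stores the thunk "a + b"
sumStrict a b = return $! (a + b)  -- computes the sum before wrapping it

main :: IO ()
main = do
  x <- sumStrict 2 3
  print x
```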
This is now fixed.
This improves the running time of the AesonParse benchmarks. I don't think there's a use-case for lazy fields. Fixes haskell#37
This issue mostly documents a known problem and a possible fix/workaround.
The current parser implementation in `Data/Aeson/Parser` creates many thunks for records and arrays during parsing. Those thunks cause measurable overhead in heap usage and performance, which is already visible for some of the JSON documents in the aeson benchmark suite. For even larger JSON documents in the MiB range, the overhead becomes even more substantial; compared to Python's parser, aeson is then a few times slower.

The benchmark suite itself doesn't evaluate those thunks unless the following diff is applied:
The following modification removes the space leak in `Data.Aeson.Parser`:

This leads to the following benchmark results:
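(The diff and benchmark numbers referenced above were lost in extraction. As background for why the benchmark diff matters at all: a benchmark that never demands the parsed structure never pays for, or measures, its thunks. The standard fix is to force the result to normal form before measuring. A hedged sketch of that idea using `deepseq`, not the actual aeson benchmark code:)

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 100000 :: Int]
  -- 'force' evaluates the whole list, spine and elements, to normal
  -- form, so the cost of every thunk is actually paid inside the
  -- measured region instead of being skipped because nothing demands it.
  ys <- evaluate (force xs)
  print (length ys)
```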