I finally managed to hunt down a memory leak which arose from the following, arguably surprising, behaviour: running either of these parsers and holding on to their results will keep the entire input in memory.

This is due to the sharing semantics of both strict and lazy text. (I believe the same applies to strict and lazy bytestrings.)

Having been bitten by this, I'm struggling to come up with a scenario where we actually gain anything from sharing memory with the input stream. In the strict case, sharing always keeps the entire input in memory, while in the lazy case we still keep a whole chunk (of 16K) alive for each 'output' (which may be much smaller than that).
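To make the sharing concrete, here is a minimal standalone sketch (not megaparsec code) of how slicing a strict Text keeps the original buffer alive, and how Data.Text.copy severs that link:

```haskell
import qualified Data.Text as T

main :: IO ()
main = do
  -- A large input; imagine this is the whole parser input.
  let input = T.replicate 1000000 (T.singleton 'x')
      -- take/splitAt return slices into input's underlying array,
      -- so holding on to 'slice' retains the entire megabyte.
      slice = T.take 5 input
      -- copy allocates a fresh 5-character buffer, so 'input' can be
      -- garbage-collected even if we keep 'detached' around.
      detached = T.copy slice
  -- Equal as values; they differ only in what memory they retain.
  print (slice == detached)
```

Nothing in the public API distinguishes the two results; the difference only shows up in heap retention, which is why the leak is easy to miss.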
It seems to me we could get rid of the input-sharing behaviour by adding a few copy :: Text -> Text calls in the right places:

instance Stream T.Text where
  ...
  takeN_ n s
    | n <= 0 = Just (T.empty, s)
    | T.null s = Nothing
    | otherwise = Just (first T.copy $ T.splitAt n s)
  takeWhile_ p = first T.copy . T.span p
Why is that a bad idea, and/or what am I missing?
This is a valid concern, thanks for bringing it up. I think I originally wanted combinators like takeN_ to be as fast as possible, and for that reason I used splitAt. I was much more preoccupied with raw speed than with the potential danger of retaining long input streams. I'd be curious to know how adding copy affects the benchmarks. If the slowdown is not too bad, which it could well be, we might do better by just adding copy in the right places and avoiding sharing.

Would you have time to add copy and run the benchmarks that are included with the library (before/after the change)?
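For reference, a before/after comparison could be sketched with criterion roughly as follows. takeShared and takeCopied are hypothetical stand-ins for the two takeN_ variants, not megaparsec's actual definitions, and the criterion dependency is assumed:

```haskell
import Criterion.Main (bench, defaultMain, nf)
import Data.Bifunctor (first)
import qualified Data.Text as T

-- Current behaviour: the returned prefix shares input's buffer.
takeShared :: Int -> T.Text -> Maybe (T.Text, T.Text)
takeShared n s
  | n <= 0    = Just (T.empty, s)
  | T.null s  = Nothing
  | otherwise = Just (T.splitAt n s)

-- Proposed behaviour: copy the prefix into a fresh buffer.
takeCopied :: Int -> T.Text -> Maybe (T.Text, T.Text)
takeCopied n s = first T.copy <$> takeShared n s

main :: IO ()
main = do
  let input = T.replicate 100000 (T.singleton 'a')
  defaultMain
    [ bench "takeN_/shared" $ nf (takeShared 16) input
    , bench "takeN_/copied" $ nf (takeCopied 16) input
    ]
```

Only the prefix is copied, never the remaining stream, so the extra cost per call should be proportional to the size of the consumed chunk rather than the whole input.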