Explicitly copy token sequences to avoid sharing input #495

EggBaconAndSpam · 2022-10-28T06:28:25Z

Closes #492

As mentioned in the issue, we can avoid the memory leak caused by retaining the whole input in memory by explicitly copying token sequences as we read them.
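To illustrate the idea, here is a minimal sketch for strict ByteString input. It is not the code from this PR: the helper names takeShared and takeCopied are hypothetical and only mimic what a takeN_-style function does. B.splitAt returns slices that point into the original buffer, so holding on to a small chunk keeps the whole input alive, whereas B.copy allocates a fresh buffer for just the chunk.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper (not part of any library): take n bytes and return
-- the chunk as a slice that shares the buffer of the input.
takeShared :: Int -> B.ByteString -> Maybe (B.ByteString, B.ByteString)
takeShared n s
  | n <= 0 = Just (B.empty, s)
  | B.null s = Nothing
  | otherwise = Just (B.splitAt n s)

-- Hypothetical helper: same result, but the chunk is copied into its own
-- buffer, so retaining it no longer retains the input.
takeCopied :: Int -> B.ByteString -> Maybe (B.ByteString, B.ByteString)
takeCopied n s = fmap (\(chunk, rest) -> (B.copy chunk, rest)) (takeShared n s)

main :: IO ()
main = do
  let input = B.pack "12345 rest of a potentially huge input"
  -- Both variants return equal chunks; they differ only in what memory
  -- the chunk refers to.
  print (fmap fst (takeShared 5 input))
  print (fmap fst (takeCopied 5 input))
```

The two functions are observationally equal, which is why the difference only shows up in memory and time measurements like the ones below.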

The performance implications of adding an explicit copy to takeN_ and takeWhile_ are as follows:

The takeWhileP family of basic combinators ends up consuming a lot more memory (since previously they didn't need to allocate the result buffer) and time (due to the copy call).
The decimal (octal, etc.) family of combinators is about 40% slower and consumes about twice as much memory.
All other combinators (like many) have either stayed the same or got slightly (up to 10%) faster.

Sadly we don't have any real-world benchmarks (or do we?) so it's hard to tell if this would have any significant impact in practice.

Raw benchmark results

This PR: memory, speed
Master: memory, speed

mrkkrp · 2022-11-10T21:11:19Z

The way forward here is to first bring the nix stuff up to date, since it allows us to observe the effects of such changes on dependent packages. I will do it soon. Once that is done we will be able to make an informed decision.

mrkkrp · 2022-11-18T17:55:59Z

Here are the benchmark results before (old) and after (new) the changes for 3 packages (parsers-bench from this repo, mmark, and modern-uri):

While the impact is sometimes negligible, in certain cases it is quite noticeable (notably in the parsers from parsers-bench). Perhaps we should give users a choice of whether to copy and thus avoid sharing?

EggBaconAndSpam · 2022-11-21T11:36:42Z

Thanks for sorting out the benchmarks! I had completely overlooked that parsers-bench is part of this repo as well 😅

I agree the impact appears to be large enough that we probably should give people a choice.

I propose adding two newtype wrappers ShareInput and NoShareInput (names are up for debate 😄) to explicitly select one or the other Stream implementation. The unwrapped instances should coincide with their ShareInput counterparts at least for now, i.e. instance Stream a = instance Stream (ShareInput a) in handwavy terms.
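A rough sketch of what this proposal could look like is below. It does not use the real Stream class: Chunked and takeChunk are hypothetical stand-ins that keep the example self-contained, and the field names unShareInput and unNoShareInput are assumptions as well. The point is only to show how the wrapper selects the behaviour and how the unwrapped instance can delegate to ShareInput.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for the chunk-taking part of a stream class.
class Chunked s where
  takeChunk :: Int -> s -> (B.ByteString, s)

-- Wrapper selecting the implementation that shares the input buffer.
newtype ShareInput a = ShareInput {unShareInput :: a}

-- Wrapper selecting the implementation that copies every chunk.
newtype NoShareInput a = NoShareInput {unNoShareInput :: a}

instance Chunked (ShareInput B.ByteString) where
  takeChunk n (ShareInput s) =
    let (chunk, rest) = B.splitAt n s
     in (chunk, ShareInput rest) -- chunk is a slice of the input

instance Chunked (NoShareInput B.ByteString) where
  takeChunk n (NoShareInput s) =
    let (chunk, rest) = B.splitAt n s
     in (B.copy chunk, NoShareInput rest) -- chunk gets its own buffer

-- The unwrapped instance coincides with its ShareInput counterpart.
instance Chunked B.ByteString where
  takeChunk n s =
    let (chunk, rest) = takeChunk n (ShareInput s)
     in (chunk, unShareInput rest)

main :: IO ()
main = do
  let input = B.pack "2022-11-27 and much more input"
  print (fst (takeChunk 10 (ShareInput input)))
  print (fst (takeChunk 10 (NoShareInput input)))
  print (fst (takeChunk 10 input))
```

With this shape, switching behaviour is a matter of wrapping the input (and adjusting the stream type of the parser), so existing code that uses the unwrapped type keeps its current performance.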

In a future major release we could change the default behaviour to NoShareInput, with a short note explaining what input sharing is and suggesting ShareInput if performance is a concern.

What do you think? Shall I update the PR along those lines?

mrkkrp · 2022-11-21T12:23:08Z

This sounds good, please go ahead.

EggBaconAndSpam · 2022-11-27T12:14:05Z

I added ShareInput and NoShareInput. What does this look like to you? 🙂

mrkkrp · 2022-11-27T17:57:05Z

Thanks!

mrkkrp added this to the 9.3.0 milestone Nov 15, 2022

mrkkrp force-pushed the no-input-sharing branch 2 times, most recently from 023eab1 to 982c07c on November 16, 2022 13:57

mrkkrp force-pushed the no-input-sharing branch 2 times, most recently from 4933ce2 to 11ae3d7 on November 27, 2022 17:42

Add control over sharing of the input stream

3b2aed5

mrkkrp force-pushed the no-input-sharing branch from 11ae3d7 to 3b2aed5 on November 27, 2022 17:50

mrkkrp approved these changes Nov 27, 2022

mrkkrp merged commit 72e6dfe into mrkkrp:master Nov 27, 2022
