Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly copy token sequences to avoid sharing input #495

Merged
merged 1 commit into from
Nov 27, 2022

Conversation

EggBaconAndSpam
Copy link
Contributor

@EggBaconAndSpam EggBaconAndSpam commented Oct 28, 2022

Closes #492

As mentioned in the issue, we can avoid leaking memory retaining the input in memory by explicitly copying token sequences as we read them.

The performance implications of adding an explicit copy to takeN_ and takeWhile_ are as follows:

  • The takeWhileP family of basic combinators end up consuming a lot more memory (since previously they didn't need to allocate the result buffer) and time (due to the copy call).
  • The decimal (octal etc) family of combinators are about 40% slower and consume about twice as much memory.
  • All other combinators (like many) have either stayed the same or got slightly (up to 10%) faster.

Sadly we don't have any real-world benchmarks (or do we?) so it's hard to tell if this would have any significant impact in practice.

Raw benchmark results

This PR: memory speed
Master: memory speed

@mrkkrp
Copy link
Owner

mrkkrp commented Nov 10, 2022

The way forward here is to first bring up to date the nix stuff that allows us to observe effects of such changes on dependent packages. I will do it soon. Once that is done we will be able to make an informed decision.

@mrkkrp mrkkrp added this to the 9.3.0 milestone Nov 15, 2022
@mrkkrp mrkkrp force-pushed the no-input-sharing branch 2 times, most recently from 023eab1 to 982c07c Compare November 16, 2022 13:57
@mrkkrp
Copy link
Owner

mrkkrp commented Nov 18, 2022

Here are the results of benchmark before (old) and after (new) the changes for 3 packages (parsers-bench from this repo, mmark, and modern-uri):

While the impact is sometimes negligible, in certain cases it is quite noticeable (notably in the parsers from parsers-bench). Perhaps we should give the users a choice whether to perform copying and avoid sharing?

@EggBaconAndSpam
Copy link
Contributor Author

EggBaconAndSpam commented Nov 21, 2022

Thanks for sorting out the benchmarks! I had completely overlooked that parsers-bench is part of this repo as well 😅

I agree the impact appears to be large enough that we probably should give people a choice.

I propose adding two newtype wrappers ShareInput and NoShareInput (names are up for debate 😄) to explicitly select one or the other Stream implementation. The unwrapped instances should coincide with their ShareInput counterparts at least for now, i.e. instance Stream a = instance Stream (ShareInput a) in handwavy terms.

In a future major release we could change the default behaviour to NoShareInput, with a short note on what input sharing is and to try ShareInput if performance is a concern.

What do you think? Shall I update the PR along those lines?

@mrkkrp
Copy link
Owner

mrkkrp commented Nov 21, 2022

This sounds good, please go ahead.

@EggBaconAndSpam
Copy link
Contributor Author

I added ShareInput and NoShareInput. What does this look like to you? 🙂

@mrkkrp mrkkrp force-pushed the no-input-sharing branch 2 times, most recently from 4933ce2 to 11ae3d7 Compare November 27, 2022 17:42
@mrkkrp mrkkrp merged commit 72e6dfe into mrkkrp:master Nov 27, 2022
@mrkkrp
Copy link
Owner

mrkkrp commented Nov 27, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected input-sharing behaviour of parsers
2 participants