Addresses issue #18: adding BytesReplacer interface to allow BytesReplacingReader to have a customized sizing/search strategy
#19
…ReplacingReader` to have a customized sizing/search strategy

Originally, `BytesReplacingReader` was designed to replace a single `search/replace` token pair in a stream. Issue #18 reveals the need for multi-token replacement in a memory-efficient way.

One alternative to this solution is to ask the user to nest a `BytesReplacingReader` for each `search/replace` pair. However, that approach would allocate `r.buf` for each and every pair; if the number of pairs is large, which is evidently the case in issue #18, the memory consumption is huge.

Another alternative is to ask the user to run a `BytesReplacingReader` over the stream for one `search/replace` pair and then repeat the process for each remaining pair. That is equally undesirable, given the high memory/disk demand of this approach.

Instead, we now introduce a new interface, `BytesReplacer`, that lets `BytesReplacingReader` customize both its buffer allocation sizing estimate and its search.

The API is strictly backward compatible: `NewBytesReplacingReader` simply creates a single `search/replace` replacer and then uses the new `NewBytesReplacingReaderEx` underneath. The multi-token replacement strategy requested in issue #18 is demonstrated in a test.
There is no performance degradation[1]:

Before change:
```
BenchmarkBytesReplacingReader_70MBLength_500Targets-8         62    18566817 ns/op    423499484 B/op    49 allocs/op
BenchmarkRegularReader_70MBLength_500Targets-8                74    16334800 ns/op    423499325 B/op    49 allocs/op
BenchmarkBytesReplacingReader_1KBLength_20Targets-8      1583468       756.7 ns/op         2864 B/op     4 allocs/op
BenchmarkRegularReader_1KBLength_20Targets-8             3863762       309.0 ns/op         2864 B/op     4 allocs/op
BenchmarkBytesReplacingReader_50KBLength_1000Targets-8     33760       35490 ns/op       210480 B/op    17 allocs/op
BenchmarkRegularReader_50KBLength_1000Targets-8            95044       12714 ns/op       210480 B/op    17 allocs/op
```

After:
```
BenchmarkBytesReplacingReader_70MBLength_500Targets-8         61    18214781 ns/op    423499484 B/op    49 allocs/op
BenchmarkRegularReader_70MBLength_500Targets-8                74    16589935 ns/op    423499329 B/op    49 allocs/op
BenchmarkBytesReplacingReader_1KBLength_20Targets-8      1552221       772.6 ns/op         2864 B/op     4 allocs/op
BenchmarkRegularReader_1KBLength_20Targets-8             3879327       308.9 ns/op         2864 B/op     4 allocs/op
BenchmarkBytesReplacingReader_50KBLength_1000Targets-8     32160       37192 ns/op       210480 B/op    17 allocs/op
BenchmarkRegularReader_50KBLength_1000Targets-8            95293       12419 ns/op       210480 B/op    17 allocs/op
```

[1] Strictly speaking, there is one extra allocation (for creating the single `search/replace` replacer) if the existing `r.Reset` is used. So, to be pedantic, there is a minor performance degradation if users of the API choose not to modify their code at all.
FYI @carterpeel
Codecov Report
@@ Coverage Diff @@
## master #19 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 17 18 +1
Lines 515 633 +118
==========================================
+ Hits 515 633 +118
This looks great. Thanks a ton for adding this. 👍🏻
(fixes #18)
P.S. Also split `BytesReplacingReader` and its tests into a new file, given the complexity.