Conversation

@jf-tech (Owner) commented Aug 1, 2021

(fixes #18)

Originally, BytesReplacingReader was designed to replace a single pair of search/replace tokens in a stream. Issue #18
reveals the need for memory-efficient multi-token replacement.

One alternative to this solution is to ask the user to nest one BytesReplacingReader per search/replace pair. However, that
approach allocates r.buf for each and every pair; if the number of pairs is large, as it evidently is in issue #18, the
memory consumption is huge.

Another alternative is to ask the user to run BytesReplacingReader over the stream once per search/replace pair,
repeating the process for every pair. That's equally undesirable, given the high memory/disk demand of this approach.

Instead, we now introduce a new interface, BytesReplacer, that allows BytesReplacingReader to customize both its buf
allocation size estimate and its search behavior.
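As an illustration of what such an interface could enable, here is a minimal sketch of a multi-pair replacer; the method names, signatures, and the `replaceAll` helper below are assumptions made for this example, not the library's actual BytesReplacer API:

```go
package main

import (
	"bytes"
	"fmt"
)

// multiReplacer holds many search/replace pairs behind one object, so a
// single reader buffer can serve all of them. Method names here are
// illustrative assumptions, not the library's real BytesReplacer methods.
type multiReplacer struct {
	pairs [][2][]byte // each element is {search, replace}
}

// sizingFactor returns the worst-case output/input length ratio, which a
// reader could use to size its internal buffer once, up front.
func (m *multiReplacer) sizingFactor() float64 {
	factor := 1.0
	for _, p := range m.pairs {
		if f := float64(len(p[1])) / float64(len(p[0])); f > factor {
			factor = f
		}
	}
	return factor
}

// earliestMatch reports the leftmost occurrence of any search token in buf,
// returning its index and the matching pair, or -1 if nothing matches.
func (m *multiReplacer) earliestMatch(buf []byte) (int, []byte, []byte) {
	best, bestS, bestR := -1, []byte(nil), []byte(nil)
	for _, p := range m.pairs {
		if i := bytes.Index(buf, p[0]); i >= 0 && (best < 0 || i < best) {
			best, bestS, bestR = i, p[0], p[1]
		}
	}
	return best, bestS, bestR
}

// replaceAll applies the replacer repeatedly to a fully buffered chunk; the
// real reader works incrementally over a stream instead. This simplified
// loop assumes replacement tokens do not reintroduce search tokens.
func replaceAll(m *multiReplacer, buf []byte) []byte {
	out := append([]byte(nil), buf...)
	for {
		i, s, r := m.earliestMatch(out)
		if i < 0 {
			return out
		}
		tail := append(append([]byte(nil), r...), out[i+len(s):]...)
		out = append(out[:i], tail...)
	}
}

func main() {
	m := &multiReplacer{pairs: [][2][]byte{
		{[]byte("dog"), []byte("wolf")},
		{[]byte("cat"), []byte("tiger")},
	}}
	fmt.Println(string(replaceAll(m, []byte("a dog and a cat")))) // a wolf and a tiger
	fmt.Println(m.sizingFactor() > 1.0)                           // true (5/3, from cat -> tiger)
}
```

The key point is that all pairs share one replacer and therefore one reader buffer, instead of one buffer per nested reader.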

The API is strictly backward compatible: NewBytesReplacingReader simply creates a single-pair search/replace replacer and then uses the new NewBytesReplacingReaderEx underneath.

A test demonstrates the multi-token replacement strategy demanded in issue #18.

There is no performance degradation[1]:

Before change:
```
BenchmarkBytesReplacingReader_70MBLength_500Targets-8    	      62	  18566817 ns/op	423499484 B/op	      49 allocs/op
BenchmarkRegularReader_70MBLength_500Targets-8           	      74	  16334800 ns/op	423499325 B/op	      49 allocs/op
BenchmarkBytesReplacingReader_1KBLength_20Targets-8      	 1583468	       756.7 ns/op	    2864 B/op	       4 allocs/op
BenchmarkRegularReader_1KBLength_20Targets-8             	 3863762	       309.0 ns/op	    2864 B/op	       4 allocs/op
BenchmarkBytesReplacingReader_50KBLength_1000Targets-8   	   33760	     35490 ns/op	  210480 B/op	      17 allocs/op
BenchmarkRegularReader_50KBLength_1000Targets-8          	   95044	     12714 ns/op	  210480 B/op	      17 allocs/op
```

After:
```
BenchmarkBytesReplacingReader_70MBLength_500Targets-8    	      61	  18214781 ns/op	423499484 B/op	      49 allocs/op
BenchmarkRegularReader_70MBLength_500Targets-8           	      74	  16589935 ns/op	423499329 B/op	      49 allocs/op
BenchmarkBytesReplacingReader_1KBLength_20Targets-8      	 1552221	       772.6 ns/op	    2864 B/op	       4 allocs/op
BenchmarkRegularReader_1KBLength_20Targets-8             	 3879327	       308.9 ns/op	    2864 B/op	       4 allocs/op
BenchmarkBytesReplacingReader_50KBLength_1000Targets-8   	   32160	     37192 ns/op	  210480 B/op	      17 allocs/op
BenchmarkRegularReader_50KBLength_1000Targets-8          	   95293	     12419 ns/op	  210480 B/op	      17 allocs/op
```

[1] Strictly speaking, there is one extra allocation (for creating the single search/replace replacer) if the existing r.Reset
is used. So, if we really want to be pedantic, yes, there is a minor perf degradation if users of the API choose not to modify
their code at all.

P.S. BytesReplacingReader and its tests are also split out into a new file, given the complexity.

@jf-tech (Owner, Author) commented Aug 1, 2021

FYI @carterpeel

codecov bot commented Aug 1, 2021

Codecov Report

Merging #19 (7c284b0) into master (5e34763) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            master       #19    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           17        18     +1     
  Lines          515       633   +118     
==========================================
+ Hits           515       633   +118     
Impacted Files Coverage Δ
ios/readers.go 100.00% <ø> (ø)
ios/bytesReplacingReader.go 100.00% <100.00%> (ø)
ios/files.go 100.00% <0.00%> (ø)
strs/strs.go 100.00% <0.00%> (ø)
times/util.go 100.00% <0.00%> (ø)
jsons/jsons.go 100.00% <0.00%> (ø)
maths/maths.go 100.00% <0.00%> (ø)
times/parse.go 100.00% <0.00%> (ø)
ios/scanners.go 100.00% <0.00%> (ø)
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@carterpeel

This looks great. Thanks a ton for adding this. 👍🏻

@jf-tech jf-tech merged commit 6f1a001 into master Aug 4, 2021
@jf-tech jf-tech deleted the multitoken branch August 4, 2021 02:02


Development

Successfully merging this pull request may close these issues:

Add support for multiple old:new replace mappings