Conversation

@jf-tech (Owner) commented Aug 1, 2021

(fixes #18)

Originally, BytesReplacingReader was designed to replace a single pair of search/replace tokens in a stream. Issue #18
reveals the need for memory-efficient multi-token replacement.

One alternative to this solution is to ask the user to nest one BytesReplacingReader per search/replace pair. However, that
approach allocates r.buf for each and every pair; if the number of pairs is large, as it evidently is in issue #18, the
memory consumption is huge.

Another alternative is to ask the user to run BytesReplacingReader over the stream once per search/replace pair,
repeating the process for every pair. That's equally undesirable, given the high memory/disk demand of this approach.

Instead, we now introduce a new interface, BytesReplacer, that allows BytesReplacingReader to customize both its buf
allocation size estimate and its search behavior.
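As an illustration of what such an interface could enable, here is a minimal sketch of a multi-pair replacer; the method names, signatures, and the `replaceAll` helper below are assumptions made for this example, not the library's actual BytesReplacer API:

```go
package main

import (
	"bytes"
	"fmt"
)

// multiReplacer holds many search/replace pairs behind one object, so a
// single reader buffer can serve all of them. Method names here are
// illustrative assumptions, not the library's real BytesReplacer methods.
type multiReplacer struct {
	pairs [][2][]byte // each element is {search, replace}
}

// sizingFactor returns the worst-case output/input length ratio, which a
// reader could use to size its internal buffer once, up front.
func (m *multiReplacer) sizingFactor() float64 {
	factor := 1.0
	for _, p := range m.pairs {
		if f := float64(len(p[1])) / float64(len(p[0])); f > factor {
			factor = f
		}
	}
	return factor
}

// earliestMatch reports the leftmost occurrence of any search token in buf,
// returning its index and the matching pair, or -1 if nothing matches.
func (m *multiReplacer) earliestMatch(buf []byte) (int, []byte, []byte) {
	best, bestS, bestR := -1, []byte(nil), []byte(nil)
	for _, p := range m.pairs {
		if i := bytes.Index(buf, p[0]); i >= 0 && (best < 0 || i < best) {
			best, bestS, bestR = i, p[0], p[1]
		}
	}
	return best, bestS, bestR
}

// replaceAll applies the replacer repeatedly to a fully buffered chunk; the
// real reader works incrementally over a stream instead. This simplified
// loop assumes replacement tokens do not reintroduce search tokens.
func replaceAll(m *multiReplacer, buf []byte) []byte {
	out := append([]byte(nil), buf...)
	for {
		i, s, r := m.earliestMatch(out)
		if i < 0 {
			return out
		}
		tail := append(append([]byte(nil), r...), out[i+len(s):]...)
		out = append(out[:i], tail...)
	}
}

func main() {
	m := &multiReplacer{pairs: [][2][]byte{
		{[]byte("dog"), []byte("wolf")},
		{[]byte("cat"), []byte("tiger")},
	}}
	fmt.Println(string(replaceAll(m, []byte("a dog and a cat")))) // a wolf and a tiger
	fmt.Println(m.sizingFactor() > 1.0)                           // true (5/3, from cat -> tiger)
}
```

The key point is that all pairs share one replacer and therefore one reader buffer, instead of one buffer per nested reader.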

The API is strictly backward compatible: NewBytesReplacingReader simply creates a single-pair search/replace replacer and then uses the new NewBytesReplacingReaderEx underneath.

A test demonstrates the multi-token replacement strategy demanded in issue #18.

There is no performance degradation[1]:

Before change:
```
BenchmarkBytesReplacingReader_70MBLength_500Targets-8    	      62	  18566817 ns/op	423499484 B/op	      49 allocs/op
BenchmarkRegularReader_70MBLength_500Targets-8           	      74	  16334800 ns/op	423499325 B/op	      49 allocs/op
BenchmarkBytesReplacingReader_1KBLength_20Targets-8      	 1583468	       756.7 ns/op	    2864 B/op	       4 allocs/op
BenchmarkRegularReader_1KBLength_20Targets-8             	 3863762	       309.0 ns/op	    2864 B/op	       4 allocs/op
BenchmarkBytesReplacingReader_50KBLength_1000Targets-8   	   33760	     35490 ns/op	  210480 B/op	      17 allocs/op
BenchmarkRegularReader_50KBLength_1000Targets-8          	   95044	     12714 ns/op	  210480 B/op	      17 allocs/op
```

After:
```
BenchmarkBytesReplacingReader_70MBLength_500Targets-8    	      61	  18214781 ns/op	423499484 B/op	      49 allocs/op
BenchmarkRegularReader_70MBLength_500Targets-8           	      74	  16589935 ns/op	423499329 B/op	      49 allocs/op
BenchmarkBytesReplacingReader_1KBLength_20Targets-8      	 1552221	       772.6 ns/op	    2864 B/op	       4 allocs/op
BenchmarkRegularReader_1KBLength_20Targets-8             	 3879327	       308.9 ns/op	    2864 B/op	       4 allocs/op
BenchmarkBytesReplacingReader_50KBLength_1000Targets-8   	   32160	     37192 ns/op	  210480 B/op	      17 allocs/op
BenchmarkRegularReader_50KBLength_1000Targets-8          	   95293	     12419 ns/op	  210480 B/op	      17 allocs/op
```

[1] Strictly speaking, there is one extra allocation (for creating the single search/replace replacer) if the existing r.Reset
is used. So, if we really want to be pedantic, yes, there is a minor perf degradation if users of the API choose not to modify
their code at all.

P.S. BytesReplacingReader and its tests are also split out into a new file, given the complexity.

@jf-tech (Owner, Author) commented Aug 1, 2021

FYI @carterpeel

codecov bot commented Aug 1, 2021

Codecov Report

Merging #19 (7c284b0) into master (5e34763) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            master       #19    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           17        18     +1     
  Lines          515       633   +118     
==========================================
+ Hits           515       633   +118     
Impacted Files Coverage Δ
ios/readers.go 100.00% <ø> (ø)
ios/bytesReplacingReader.go 100.00% <100.00%> (ø)
ios/files.go 100.00% <0.00%> (ø)
strs/strs.go 100.00% <0.00%> (ø)
times/util.go 100.00% <0.00%> (ø)
jsons/jsons.go 100.00% <0.00%> (ø)
maths/maths.go 100.00% <0.00%> (ø)
times/parse.go 100.00% <0.00%> (ø)
ios/scanners.go 100.00% <0.00%> (ø)
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@carterpeel

This looks great. Thanks a ton for adding this. 👍🏻

@jf-tech jf-tech merged commit 6f1a001 into master Aug 4, 2021
@jf-tech jf-tech deleted the multitoken branch August 4, 2021 02:02


Development

Successfully merging this pull request may close these issues:

Add support for multiple old:new replace mappings