Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer overflow with fixedlength2 + header/footer #213

Closed
paulstadler opened this issue Jul 21, 2023 · 5 comments · Fixed by #214
Closed

buffer overflow with fixedlength2 + header/footer #213

paulstadler opened this issue Jul 21, 2023 · 5 comments · Fixed by #214
Assignees
Labels
bug Something isn't working

Comments

@paulstadler
Copy link

We've found using a header/footer construct in fixedlength2 causes a buffer overflow within go-corelib.

To reproduce, construct an input such as:
<REC_HDR>
<Lots of lines here...
...
...
...
<REC_FTR>

Tracing into go-corelib it looks like there's a default buffer size of 4096, so the input record needs to be greater than that to reproduce.

It looks ilke reader.lineBuf[i] is getting set to a pointer inside the buffer. When the buffer wraps, the lineBuf gets corrupted.

See attached files to reproduce. This occurs (for these data), when r.linesRead is 118 (or shortly thereafter).

I think something like this could fix it, but not sure if it's the optimal solution (this works though...).

This is in: fileformat/flatfile/fixedlength/reader.go

func (r *reader) readLine() error {
	for {
		// note1: ios.ByteReadLine returns a ln with trailing '\n' (and/or '\r') dropped.
		// note2: ios.ByteReadLine won't return io.EOF if ln returned isn't empty.
		b, err := ios.ByteReadLine(r.r)
		switch {
		case err == io.EOF:
			return io.EOF
		case err != nil:
			return ErrInvalidFixedLength(r.fmtErrStr(r.linesRead+1, err.Error()))
		}
		r.linesRead++
		if len(b) > 0 {
			// Copy data out of the buffer into our local line buffer
			lineCopy := make([]byte, len(b))
			_ = copy(lineCopy, b)
			r.linesBuf = append(r.linesBuf, line{lineNum: r.linesRead, b: lineCopy})
			// This causes buffer overruns to appear in local data:
			// r.linesBuf = append(r.linesBuf, line{lineNum: r.linesRead, b: b})
			return nil
		}
	}
}

Cheers,
Paul

schema.txt
data.txt

@jf-tech
Copy link
Owner

jf-tech commented Jul 21, 2023

@paulstadler ack. Will take a look early next week.

jf-tech added a commit that referenced this issue Jul 23, 2023
…to bufio.Reader's internal buffer rollover.

BUG: #213

If we're dealing with multi-lined envelope (either by rows or by header/footer), readLine()
will be called several times, thus whatever ios.ByteReadLine, which uses bufio.Reader underneath,
returns in a previous call may be potentially be invalidated due to bufio.Reader's internal buf
rollover. If we read the previous line directly, it would cause corruption.

To fix the problem the easiest solution would be simply copying the return []byte from
ios.ByteReadLine every single time. But for files with single-line envelope, which are the vast
majority cases, this copy becomes unnecessary and burdensome on gc. So the trick is to has a flag
on reader.linesBuf's last element to tell if it contains a reference into the bufio.Reader's
internal buffer, or it's a copy. Every time before we call bufio.Reader read, we check
reader.liensBuf's last element flag, if it is not a copy, then we will turn it into a copy.

This way, we optimize for the vast majority cases without needing allocations, and avoid any potential
corruptions in the multi-lined envelope cases.
@jf-tech
Copy link
Owner

jf-tech commented Jul 23, 2023

@paulstadler the fix has been posted at #214. Do you mind sync the branch https://github.com/jf-tech/omniparser/tree/213 and give it a test to confirm it fixes your issue, before I merge the fix to master?

Also another question: once merged into master, do you mind using master for a few days / max 1 week or two. We're adding a new feature to the omniparser and would like to cut an official release once that's in, instead of too many tiny releases. What do you think?

Many thanks.

@paulstadler
Copy link
Author

Absolutely! I'll test here today.
Cheers,
Paul

@paulstadler
Copy link
Author

Confirmed working here and reviewed code. Looks good. Thanks!

jf-tech added a commit that referenced this issue Jul 25, 2023
…to bufio.Reader's internal buffer rollover. (#214)

BUG: #213

If we're dealing with multi-lined envelope (either by rows or by header/footer), readLine()
will be called several times, thus whatever ios.ByteReadLine, which uses bufio.Reader underneath,
returns in a previous call may be potentially be invalidated due to bufio.Reader's internal buf
rollover. If we read the previous line directly, it would cause corruption.

To fix the problem the easiest solution would be simply copying the return []byte from
ios.ByteReadLine every single time. But for files with single-line envelope, which are the vast
majority cases, this copy becomes unnecessary and burdensome on gc. So the trick is to has a flag
on reader.linesBuf's last element to tell if it contains a reference into the bufio.Reader's
internal buffer, or it's a copy. Every time before we call bufio.Reader read, we check
reader.liensBuf's last element flag, if it is not a copy, then we will turn it into a copy.

This way, we optimize for the vast majority cases without needing allocations, and avoid any potential
corruptions in the multi-lined envelope cases.
@jf-tech
Copy link
Owner

jf-tech commented Jul 25, 2023

@paulstadler Merged into master. Like I mentioned earlier, please use master @ latest for a short while - before we finish a new EDI feature and cut a release.

Also please consider sponsoring the project! Appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants