AddUTF8FromBytes modifies its argument utf8bytes resulting in a panic if you generate two PDFs wih the "same" font bytes. #316

vdobler · 2019-10-01T12:36:04Z

I've encountered a strange problem: I can generate a PDF with embedded UTF-8 font without any problems, but just once. Generating the same PDF once more results in a panic because the input utf8bytes to AddUTF8FontFromBytes is modified.

$ ./fpdfpanic
Making PDF
--> OK
Making PDF
panic: runtime error: index out of range [20] with length 12

goroutine 1 [running]:
github.com/jung-kurt/gofpdf/v2.(*utf8FontFile).getSymbols(0xc0000cc2a0, 0x14, 0xc0000c6e28, 0xc00006c210, 0xc00006c1b0, 0xc00006a120, 0xb, 0xb, 0xc0000c6e28, 0xc00006c210, ...)
	/Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/utf8fontfile.go:793 +0x522
github.com/jung-kurt/gofpdf/v2.(*utf8FontFile).parseSymbols(0xc0000cc2a0, 0xc0000a36e0, 0xb, 0xc00006c2a0, 0x3ff0000000000000, 0x0, 0x0, 0x0)
	/Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/utf8fontfile.go:563 +0x541
github.com/jung-kurt/gofpdf/v2.(*utf8FontFile).GenerateСutFont(0xc0000cc2a0, 0xc0000a36e0, 0x0, 0xc000014a3a, 0x3)
	/Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/utf8fontfile.go:672 +0x3eb
github.com/jung-kurt/gofpdf/v2.(*Fpdf).putfonts(0xc0002ce800)
	/Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:4036 +0xb78
github.com/jung-kurt/gofpdf/v2.(*Fpdf).putresources(0xc0002ce800)
	/Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:4511 +0x75
github.com/jung-kurt/gofpdf/v2.(*Fpdf).enddoc(0xc0002ce800)
	/Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:4698 +0x121
github.com/jung-kurt/gofpdf/v2.(*Fpdf).Close(0xc0002ce800)
	/Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:688 +0xa1
main.makePDF(0xc00006c150)
	/Users/m0252720/go/src/tmp/fpdfpanic/main.go:28 +0x215
main.main()
	/Users/m0252720/go/src/tmp/fpdfpanic/main.go:17 +0xd5

Here is (stripped) the code to reproduce above error:

package main

import (
	"bytes"
	"encoding/base64"
	"fmt"

	"github.com/jung-kurt/gofpdf/v2"
)

var fontBytes []byte

func main() {
	fontBytes, _ = base64.StdEncoding.DecodeString(dvscData)
	makePDF()
	makePDF()
}

func makePDF() *bytes.Buffer {
	fmt.Println("Making PDF")
	pdf := gofpdf.New("P", "mm", "A4", "")
	pdf.AddUTF8FontFromBytes("HNC", "", fontBytes)
	pdf.AddPage()
	pdf.SetXY(20, 20)
	pdf.SetFont("HNC", "", 12)
	pdf.MultiCell(0, 15, "abcABC1234", "", "L", false)
	pdf.Close()

	out := &bytes.Buffer{}
	pdf.Output(out)
	fmt.Println("--> OK")
	return out
}

// DejaVuSansCondensed.ttf in base64 encoding. (trimmed).
var dvscData = `AAEAAAAUAQAAB ..... `

The problem does not happen if the font is loaded from disk, e.g. with pdf.AddUTF8Font("HNC", "", "..../github.com/jung-kurt/gofpdf/font/DejaVuSansCondensed.ttf")

The panic does not happen on the first rendering but on all subsequent renderings (not show in demo code above).

The reason is that somehow AddUTF8FontFromBytes modifies its input utf8bytes. Makeing a copy beforehand avoids the problem and renders all PDFs without problems.

         // Working code:
	pdf := gofpdf.New("P", "mm", "A4", "")
	cpy := make([]byte, len(fontBytes))
	copy(cpy, fontBytes)
	pdf.AddUTF8FontFromBytes("HNC", "", cpy)

(Maybe this modification if the utf8bytes argument is intended but then it should be documented.)

The delta is not big, here the output of github.com/google/go-cmp/cmp.Diff'ing utf8bytes before and after:

[]uint8{
  	... // 55292 identical bytes
  	0x00, 0x9c, 0x04, 0xe1, 0x02, 0x66, 0x00, 0x8f, 0x01, 0x8d, 0x02, 0xf6, 0x00, 0xcd, 0x03, 0x44, //  |.....f.........D|
  	0x00, 0x29, 0x00, 0x66, 0x04, 0xee, 0x00, 0x73, 0x00, 0x00, 0x14, 0x00, 0x00, 0x96, 0x00, 0x00, //  |.).f...s........|
- 	0xb7,                                                                                           // -|.|
+ 	0x00,                                                                                           // +|.|
  	0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00, 0x2c, 0x20, 0x10, 0xb0, 0x02, 0x25, 0x49, 0x64, //  |........, ...%Id|
  	0xb0, 0x40, 0x51, 0x58, 0x20, 0xc8, 0x59, 0x21, 0x2d, 0x2c, 0xb0, 0x02, 0x25, 0x49, 0x64, 0xb0, //  |.@QX .Y!-,..%Id.|
  	... // 120 identical bytes
  	0x25, 0x49, 0x60, 0xb0, 0x20, 0x63, 0x68, 0x20, 0x8a, 0x10, 0x8a, 0x23, 0x3a, 0x8a, 0x10, 0x65, //  |%I`. ch ...#:..e|
  	0x3a, 0x2d, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x08, 0x00, 0x02, 0xff, 0xff, 0x00, 0x03, 0x00, //  |:-..............|
- 	0x02, 0x00,                                                                                     // -|..|
+ 	0x00, 0x00,                                                                                     // +|..|
  	0x5b, 0xfe, 0x96, 0x03, 0xf6, 0x05, 0xa4, 0x00, 0x03, 0x00, 0x07, 0x00, 0x00, 0x13, 0x11, 0x21, //  |[..............!|
  	0x11, 0x25, 0x21, 0x11, 0x21, 0x5c, 0x03, 0x99, 0xfc, 0xce, 0x02, 0xcc, 0xfd, 0x34, 0xfe, 0x96, //  |.%!.!\.......4..|
  	... // 479001 identical bytes
  	0xe8, 0xec, 0xbc, 0xb1, 0xc6, 0x15, 0xb8, 0xf8, 0xf8, 0x7d, 0xec, 0xbc, 0xb1, 0xc6, 0x15, 0xb8, //  |.........}......|
  	0xf7, 0xfc, 0xe8, 0xe3, 0xe3, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x5e, 0xb8, //  |..............^.|
- 	0x5b, 0x12, 0x3e, 0xed,                                                                         // -|[.>.|
+ 	0x00, 0x00, 0x00, 0x00,                                                                         // +|....|
  	0x5f, 0x0f, 0x3c, 0xf5, 0x00, 0x1f, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd3, 0xc2, 0x2b, 0xf9, //  |_.<...........+.|
  	0x00, 0x00, 0x00, 0x00, 0xd3, 0xc2, 0x2b, 0xf9, 0xf8, 0xa7, 0xfc, 0x4c, 0x0c, 0xea, 0x09, 0xdc, //  |......+....L....|
  	0x00, 0x20, 0x00, 0x08, 0x00, 0x00, 0x00,                                                       //  |. .....|
- 	0x01,                                                                                           // -|.|
  	0x00, 0x00, 0x00, 0x00, 0x00,                                                                   //  |.....|
+ 	0x00,                                                                                           // +|.|
  	0x01, 0x00, 0x00, 0x07, 0x6d, 0xfe, 0x1d, 0x00, 0x00, 0x0d, 0x7e, 0xf8, 0xa7, 0xfa, 0xe3, 0x0c, //  |....m.....~.....|
  	0xea,                                                                                           //  |.|
- 	0x00, 0x01,                                                                                     // -|..|
+ 	0x00, 0x01, 0x00, 0x00, 0x00,                                                                   // +|.....|
  	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,                         //  |............|
- 	0x00, 0x00, 0x18, 0x5e, 0x04, 0x52,                                                             // -|...^.R|
+ 	0x0b, 0x00, 0x00,                                                                               // +|...|
  	0x00, 0x5b, 0x00, 0x00, 0x00, 0x00, 0x02, 0xaa, 0x00, 0x00, 0x02, 0x49, 0x00, 0x00, 0x02, 0xe2, //  |.[.........I....|
  	0x01, 0x16, 0x03, 0x4f, 0x00, 0xb1, 0x06, 0x08, 0x00, 0x8e, 0x04, 0x94, 0x00, 0x99, 0x06, 0xd7, //  |...O............|
  	... // 66316 identical bytes
  	0x4b, 0xcc, 0x00, 0x07, 0x4c, 0x50, 0x00, 0x07, 0x4c, 0xf0, 0x00, 0x07, 0x4d, 0x78, 0x00, 0x07, //  |K...LP..L...Mx..|
  	0x4e, 0x08, 0x00, 0x07, 0x4e, 0x94, 0x00, 0x07, 0x4e, 0xf4, 0x00, 0x07, 0x4f, 0x54, 0x00, 0x01, //  |N...N...N...OT..|
- 	0x00, 0x00, 0x18, 0x6d,                                                                         // -|...m|
+ 	0x00, 0x00, 0x00, 0x0b,                                                                         // +|....|
  	0x03, 0x54, 0x00, 0x2b, 0x00, 0x68, 0x00, 0x0c, 0x00, 0x02, 0x00, 0x10, 0x00, 0x99, 0x00, 0x08, //  |.T.+.h..........|
  	0x00, 0x00, 0x04, 0x15, 0x00, 0x00, 0x00, 0x08, 0x00, 0x04, 0x00, 0x00, 0x00, 0x1a, 0x01, 0x3e, //  |...............>|
  	... // 15717 identical bytes
  	0x6e, 0x73, 0x00, 0x00, 0x43, 0x00, 0x6f, 0x00, 0x6e, 0x00, 0x64, 0x00, 0x65, 0x00, 0x6e, 0x00, //  |ns..C.o.n.d.e.n.|
  	0x73, 0x00, 0x65, 0x00, 0x64, 0x00, 0x00, 0x43, 0x6f, 0x6e, 0x64, 0x65, 0x6e, 0x73, 0x65, 0x64, //  |s.e.d..Condensed|
- 	0x00, 0x00, 0x02,                                                                               // -|...|
+ 	0x00, 0x00, 0x00,                                                                               // +|...|
  	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x7e, 0x00, 0x5a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //  |.......~.Z......|
  	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x18, 0x6d, //  |...............m|
  	... // 63402 identical bytes
  }

The text was updated successfully, but these errors were encountered:

jung-kurt · 2019-10-01T12:45:35Z

Thanks for the well-written, concise report and demonstration code, @vdobler. There is no reason why a cached copy of the font should be modified. I will look into this and hope others can provide some insights too.

ArtemKor · 2019-10-04T12:07:06Z

I fixed it and create pull reguest.

jung-kurt · 2019-10-07T11:22:39Z

@vdobler -- can you confirm that the problem has been corrected?

vdobler · 2019-10-07T12:34:17Z

I do not thing that the PR #318 is a proper fix for the underlying problem. Making a copy before modifying the byte slice hides the actual problem: somehow reading the font is not a read-only action.

This type of stopgap solution can be used without this "fix" (actually that's what I'm doing now: make a copy). But I think the actual modification should be fixed instead of hidden behind a copy.

jung-kurt · 2019-10-07T12:40:20Z

@vdobler -- all points well-taken.

jacobalberty · 2019-11-06T07:57:56Z

The font gets modified in two places within utf8fontfile.go, line 225 in the splice function. Because you are working with slices the arrays backing them get modified. splice gets called by the calls to insertUint16 at lines 782, 786, and 790. If you simply modify splice to make a copy there before proceeding you need to be aware it will make copies for the call at insertUint16 as line 736 (which might not be a bad idea as it ultimately may modify some of the backing font data, but didn't in my quick tests tonight) and another copy at the calls to splice at lines 261 and 1031.

The second place the data gets modified is at line 1025.
data = append(data, []byte{0, 0, 0}...) once again this is a slice backed by an array and you're modifying the underlying array.
I've come up with two solutions aptly named badfix and worsefix

https://github.com/jacobalberty/gofpdf/tree/badfix
Badfix creates a copy within the GenerateCutFont function by appending the data with an empty byte array, this creates a new backing array that separates those changes entirely from the underlying font data. It might need to be applied to the getTableData call at like 698 as well to prevent the call to insertUint16 at line 736 from modifying the underlying font data but I never encountered that as an issue in my testing.

https://github.com/jacobalberty/gofpdf/tree/worsefix
This one takes the brute force method of just copying the slices into new arrays in the splice function and the getMetrics function. It's pretty much guaranteed safe but I'm fairly certain it copies more than the fix in PR #318

I am not sure either of these fixes are truly the correct answer as such I have not opened a PR, though badfix is probably the best one to pursue without major refactoring.

jung-kurt · 2019-11-06T11:16:16Z

Excellent insights and report, @jacobalberty. I will study your branches. How did you go about figuring how the backing array was modified?

jacobalberty · 2019-11-06T15:43:55Z

@jung-kurt I modified a report generator I had to hash the font bytes before and after then backed out #318 to confirm that hashing the bytes would show the issue. Then went back to with #318 included and started by looking at the getTableData function, I checked everywhere it was called then looked at where the results went eventually tracing it down to those append() calls. When I saw the splice() function's append I asked in #go-nuts on freenode to get some insight on it.

Just FYI this very situation seems to be why they added 3 index slicing into go 1.2 from the design doc for 3 index slicing
The addition of append to the language made this somewhat more important, because append lets programmers overwrite entries between len and cap without realizing it or even mentioning cap.

jacobalberty · 2019-11-06T20:37:31Z

Ok after thinking about it for a when today. It occurs to me that my `worsefix` may actually be the ideal fix. Both of those functions have side effects right now in that they (may) modify the you pass into them. The worsefix branch removes, at least some of, those side effects I'm typing on my phone by email right now so I don't have my original analysis in front of me but worsefix should completely remove all side effects from `splice` while the other function there was at least one other loop that needs to be fixed to remove side effects (at this time it doesn't introduce harmful side effects but it could later if usage changes). I will probably check in a fix for the rest of that function later this evening. It is less efficient since it will duplicate more data than `badfix` but it will remove a potential source of bugs later on.

…

On Wed, Nov 6, 2019, 5:16 AM Kurt Jung ***@***.***> wrote: Excellent insights and report, @jacobalberty <https://github.com/jacobalberty>. I will study your branches. How did you go about figuring how the backing array was modified? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#316?email_source=notifications&email_token=AAPDNAR3QPIUEMRDVSQNKJ3QSKRQLA5CNFSM4I4I4C62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDGGC6I#issuecomment-550265209>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAPDNAQPE27SJNUWIGOXTHLQSKRQLANCNFSM4I4I4C6Q> .

jung-kurt · 2019-11-06T20:46:00Z

Very impressive analysis, @jacobalberty. Thanks for delving into this and figuring out all of the implications.

jung-kurt · 2019-11-09T14:29:32Z

Many thanks, @jacobalberty for putting this issue to rest ( 47a2c24).

vdobler changed the title ~~panic: runtime error: index out of range when using AddUTF8FromBytes and generating 2 PDFs~~ AddUTF8FromBytes modifies its argument utf8bytes resulting in a panic if you generate two PDFs wih the "same" font bytes. Oct 1, 2019

jung-kurt added bug help wanted labels Oct 1, 2019

jacobalberty added a commit to jacobalberty/gofpdf that referenced this issue Nov 7, 2019

Add test case for issue jung-kurt#316

6d7461a

jacobalberty added a commit to jacobalberty/gofpdf that referenced this issue Nov 7, 2019

Add test case for issue jung-kurt#316

d800086

jung-kurt closed this as completed Nov 9, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AddUTF8FromBytes modifies its argument utf8bytes resulting in a panic if you generate two PDFs wih the "same" font bytes. #316

AddUTF8FromBytes modifies its argument utf8bytes resulting in a panic if you generate two PDFs wih the "same" font bytes. #316

vdobler commented Oct 1, 2019 •

edited

jung-kurt commented Oct 1, 2019

ArtemKor commented Oct 4, 2019

jung-kurt commented Oct 7, 2019

vdobler commented Oct 7, 2019

jung-kurt commented Oct 7, 2019

jacobalberty commented Nov 6, 2019

jung-kurt commented Nov 6, 2019

jacobalberty commented Nov 6, 2019

jacobalberty commented Nov 6, 2019 via email

jung-kurt commented Nov 6, 2019

jung-kurt commented Nov 9, 2019

AddUTF8FromBytes modifies its argument utf8bytes resulting in a panic if you generate two PDFs wih the "same" font bytes. #316

AddUTF8FromBytes modifies its argument utf8bytes resulting in a panic if you generate two PDFs wih the "same" font bytes. #316

Comments

vdobler commented Oct 1, 2019 • edited

jung-kurt commented Oct 1, 2019

ArtemKor commented Oct 4, 2019

jung-kurt commented Oct 7, 2019

vdobler commented Oct 7, 2019

jung-kurt commented Oct 7, 2019

jacobalberty commented Nov 6, 2019

jung-kurt commented Nov 6, 2019

jacobalberty commented Nov 6, 2019

jacobalberty commented Nov 6, 2019 via email

jung-kurt commented Nov 6, 2019

jung-kurt commented Nov 9, 2019

vdobler commented Oct 1, 2019 •

edited