extract: corrupted page tree #487

rocksun · 2022-06-18T03:40:34Z

CentOS Stream 8

The code is:


package main

import (
	"fmt"

	pdfcpu "github.com/pdfcpu/pdfcpu/pkg/api"
)

func main() {
	err := pdfcpu.ExtractImagesFile("../testfiles/source.pdf", "../testfiles/images", []string{"1", "2"}, nil)
	fmt.Println(err)
}

Here is the pdf: removed.pdf

The go.mod is:

module github.com/rocksun/ocrtest

go 1.18

require (
	github.com/pdfcpu/pdfcpu v0.3.13
)

And the output is:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x5eb7c9]

goroutine 1 [running]:
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.(*Image).Read(0x10?, {0xc004622000?, 0xc006236790?, 0xabcce0?})
        <autogenerated>:1 +0x29
io.copyBuffer({0x887020, 0xc006236790}, {0x886a20, 0xc00024a900}, {0x0, 0x0, 0x0})
        /usr/lib/golang/src/io/io.go:426 +0x1b2
io.Copy(...)
        /usr/lib/golang/src/io/io.go:385
os.genericReadFrom(0xc00624f928?, {0x886a20, 0xc00024a900})
        /usr/lib/golang/src/os/file.go:162 +0x67
os.(*File).ReadFrom(0xc000010690, {0x886a20, 0xc00024a900})
        /usr/lib/golang/src/os/file.go:156 +0x1b0
io.copyBuffer({0x8864c0, 0xc000010690}, {0x886a20, 0xc00024a900}, {0x0, 0x0, 0x0})
        /usr/lib/golang/src/io/io.go:412 +0x14b
io.Copy(...)
        /usr/lib/golang/src/io/io.go:385
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.WriteReader({0xc00001a600?, 0x21?}, {0x886a20, 0xc00024a900})
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/writeImage.go:820 +0x65
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.WriteImageToDisk.func1({{0x0, 0x0}, {0xc000369c0d, 0x3}, {0x0, 0x0}, 0x2, 0x19, 0x0, 0x0, ...}, ...)
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/images.go:166 +0x4a9
github.com/pdfcpu/pdfcpu/pkg/api.ExtractImages({0x887c68, 0xc0000105d0}, {0xc000519f50, 0x2, 0x2}, 0xc000077f20, 0xc0000406b0?)
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/api/extract.go:121 +0x44c
github.com/pdfcpu/pdfcpu/pkg/api.ExtractImagesFile({0x7eafea, 0x17}, {0x7e8d62, 0x13}, {0xc000109f50, 0x2, 0x2}, 0x40bfb9?)
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/api/extract.go:139 +0x249
main.main()
        /vagrant/data/ocrtest/extractpdf/extractimages/extractimages.go:10 +0x7c
exit status 2


hhrutter · 2022-06-19T14:24:40Z

I can't comment on the trace you are getting, but I noticed that your file contains empty pagesDicts, which were not handled correctly.

This is fixed with the latest commit.
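
To illustrate what tolerating empty pagesDicts can look like, here is a minimal sketch of a page tree walk that skips intermediate nodes without children instead of dereferencing them. It is not the code from the fix and does not use pdfcpu's types: `node`, `isPages`, `kids`, and `collectPages` are hypothetical names for a simplified model of the tree.

```go
package main

import "fmt"

// node is a hypothetical, simplified page tree node.
// isPages is true for an intermediate pages dict and false for a leaf page.
type node struct {
	isPages bool
	kids    []*node // children of a pages dict; unused for leaf pages
}

// collectPages walks the tree depth-first and returns the leaf pages in order.
// Nil entries and pages dicts without kids contribute nothing instead of
// causing a nil pointer dereference.
func collectPages(n *node) []*node {
	if n == nil {
		return nil
	}
	if !n.isPages {
		return []*node{n}
	}
	var pages []*node
	for _, k := range n.kids {
		pages = append(pages, collectPages(k)...)
	}
	return pages
}

func main() {
	root := &node{isPages: true, kids: []*node{
		{},              // a leaf page
		{isPages: true}, // an empty pages dict
		nil,             // a missing child
		{isPages: true, kids: []*node{{}}}, // a pages dict with one leaf page
	}}
	fmt.Println(len(collectPages(root))) // prints 2
}
```

The empty pages dict and the nil child are simply skipped, so only the two leaf pages are counted.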

rocksun added the investigate label Jun 18, 2022

rocksun assigned hhrutter Jun 18, 2022

hhrutter changed the title ~~ExtractImagesFile return signal SIGSEGV error~~ extract: corrupted page tree Jun 19, 2022

hhrutter closed this as completed in 5d37c49 Jun 19, 2022
