extract: corrupted page tree #487

Closed
rocksun opened this issue Jun 18, 2022 · 1 comment

rocksun commented Jun 18, 2022

  • CentOS Stream 8

The code is:


package main

import (
	"fmt"

	pdfcpu "github.com/pdfcpu/pdfcpu/pkg/api"
)

func main() {
	err := pdfcpu.ExtractImagesFile("../testfiles/source.pdf", "../testfiles/images", []string{"1", "2"}, nil)
	fmt.Println(err)
}

Here is the PDF: removed.pdf

The go.mod is:

module github.com/rocksun/ocrtest

go 1.18

require (
	github.com/pdfcpu/pdfcpu v0.3.13
)

And the output is:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x5eb7c9]

goroutine 1 [running]:
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.(*Image).Read(0x10?, {0xc004622000?, 0xc006236790?, 0xabcce0?})
        <autogenerated>:1 +0x29
io.copyBuffer({0x887020, 0xc006236790}, {0x886a20, 0xc00024a900}, {0x0, 0x0, 0x0})
        /usr/lib/golang/src/io/io.go:426 +0x1b2
io.Copy(...)
        /usr/lib/golang/src/io/io.go:385
os.genericReadFrom(0xc00624f928?, {0x886a20, 0xc00024a900})
        /usr/lib/golang/src/os/file.go:162 +0x67
os.(*File).ReadFrom(0xc000010690, {0x886a20, 0xc00024a900})
        /usr/lib/golang/src/os/file.go:156 +0x1b0
io.copyBuffer({0x8864c0, 0xc000010690}, {0x886a20, 0xc00024a900}, {0x0, 0x0, 0x0})
        /usr/lib/golang/src/io/io.go:412 +0x14b
io.Copy(...)
        /usr/lib/golang/src/io/io.go:385
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.WriteReader({0xc00001a600?, 0x21?}, {0x886a20, 0xc00024a900})
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/writeImage.go:820 +0x65
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.WriteImageToDisk.func1({{0x0, 0x0}, {0xc000369c0d, 0x3}, {0x0, 0x0}, 0x2, 0x19, 0x0, 0x0, ...}, ...)
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/images.go:166 +0x4a9
github.com/pdfcpu/pdfcpu/pkg/api.ExtractImages({0x887c68, 0xc0000105d0}, {0xc000519f50, 0x2, 0x2}, 0xc000077f20, 0xc0000406b0?)
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/api/extract.go:121 +0x44c
github.com/pdfcpu/pdfcpu/pkg/api.ExtractImagesFile({0x7eafea, 0x17}, {0x7e8d62, 0x13}, {0xc000109f50, 0x2, 0x2}, 0x40bfb9?)
        /home/vagrant/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/api/extract.go:139 +0x249
main.main()
        /vagrant/data/ocrtest/extractpdf/extractimages/extractimages.go:10 +0x7c
exit status 2
hhrutter changed the title from "ExtractImagesFile return signal SIGSEGV error" to "extract: corrupted page tree" on Jun 19, 2022
hhrutter (Collaborator) commented:

I can't comment on the trace you are getting, but I noticed that your file contains empty pagesDicts, which were not handled correctly.

This is fixed in the latest commit.
