ExtractImagesRaw not working since v0.3.12 #353

Closed
renard opened this issue Jul 12, 2021 · 5 comments


renard commented Jul 12, 2021

I use pdfcpu as a library in a pet project that converts PDF files to comic books (the relevant code is below).

If I use a previous version (go get github.com/pdfcpu/pdfcpu@v0.3.12-0.20210416123645-ac9adc6099fe), my snippet works correctly.

However, with the latest release (go get github.com/pdfcpu/pdfcpu), i.e. since v0.3.12, ExtractImagesRaw has stopped working, with two symptoms:

  • Either the program panics:
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
image.(*YCbCr).YCbCrAt(0xc0001b4480, 0x0, 0x0, 0x1001)
	/usr/local/Cellar/go/1.16/libexec/src/image/ycbcr.go:81 +0x130
image.(*YCbCr).At(0xc0001b4480, 0x0, 0x0, 0xc0000ee000, 0x5a8)
	/usr/local/Cellar/go/1.16/libexec/src/image/ycbcr.go:71 +0x45
image/png.(*encoder).writeImage(0xc0001af400, 0x14e1780, 0xc0001f90c0, 0x14e7490, 0xc0001b4480, 0xe, 0xffffffffffffffff, 0x0, 0x0)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:473 +0x14a5
image/png.(*encoder).writeIDATs(0xc0001af400)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:531 +0xf0
image/png.(*Encoder).Encode(0xc00061eff0, 0x14e17e0, 0xc00143f140, 0x14e7490, 0xc0001b4480, 0x0, 0x0)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:632 +0x388
image/png.Encode(...)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:561
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.renderDCTEncodedImage(0xc000074000, 0xc0017fca10, 0x1015300, 0xc000272444, 0x4, 0x65, 0xc00143f050, 0xef00000000000000, 0x0, 0xc0001230e0, ...)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/writeImage.go:784 +0x270
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.RenderImage(0xc000074000, 0xc0017fca10, 0x0, 0xc000272444, 0x4, 0x65, 0x1, 0xc00312f630, 0x0, 0x1, ...)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/writeImage.go:804 +0x1c5
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.(*Context).ExtractImage(0xc0000723c0, 0xc0017fca10, 0x0, 0xc000272444, 0x4, 0x65, 0x0, 0x1, 0x0, 0x0)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/extract.go:314 +0x310
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.(*Context).ExtractPageImages(0xc0000723c0, 0x46, 0x0, 0xc000134090, 0x1, 0x1, 0x0, 0x1)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/extract.go:336 +0x154
github.com/pdfcpu/pdfcpu/pkg/api.ExtractImagesRaw(0x14e5680, 0xc00012e590, 0x0, 0x0, 0x0, 0xc00013a580, 0x1, 0xa5, 0x0, 0x0, ...)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/api/extract.go:65 +0x2a5
main.readPDF(0x7ffeefbff952, 0x42, 0x0, 0x0)
	/Users/renard/Src/cbconvert/test/main.go:18 +0x165
main.main()
	/Users/renard/Src/cbconvert/test/main.go:34 +0xf3
exit status 2
  • Or the program appears to enter an infinite loop (I can only assume so, since it gets stuck in ExtractImagesRaw and never returns).

This seems to be related to either #329 or #323 (not sure which).
I can send some files that exhibit this issue by email.

The code I use:

package main

import (
	"fmt"
	"os"

	pdfapi "github.com/pdfcpu/pdfcpu/pkg/api"
)

func readPDF(file string) (err error) {
	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	fmt.Printf("PDF: Opening %s\n", file)
	pages, err := pdfapi.ExtractImagesRaw(f, nil, nil)
	if err != nil {
		fmt.Printf("PDF: %s\n", err)
		return
	}
	for _, page := range pages {
		// Pre v0.3.12
		//fn := fmt.Sprintf("%.5d.%s", page.PageNr, page.Type)
		// Post v0.3.12
		fn := fmt.Sprintf("%s.%s", page.Name, page.FileType)
		// In the real world, the image content is copied here:
		//buf := new(bytes.Buffer)
		//_, err = io.Copy(buf, page.Reader)
		fmt.Printf("Adding %s\n", fn)
	}
	fmt.Printf("PDF: Read %d pages\n", len(pages))
	return
}

// arg1 is the PDF file to inspect
func main() {
	err := readPDF(os.Args[1])
	if err != nil {
		fmt.Printf("Error: %s\n", err)
		panic(err)
	}
}
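The post-v0.3.12 file name above is built from the image's resource name. In PDF, XObject resource names such as `Im0` are local to a page's resource dictionary, so they may repeat across pages and collide inside an archive, and the page number printed in the debug output (`pageNr`) is an unexported field. A minimal sketch, assuming images arrive in reading order, prefixes a zero-padded running counter in the spirit of the earlier `%.5d` pattern; `pageNamer` is a hypothetical helper, not part of pdfcpu.

```go
package main

import "fmt"

// pageNamer is a hypothetical helper that builds archive entry names.
// It assumes images are delivered in reading order and numbers them with a
// running counter, because resource names like "Im0" may repeat per page.
type pageNamer struct {
	seq int
}

// next returns a zero-padded, sortable name such as "00001_Im15.png".
func (n *pageNamer) next(resName, fileType string) string {
	n.seq++
	return fmt.Sprintf("%.5d_%s.%s", n.seq, resName, fileType)
}

func main() {
	var n pageNamer
	fmt.Println(n.next("Im0", "jpg")) // prints 00001_Im0.jpg
	fmt.Println(n.next("Im0", "jpg")) // prints 00002_Im0.jpg
}
```

Zero padding keeps lexical order equal to numeric order, which matters for comic readers that sort pages by file name.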
@hhrutter (Collaborator)

Please provide one small file that reproduces your symptoms so I can provide a fix.

@hhrutter hhrutter self-assigned this Jul 12, 2021
@hhrutter hhrutter added the bug label Jul 12, 2021
@renard (Author)

renard commented Jul 12, 2021

It looks like ExtractImagesRaw is no longer used, so I changed my code to use ExtractImages:

package main

import (
	"fmt"
	"os"

	pdfapi "github.com/pdfcpu/pdfcpu/pkg/api"
	"github.com/pdfcpu/pdfcpu/pkg/pdfcpu"
)

func readPDF(file string) (err error) {
	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	fmt.Printf("PDF: Opening %s\n", file)
	err = pdfapi.ExtractImages(f, nil, PrintImg, nil)
	if err != nil {
		fmt.Printf("PDF: %s\n", err)
		return
	}
	return
}

func PrintImg(img pdfcpu.Image, singleImgPerPage bool, maxPageDigits int) error {
	fmt.Printf("s:%t, d:%d, %#v\n", singleImgPerPage, maxPageDigits, img)
	return nil
}

func main() {
	fmt.Printf("ARGS: %#v\n", os.Args)
	err := readPDF(os.Args[1])
	if err != nil {
		fmt.Printf("Error: %s\n", err)
		panic(err)
	}
}

It works for a while, then fails with:

s:%!s(bool=true), d:2, pdfcpu.Image{Reader:(*bytes.Buffer)(0xc000f4c4b0), Name:"Im15", FileType:"png", pageNr:15, objNr:95, width:0, height:0, bpc:0, cs:"", comp:0, sMask:false, imgMask:false, thumb:false, interpol:false, size:0, filter:""}
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
image.(*YCbCr).YCbCrAt(0xc000224400, 0x0, 0x0, 0x1001)
	/usr/local/Cellar/go/1.16/libexec/src/image/ycbcr.go:81 +0x130
image.(*YCbCr).At(0xc000224400, 0x0, 0x0, 0xc0039f7980, 0x5a8)
	/usr/local/Cellar/go/1.16/libexec/src/image/ycbcr.go:71 +0x45
image/png.(*encoder).writeImage(0xc000139900, 0x14d7880, 0xc000223080, 0x14dd5d0, 0xc000224400, 0xe, 0xffffffffffffffff, 0x0, 0x0)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:473 +0x14a5
image/png.(*encoder).writeIDATs(0xc000139900)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:531 +0xf0
image/png.(*Encoder).Encode(0xc00000f350, 0x14d78e0, 0xc000f4cb10, 0x14dd5d0, 0xc000224400, 0x0, 0x0)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:632 +0x388
image/png.Encode(...)
	/usr/local/Cellar/go/1.16/libexec/src/image/png/writer.go:561
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.renderDCTEncodedImage(0xc000202000, 0xc0014c1f80, 0x1015300, 0xc0002bbc44, 0x4, 0x65, 0xc000f4c4e0, 0x5f00000000000000, 0x0, 0xc0000a8f18, ...)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/writeImage.go:784 +0x270
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.RenderImage(0xc000202000, 0xc0014c1f80, 0x0, 0xc0002bbc44, 0x4, 0x65, 0x1, 0xc003155710, 0x0, 0x1, ...)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/writeImage.go:804 +0x1c5
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.(*Context).ExtractImage(0xc0002003c0, 0xc0014c1f80, 0x0, 0xc0002bbc44, 0x4, 0x65, 0xc001854000, 0x0, 0xc000145aa0, 0x10c71d1)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/extract.go:314 +0x310
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.(*Context).ExtractPageImages(0xc0002003c0, 0x10, 0xc0001fb800, 0x4, 0x1452e6c, 0x3, 0xf, 0x5f)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/pdfcpu/extract.go:336 +0x154
github.com/pdfcpu/pdfcpu/pkg/api.ExtractImages(0x14db740, 0xc0000b45a0, 0x0, 0x0, 0x0, 0x147c088, 0xc0000c2630, 0xa5, 0x0)
	/Users/renard/go/pkg/mod/github.com/pdfcpu/pdfcpu@v0.3.12/pkg/api/extract.go:115 +0x345
main.readPDF(0x7ffeefbff952, 0x42, 0x0, 0x0)
	/Users/renard/Src/cbconvert/test/main.go:19 +0x16d
main.main()
	/Users/renard/Src/cbconvert/test/main.go:45 +0xf3
exit status 2

@renard (Author)

renard commented Jul 12, 2021

> Please provide one small file that reproduces your symptoms so I can provide a fix.

Sent by email.

@hhrutter (Collaborator)

Yes, there was some refactoring in that area to make the reader containing the image data optional when listing images, where the data is not needed.

Thanks for reporting this.
I'll keep you posted.

@hhrutter (Collaborator)

👍 This is fixed with the latest commit.
You can still use ExtractImagesRaw if you don't care about getting back ALL images in one memory chunk.

I encourage everybody to go get the latest commit.
