Support for short dash array (invalid border array: [0 0 1 [3]]) #766

Closed
dpb587 opened this issue Jan 5, 2024 · 8 comments

dpb587 commented Jan 5, 2024

Hello - I was reading a PDF file and ran into the following error during an info/api.PDFInfo call:

invalid border array: [0 0 1 [3]]

TL;DR: this seems to be a spec-compliant representation, and the package should support it.

Reproduction

pdfcpu

$ git clone https://github.com/pdfcpu/pdfcpu
$ cd pdfcpu
$ git checkout v0.6.0

sample (PDF/1.1, producer Acrobat PDFWriter 2.0 for Windows, circa 1995)

$ curl -o "$TMPDIR/sample.pdf" https://www.iptc.org/std/IIM/3.0/specification/IIMV3.PDF
$ echo "7799f6fef4308db9f671ba40e4acfebd1ecea943e295e03d5733c8d650539ad9  $TMPDIR/sample.pdf" | sha256sum -c

error

$ go run ./cmd/pdfcpu info "$TMPDIR/sample.pdf"                    
invalid border array: [0 0 1 [3]]
exit status 1

Investigation

From some manual debugging of the file, the error appears to originate first on page 6, from an annotation of subtype Link. The full stack trace leading to the failing check is included below.

Debug Stack
goroutine 1 [running]:
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateBorderArray(0xc00019a1e0, {0xc000696e00?, 0x15bec4f?, 0x9?})
	./pkg/pdfcpu/validate/annotation.go:1418 +0xb2
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateAnnotationDictGeneralPart2(0xc00019a1e0?, 0xc0001932f8?, {0x15bec4f, 0x9})
	./pkg/pdfcpu/validate/annotation.go:1497 +0x17f
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateAnnotationDictGeneral(0xc00037cbf0?, 0x16ae260?, {0x15bec4f, 0x9})
	./pkg/pdfcpu/validate/annotation.go:1520 +0x5c
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateAnnotationDict(0x1583da0?, 0x16ae260?)
	./pkg/pdfcpu/validate/annotation.go:1602 +0x33
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePageAnnotations(0xc00019a1e0, 0x15bbf7b?)
	./pkg/pdfcpu/validate/annotation.go:1667 +0x2bf
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePagesAnnotations(0xc00019a1e0, 0x15bbf7b?, 0x0)
	./pkg/pdfcpu/validate/annotation.go:1745 +0x2bc
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePagesAnnotations(0xc00019a1e0, 0x15bbf7b?, 0x0)
	./pkg/pdfcpu/validate/annotation.go:1737 +0x2f9
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePagesAnnotations(0xc00019a1e0, 0x70000c000001618?, 0x0)
	./pkg/pdfcpu/validate/annotation.go:1737 +0x2f9
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateRootObject(0xc00019a1e0)
	./pkg/pdfcpu/validate/xReftable.go:928 +0x3d0
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.XRefTable(0xc00019a1e0)
	./pkg/pdfcpu/validate/xReftable.go:44 +0xf0
github.com/pdfcpu/pdfcpu/pkg/api.readAndValidate({0x16ad518?, 0xc000012530?}, 0xc0001a1930, {0xc000193a20?, 0x10b61a5?, 0x1a1df80?})
	./pkg/api/api.go:133 +0xea
github.com/pdfcpu/pdfcpu/pkg/api.PDFInfo({0x16ad518, 0xc000012530}, {0x7ff7bfeffad1, 0x3c}, {0x0, 0x0, 0x0}, 0xc000193a88?)
	./pkg/api/info.go:42 +0xad
github.com/pdfcpu/pdfcpu/pkg/cli.ListInfoFile({0x7ff7bfeffad1, 0x3c}, {0x0, 0x0, 0x0}, 0x10b412c?)
	./pkg/cli/list.go:279 +0x10f
github.com/pdfcpu/pdfcpu/pkg/cli.ListInfoFiles({0xc0001de2f0?, 0x1, 0x104a312?}, {0x0, 0x0, 0x0}, 0xe0?, 0x10c945e?)
	./pkg/cli/list.go:345 +0x233
github.com/pdfcpu/pdfcpu/pkg/cli.ListInfo(0x1549e60?)
	./pkg/cli/cli.go:193 +0x45
github.com/pdfcpu/pdfcpu/pkg/cli.Process(0xc000024580)
	./pkg/cli/process.go:35 +0xba
main.process(0xc0001de2f0?)
	./cmd/pdfcpu/process.go:149 +0x1d
main.processInfoCommand(0xc0001a1930)
	./cmd/pdfcpu/process.go:1441 +0x40a
main.commandMap.process(0xc00008c058?, {0x7ff7bfeffacc, 0x4}, {0x0, 0x0})
	./cmd/pdfcpu/cmd.go:143 +0x342
main.main()
	./cmd/pdfcpu/main.go:56 +0xaf

Reviewing the PDF Reference Manual, Version 1.1, I see the following relevant pieces...

Page 76 (on the Border annotation attribute) describes PDF 1.1 introducing an optional fourth element, the dash array. In annotation.go it looks like that element is currently supported, but the code expects the dash array to contain exactly 2 items. Interestingly, the example given in the manual is exactly what the sample file uses:

An example of a border with a dash array is [ 0 0 1 [ 3 ] ].

Page 147 formally describes the setdash operator (whose array operand is what the optional fourth border element holds).

Sets the dash pattern parameter in the graphics state. If array is empty, the dash pattern is a solid, unbroken line, otherwise array is an array of numbers, all non-negative and at least one non-zero, that specifies distances in user space for the length of dashes and gaps. phase is a number that specifies a distance in user space into the dash pattern at which to begin marking the path. The default dash pattern is a solid line.

Page 144 gives several examples, including single-item arrays, which it describes as simply equal on/off spans.

  • ------------- from [] 0 as turn dash off -- solid line
  • --- --- - from [3] 0 as 3 units on, 3 units off, ...
  • - -- -- -- from [2] 1 as 1 on, 2 off, 2 on, 2 off, ...
  • -- -- -- -- - from [2 1] 0 as 2 on, 1 off, 2 on, 1 off, ...
  • [others omitted]

The descriptions of the array are somewhat ambiguous about the maximum number of items, and the text suggests the array is simply cycled through for dashes and gaps. However, I can only find examples of 0-, 1-, and 2-length arrays (including in the PDF 1.7 / ISO 32000-1:2008 reference).

I know this package doesn't rasterize but, for what it's worth, on a Mac the annotations were rendered as follows:

  • Adobe Acrobat -- renders a dashed green border
  • Preview (OS-native) -- renders a solid green border (seems like it always renders link borders solid?)
  • Firefox (120.0.1) embedded viewer -- renders a dashed green border
  • Chrome (120.0.6099.129) embedded viewer -- renders no border (seems like it never renders link borders?)

Proposal

Change validation to the following:

diff --git a/pkg/pdfcpu/validate/annotation.go b/pkg/pdfcpu/validate/annotation.go
index 5ba27b7..91de8cf 100644
--- a/pkg/pdfcpu/validate/annotation.go
+++ b/pkg/pdfcpu/validate/annotation.go
@@ -1408,7 +1408,7 @@ func validateBorderArray(xRefTable *model.XRefTable, a types.Array) bool {
        if !ok {
                return xRefTable.ValidationMode == model.ValidationRelaxed
        }
-       if len(a1) != 2 {
+       if len(a1) > 2 {
                return false
        }

Which then allows the info calls to succeed:

$ go run ./cmd/pdfcpu info "$TMPDIR/sample.pdf" | grep Page   
          Page count: 49
           Page size: 595.00 x 842.00 points

I'm not too familiar with other ways this might affect the codebase, but I didn't see potential side effects from a quick look.

It's not clear to me if/when ValidationRelaxed should be respected, but it may be an alternative for this condition, too.
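As a rough illustration of the shape being proposed (three numbers plus an optional dash array of any length), here is a self-contained Go sketch. It deliberately does not use pdfcpu's `types.Array` or `model.XRefTable`; PDF objects are stood in for by plain Go values, and `isNumber` and `validBorderShape` are hypothetical names. It is stricter about the upper bound than the one-line diff above only in that it places no limit on the dash array length.

```go
package main

import "fmt"

// isNumber reports whether a stand-in PDF object is numeric.
func isNumber(o any) bool {
	switch o.(type) {
	case int, float64:
		return true
	}
	return false
}

// validBorderShape checks only the structure of a Border array:
// three numbers, optionally followed by a dash array of numbers.
// It does not check the values inside the dash array.
func validBorderShape(a []any) bool {
	if len(a) != 3 && len(a) != 4 {
		return false
	}
	for _, o := range a[:3] {
		if !isNumber(o) {
			return false
		}
	}
	if len(a) == 3 {
		return true
	}
	dash, ok := a[3].([]any)
	if !ok {
		return false
	}
	for _, o := range dash {
		if !isNumber(o) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validBorderShape([]any{0, 0, 1, []any{3}}))    // true
	fmt.Println(validBorderShape([]any{0, 0, 1, []any{2, 1}})) // true
	fmt.Println(validBorderShape([]any{0, 0, 1}))              // true
	fmt.Println(validBorderShape([]any{0, 0, 1, 3}))           // false
}
```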

I saw the project prefers issues before pull requests, but I pushed the one-liner to a branch in case you want to merge it as is. If you have some guidance on how you'd test this, I'd be happy to update the branch and/or send a formal PR.

Miscellaneous

  • Thank you for this package. I originally found it when searching for a library that offered low-level, spec-compliant models that I could build on and extend for analysis tools.
  • validate: invalid border array #711 -- semi-recent border validation change, but caused by non-compliant annotations

hhrutter commented Jan 8, 2024

Thanks for reporting this!

bosix commented Jan 12, 2024

Hi :),

We have a similar issue where validation (both relaxed and strict) fails with invalid border array: [0 0 0 [0]]. If I understand the previous comments correctly, this is an issue inside pdfcpu.

@hhrutter can you give a time estimate for when the bug will be fixed?

@hhrutter
Collaborator

Coming up..

@hhrutter
Collaborator

This should be fixed with latest commit.
Let me know..

bosix commented Jan 15, 2024

Hello @hhrutter,

Thank you for the update. Unfortunately, in our case, it still fails with the same error.

I'm not entirely familiar with the PDF specification: Is it permissible to have [0 0 0 [0]] as the border array?

Regards

@hhrutter
Collaborator

That would be a spec violation:

...the numbers (of a line dash pattern) shall be nonnegative and not all zero...
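The quoted rule can be written down as a small check. The Go sketch below is only an illustration under stated assumptions: `checkDash` is a hypothetical helper, not a pdfcpu function, and the choice to tolerate an all-zero array when `relaxed` is true is an assumption about what a lenient mode might do, not a description of how pdfcpu behaves.

```go
package main

import (
	"errors"
	"fmt"
)

// checkDash applies the spec wording: entries must be non-negative and,
// for a non-empty array, not all zero. An empty array means a solid line.
// Assumption: relaxed mode tolerates the all-zero case, never negatives.
func checkDash(dash []float64, relaxed bool) error {
	allZero := len(dash) > 0
	for _, d := range dash {
		if d < 0 {
			return fmt.Errorf("negative dash length %v", d)
		}
		if d != 0 {
			allZero = false
		}
	}
	if allZero && !relaxed {
		return errors.New("dash array entries are all zero")
	}
	return nil
}

func main() {
	fmt.Println(checkDash([]float64{3}, false)) // valid: [3]
	fmt.Println(checkDash([]float64{0}, false)) // spec violation: [0]
	fmt.Println(checkDash([]float64{0}, true))  // tolerated in this sketch's relaxed mode
}
```

With this split, `[0 0 1 [3]]` passes in either mode, while `[0 0 0 [0]]` is only accepted if the caller opts into leniency.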

dpb587 commented Jan 22, 2024

Thank you for the additional research time involved in the committed change. I confirmed it reads the file successfully now.

hhrutter added a commit that referenced this issue Jan 29, 2024
@hhrutter
Collaborator

@ALL: The latest commit relaxes validation of line dash patterns.
