
write invalid local Date #320

Closed
sircodemane opened this issue Apr 8, 2021 · 8 comments

@sircodemane

I have created a tool that uses the pdfcpu API to apply watermarks to PDF pages, but there seems to be an issue with validateDateObject returning this error:

pdfcpu: validateDateObject: <D:20210408134302+-10'00'> invalid date

This is compiled and run on a Windows 10 x64 machine, and it seems to be related to the Date & Time settings in Windows. I tested with a number of other time zones and it works flawlessly, but for some reason changing the time zone to (UTC -10:00) Hawaii always results in this invalid date error.

If you need more context, please let me know what would help and I will send it to you. Thanks!

@petervwyatt

The PDF Date string as shown here is invalid - it should have just one of the "+" or "-" signs representing the offset from UTC.
See clause 7.9.4 Dates in ISO 32000.
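To make the rule concrete, here is a simplified check in Go. It is only a sketch, not pdfcpu's validateDateObject: the pattern name pdfDate is hypothetical, and it assumes full precision down to the seconds, whereas the spec also allows trailing fields to be omitted. The point it shows is that after the seconds there may be `Z` or exactly one sign, so the `+-10` in the error message cannot match.

```go
package main

import (
	"fmt"
	"regexp"
)

// pdfDate is a simplified pattern for a full-precision PDF date:
// "D:" + YYYYMMDDHHmmSS, then an optional UTC relationship that is either
// 'Z' or exactly one sign followed by HH and, optionally, 'mm'.
var pdfDate = regexp.MustCompile(`^D:\d{14}(Z|[+-]\d{2}('\d{2}'?)?)?$`)

func main() {
	for _, s := range []string{
		"D:20210408134302+-10'00'", // two signs: the string from this issue
		"D:20210408134302-10'00'",  // one sign: well formed
	} {
		fmt.Printf("%s valid=%v\n", s, pdfDate.MatchString(s))
	}
}
```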

@sircodemane
Author

sircodemane commented Apr 9, 2021

@petervwyatt thanks for the information. To add more context, I have no idea where that date is being generated, or how to manually override it. I set a breakpoint on validateDateObject() and tried to trace the function calls to see what was happening. There's some Xref dereferencing logic that was a bit dense for me to follow, but the date seems to be coming from there. The last call I make on the pdfcpu API surface is a call to AddWatermarks().

Here is the code where the error occurs:

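// Configure an image watermark drawn on top of the page content.
wm := pdfcpu.DefaultWatermarkConfig()
wm.Mode = pdfcpu.WMImage
wm.OnTop = true
wm.Diagonal = 0
wm.InpUnit = pdfcpu.POINTS
wm.FileName = editImgPath
// Anchor at the top-left corner; Dx and Dy below are offsets from that anchor.
wm.Pos = pdfcpu.TopLeft
// scale and calcPos are helpers from this tool, not part of pdfcpu.
wm.Scale = scale(refW, pageW)
wm.ScaleEff = float64(refW)
wm.ScaleAbs = true
wm.Dx = int(math.Round(calcPos(x, refW, pageW)))
wm.Dy = int(math.Round(-calcPos(y, refH, pageH)))
// Watermark only the selected page; a nil configuration means pdfcpu's defaults.
err = api.AddWatermarks(inFile, outFile, []string{strconv.Itoa(page)}, wm, nil)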

@petervwyatt

Sorry but I'm not a real Gopher (yet)... the error is likely to be in one of the input PDFs (the watermark or the PDF being watermarked) - can you share/check? e.g. your wm.FileName and inFile files.

AFAICT pkg/pdfcpu/date.go parses and validates PDF Date literal ASCII strings correctly.

My only very minor gripe for @hhrutter would be that PDF Date strings don't have to be literal ASCII strings - they can be hex strings, have escape sequences, etc. The "date-ness" processing needs to happen after the PDF string object (of any kind) is parsed and then normalized. But that has precisely zero to do with this error and is only FYI for @hhrutter...
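As a sketch of the first half of that ordering for hex strings only: decode the string object into bytes, and only then hand the result to date parsing. decodeHexString is a hypothetical helper, not pdfcpu API. It follows the ISO 32000 rules that whitespace inside a hex string is ignored and an odd final digit is taken as followed by 0, and it assumes the input is a single hex string token. It does not handle escape sequences in literal strings or UTF-16BE text strings.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
	"unicode"
)

// decodeHexString turns a PDF hex string such as "<443A32303231>" into its
// bytes. Whitespace between digits is dropped and an odd number of digits
// is padded with a trailing 0.
func decodeHexString(s string) (string, error) {
	s = strings.TrimSuffix(strings.TrimPrefix(s, "<"), ">")
	s = strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return -1 // drop whitespace
		}
		return r
	}, s)
	if len(s)%2 == 1 {
		s += "0"
	}
	b, err := hex.DecodeString(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	date, err := decodeHexString("<44 3A 32 30 32 31>")
	if err != nil {
		panic(err)
	}
	fmt.Println(date) // D:2021
}
```

The decoded text can then go through the same date validation as a literal string.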

@hhrutter
Collaborator

hhrutter commented Apr 9, 2021

We have strict and relaxed validation, and all pdfcpu commands default to relaxed.
We can extend relaxed validation to digest these odd Dates, but I am always curious where these kinds of issues originate.
Can you share a sample, or even this file, or post the output of pdfcpu info?
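One way relaxed validation could digest this particular shape is to repair it before validating. The sketch below is hypothetical (relaxDate and oddOffset are illustrative names, not pdfcpu code): it rewrites a two-digit `+-HH'mm'` offset to `-HH'mm'` and leaves every other string untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// oddOffset matches a full-precision date whose offset was written as
// "+-HH'mm'", the shape reported in this issue.
var oddOffset = regexp.MustCompile(`^(D:\d{14})\+-(\d{2}'\d{2}'?)$`)

// relaxDate drops the stray '+' from a "+-" offset; any string that does
// not match oddOffset is returned unchanged.
func relaxDate(s string) string {
	return oddOffset.ReplaceAllString(s, "${1}-${2}")
}

func main() {
	fmt.Println(relaxDate("D:20210408134302+-10'00'")) // D:20210408134302-10'00'
}
```

Anchoring the pattern on the full date keeps the repair narrow, so well-formed dates and unrelated strings pass through as-is.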

Thank you for using pdfcpu 💚

@petervwyatt Duly noted.
Since it is hard to get hold of test PDFs that exercise these spec corner cases, I'd rather wait for corresponding issues, aka real-world cases, and then implement them incrementally while adding them to my pdfcpu test corpus.

@sircodemane
Author

@hhrutter Thank you for the awesome library ❤️
@petervwyatt thank you for contributing your knowledge here ❤️

I don't believe it's related to the PDFs we are using, since changing the time zone in Windows settings seems to be the difference between working and erroring, but I could be entirely wrong. Unfortunately, I cannot share the PDFs since they contain confidential patient medical information and require HIPAA compliance. However, I believe I can create a minimal working example and share all the necessary resources for testing and debugging. I will report back once I have those materials ready. Thanks for all your help.

@sircodemane
Author

@hhrutter I have created an example repo with everything needed to simulate a failure case. Running the example in my local time (US Mountain Time) yields no error, but running it with my time zone set to (UTC -10:00) Hawaii generates the failure.

I learned more while creating this: it seems like @petervwyatt was correct in his suspicion that the issue was an invalid date object from an input file. This issue doesn't seem to arise from the original input file, as I was unable to trigger the error when adding watermarks directly to an unmodified PDF. In our case, we are first merging several files into a single "work file" using api.Merge(), and then applying watermarks to the merged PDF. It seems that Merge() may actually be the culprit here, possibly writing invalid date strings dependent on the host machine's time zone settings.

Please view this example repo for the code, materials, and information to reproduce.

@petervwyatt

Thanks @codydbentley - I can now repro when I export TZ='Pacific/Honolulu' and TZ='Pacific/Niue' but not for TZ='Australia/Sydney' or TZ='America/Yakutat'.

I'm guessing the line at fault is line 29 of date.go, as it hardcodes "+" outside the %02d that creates the UTC offset hour:

`return fmt.Sprintf("D:%d%02d%02d%02d%02d%02d+%02d'%02d'",`

I think it needs to be "%+02d" to get the sign character from the offset of tz/60/60 or some trickery with getting tz from time.Zone(). But I'm not sufficiently a Gopher to be 100% confident.

@hhrutter changed the title from '"invalid date" error from validateDateObject' to 'write invalid local Date' on Apr 10, 2021
@hhrutter added the bug label on Apr 10, 2021
@hhrutter
Collaborator

Fixed with the latest commit.
There was a general bug in how Dates were written.

Thanks for uncovering this 💚
