Further improve parsing of dictionaries / names #795

fancycode · 2024-01-29T14:39:20Z

With this change, the names are decoded internally, so they can be compared directly when adding entries to dictionaries. On writing, the names are encoded if necessary.
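Some background on the name syntax: a PDF name token may spell any byte as # followed by two hex digits (ISO 32000-1, section 7.3.5), so /Key#28#29 and the decoded name Key() denote the same name. Below is a minimal Go sketch of decoding on parse and re-encoding on write. decodeName, encodeName and unhex are hypothetical helpers written for illustration, not pdfcpu's actual functions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unhex converts one hex digit to its value (hypothetical helper).
func unhex(c byte) (byte, bool) {
	switch {
	case '0' <= c && c <= '9':
		return c - '0', true
	case 'a' <= c && c <= 'f':
		return c - 'a' + 10, true
	case 'A' <= c && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

// decodeName resolves #xx escapes in a name body (the text after '/').
// Hypothetical sketch, not pdfcpu's API.
func decodeName(s string) (string, error) {
	if !strings.Contains(s, "#") {
		return s, nil // fast path: nothing to decode
	}
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c != '#' {
			b.WriteByte(c)
			continue
		}
		if i+2 >= len(s) {
			return "", errors.New("truncated #xx escape in name")
		}
		hi, ok1 := unhex(s[i+1])
		lo, ok2 := unhex(s[i+2])
		if !ok1 || !ok2 {
			return "", fmt.Errorf("invalid #xx escape in name %q", s)
		}
		b.WriteByte(hi<<4 | lo)
		i += 2 // skip the two hex digits
	}
	return b.String(), nil
}

// encodeName escapes '#', delimiters, whitespace and any byte
// outside '!'..'~' so the name can be written back to a PDF.
func encodeName(s string) string {
	const hexDigits = "0123456789ABCDEF"
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c < '!' || c > '~' || c == '#' || strings.IndexByte("()<>[]{}/%", c) >= 0 {
			b.WriteByte('#')
			b.WriteByte(hexDigits[c>>4])
			b.WriteByte(hexDigits[c&0x0F])
			continue
		}
		b.WriteByte(c)
	}
	return b.String()
}

func main() {
	name, err := decodeName("Key#28#29")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)             // Key()
	fmt.Println(encodeName(name)) // Key#28#29
}
```

Keeping the decoded form in memory means two spellings of the same name compare equal with a plain string comparison; encoding only happens once, when the object is serialized.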

Also removed some duplicate code for name encoding / decoding and simplified object type tests.
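As a rough illustration of replacing a long list of type tests with a single type switch, here is a Go sketch. Object, Name, StringLiteral, Integer, Dict and describe are hypothetical stand-ins, not pdfcpu's real types or code.

```go
package main

import "fmt"

// Hypothetical stand-ins for PDF object types; pdfcpu's real types differ.
type Object interface{}
type Name string
type StringLiteral string
type Integer int
type Dict map[string]Object

// describe dispatches once on the dynamic type instead of running a
// chain of `if _, ok := o.(T); ok` checks, one per type.
func describe(o Object) string {
	switch v := o.(type) {
	case Name:
		return "name /" + string(v)
	case StringLiteral:
		return "string (" + string(v) + ")"
	case Integer:
		return fmt.Sprintf("integer %d", int(v))
	case Dict:
		return fmt.Sprintf("dict, len=%d", len(v))
	case nil:
		return "null"
	default:
		return fmt.Sprintf("unknown %T", v)
	}
}

func main() {
	fmt.Println(describe(Name("Type")))                          // name /Type
	fmt.Println(describe(Integer(42)))                           // integer 42
	fmt.Println(describe(Dict{"Key()": StringLiteral("Value")})) // dict, len=1
}
```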

Follow-up to #776 to also speed up parsing of dictionaries that contain keys with a #.

hhrutter · 2024-01-29T15:42:00Z

Is there any way to get into a discussion of an issue first?
This especially applies to PRs with heavy, critical changes,
e.g. what's the motivation for this, what's the issue at hand, etc.

fancycode · 2024-01-29T15:53:31Z

As written above, it's a follow-up to the previous change in #776, related to issue #775.

Even with the change from #776, you can construct a PDF with dictionaries that take ages to parse (see the updated test in pdfcpu/pkg/pdfcpu/model/parse_dict_test.go, line 153 in 4978e9c):

sb.WriteString("/Key#28#29 (Value)")
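Why decoding at parse time helps: if keys were kept in encoded form, noticing that /AB and /A#42 are the same key would mean decoding existing keys again on every insert. A minimal sketch, assuming a dictionary modeled as a Go map keyed by decoded names. decodeName and addEntry are hypothetical helpers; this decoder assumes mostly well-formed #xx escapes and keeps malformed ones as literal bytes. It does not reproduce pdfcpu's test or its duplicate-key handling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeName resolves #xx escapes once, at parse time.
// Hypothetical helper; malformed escapes are kept as literal bytes.
func decodeName(s string) string {
	if !strings.Contains(s, "#") {
		return s // fast path: most names contain no escapes
	}
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '#' && i+2 < len(s) {
			if v, err := strconv.ParseUint(s[i+1:i+3], 16, 8); err == nil {
				b.WriteByte(byte(v))
				i += 2
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

// addEntry stores value under the decoded key, so "/AB" and "/A#42"
// land on the same entry and a duplicate is found with one map lookup.
// Returns false if the key is already present (hypothetical policy).
func addEntry(d map[string]string, rawKey, value string) bool {
	key := decodeName(rawKey)
	if _, dup := d[key]; dup {
		return false
	}
	d[key] = value
	return true
}

func main() {
	d := map[string]string{}
	fmt.Println(addEntry(d, "Key#28#29", "(Value)")) // true
	fmt.Println(addEntry(d, "AB", "(1)"))            // true
	fmt.Println(addEntry(d, "A#42", "(2)"))          // false: decodes to AB
	fmt.Println(len(d))                              // 2
}
```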

I'm happy to discuss the changes in this merge request; at least, that's my understanding of what merge requests are for. You can easily comment on individual lines, add a review, or add global comments as we are doing right now.

hhrutter · 2024-01-29T21:05:58Z

It's easy to come up with spec-compliant PDFs that pdfcpu will choke on, but that's true for many PDF processors out there. Instead of focusing on theoretical corner cases, I'd like to spend my time on real-world PDFs that are spec-compliant and yet cause trouble.

Yet I am on board if this is about speeding up parsing of average but bigger PDF files.
It will take me some time to get to reviewing your proposal, please bear with me.

hhrutter · 2024-02-28T10:24:40Z

Do you think you can rebase this onto the latest commit?
It will help big time with cutting a new release by the end of the day 😉


fancycode · 2024-02-28T10:27:33Z

Sure, just rebased the branch on fc87a22.

hhrutter · 2024-02-28T10:33:50Z

Heads up: ValidationNone is gone, in case you were using it.

hhrutter · 2024-02-29T00:38:48Z

Thanks!

fancycode added 2 commits February 28, 2024 11:26

Further improve parsing of dictionaries / names.

3d4cbdb

With this change, the names are decoded internally, so they can be compared directly when adding entries to dictionaries. On writing, the names are encoded if necessary. Also removed some duplicate code for name encoding / decoding.

Use type switch instead of long list of type tests.

044a6c0

fancycode force-pushed the improve-name-parsing branch from 50422c4 to 044a6c0 February 28, 2024 10:26

hhrutter added a commit that referenced this pull request Feb 29, 2024

Merge #795, cleanup

dedaddc

hhrutter merged commit 044a6c0 into pdfcpu:master Feb 29, 2024
12 of 15 checks passed

fancycode deleted the improve-name-parsing branch February 29, 2024 07:15
