Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further improve parsing of dictionaries / names #795

Merged
merged 2 commits into from Feb 29, 2024

Conversation

fancycode
Copy link
Contributor

With this change, the names are decoded internally, so they can be compared directly when adding entries to dictionaries. On writing, the names are encoded if necessary.

Also removed some duplicate code for name encoding / decoding and simplified object type tests.

Follow-up to #776 to also speed up parsing dictionaries that contain key with a #.

@hhrutter
Copy link
Collaborator

hhrutter commented Jan 29, 2024

Is there any way to get into a discussion of an issue.
This especially applies for PRs with heavy, critical changes.
eg. What's the motivation for this, the issue at hand etc.

@fancycode
Copy link
Contributor Author

As written above, it's a follow-up to the previous change in #776, related to issue #775.

Even with the change from #776 you can construct a PDF with dictionaries that take ages to parse (see the updated test in

sb.WriteString("/Key#28#29 (Value)")
).

I'm happy to discuss the changes in this merge request - at least that's my understanding what merge requests are for. You can easily comment on individual lines, add a review or add global comments as we are doing right now.

@hhrutter
Copy link
Collaborator

It's easy to come up with spec compliant PDFs that pdfcpu will choke on but that's true for many pdfcpu processors out there. Instead of focusing on theoretical corner cases I'd like to spend my time on real word PDFs that are spec compliant and yet cause trouble.

Yet I am on board if this is about speeding up parsing of average but bigger PDF files.
It will take me some time to get to reviewing your proposal, please bear with me.

@hhrutter
Copy link
Collaborator

Do you think you can rebase this onto the latest commit?
Will help big time cutting a new release by the end of the day 😉

With this change, the names are decoded internally, so they be can compared
directly when adding entries to dictionaries. On writing, the names are
encoded if necessary.

Also removed some duplicate code for name encoding / decoding.
@fancycode
Copy link
Contributor Author

Sure, just rebased the branch on fc87a22.

@hhrutter
Copy link
Collaborator

heads up... ValidationNone is gone in case you were using it..

hhrutter added a commit that referenced this pull request Feb 29, 2024
@hhrutter hhrutter merged commit 044a6c0 into pdfcpu:master Feb 29, 2024
12 of 15 checks passed
@hhrutter
Copy link
Collaborator

Thanks!

@fancycode fancycode deleted the improve-name-parsing branch February 29, 2024 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants