
handle unicode string #30

Merged Aug 5, 2015 (1 commit)


tjwei (Contributor) commented Aug 5, 2015

Fixes a few regular expressions for PDF strings.
My project https://github.com/tjwei/translatePDF uses pdfrw and handles a lot of Unicode PDF strings.
This patch works for handling various Chinese strings in PDFs.
It fixes the following issue:
(\0160) should be parsed as the octal escape \016 followed by the literal character 0, not as the single octal number 0160, so it should decode to \x0e0 (the byte 0x0E followed by the character 0).
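To illustrate the rule behind this fix: the PDF specification allows an octal escape in a literal string to have one, two, or three digits, so a pattern like `[0-7]+` over-matches. The following is a minimal sketch, not pdfrw's actual code; `decode_pdf_literal` and the pattern names are hypothetical, and it assumes the input is the raw bytes between the string's outer parentheses.

```python
import re

# Hypothetical helper (not pdfrw's API): decodes escape sequences in the
# body of a PDF literal string, i.e. the bytes between the outer parentheses.
# Group 1: octal escape limited to 1-3 digits ({1,3}, not +).
# Group 2: backslash + end-of-line, which is a line continuation.
# Group 3: any other escaped byte.
_ESCAPE = re.compile(rb'\\(?:([0-7]{1,3})|(\r\n|\r|\n)|(.))', re.DOTALL)

_SIMPLE = {
    b'n': b'\n', b'r': b'\r', b't': b'\t',
    b'b': b'\b', b'f': b'\f',
    b'(': b'(', b')': b')', b'\\': b'\\',
}


def decode_pdf_literal(body: bytes) -> bytes:
    def replace(match: re.Match) -> bytes:
        octal, eol, other = match.groups()
        if octal is not None:
            # High-order overflow is ignored, so keep only the low 8 bits.
            return bytes([int(octal, 8) & 0xFF])
        if eol is not None:
            # Backslash followed by end-of-line: both are dropped.
            return b''
        # Known escapes map to their byte; for unknown ones the backslash is dropped.
        return _SIMPLE.get(other, other)

    return _ESCAPE.sub(replace, body)


print(decode_pdf_literal(b'\\0160'))  # b'\x0e0'
```

Because the octal group stops after three digits, `\0160` yields the byte 0x0E followed by `0`. A greedy pattern such as `rb'\\([0-7]+)'` would instead consume all four digits and produce a single byte.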

pmaupin merged commit 66271ce into pmaupin:master on Aug 5, 2015
pmaupin (Owner) commented Aug 5, 2015

Thanks!

pmaupin (Owner) commented Aug 5, 2015

Do you have a PDF that fails on the old code and works on the new code? I could add it to the tests.

tjwei (Contributor, Author) commented Aug 5, 2015

I have a few, but unfortunately they are all copyrighted. I tried to use XeTeX to generate one, but was not successful. I will send you one when I find it.

pmaupin (Owner) commented Aug 5, 2015

It really wants a unittest in here anyway. I'll add an issue for that.

Thanks,
Pat
