hyphenated wrapped words not resolving #16

Open
alxhoff opened this issue Mar 26, 2020 · 2 comments

alxhoff (Contributor) commented Mar 26, 2020

Hyphenated words that happen to fall at the end of a line are reconstructed without the hyphen.

In my paper I have this example:
"...but with very contrasting power-
performance thread..."

This becomes
"but with very contrasting powerperformance thread"

after pdf2text. No idea if it's solvable, but I thought I'd let you know.

emareg (Owner) commented May 7, 2020

Thanks for the hint. I am aware of this, and I think it is solvable by checking against a spell checker. Without such a check, it is not possible to tell whether a hyphen at a line end is part of a compound word or only marks a line break, e.g. "high- end" vs. "high- lighting".
If it is your own paper, the best solution is probably to run the script on the .tex file.
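
A minimal sketch of the spell-checker idea, not the script's actual code: join the two halves of a line-end hyphenation only when the joined form is a known word, and keep the hyphen otherwise. The names `KNOWN_WORDS`, `rejoin`, and `dehyphenate` are hypothetical, and the tiny word set stands in for a real dictionary or spell checker.

```python
import re

# Hypothetical stand-in for a real spell checker's word list.
KNOWN_WORDS = {"high", "end", "highlighting", "power", "performance"}

# A word fragment, a hyphen, a line break (optional spaces), then the continuation.
LINE_END_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def rejoin(match, known_words):
    """Decide how to merge the two halves of a word split at a line end."""
    left, right = match.group(1), match.group(2)
    joined = left + right
    if joined.lower() in known_words:
        # The joined form is a real word, so the hyphen was a line-break artifact.
        return joined
    # Otherwise assume a genuine compound and keep the hyphen.
    return f"{left}-{right}"


def dehyphenate(text, known_words=KNOWN_WORDS):
    """Resolve line-end hyphens in text extracted from a PDF."""
    return LINE_END_HYPHEN.sub(lambda m: rejoin(m, known_words), text)


print(dehyphenate("very contrasting power-\nperformance thread"))
print(dehyphenate("syntax high-\nlighting on a high-\nend machine"))
```

With this word list, "power-\nperformance" keeps its hyphen while "high-\nlighting" is merged. The result is only as good as the dictionary: a compound whose joined form happens to be a valid word would still lose its hyphen.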

emareg added the bug label Jul 29, 2020

emareg (Owner) commented Jul 29, 2020

I added a first mechanism to resolve the hyphenation issue in d839993. So far, the script looks for words at the end of a line that contain the suffixes "based", "case", or "level", which indicate a potential error from the pdf2text tool, but it is not perfect yet.
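
For illustration only, a suffix heuristic of this kind could look roughly like the sketch below. It is not the code from d839993: the function name `suspicious_words` is hypothetical, and it checks every word in the text rather than only words at line ends. It flags words that end in one of the three suffixes and have something in front of the suffix, such as "modelbased".

```python
import re

# The three suffixes named in the comment above.
SUFFIXES = ("based", "case", "level")


def suspicious_words(text, suffixes=SUFFIXES):
    """Return words that end in one of the suffixes and have a non-empty prefix."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        for suffix in suffixes:
            # "case" alone is fine; "worstcase" probably lost a hyphen.
            if lower.endswith(suffix) and len(lower) > len(suffix):
                hits.append(word)
                break
    return hits


print(suspicious_words("a modelbased approach at systemlevel in the worst case"))
```

A check like this also flags legitimate words such as "showcase", which is one reason such a heuristic cannot be perfect on its own.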
