Space regression by PR 1172 #1362
Labels
is-bug
From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF
whitespace
While doing extract_text, getting the right number of whitespaces (spaces and newlines) is hard.
workflow-text-extraction
From a user's perspective, text extraction is the affected feature/workflow
I've just noticed that PR #1172 introduced a space regression in text extraction: many spaces that should have been kept are now removed.
Code + PDF
Just standard text extraction:
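The snippet from the original report is not preserved here, so the following is only a minimal sketch of what "standard text extraction" could look like. It assumes the library is importable as `pypdf` (older releases used the package name `PyPDF2`), that the PDF has been saved locally under the hypothetical name `2201.00029.pdf`, and that the first page is enough to show the problem. The `whitespace_counts` helper is not part of the library; it is added here only to make a before/after comparison easy.

```python
def whitespace_counts(text: str) -> dict:
    """Count spaces and newlines in a piece of extracted text.

    Hypothetical helper (not part of the library), used only to compare
    the output of two library versions.
    """
    return {"spaces": text.count(" "), "newlines": text.count("\n")}


def extract_first_page(pdf_path: str) -> str:
    """Extract the text of the first page with the standard API."""
    # Imported lazily so whitespace_counts() works without the library.
    # Assumption: the package is installed as `pypdf`; with an older
    # release, use `from PyPDF2 import PdfReader` instead.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return reader.pages[0].extract_text()


if __name__ == "__main__":
    # Hypothetical local file name for the PDF linked in this issue.
    text = extract_first_page("2201.00029.pdf")
    print(whitespace_counts(text))
    print(text[:500])
```

Running this once with a release from before PR #1172 and once with a release from after it, then comparing the two counts, should show whether spaces went missing.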
PDFs:
See https://arxiv.org/pdf/2201.00029.pdf