Nice work on this library! Just wondering... A lot of text comes out very difficult to parse, because multiple items of text on a line get combined with no separator. The pdf-parse library had a disableCombineTextItems property on render options which could improve this situation. Perhaps something like that, or a "line items separator" string you could specify that gets inserted between same-line items of text.
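To make the request concrete, here is a minimal sketch of what a "line items separator" could do. Everything in it is an assumption for illustration: the `TextItem` shape, the `joinItems` function and the `itemSeparator` parameter are hypothetical names, not part of this library or of pdf-parse.

```typescript
// Hypothetical shape of one extracted piece of text (an assumption, not a real API).
interface TextItem {
  str: string; // the text of the item
  x: number;   // horizontal position on the page
  y: number;   // vertical position; assumed here to grow downwards
}

// Groups items into lines by their y value, then joins the items of each
// line with an optional separator. An empty separator reproduces the
// "everything runs together" output described above.
function joinItems(items: TextItem[], itemSeparator: string = ""): string {
  const lines = new Map<number, TextItem[]>();
  for (const item of items) {
    // Exact y equality is a simplification; real PDFs usually need a tolerance.
    const row = lines.get(item.y) || [];
    row.push(item);
    lines.set(item.y, row);
  }
  return Array.from(lines.entries())
    .sort((a, b) => a[0] - b[0]) // top of the page first
    .map(([, row]) =>
      row
        .sort((a, b) => a.x - b.x) // left to right within a line
        .map((item) => item.str)
        .join(itemSeparator)
    )
    .join("\n");
}

const sample: TextItem[] = [
  { str: "Invoice", x: 10, y: 20 },
  { str: "2021-04-01", x: 200, y: 20 },
  { str: "Total", x: 10, y: 40 },
  { str: "99.50", x: 200, y: 40 },
];

console.log(joinItems(sample));      // "Invoice2021-04-01\nTotal99.50"
console.log(joinItems(sample, "|")); // "Invoice|2021-04-01\nTotal|99.50"
```

With the default empty separator the item boundaries are lost; with a separator they survive in the output text.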
@JonSilver hi and thank you :)
Sorry for the late reply. I usually try to answer within a few days, but I was on vacation for a week.
That's right, the text contents are combined per line.
I am currently working on another parsing function that gives you access to the complete content of the PDF.
All composites will be returned in an array.
Would this be a solution for you?
Yes, I suppose an array of items would be pretty good too, but an optional, settable delimiter inserted between items in the output text would be great for regex parsing. Different purposes, different solutions. 😊
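As a rough illustration of why a delimiter helps with regex parsing, the sketch below parses a "label, then amount" line. The `parseAmountRow` helper and the `|` delimiter are assumptions made up for this example; nothing here is an existing option of the library.

```typescript
// Escapes characters that have a special meaning in a regular expression,
// so that any delimiter string can be embedded in a pattern safely.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: splits a line of the form "<label><sep><number>".
// Returns null when the line does not have that shape.
function parseAmountRow(
  line: string,
  sep: string
): { label: string; amount: number } | null {
  const pattern = new RegExp(
    `^(.+?)${escapeRegExp(sep)}(-?\\d+(?:\\.\\d+)?)$`
  );
  const match = pattern.exec(line);
  return match ? { label: match[1], amount: Number(match[2]) } : null;
}

// With a delimiter, the boundary between the two items is unambiguous.
console.log(parseAmountRow("Item 42|100", "|")); // { label: "Item 42", amount: 100 }

// Without one, "Item 42" and "100" run together as "Item 42100" and the
// original boundary cannot be recovered, so the row does not parse.
console.log(parseAmountRow("Item 42100", "|")); // null
```

An array of items would make this kind of parsing unnecessary, while a delimiter keeps the plain-text output usable with simple patterns like the one above.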