I would like to thank you for working on this great package. It's extremely useful and has plenty of applications. I hope you continue to work on it and maintain it.
I noted that one of the limitations (as you mentioned) is text fragmentation when the text in a PDF is laid out in columns (e.g., most scientific articles). I came across the function tabulizer::extract_text(file), which can read multiple columns. I wonder if you could use something similar in your package to fix that issue. The tabulizer function will still have problems with tables and with image/table captions, but at least it gets the flow of the main text correct.
Thank you.
Thanks for the comments. One of my graduate students and I are currently working on expanding this package and a companion package. Handling of multi-column PDFs is one of the features we are working to improve. I don't plan to use the tabulizer package, as it has some pretty strict dependencies (i.e., Java). However, look for improvements to multi-column PDF support coming soon.