Problems reading scientific papers using pdfminer #354
Hi @ceilbeck, and thanks for your interest in this library. In this case, it appears that the PDF is malformed, something that's not related to … Running that tool like so:
cpdf Fujita2016.pdf -o Fujita2016-fixed.pdf
... produces this output:
Then, running the same code as above, but swapping in |
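Since the question concerns a large collection of PDF files, the same cpdf repair step could be applied to a whole folder before searching. This is a minimal sketch, assuming the cpdf command-line tool is installed and on your PATH; the helper names `repair_command`, `fixed_path` and `repair_folder`, and the `-fixed` naming scheme (copied from the example command above), are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def repair_command(src: Path, dst: Path) -> List[str]:
    """Build the cpdf invocation shown above: cpdf <src> -o <dst>."""
    return ["cpdf", str(src), "-o", str(dst)]


def fixed_path(src: Path) -> Path:
    """Hypothetical naming scheme: Fujita2016.pdf -> Fujita2016-fixed.pdf."""
    return src.with_name(f"{src.stem}-fixed{src.suffix}")


def repair_folder(folder: Path) -> List[Path]:
    """Run cpdf on every *.pdf in `folder` and return the repaired copies' paths.

    Assumes the `cpdf` executable is on the PATH. check=True makes
    subprocess.run raise CalledProcessError if cpdf exits with an error.
    """
    repaired = []
    for src in sorted(folder.glob("*.pdf")):  # sorted() lists files before any are written
        if src.stem.endswith("-fixed"):
            continue  # skip copies produced by an earlier run
        dst = fixed_path(src)
        subprocess.run(repair_command(src, dst), check=True)
        repaired.append(dst)
    return repaired
```

For example, `repair_folder(Path("papers"))` (with a hypothetical `papers` folder) would write `X-fixed.pdf` next to each `X.pdf`, and the search code can then be pointed at the repaired copies.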
The units, there and throughout
If the layout is predictable between pages, I think the simplest approach would be to use |
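One possible approach for a predictable two-column layout (not necessarily the one recommended above) is to crop each page into a left and a right half and extract text from each half in turn. This is a minimal sketch, assuming the pdfplumber library and a gutter at a fixed fraction of the page width (0.5, the midpoint, by default). Full-width titles, tables or figures will be split in two. `column_bboxes` and `extract_columns` are hypothetical helper names. Bounding boxes follow pdfplumber's `(x0, top, x1, bottom)` convention, in PDF points.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom), in PDF points


def column_bboxes(page_bbox: BBox, split: float = 0.5) -> Tuple[BBox, BBox]:
    """Split a page's bounding box into left and right column boxes.

    `split` is the assumed position of the gutter as a fraction of the
    page width (0.5 = the horizontal midpoint).
    """
    x0, top, x1, bottom = page_bbox
    mid = x0 + (x1 - x0) * split
    return (x0, top, mid, bottom), (mid, top, x1, bottom)


def extract_columns(path: str, split: float = 0.5) -> List[str]:
    """Return one string per column, left column before right, for every page."""
    import pdfplumber  # third-party (pip install pdfplumber); deferred so column_bboxes works without it

    texts = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Use page.bbox rather than (0, 0, width, height) so the crop
            # stays inside pages whose box does not start at the origin.
            for bbox in column_bboxes(page.bbox, split):
                texts.append(page.crop(bbox).extract_text() or "")
    return texts
```

If the gutter is not at the midpoint, adjusting `split` (for example after checking the column boundary visually on one page) should be enough, as long as the layout really is the same on every page.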
I am trying to identify some keywords in a large collection of PDF files for a neuro-rehab study. As an example, my Python program miner1.py, which analyses Fujita2016.pdf, is:
With the file Fujita2016.pdf
The keywords get printed out together with 20 characters on either side. The program works to some extent, but it only recognises the first page of the PDF. More minor problems are that some of the text is printed without whitespace separators, and that the double-column format causes some confusion.
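Since the original miner1.py is not shown, here is a minimal sketch of the keyword-in-context search described above that loops over every page rather than only the first. It assumes the pdfplumber library; the helper names `keyword_contexts` and `search_pdf` are hypothetical, and the matching is case-insensitive by choice. The `x_tolerance` argument is passed to pdfplumber's `extract_text`: a value below its default of 3 makes it insert a space at smaller gaps between characters, which may help with words that come out run together, but the right value depends on the file.

```python
import re
from typing import Iterator, List, Tuple


def keyword_contexts(text: str, keyword: str, width: int = 20) -> List[str]:
    """Return every case-insensitive match of `keyword` in `text`, with up to
    `width` characters of context on either side."""
    snippets = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Collapse newlines and runs of whitespace so each hit prints on one line.
        snippets.append(" ".join(text[start:end].split()))
    return snippets


def search_pdf(path: str, keywords: List[str],
               x_tolerance: float = 1.5) -> Iterator[Tuple[int, str, str]]:
    """Yield (page_number, keyword, snippet) for every hit on every page."""
    import pdfplumber  # third-party (pip install pdfplumber); deferred so keyword_contexts works without it

    with pdfplumber.open(path) as pdf:
        # enumerate over all pages, numbering them from 1 as a reader would.
        for page_number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text(x_tolerance=x_tolerance) or ""
            for keyword in keywords:
                for snippet in keyword_contexts(text, keyword):
                    yield page_number, keyword, snippet
```

For example, `for page, kw, snippet in search_pdf("Fujita2016.pdf", ["gait", "stroke"]): print(page, kw, snippet)`, where the keywords are hypothetical placeholders. For a malformed file, running it on the cpdf-repaired copy may give better results.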
I am running Python 3.8.5.
I'd be most grateful for any suggestions.
Chris Eilbeck