Can subscript be extracted #990
Closed
fsmosca started this conversation in Ask for help with specific PDFs
Hi @fsmosca, I appreciate your interest in the library. You'll have to write your own logic to identify which characters are subscript or superscript based on their size attribute. Closing as a duplicate of #730.
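For anyone landing here, a minimal sketch of that kind of size-based logic might look like the following. It assumes each character is a dict with "text" and "size" keys; the function name `mark_small_chars`, the 0.85 ratio, and the sample data are made up for illustration and are not part of the library.

```python
from statistics import median

def mark_small_chars(chars, ratio=0.85):
    """Return (text, is_small) pairs for a list of character dicts.

    A character counts as "small" (a subscript or superscript candidate)
    when its size is below `ratio` times the median size of the
    non-whitespace characters. The ratio is an arbitrary assumption.
    """
    sizes = [c["size"] for c in chars if c["text"].strip()]
    if not sizes:
        return []
    body_size = median(sizes)
    return [(c["text"], c["size"] < body_size * ratio) for c in chars]

# Hypothetical characters for "NH4" with a smaller "4".
sample = [
    {"text": "N", "size": 10.0},
    {"text": "H", "size": 10.0},
    {"text": "4", "size": 6.5},
]
flags = mark_small_chars(sample)
```

This only flags characters that are smaller than the surrounding text. Telling subscript from superscript would also need a comparison of vertical positions against the neighbouring characters.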
I have a PDF containing a table with subscript text. I tried to extract it, converted it to CSV using pandas, and saved it with utf-8-sig encoding, but it does not work: a newline character is inserted.
How can I eliminate the newline character \n during table creation? This should have been NH4 and C10-13.
Note that if the subscript in the PDF is really a Unicode symbol, then the library can handle it.
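One possible workaround is to strip the newline from each cell after extraction and before building the DataFrame. The sketch below assumes the extracted table is a list of rows whose cells are strings or None; `strip_cell_newlines` is a hypothetical helper and the sample rows are invented to mirror the NH4 and C10-13 case.

```python
def strip_cell_newlines(table):
    """Remove newline characters from every string cell of a table.

    `table` is assumed to be a list of rows, each row a list of
    strings or None. Non-string cells are passed through unchanged.
    """
    return [
        [cell.replace("\n", "") if isinstance(cell, str) else cell for cell in row]
        for row in table
    ]

# Hypothetical extracted rows where the subscript landed on its own line.
rows = [
    ["Formula", "Range"],
    ["NH\n4", "C\n10-13"],
    [None, "x"],
]
cleaned = strip_cell_newlines(rows)
```

This joins the subscript back onto the base text as plain characters. It does not preserve the fact that the character was a subscript, and it would also merge cells that contain genuine line breaks.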