Possible bug: two letters in 'text' field of 'char' #361

Answered by jsvine
LivingDeadCloud asked this question in Q&A

Hi @LivingDeadCloud, I think you've come across a ligature, a typographic convention in which two letters are combined into a single symbol: https://en.wikipedia.org/wiki/Orthographic_ligature

So the char you see is, indeed, one symbol, but it represents two letters. I don't know exactly where the conversion happens, but it's upstream of pdfplumber. My best guess, without doing further research, is that it's upstream of pdfminer.six as well, and that the two-letter mapping is part of the font's definition in the PDF.
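
If you want to spot or normalize these cases on your side, here is a minimal sketch using only the standard library. It assumes each char is a dict with a `"text"` key, as in the objects you were inspecting; the helper names `expand_ligatures` and `find_ligature_chars` and the sample data are hypothetical, not part of pdfplumber. It covers two situations: the `"text"` already holds two letters (as in your case), or it holds a single Unicode ligature code point in the range U+FB00 to U+FB06.

```python
import unicodedata


def expand_ligatures(text):
    """Expand Unicode ligature code points (U+FB00 to U+FB06) into plain letters.

    Only that small range is touched, so other characters are left unchanged.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if "\ufb00" <= ch <= "\ufb06" else ch
        for ch in text
    )


def find_ligature_chars(chars):
    """Return the char dicts whose "text" stands for more than one letter."""
    return [c for c in chars if len(expand_ligatures(c["text"])) > 1]


# Hypothetical sample shaped like char dicts; only the "text" key is used here.
sample_chars = [{"text": "o"}, {"text": "fi"}, {"text": "\ufb00"}, {"text": "c"}]
flagged = find_ligature_chars(sample_chars)
print([c["text"] for c in flagged])
```

With a real document you would pass the page's list of chars instead of `sample_chars`. Whether you get two letters or a ligature code point in `"text"` likely depends on how the PDF's font maps its glyphs to text, so treat this as a diagnostic aid rather than a fix.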

Answer selected by LivingDeadCloud