pdfminer.layout
Encapsulates the “layout parameters” used to determine the layout of the PDF during conversion. Specifically, it holds the text direction, line overlap, character margin, line margin and word margin values.
function __init__(direction=None, line_overlap=0.5, char_margin=1.0, line_margin=0.5, word_margin=0.1)
direction looks like it should be a string, where 'V' denotes vertical text direction, and anything else denotes horizontal (LTR) text direction. It looks as if right-to-left text is not supported at all.
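To make the defaults and the reading of direction concrete, here is a minimal sketch. LayoutParams and is_vertical are hypothetical stand-ins written for illustration, not pdfminer's own class; they only mirror the __init__ signature shown above and the assumption that 'V' is the only value that selects vertical text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutParams:
    """Hypothetical stand-in that mirrors the __init__ signature above."""
    direction: Optional[str] = None
    line_overlap: float = 0.5
    char_margin: float = 1.0
    line_margin: float = 0.5
    word_margin: float = 0.1

    def is_vertical(self) -> bool:
        # Assumption: only 'V' selects vertical text; any other value
        # (including None) falls back to horizontal, left-to-right text.
        return self.direction == 'V'


default = LayoutParams()
vertical = LayoutParams(direction='V')
print(default.is_vertical())   # False
print(vertical.is_vertical())  # True
```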
Note: the numerical values for line_overlap, char_margin, line_margin and word_margin are not specified as actual lengths, but as ratios relative to the size of each character in question.
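As a rough illustration of what "relative to the character size" means, a margin can be turned into an absolute distance by multiplying it by the size of the character. The helper name absolute_threshold is hypothetical, and which dimension (width or height) applies to each parameter is an assumption not confirmed here.

```python
def absolute_threshold(margin_ratio: float, char_size: float) -> float:
    """Convert a relative margin into an absolute distance (same units as char_size)."""
    return margin_ratio * char_size


# With char_margin=1.0, a character 12 units wide tolerates a gap of up to 12 units.
print(absolute_threshold(1.0, 12.0))  # 12.0
# With line_margin=0.5, a character 12 units tall tolerates a gap of up to 6 units.
print(absolute_threshold(0.5, 12.0))  # 6.0
```

The same ratio therefore yields a larger absolute tolerance for larger type, which is presumably why the values are not given as fixed lengths.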
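line_overlap: not entirely sure what this does. It presumably sets how much two characters must overlap vertically, relative to their height, before they are treated as part of the same line (which may also matter for tightly set lines with negative leading).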
char_margin: two text chunks that are closer together than this value are considered contiguous and are grouped into one.
line_margin: two lines that are closer together than this value are grouped into a text box, which is a rectangular area that contains a “cluster” of text.
word_margin: a blank between words might not be represented as a space character, but only indicated by the positioning of each word. If the distance between two words is greater than this value, a space is inserted between them.
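The margin rules above can be sketched in a few lines. This is an illustration under stated assumptions, not pdfminer's actual implementation: the functions group_line and group_boxes, the tuple layouts, and the choice of the larger character width or line height as the reference size are all hypothetical.

```python
from typing import List, Tuple

Char = Tuple[str, float, float]  # (text, x0, x1) for characters on one baseline
Line = Tuple[str, float, float]  # (text, y0, y1); y grows upward, as in PDF


def group_line(chars: List[Char], char_margin: float = 1.0,
               word_margin: float = 0.1) -> List[str]:
    """Join characters into chunks, inserting spaces where gaps suggest a word break."""
    chunks: List[str] = []
    current = ""
    prev = None
    for ch in sorted(chars, key=lambda c: c[1]):
        text, x0, x1 = ch
        if prev is None:
            current = text
        else:
            gap = x0 - prev[2]
            width = max(x1 - x0, prev[2] - prev[1])  # assumed reference size
            if gap >= char_margin * width:
                # Too far apart to be contiguous: start a new chunk.
                chunks.append(current)
                current = text
            else:
                if gap > word_margin * width:
                    # Close enough to group, but far enough to imply a space.
                    current += " "
                current += text
        prev = ch
    if current:
        chunks.append(current)
    return chunks


def group_boxes(lines: List[Line], line_margin: float = 0.5) -> List[List[str]]:
    """Group lines (top to bottom) into text boxes when they are close together."""
    boxes: List[List[str]] = []
    prev = None
    for line in sorted(lines, key=lambda l: -l[2]):
        text, y0, y1 = line
        if prev is not None:
            gap = prev[1] - y1
            height = max(y1 - y0, prev[2] - prev[1])  # assumed reference size
        if prev is not None and gap < line_margin * height:
            boxes[-1].append(text)
        else:
            boxes.append([text])
        prev = line
    return boxes


chars = [("H", 0, 10), ("i", 10, 20), ("y", 25, 35), ("o", 35, 45), ("x", 80, 90)]
print(group_line(chars))   # ['Hi yo', 'x']

lines = [("a", 100, 110), ("b", 88, 98), ("c", 50, 60)]
print(group_boxes(lines))  # [['a', 'b'], ['c']]
```

In the first example, the gap of 5 units between "i" and "y" exceeds word_margin times the character width (1 unit) but not char_margin times the width (10 units), so a space is inserted; the gap of 35 units before "x" starts a new chunk.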