obeattie edited this page Sep 13, 2010 · 2 revisions

LAParams

Encapsulates the “layout parameters” used to analyze the layout of a PDF during conversion. Specifically, it holds the text direction and the line overlap, character margin, line margin and word margin values.

function __init__(direction=None, line_overlap=0.5, char_margin=1.0, line_margin=0.5, word_margin=0.1)

direction appears to be a string: 'V' denotes vertical text direction, and any other value denotes horizontal (left-to-right) text direction. Right-to-left text does not appear to be supported at all.
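To make the signature and the direction rule concrete, here is a minimal sketch of a container with the same defaults. The class name LayoutParamsSketch and the is_vertical property are hypothetical stand-ins for illustration; this is not pdfminer's own class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutParamsSketch:
    """Hypothetical stand-in mirroring the signature above; not pdfminer's class."""

    direction: Optional[str] = None
    line_overlap: float = 0.5
    char_margin: float = 1.0
    line_margin: float = 0.5
    word_margin: float = 0.1

    @property
    def is_vertical(self) -> bool:
        # Assumption taken from the description above: only 'V' selects
        # vertical layout; None or any other value means horizontal (LTR).
        return self.direction == "V"


params = LayoutParamsSketch(direction="V", word_margin=0.2)
```

Every argument is optional, so a bare LayoutParamsSketch() corresponds to horizontal text with the default margins.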

Note: the numerical values for line_overlap, char_margin, line_margin and word_margin are not absolute lengths. Each is expressed as a proportion of the size of the characters in question.
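One way to read this note is that each margin is multiplied by a character size to obtain an actual distance. The helper below is a sketch under that assumption; the function name absolute_threshold is hypothetical, and which character dimension (width or height) the library uses for each parameter is not stated here.

```python
def absolute_threshold(relative_margin: float, char_size: float) -> float:
    """Turn a relative margin into a length in the same unit as char_size.

    Assumption: the margin is a simple multiplier on the character size.
    """
    return relative_margin * char_size


# With 10-unit characters, the default char_margin of 1.0 allows a gap of
# one full character, while the default line_margin of 0.5 allows half of one.
char_gap_limit = absolute_threshold(1.0, 10.0)
line_gap_limit = absolute_threshold(0.5, 10.0)
```

Under this reading the same parameter values adapt to small and large type alike, which is presumably why they are not given as fixed lengths.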

line_overlap: its exact role is unclear, but it presumably governs whether text lines are treated as overlapping (perhaps to handle negative leading).

char_margin: two text chunks separated by less than this value are considered contiguous and are grouped into one.
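The grouping rule for char_margin can be sketched as follows. This is an illustration only, not pdfminer's implementation: the chunk representation (x0, x1, text), the single char_width argument, and the function name group_chunks are all assumptions.

```python
from typing import List, Optional, Tuple

# (x0, x1, text): left edge, right edge, and content of a chunk on one line.
Chunk = Tuple[float, float, str]


def group_chunks(chunks: List[Chunk], char_margin: float, char_width: float) -> List[str]:
    """Merge left-to-right neighbours whose gap is below char_margin * char_width."""
    limit = char_margin * char_width
    groups: List[str] = []
    prev_x1: Optional[float] = None
    for x0, x1, text in sorted(chunks):
        if prev_x1 is not None and x0 - prev_x1 < limit:
            groups[-1] += text  # close enough: same group as the previous chunk
        else:
            groups.append(text)  # first chunk, or too far away: start a new group
        prev_x1 = x1
    return groups


chunks = [(0.0, 10.0, "Hel"), (11.0, 20.0, "lo"), (50.0, 60.0, "World")]
grouped = group_chunks(chunks, char_margin=1.0, char_width=5.0)
# grouped == ["Hello", "World"]
```

Raising char_margin merges chunks that are further apart; lowering it splits text into more, smaller groups.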

line_margin: two lines separated by less than this value are grouped into a text box, which is a rectangular area that contains a “cluster” of text.
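The same idea applies vertically for line_margin. The sketch below assumes lines are given as (y0, y1, text) with y increasing upward, as in PDF coordinates, and measures the gap relative to each line's own height; the function name group_lines and these conventions are assumptions, not pdfminer's actual code.

```python
from typing import List, Optional, Tuple

# (y0, y1, text): bottom edge, top edge, and content of a line.
Line = Tuple[float, float, str]


def group_lines(lines: List[Line], line_margin: float) -> List[List[str]]:
    """Cluster vertically adjacent lines into text boxes, top to bottom."""
    boxes: List[List[str]] = []
    prev_y0: Optional[float] = None
    for y0, y1, text in sorted(lines, key=lambda line: line[1], reverse=True):
        height = y1 - y0
        # Gap between the previous line's bottom and this line's top.
        if prev_y0 is not None and prev_y0 - y1 < line_margin * height:
            boxes[-1].append(text)
        else:
            boxes.append([text])
        prev_y0 = y0
    return boxes


lines = [(100.0, 110.0, "a"), (88.0, 98.0, "b"), (50.0, 60.0, "c")]
boxes = group_lines(lines, line_margin=0.5)
# boxes == [["a", "b"], ["c"]]
```

Here the first two lines are 2 units apart, under the limit of 5 (0.5 times a height of 10), so they share a box; the third line is 28 units below and starts a new one.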

word_margin: a blank between words might not be represented as a space character, but only indicated by the positioning of each word. Blank characters (spaces) are therefore inserted where the distance between two words is greater than this value.
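A minimal sketch of the space-insertion rule, assuming glyphs arrive as (x0, x1, char) tuples on a single horizontal line and that the gap is compared against word_margin times the width of the current glyph. The function name join_with_spaces and the data layout are hypothetical, not taken from pdfminer.

```python
from typing import List, Optional, Tuple

# (x0, x1, char): left edge, right edge, and the character itself.
Glyph = Tuple[float, float, str]


def join_with_spaces(glyphs: List[Glyph], word_margin: float) -> str:
    """Concatenate glyphs left to right, adding a space where the gap is wide."""
    out: List[str] = []
    prev_x1: Optional[float] = None
    for x0, x1, ch in sorted(glyphs):
        width = x1 - x0
        if prev_x1 is not None and x0 - prev_x1 > word_margin * width:
            out.append(" ")  # gap exceeds the threshold: treat as a word break
        out.append(ch)
        prev_x1 = x1
    return "".join(out)


glyphs = [(0.0, 10.0, "a"), (10.0, 20.0, "b"), (25.0, 35.0, "c")]
text = join_with_spaces(glyphs, word_margin=0.1)
# text == "ab c"
```

A larger word_margin makes the sketch more reluctant to insert spaces, which suits tightly tracked text; a smaller one splits words more readily.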