How to exclude header and footer while extracting text? #968

Answered by jsvine
Laxmi530 asked this question in Q&A

Hi @Laxmi530, and thanks for your interest in pdfplumber. PDFs don't have a specific concept of a header or footer; whatever looks like a header or footer to a human is a design decision made by the PDF's creator. That said, if you know where the header ends and the footer begins, you can use `page.crop((x0, top, x1, bottom)).extract_text(...)` to get just the text in the core region.
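As a rough illustration of the cropping approach described above, here is a minimal sketch. It assumes the header and footer occupy fixed-height bands at the top and bottom of every page. The helper names `body_bbox` and `extract_body_text`, and the default band heights of 50 points, are hypothetical and not part of pdfplumber; only `pdfplumber.open`, `pdf.pages`, `page.bbox`, `page.crop` and `extract_text` come from the library.

```python
def body_bbox(page_bbox, header_height, footer_height):
    """Shrink a (x0, top, x1, bottom) box by a header band and a footer band.

    Hypothetical helper: pdfplumber measures `top` and `bottom` downward
    from the top of the page, so the header is added to `top` and the
    footer is subtracted from `bottom`.
    """
    x0, top, x1, bottom = page_bbox
    new_top = top + header_height
    new_bottom = bottom - footer_height
    if new_top >= new_bottom:
        raise ValueError("header and footer bands leave no body region")
    return (x0, new_top, x1, new_bottom)


def extract_body_text(path, header_height=50, footer_height=50):
    """Extract text from every page, skipping the assumed header/footer bands.

    The 50-point defaults are placeholders; measure your own document.
    """
    import pdfplumber  # third-party dependency: pip install pdfplumber

    texts = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Crop to the body region, then extract text from the crop only.
            body = page.crop(body_bbox(page.bbox, header_height, footer_height))
            texts.append(body.extract_text() or "")
    return "\n".join(texts)
```

To find suitable band heights for a given document, one option is to inspect the `top` and `bottom` values returned by `page.extract_words()` for the header and footer lines on a sample page. If the bands differ between pages (for example, a taller header on the first page), the heights would need to be chosen per page rather than once per file.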

Replies: 1 comment 3 replies

Answer selected by Laxmi530