tabula-py is a simple Python wrapper around tabula-java that reads tables from PDF files.
It lets you extract tables from a PDF and convert them into pandas DataFrames.
Requirements:
- Java
- pandas
pip install tabula-py
See example notebook
- pages (str, int, list of int, optional):
- An optional value specifying the pages to extract from. It allows str, int, or list of int.
- Example: 1, '1-2,3', 'all' or [1,2]. Default is 1
- guess (bool, optional):
- Guess the portion of the page to analyze per page.
- area (list of float, optional):
- Portion of the page to analyze (top, left, bottom, right).
- Example: [269.875, 12.75, 790.5, 561]. Default is entire page
- spreadsheet (bool, optional):
- Force PDF to be extracted using spreadsheet-style extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
- nospreadsheet (bool, optional):
- Force PDF not to be extracted using spreadsheet-style extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
- password (str, optional):
- Password to decrypt document. Default is empty
- silent (bool, optional):
- Suppress all stderr output.
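
As a rough sketch of how the options above fit together, the snippet below collects a few of them into a dictionary and forwards them to tabula-py as keyword arguments. The entry point `tabula.read_pdf`, the helper name `extract_tables`, and the file name `example.pdf` are assumptions that this section does not state, so check them against your installed version. The import sits inside the helper so the options can be inspected without Java or tabula-py present.

```python
# Option values taken from the examples in the list above.
options = {
    "pages": "1-2,3",                      # str, int, or list of int; default is 1
    "area": [269.875, 12.75, 790.5, 561],  # top, left, bottom, right
    "guess": False,                        # region is given explicitly via "area"
    "silent": True,                        # suppress stderr output
}


def extract_tables(pdf_path, **kwargs):
    """Hypothetical helper: forward the options to tabula-py.

    Assumes the library exposes read_pdf(path, **options); this name
    is an assumption and may differ between versions.
    """
    import tabula  # imported lazily; needs Java and pandas at call time

    return tabula.read_pdf(pdf_path, **kwargs)


# Usage (requires Java, tabula-py, and a real PDF file):
# tables = extract_tables("example.pdf", **options)
```

Keeping the options in a plain dictionary makes it easy to reuse the same settings across several files.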