# Camelot: Text-based PDFs with tables

## Installation, part 1

First, we'll need to install Ghostscript and Tkinter, two things Camelot depends on. You can follow the [installation instructions](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#install-deps), but I've also reproduced them below.

### OS X

You can install them on OS X with Homebrew.

```
brew install ghostscript tcl-tk
```

This might take a little while.

### Windows

Download and run the installers for [Ghostscript](https://www.ghostscript.com/download/gsdnld.html) and [ActiveTcl](https://www.activestate.com/products/tcl/downloads/).

## Installation, part 2

Hopefully they installed correctly, and **now you can actually install Camelot.** [Instructions are here](https://camelot-py.readthedocs.io/en/master/user/install.html#install), but I've also reproduced them below.

```
pip install "camelot-py[base]"
```

If Ghostscript didn't install successfully in part 1, you won't be able to read **lattice**-style PDFs.

## Use


The simplest use looks like this:

```python
import camelot

tables = camelot.read_pdf("......")
```

Each table in the result can be turned into a dataframe with `.df`:

```python
tables[0].df
```

**Try to read in `players/players.pdf`.**

## Option: flavor

To get a clean result, you often have to set `flavor=` to one of two options:

* `lattice` (the default) for tables with drawn borders between cells
* `stream` for tables where only empty space separates the cells
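
If you aren't sure which flavor a PDF needs, one quick check is to read it both ways and compare what comes back. The sketch below is only an illustration: `compare_flavors` is a made-up helper name, not part of Camelot, and it assumes both flavors can run on your machine (lattice needs Ghostscript).

```python
def compare_flavors(path, pages="1"):
    """Read the same PDF with both flavors and report each table's shape."""
    # Imported inside the function so the helper can be defined (or pasted
    # into a notebook) even if the Camelot install has not finished yet.
    import camelot

    shapes = {}
    for flavor in ("lattice", "stream"):
        tables = camelot.read_pdf(path, flavor=flavor, pages=pages)
        # (rows, columns) of every table found with this flavor
        shapes[flavor] = [table.df.shape for table in tables]
    return shapes
```

Calling `compare_flavors("players/players.pdf")` returns a dictionary with one list of shapes per flavor. A flavor that finds no tables, or one oddly shaped table, is probably the wrong choice for that PDF.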

## Option: pages

Try `misc-pdfs/2018_625_Ad_Val_Tax_Levy_Report_648625_7.pdf`. Does it pull in all of the tables?

By default Camelot only reads the first page. You can specify which pages to read with `pages=`, passed as a string: for example, `pages="1,2"` or `pages="1-end"`.
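
Since `pages=` wants a string, it can be handy to build that string from ordinary Python values. This is a small sketch, and `pages_arg` is a hypothetical helper of my own, not something Camelot provides.

```python
def pages_arg(pages):
    """Build the comma-separated string that `pages=` expects.

    Each item is either a single page (1) or a (start, end) pair,
    where end may be the string "end".
    """
    parts = []
    for page in pages:
        if isinstance(page, tuple):
            start, end = page
            parts.append(f"{start}-{end}")
        else:
            parts.append(str(page))
    return ",".join(parts)


# Example use (needs Camelot and the PDF on disk):
# tables = camelot.read_pdf(
#     "misc-pdfs/2018_625_Ad_Val_Tax_Levy_Report_648625_7.pdf",
#     pages=pages_arg([(1, "end")]),
# )
```

For a one-off, typing `pages="1-end"` directly is simpler. The helper only earns its keep when the page numbers come from somewhere else in your code.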

## Combining the tables into a dataframe

You can take every dataframe that's pulled out of the PDF and combine them into a single big dataframe.

```python
import pandas as pd

dfs = [table.df for table in tables]
pd.concat(dfs, ignore_index=True)
```
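
Camelot's dataframes usually come back with numbered columns and the header text sitting in the first row, so a plain `pd.concat` can leave a copy of the header in the middle of your data for every table. The sketch below is one way to deal with that. It assumes every table starts with the same header row, which is common for multi-page reports but not guaranteed, and `combine_tables` is a hypothetical name.

```python
import pandas as pd


def combine_tables(dfs):
    """Stack dataframes, using the first row of the first one as the header.

    Assumes every dataframe starts with the same header row.
    """
    header = dfs[0].iloc[0].tolist()
    # Drop the header row from each table before stacking
    bodies = [df.iloc[1:] for df in dfs]
    combined = pd.concat(bodies, ignore_index=True)
    combined.columns = header
    return combined
```

You would call it as `combine_tables([table.df for table in tables])`. Check a few rows of each table first, because if the tables don't all share a header this will silently throw away real data.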

## Fun debugging and investigating tricks

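If a table comes out wrong, `camelot.plot` can draw what Camelot detected on the page. Plotting needs matplotlib installed.

```python
# the text Camelot found on the page
camelot.plot(tables[0], kind='text').show()
# the table it detected
camelot.plot(tables[0], kind='grid').show()
# the table boundaries it detected
camelot.plot(tables[1], kind='contour').show()
```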
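
Each table also carries a `parsing_report` dictionary with an `accuracy` score, which gives a quick way to decide which tables deserve a closer look. A minimal sketch: `low_accuracy_tables` is a hypothetical helper, and the threshold of 90 is an arbitrary choice for illustration.

```python
def low_accuracy_tables(tables, threshold=90):
    """Return (index, report) pairs for tables whose accuracy is below threshold."""
    flagged = []
    for i, table in enumerate(tables):
        report = table.parsing_report  # dict with accuracy, whitespace, order, page
        if report["accuracy"] < threshold:
            flagged.append((i, report))
    return flagged
```

Anything this flags is a good candidate for plotting, or for trying the other flavor.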


## Non-English PDFs

Camelot doesn't care what language the text is in, because it isn't doing OCR: it reads the text that's already embedded in the PDF. Try it on `non-english/museums.pdf`.