A simple library for extracting text from any PDF, using Python and AWS.
pip install textasaurus
Get an API key from the Textasaurus API and store it as TEXTASAURUS_API_KEY:
TEXTASAURUS_API_KEY=Your_API_KEY
Run on a single file
textasaurus your_file.pdf
Run on a directory of files
textasaurus your_files/
Import in Python
# Analyze a single PDF
from textasaurus import Textasaurus
dino = Textasaurus('YOUR_API_KEY')
dino.analyze('my_file.pdf')
# Analyze every PDF in a directory
from textasaurus import Textasaurus
dino = Textasaurus('YOUR_API_KEY')
dino.analyze('my_files/')
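If you would rather not hard-code the key in your script, here is a minimal sketch that reads it from the TEXTASAURUS_API_KEY variable shown above. The helper names load_api_key and analyze_path are hypothetical and not part of textasaurus. The sketch only relies on the Textasaurus(key) constructor and the analyze(path) call from the examples above, and it assumes nothing about what analyze returns.

```python
import os
from pathlib import Path


def load_api_key(env_var="TEXTASAURUS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of textasaurus.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running textasaurus.")
    return key


def analyze_path(path):
    """Run Textasaurus on a PDF file or a directory of PDFs.

    Hypothetical helper for illustration; not part of textasaurus.
    """
    # Fail early on a bad path, before any API call is attempted.
    if not Path(path).exists():
        raise FileNotFoundError(f"No such file or directory: {path}")

    # Imported here so load_api_key() can be used without the package installed.
    from textasaurus import Textasaurus

    dino = Textasaurus(load_api_key())
    # The return value of analyze() is not described above,
    # so it is passed through unchanged.
    return dino.analyze(path)
```

With the key exported in your shell, analyze_path('my_file.pdf') or analyze_path('my_files/') behaves like the examples above, minus the hard-coded key.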
Extract raw text from your PDFs for data analysis or machine learning model training.
Skip the frustration of dealing with the existing Python libraries for working with PDFs.