A simple library for extracting text from any PDF in Python x AWS.

meads2/textasaurus


Textasaurus

[Figure: cloud architecture]

Getting Started

1. Install Textasaurus

pip install textasaurus

2. Get API Key

Get an API key from the Textasaurus API, then set it as TEXTASAURUS_API_KEY:

TEXTASAURUS_API_KEY=Your_API_KEY
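If you would rather not hard-code the key in your scripts, one option is to read it from the environment. This is a minimal sketch, assuming TEXTASAURUS_API_KEY is set as an environment variable. The helper load_api_key is hypothetical and not part of Textasaurus.

```python
import os


def load_api_key(var_name: str = "TEXTASAURUS_API_KEY") -> str:
    """Return the API key stored in the given environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message rather than passing an empty key along.
        raise RuntimeError(f"{var_name} is not set; get a key from the Textasaurus API first.")
    return key
```

The returned string can then go wherever the Python examples use 'YOUR_API_KEY', for example Textasaurus(load_api_key()).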

3. Run Textasaurus

Run on a single file:

textasaurus your_file.pdf

Run on a directory of PDFs:

textasaurus your_files/
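To drive the command-line tool from a Python script, such as in a data pipeline, one approach is to call it through subprocess. This sketch assumes the textasaurus command is on your PATH and takes a single file or directory argument, exactly as in the commands above. The helper names build_command and run_textasaurus are hypothetical. How the CLI picks up your API key, and what it prints, are not covered here.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_command(target: str | Path) -> list[str]:
    """Build the CLI invocation for one PDF or a directory of PDFs (hypothetical helper)."""
    return ["textasaurus", str(target)]


def run_textasaurus(target: str | Path) -> subprocess.CompletedProcess:
    """Run the textasaurus CLI on target; raise if the CLI is missing or exits non-zero."""
    if shutil.which("textasaurus") is None:
        raise FileNotFoundError("textasaurus CLI not found on PATH; install it with pip first.")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    # capture_output=True and text=True collect stdout/stderr as strings on the result.
    return subprocess.run(build_command(target), check=True, capture_output=True, text=True)
```

For example, run_textasaurus("your_files/") mirrors the directory command above, and the result's stdout attribute holds whatever the CLI printed.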

Import in Python

from textasaurus import Textasaurus

dino = Textasaurus('YOUR_API_KEY')

# Analyze a single PDF
dino.analyze('my_file.pdf')

# Analyze every PDF in a directory
dino.analyze('my_files/')
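What analyze returns is not documented here. If you want one result per file, one option is to loop over the PDFs yourself. This minimal sketch assumes analyze accepts a single PDF path, as in the single-file example. The helper extract_all is hypothetical. It takes the analyze callable as a parameter so it can be exercised without an API key.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def extract_all(folder: str | Path, analyze: Callable[[str], T]) -> Dict[str, T]:
    """Call analyze() on each PDF directly inside folder, keyed by file name.

    Hypothetical helper: non-recursive, and matches .pdf case-insensitively.
    """
    results: Dict[str, T] = {}
    pdfs = sorted(
        p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    # Sorting gives a stable, reproducible processing order.
    for pdf in pdfs:
        results[pdf.name] = analyze(str(pdf))
    return results
```

With dino created as in the example above, extract_all('my_files/', dino.analyze) would call analyze once per PDF. Processing files one at a time also makes it easier to see which file failed if a call raises.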

Use Cases

1. Batch PDF Text Extraction

Extract raw text from your PDFs for data analysis or machine learning model training.

2. Skip the Frustration

Skip the frustration of wrestling with existing Python PDF libraries.
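For the batch extraction use case, extracted text is often stored as JSON Lines, with one record per document, a format many data and training tools can load. This minimal sketch assumes you already hold each PDF's text as a string keyed by file name. Textasaurus's output format is not documented here, so converting it is left to you. The helper write_jsonl is hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Mapping


def write_jsonl(texts: Mapping[str, str], out_path: str | Path) -> int:
    """Write one {"file", "text"} JSON object per line; return the record count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for name, text in texts.items():
            # json.dumps escapes newlines inside text, so each record stays on one line.
            # ensure_ascii=False keeps accented and other non-ASCII text readable.
            f.write(json.dumps({"file": name, "text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count
```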
