fabiobarkoski/piah

piah

Piah automatically parses data from PDFs or plain text based only on the dataclass you provide, and returns an instance of that dataclass filled with the extracted values. Piah is based on OxyParser.

Installation

pip install piah

Usage

First, set your API key in an environment variable:

import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

Alternatively, set it in a .env file. Then use piah, e.g.:

from piah import Piah
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

parser = Piah("gpt-3.5-turbo")
result = parser.parse("Hello Iam python and I have 33 years old", Person)
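
To illustrate what "based only on the dataclass" can mean in practice, here is a minimal sketch that uses only the standard library. It is not piah's actual implementation: `describe_fields` and `fill_dataclass` are hypothetical helpers showing how field names and types could be read from a dataclass and how extracted values could be put back into it.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int


def describe_fields(cls) -> dict[str, str]:
    """Map each dataclass field name to the name of its annotated type."""
    return {f.name: getattr(f.type, "__name__", str(f.type)) for f in fields(cls)}


def fill_dataclass(cls, values: dict):
    """Build an instance of cls, coercing simple values to the annotated type."""
    kwargs = {}
    for f in fields(cls):
        raw = values[f.name]
        # Only plain str/int/float annotations are coerced in this sketch.
        kwargs[f.name] = f.type(raw) if f.type in (str, int, float) else raw
    return cls(**kwargs)


schema = describe_fields(Person)
person = fill_dataclass(Person, {"name": "python", "age": "33"})
print(schema)   # {'name': 'str', 'age': 'int'}
print(person)   # Person(name='python', age=33)
```

A description like `schema` is the kind of information that could be given to an LLM as the target structure, and a dictionary of extracted values could then be turned back into the dataclass as shown.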

To parse PDFs:

from pathlib import Path

result = parser.parse("example.pdf", Person)
# or
result = parser.parse(Path("example.pdf"), Person)

Supported Models and Providers

piah uses LiteLLM, so consult the LiteLLM docs to check whether the desired model is supported.

TODO

  • Write docstrings
  • Improve allowed types
  • Improve system prompt

Known Issues

piah does not pass its tests on every run, apparently because the LLM does not always parse large PDFs correctly.
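
Until this is resolved, one possible workaround is to validate the result and retry. The sketch below is an assumption, not part of piah: `is_well_typed` and `parse_with_retry` are hypothetical helpers, and a stand-in callable simulates a flaky LLM answer so the example runs without an API key.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


def is_well_typed(obj) -> bool:
    """Return True if obj is a dataclass instance whose simple fields match their types."""
    if not is_dataclass(obj) or isinstance(obj, type):
        return False
    return all(
        isinstance(getattr(obj, f.name), f.type)
        for f in fields(obj)
        # Only plain str/int/float/bool annotations are checked in this sketch.
        if f.type in (str, int, float, bool)
    )


def parse_with_retry(parse: Callable[[], T], attempts: int = 3) -> T:
    """Call parse() until it returns a well-typed dataclass, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            result = parse()
        except Exception as exc:  # a failed call counts as a failed attempt
            last_error = exc
            continue
        if is_well_typed(result):
            return result
    raise ValueError(f"no valid result after {attempts} attempts") from last_error


@dataclass
class Person:
    name: str
    age: int


# Stand-in for a flaky LLM call: the first answer has a wrongly typed age.
answers = iter([Person("python", "33"), Person("python", 33)])
person = parse_with_retry(lambda: next(answers))
print(person)  # Person(name='python', age=33)
```

With piah, the callable passed in would wrap the real call, e.g. `lambda: parser.parse("example.pdf", Person)`.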

License

piah is distributed under the terms of the MIT license.
