
stautonico/PyDocParser


Unofficial Python client for the Docparser API


Installation

Installation for general usage:

Note: pydocparser has only been tested on Python 3 and is not guaranteed to work on Python 2.

Run pip install pydocparser, or pip3 install pydocparser if your system's pip belongs to Python 2.

OR

Download the release of your choice from here

Unzip the file

Change directory to the unzipped folder

Run python setup.py install or python3 setup.py install


Installation for development:

git clone https://github.com/tman540/pydocparser

pip install -r requirements.txt

Usage

To use pydocparser, you must create an instance of the Parser class from the pydocparser module:

import pydocparser

parser = pydocparser.Parser()

Next, you must obtain your secret API key (which you can get from here)

pydocparser needs this key to access your account. Pass it to login as a string:

parser.login("YOUR_API_KEY_HERE")
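Because the API key is a secret, you may prefer not to hard-code it in your script. The following is a minimal sketch that reads the key from an environment variable instead. The helper name load_api_key and the variable name DOCPARSER_API_KEY are assumptions made for this example and are not part of pydocparser.

```python
import os


def load_api_key(env_var: str = "DOCPARSER_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are hypothetical;
    pydocparser itself only needs the key passed to Parser.login().
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Docparser API key"
        )
    return key


# Usage (assumes pydocparser is installed and the variable is set):
# import pydocparser
# parser = pydocparser.Parser()
# parser.login(load_api_key())
```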

The Docparser API provides a way to test the connection:

result = parser.ping()
print(result)
# pong

If parser.ping() returns 'pong', you have a working connection to the Docparser API. If you instead get output like "Invalid API key. Use Parser.login(api_key)" even though you have already called login, check that your API key is correct.
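In a script, it can be convenient to fail early when the ping check does not succeed. Here is a small sketch of that idea. The helper ensure_connected is hypothetical (it is not part of pydocparser), and it relies only on the behaviour described above: ping() returns 'pong' on success and some other message otherwise.

```python
def ensure_connected(parser) -> None:
    """Raise ConnectionError unless parser.ping() returns 'pong'.

    `parser` is any object with a ping() method, for example a
    logged-in pydocparser.Parser instance.
    """
    result = parser.ping()
    if result != "pong":
        # Include the returned message so an invalid key is easy to spot.
        raise ConnectionError(f"Docparser ping failed: {result!r}")


# Usage (assumes `parser` was created and logged in as shown earlier):
# ensure_connected(parser)
```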

You can get a list of current parsers like this:

parsers = parser.list_parsers()

This will return a list of the names of all available parsers.
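Since uploads refer to a parser by name, it may help to confirm that the name exists before uploading. The sketch below assumes, as stated above, that list_parsers() returns a list of parser names. The helper require_parser is hypothetical and not part of pydocparser.

```python
def require_parser(parser, name: str) -> str:
    """Return `name` if it is among the available parsers, else raise ValueError.

    Assumes parser.list_parsers() returns a list of parser names.
    """
    available = parser.list_parsers()
    if name not in available:
        raise ValueError(
            f"No parser named {name!r}; available parsers: {available}"
        )
    return name


# Usage (assumes `parser` is a logged-in pydocparser.Parser):
# parser_name = require_parser(parser, "PDF Parser")
```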

To upload a file to Docparser, use upload_file_by_path:

doc_id = parser.upload_file_by_path("fileone.pdf", "PDF Parser")  # args: path of the file to upload, name of the parser
# Note that "fileone.pdf" is in the current working directory

The function returns the document ID of the file that was just uploaded. To retrieve the parsed data, call get_one_result:

data = parser.get_one_result("PDF Parser", doc_id)  # doc_id is the document ID returned by parser.upload_file_by_path()

get_one_result returns all the parsed data from the file you selected.
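The upload and retrieval steps above can be combined into one helper. This is a minimal sketch: upload_and_fetch is a hypothetical name, and it assumes the parsed result is already available when get_one_result is called, which may not hold immediately after an upload.

```python
import os


def upload_and_fetch(parser, path: str, parser_name: str):
    """Upload the file at `path` and return (document ID, parsed data).

    `parser` is any object exposing upload_file_by_path() and
    get_one_result(), such as a logged-in pydocparser.Parser.
    """
    # Check locally first so a mistyped path fails before any API call.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    doc_id = parser.upload_file_by_path(path, parser_name)
    # Assumption: the result is ready; a real script may need to wait and retry.
    data = parser.get_one_result(parser_name, doc_id)
    return doc_id, data


# Usage (assumes `parser` is a logged-in pydocparser.Parser):
# doc_id, data = upload_and_fetch(parser, "fileone.pdf", "PDF Parser")
```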


Contributing

This project started from the need to use Docparser through Python at work. I noticed that there was no API library for Python, so I decided to write one myself. I am a one-man operation, so I am glad to accept any help I can get. You can contribute by making your changes and submitting a pull request with a detailed description of what you added. I will review your changes, and if I decide to include them in the next release, I will credit you accordingly. You can also contribute by submitting bug reports or feature requests through GitHub issues.


License

This library is available as open source under the MIT License.


Changelog

V1.0 (7/11/19) Initial release

V1.1 (7/12/19) Bug Fixes + New Functions

V2.0 (6/3/21) Full Re-write

To-Do

  • Change function names to more closely resemble those in the PHP/Node/AJAX clients
  • Update setup.py to include install requirements
  • Fix README.md to work better on PyPI