Matrix Parser

A reliable, robust, and repeatable document parsing engine for extracting data from PDFs that share a consistent format.

What problem do we solve?

Invoices at supermarkets, bank statements at CAs, and similar documents are usually shared as PDFs or scanned copies. Interpreting such structured data by hand is tedious when there are many documents or many entries. You can extract the text using OCR, but OCR alone does not tell you which row and column each piece of text belongs to. That's where our product comes in: you write a simple script in our SQL-like language, it runs on the OCR results, and it returns a CSV file.
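To make the row-and-column problem concrete, here is a minimal Python sketch of one way OCR words could be grouped into a table and written out as CSV. It is an illustration only, not the engine's implementation and not its SQL-like language: the `Word` type, the `group_rows`, `to_table`, and `to_csv` helpers, the `y_tolerance` value, and the `column_starts` positions are all hypothetical names and assumptions chosen for this example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical OCR token: the text plus the top-left corner of its box.
    text: str
    x: float
    y: float


def group_rows(words, y_tolerance=5.0):
    """Cluster words into rows by comparing y with the first word of the current row."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(word.y - rows[-1][0].y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def to_table(words, column_starts, y_tolerance=5.0):
    """Place each word in the right-most column whose start x is <= the word's x.

    column_starts is assumed to be sorted in ascending order.
    """
    table = []
    for row in group_rows(words, y_tolerance):
        cells = [[] for _ in column_starts]
        for word in sorted(row, key=lambda w: w.x):
            index = max(
                (i for i, start in enumerate(column_starts) if start <= word.x),
                default=0,
            )
            cells[index].append(word.text)
        table.append([" ".join(cell) for cell in cells])
    return table


def to_csv(table):
    """Serialize the table to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample data standing in for OCR output of a two-line receipt.
    sample = [
        Word("Milk", 10, 100),
        Word("2.50", 200, 102),
        Word("Brown", 10, 120),
        Word("Bread", 60, 121),
        Word("1.75", 200, 120),
    ]
    print(to_csv(to_table(sample, column_starts=[0, 150])), end="")
```

The fixed `column_starts` list only works because the documents share a consistent format, which is the same assumption the engine relies on; a script in the SQL-like language would take the place of these hard-coded positions.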

