Open source, Django-based document manager with custom metadata indexing, file serving integration, and OCR capabilities.
- User-defined metadata fields
- Dynamic default values for metadata
- Lookup support for metadata
- Filesystem integration by means of metadata indexing directories
- User-defined document UUID generation
- Local file or server-side staging file uploads
- Batch upload many documents with the same metadata
- User-defined document checksum algorithm
- Previews for many image formats, as well as PDF
- Search documents by any field value
- Group documents by metadata automatically
- Permissions and roles support
- Multi-page document support
- Page transformations
- Distributed OCR processing
- Multilingual (English, Spanish)
- Duplicate document search
- Upload multiple documents inside a ZIP file
- Pluggable storage backends (file-based and GridFS included)
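
As an illustration of the user-defined checksum and UUID features listed above, here is a minimal sketch of what such callables might look like. The function names, the choice of SHA-256 and `uuid4`, and the `find_duplicates` helper are assumptions made for this example; they are not taken from the project's settings or code.

```python
import hashlib
import uuid


def document_checksum(data: bytes) -> str:
    """Return a hex digest of the document contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def document_uuid() -> str:
    """Return a random identifier for a new document (uuid4 is an assumption)."""
    return str(uuid.uuid4())


def find_duplicates(documents: dict) -> dict:
    """Group document names by checksum and keep only groups with several names."""
    groups = {}
    for name, data in documents.items():
        groups.setdefault(document_checksum(data), []).append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}
```

Documents with identical contents produce the same checksum, which is one plausible basis for a duplicate document search.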
Python:
- Django - A high-level Python Web framework that encourages rapid development and clean, pragmatic design.
- django-pagination
- django-filetransfers - File upload/download abstraction
- celery - asynchronous task queue/job queue based on distributed message passing
- django-celery - celery Django integration
For the GridFS storage backend:
- PyMongo - the recommended way to work with MongoDB from Python
- GridFS - a storage specification for large objects in MongoDB
Alternatively, run `pip install -r requirements/production.txt` to install the Python dependencies automatically.
Executables:
- ImageMagick - Convert, Edit, Or Compose Bitmap Images
- libmagic - MIME detection library
- tesseract-ocr - An OCR Engine that was developed at HP Labs between 1985 and 1995... and now at Google.
- unpaper - post-processing scanned and photocopied book pages
- MongoDB - a scalable, open source, document-oriented database
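
A quick way to confirm that the executables above are installed is to look for them on the `PATH`. The sketch below assumes the usual command names (`convert` for ImageMagick, `tesseract`, `unpaper`, `mongod`); these may differ on your system. libmagic is a library rather than a command, so it is not checked here, and MongoDB matters only if the GridFS backend is used.

```python
import shutil

# Command names are assumptions about how each tool is typically installed.
REQUIRED_EXECUTABLES = {
    "ImageMagick": "convert",
    "tesseract-ocr": "tesseract",
    "unpaper": "unpaper",
    "MongoDB": "mongod",
}


def missing_executables(required=REQUIRED_EXECUTABLES, which=shutil.which):
    """Return the sorted names of tools whose command is not found on the PATH."""
    return sorted(name for name, command in required.items() if which(command) is None)


if __name__ == "__main__":
    for name in missing_executables():
        print("Missing:", name)
```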
See docs/LICENSE file.
Roberto Rosario - Twitter, E-mail (roberto.rosario.gonzalez at gmail)
See docs/CREDITS file.
See docs/FAQ file for common questions and issues.