schwamster/docStack

docStack

docStack is a document management system with an extensible set of post-processing services that make it easier to find, analyze, and export your documents.

What is docStack?

docStack makes document management easy. Upload documents in a variety of formats; docStack stores them and gives you the tools to find them again later. docStack also runs a number of post-processing steps on your documents to make them easier to find again.

  • text extraction: docStack extracts the text from your document, whether it is a scan, a photo, or a Word document
  • tagging: docStack extracts tags from your document
  • language detection: docStack detects the language of your document
  • content analysis: docStack tries to determine what kind of document it is and provides the key entities for that kind of document

docStack is still very much under development. It mainly serves as a way to figure out how to do a couple of things: microservices, Docker, cognitive services...

Continuous integration / delivery

service | git | build | docker
doc-stack-app | https://github.com/schwamster/doc-stack-app | CircleCI | Docker automated build
doc-stack-app-api | https://github.com/schwamster/doc-stack-app-api | CircleCI | Docker automated build
text_worker | https://github.com/schwamster/text_worker | CircleCI | Docker automated build
ocr_service | https://github.com/schwamster/ocr_service | CircleCI | Docker automated build
pdf_to_text | https://github.com/schwamster/pdf_to_text | CircleCI | Docker automated build
doc-store | https://github.com/schwamster/doc-store | CircleCI | Docker automated build
luis_adapter_service | https://github.com/schwamster/luis_adapter_service | CircleCI | Docker automated build
doc-identity | https://github.com/schwamster/doc-identity | CircleCI | Docker automated build
doc-notifications | https://github.com/schwamster/doc-notifications | - | -

Documentation

Overview

Architecture diagram

doc-stack-app

This is the frontend that users interact with. It is an Angular app. The docker-compose file hosts the application on port 4200 (http://localhost:4200). For more information about the frontend application, please check out the project itself: doc-stack-app

doc-stack-app-api

This service is the backend API of doc-stack-app. It follows the "Backend for Frontend" pattern (see BFF). For more information about this service, please check out the project itself: doc-stack-app-api

doc-identity

This service is an OpenID Connect server that authenticates requests from the UI to the BFF as well as all service-to-service communication. We use IdentityServer4 to provide that functionality. For more information about this service, please check out the project itself: doc-identity

text_worker

Extracts the text from a document. Depending on the source document, it uses different sub-services (ocr_service, pdf_to_text) to do that. For more information about this service, please check out the project itself: text_worker
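
The dispatch described above might look roughly like the following Python sketch. Only the two sub-service names come from this README. The function name, the extension table, and the idea of deciding by file extension are assumptions for illustration; the real text_worker may decide differently, for example by sending scanned PDFs to ocr_service.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to sub-service name.
# The service names are from this README; the extension list is assumed.
SUB_SERVICE_BY_EXTENSION = {
    ".pdf": "pdf_to_text",
    ".png": "ocr_service",
    ".jpg": "ocr_service",
    ".jpeg": "ocr_service",
    ".tif": "ocr_service",
    ".tiff": "ocr_service",
}


def choose_sub_service(filename: str) -> str:
    """Return the name of the sub-service assumed to handle this file."""
    extension = PurePath(filename).suffix.lower()
    try:
        return SUB_SERVICE_BY_EXTENSION[extension]
    except KeyError:
        # Unknown formats are rejected instead of silently guessing a service.
        raise ValueError(
            f"no text extraction sub-service for {extension!r}"
        ) from None
```

Keeping the routing in a plain lookup table means that supporting a new format is a one-line change, and unsupported formats fail loudly.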

ocr_service

Extracts text from a document with the help of OCR. For this we currently use Computer Vision. For more information about this service, please check out the project itself: ocr_service

pdf_to_text

Extracts the text from PDFs with the help of the Node.js package pdf-to-text (https://github.com/zetahernandez/pdf-to-text). For more information about this service, please check out the project itself: pdf_to_text

analyze_worker

This is only a conceptual service at the moment. It combines all services that post-process a document whose text has already been extracted. So far, one idea is to run the extracted text through LUIS with a configured model to extract entities from the text. Other ideas: tagging, sorting, finding related documents...

luis_adapter_service

Analyzes a document with the help of LUIS. The result is an intent and the intent's entities. The goal is to let the user choose between multiple LUIS models and even create custom models. For more information about this service, please check out the project itself: luis_adapter_service
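
To make "an intent and the intent's entities" concrete, here is a minimal Python sketch of how such a result could be modeled. The payload shape (a "topScoringIntent" object plus an "entities" list) is an assumption modeled loosely on LUIS prediction responses. The adapter's actual output format is not documented in this README, and all class names, function names, and sample values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Entity:
    text: str  # the matched text in the document
    kind: str  # the entity type assigned by the model


@dataclass(frozen=True)
class AnalysisResult:
    intent: str
    score: float
    entities: tuple[Entity, ...]


# Hypothetical sample payload; field names and values are assumptions.
SAMPLE_PAYLOAD = {
    "query": "invoice from acme",
    "topScoringIntent": {"intent": "Invoice", "score": 0.93},
    "entities": [{"entity": "acme", "type": "Company"}],
}


def parse_analysis(payload: dict[str, Any]) -> AnalysisResult:
    """Convert an assumed LUIS-like payload into a typed result."""
    top = payload["topScoringIntent"]
    entities = tuple(
        Entity(text=item["entity"], kind=item["type"])
        for item in payload.get("entities", [])
    )
    return AnalysisResult(
        intent=top["intent"], score=float(top["score"]), entities=entities
    )
```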

doc-store

This service is responsible for the CRUD operations on documents: creating new documents and enriching existing documents with more information. It also provides the APIs necessary to search and retrieve documents. For more information about this service, please check out the project itself: doc-store

doc-notifications

Since docStack is based on a multi-step, event-driven architecture, doc-notifications (written in Node.js) serves as the "mediator" that orchestrates the asynchronous steps of each user-triggered action. If you think of each docStack feature as a workflow whose steps must be executed by various services, doc-notifications is the service that holds that workflow information and the steps required to complete a given request.

The workflows that the app executes, and that the doc-notifications mediator service fulfills, are described below in the section "How does the application work".

For more information about the notifications service please check out the project itself under: doc-notifications
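
The mediator idea can be illustrated with a small Python sketch. This is not the doc-notifications implementation (which is written in Node.js). The workflow name, the step names, and the class below are hypothetical and only show how a mediator can hold the workflow definition and hand out the next step as each one completes.

```python
from dataclasses import dataclass, field

# Hypothetical workflow definition: the name and step order are assumed.
WORKFLOWS = {
    "document_uploaded": ["store_document", "extract_text", "analyze_text"],
}


@dataclass
class Mediator:
    workflows: dict
    # Maps a document id to the steps that still have to run for it.
    remaining: dict = field(default_factory=dict)

    def start(self, document_id, workflow_name):
        """Begin a workflow and return its first step."""
        self.remaining[document_id] = list(self.workflows[workflow_name])
        return self.next_step(document_id)

    def next_step(self, document_id):
        """Return the next pending step, or None when the workflow is done."""
        steps = self.remaining[document_id]
        return steps[0] if steps else None

    def step_completed(self, document_id, step):
        """Record a finished step and return the one that follows it."""
        steps = self.remaining[document_id]
        if not steps or steps[0] != step:
            raise ValueError(f"unexpected step {step!r} for {document_id!r}")
        steps.pop(0)
        return self.next_step(document_id)
```

In this sketch the individual services never need to know the overall order of steps. They report completion, and the mediator decides what happens next.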

Getting started

The easiest way to get up and running is with Docker. All services described in the Overview are available as Docker images (see Continuous integration / delivery). For more information on getting started with Docker, see https://docs.docker.com/engine/getstarted/

After cloning this project, you will have to set a number of environment variables on your computer (see docker-compose.yml to find out where they are used). Alternatively, you can set them directly in the docker-compose.yml file. Their values are omitted here because they are sensitive and should not be version controlled.

- ComputerVisionKey
- LuisAppId
- LuisSubscriptionKey
- DocStackAdminPassword

More info on how to set environment variables:

Windows | Unix/Linux

ComputerVisionKey

We use Microsoft's Cognitive Services in the ocr_service to turn images and PDFs into text. You will have to get your own API key (free up to a reasonable limit for now) from here. Set the environment variable "ComputerVisionKey" to that value.

LuisAppId & LuisSubscriptionKey

We use Language Understanding Intelligent Service (LUIS) to analyze the text of the documents. You will have to get your own API key (free up to a reasonable limit for now) from here. Set the following environment variables:

- LuisAppId => which LUIS app (model) to use; the default is the Microsoft example app with the Cortana model
- LuisSubscriptionKey => your own subscription key from LUIS

DocStackAdminPassword

Right now there is only one user you can log on with:

- username: admin
- password: whatever you set in the "DocStackAdminPassword" environment variable

doc-identity uses this password as the admin password. User management and related features are still under construction.
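
Before starting the stack, it can help to confirm that all four variables are set. The following Python sketch does that. The variable names are the ones listed in this README; the helper itself and its name are hypothetical and not part of docStack.

```python
import os

# Variable names as listed in this README.
REQUIRED_VARIABLES = [
    "ComputerVisionKey",
    "LuisAppId",
    "LuisSubscriptionKey",
    "DocStackAdminPassword",
]


def missing_variables(environ=None):
    """Return the required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

Calling `missing_variables()` with no arguments checks the current process environment. An empty list means every variable has a value.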

Running for the first time

After you have set the environment variables and Docker is up and running, navigate to the root of this project and run:

    docker-compose up

Docker will now pull all images and then run each service. You can add the "-d" flag to start the services in detached mode.

Open a browser and navigate to http://localhost:4200 to access the application.

How does the application work

Basic Flows:

  • User logs on. User uploads a document. docStack processes the document. User can see processing results | User can read/view the document.
  • User logs on. User searches for a document. User finds document. User can see the processing results | User can read/view the document

Contribution

Contributors are very welcome. If you are interested in finding out how a microservice infrastructure might work (still trying to figure that out myself...), if you think document management is interesting, or if you see where we are doing things wrong, don't hesitate to help out.

Please follow these guidelines if you want to contribute:

  • Fork the repository.
  • Create a branch to work in.
  • Make your feature addition or bug fix.
  • Don't forget the unit tests.
  • Send a pull request.

Please join the conversation here:

Join the chat at https://gitter.im/docStack-im/Lobby

License

Copyright © Bastian Töpfer and contributors. docStack is provided as-is under the MIT license. For more information see LICENSE.

Acknowledgement

Some services use external services:

Powered by Microsoft’s Cognitive Services: https://www.microsoft.com/cognitive-services

The OpenID Connect solution is provided by this awesome project: IdentityServer4

The logo was made with Logo Maker

Stack graphic by Freepik from Flaticon under Creative Commons BY 3.0

The architecture diagram was made with draw.io
