This repository has been archived by the owner on Mar 26, 2024. It is now read-only.

krishnasism/heissdocs


Note: This project is not maintained anymore

heißdocs - A Document Query Application 🔍📄

Official Documentation

# Under Active Development #

Add a searchable layer on top of your PDFs!

Fully open-source and ready to be deployed. You store, own, and control the data.


Demo: Recording-2023-08-26-211554

Note:

This project is a work in progress, so please expect things to break as it moves forward. The vision is to keep the user from being locked into an ecosystem: your data is stored and governed by you, so even if the app breaks, your data remains accessible with tools already at your disposal.

Usage

What is the purpose of this project?

Its purpose is to let a user or an organization keep track of their PDF files. PDFs, especially scanned ones, are hard to search by content. Simply upload a scanned or regular PDF and start searching its content with Elasticsearch (or a NoSQL database)!

heißdocs creates a search layer for your PDFs, down to the exact page (pointing to the exact word is in progress). To get started:

  1. Set up according to the instructions under Setup
  2. Upload a file on the Dashboard
  3. Start searching!

Features

  • ☁️ Multi-cloud support (AWS, GCP, Azure)
  • 💬 Semantic search (Langchain + OpenAI)
  • 💿 Multiple Storage Options
  • 🔍 Powerful Search + Versatile Storage
  • 📄 View source documents
  • 🔒 Full ownership of data
  • 🆓 Completely open-source
  • 💻 Self-hosted
  • ... more things to come + feel free to add in requests!

Setup

Pre-requisites

Please set up the required services before starting the application. You can follow the documentation to configure all services.

  1. Auth0 - required even before startup:
    1. For Auth0, get the required values from the Auth0 portal and paste them into the .env files in the frontend and app folders. This must be configured before the application is built.

Setting up

Start by creating a .env file in the root directory and filling in the values according to the .env.example file.

Before startup, only the Auth0 values need to be set up. Please follow the documentation for the full guide.

cp .env.example .env

The values in the root .env file can remain unchanged unless you are planning on hosting each of the services individually.

Similarly, create a .env file inside the app, frontend, and engine folders and fill them in following the instructions in the respective .env.example files.

cp frontend/.env.example frontend/.env
cp app/.env.example app/.env
cp engine/.env.example engine/.env

All keys except the Auth0 keys can be left untouched. Everything else can be configured later on the Settings page.
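The copy steps above can be wrapped in a small helper that never overwrites an existing .env file. This is a minimal sketch: the function name `copy_env_examples` is hypothetical, and the folder names are taken from the commands above.

```shell
# Copy each .env.example to .env without overwriting an existing .env.
# copy_env_examples is an illustrative helper, not part of heißdocs.
copy_env_examples() {
  root="${1:-.}"
  for dir in "$root" "$root/frontend" "$root/app" "$root/engine"; do
    example="$dir/.env.example"
    target="$dir/.env"
    if [ ! -f "$example" ]; then
      echo "skip: $example not found"
    elif [ -f "$target" ]; then
      echo "keep: $target already exists"
    else
      cp "$example" "$target"
      echo "created: $target"
    fi
  done
}

# Usage from the repository root:
# copy_env_examples .
```

Skipping existing files means the helper can be re-run safely after the Auth0 values have been filled in.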


Running

Ensure that the credentials you pasted into the .env files are authorized for operations such as GET, PUT, and LIST.

Once your .env files are ready, navigate to the root directory and run:

docker compose up --build

Then go to localhost:8080 and log in.
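The containers can take a while to come up after the build. One way to know when to open the browser is to poll the address until it answers. The sketch below assumes `curl` is installed; `wait_for_app` is a hypothetical helper, and the default of 30 attempts is an arbitrary choice.

```shell
# Poll a URL until it responds or the attempt budget runs out.
# wait_for_app is an illustrative helper, not part of heißdocs.
wait_for_app() {
  url="${1:-http://localhost:8080}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (>= 400) as failure; -s: quiet; --max-time: cap each try
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}

# Usage:
# wait_for_app http://localhost:8080 60
```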


[Optional] If you want hot-reload on your frontend, you can run the services separately.

Run the backend services:

docker compose -f docker-compose.yaml up --build

If you also want Elasticsearch running locally, include the docker-compose.elasticsearch.override.yaml file in the docker compose command.

docker compose -f docker-compose.yaml -f docker-compose.elasticsearch.override.yaml up --build

Run the frontend:

cd frontend
npm install
npm run dev -- --port 8080

Run database migrations

cd app
alembic upgrade head

[Optional] If you host your own PostgreSQL database, make sure to update sqlalchemy.url in the alembic.ini file.
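A minimal sketch of updating that setting from the command line follows. The helper name `set_alembic_url` is hypothetical, the path `app/alembic.ini` is an assumption based on the migration commands above, and the connection URL is a placeholder.

```shell
# Rewrite the sqlalchemy.url line of an alembic.ini file in place.
# set_alembic_url is an illustrative helper, not part of heißdocs.
set_alembic_url() {
  ini="$1"
  url="$2"
  tmp="$(mktemp)" || return 1
  # Pass the URL through the environment so awk does not
  # interpret backslashes or ampersands in it.
  DB_URL="$url" awk '
    /^sqlalchemy\.url[ \t]*=/ { print "sqlalchemy.url = " ENVIRON["DB_URL"]; next }
    { print }
  ' "$ini" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original file to keep its permissions.
  cat "$tmp" > "$ini"
  rm -f "$tmp"
}

# Usage (placeholder credentials and host):
# set_alembic_url app/alembic.ini "postgresql://user:password@db.example.com:5432/heissdocs"
```

Alembic reads alembic.ini with Python's configparser, so a literal % in the password generally has to be written as %%.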

Settings

Before using the application, navigate to the Settings page by clicking on the left-side dashboard button, and configure the settings.

Ready!

You are all set!

Overview

Here's a quick overview of the project

Ingestion Flow (diagram: Technical Diagrams - Frame 1)

Query Flow (diagram: Technical Diagrams - Frame 2)


In progress for the community - by Krishnasis 👨🏽‍💻

Powered by FastAPI 💗