Upload Documents * Ask Questions * Local & Private
Superhat lets you run a private, secure server that stores your documents and answers questions about them using AI. You can also ask it to generate reports with graphs and charts. CSV files and spreadsheets are stored in a SQL database, so huge tables are not a problem. You can upload thousands of documents and ask questions; Superhat automatically figures out which documents are relevant to each answer. Every answer is backed by references to the documents it retrieved, so you can double-check it (manually or with another AI). Best of all, every software component runs locally: after installation you can cut the server off from the internet and it will keep working.
Everything runs locally on your server
| Service | Notes |
|---|---|
| Web Server | This is your primary gateway/frontend to use the service |
| Postgres | SQL database to store huge CSV/Sheets |
| Embedding Inference Server | Huggingface model to generate embeddings for RAG |
| ReRanker Inference Server | Huggingface model to re-rank documents for mem0 module |
| Weaviate | VectorDB of choice for RAG |
| vLLM Inference Server | Runs an open-source LLM of your choice, e.g. Qwen3/gpt-oss-20b |
| minio | s3 compatible storage engine used to manage uploaded documents |
| VectorDb Server | Superhat server for vectordb operations |
| API server | Exposes upload/user/query operations to the outside world via an API with token authentication |
| Chat Server | Superhat server that handles all chat interaction and document retrieval |
| Ingestion Server | Superhat server responsible for indexing uploaded documents |
| Metadata Server | Superhat server responsible for keeping track of all document locations, life-cycle, sharing, and ownership |
| keycloak | User authentication server |
Prerequisites:
- Docker: docker, docker-compose, docker-registry
One-time setup and initialization
--- do this on your server ---
```shell
# Clone the repo
$ git clone https://github.com/queryhat/super-hat.git
# All subsequent commands are run from the ".../local" directory
# Deploy to the local server using docker compose
$ cd super-hat/deployment/dev/local
$ cp .env.example .env
# At a minimum, you will want to edit the following in .env:
# QHAT_LOCAL_VOLUME_ROOT: where all your data stays persistent, e.g. database, vectordb, etc.
# VLLM_API_KEY, OPENAI_BASE_URL, VLLM_MODEL_ID: control how and where the LLM is accessed
# Now build the docker images and initialize the root directory
$ ./setup_local.sh
# Start the service ... the very first run takes a couple of minutes to pull images/LLM weights from the internet
$ docker compose up -d websvr
# Note down the important port numbers; you will need them to access the service
$ egrep 'QHAT_APISVR_SERVICE_PORT|QHAT_WEBSVR_SERVICE_PORT' .env
# QHAT_APISVR_SERVICE_PORT=8000
# QHAT_WEBSVR_SERVICE_PORT=8021
```
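Once the stack is up, you can sanity-check that the ports you noted are actually accepting connections. Here is a minimal Python sketch using only the standard library; the port numbers assume the default `.env` values above, so adjust them if you changed yours:

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP service accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default ports from .env; edit to match your configuration.
SERVICES = {
    "websvr": 8021,
    "apisvr": 8000,
}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_listening("localhost", port) else "down"
        print(f"{name:8s} {port}  {status}")
```

Run it on the server itself (or through the ssh tunnel described below once it is set up).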
You can use either OpenAI-API-compatible hosted models or open-source models served by the included vLLM. This is controlled through the .env file.
OpenAI:
```shell
VLLM_PORT=443
VLLM_MODEL_ID="gpt-5-mini"
VLLM_API_KEY='sk-proj-...'
OPENAI_BASE_URL="https://api.openai.com/v1"
```
Groq:
```shell
VLLM_PORT=443
VLLM_MODEL_ID="openai/gpt-oss-120b"
VLLM_API_KEY='gsk_...'
OPENAI_BASE_URL="https://api.groq.com/openai/v1"
```
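Whichever provider you pick, the three variables map onto a standard OpenAI-compatible chat-completions request. The sketch below (standard library only; the request shape is the generic OpenAI chat API, not anything Superhat-specific) shows how the base URL, key, and model id fit together:

```python
import json
import os
import urllib.request

# These would normally come from the .env file described above;
# the fallback values are just illustrative defaults.
base_url = os.environ.get("OPENAI_BASE_URL", "https://api.groq.com/openai/v1")
api_key = os.environ.get("VLLM_API_KEY", "gsk_...")
model_id = os.environ.get("VLLM_MODEL_ID", "openai/gpt-oss-120b")

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Summarize my uploaded documents.")
print(req.full_url)
```

If this request succeeds with `urllib.request.urlopen(req)` from the server, the same settings should work for Superhat.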
vLLM:
Make sure you have a GPU with enough memory. Limited testing has been done on Qwen3 and gpt-oss-*; the gpt-oss models appear to work better.
```shell
VLLM_MODEL_ID="openai/gpt-oss-20b"
VLLM_API_KEY='cSt6YXROaHNET0EwaGV5cQa6'  # <=== generate/use any random key
# To use vLLM, two further steps are needed:
# 1. Pull the model weights from Huggingface
$ ./vllm-openai/init-gpt-oss.sh
# 2. Recreate the vLLM container
$ docker compose up -d --force-recreate vllm-openai
```
Create an ssh tunnel from your desktop/laptop to the superhat server:
```shell
$ ssh -N -L 8021:localhost:8021 -L 8000:localhost:8000 superhat-server
```
Now you can access and use superhat from your browser: "http://localhost:8021/login" (you can skip the /login on future visits).
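If you connect often, the same tunnel can be declared once in `~/.ssh/config` instead of retyping the flags. A sketch (the `HostName` and `User` values are placeholders for your own server; the ports match the defaults above):

```
Host superhat-server
    HostName your.server.address
    User your-username
    LocalForward 8021 localhost:8021
    LocalForward 8000 localhost:8000
```

With this in place, `ssh -N superhat-server` brings up both forwards.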
Each registered user can upload their own files and ask questions about them.
Areas to improve in the future
- Supported file types: only pdf/docx/csv/Google Sheets are supported right now.
- Single-turn chat: every question you ask is standalone; no chat history is used or sent.
- Inconsistent document references: the response sometimes includes references to the retrieved documents and other times does not, i.e. the behaviour is inconsistent. It does, however, answer strictly from the added documents.