Super-Hat 🎩 - Local AI document server

Upload documents · Ask questions · Local & private

Introduction

Superhat lets you run a private, secure server that stores documents and answers questions about them using AI. It can also generate reports with graphs and charts. CSV files and spreadsheets are loaded into a SQL database, so huge tables are not a problem. You can upload thousands of documents and ask questions; Superhat automatically figures out which documents are relevant to each answer. Every answer is backed by references to the retrieved documents, so you can double-check it (manually or with another AI). Best of all, every software component runs locally: after installation you can cut the server off from the internet and it will keep working.

Technical stack

Everything runs locally on your server

| Service | Notes |
| --- | --- |
| Web Server | Your primary gateway/frontend for the service |
| Postgres | SQL database that stores huge CSV files/spreadsheets |
| Embedding Inference Server | Hugging Face model that generates embeddings for RAG |
| ReRanker Inference Server | Hugging Face model that re-ranks documents for the mem0 module |
| Weaviate | Vector database of choice for RAG |
| vLLM Inference Server | Runs an open-source LLM of your choice, e.g. Qwen3 or gpt-oss-20b |
| minio | S3-compatible storage engine used to manage uploaded documents |
| VectorDb Server | Superhat server for vector-database operations |
| API server | Exposes upload/users/query to the outside world through an API and tokens |
| Chat Server | Superhat server that handles all chat interaction and document retrieval |
| Ingestion Server | Superhat server responsible for indexing uploaded documents |
| Metadata Server | Superhat server that keeps track of all document locations, life-cycle, sharing, and ownership |
| keycloak | User authentication server |

Getting started

Prerequisites:

  • docker: docker, docker-compose, docker-registry
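Before running the setup, it can help to confirm the tooling is actually on your PATH. A minimal sketch (the tool list mirrors the prerequisites above; `require` is a hypothetical helper, not part of Super-Hat):

```shell
#!/bin/sh
# Hypothetical helper: fail early if a required command is missing.
require() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Tools named in the prerequisites above
for tool in docker docker-compose; do
    require "$tool" || echo "install $tool before continuing" >&2
done
```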

Configure

One-time setup and initialization

```shell
# --- do this on your server ---
# Clone the repo
$ git clone https://github.com/queryhat/super-hat.git

# All subsequent commands are run from the ".../local" directory
# Deploy to the local server using docker compose
$ cd super-hat/deployment/dev/local
$ cp .env.example .env
# At a minimum, you will want to edit the following in .env:
#   QHAT_LOCAL_VOLUME_ROOT: where all your data is persisted, e.g. database, vectordb, etc.
#   VLLM_API_KEY, OPENAI_BASE_URL, VLLM_MODEL_ID: control how and where the LLM is accessed

# Build the docker images and initialize the root directory
$ ./setup_local.sh

# Start the service ... the very first run may take a couple of minutes pulling images/LLM weights from the internet
$ docker compose up -d websvr

# Note down the important port numbers; you will need them to access the service
$ egrep 'QHAT_APISVR_SERVICE_PORT|QHAT_WEBSVR_SERVICE_PORT' .env
# QHAT_APISVR_SERVICE_PORT=8000
# QHAT_WEBSVR_SERVICE_PORT=8021
```
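Since the port numbers live in .env, a small helper makes them available to scripts without retyping the egrep. A sketch assuming the `KEY=value` layout shown above (`get_port` is not part of Super-Hat):

```shell
#!/bin/sh
# Hypothetical helper: read a KEY=value entry from .env, stripping any quotes.
get_port() {
    grep "^$1=" .env | head -n1 | cut -d= -f2- | tr -d '"'
}

# Example use (after the setup above has written .env):
#   WEB_PORT=$(get_port QHAT_WEBSVR_SERVICE_PORT)
```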

LLM Choice

You can use either an OpenAI-API-compatible model or one supported by the included vLLM. This is controlled through the .env file.

OpenAI:

```
VLLM_PORT=443
VLLM_MODEL_ID="gpt-5-mini"
VLLM_API_KEY='sk-proj-...'
OPENAI_BASE_URL="https://api.openai.com/v1"
```

Groq:

```
VLLM_PORT=443
VLLM_MODEL_ID="openai/gpt-oss-120b"
VLLM_API_KEY='gsk_...'
OPENAI_BASE_URL="https://api.groq.com/openai/v1"
```

vLLM:
Make sure you have a GPU with enough memory. Limited testing has been done on Qwen3/gpt-oss-*; the gpt-oss models appear to work better.

```
VLLM_MODEL_ID="openai/gpt-oss-20b"
VLLM_API_KEY='cSt6YXROaHNET0EwaGV5cQa6'   # generate/use any random key
```

```shell
# Using vLLM requires two further steps:
# 1. Pull the model weights from Hugging Face
$ ./vllm-openai/init-gpt-oss.sh
# 2. Recreate the vLLM container
$ docker compose up -d --force-recreate vllm-openai
```
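Switching providers amounts to rewriting the variables shown above in .env. A sketch of a helper that does this in place (the variable names come from the examples above; `set_env_var` itself is hypothetical, not shipped with Super-Hat):

```shell
#!/bin/sh
# Hypothetical helper: set KEY=value in .env, replacing an existing entry
# or appending a new one.
set_env_var() {
    key=$1; value=$2; file=${3:-.env}
    if grep -q "^${key}=" "$file"; then
        # Replace the existing line in place
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Example: point the stack at Groq (values from the section above)
#   set_env_var VLLM_MODEL_ID '"openai/gpt-oss-120b"'
#   set_env_var OPENAI_BASE_URL '"https://api.groq.com/openai/v1"'
```

Remember to recreate the affected containers after editing .env so the new values are picked up.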

Access service

Create an SSH tunnel from your desktop/laptop to the superhat server:

```shell
$ ssh -N -L 8021:localhost:8021 -L 8000:localhost:8000 superhat-server
```
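If you connect often, the same forwards can be kept in ~/.ssh/config so that a plain `ssh superhat-server` opens them automatically (HostName and User are placeholders for your server's details):

```
Host superhat-server
    HostName your.server.address
    User your-username
    LocalForward 8021 localhost:8021
    LocalForward 8000 localhost:8000
```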

Now you can access superhat from a browser at "http://localhost:8021/login" (you can skip the /login on future visits). Each registered user can upload their own files and ask questions about them.

Limitations

Potential targets for future improvement

  • Supported file types
    Only pdf/docx/csv/Google Sheets are supported right now.

  • Single-turn chat
    Every question you ask is standalone; no chat history is used or sent.

  • Inconsistent document references
    Responses sometimes include references to the retrieved documents and sometimes don't, i.e. the behaviour is inconsistent. Answers are, however, drawn strictly from the uploaded documents.
