v0.1.0
🎉 Initial Public Release
LongParser is the open-source document intelligence engine built by ENDEVSOLS
for production RAG pipelines.
Added
- 5-stage extraction pipeline: Extract → Validate → HITL Review → Chunk → Embed → Index
- Multi-format extraction: PDF, DOCX, PPTX, XLSX, CSV via Docling
- `HybridChunker`: token-aware, heading-hierarchy-aware, table-aware chunking
- Human-in-the-Loop (HITL) review: approve / edit / reject blocks and chunks via LangGraph `interrupt()` before embedding
- 3-layer memory chat: short-term turns + rolling summary + long-term facts, powered by LCEL chains
- Multi-provider LLM support: OpenAI (`gpt-4o`), Gemini (`gemini-2.0-flash`), Groq (`llama-3.3-70b-versatile`), OpenRouter
- Multi-backend vector stores: Chroma, FAISS, Qdrant
- Async-first REST API: FastAPI + Motor (MongoDB) + ARQ (Redis job queue)
- `LongParserRetriever`: drop-in LangChain `BaseRetriever` adapter
- `LongParserLoader`: LangChain document loader integration
- `LongParserReader`: LlamaIndex `BaseReader` integration
- `LongParserCallbackHandler`: observability callbacks for LangChain chains
- Built-in citation validation: chunk IDs verified against retrieved set before any answer is returned
- Privacy-first: all processing runs locally; no data leaves your infrastructure
- `py.typed` marker: full PEP 561 typing support
- Unit test suite: `test_schemas.py` (22 passing), `test_llm_chain.py`, `test_chat_utils.py`
- GitHub Actions CI: lint (`ruff`), tests across Python 3.10 / 3.11 / 3.12, coverage reporting
- GitHub Actions publish: PyPI trusted publishing triggered on GitHub releases
- `pyproject.toml` with `server`, `langchain`, `llamaindex`, `embeddings`, `chroma`, `faiss`, `qdrant` optional extras
- `Dockerfile` and `docker-compose.yml` for one-command local deployment
- `CONTRIBUTING.md`, `SECURITY.md`, `.env.example`: full OSS scaffolding
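
The citation validation item above describes checking cited chunk IDs against the retrieved set before an answer is returned. A minimal sketch of that idea follows. It is not LongParser's actual implementation: the `[chunk:ID]` marker format, the `CitationCheck` class, and the `validate_citations` function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass

# Assumed citation marker format, e.g. "[chunk:abc123]".
# This is an illustrative convention, not LongParser's documented syntax.
CITATION_RE = re.compile(r"\[chunk:([A-Za-z0-9_-]+)\]")


@dataclass(frozen=True)
class CitationCheck:
    """Result of comparing cited chunk IDs with the retrieved set."""

    cited: frozenset[str]
    unknown: frozenset[str]

    @property
    def ok(self) -> bool:
        # Pass only if the answer cites at least one chunk
        # and every cited ID was actually retrieved.
        return bool(self.cited) and not self.unknown


def validate_citations(answer: str, retrieved_ids: set[str]) -> CitationCheck:
    """Extract cited chunk IDs from an answer and flag any not in retrieved_ids."""
    cited = frozenset(CITATION_RE.findall(answer))
    return CitationCheck(cited=cited, unknown=cited - frozenset(retrieved_ids))


retrieved = {"c1", "c2"}
good = validate_citations("Revenue grew 12% [chunk:c1].", retrieved)
bad = validate_citations("Revenue grew 12% [chunk:c9].", retrieved)
print(good.ok, bad.ok, sorted(bad.unknown))  # True False ['c9']
```

In this sketch an answer with no citations at all also fails the check. Whether an uncited answer should be rejected or passed through is a design choice that depends on the application.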