Nlmatics extracts data from large document sets using retrieval-augmented generation (RAG). It can also be used for RAG search on knowledge bases. It comes with an extensive UI for search, data extraction and PDF viewing. It ingests documents using the llmsherpa/nlm-ingestor backend and indexes them in Elasticsearch, from which they are retrieved using a hybrid search approach.
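As a rough illustration of what a hybrid search approach can look like, the sketch below merges a keyword (BM25) ranking and a vector ranking with reciprocal rank fusion. This is a generic technique chosen for the example, not necessarily the method Nlmatics uses; the function name, the document ids and the constant `k=60` are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]    # keyword (BM25) ranking
vector_hits = ["doc_c", "doc_a", "doc_d"]  # embedding similarity ranking

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Rank-based fusion avoids having to normalize BM25 scores and vector similarities onto a common scale, which is one reason it is a common starting point for hybrid retrieval.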


Nlmatics was founded by Ambika Sukla and Bulent Yener.

Nlmatics developed an early RAG-like question answering, semantic search and data extraction pipeline using layout-aware chunking, vector + BM25 indexing and language models. The open source codebase was developed from 2020 to 2023 by Yi Zhang, Ambika Sukla, Kiran Panicker, Niranjan Borawake, Suhail Kandanur, Wonjun Kang, Reshav Abraham, Nima Sheikholeslami, Lora Johns, Jasmin Omanovic, Karen Reeves, Sonia Joseph, Evan Li, Batya Stein, Cheyenne Zhang, Ashlan Ahmed, Nicholas Greenspan, Connie Xu, Shivangi Jha and others, with product management support from Pooja Reddy, Ambika Sukla and Jan Choy.

Nlmatics is thankful to have worked with prominent early adopters in financial services, legal services and life sciences who recognized and leveraged our technology well before the current wave of generative AI.

Nlmatics raised seed funding from Felix Anthony, Silvertech Ventures, World Trade Ventures and ERS Ventures.

Popular repositories

  1. llmsherpa Public

    Developer APIs to Accelerate LLM Projects

    Jupyter Notebook 1k 101

  2. nlm-ingestor Public

    This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.

    Python 855 82

  3. nlm-tika Public

    Java 12 9

  4. nlm-app Public

    Frontend code of the Nlmatics search and data extraction application

    JavaScript 3 1

  5. Public

    1 4

  6. nlm-utils Public

    Common utilities used by all nlm-* libraries.

    Python 1 5


