v0.6.0 Full LangChain support in addition to LlamaIndex, more DBs (15 PG, 4 RDF)

@stevereiner stevereiner released this 14 May 06:49
· 8 commits to main since this release

v0.6.0: LlamaIndex or LangChain; 15 property graph stores (8 existing + ArangoDB, AGE, Cosmos, SurrealDB, Spanner, HugeGraph, TigerGraph), 4 RDF (3 + Neptune RDF), 10 vector, 3 search

  1. 15 total property graph databases — 8 existing LlamaIndex stores with LangChain versions
    added (Neo4j, ArcadeDB, FalkorDB, Memgraph, NebulaGraph, Neptune, Neptune Analytics,
    LadybugDB); 6 new LC-only stores (ArangoDB, Apache AGE, Azure Cosmos DB Gremlin, Apache
    HugeGraph, SurrealDB, TigerGraph); 1 new LI-only store (Google Cloud Spanner)

  2. 10 LangChain vector backends (Qdrant, Elasticsearch, Milvus, Weaviate, LanceDB, Chroma,
    Pinecone, pgvector, OpenSearch, Neo4j vector); 3 LangChain search backends (Elasticsearch,
    OpenSearch, BM25); 4 RDF/triple-store backends (Fuseki, GraphDB, Oxigraph + new Amazon
    Neptune RDF with IAM SigV4 auth)

  3. flexible-graphrag now runs fully on LlamaIndex, fully on LangChain, or any mix — both
    frameworks are first-class peers. Each pipeline stage is independently configurable:
    CHUNKER_BACKEND, KG_EXTRACTOR_BACKEND, GRAPH_BACKEND, VECTOR_BACKEND, SEARCH_BACKEND,
    RETRIEVAL_FUSION, LLM_PROVIDER, EMBEDDING_KIND. Note: document readers / data sources
    remain LlamaIndex-based (first pipeline stage). LangChain-only graph stores auto-select
    GRAPH_BACKEND=langchain.

  4. Retrievers for both LI and LC, with fusion support in both frameworks
    (RETRIEVAL_FUSION=llamaindex uses QueryFusionRetriever; RETRIEVAL_FUSION=langchain uses
    EnsembleRetriever when all stores are LC-backed). LangChain retrievers include: Synonym
    Exploder (expands query terms for vector
    search), pg_vector + neighborhood traversal for Neo4j (LANGCHAIN_PG_VECTOR_SEARCH,
    USE_PG_NEIGHBORHOOD), and text-to-query graph QA for all LC property graph stores (generates
    Cypher for Neo4j/ArcadeDB/Memgraph/FalkorDB/Ladybug/AGE, GQL for HugeGraph, SurrealQL for
    SurrealDB, AQL for ArangoDB, SPARQL for all RDF stores).

  5. Matrix test support — run_matrix.py / run_all_profiles.py; 24+ integration test profiles
    covering all PG, vector, search, RDF, and chunker combinations

  6. Docling OCR — DOCLING_OCR=true + DOCLING_OCR_ENGINE (auto / rapidocr / easyocr /
    tesseract_cli / tesserocr / ocrmac); optional extras for easyocr, tesserocr, ocrmac

  7. Incremental update (add, delete, modify) end-to-end across property graph, RDF,
    vector, and search databases on both LlamaIndex and LangChain backends

  8. scripts/cleanup.py: all 15 property graph stores have native-client cleanup; cleanup
    skips a store early when its stage is disabled, which improves speed; Postgres document
    state / datasource config table cleanup is skipped when incremental update is disabled.

  9. Observability — upgraded OpenLIT + OpenInference LangChain instrumentation; both OTLP
    producers (LlamaIndex via OpenLIT, LangChain via OpenInference) work simultaneously

  10. Docs site (zensical 0.0.40) + major doc updates — ARCHITECTURE.md (15 PG stores, 13 LLM
    providers, framework backends section), per-store setup guides (Cosmos Gremlin, Neptune,
    Spanner), CONFIG-PROPERTY-GRAPH, DATABASE-CONFIGURATION, UI-TAB-SEARCH, MCP-TOOLS; all
    broken links fixed, README.md updated

  11. Per-store config isolation — each database and LLM/embedding provider has its own typed
    config env var ({TYPE}_GRAPH_DB_CONFIG, {TYPE}_VECTOR_DB_CONFIG, {TYPE}_SEARCH_DB_CONFIG,
    {KIND}_EMBEDDING_MODEL, etc.); per-store config takes precedence over generic fallback; no
    shared config collisions across stores

  12. Time logging now separates out KG extraction time from graph storage time.
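
The precedence rule in item 11 (a per-store config variable wins over the generic fallback) can be illustrated with a minimal sketch. This is not the project's actual code: the function name `resolve_store_config`, the generic variable name `{KIND}_DB_CONFIG`, and the assumption that values are JSON strings are all hypothetical, chosen only to show the lookup order.

```python
import json
from typing import Mapping


def resolve_store_config(env: Mapping[str, str], store_type: str, kind: str) -> dict:
    """Return the config for one store, preferring the per-store variable.

    kind is "GRAPH", "VECTOR", or "SEARCH"; store_type is e.g. "neo4j".
    Hypothetical helper: names and JSON encoding are assumptions.
    """
    # Per-store variable, following the {TYPE}_{KIND}_DB_CONFIG pattern from item 11.
    specific_key = f"{store_type.upper()}_{kind}_DB_CONFIG"
    # Assumed name for the generic fallback variable.
    generic_key = f"{kind}_DB_CONFIG"
    # The per-store value takes precedence; fall back to the generic one if unset or empty.
    raw = env.get(specific_key) or env.get(generic_key)
    return json.loads(raw) if raw else {}


# Example environment: one generic graph config plus a Neo4j-specific override.
env = {
    "GRAPH_DB_CONFIG": '{"url": "bolt://generic:7687"}',
    "NEO4J_GRAPH_DB_CONFIG": '{"url": "bolt://neo4j:7687"}',
}

neo4j_cfg = resolve_store_config(env, "neo4j", "GRAPH")        # uses the per-store value
memgraph_cfg = resolve_store_config(env, "memgraph", "GRAPH")  # falls back to generic
qdrant_cfg = resolve_store_config(env, "qdrant", "VECTOR")     # nothing set: empty dict
```

Because each store reads its own variable first, configuring Neo4j and Memgraph side by side does not require them to share a single config value, which is the collision that item 11 describes avoiding.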