pydocs-mcp 0.3.0

@msobroza released this 15 Jun 21:11
· 12 commits to main since this release

Published to PyPI: https://pypi.org/project/pydocs-mcp/0.3.0/

pip install pydocs-mcp

Added (LLM tree-reasoning — enrichment, token budget, two-stage rerank)

  • PageIndex node enrichment — each LLM-visible tree node now carries its real
    signature (params + type hints + return annotation), its decorators, and a
    docstring excerpt, beyond the generated summary. Tunable via doc_excerpt
    (sections | full | off) and doc_excerpt_max_chars. A non-destructive
    schema auto-refresh (v9) re-extracts the metadata on next index without
    re-embedding unchanged chunks.
  • Token-counted tree budget — the serialized tree handed to the LLM is bounded
    in real tiktoken tokens (previously whitespace words, which under-counted code
    ~3× and could overflow the model's context window with a 400
    context_length_exceeded). max_tree_words → max_tree_tokens
    (int | None; None auto-derives from the configured model's context window).
    Over-budget pruning is content-first — drop per-node doc excerpts before whole
    nodes. Adds tiktoken as a runtime dependency.
  • BM25 → tree two-stage rerank — opt-in rerank_candidates mode on the
    llm_tree_reasoning step scopes the LLM-visible tree to a prior BM25/dense
    candidate set and writes its ranked picks back as the pipeline's final ranking
    (with a repoqa_bm25_tree_rerank benchmark config).
  • Persist chunks.qualified_name (schema v7) so tree-reasoning picks resolve to
    the correct chunks.
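
The content-first pruning described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the Node type, serialize, prune_to_budget, and the approx_tokens counter are all hypothetical names, and the regex counter is only a stand-in for a real tokenizer such as a tiktoken encoding.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Node:
    """Hypothetical LLM-visible tree node (not the real pydocs-mcp type)."""
    qualified_name: str
    signature: str
    doc_excerpt: str = ""


def approx_tokens(text: str) -> int:
    """Stand-in counter: each word and each punctuation mark is one token.

    A real tokenizer (for example a tiktoken encoding) would be plugged in here.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def serialize(node: Node) -> str:
    """Render one node as the text that would be shown to the LLM."""
    parts = [node.qualified_name, node.signature]
    if node.doc_excerpt:
        parts.append(node.doc_excerpt)
    return "\n".join(parts)


def prune_to_budget(
    nodes: List[Node],
    max_tokens: int,
    count: Callable[[str], int] = approx_tokens,
) -> List[Node]:
    """Fit the serialized nodes into max_tokens, dropping content before nodes."""

    def total(ns: List[Node]) -> int:
        return sum(count(serialize(n)) for n in ns)

    kept = list(nodes)
    # Pass 1: strip doc excerpts, last node first, until the tree fits.
    for i in range(len(kept) - 1, -1, -1):
        if total(kept) <= max_tokens:
            return kept
        kept[i] = replace(kept[i], doc_excerpt="")
    # Pass 2: only once no excerpts remain, drop whole nodes from the end.
    while kept and total(kept) > max_tokens:
        kept.pop()
    return kept


nodes = [
    Node("pkg.load", "def load(path: str) -> dict", "Read a config file from disk."),
    Node("pkg.dump", "def dump(data: dict) -> str", "Serialize a mapping to text."),
]
for node in prune_to_budget(nodes, max_tokens=35):
    print(node.qualified_name, "| excerpt kept:", bool(node.doc_excerpt))
```

With a budget of 35 stand-in tokens, both nodes survive and only the last excerpt is dropped; a much tighter budget falls through to removing whole nodes. Which nodes lose content first is a design choice; this sketch simply works from the end of the list.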

Added (on-device dense embeddings)

  • sentence_transformers embedding provider (provider: sentence_transformers)
    serving Qwen/Qwen3-Embedding-0.6B and other SentenceTransformer models via
    torch — a GPU-reliable on-device dense embedder (torch frees CUDA memory
    between sequential index-builds). Opt-in via the [sentence-transformers]
    extra. New EmbeddingConfig knobs max_seq_length / normalize /
    query_prompt_name (the first two fold into the pipeline hash; the
    query-only prompt does not).
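
One way to picture which knobs fold into the pipeline hash is the sketch below. It is an assumption-laden illustration only: this EmbeddingConfig is a stand-in dataclass, the model field name, the default values, and the pipeline_hash function are hypothetical, and the real hashing scheme may differ.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional

# Query-only settings do not change the stored document vectors,
# so this sketch keeps them out of the hash (assumed behaviour).
EXCLUDED_FROM_HASH = {"query_prompt_name"}


@dataclass(frozen=True)
class EmbeddingConfig:
    """Illustrative stand-in, not the real pydocs-mcp class."""
    provider: str = "sentence_transformers"
    model: str = "Qwen/Qwen3-Embedding-0.6B"   # hypothetical field name
    max_seq_length: int = 512                  # hypothetical default
    normalize: bool = True                     # hypothetical default
    query_prompt_name: Optional[str] = "query"  # hypothetical default


def pipeline_hash(cfg: EmbeddingConfig) -> str:
    """Hash only the fields that would change the indexed vectors."""
    payload = {k: v for k, v in asdict(cfg).items() if k not in EXCLUDED_FROM_HASH}
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


base = EmbeddingConfig()
# Changing the query prompt leaves the hash (and so the index) untouched.
print(pipeline_hash(replace(base, query_prompt_name=None)) == pipeline_hash(base))
# Changing max_seq_length or normalize produces a different hash.
print(pipeline_hash(replace(base, max_seq_length=1024)) == pipeline_hash(base))
```

The first comparison prints True and the second prints False, which mirrors the note above: index-affecting knobs invalidate the cached index, while the query-only prompt does not.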

Removed

  • The onnx embedding provider (OnnxEmbedder and the onnx_file /
    query_instruction config fields). The torch-backed sentence_transformers
    provider replaces it for on-device Qwen3-Embedding — onnxruntime leaked the
    CUDA arena across the benchmark's sequential index-builds.

Added (GPU inference)

  • --gpu flag on serve, index, and watch (and the benchmark runner)
    to run all embedder inference — FastEmbed, the sentence_transformers
    provider, and PyLate late-interaction — on CUDA. No YAML change; covers both
    index-time and query-time embedding. The execution device is excluded from the
    pipeline / index-cache hash, so toggling --gpu shares the same .tq /
    fast-plaid index and never forces a re-index (it is a latency knob, not a
    quality change).
  • EmbeddingConfig.device (cpu / cuda) is wired through build_embedder
    into the FastEmbed and sentence_transformers embedders;
    AppConfig.with_device(gpu=...) stamps the device after config load. GPU
    runtimes (onnxruntime-gpu, fastembed-gpu, CUDA torch) are documented in
    INSTALL.md, not auto-installed.
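
The flag-to-config flow above might look something like the following sketch. Everything here is a simplified stand-in rather than the project's code: the two dataclasses only mimic the names EmbeddingConfig.device and AppConfig.with_device from these notes, and build_parser is a hypothetical argparse layout that ignores any other arguments the real commands take.

```python
import argparse
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class EmbeddingConfig:
    """Illustrative stand-in holding only the execution device."""
    device: str = "cpu"


@dataclass(frozen=True)
class AppConfig:
    """Illustrative stand-in for the configuration loaded from YAML."""
    embedding: EmbeddingConfig = field(default_factory=EmbeddingConfig)

    def with_device(self, gpu: bool) -> "AppConfig":
        # Return a stamped copy; the loaded config object is left unchanged.
        device = "cuda" if gpu else "cpu"
        return replace(self, embedding=replace(self.embedding, device=device))


def build_parser() -> argparse.ArgumentParser:
    # A parent parser lets every subcommand share one --gpu definition.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--gpu", action="store_true",
                        help="run embedder inference on CUDA")
    parser = argparse.ArgumentParser(prog="pydocs-mcp")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("serve", "index", "watch"):
        sub.add_parser(name, parents=[common])
    return parser


args = build_parser().parse_args(["index", "--gpu"])
config = AppConfig().with_device(gpu=args.gpu)  # stamped after config load
print(config.embedding.device)
```

Because the device is applied after the configuration is loaded, the YAML file never needs a device entry, which is consistent with the "No YAML change" note above.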