Conversation

ggerganov
Member

For consistency, add a /v1/health endpoint that behaves the same as /health.
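A minimal sketch of what the alias amounts to, using cpp-httplib (the HTTP library llama-server is built on); the handle_health handler below is a hypothetical stand-in for the server's real one:

```cpp
#include "httplib.h" // cpp-httplib, the HTTP library llama-server is built on

int main() {
    httplib::Server svr;

    // Hypothetical stand-in for the server's real health handler:
    // replies 200 with a small JSON body.
    const auto handle_health = [](const httplib::Request &, httplib::Response & res) {
        res.set_content(R"({"status":"ok"})", "application/json");
    };

    // Register the same handler on both paths, so /v1/health
    // behaves identically to /health.
    svr.Get("/health",    handle_health);
    svr.Get("/v1/health", handle_health);

    svr.listen("127.0.0.1", 8080);
}
```

Either path can then be probed with e.g. `curl http://127.0.0.1:8080/v1/health`.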

@ggerganov ggerganov requested a review from ngxson as a code owner October 7, 2025 11:42
@ngxson
Collaborator

ngxson commented Oct 7, 2025

I'm a bit doubtful that this is actually useful: /v1 is mainly for compatibility with the OAI API, while /health is for the Docker healthcheck. In fact, we don't really need the /v1 prefix at all, since we don't have API versioning on the server.

@ggerganov
Member Author

It's purely for consistency. It seems weird to use /v1/models and /v1/completions, but have /health instead of /v1/health.

@ngxson
Collaborator

ngxson commented Oct 7, 2025

Hmm, OK then. Alternatively, we could add a middleware that automatically adds the /v1 prefix to all endpoints instead of writing them out manually. But that's something I can do later.
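One way to approximate that idea without a real routing middleware is a small registration helper; the following is a hypothetical sketch (helper name included), not the server's actual code:

```cpp
#include <string>
#include "httplib.h"

// Hypothetical helper: register a GET handler under both the bare path and
// its /v1-prefixed alias, so the aliases never have to be written by hand.
static void get_with_v1_alias(httplib::Server & svr, const std::string & path,
                              const httplib::Server::Handler & handler) {
    svr.Get(path,         handler);
    svr.Get("/v1" + path, handler);
}

// Usage, assuming handlers like handle_health already exist:
//   get_with_v1_alias(svr, "/health", handle_health);
```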

@ggerganov ggerganov merged commit df1b612 into master Oct 7, 2025
68 checks passed
@ggerganov ggerganov deleted the gg/server-health branch October 7, 2025 12:57
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...