This repository was archived by the owner on Jul 4, 2025. It is now read-only.

feat: Support for batch embedding #371

@hiro-v

Description
Problem

  • I am trying to integrate Jan's RAG feature using Nitro's embedding endpoint at POST http://localhost:3928/v1/embeddings
  • The endpoint accepts "input" as a single string by default, but the application requires batch embedding; when "input" is an array, Nitro returns a 500 error.
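Until array input is supported server-side, a client can split a batch payload into one single-string request per item. A minimal sketch of that workaround; the `split_batch` helper is hypothetical and not part of Nitro or Jan:

```python
import json

def split_batch(payload):
    """Split an embeddings payload whose "input" may be a string or a
    list of strings into one single-string payload per item.
    Client-side workaround while array input yields a 500."""
    inputs = payload["input"]
    if isinstance(inputs, str):
        inputs = [inputs]
    return [dict(payload, input=text) for text in inputs]

batch = {"input": ["Hello", "Nam", "here"],
         "model": "embedding",
         "encoding_format": "float"}

requests_to_send = split_batch(batch)
# Each element is a single-string payload of the shape Nitro accepts today:
print(json.dumps(requests_to_send[0]))
```

Each resulting payload can then be POSTed to /v1/embeddings individually, at the cost of one round trip per input string.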

Success Criteria

  • Load the model with "embedding" set to true:
curl --location 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' \
--header 'Content-Type: application/json' \
--data '{
   "llama_model_path": "/Users/hiro/Downloads/ggml-model-q4_k.gguf",
   "ctx_len": 2048,
   "ngl": 100,
   "cont_batching": false,
   "embedding": true,
   "system_prompt": "",
   "user_prompt": "\n### Instruction:\n",
   "ai_prompt": "\n### Response:\n"
 }'
  • Call the embedding endpoint with "input" as an array:
curl --location 'http://localhost:3928/v1/embeddings' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--header 'Access-Control-Allow-Origin: *' \
--data '{
    "input": ["Hello", "Nam", "here"],
    "model": "embedding",
    "encoding_format": "float"
}'
  • The server currently returns 500; the request above should instead succeed and return one embedding per input string.
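For reference, an OpenAI-compatible /v1/embeddings response to array input would be expected to carry one "data" entry per input string, indexed in request order, roughly as below (all values are illustrative, and the embedding vectors are elided):

```json
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": ["..."] },
    { "object": "embedding", "index": 1, "embedding": ["..."] },
    { "object": "embedding", "index": 2, "embedding": ["..."] }
  ],
  "model": "embedding",
  "usage": { "prompt_tokens": 3, "total_tokens": 3 }
}
```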
