The param `n` determines the total number of generations, but it only works with OpenAI and not with vLLM or Ollama. From the Ollama logs, it appears that its `/api/generate` endpoint is hit 5 times for the example below, but the outputs are not concatenated.
```python
import dspy

vllm_llm = dspy.HFClientVLLM(model="meta-llama/meta-llama-3-8B-Instruct", port=8000, url="http://localhost", temperature=0.7)
dspy.settings.configure(lm=vllm_llm)

class SciGen(dspy.Signature):
    """context and answer on science questions"""

    question = dspy.InputField(desc="Input question")
    context = dspy.OutputField(desc="context")
    answer = dspy.OutputField(desc="Answer using only the context")

context_generator = dspy.Predict(SciGen, n=5)
response = context_generator(question="Why do we have solar eclipse?")
print(len(response.completions))  # Outputs 5 for the OpenAI call but 1 for Ollama
```

I can use a for loop to hit the endpoint multiple times, as mentioned in the docs, but I am assuming that batching the requests would provide better performance:
```python
question = "Why do we have solar eclipse?"
for idx in range(5):
    # Nudge the temperature slightly per call so each completion can differ.
    response = context_generator(question=question, config=dict(temperature=0.4 + 0.0001 * idx))
    print(f'{idx+1}.', response.completions)
```
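
For reference, a minimal sketch of approximating that batching on the client side by running the loop's calls concurrently with a thread pool. This reuses `context_generator` from above and assumes the vLLM/Ollama client is safe to call from multiple threads, which I have not verified:

```python
from concurrent.futures import ThreadPoolExecutor

question = "Why do we have solar eclipse?"

def generate(idx: int):
    # Same per-call temperature nudge as in the sequential loop above.
    return context_generator(question=question, config=dict(temperature=0.4 + 0.0001 * idx))

# Issue the five requests concurrently instead of one after another.
with ThreadPoolExecutor(max_workers=5) as pool:
    responses = list(pool.map(generate, range(5)))

for idx, response in enumerate(responses, start=1):
    print(f'{idx}.', response.completions)
```

This only overlaps the HTTP round-trips; true server-side batching of the `n` completions in a single request is what I was hoping `n` would provide.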