Number of generations doesn't give n results on LM clients other than OpenAI (vLLM, Ollama) #914

Description

@AmoghM

The parameter n determines the total number of generations, but it only works with OpenAI, not with vLLM or Ollama. From the Ollama logs, it appears that its /api/generate endpoint is hit 5 times for the example below, but the results are not aggregated into the response's completions.

import dspy

# Configure DSPy to use a local vLLM server.
vllm_llm = dspy.HFClientVLLM(model="meta-llama/meta-llama-3-8B-Instruct", port=8000, url="http://localhost", temperature=0.7)
dspy.settings.configure(lm=vllm_llm)

class SciGen(dspy.Signature):
    """Context and answer on science questions."""
    question = dspy.InputField(desc="Input question")
    context = dspy.OutputField(desc="context")
    answer = dspy.OutputField(desc="Answer using only the context")

# Request n=5 completions per call.
context_generator = dspy.Predict(SciGen, n=5)
response = context_generator(question="Why do we have solar eclipse?")

print(len(response.completions))  # Outputs 5 for an OpenAI LM but 1 for vLLM/Ollama
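
For reference, the OpenAI chat completions endpoint accepts n natively, so a single request returns n choices, which is presumably what DSPy's OpenAI client maps n to. A minimal sketch with the raw openai client (the model name and prompt here are illustrative, not from this issue):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# One request returns n choices; no client-side loop is needed.
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": "Why do we have solar eclipse?"}],
    n=5,
    temperature=0.7,
)
print(len(resp.choices))  # 5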

I can use a for loop to hit the endpoint multiple times, as mentioned in the docs, but I am assuming that batching the requests would provide better performance (a concurrent sketch follows the loop below).

question = "Why do we have solar eclipse?"
for idx in range(5):
    # Jitter the temperature slightly so each call yields a distinct completion.
    response = context_generator(question=question, config=dict(temperature=0.4 + 0.0001 * idx))
    print(f'{idx + 1}.', response.completions)
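
Until n is supported for these clients, the repeated calls can at least be issued concurrently from the client side. A minimal sketch with concurrent.futures (the thread count and temperature jitter are my assumptions; this only overlaps request latency rather than doing true server-side batching, and it assumes the configured LM is visible from worker threads):

from concurrent.futures import ThreadPoolExecutor

def generate(idx):
    # Each worker sends one request with a slightly different temperature
    # so the five completions are not identical.
    return context_generator(question=question, config=dict(temperature=0.4 + 0.0001 * idx))

# Five requests in flight at once instead of five sequential round trips.
with ThreadPoolExecutor(max_workers=5) as pool:
    responses = list(pool.map(generate, range(5)))

for idx, response in enumerate(responses, start=1):
    print(f'{idx}.', response.completions)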
