The param `n` determines the total number of generations, but it only works with OpenAI and not with vLLM or Ollama. From the Ollama logs, it appears that its `/api/generate` endpoint is hit 5 times for the example below, but the outputs are not concatenated.
```python
import dspy

vllm_llm = dspy.HFClientVLLM(model="meta-llama/meta-llama-3-8B-Instruct", port=8000, url="http://localhost", temperature=0.7)
dspy.settings.configure(lm=vllm_llm)

class SciGen(dspy.Signature):
    """context and answer on science questions"""

    question = dspy.InputField(desc="Input question")
    context = dspy.OutputField(desc="context")
    answer = dspy.OutputField(desc="Answer using only the context")

context_generator = dspy.Predict(SciGen, n=5)
response = context_generator(question="Why do we have solar eclipse?")
print(len(response.completions))  # Outputs 5 for the OpenAI call but 1 for Ollama
```

I can use a for loop to hit the endpoint multiple times, as mentioned in the docs, but I am assuming that batching the requests would provide better performance:
```python
question = "Why do we have solar eclipse?"
for idx in range(5):
    # Nudge the temperature slightly per call so each completion can differ.
    response = context_generator(question=question, config=dict(temperature=0.4 + 0.0001 * idx))
    print(f'{idx+1}.', response.completions)
```
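
For reference, a minimal sketch of approximating that batching on the client side by running the loop's calls concurrently with a thread pool. This reuses `context_generator` from above and assumes the vLLM/Ollama client is safe to call from multiple threads, which I have not verified:

```python
from concurrent.futures import ThreadPoolExecutor

question = "Why do we have solar eclipse?"

def generate(idx: int):
    # Same per-call temperature nudge as in the sequential loop above.
    return context_generator(question=question, config=dict(temperature=0.4 + 0.0001 * idx))

# Issue the five requests concurrently instead of one after another.
with ThreadPoolExecutor(max_workers=5) as pool:
    responses = list(pool.map(generate, range(5)))

for idx, response in enumerate(responses, start=1):
    print(f'{idx}.', response.completions)
```

This only overlaps the HTTP round-trips; true server-side batching of the `n` completions in a single request is what I was hoping `n` would provide.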