Description
What happened?
Description:
I am trying to pass messages containing audio data to dspy.Predict, but the model appears to analyze the audio's base64 string as text instead of processing the audio content.
Code Snippet:
import base64
import os
import pathlib

import dspy

# Configure the Gemini model as the default LM.
lm = dspy.LM(
    "gemini-2.0-flash-exp", api_key=os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
)
dspy.configure(lm=lm)

# Read the audio file and base64-encode it.
audio_path = "temp_segment_1894 1.wav"
audio_data = pathlib.Path(audio_path).read_bytes()
audio_data_base64 = base64.b64encode(audio_data).decode("utf-8")

# Multimodal message: the audio is sent as a data URI inside an image_url part.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze audio"},
            {
                "type": "image_url",
                "image_url": "data:audio/wav;base64,{}".format(
                    audio_data_base64
                ),
            },
        ],
    }
]

print(lm(messages=messages))  # This works correctly

classify = dspy.Predict('messages -> sentiment')
# Issue: Cannot pass messages; the output seems to analyze the base64 string instead of the actual audio content.
Expected Behavior:
The model should process the audio data properly and return the sentiment analysis.
Observed Behavior:
dspy.Predict appears to be treating the base64 string as text instead of decoding and analyzing the actual audio.
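The difference between the two paths can be illustrated without DSPy. The sketch below is an assumption about the failure mode, not DSPy's actual internals: if a structured message list is rendered into a text prompt (here by a hypothetical `render_as_text` helper), the base64 payload becomes ordinary characters in that prompt, and the model never receives a separate audio part. The audio bytes are placeholders, not a real recording.

```python
import base64
import json


def render_as_text(field_name, value):
    """Hypothetical stand-in for a text-only prompt template (not a DSPy API)."""
    return f"{field_name}: {json.dumps(value)}"


# Placeholder bytes standing in for the contents of a WAV file.
fake_audio = b"RIFF\x00\x00\x00\x00WAVEfmt "
payload = base64.b64encode(fake_audio).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze audio"},
            {"type": "image_url", "image_url": f"data:audio/wav;base64,{payload}"},
        ],
    }
]

# Structured path: the audio is its own content part with a declared type.
structured_parts = messages[0]["content"]

# Text path: the whole list is serialized, so the audio survives only as
# base64 characters embedded in one long string.
prompt = render_as_text("messages", messages)
print(prompt)
```

If something like the text path is what happens to the `messages` input field, the model would be asked to interpret a base64 string, which matches the observed behavior.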
Questions:
How should messages be passed to dspy.Predict correctly?
Is there a way to specify that messages contain audio data so the model processes it correctly?
Should a custom data structure or preprocessing step be added before passing messages?
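As a starting point for the preprocessing question, here is a minimal sketch of a helper that builds the same message shape used in the snippet above and checks that the data URI decodes back to the original bytes. The names `build_audio_message` and `decode_audio_part` are hypothetical and not part of DSPy; the sketch only rules out the encoding step as the cause and does not change how dspy.Predict handles the field.

```python
import base64
import pathlib
import tempfile


def build_audio_message(audio_path, prompt_text="Analyze audio", mime_type="audio/wav"):
    """Hypothetical helper: wrap an audio file in the message shape from this report."""
    audio_bytes = pathlib.Path(audio_path).read_bytes()
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt_text},
                {"type": "image_url", "image_url": f"data:{mime_type};base64,{encoded}"},
            ],
        }
    ]


def decode_audio_part(messages):
    """Split the data URI into its header and the decoded raw bytes."""
    data_uri = messages[0]["content"][1]["image_url"]
    header, encoded = data_uri.split(",", 1)
    return header, base64.b64decode(encoded)


# Demo with placeholder bytes written to a temporary file (not a real recording).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = pathlib.Path(tmp_dir) / "sample.wav"
    sample_bytes = b"RIFF$\x00\x00\x00WAVEfmt "
    sample_path.write_bytes(sample_bytes)

    messages = build_audio_message(sample_path)
    header, decoded = decode_audio_part(messages)

print(header)
print(decoded == sample_bytes)
```

With a real file, `lm(messages=build_audio_message(audio_path))` corresponds to the direct call that already works; the open question is how to get the same structured part through dspy.Predict.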
Environment:
dspy version: 2.6.6
Model: gemini-2.0-flash-exp
Python version: 3.12
Any guidance on properly passing audio messages into dspy.Predict would be greatly appreciated.
Steps to reproduce
See the code snippet above.
DSPy version
2.6.6