-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Claude 3 Image Query Support #700
Conversation
b9183de
to
a16caa7
Compare
Once things stabilize on a handful of PRs we should add the vision drivers to #702 |
griptape/drivers/image_query/amazon_bedrock_image_query_driver.py
Outdated
Show resolved
Hide resolved
""" | ||
|
||
api_key: str = field(kw_only=True, metadata={"serializable": True}) | ||
model: str = field(default="claude-3-sonnet-20240229", kw_only=True, metadata={"serializable": True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove default model value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do, but I was curious about this. I saw we set defaults for OpenAI drivers, but not others. Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't realize we set in openai. That probably shouldn't be set their either. But for sake of consistency we can keep it.
@andrewfrench thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had already removed the anthropic default. I removed the openai default as well in the name of consistency.
), | ||
kw_only=True, | ||
) | ||
max_output_tokens: Optional[int] = field(default=4096, kw_only=True, metadata={"serializable": True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to provide a default value here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anthropic requires some value to be specified. I tried with None and it gets angry. So I thought a default would be best because then it's more upfront to the customer vs. hiding it in our internal call. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks Anthropic. If we do this, we should probably standardize across the Query Drivers. OpenAi does not have one set, for instance.
Any thoughts @andrewfrench?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenAI's default no-value behavior is unhelpful enough as to consider max_tokens
a required parameter there as well. Perhaps we should update both the OpenAI and Anthropic drivers to explicitly require this field? In either case, a refactor of the tokenizers + prompt stack to support image and media inputs will go a long way to help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with staying consistent across all image_query
drivers and making max_output_tokens
a required value for now. That means no default and put the onus on the customer, or should we provide a "middle-of-the-road" default for both OpenAI and Anthropic? I was thinking yes to still providing a default so that customers can use it out-of-the-box without needing to go deeply understand each LLMs tokenization. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put max_output_tokens
on the base class and made the default 256.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it named as max_tokens
for consistency with the Prompt Drivers. We can do a full rename in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
griptape/drivers/image_query_model/base_image_query_model_driver.py
Outdated
Show resolved
Hide resolved
griptape/drivers/image_query_model/bedrock_claude_image_query_model_driver.py
Outdated
Show resolved
Hide resolved
griptape/drivers/image_query_model/bedrock_claude_image_query_model_driver.py
Outdated
Show resolved
Hide resolved
griptape/drivers/image_query_model/bedrock_claude_image_query_model_driver.py
Outdated
Show resolved
Hide resolved
ea47d85
to
8277686
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent some time this morning playing with Claude 3 image queries with great success, awesome stuff! If we can choose a path on this conversation I think we're good to go.
918282d
to
3ee66c1
Compare
Added these in |
griptape/drivers/prompt_model/bedrock_claude_prompt_model_driver.py
Outdated
Show resolved
Hide resolved
griptape/drivers/image_query_model/bedrock_claude_image_query_model_driver.py
Outdated
Show resolved
Hide resolved
@@ -16,6 +16,7 @@ | |||
@define | |||
class BaseImageQueryDriver(SerializableMixin, ExponentialBackoffMixin, ABC): | |||
structure: Optional[Structure] = field(default=None, kw_only=True) | |||
max_output_tokens: int = field(default=256, kw_only=True, metadata={"serializable": True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets keep it named as max_tokens
for consistency with other Drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I saw a PR that updated max_tokens
to max_input_tokens
and max_output_tokens
? Was that just when you are providing a tokenizer
?
I am fine with updating this to match the rest of the drivers, but I would say max_output_tokens
is better because that is explicitly what it is. Is that the same for the other drivers as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that max_output_tokens
is probably a better field name, but the other PR was only in the context of the Tokenizers, not the Drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the other drivers have inputs as text, not images. so to clarify my question: are the other drivers that have max_tokens
intended to be for the output tokens or the input tokens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated back to max_tokens
Adding Claude 3 Image Query support via both Anthropic and Bedrock.