Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Claude 3 Image Query Support #700

Merged
merged 26 commits into from
Mar 21, 2024
Merged

Claude 3 Image Query Support #700

merged 26 commits into from
Mar 21, 2024

Conversation

emjay07
Copy link
Contributor

@emjay07 emjay07 commented Mar 19, 2024

Adding Claude 3 Image Query support via both Anthropic and Bedrock.

image_query_driver=AnthropicImageQueryDriver(
    api_key=anthropic_api_key,
    model="claude-3-sonnet-20240229"
)
image_query_driver=AmazonBedrockImageQueryDriver(
    image_query_model_driver=BedrockClaudeImageQueryModelDriver(),
    model="anthropic.claude-3-sonnet-20240229-v1:0"
)

@emjay07 emjay07 force-pushed the feature/claude-3-image-query branch from b9183de to a16caa7 Compare March 19, 2024 22:54
@emjay07 emjay07 requested a review from a team March 19, 2024 23:32
@emjay07 emjay07 assigned emjay07 and unassigned emjay07 Mar 19, 2024
@collindutter
Copy link
Member

Once things stabilize on a handful of PRs we should add the vision drivers to #702

@emjay07 emjay07 marked this pull request as ready for review March 19, 2024 23:38
pyproject.toml Outdated Show resolved Hide resolved
"""

api_key: str = field(kw_only=True, metadata={"serializable": True})
model: str = field(default="claude-3-sonnet-20240229", kw_only=True, metadata={"serializable": True})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove default model value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do, but I was curious about this. I saw we set defaults for OpenAI drivers, but not others. Why is that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't realize we set in openai. That probably shouldn't be set their either. But for sake of consistency we can keep it.

@andrewfrench thoughts?

Copy link
Contributor Author

@emjay07 emjay07 Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had already removed the anthropic default. I removed the openai default as well in the name of consistency.

),
kw_only=True,
)
max_output_tokens: Optional[int] = field(default=4096, kw_only=True, metadata={"serializable": True})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to provide a default value here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anthropic requires some value to be specified. I tried with None and it gets angry. So I thought a default would be best because then it's more upfront to the customer vs. hiding it in our internal call. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks Anthropic. If we do this, we should probably standardize across the Query Drivers. OpenAi does not have one set, for instance.

Any thoughts @andrewfrench?

Copy link
Member

@andrewfrench andrewfrench Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenAI's default no-value behavior is unhelpful enough as to consider max_tokens a required parameter there as well. Perhaps we should update both the OpenAI and Anthropic drivers to explicitly require this field? In either case, a refactor of the tokenizers + prompt stack to support image and media inputs will go a long way to help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with staying consistent across all image_query drivers and making max_output_tokens a required value for now. That means no default and put the onus on the customer, or should we provide a "middle-of-the-road" default for both OpenAI and Anthropic? I was thinking yes to still providing a default so that customers can use it out-of-the-box without needing to go deeply understand each LLMs tokenization. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put max_output_tokens on the base class and made the default 256.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it named as max_tokens for consistency with the Prompt Drivers. We can do a full rename in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@andrewfrench andrewfrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent some time this morning playing with Claude 3 image queries with great success, awesome stuff! If we can choose a path on this conversation I think we're good to go.

@emjay07 emjay07 force-pushed the feature/claude-3-image-query branch from 918282d to 3ee66c1 Compare March 21, 2024 19:57
@emjay07
Copy link
Contributor Author

emjay07 commented Mar 21, 2024

Once things stabilize on a handful of PRs we should add the vision drivers to #702

Added these in

andrewfrench
andrewfrench previously approved these changes Mar 21, 2024
@@ -16,6 +16,7 @@
@define
class BaseImageQueryDriver(SerializableMixin, ExponentialBackoffMixin, ABC):
structure: Optional[Structure] = field(default=None, kw_only=True)
max_output_tokens: int = field(default=256, kw_only=True, metadata={"serializable": True})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep it named as max_tokens for consistency with other Drivers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I saw a PR that updated max_tokens to max_input_tokens and max_output_tokens? Was that just when you are providing a tokenizer?

I am fine with updating this to match the rest of the drivers, but I would say max_output_tokens is better because that is explicitly what it is. Is that the same for the other drivers as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that max_output_tokens is probably a better field name, but the other PR was only in the context of the Tokenizers, not the Drivers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the other drivers have inputs as text, not images. so to clarify my question: are the other drivers that have max_tokens intended to be for the output tokens or the input tokens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated back to max_tokens

CHANGELOG.md Outdated Show resolved Hide resolved
@emjay07 emjay07 requested review from collindutter and removed request for vasinov March 21, 2024 21:39
@emjay07 emjay07 merged commit f96af2e into dev Mar 21, 2024
6 checks passed
@emjay07 emjay07 deleted the feature/claude-3-image-query branch March 21, 2024 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants