Agents Refactor 1/2: Plugin Capabilities and SteamshipLLM prototype #564
Conversation
GitOnUp
left a comment
Initial comment chain seeding
| :return: a List of Blocks that are returned from the plugin.
| """
|
| # TODO (PR callout): I'm not certain this class needs to be abstract? This seems pretty soup-to-nuts.
Discussion: doesn't seem like SteamshipLLM needs to be an ABC.
At a high level, if it's possible to have a single LLM class that really does check the boxes we need, that feels like a great win.
There are two ways that LLMs tend to be used. I don't know if I'm using the right words, but they're roughly:
- Chat completion
- Prompt completion
One thing that, as a docstring reader, I find myself wanting SUPER CLEAR guidance on is the difference between `messages` and `history`:
- If I'm seeking chat completion, what exactly is the limit to what `messages` should have?
- If I'm seeking prompt completion, what happens if `history` is non-empty?
There might be an opportunity for naming clarification here if:
- `messages` --> `current_prompt`
- `history` --> `previous_chat_history`

Alternatively, `history` could go away and be replaced with `use_chat_completion: bool`?
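For concreteness, a hedged sketch of how that variant might read (every name here is hypothetical, taken from the proposals above rather than the PR's actual signature):

```python
from typing import List, Optional

from steamship import Block


class SteamshipLLM:
    def complete(
        self,
        current_prompt: List[Block],
        previous_chat_history: Optional[List[Block]] = None,
        use_chat_completion: bool = False,
    ) -> List[Block]:
        # Hypothetical contract: prompt completion ignores (or rejects)
        # previous_chat_history; chat completion requires it to be ordered
        # oldest-to-newest.
        ...
```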
| )
|
| blocks.append(CapabilityPluginRequest(requested_capabilities=capabilities).to_block())
| # TODO (PR callout): I have maybe ideas for not needing temp files here in all cases? GH comment
Probably outside the scope of this PR, but if we're already posting files, just post blocks to the client as part of the request eventually?
If I understand this question correctly, I think we already do support this. It's just that the agent-style completions that we've been working on recently tend to be file-centric.
We don't support sending raw blocks via `generate()` at the moment (at least at the SDK level). This concern goes away, a bit, in the streaming case where we require not only that the blocks be part of a File, but of a very specific File. The `blocks.append()` call above will need to be rewritten for that use case.
In @douglas-reid's new streaming PR, we finally achieve the original intent of the generator SDK being file-specific, where the thing being passed in is just the ChatHistory file. Adding a possible generator input that's just an arbitrary list of blocks is definitely doable, though, if we want it.
| # TODO (PR callout): It looks like current OpenAI impl might not do this?
| temp_file.delete()
This seemed like a resource leak to me.
| # TODO (PR callout): Use special output parser in the meantime to extract the tool to call from output block
| # tagged with new tags, _easy_ slot in with existing FunctionsBasedLLMAgent.
| return generation_task.output.blocks
In `TagConstants`, we have a new `CAPABILITY_RESPONSE` kind; `name` would be the name of the capability, and `value` would be per-capability.
E.g., in the case of functions: return `steamship.function-calling` as the name, `"tool_name"` as the value for which tool to call, and the tagged block(s) are the input to the tool.
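Roughly, as a sketch (the constant values here are assumptions, not final API):

```python
from steamship import Block, Tag

# Sketch only: the exact TagConstants values are not final.
tool_call_block = Block(
    text="a whale",  # the input that would flow to the tool
    tags=[
        Tag(
            kind="capability-response",         # TagConstants.CAPABILITY_RESPONSE
            name="steamship.function-calling",  # the capability's name
            value={"tool_name": "GenerateImageTool"},  # per-capability payload
        )
    ],
)
```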
Do you have a clear code example of this somewhere?
Does the `CAPABILITY_RESPONSE` layer of categorization add something actionable, or could it just be `kind=tool_invocation name=<tool_to_invoke>` or similar?
| class CapabilityImpl(Capability, extra=Extra.forbid):
|     # TODO (PR): The Extra.forbid here is to enforce clamping of deserialization, but it may just be simpler to leave
|     # that up to individual capabilities? My goal here is to prevent accidental breakage of contract when someone
|     # provides specific metadata that makes it so e.g. a plugin with an older view of the world thinks it has Best
|     # Effort support but can't because of those extra requests.
Discussion: adding extra details to a capability seems dangerous because it adds extra things to support that a plugin may not know about.
Also, note to self: the `extra` as a metaclass argument here should get moved into a `Config` inner class for consistency.
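For reference, a minimal sketch of that Config-inner-class form (pydantic v1 style; base class simplified here):

```python
from pydantic import BaseModel, Extra


class CapabilityImpl(BaseModel):
    # Same Extra.forbid clamping of deserialization as above, but declared
    # in a Config inner class instead of as a metaclass keyword argument.
    class Config:
        extra = Extra.forbid
```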
| name = "steamship.function_calling_support" | ||
|
|
||
| functions: List[OpenAIFunction] | ||
| # TODO (PR): Worth generalizing this? |
Discussion: at very least it seems like generalizing the name here makes sense.
I'd much prefer that we serialize `Tool` and pass that around: `List[Tool]`. That would prevent any sort of intermediate step in the SDK that is backend-specific (until OpenAIFunction becomes a de facto standard, at least).
Bonus points if we can figure out a way to include parameterization of `Tool::run` in a way that will auto-handle Block --> param translation, so that devs can write code like `def run(self, query: str, location: str)` and not have to worry about extracting those values from `List[Block]`.
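One hedged sketch of that Block --> param translation via signature inspection (the helper name and the text-only assumption are illustrative, not a proposed API):

```python
import inspect
from typing import Any, Callable, List

from steamship import Block


def invoke_with_block_params(run_fn: Callable[..., Any], blocks: List[Block]) -> Any:
    # Hypothetical helper: pair each declared parameter of run() with the
    # text of the corresponding Block, so a Tool author never has to
    # unpack List[Block] by hand. Assumes text-valued params in order.
    params = [
        p.name
        for p in inspect.signature(run_fn).parameters.values()
        if p.name != "self"
    ]
    kwargs = {name: block.text for name, block in zip(params, blocks)}
    return run_fn(**kwargs)
```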
eob
left a comment
First pass of comments - will take a second pass & want to make sure @douglas-reid gets a chance to weigh in too. This is looking pretty great, though.
| self.plugin_instance = plugin_instance
|
| @staticmethod
| def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
[Style, Nonblocking]
I really like what this implies... sort of:
- `SteamshipLLM.with_gpt4`
- `SteamshipLLM.with_llama2`
- `SteamshipLLM.with_vicuna`
- etc.

A few thoughts:
- To the extent that there are common, known params, I think this is an opportunity to officially type and standardize them in the SDK (temperature, etc.)
- The `with_` feels a bit weird to me... I think because it's a Python reserved word used in a very specific way that isn't the way this is being used here. What if it was just `LLM.gpt4()` or even `LLM.gpt()`?
- I'm not sure if we need the `Steamship` prefix since that's already the scope of this entire library.
Re: `with_`, I'm emulating `steamship.agents.utils.with_llm`. Assuming these live on SteamshipLLM and not in e.g. a helper module [1], prefixing these in some way is nice because:
- IDE users can type `.with_` and get a list of things they can use
- It avoids collisions with method names we might otherwise want to use in the case of someone naming their model in a generic way

[1]: By "helper module" I mean something like how Google's Java library Guava has Arrays as a class with helper methods that work with the Array class. In this case we'd have `SteamshipLLMs.gpt4(**<gpt4_params>)`, `SteamshipLLMs.llama2(**<llama2_params>)`, etc., which all return SteamshipLLM (singular).

Using Steamship as a prefix was a suggestion in the design proposal:
> Consolidate LLM and ChatLLM into single abstract base class (proposed name: SteamshipLLM to avoid collisions on updates).
Instead of `with` and the Python association, we could use `using_` or `from_` or even `wrapping_` and get the same benefits. WDYT?
`SteamshipLLM.using_openai_chatcomplete` or `SteamshipLLM.using_replicate_model` or `SteamshipLLM.from_openai` or `SteamshipLLM.wrapping_openai` or ...
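As a sketch of those spellings (signatures illustrative only; bodies elided):

```python
from steamship import Steamship


class SteamshipLLM:
    # Naming sketch only; these method names are proposals, not final API.
    @staticmethod
    def using_openai_chatcomplete(client: Steamship, **params) -> "SteamshipLLM":
        ...

    @staticmethod
    def using_replicate_model(client: Steamship, model: str, **params) -> "SteamshipLLM":
        ...
```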
| self,
| messages: List[Block],
| system_prompt: List[Block] = None,
| history: Optional[List[Block]] = None,
FWIW, we don't separate these out at the moment. Our request looks like:
`[ sys_message, some prior messages (user <--> assistant), last user message, scratchpad messages (working state for current request) ]`
Importantly, we want all of these Blocks to be in the same File.
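A hedged sketch of that single-File layout, using illustrative role tags rather than the SDK's exact enums:

```python
from steamship import Block, File, Steamship, Tag

client = Steamship()

# Sketch: the whole request lives in one File, in conversational order.
# The "role" tag kind/names here are illustrative, not the SDK's exact constants.
chat_file = File.create(
    client,
    blocks=[
        Block(text="You are a helpful assistant.", tags=[Tag(kind="role", name="system")]),
        Block(text="What is a whale?", tags=[Tag(kind="role", name="user")]),
        Block(text="A large marine mammal.", tags=[Tag(kind="role", name="assistant")]),
        Block(text="Make me an image of one.", tags=[Tag(kind="role", name="user")]),
        # ...scratchpad blocks (working state for the current request) follow here.
    ],
)
```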
I'm all for helpers here if they make things simpler, but thinking about how this might be used, it does seem like a single File that would include these tagged blocks might be easier, unless we want to add lists of arbitrary block IDs as a possible generator input (see below).
|
| blocks.append(CapabilityPluginRequest(requested_capabilities=capabilities).to_block())
| # TODO (PR callout): I have maybe ideas for not needing temp files here in all cases? GH comment
| temp_file = File.create(
you'll want to look at the in-flight streaming PR, as we no longer create a separate file (when we can avoid it). For streaming, these blocks all need to be in the same File, and that File needs to be the ChatHistory file.
| return [
|     Block(
|         text="Can you make me an image of a whale?",
|         tags=[Tag(kind=TagKind.CHAT, name=ChatTag.HISTORY)],
FWIW, so far, we only use ChatTag.HISTORY on the File-level tags for the ChatHistory file. There are several reasons for this, including that a message is only HISTORY based on other context. For instance, at first, it is just a MESSAGE.
| self.plugin_instance = plugin_instance
|
| @staticmethod
| def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
I feel like we've been burned before by not allowing kwargs on the end of these types of methods. I suggest allowing the config to include **kwargs style expansion for forward-compat with the plugin.
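For instance, a hedged sketch of that forward-compat shape (the plugin handle and config keys are assumed to match the current gpt-4 plugin):

```python
from steamship import Steamship


class SteamshipLLM:
    def __init__(self, plugin_instance):
        self.plugin_instance = plugin_instance

    @staticmethod
    def with_gpt4(
        client: Steamship, temperature: float = 0.4, max_tokens: int = 256, **kwargs
    ) -> "SteamshipLLM":
        # Sketch: unknown keys flow through to the plugin config, so a newer
        # plugin version can accept options this SDK release doesn't know about.
        config = {"temperature": temperature, "max_tokens": max_tokens, **kwargs}
        return SteamshipLLM(client.use_plugin("gpt-4", config=config))
```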
| self.plugin_instance = plugin_instance
|
| @staticmethod
| def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
This is a style choice, but our gpt-4 plugin allows other models. I think we either need to make a different builder-style method, expose `model_name` as an explicit parameter, or something equivalent.
This might also be a good time to think about renaming this to reflect provider vs. model type (or some other aspect). `SteamshipLLM.using_openai_chatcomplete` perhaps, or (less clear) `SteamshipLLM.using_gpt`?
dkolas
left a comment
I think this all tracks for me! Seems like a great way to sort out the differences between the LLM plugins.
A few comments / suggestions / possible minor changes below.
Merging as discussed at sync.
PR Structure Notes
I've got a large number of `# TODO (PR Callout)` or `# TODO (PR)` form comments in this PR, which are generally discussion topics rather than actual TODOs. I've pre-seeded comment threads on those for easy discovery; in the case that there's not more to say beyond the comment itself, I've just quoted the comment.
I've also prototyped what I think SteamshipLLM looks like here, and given the number of open questions, I wanted to clear those before finalizing the more formulaic pieces.
Overview
This PR provides the ability for Plugins to assert their capabilities and for users to request that they be fulfilled, along with parameters for those capabilities if applicable. Those parameters are most relevant in the case of functions for GPT-4, but could be extended in the general case for other capabilities.
Step 2 is to adapt the plugin arch to use this and have Agent determine that certain capabilities should be used based on e.g. providing tools or a ChatHistory.
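Roughly, usage looks like the following sketch. The class and method names come from this PR's diff, but the import path and the helper function are assumptions:

```python
from typing import List

from steamship import Block

# Import path assumed; CapabilityPluginRequest is defined in this PR's capabilities.py.
from steamship.data.plugin.capabilities import CapabilityPluginRequest


def with_capability_request(capabilities: list, prompt_blocks: List[Block]) -> List[Block]:
    # The capability request is serialized into a Block that travels with the
    # normal prompt blocks; the plugin can then honor, degrade (best-effort),
    # or reject each requested capability.
    return [
        *prompt_blocks,
        CapabilityPluginRequest(requested_capabilities=capabilities).to_block(),
    ]
```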
Why blocks and temp files instead of on Options?
Why MIME type instead of Tags?
Blocks have one MIME type, and in the case that it should only be parsed one way with one format, it makes sense to me to clamp that.
Behavior overview lifted from the docstring on capabilities.py: