
Agents Refactor 1/2: Plugin Capabilities and SteamshipLLM prototype #564

Merged
GitOnUp merged 20 commits into main from george/plugin-capabilities
Oct 6, 2023

Conversation

@GitOnUp (Contributor) commented Sep 28, 2023:

PR Structure Notes

I've got a large number of comments of the form # TODO (PR Callout) or # TODO (PR) in this PR, which are generally discussion topics rather than actual TODOs. I've pre-seeded comment threads on those for easy discovery; where there's nothing more to say beyond the comment itself, I've just quoted the comment.

I've also prototyped what I think SteamshipLLM should look like here; given the number of open questions, I wanted to clear those before finalizing the more formulaic pieces.

Overview

This PR provides the ability for Plugins to assert their capabilities and for users to request that they be fulfilled, along with parameters for those capabilities if applicable. Those parameters are most relevant in the case of functions for GPT-4, but could be extended in the general case for other capabilities.

Step 2 is to adapt the plugin architecture to use this, and to have the Agent determine that certain capabilities should be used based on, e.g., whether tools or a ChatHistory are provided.
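As a concrete sketch of the client-side shape this implies (a minimal sketch: CapabilityPluginRequest and FunctionCallingSupport appear in this PR's diff, but the RequestLevel enum and the exact signatures here are assumptions, not finalized API):

```python
# Hypothetical client-side capability request. CapabilityPluginRequest and
# FunctionCallingSupport are named in this PR; RequestLevel and the keyword
# arguments are illustrative assumptions.
from steamship.plugin.capabilities import (
    CapabilityPluginRequest,
    FunctionCallingSupport,
    RequestLevel,  # assumed enum: NATIVE / BEST_EFFORT / OPTIONAL
)

blocks = []  # the prompt blocks headed to the plugin
request = CapabilityPluginRequest(
    requested_capabilities=[
        # Ask for native function calling; the plugin should fail fast if it
        # cannot honor this level.
        FunctionCallingSupport(request_level=RequestLevel.NATIVE, functions=[]),
    ]
)
blocks.append(request.to_block())  # travels in-band with the rest of the payload
```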

Why blocks and temp files instead of on Options?

  • The response contains only blocks. Without engine changes, this means we can communicate back and forth via the same channel consistently.
  • I had considered options as a venue for communicating requirements, but I also want to be able to communicate that yes, we did fulfill these requirements, and here's how.
  • OpenAI's ChatLLM does this already.

Why MIME type instead of Tags?

Blocks have exactly one MIME type, and in the case where a block should only be parsed one way, in one format, it makes sense to me to clamp that.


Behavior overview lifted from the docstring on capabilities.py:

Capabilities are a concept communicated back and forth between plugins and client code via blocks, and they are meant to
indicate client code's need for certain levels of support of a range of features.

Clients can request NATIVE, BEST_EFFORT, or OPTIONAL support for features that a plugin may or may not support.  Plugins
are expected to parse this and fail-fast if the user has requested support for a feature that the plugin does not
support, so that users are not e.g. billed for usage they can't incorporate.

Capability requests can include other information on the request itself, but oftentimes indicate that certain blocks
will be tagged in Steamship-native ways as part of the rest of the payload.  For example, ConversationSupport is a
capability that indicates the CHAT TagKind will be included in blocks that are part of the input, and the plugin is
expected to incorporate these with a model that supports them.

In the case that a Plugin does not support behavior indicated by the Capability request, it will throw, listing the
capabilities that it could not support at the levels requested.  Otherwise, when Plugins respond, they'll include
another block indicating the level at which they served the requested capabilities.
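To make the fail-fast side of that contract concrete, a plugin-side check might look roughly like this (a sketch with stand-in names; only the fail-before-billable-work behavior is taken from the docstring above):

```python
# Plugin-side sketch of the fail-fast contract described above. The helper
# and field names are illustrative, not the SDK's actual internals.
class SteamshipError(Exception):
    pass

SUPPORTED = {"steamship.function_calling_support"}  # capabilities this plugin serves

def check_capabilities(requested):
    """requested: capability objects carrying .name and .request_level."""
    unsupported = [
        cap.name
        for cap in requested
        if cap.name not in SUPPORTED and cap.request_level != "OPTIONAL"
    ]
    if unsupported:
        # Throw before doing any billable work, per the contract above.
        raise SteamshipError(f"Unsupported capabilities: {unsupported}")
```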

@GitOnUp changed the title from "WIP" to "Agents Refactor 1/2: Plugin Capabilities" on Sep 30, 2023
@GitOnUp changed the title to "Agents Refactor 1/2: Plugin Capabilities and SteamshipLLM prototype" on Oct 1, 2023
@GitOnUp (Author) left a comment:

Initial comment chain seeding

Comment thread src/steamship/agents/llms/steamship_llm.py
Comment thread src/steamship/agents/llms/steamship_llm.py Outdated
:return: a List of Blocks that are returned from the plugin.
"""

# TODO (PR callout): I'm not certain this class needs to be abstract? This seems pretty soup-to-nuts.
GitOnUp (Author):

Discussion: doesn't seem like SteamshipLLM needs to be an ABC.

Contributor:

At a high level, if it's possible to have a single LLM class that really does check the boxes we need, that feels like a great win.

Contributor:

There are two ways that LLMs tend to be used. I don't know if I'm using the right words, but they're roughly:

  • Chat completion
  • Prompt completion

One thing that, as a docstring reader, I find myself wanting SUPER CLEAR guidance on is the difference between messages and history.

  • If I'm seeking chat completion, what exactly is the limit to what messages should have?
  • If I'm seeking prompt completion, what happens if history is non-empty?

GitOnUp (Author):

There might be an opportunity for naming clarification here if:

  • messages --> current_prompt
  • history --> previous_chat_history

Alternatively, history could go away and be replaced with use_chat_completion: bool?

)

blocks.append(CapabilityPluginRequest(requested_capabilities=capabilities).to_block())
# TODO (PR callout): I have maybe ideas for not needing temp files here in all cases? GH comment
GitOnUp (Author):

Probably outside the scope of this PR, but if we're already posting files, just post blocks to the client as part of the request eventually?

Contributor:

If I understand this question correctly, I think we already do support this. It's just that the agent-style completions that we've been working with recently tend to be file-centric.

Contributor:

We don't support sending raw blocks via generate() at the moment (at least at the SDK level). This concern goes away, a bit, in the streaming case where we require not only that the blocks be part of a File, but of a very specific File. The blocks.append() call above will need to be rewritten for that use case.

Contributor:

In @douglas-reid 's new streaming PR, we finally achieve the original intent of the generator SDK being file-specific, where the thing being passed in is just the ChatHistory file. Adding a possible generator input that's just an arbitrary list of blocks is definitely doable though if we want it.

Comment on lines +105 to +106
# TODO (PR callout): It looks like current OpenAI impl might not do this?
temp_file.delete()
GitOnUp (Author):

This seemed like a resource leak to me.

Contributor:

agree

Comment on lines +108 to +110
# TODO (PR callout): Use special output parser in the meantime to extract the tool to call from output block
# tagged with new tags, _easy_ slot in with existing FunctionsBasedLLMAgent.
return generation_task.output.blocks
GitOnUp (Author):

In TagConstants, we have a new CAPABILITY_RESPONSE kind; name would be the name of the capability, and value would be per-capability.

E.g., in the case of Functions, return steamship.function-calling as the name and "tool_name" as the value indicating which tool to call; the tagged block(s) are the input to the tool.
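A rough sketch of consuming that scheme (the tag kind/name strings are placeholders for the new TagConstants values, which this comment doesn't spell out exactly):

```python
# Hypothetical parser for the scheme described above: kind=CAPABILITY_RESPONSE,
# name=<capability name>, value carrying per-capability data such as tool_name.
def find_tool_invocation(blocks):
    for block in blocks:
        for tag in block.tags or []:
            if tag.kind == "capability-response" and tag.name == "steamship.function-calling":
                tool_name = (tag.value or {}).get("tool_name")
                # The tagged block(s) are the input to the tool.
                return tool_name, block
    return None, None
```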

Contributor:

Do you have a clear code example of this somewhere?

Contributor:

Does the CAPABILITY_RESPONSE layer of categorization add something actionable, or could it just be kind=tool_invocation name=<tool_to_invoke>? or similar?

Comment thread src/steamship/plugin/capabilities.py Outdated
Comment on lines +117 to +121
class CapabilityImpl(Capability, extra=Extra.forbid):
# TODO (PR): The Extra.forbid here is to enforce clamping of deserialization, but it may just be simpler to leave
# that up to individual capabilities? My goal here is to prevent accidental breakage of contract when someone
# provides specific metadata that makes it so e.g. a plugin with an older view of the world thinks it has Best
# Effort support but can't because of those extra requests.
GitOnUp (Author):

Discussion: adding extra details to a capability seems dangerous because it adds extra things to support that a plugin may not know about.

Also note to self, the extra as a metaclass argument here should get moved into a Config inner class for consistency.
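For illustration, the clamping behavior under discussion looks like this in pydantic v1 (a self-contained demo; the field is simplified):

```python
from pydantic import BaseModel, Extra, ValidationError

class FunctionCallingSupport(BaseModel):
    class Config:
        extra = Extra.forbid  # the Config-inner-class form mentioned above

    functions: list

FunctionCallingSupport(functions=[])  # OK
try:
    # An extra field that an older plugin wouldn't understand is rejected
    # outright, rather than silently dropped and then mis-served.
    FunctionCallingSupport(functions=[], force_json_output=True)
except ValidationError as err:
    print(err)  # reports "extra fields not permitted"
```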

Comment thread src/steamship/plugin/capabilities.py Outdated
name = "steamship.function_calling_support"

functions: List[OpenAIFunction]
# TODO (PR): Worth generalizing this?
GitOnUp (Author):

Discussion: at very least it seems like generalizing the name here makes sense.

Contributor:

I'd much prefer that we serialize Tool and pass that around. List[Tool]. That would prevent any sort of intermediate step in the SDK that is backend specific (until OpenAIFunction becomes a de facto standard, at least).

Bonus points if we can figure out a way to include parameterization of Tool::run in a way that will auto-handle Block --> param translation, so that devs can write code like def run(self, query: str, location: str) and not have to worry about extracting those values from List[Block].
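A sketch of that bonus-points idea, purely hypothetical (nothing like this is in the PR): introspect run's signature and match each parameter to a correspondingly tagged Block.

```python
import inspect

def call_run_with_blocks(tool, blocks):
    # tool.run is a bound method, so `self` does not appear in the signature.
    sig = inspect.signature(tool.run)
    kwargs = {}
    for name in sig.parameters:
        # Assume each parameter is satisfied by a Block tagged with that name.
        match = next(
            (b for b in blocks if any(t.name == name for t in (b.tags or []))),
            None,
        )
        if match is not None:
            kwargs[name] = match.text  # naive text-only coercion for this sketch
    return tool.run(**kwargs)
```

A real version would need type coercion beyond text and a story for non-text Blocks, which is where most of the complexity would live.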

@GitOnUp marked this pull request as ready for review October 1, 2023 09:51
@GitOnUp requested review from dkolas, douglas-reid and eob October 1, 2023 09:52
@eob (Contributor) left a comment:

First pass of comments - will take a second pass & want to make sure @douglas-reid gets a chance to weigh in too. This is looking pretty great, though.

self.plugin_instance = plugin_instance

@staticmethod
def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
Contributor:

[Style, Nonblocking]

I really like what this implies... sort of:

SteamshipLLM.with_gpt4
SteamshipLLM.with_llama2
SteamshipLLM.with_vicuna

etc

A few thoughts:

  • To the extent that there are common, known params, I think this is an opportunity to officially type and standardize them in the SDK (temperature, etc)
  • The with_ feels a bit weird to me... I think because with is a Python reserved word used in a very specific way that isn't the way it's being used here. What if it was just: LLM.gpt4() or even LLM.gpt()?
  • I'm not sure if we need the Steamship prefix since that's already the scope of this entire library

@GitOnUp (Author) commented Oct 2, 2023:

Re: with_, I'm emulating steamship.agents.utils.with_llm. Assuming these live on SteamshipLLM and not in, e.g., a helper module [1], prefixing them in some way is nice because:

  • IDE users can type .with_ and get a list of things they can use
  • It avoids collisions with method names we might otherwise want to use in the case of someone naming their model in a generic way

[1]: By "helper module" I mean something like how Google's Java library Guava has Arrays as a class with helper methods that work with the Array class. In this case we'd have SteamshipLLMs.gpt4(**<gpt4_params>), SteamshipLLMs.llama2(**<llama2_params>), etc., which all return SteamshipLLM (singular).

Using Steamship as a prefix was a suggestion in the design proposal:

Consolidate LLM and ChatLLM into single abstract base class (proposed name: SteamshipLLM to avoid collisions on updates).

Contributor:

Instead of with_ and the Python association, we could use using_ or from_ or even wrapping_ and get the same benefits. WDYT?

  • SteamshipLLM.using_openai_chatcomplete or
  • SteamshipLLM.using_replicate_model or
  • SteamshipLLM.from_openai or
  • SteamshipLLM.wrapping_openai or ...

self,
messages: List[Block],
system_prompt: List[Block] = None,
history: Optional[List[Block]] = None,
Contributor:

FWIW, we don't separate these out at the moment. Our request looks like:

[ sys_message, some prior messages (user <--> assistant), last user message, scratchpad messages (working state for current request) ].

Importantly, we want all of these Blocks to be in the same File.
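Sketched as code, that layout might be built like this (tag kinds/names are illustrative; the real ChatHistory implementation may differ):

```python
from steamship import Block, File, Steamship, Tag

client = Steamship()
chat_file = File.create(
    client,
    blocks=[
        Block(text="You are a helpful assistant.", tags=[Tag(kind="chat", name="system")]),
        Block(text="Hi there!", tags=[Tag(kind="chat", name="user")]),
        Block(text="Hello! How can I help?", tags=[Tag(kind="chat", name="assistant")]),
        Block(text="Can you make me an image of a whale?", tags=[Tag(kind="chat", name="user")]),
        # ...followed by scratchpad blocks holding working state for this request.
    ],
)
```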

Contributor:

I'm all for helpers here if they make things simpler, but thinking about how this might be used, it does seem like a single File that would include these tagged blocks might be easier, unless we want to add lists of arbitrary block IDs as a possible generator input (see below).


blocks.append(CapabilityPluginRequest(requested_capabilities=capabilities).to_block())
# TODO (PR callout): I have maybe ideas for not needing temp files here in all cases? GH comment
temp_file = File.create(
Contributor:

you'll want to look at the in-flight streaming PR, as we no longer create a separate file (when we can avoid it). For streaming, these blocks all need to be in the same File, and that File needs to be the ChatHistory file.

Contributor:

See note above

return [
Block(
text="Can you make me an image of a whale?",
tags=[Tag(kind=TagKind.CHAT, name=ChatTag.HISTORY)],
Contributor:

FWIW, so far, we only use ChatTag.HISTORY on the File-level tags for the ChatHistory file. There are several reasons for this, including that a message is only HISTORY based on other context. For instance, at first, it is just a MESSAGE.
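In code, the distinction is between tagging the File itself versus the blocks inside it (a minimal sketch, assuming the TagKind/ChatTag constants from the quoted diff; exact usage is a guess):

```python
from steamship import Block, File, Steamship, Tag
from steamship.data.tags.tag_constants import ChatTag, TagKind

client = Steamship()
f = File.create(
    client,
    blocks=[Block(text="Can you make me an image of a whale?")],
    # File-level: marks the whole File as the chat history, rather than
    # tagging individual message blocks HISTORY as in the quoted diff.
    tags=[Tag(kind=TagKind.CHAT, name=ChatTag.HISTORY)],
)
```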

self.plugin_instance = plugin_instance

@staticmethod
def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
Contributor:

I feel like we've been burned before by not allowing kwargs on the end of these types of methods. I suggest allowing the config to include **kwargs style expansion for forward-compat with the plugin.
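Something like the following, presumably (an assumed shape, not the code in this PR; SteamshipLLM's constructor here is a guess):

```python
from steamship import Steamship

class SteamshipLLM:
    def __init__(self, plugin_instance):
        self.plugin_instance = plugin_instance

    @staticmethod
    def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256, **kwargs):
        # Unknown keyword arguments flow through to the plugin config, so new
        # plugin parameters don't require an SDK release to become usable.
        config = {"temperature": temperature, "max_tokens": max_tokens, **kwargs}
        return SteamshipLLM(client.use_plugin("gpt-4", config=config))
```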

self.plugin_instance = plugin_instance

@staticmethod
def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
Contributor:

This is a style choice, but our gpt-4 plugin allows other models. I think we either need to make a different builder-style method, expose model_name as an explicit parameter, or something equivalent.

This might also be a good time to think about renaming this to reflect provider vs. model type (or some other aspect): SteamshipLLM.using_openai_chatcomplete perhaps, or (less clear) SteamshipLLM.using_gpt?

@dkolas previously approved these changes Oct 4, 2023

@dkolas (Contributor) left a comment:

I think this all tracks for me! Seems like a great way to sort out the differences between the LLM plugins.

A few comments / suggestions / possible minor changes in the threads above.

@GitOnUp (Author) commented Oct 6, 2023:

Merging as discussed at sync

@GitOnUp merged commit 0abbe43 into main Oct 6, 2023