Agents Refactor 1/2: Plugin Capabilities and SteamshipLLM prototype #564
Conversation
GitOnUp
left a comment
Initial comment chain seeding
| :return: a List of Blocks that are returned from the plugin.
| """
|
| # TODO (PR callout): I'm not certain this class needs to be abstract? This seems pretty soup-to-nuts.
Discussion: doesn't seem like SteamshipLLM needs to be an ABC.
At a high level, if it's possible to have a single LLM class that really does check the boxes we need, that feels like a great win.
There are two ways that LLMs tend to be used. I don't know if I'm using the right words, but they're roughly:
- Chat completion
- Prompt completion
One thing that, as a docstring reader, I find myself wanting SUPER CLEAR guidance on is the difference between `messages` and `history`:
- If I'm seeking chat completion, what exactly is the limit to what `messages` should have?
- If I'm seeking prompt completion, what happens if `history` is non-empty?
There might be an opportunity for naming clarification here if:
- `messages` --> `current_prompt`
- `history` --> `previous_chat_history`

Alternatively, `history` could go away and be replaced with `use_chat_completion: bool`?
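For concreteness, a hedged sketch of how that variant might read (every name here is hypothetical, taken from the proposals above rather than the PR's actual signature):

```python
from typing import List, Optional

from steamship import Block


class SteamshipLLM:
    def complete(
        self,
        current_prompt: List[Block],
        previous_chat_history: Optional[List[Block]] = None,
        use_chat_completion: bool = False,
    ) -> List[Block]:
        # Hypothetical contract: prompt completion ignores (or rejects)
        # previous_chat_history; chat completion requires it to be ordered
        # oldest-to-newest.
        ...
```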
| )
|
| blocks.append(CapabilityPluginRequest(requested_capabilities=capabilities).to_block())
| # TODO (PR callout): I have maybe ideas for not needing temp files here in all cases? GH comment
Probably outside the scope of this PR, but if we're already posting files, just post blocks to the client as part of the request eventually?
If I understand this question correctly, I think we already do support this. It's just that the agent-style completions that we've been working on recently tend to be file-centric.
We don't support sending raw blocks via `generate()` at the moment (at least at the SDK level). This concern goes away, a bit, in the streaming case where we require not only that the blocks be part of a File, but of a very specific File. The `blocks.append()` call above will need to be rewritten for that use case.
In @douglas-reid's new streaming PR, we finally achieve the original intent of the generator SDK being file-specific, where the thing being passed in is just the ChatHistory file. Adding a possible generator input that's just an arbitrary list of blocks is definitely doable, though, if we want it.
| # TODO (PR callout): It looks like current OpenAI impl might not do this?
| temp_file.delete()
This seemed like a resource leak to me.
| # TODO (PR callout): Use special output parser in the meantime to extract the tool to call from output block
| # tagged with new tags, _easy_ slot in with existing FunctionsBasedLLMAgent.
| return generation_task.output.blocks
In `TagConstants`, we have a new `CAPABILITY_RESPONSE` kind; `name` would be the name of the capability, and `value` would be per-capability.
E.g., in the case of functions: return `steamship.function-calling` as the name, `"tool_name"` as the value for which tool to call, and the tagged block(s) are the input to the tool.
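Roughly, as a sketch (the constant values here are assumptions, not final API):

```python
from steamship import Block, Tag

# Sketch only: the exact TagConstants values are not final.
tool_call_block = Block(
    text="a whale",  # the input that would flow to the tool
    tags=[
        Tag(
            kind="capability-response",         # TagConstants.CAPABILITY_RESPONSE
            name="steamship.function-calling",  # the capability's name
            value={"tool_name": "GenerateImageTool"},  # per-capability payload
        )
    ],
)
```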
Do you have a clear code example of this somewhere?
Does the `CAPABILITY_RESPONSE` layer of categorization add something actionable, or could it just be `kind=tool_invocation name=<tool_to_invoke>` or similar?
| class CapabilityImpl(Capability, extra=Extra.forbid):
|     # TODO (PR): The Extra.forbid here is to enforce clamping of deserialization, but it may just be simpler to leave
|     # that up to individual capabilities? My goal here is to prevent accidental breakage of contract when someone
|     # provides specific metadata that makes it so e.g. a plugin with an older view of the world thinks it has Best
|     # Effort support but can't because of those extra requests.
Discussion: adding extra details to a capability seems dangerous because it adds extra things to support that a plugin may not know about.
Also, note to self: the `extra` as a metaclass argument here should get moved into a `Config` inner class for consistency.
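For reference, a minimal sketch of that Config-inner-class form (pydantic v1 style; base class simplified here):

```python
from pydantic import BaseModel, Extra


class CapabilityImpl(BaseModel):
    # Same Extra.forbid clamping of deserialization as above, but declared
    # in a Config inner class instead of as a metaclass keyword argument.
    class Config:
        extra = Extra.forbid
```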
| name = "steamship.function_calling_support" | ||
|
|
||
| functions: List[OpenAIFunction] | ||
| # TODO (PR): Worth generalizing this? |
Discussion: at very least it seems like generalizing the name here makes sense.
I'd much prefer that we serialize `Tool` and pass that around: `List[Tool]`. That would prevent any sort of intermediate step in the SDK that is backend-specific (until OpenAIFunction becomes a de facto standard, at least).
Bonus points if we can figure out a way to include parameterization of `Tool::run` in a way that will auto-handle Block --> param translation, so that devs can write code like `def run(self, query: str, location: str)` and not have to worry about extracting those values from `List[Block]`.
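One hedged sketch of that Block --> param translation via signature inspection (the helper name and the text-only assumption are illustrative, not a proposed API):

```python
import inspect
from typing import Any, Callable, List

from steamship import Block


def invoke_with_block_params(run_fn: Callable[..., Any], blocks: List[Block]) -> Any:
    # Hypothetical helper: pair each declared parameter of run() with the
    # text of the corresponding Block, so a Tool author never has to
    # unpack List[Block] by hand. Assumes text-valued params in order.
    params = [
        p.name
        for p in inspect.signature(run_fn).parameters.values()
        if p.name != "self"
    ]
    kwargs = {name: block.text for name, block in zip(params, blocks)}
    return run_fn(**kwargs)
```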
eob
left a comment
First pass of comments - will take a second pass & want to make sure @douglas-reid gets a chance to weigh in too. This is looking pretty great, though.
| self.plugin_instance = plugin_instance
|
| @staticmethod
| def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
[Style, Nonblocking]
I really like what this implies... sort of:
- `SteamshipLLM.with_gpt4`
- `SteamshipLLM.with_llama2`
- `SteamshipLLM.with_vicuna`
- etc.

A few thoughts:
- To the extent that there are common, known params, I think this is an opportunity to officially type and standardize them in the SDK (temperature, etc.)
- The `with_` feels a bit weird to me... I think because it's a Python reserved word used in a very specific way that isn't the way this is being used here. What if it was just `LLM.gpt4()` or even `LLM.gpt()`?
- I'm not sure if we need the `Steamship` prefix since that's already the scope of this entire library.
Re: `with_`, I'm emulating `steamship.agents.utils.with_llm`. Assuming these live on SteamshipLLM and not in e.g. a helper module [1], prefixing these in some way is nice because:
- IDE users can type `.with_` and get a list of things they can use
- It avoids collisions with method names we might otherwise want to use in the case of someone naming their model in a generic way

[1]: By "helper module" I mean something like how Google's Java library Guava has Arrays as a class with helper methods that work with the Array class. In this case we'd have `SteamshipLLMs.gpt4(**<gpt4_params>)`, `SteamshipLLMs.llama2(**<llama2_params>)`, etc., which all return SteamshipLLM (singular).

Using Steamship as a prefix was a suggestion in the design proposal:
> Consolidate LLM and ChatLLM into single abstract base class (proposed name: SteamshipLLM to avoid collisions on updates).
Instead of `with` and the Python association, we could use `using_` or `from_` or even `wrapping_` and get the same benefits. WDYT?
`SteamshipLLM.using_openai_chatcomplete` or `SteamshipLLM.using_replicate_model` or `SteamshipLLM.from_openai` or `SteamshipLLM.wrapping_openai` or ...
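As a sketch of those spellings (signatures illustrative only; bodies elided):

```python
from steamship import Steamship


class SteamshipLLM:
    # Naming sketch only; these method names are proposals, not final API.
    @staticmethod
    def using_openai_chatcomplete(client: Steamship, **params) -> "SteamshipLLM":
        ...

    @staticmethod
    def using_replicate_model(client: Steamship, model: str, **params) -> "SteamshipLLM":
        ...
```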
| self,
| messages: List[Block],
| system_prompt: List[Block] = None,
| history: Optional[List[Block]] = None,
FWIW, we don't separate these out at the moment. Our request looks like:
`[ sys_message, some prior messages (user <--> assistant), last user message, scratchpad messages (working state for current request) ]`
Importantly, we want all of these Blocks to be in the same File.
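A hedged sketch of that single-File layout, using illustrative role tags rather than the SDK's exact enums:

```python
from steamship import Block, File, Steamship, Tag

client = Steamship()

# Sketch: the whole request lives in one File, in conversational order.
# The "role" tag kind/names here are illustrative, not the SDK's exact constants.
chat_file = File.create(
    client,
    blocks=[
        Block(text="You are a helpful assistant.", tags=[Tag(kind="role", name="system")]),
        Block(text="What is a whale?", tags=[Tag(kind="role", name="user")]),
        Block(text="A large marine mammal.", tags=[Tag(kind="role", name="assistant")]),
        Block(text="Make me an image of one.", tags=[Tag(kind="role", name="user")]),
        # ...scratchpad blocks (working state for the current request) follow here.
    ],
)
```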
I'm all for helpers here if they make things simpler, but thinking about how this might be used, it does seem like a single File that would include these tagged blocks might be easier, unless we want to add lists of arbitrary block IDs as a possible generator input (see below).
|
| blocks.append(CapabilityPluginRequest(requested_capabilities=capabilities).to_block())
| # TODO (PR callout): I have maybe ideas for not needing temp files here in all cases? GH comment
| temp_file = File.create(
you'll want to look at the in-flight streaming PR, as we no longer create a separate file (when we can avoid it). For streaming, these blocks all need to be in the same File, and that File needs to be the ChatHistory file.
| return [
|     Block(
|         text="Can you make me an image of a whale?",
|         tags=[Tag(kind=TagKind.CHAT, name=ChatTag.HISTORY)],
FWIW, so far, we only use ChatTag.HISTORY on the File-level tags for the ChatHistory file. There are several reasons for this, including that a message is only HISTORY based on other context. For instance, at first, it is just a MESSAGE.
| self.plugin_instance = plugin_instance
|
| @staticmethod
| def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
I feel like we've been burned before by not allowing kwargs on the end of these types of methods. I suggest allowing the config to include **kwargs style expansion for forward-compat with the plugin.
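For instance, a hedged sketch of that forward-compat shape (the plugin handle and config keys are assumed to match the current gpt-4 plugin):

```python
from steamship import Steamship


class SteamshipLLM:
    def __init__(self, plugin_instance):
        self.plugin_instance = plugin_instance

    @staticmethod
    def with_gpt4(
        client: Steamship, temperature: float = 0.4, max_tokens: int = 256, **kwargs
    ) -> "SteamshipLLM":
        # Sketch: unknown keys flow through to the plugin config, so a newer
        # plugin version can accept options this SDK release doesn't know about.
        config = {"temperature": temperature, "max_tokens": max_tokens, **kwargs}
        return SteamshipLLM(client.use_plugin("gpt-4", config=config))
```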
| self.plugin_instance = plugin_instance
|
| @staticmethod
| def with_gpt4(client: Steamship, temperature: float = 0.4, max_tokens: int = 256):
This is a style choice, but our gpt-4 plugin allows other models. I think we either need to make a different builder-style method, expose `model_name` as an explicit parameter, or something equivalent.
This might also be a good time to think about renaming this to reflect provider vs. model type (or some other aspect). `SteamshipLLM.using_openai_chatcomplete` perhaps, or (less clear) `SteamshipLLM.using_gpt`?
dkolas
left a comment
I think this all tracks for me! Seems like a great way to sort out the differences between the LLM plugins.
A few comments / suggestions / possible minor changes below.
Merging as discussed at sync.
PR Structure Notes
I've got a large number of `# TODO (PR Callout)` or `# TODO (PR)` form comments in this PR, which are generally discussion topics rather than actual TODOs. I've pre-seeded comment threads on those for easy discovery; in the case that there's not more to say beyond the comment itself, I've just quoted the comment.
I've also prototyped what I think SteamshipLLM looks like here, and given the number of open questions, I wanted to clear those before finalizing the more formulaic pieces.
Overview
This PR provides the ability for Plugins to assert their capabilities and for users to request that they be fulfilled, along with parameters for those capabilities if applicable. Those parameters are most relevant in the case of functions for GPT-4, but could be extended in the general case for other capabilities.
Step 2 is to adapt the plugin arch to use this and have Agent determine that certain capabilities should be used based on e.g. providing tools or a ChatHistory.
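Roughly, usage looks like the following sketch. The class and method names come from this PR's diff, but the import path and the helper function are assumptions:

```python
from typing import List

from steamship import Block

# Import path assumed; CapabilityPluginRequest is defined in this PR's capabilities.py.
from steamship.data.plugin.capabilities import CapabilityPluginRequest


def with_capability_request(capabilities: list, prompt_blocks: List[Block]) -> List[Block]:
    # The capability request is serialized into a Block that travels with the
    # normal prompt blocks; the plugin can then honor, degrade (best-effort),
    # or reject each requested capability.
    return [
        *prompt_blocks,
        CapabilityPluginRequest(requested_capabilities=capabilities).to_block(),
    ]
```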
Why blocks and temp files instead of on Options?
Why MIME type instead of Tags?
Blocks have one MIME type, and in the case that it should only be parsed one way with one format, it makes sense to me to clamp that.
Behavior overview lifted from the docstring on capabilities.py: