how do actions work? #1
Here is a quick summary:

Here is more information on the different concepts: Action, Step, Tool
Wow, thanks for the detailed reply! So this is how the system on this side functions and handles calling the tools; this is code that I can probably follow. But I'm still not clear how the decision to even include a tool is made: how does the OpenAI model decide whether to include the tool's boilerplate in its response? Is this based on sending a pre-prompt/system prompt containing the tool description, and then just hoping the model chooses the tool on relevant occasions and populates the tool's boilerplate with exactly what it wants to look up? If all of this tool-calling code and result processing is done in your system, does that allow the OpenAI model to do anything with the tool's output? Is it fed back to that agent somehow? E.g., if it does use the tool to do a lookup, how does it know the result?
Yes, the LLM chooses the tool based on the prompt and provides the arguments as well. The results are then fed into the prompt for the next step (iteration).
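To make that loop concrete, here is a minimal TypeScript sketch of a tool-calling agent loop. All the names (`Step`, `runAgent`, `GenerateNextStep`) are hypothetical stand-ins, not the actual gptagent.js API; the point is only to show how each tool result is appended to the prompt so the LLM sees it on the next iteration.

```typescript
// A step is either a tool invocation chosen by the LLM, or a final answer.
type Step =
  | { type: "tool"; action: string; params: Record<string, string> }
  | { type: "done"; summary: string };

// Stand-in for an LLM call: the real system sends the prompt to OpenAI
// and parses the JSON action block out of the completion.
type GenerateNextStep = (prompt: string) => Step;

type Tool = (params: Record<string, string>) => string;

function runAgent(
  task: string,
  generateNextStep: GenerateNextStep,
  tools: Record<string, Tool>,
  maxIterations = 5
): string {
  let prompt = task;
  for (let i = 0; i < maxIterations; i++) {
    const step = generateNextStep(prompt);
    if (step.type === "done") {
      return step.summary;
    }
    // Execute the tool the LLM selected, then append the observation to
    // the prompt so the result is visible to the LLM on the next iteration.
    const result = tools[step.action](step.params);
    prompt += `\nObservation from ${step.action}: ${result}`;
  }
  return "max iterations reached";
}
```

Note that the agent never "streams" tool output to the model mid-completion; the output only becomes visible through the next prompt.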
So the LLM response is structured enough to parse as this:

```json
{
  "action": "an action",
  "param1": "a parameter value",
  "param2": "another parameter value"
}
```

with "an action" replaced appropriately? And how is the result of the action (e.g. reading a Wikipedia page) embedded into the bot's response?
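For illustration, here is a hypothetical sketch of how client-side code might extract a JSON action block like the one above from an LLM completion. This is not the library's actual parser; the function name and the "first `{...}` block" heuristic are assumptions.

```typescript
interface ParsedAction {
  action: string;
  [param: string]: string;
}

function parseActionBlock(completion: string): ParsedAction | null {
  // Grab the outermost {...} span in the model output, if any.
  const match = completion.match(/\{[\s\S]*\}/);
  if (match == null) return null;
  try {
    const parsed = JSON.parse(match[0]);
    // Only accept objects that actually name an action.
    return typeof parsed.action === "string" ? (parsed as ParsedAction) : null;
  } catch {
    return null; // model produced malformed JSON
  }
}
```

A robust implementation also has to decide what to do when the model emits no action at all or invalid JSON; returning `null` here just signals "no tool call".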
The agent runs a loop until it's done - it does not immediately respond to the user. The output that you are seeing in the console is just an example from observing the agent run. I plan to add examples of other environments as well. You can find more details about loops here:
On each iteration of a loop with GenerateNextStepLoop, a prompt is sent to the LLM. Results from previous action steps are included using the resultFormatter associated with the action:
You can find the default result formatters in the actions (you can change them to your liking when you set up your agent), e.g.: https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/tool/programmable-google-search-engine/ProgrammableGoogleSearchEngineTool.ts#L26
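In the spirit of the linked search-engine tool, a result formatter can be sketched like this. This is a simplified, hypothetical version (field names and layout assumed), showing only the idea: turn a structured tool result into the text that gets embedded in the prompt for the next loop iteration.

```typescript
interface SearchResult {
  title: string;
  link: string;
  snippet: string;
}

// Renders structured search results as prompt-friendly text.
function formatSearchResults(results: SearchResult[]): string {
  return results
    .map((r) => `## ${r.title}\n${r.link}\n${r.snippet}`)
    .join("\n\n");
}
```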
I've added some basic documentation: https://js-agent.ai/docs/intro
ref https://twitter.com/lgrammel/status/1645840202055680013
If I use this framework to query an OpenAI GPT model, how is output routed to a plugin vs. just returning the OpenAI response? Since OpenAI doesn't know about these client-side plugins, how do the two interact?

Referencing:

I see, so does the OpenAI API respond with a block of text containing

action: "an action",

where "an action" is something ChatGPT would like executed? Then, client-side, the Action plugin parses the OpenAI response and makes any API calls requested by the action?
So, looking at an example action:

https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/action/tool/summarize-webpage/SummarizeWebpageAction.ts

has

```ts
description = "Summarize a webpage considering a topic.",
```

so OpenAI would be initialized with

```
You can perform the following actions using ${
  this.format.description
}:

${this.describeActions()}
```

which renders to something like:

You can perform the following actions using [Summarize a webpage considering a topic]
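As a rough illustration of what a `describeActions()`-style helper could produce, here is a hypothetical sketch (the `ActionSpec` shape and free-standing function are assumptions, not the library's actual method):

```typescript
interface ActionSpec {
  type: string;
  description: string;
}

// Renders each registered action as one line of the system prompt,
// so the LLM knows which actions exist and when to use them.
function describeActions(actions: ActionSpec[]): string {
  return actions
    .map((a) => `- ${a.type}: ${a.description}`)
    .join("\n");
}
```

The model's only knowledge of the available tools is this text; choosing a tool is just the model completing the prompt with an action block that matches one of these descriptions.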