Add tests for micro agents

li-boxuan · Apr 28, 2024 · 80bd184
1 parent 6019777
Showing 21 changed files with 551 additions and 1 deletion.
diff --git a/agenthub/micro/repo_explorer/agent.yaml b/agenthub/micro/repo_explorer/agent.yaml
@@ -1,4 +1,4 @@
-name: RepoExplorer
+name: RepoExplorerAgent
 description: Generates a detailed summary of an existing codebase
 inputs: {}
 outputs:

diff --git a/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_001.log b/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_001.log
@@ -0,0 +1,65 @@
+
+
+----------
+
+# Task
+You are in charge of accomplishing the following task:
+Write a shell script 'hello.sh' that prints 'hello'.
+
+In order to accomplish this goal, you must delegate tasks to one or more agents, who
+can do the actual work. A description of each agent is provided below. You MUST
+select one of the delegates below to move towards accomplishing the task, and you MUST
+provide the correct inputs for the delegate you select.
+
+## Agents
+
+### StudyRepoForTaskAgent
+Given a particular task, finds and describes all relevant parts of the codebase
+#### Inputs
+{"task": "string"}
+
+### MathAgent
+Solves simple and complex math problems using python
+#### Inputs
+{"task": "string"}
+
+### RepoExplorerAgent
+Generates a detailed summary of an existing codebase
+#### Inputs
+{}
+
+### VerifierAgent
+Given a particular task, verifies that the task has been completed
+#### Inputs
+{"task": "string"}
+
+### CoderAgent
+Given a particular task, and a detailed description of the codebase, accomplishes the task
+#### Inputs
+{"task": "string", "codebase_summary": "string"}
+
+### PostgresAgent
+Writes and maintains PostgreSQL migrations
+#### Inputs
+{"task": "string"}
+
+
+## History
+
+[]
+
+## Available Actions
+* `delegate` - send a task to another agent from the list provided. Arguments:
+  * `agent` - the agent to which the task is delegated. MUST match a name in the list of agents provided.
+  * `inputs` - a dictionary of input parameters to the agent, as specified in the list
+
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
+  * `outputs` - a dictionary representing the outputs of your task, if any
+
+
+## Format
+Your response MUST be in JSON format. It must be an object, and it must contain two fields:
+* `action`, which is one of the actions specified here
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+You MUST NOT include any other text besides the JSON response
diff --git a/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_002.log b/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_002.log
@@ -0,0 +1,43 @@
+
+
+----------
+
+# Task
+You are a software engineer. You've inherited an existing codebase, which you're
+learning about for the first time. Your goal is to produce a detailed summary
+of the codebase, including:
+* The overall purpose of the project
+* The directory structure
+* The main components of the codebase
+* How the components fit together
+
+## Available Actions
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+  * `command` - the command to run
+  * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+
+* `read` - reads the content of a file. Arguments:
+  * `path` - the path of the file to read
+
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+  * `thought` - the thought to record
+
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
+  * `outputs` - a dictionary representing the outputs of your task, if any
+
+
+You should ONLY `run` commands that have no side-effects, like `ls` and `grep`.
+
+Do NOT finish until you have a complete understanding of the codebase.
+When you're done, put your summary into the output of the `finish` action.
+
+## History
+
+[]
+
+## Format
+Your response MUST be in JSON format. It must be an object, and it must contain two fields:
+* `action`, which is one of the actions specified here
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+You MUST NOT include any other text besides the JSON response
diff --git a/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_003.log b/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_003.log
@@ -0,0 +1,43 @@
+
+
+----------
+
+# Task
+You are a software engineer. You've inherited an existing codebase, which you're
+learning about for the first time. Your goal is to produce a detailed summary
+of the codebase, including:
+* The overall purpose of the project
+* The directory structure
+* The main components of the codebase
+* How the components fit together
+
+## Available Actions
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+  * `command` - the command to run
+  * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+
+* `read` - reads the content of a file. Arguments:
+  * `path` - the path of the file to read
+
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+  * `thought` - the thought to record
+
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
+  * `outputs` - a dictionary representing the outputs of your task, if any
+
+
+You should ONLY `run` commands that have no side-effects, like `ls` and `grep`.
+
+Do NOT finish until you have a complete understanding of the codebase.
+When you're done, put your summary into the output of the `finish` action.
+
+## History
+
+[[{"action": "run", "args": {"command": "ls -l", "background": false}, "message": "Running command: ls -l"}, {"observation": "run", "content": "total 0", "extras": {"command_id": -1, "command": "ls -l", "exit_code": 0}, "message": "Command `ls -l` executed with exit code 0."}]]
+
+## Format
+Your response MUST be in JSON format. It must be an object, and it must contain two fields:
+* `action`, which is one of the actions specified here
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+You MUST NOT include any other text besides the JSON response
diff --git a/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_004.log b/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_004.log
@@ -0,0 +1,43 @@
+
+
+----------
+
+# Task
+You are a software engineer. You've inherited an existing codebase, which you're
+learning about for the first time. Your goal is to produce a detailed summary
+of the codebase, including:
+* The overall purpose of the project
+* The directory structure
+* The main components of the codebase
+* How the components fit together
+
+## Available Actions
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+  * `command` - the command to run
+  * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+
+* `read` - reads the content of a file. Arguments:
+  * `path` - the path of the file to read
+
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+  * `thought` - the thought to record
+
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
+  * `outputs` - a dictionary representing the outputs of your task, if any
+
+
+You should ONLY `run` commands that have no side-effects, like `ls` and `grep`.
+
+Do NOT finish until you have a complete understanding of the codebase.
+When you're done, put your summary into the output of the `finish` action.
+
+## History
+
+[[{"action": "run", "args": {"command": "ls -l", "background": false}, "message": "Running command: ls -l"}, {"observation": "run", "content": "total 0", "extras": {"command_id": -1, "command": "ls -l", "exit_code": 0}, "message": "Command `ls -l` executed with exit code 0."}], [{"action": "think", "args": {"thought": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, "message": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, {"observation": "null", "content": "", "extras": {}, "message": ""}]]
+
+## Format
+Your response MUST be in JSON format. It must be an object, and it must contain two fields:
+* `action`, which is one of the actions specified here
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+You MUST NOT include any other text besides the JSON response
diff --git a/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_005.log b/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_005.log
@@ -0,0 +1,43 @@
+
+
+----------
+
+# Task
+You are a software engineer. You've inherited an existing codebase, which you're
+learning about for the first time. Your goal is to produce a detailed summary
+of the codebase, including:
+* The overall purpose of the project
+* The directory structure
+* The main components of the codebase
+* How the components fit together
+
+## Available Actions
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+  * `command` - the command to run
+  * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+
+* `read` - reads the content of a file. Arguments:
+  * `path` - the path of the file to read
+
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+  * `thought` - the thought to record
+
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
+  * `outputs` - a dictionary representing the outputs of your task, if any
+
+
+You should ONLY `run` commands that have no side-effects, like `ls` and `grep`.
+
+Do NOT finish until you have a complete understanding of the codebase.
+When you're done, put your summary into the output of the `finish` action.
+
+## History
+
+[[{"action": "run", "args": {"command": "ls -l", "background": false}, "message": "Running command: ls -l"}, {"observation": "run", "content": "total 0", "extras": {"command_id": -1, "command": "ls -l", "exit_code": 0}, "message": "Command `ls -l` executed with exit code 0."}], [{"action": "think", "args": {"thought": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, "message": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, {"observation": "null", "content": "", "extras": {}, "message": ""}], [{"action": "run", "args": {"command": "pwd", "background": false}, "message": "Running command: pwd"}, {"observation": "run", "content": "/workspace", "extras": {"command_id": -1, "command": "pwd", "exit_code": 0}, "message": "Command `pwd` executed with exit code 0."}]]
+
+## Format
+Your response MUST be in JSON format. It must be an object, and it must contain two fields:
+* `action`, which is one of the actions specified here
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+You MUST NOT include any other text besides the JSON response
diff --git a/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_006.log b/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_006.log
@@ -0,0 +1,43 @@
+
+
+----------
+
+# Task
+You are a software engineer. You've inherited an existing codebase, which you're
+learning about for the first time. Your goal is to produce a detailed summary
+of the codebase, including:
+* The overall purpose of the project
+* The directory structure
+* The main components of the codebase
+* How the components fit together
+
+## Available Actions
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+  * `command` - the command to run
+  * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+
+* `read` - reads the content of a file. Arguments:
+  * `path` - the path of the file to read
+
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+  * `thought` - the thought to record
+
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
+  * `outputs` - a dictionary representing the outputs of your task, if any
+
+
+You should ONLY `run` commands that have no side-effects, like `ls` and `grep`.
+
+Do NOT finish until you have a complete understanding of the codebase.
+When you're done, put your summary into the output of the `finish` action.
+
+## History
+
+[[{"action": "run", "args": {"command": "ls -l", "background": false}, "message": "Running command: ls -l"}, {"observation": "run", "content": "total 0", "extras": {"command_id": -1, "command": "ls -l", "exit_code": 0}, "message": "Command `ls -l` executed with exit code 0."}], [{"action": "think", "args": {"thought": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, "message": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, {"observation": "null", "content": "", "extras": {}, "message": ""}], [{"action": "run", "args": {"command": "pwd", "background": false}, "message": "Running command: pwd"}, {"observation": "run", "content": "/workspace", "extras": {"command_id": -1, "command": "pwd", "exit_code": 0}, "message": "Command `pwd` executed with exit code 0."}], [{"action": "think", "args": {"thought": "The 'pwd' command indicates I am in the '/workspace' directory and the 'ls -l' shows it as empty. It's essential to check if there might be hidden files or directories that are not showing up in a normal listing. I will run 'ls -la' to include hidden files and directories in the list."}, "message": "The 'pwd' command indicates I am in the '/workspace' directory and the 'ls -l' shows it as empty. It's essential to check if there might be hidden files or directories that are not showing up in a normal listing. I will run 'ls -la' to include hidden files and directories in the list."}, {"observation": "null", "content": "", "extras": {}, "message": ""}]]
+
+## Format
+Your response MUST be in JSON format. It must be an object, and it must contain two fields:
+* `action`, which is one of the actions specified here
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+You MUST NOT include any other text besides the JSON response
diff --git a/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_007.log b/tests/integration/mock/ManagerAgent/test_write_simple_script/prompt_007.log
@@ -0,0 +1,43 @@
+
+
+----------
+
+# Task
+You are a software engineer. You've inherited an existing codebase, which you're
+learning about for the first time. Your goal is to produce a detailed summary
+of the codebase, including:
+* The overall purpose of the project
+* The directory structure
+* The main components of the codebase
+* How the components fit together
+
+## Available Actions
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+  * `command` - the command to run
+  * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+
+* `read` - reads the content of a file. Arguments:
+  * `path` - the path of the file to read
+
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+  * `thought` - the thought to record
+
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
+  * `outputs` - a dictionary representing the outputs of your task, if any
+
+
+You should ONLY `run` commands that have no side-effects, like `ls` and `grep`.
+
+Do NOT finish until you have a complete understanding of the codebase.
+When you're done, put your summary into the output of the `finish` action.
+
+## History
+
+[[{"action": "run", "args": {"command": "ls -l", "background": false}, "message": "Running command: ls -l"}, {"observation": "run", "content": "total 0", "extras": {"command_id": -1, "command": "ls -l", "exit_code": 0}, "message": "Command `ls -l` executed with exit code 0."}], [{"action": "think", "args": {"thought": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, "message": "The initial 'ls -l' command shows an empty directory, which means there may have been an error or misunderstanding. There's no immediate structure or files to analyze. I need to verify whether the directory is truly empty or if more information is required to proceed."}, {"observation": "null", "content": "", "extras": {}, "message": ""}], [{"action": "run", "args": {"command": "pwd", "background": false}, "message": "Running command: pwd"}, {"observation": "run", "content": "/workspace", "extras": {"command_id": -1, "command": "pwd", "exit_code": 0}, "message": "Command `pwd` executed with exit code 0."}], [{"action": "think", "args": {"thought": "The 'pwd' command indicates I am in the '/workspace' directory and the 'ls -l' shows it as empty. It's essential to check if there might be hidden files or directories that are not showing up in a normal listing. I will run 'ls -la' to include hidden files and directories in the list."}, "message": "The 'pwd' command indicates I am in the '/workspace' directory and the 'ls -l' shows it as empty. It's essential to check if there might be hidden files or directories that are not showing up in a normal listing. I will run 'ls -la' to include hidden files and directories in the list."}, {"observation": "null", "content": "", "extras": {}, "message": ""}], [{"action": "run", "args": {"command": "ls -la", "background": false}, "message": "Running command: ls -la"}, {"observation": "run", "content": "total 4\r\ndrwxr-xr-x 2 opendevin root   64 Apr 28 05:05 .\r\ndrwxr-xr-x 1 root      root 4096 Apr 28 05:05 ..", "extras": {"command_id": -1, "command": "ls -la", "exit_code": 0}, "message": "Command `ls -la` executed with exit code 0."}]]
+
+## Format
+Your response MUST be in JSON format. It must be an object, and it must contain two fields:
+* `action`, which is one of the actions specified here
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+You MUST NOT include any other text besides the JSON response
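
Reviewer note: every prompt log above ends with the same `## Format` contract, a JSON object with an `action` field and an `args` map. The sketch below shows one way a mock response could be checked against that contract. It is illustrative only and not part of this commit: `parse_agent_response` and `KNOWN_ACTIONS` are hypothetical names, and the action set is simply collected from the "Available Actions" lists in these logs.

```python
import json
from typing import Any, Dict, Tuple

# Assumption: action names gathered from the "Available Actions" sections
# of the prompt logs in this diff; the real agents may accept others.
KNOWN_ACTIONS = {"delegate", "finish", "run", "read", "think"}


def parse_agent_response(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: validate a response against the '## Format' rules."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    if "action" not in data or "args" not in data:
        raise ValueError("response must contain 'action' and 'args'")
    action, args = data["action"], data["args"]
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not isinstance(args, dict):
        raise ValueError("'args' must be a map of key-value pairs")
    return action, args


if __name__ == "__main__":
    # Mirrors the first history entry recorded in prompt_003.log.
    reply = '{"action": "run", "args": {"command": "ls -l", "background": false}}'
    print(parse_agent_response(reply))
```

A check like this would reject a reply that wraps the JSON in extra prose, which the prompts forbid with "You MUST NOT include any other text besides the JSON response".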