TaskWeaver

A code-first agent framework for seamlessly planning and executing data analytics tasks. This framework interprets user requests through code snippets and efficiently coordinates a variety of plugins, in the form of functions, to execute data analytics tasks.

Highlighted Features

  • Rich data structure - TaskWeaver allows you to work with rich data structures in Python, such as DataFrames, instead of having to work with text strings.
  • Customized algorithms - TaskWeaver allows you to encapsulate your own algorithms into plugins (in the form of Python functions), and orchestrate them to achieve complex tasks.
  • Incorporating domain-specific knowledge - TaskWeaver is designed to easily incorporate domain-specific knowledge, such as knowledge of the execution flow, to improve the reliability of the AI copilot.
  • Stateful conversation - TaskWeaver is designed to support stateful conversation. It can remember the context of the conversation and leverage it to improve the user experience.
  • Code verification - TaskWeaver is designed to verify the generated code before execution. It can detect potential issues in the generated code and provide suggestions to fix them.
  • Easy to use - TaskWeaver is designed to be easy to use. We provide a set of sample plugins and a tutorial to help you get started. Users can easily create their own plugins based on the sample plugins. TaskWeaver offers an out-of-the-box experience, allowing users to run a service immediately after installation.
  • Easy to debug - TaskWeaver is designed to be easy to debug. Detailed logs help you understand what happens during LLM calls, code generation, and code execution.
  • Security consideration - TaskWeaver supports basic session management to keep different users' data separate. Code execution is separated into different processes so that sessions do not interfere with each other.
  • Easy extension - TaskWeaver is designed to be easily extended to accomplish more complex tasks. You can create multiple AI copilots to act in different roles, and orchestrate them to achieve complex tasks.

Getting started

Prerequisites

  • Python 3.10 or above
  • OpenAI (or Azure OpenAI) access with GPT-3.5 or above models. GPT-4 is strongly recommended because it is more stable.
  • Other requirements can be found in the requirements.txt file.

The OpenAI API had a major update from 0.xx to 1.xx in November 2023. Please make sure you are not using an old version, because the new API is not backward compatible.
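The version requirements above can be checked with a few lines of Python before going further. This is a minimal sketch and not part of TaskWeaver: the helper names `is_supported_python`, `is_supported_openai`, and `check_prerequisites` are hypothetical, and it assumes the note above refers to the `openai` Python package.

```python
import sys
from importlib import metadata


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True for Python 3.10 or above."""
    return tuple(version_info[:2]) >= (3, 10)


def is_supported_openai(version: str) -> bool:
    """Return True when the major version is 1 or higher (the 1.xx API)."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 1


def check_prerequisites() -> list:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not is_supported_python():
        problems.append("Python 3.10 or above is required")
    try:
        # Assumption: the OpenAI client is installed under the name "openai".
        installed = metadata.version("openai")
    except metadata.PackageNotFoundError:
        problems.append("the openai package is not installed")
    else:
        if not is_supported_openai(installed):
            problems.append(f"openai {installed} predates the 1.xx API")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

Running the script prints one line per problem found and nothing when both checks pass.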

Quick Start

Installation

You can install TaskWeaver by running the following command:

git clone https://github.com/microsoft/TaskWeaver.git
cd TaskWeaver
# install the requirements
pip install -r requirements.txt

Project Directory

TaskWeaver runs as a process, so you need to create a project directory to store plugins and configuration files. We provide a sample project directory in the project folder, which you can copy to your workspace. A project directory typically contains the following files and folders:

📦project
 ┣ 📜taskweaver_config.json # the configuration file for TaskWeaver
 ┣ 📂plugins # the folder to store plugins
 ┣ 📂planner_examples # the folder to store planner examples
 ┣ 📂codeinterpreter_examples # the folder to store code interpreter examples
 ┣ 📂sample_data # the folder to store sample data used for evaluations
 ┣ 📂logs # the folder to store logs, will be generated after program starts
 ┗ 📂workspace # the directory that stores session data, will be generated after program starts
    ┗ 📂 session_id 
      ┣ 📂ces # the folder used by the code execution service
      ┗ 📂cwd # the current working directory to run the generated code
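After copying the sample project, a quick check that the layout above is in place can save a failed start. The sketch below is an illustration only: `missing_entries` is a hypothetical helper, and the README does not say which of these entries TaskWeaver strictly requires. The `logs` and `workspace` folders are left out because they are generated after the program starts, and `sample_data` is left out because it is only used for evaluations.

```python
from pathlib import Path

# Entries listed in the project tree above that exist before the first run.
EXPECTED_FILES = ["taskweaver_config.json"]
EXPECTED_DIRS = ["plugins", "planner_examples", "codeinterpreter_examples"]


def missing_entries(project_dir: str) -> list:
    """Return the expected files and folders that are absent from project_dir."""
    root = Path(project_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # "./project" matches the path used with the -p option later in this README.
    print(missing_entries("./project"))
```

An empty list means every listed entry was found.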

OpenAI Configuration

Before running TaskWeaver, you need to provide your OpenAI API key and other necessary information. You can do this by editing the taskweaver_config.json file. Set the following parameters, depending on whether you use Azure OpenAI or OpenAI:

Azure OpenAI

{
"llm.api_base": "https://xxx.openai.azure.com/",
"llm.api_key": "the api key",
"llm.api_type": "azure",
"llm.api_version": "the api version",
"llm.model": "the model name, e.g., gpt-4"
}

OpenAI

{
"llm.api_key": "the api key",
"llm.model": "the model name, e.g., gpt-4"
}

💡 Only the latest OpenAI API supports the json_object response format. If you are using an older version of OpenAI API, you need to set the llm.response_format to null.

More configuration options can be found in the configuration documentation.
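The two variants above differ only in the Azure-specific keys, so the file can also be generated from a script. The following is a minimal sketch under that assumption; `build_llm_config` and `write_config` are hypothetical helpers rather than TaskWeaver functions, and `write_config` overwrites any existing taskweaver_config.json.

```python
import json
from pathlib import Path
from typing import Optional


def build_llm_config(api_key: str, model: str,
                     api_base: Optional[str] = None,
                     api_version: Optional[str] = None) -> dict:
    """Build the LLM settings; passing api_base selects the Azure OpenAI variant."""
    config = {"llm.api_key": api_key, "llm.model": model}
    if api_base is not None:
        if api_version is None:
            raise ValueError("Azure OpenAI also needs an api_version")
        config.update({
            "llm.api_base": api_base,
            "llm.api_type": "azure",
            "llm.api_version": api_version,
        })
    return config


def write_config(project_dir: str, config: dict) -> Path:
    """Write the settings to taskweaver_config.json inside project_dir."""
    path = Path(project_dir) / "taskweaver_config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path
```

For example, `write_config("./project", build_llm_config("the api key", "gpt-4"))` produces the OpenAI variant shown above.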

Start TaskWeaver

# assume you are in the taskweaver folder
# -p is the path to the project directory
python -m taskweaver -p ./project/

This will start the TaskWeaver process and you can interact with it through the command line interface. If everything goes well, you will see the following prompt:

=========================================================
 _____         _     _       __
|_   _|_ _ ___| | _ | |     / /__  ____ __   _____  _____
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/
  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================
TaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: ___

Two Walkthrough Examples

Example 1: Pull data from a database and apply an anomaly detection algorithm

In this example, we will show you how to use TaskWeaver to pull data from a database and apply an anomaly detection algorithm.

Demo video: anomaly_detection.mp4

If you want to follow this example, you need to configure the sql_pull_data plugin in the project/plugins/sql_pull_data.yaml file. You need to provide the following information:

api_type: azure or openai
api_base: ...
api_key: ...
api_version: ...
deployment_name: ...
sqlite_db_path: sqlite:///../../../sample_data/anomaly_detection.db

The sql_pull_data plugin pulls data from a database. It takes a natural language request as input and returns a DataFrame as output.

This plugin is implemented based on LangChain. If you want to follow this example, you need to install the langchain and tabulate packages:

pip install langchain
pip install tabulate

Example 2: Forecast QQQ's price in the next week

In this example, we will show you how to use TaskWeaver to forecast QQQ's price in the next week using the ARIMA algorithm.

Demo video: stock_forecast.mp4

If you want to follow this example, you need to have two packages installed:

pip install yfinance
pip install statsmodels

For more examples, please refer to our paper.

Use TaskWeaver as a library

If you want to use TaskWeaver as a library, you can refer to the following code example:

from taskweaver.app.app import TaskWeaverApp

app_dir = "/path/to/project/"
app = TaskWeaverApp(app_dir=app_dir)
session = app.get_session()

user_query = "hello, what can you do?"
response_round = session.send_message(user_query,
                                      event_handler=lambda x, y: print(f"{x}:\n{y}"))
print(response_round.to_dict())

Note:

  • event_handler: a callback function that is utilized to display the response obtained from TaskWeaver step by step. It takes two arguments: the message type (e.g., plan) and the message content.
  • response_round: the response from TaskWeaver, which is an object of the Round class. An example of the Round object is shown below:
{
    "id": "round-20231201-043134-218a2681",
    "user_query": "hello, what can you do?",
    "state": "finished",
    "post_list": [
        {
            "id": "post-20231201-043134-10eedcca",
            "message": "hello, what can you do?",
            "send_from": "User",
            "send_to": "Planner",
            "attachment_list": []
        },
        {
            "id": "post-20231201-043141-86a2aaff",
            "message": "I can help you with various tasks, such as counting rows in a data file, detecting anomalies in a dataset, searching for products on Klarna, summarizing research papers, and pulling data from a SQL database. Please provide more information about the task you want to accomplish, and I'll guide you through the process.",
            "send_from": "Planner",
            "send_to": "User",
            "attachment_list": [
                {
                    "id": "atta-20231201-043141-6bc4da86",
                    "type": "init_plan",
                    "content": "1. list the available functions"
                },
                {
                    "id": "atta-20231201-043141-6f29f6c9",
                    "type": "plan",
                    "content": "1. list the available functions"
                },
                {
                    "id": "atta-20231201-043141-76186c7a",
                    "type": "current_plan_step",
                    "content": "1. list the available functions"
                }
            ]
        }
    ]
}
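Once the round is converted with `to_dict()`, the reply and the plan can be picked out with ordinary dictionary access. The sketch below assumes the dictionary has exactly the shape of the example above; `final_reply` and `attachments_by_type` are hypothetical helpers, not part of TaskWeaver, and `example_round` is a trimmed copy of that example with a shortened message.

```python
def final_reply(round_dict: dict):
    """Return the last message addressed to the user, or None if there is none."""
    for post in reversed(round_dict.get("post_list", [])):
        if post.get("send_to") == "User":
            return post.get("message")
    return None


def attachments_by_type(round_dict: dict) -> dict:
    """Group attachment contents from all posts by their "type" field."""
    grouped = {}
    for post in round_dict.get("post_list", []):
        for attachment in post.get("attachment_list", []):
            grouped.setdefault(attachment["type"], []).append(attachment["content"])
    return grouped


# Trimmed copy of the example Round, used only to exercise the helpers.
example_round = {
    "user_query": "hello, what can you do?",
    "state": "finished",
    "post_list": [
        {"message": "hello, what can you do?", "send_from": "User",
         "send_to": "Planner", "attachment_list": []},
        {"message": "I can help you with various tasks.", "send_from": "Planner",
         "send_to": "User", "attachment_list": [
             {"type": "init_plan", "content": "1. list the available functions"},
             {"type": "plan", "content": "1. list the available functions"},
         ]},
    ],
}

print(final_reply(example_round))
print(attachments_by_type(example_round))
```

In real use, `example_round` would be replaced by `response_round.to_dict()` from the snippet above.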

Customizing TaskWeaver

There are two ways to customize TaskWeaver: creating plugins and creating examples.

Creating Plugins

TaskWeaver can already perform some basic tasks on its own, and you can create plugins to extend its capabilities. A plugin is a Python function that takes a set of arguments and returns a set of results.

Typically, you only need to write a plugin in the following example scenarios:

  • You want to encapsulate your own algorithm into a plugin.
  • You want to import a Python package that is not supported by TaskWeaver.
  • You want to connect to an external data source to pull data.
  • You want to query a web API.

In all other cases, you can rely on TaskWeaver's code generation capability to perform tasks. Refer to the plugin documentation for more details on writing plugins.
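To make the first scenario concrete, here is the kind of self-contained algorithm one might wrap as a plugin. It is a plain Python function only, shown as a sketch: it does not use TaskWeaver's plugin interface, which is described in the plugin documentation, and the name `detect_anomalies` and the z-score rule are illustrative choices rather than the algorithm used in the walkthrough above.

```python
from statistics import mean, pstdev


def detect_anomalies(values: list, threshold: float = 3.0) -> list:
    """Return the indices of values whose absolute z-score exceeds threshold."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    readings = [10.0] * 20 + [100.0]
    print(detect_anomalies(readings))
```

A function like this takes a set of arguments and returns a set of results, which matches the description of a plugin given above.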

Creating Examples

The purpose of examples is to help LLMs understand how to perform tasks, especially when the tasks are complex and need domain-specific knowledge.

There are two types of examples: (1) planning examples and (2) code interpreter examples. Planning examples demonstrate how to use TaskWeaver to plan for a specific task. Code interpreter examples demonstrate how to generate code or orchestrate plugins to perform a specific task.

Refer to the example documentation for more details.

Citation

Our paper can be found here. If you use TaskWeaver in your research, please cite our paper:

@article{taskweaver,
  title={TaskWeaver: A Code-First Agent Framework},
  author={Bo Qiao and Liqun Li and Xu Zhang and Shilin He and Yu Kang and Chaoyun Zhang and Fangkai Yang and Hang Dong and Jue Zhang and Lu Wang and Minghua Ma and Pu Zhao and Si Qin and Xiaoting Qin and Chao Du and Yong Xu and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang},
  journal={arXiv preprint arXiv:2311.17541},
  year={2023}
}

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
