# API-Bank

```{note}
Recent research has demonstrated that LLMs can enhance their capabilities
by utilizing external tools. However,
three pivotal questions remain unanswered:

1. How effective are current LLMs in utilizing
tools?
2. How can we enhance LLMs’ ability
to utilize tools?
3. What obstacles need to be
overcome to leverage tools?

To address these
questions, we introduce [API-Bank](https://arxiv.org/abs/2304.08244), a benchmark specifically designed for
tool-augmented LLMs. 

* For the first question,
we develop a runnable evaluation system consisting
of 73 API tools. We annotate 314
tool-use dialogues with 753 API calls to assess
the existing LLMs’ capabilities in planning,
retrieving, and calling APIs.
* For the second
question, we construct a comprehensive
training set containing 1,888 tool-use dialogues
from 2,138 APIs spanning 1,000 distinct domains.
Using this dataset, we train Lynx, a
tool-augmented LLM initialized from Alpaca.
* Through error analysis, we highlight
the key challenges for future research in this
field to answer the third question.
```

## Design Principles of API-Bank

An ideal tool-augmented LLM should enable users
to define the APIs they require in a private API
Pool and request the LLM to invoke these APIs at
the appropriate times to fulfill their needs. Based
on our interviews with users, we have identified
three conditions that assess
the following abilities:

* *Call*: the ability to call APIs based on the
given query when the APIs are known;

* *Retrieval+Call*: the ability to retrieve and call
a single API when the APIs are unknown;

* *Plan+Retrieval+Call*: the ability to continuously
plan, retrieve, and call multiple APIs
when the APIs are unknown.

```{figure} ../images/api-bank1.png
```

## Evaluation System of API-Bank

We have implemented 73 APIs in our system, including
commonly used everyday APIs such as weather forecasting, as well as APIs that
access other AI models, such as Text-to-Image Generation. For APIs
that access external information (e.g., search engines),
we must keep the retrieved information
constant to ensure reproducibility.
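One way to keep retrieved information constant is to serve results from a stored snapshot instead of a live service. The sketch below illustrates that idea only; the `FROZEN_RESULTS` table and the `frozen_search` function are hypothetical names and are not taken from the API-Bank codebase.

```python
from typing import Dict, List

# Hypothetical snapshot standing in for a live search engine (assumption).
FROZEN_RESULTS: Dict[str, List[str]] = {
    "api-bank": ["API-Bank: a benchmark for tool-augmented LLMs"],
}


def frozen_search(query: str) -> List[str]:
    """Return stored results for a query without touching the network."""
    key = query.strip().lower()
    # Copy the list so callers cannot mutate the snapshot.
    return list(FROZEN_RESULTS.get(key, []))
```

Because every run reads the same snapshot, two evaluations of the same dialogue see identical API responses.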

Among the 73 APIs, we developed a special API called
"API Search" to fulfill the evaluation requirements
of both *Retrieval+Call* and *Plan+Retrieval+Call*
abilities. Specifically, in these two scenarios, the
LLM is unaware of the APIs available in the API
Pool in advance, so it needs to make use of the
API Search to identify the potentially needed APIs
according to the user query. In the input given to the
LLM, we provide the instructions for API Search
at the beginning, and an API Search is required
before every other API call.
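The retrieval step and the "search first" rule can be sketched as follows. This is a toy illustration under stated assumptions: the pool contents, the word-overlap scoring, and the names `API_POOL`, `api_search`, and `follows_search_rule` are all hypothetical, and the rule is read here as "each non-search call is immediately preceded by an API Search". The actual benchmark may retrieve APIs differently.

```python
from typing import Dict, List, Optional

# Hypothetical API pool: name -> description (not the benchmark's real APIs).
API_POOL: Dict[str, str] = {
    "QueryWeather": "get the weather forecast for a city and date",
    "AddAlarm": "set an alarm clock at a given time",
    "TextToImage": "generate an image from a text prompt",
}


def api_search(keywords: str) -> Optional[str]:
    """Return the API whose description shares the most words with the keywords."""
    query = set(keywords.lower().split())
    best_name, best_score = None, 0
    for name, description in API_POOL.items():
        score = len(query & set(description.lower().split()))
        if score > best_score:
            best_name, best_score = name, score
    return best_name  # None when no description overlaps the keywords


def follows_search_rule(calls: List[str]) -> bool:
    """Check that every non-search call is immediately preceded by an API Search."""
    previous = None
    for call in calls:
        if call != "APISearch" and previous != "APISearch":
            return False
        previous = call
    return True
```

For example, `api_search("weather forecast")` selects `QueryWeather` from this toy pool, and the call sequence `["APISearch", "QueryWeather", "AddAlarm"]` violates the rule because `AddAlarm` is not preceded by a search.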

For the API call evaluation,
we employ the Accuracy metric, which is calculated
as the number of correct predictions divided
by the total number of predictions.
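The Accuracy computation above can be sketched in a few lines. The representation of an API call as a dictionary with `name` and `args`, and the use of exact equality to decide correctness, are assumptions made for illustration; the function name `api_call_accuracy` is hypothetical.

```python
from typing import Any, Dict, List

ApiCall = Dict[str, Any]  # assumed shape: {"name": str, "args": dict}


def api_call_accuracy(predictions: List[ApiCall], references: List[ApiCall]) -> float:
    """Number of correct predictions divided by the total number of predictions."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0  # convention chosen here for an empty evaluation set
    # Assumption: a prediction is correct only if it exactly matches the reference.
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(predictions)


predicted = [
    {"name": "QueryWeather", "args": {"city": "Paris"}},
    {"name": "AddAlarm", "args": {"time": "07:00"}},
]
expected = [
    {"name": "QueryWeather", "args": {"city": "Paris"}},
    {"name": "AddAlarm", "args": {"time": "08:00"}},
]
accuracy = api_call_accuracy(predicted, expected)
```

Here one of the two predicted calls matches its reference, so `accuracy` is 0.5.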

```{figure} ../images/api-bank2.png
```

```{figure} ../images/api-bank3.png
```