
[Bug]: Output from tool calls are not very observable #9770

Open
tslmy opened this issue Dec 31, 2023 · 10 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@tslmy
Contributor

tslmy commented Dec 31, 2023

Bug Description

What seems wrong

I can't see responses from function calls in Arize Phoenix...

[screenshot]

... or in ChainLit:

[screenshot]

What I was expecting

I can usually see results from other types of spans, such as embedding spans in Arize Phoenix...

[screenshot]

... and Query Engine tools in ChainLit:

[screenshot]

so I was expecting the same level of observability for FunctionTools.

My guess

I think this is because "function_call_response", the string LlamaIndex uses to identify responses from generic function calls in callback handlers, is not part of the OpenInference Tracing conventions.

The "function_call_response" is defined here:

FUNCTION_OUTPUT = "function_call_response" # function call output

It is sent to callback managers here:

event.on_end(payload={EventPayload.FUNCTION_OUTPUT: str(tool_output)})

And the conventions are here. Indeed, Arize Phoenix doesn't know "function_call_response", as shown here:

https://github.com/Arize-ai/phoenix/blob/363e8913a3a0f7dafca5dc6bba6bf0e9776c1158/app/src/openInference/tracing/semanticConventions.ts#L51-L58
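
To double-check that the tool output is actually being emitted (and is just not mapped to any attribute the UIs understand), here is a minimal sketch of a custom callback handler that prints the FUNCTION_OUTPUT payload directly. This assumes the llama-index 0.9.x callback API (BaseCallbackHandler, CBEventType, EventPayload); exact signatures may differ slightly between versions.

from typing import Any, Dict, List, Optional

from llama_index.callbacks.base import BaseCallbackHandler
from llama_index.callbacks.schema import CBEventType, EventPayload


class PrintToolOutputHandler(BaseCallbackHandler):
    """Prints tool outputs as they arrive via the FUNCTION_OUTPUT payload key."""

    def __init__(self) -> None:
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        parent_id: str = "",
        **kwargs: Any,
    ) -> str:
        return event_id

    def on_event_end(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> None:
        # "function_call_response" arrives here; a handler that only knows the
        # OpenInference attribute names will simply drop this key.
        if event_type == CBEventType.FUNCTION_CALL and payload is not None:
            print("Tool output:", payload.get(EventPayload.FUNCTION_OUTPUT))

    def start_trace(self, trace_id: Optional[str] = None) -> None:
        pass

    def end_trace(
        self,
        trace_id: Optional[str] = None,
        trace_map: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        pass

Passing this alongside the ChainLit handler, e.g. CallbackManager([PrintToolOutputHandler(), cl.LlamaIndexCallbackHandler()]), should print the tool output in the terminal even though neither UI displays it.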

Version

0.9.23

Steps to Reproduce

from llama_hub.tools.wikipedia import WikipediaToolSpec
from llama_index.agent import ReActAgent
from llama_index.callbacks.base import CallbackManager
from llama_index.llms import OpenAILike
import chainlit as cl
import phoenix as px


# "Phoenix can display in real time the traces automatically collected from your LlamaIndex application."
# https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html
px.launch_app()


@cl.on_chat_start
async def factory():
    callback_manager = CallbackManager([cl.LlamaIndexCallbackHandler()])

    local_llm = OpenAILike(
        api_base="http://localhost:1234/v1",
        timeout=600,  # secs
        api_key="loremIpsum",
        # I'm using this: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta .
        model="zephyr beta 7B q5_k_m gguf",
        is_chat_model=True,
        is_function_calling_model=True,
        context_window=32768,
        callback_manager=callback_manager,
    )

    tool_spec = WikipediaToolSpec()
    agent = ReActAgent.from_tools(
        tools=tool_spec.to_tool_list(),
        llm=local_llm,
        callback_manager=callback_manager,
    )
    cl.user_session.set("agent", agent)


@cl.on_message
async def main(message: cl.Message):
    """ChainLit provides a web GUI for this application."""
    agent: ReActAgent = cl.user_session.get("agent")
    response = await cl.make_async(agent.chat)(message.content)
    response_message = cl.Message(content="")
    response_message.content = response.response
    await response_message.send()

Save the script above as main.py. Run in terminal:

chainlit run main.py -w

Ask the Agent "Search Wikipedia and summarize what is a cheesecake", and observe.

Relevant Logs/Tracebacks

No response

@tslmy tslmy added bug Something isn't working triage Issue needs to be triaged/prioritized labels Dec 31, 2023

dosubot bot commented Dec 31, 2023

🤖

It's great that you've identified the issue with the function_call_response not adhering to OpenInference Tracing conventions. Your detailed analysis and references to the specific code locations and conventions are very helpful.

If you have a proposed solution or fix in mind, I would encourage you to go ahead and create a pull request with your changes. Your contribution will be valuable in improving LlamaIndex and ensuring it aligns with the OpenInference Tracing conventions.

Thank you for your dedication to improving LlamaIndex!



@logan-markewich
Collaborator

@tslmy do you have any idea what the fix here would be? Looking at Arize's spec, I actually have no idea 😆 Should we be populating a tool_calls attribute?

@tslmy
Contributor Author

tslmy commented Dec 31, 2023

@logan-markewich, tool_calls is closer, but it also doesn't seem to have a conventional way to specify tool outputs.

I opened an issue in the Open Inference Specifications repo (referenced above), asking the maintainers there to come up with a conventional way to specify tool outputs. Let's see what they say.

@tslmy tslmy changed the title [Bug]: function_call_response does not adhere to OpenInference Tracing conventions [Bug]: Output from tool calls are not very observable Dec 31, 2023
@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 31, 2024

dosubot bot commented Mar 31, 2024

Hi, @tslmy,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the issue highlighted a bug where the output from function calls is not observable in Arize Phoenix and ChainLit due to a non-standard string used to identify responses from generic function calls with callback handlers. The resolution involved encouraging the author to create a pull request with their proposed changes, and there is ongoing discussion about the need for a conventional way to specify tool outputs, with an issue opened in the Open Inference Specifications repo for further discussion.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!

@tslmy
Contributor Author

tslmy commented Apr 1, 2024

@dosubot shall we wait till Arize-ai/open-inference-spec#37 is closed before closing this?

@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Apr 1, 2024
@teis-e

teis-e commented Apr 8, 2024

Hello, I am also trying to get a non-OpenAI model working. Is OpenAILike the only way to do this, like you did, or can I just use HuggingFaceLLM?


dosubot bot commented Apr 8, 2024

Based on the context provided, OpenAILike is not the only option for integrating non-OpenAI models into your project. You can also use HuggingFaceLLM for this purpose. Both OpenAILike and HuggingFaceLLM are designed to provide interfaces for generating text completions and handling chat interactions, but they are tailored to work with different ecosystems.

OpenAILike provides an interface similar to OpenAI's API. It supports customization through various parameters such as model, api_key, context_window, max_tokens, and an optional tokenizer. It allows for generating text completions with the complete method and handling chat interactions with the chat method. You can find more details in the test_openai_like.py file.

On the other hand, HuggingFaceLLM is designed to integrate with Hugging Face's language models. It allows specifying a model and tokenizer from Hugging Face, with default options set to "StabilityAI/stablelm-tuned-alpha-3b". It supports custom configurations for the context window size, maximum number of new tokens to generate, system prompts, and query wrapper prompts. It also offers methods for both one-off text completions (complete) and streaming completions (stream_complete), as well as methods for handling chat interactions (chat and stream_chat). More details can be found in the huggingface.py file.

So, depending on the specific requirements of your project and the ecosystem you are working with, you can choose either OpenAILike or HuggingFaceLLM to integrate non-OpenAI models into your project.
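
For illustration only, the two setups look roughly like this. The parameter names follow the 0.9.x docs and may have changed since, and the endpoint and model strings are placeholders reused from the snippet above:

from llama_index.llms import HuggingFaceLLM, OpenAILike

# Option 1: a model served behind an OpenAI-compatible HTTP API
# (LM Studio, vLLM, llama.cpp server, ...); the endpoint is a placeholder.
served_llm = OpenAILike(
    api_base="http://localhost:1234/v1",
    api_key="loremIpsum",  # most local servers ignore the key
    model="zephyr beta 7B q5_k_m gguf",
    is_chat_model=True,
    context_window=32768,
)

# Option 2: a model loaded in-process from the Hugging Face Hub.
local_llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=32768,
    max_new_tokens=512,
)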


@logan-markewich
Collaborator

@teis-e OpenAILike is for models hosted on a server that has the same API as OpenAI.

HuggingFaceLLM is specifically for running a model locally (no server, just in the same process/program as LlamaIndex).

@teis-e

teis-e commented Apr 8, 2024

> @teis-e OpenAILike is for models hosted on a server that has the same API as OpenAI.
>
> HuggingFaceLLM is specifically for running a model locally (no server, just in the same process/program as LlamaIndex).

Right! Thanks, but OpenAILike gives the opportunity to run agents and function calling etc. without OpenAI? So just a local model, like the Zephyr one from above?

@logan-markewich
Collaborator

Using OpenAILike does not magically give access to function calling (I wish haha).

The server running the API actually has to implement this, i.e. handle the tools kwarg when running the model.
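
For example, this is roughly the kind of request the server has to understand. This is a sketch using the openai v1 Python client pointed at the same local endpoint as above; the function schema is just a placeholder:

from openai import OpenAI

# Point the standard OpenAI client at the local OpenAI-compatible server
# (placeholder URL; most local servers ignore the API key).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="loremIpsum")

response = client.chat.completions.create(
    model="zephyr beta 7B q5_k_m gguf",
    messages=[{"role": "user", "content": "What is a cheesecake?"}],
    # This is the `tools` kwarg the server has to know how to handle;
    # the function schema here is only an illustrative placeholder.
    tools=[
        {
            "type": "function",
            "function": {
                "name": "search_wikipedia",
                "description": "Search Wikipedia for a query.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
)

# If the server supports tool calling, tool_calls should be populated here;
# otherwise it will typically be None (or the request will error out).
print(response.choices[0].message.tool_calls)

If the server doesn't implement tool calling, the tools field is usually ignored or rejected, regardless of how the LlamaIndex side is configured.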
