
fix: simplify ContextVar and fix sub-span attribution when delegated to thread #13700

Merged: 4 commits into run-llama:main on May 29, 2024

Conversation

@RogerHYang (Contributor) commented May 23, 2024

Issue

Sub-spans delegated to threads cannot find their parent.

For example, this can happen here:

thread = Thread(
    target=chat_response.write_response_to_history, args=(self._memory,)
)

Furthermore, in the thread shown above, the following event fires with a span_id from outside the thread, but the outside span may no longer be open when the event fires (when streaming is done).

dispatcher.event(
    LLMChatEndEvent(
        ...

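(For background, a minimal standard-library sketch, independent of this PR, of why a plain threading.Thread cannot see a ContextVar value set by its caller; the variable name is illustrative.)

from contextvars import ContextVar
from threading import Thread

active_span_id: ContextVar[str] = ContextVar("active_span_id", default="<no active span>")

def child() -> None:
    # A plain Thread starts with a fresh, empty context, so the value set by
    # the caller below is not visible here: this prints "<no active span>".
    print(active_span_id.get())

active_span_id.set("span-123")
t = Thread(target=child)
t.start()
t.join()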
Before

Sub-spans delegated to threads are separated from the parent span, resulting in multiple traces.

Screenshot 2024-05-22 at 12 27 28 PM

After

Sub-spans can now find their correct parent_span_id.

Screenshot 2024-05-22 at 12 39 45 PM

Original Behavior

The original behavior was captured in this notebook.

Screenshot 2024-05-22 at 12 35 07 PM

Code

Below is the code that produced the screenshots above.

import tempfile
from urllib.request import urlretrieve

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.span_handlers import SimpleSpanHandler
from llama_index.llms.openai import OpenAI

handler = SimpleSpanHandler()
dispatcher = get_dispatcher()
dispatcher.add_span_handler(handler)

with tempfile.NamedTemporaryFile() as tf:
    urlretrieve(
        "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt",
        tf.name,
    )
    documents = SimpleDirectoryReader(input_files=[tf.name]).load_data()

index = VectorStoreIndex.from_documents(documents)
Settings.llm = OpenAI(model="gpt-3.5-turbo")

if __name__ == "__main__":
    query_engine = index.as_chat_engine()
    response_stream = query_engine.stream_chat("What did the author do growing up?")
    response_stream.print_response_stream()
    handler.print_trace_trees()

@dosubot added the size:XL label (This PR changes 500-999 lines, ignoring generated files) on May 23, 2024.

Review thread on the changed code:

from llama_index.core.instrumentation.span.base import BaseSpan
from llama_index.core.instrumentation.span.simple import SimpleSpan

# ContextVar for managing active spans
active_span_id: ContextVar[Optional[str]] = ContextVar("active_span_id")
@nerdai (Contributor):
Ha. This is all we need eh? I didn't know that a single global ContextVar can be used to manage span_id's across both async tasks as well as threads. This definitely cleans up the manual shenanigans I resorted to using. Thanks!

@RogerHYang (author):
For threads, it also needs that Thread wrapper I added. For async it's automatic.
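(For reference, a minimal sketch of what such a context-propagating Thread wrapper can look like; the class name and constructor signature below are illustrative and not necessarily the exact shape of llama_index.core.types.Thread.)

import threading
from contextvars import copy_context
from typing import Any, Callable, Optional

class ContextThread(threading.Thread):
    """Runs its target inside a copy of the caller's context, so ContextVars
    such as the active span id set by the caller remain visible in the thread."""

    def __init__(
        self,
        target: Callable[..., Any],
        args: tuple = (),
        kwargs: Optional[dict] = None,
        **thread_kwargs: Any,
    ) -> None:
        super().__init__(**thread_kwargs)
        # copy_context() runs in the parent thread, capturing its ContextVar values.
        self._ctx = copy_context()
        self._target_fn = target
        self._target_args = args
        self._target_kwargs = kwargs or {}

    def run(self) -> None:
        # Execute the target inside the captured context in the new thread.
        self._ctx.run(self._target_fn, *self._target_args, **self._target_kwargs)

Swapping this in for threading.Thread is what lets a sub-span started inside the thread see the parent's active span id.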

@nerdai (Contributor):
Right. I saw that was really nice. So we should resort to using llama_index.core.types.Thread from now on (CC: @logan-markewich).


Another review thread, on these lines of the diff:

token = active_span_id.set(id_)
parent_id = token.old_value
@nerdai (Contributor), May 23, 2024:
@RogerHYang: I'm running into some trouble when trying to execute your code with the Basic Usage Notebook, particularly when it gets to Streaming section.

token.old_value returns the Token.MISSING marker object, which would indicate that the context var had not been set before (https://docs.python.org/3/library/contextvars.html#contextvars.Token.old_value), but I don't believe that to be true.

Any ideas as to why I might be running into this? The async stream_chat version works fine...

@RogerHYang (author):
I'll look into it. I suspect it's because of the async nature of the notebook. That's my oversight.

@nerdai (Contributor):
No worries at all! Thanks very much.

@RogerHYang (author):
OK. The root cause of the issue is that the installed llama-index-agent-openai package does not include the changes from this PR, so it is still using the regular Thread. Therefore the contextvar value of None didn't get carried across the thread barrier when the OpenAI agent runs. I have updated this PR to handle Token.MISSING so that it is backward-compatible.
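(For reference, a sketch of what handling Token.MISSING can look like; the helper below is illustrative, not the exact code in this PR.)

from contextvars import ContextVar, Token
from typing import Optional

active_span_id: ContextVar[Optional[str]] = ContextVar("active_span_id")

def enter_span(span_id: str) -> Optional[str]:
    # Set the new active span and recover the previous value from the token.
    token = active_span_id.set(span_id)
    parent_id = token.old_value
    # If the variable was never set in this context (e.g. the caller used a
    # plain Thread, so the parent's context was not carried over), old_value
    # is the Token.MISSING sentinel; treat that as "no parent".
    if parent_id is Token.MISSING:
        parent_id = None
    return parent_id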

@nerdai (Contributor):
Ahh, makes sense. Thanks Roger for digging into that!

@RogerHYang (author):
This PR already updated all the import Thread statements via search and replace.

@nerdai (Contributor):
Ah okay. Great, thanks!

@RogerHYang (author):
I should have also bumped the minimum version requirement on llama-index-core. Sorry about that.

@RogerHYang (author):
Let me submit a new PR for the version bump.

@nerdai (Contributor):
Thanks. That's my miss too for not noticing.

@nerdai (Contributor) left a review:
Thanks again @RogerHYang for this!

I had tried using Thread and context.run() before (not nearly as nice as what you have here, though) and remember having some issues at that time with the async version (I believe).

Really happy about this PR and the really great code improvements you made here! 🙏

@dosubot added the lgtm label (This PR has been approved by a maintainer) on May 29, 2024.
@nerdai merged commit 1127908 into run-llama:main on May 29, 2024. 8 checks passed.
@nerdai (Contributor) commented May 29, 2024:

FYI: going to cut a new release of llama-index later in the day.

DarkLight1337 added a commit to DarkLight1337/llama_index that referenced this pull request on Jun 12, 2024.
Mateusz-Switala pushed a commit to Mateusz-Switala/llama_index that referenced this pull request on Jun 13, 2024:

fix: simplify ContextVar and fix sub-span attribution when delegated to thread (run-llama#13700)

* fix: simplify contextvar
* test trees in parallel
* fix Token.MISSING