
fix: simplify ContextVar and fix sub-span attribution when delegated to thread #13700

Merged: 4 commits into run-llama:main on May 29, 2024

Conversation

@RogerHYang (Contributor) commented May 23, 2024

Issue

Sub-spans delegated to threads cannot find their parent.

For example, this can happen here:

thread = Thread(
    target=chat_response.write_response_to_history, args=(self._memory,)
)

Furthermore, in the thread shown above, the following event fires with a span_id from outside the thread, but the outside span may no longer be open when the event fires (when streaming is done).

dispatcher.event(
    LLMChatEndEvent(
        ...

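(For background, a minimal standard-library sketch, independent of this PR, of why a plain threading.Thread cannot see a ContextVar value set by its caller; the variable name is illustrative.)

from contextvars import ContextVar
from threading import Thread

active_span_id: ContextVar[str] = ContextVar("active_span_id", default="<no active span>")

def child() -> None:
    # A plain Thread starts with a fresh, empty context, so the value set by
    # the caller below is not visible here: this prints "<no active span>".
    print(active_span_id.get())

active_span_id.set("span-123")
t = Thread(target=child)
t.start()
t.join()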
Before

Sub-spans delegated to threads are separated from the parent span, resulting in multiple traces.

Screenshot 2024-05-22 at 12 27 28 PM

After

Sub-spans can now find their correct parent_span_id.

Screenshot 2024-05-22 at 12 39 45 PM

Original Behavior

The original behavior was captured in this notebook.

Screenshot 2024-05-22 at 12 35 07 PM

Code

Below is the code that produced the screenshots above.

import tempfile
from urllib.request import urlretrieve

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.span_handlers import SimpleSpanHandler
from llama_index.llms.openai import OpenAI

handler = SimpleSpanHandler()
dispatcher = get_dispatcher()
dispatcher.add_span_handler(handler)

with tempfile.NamedTemporaryFile() as tf:
    urlretrieve(
        "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt",
        tf.name,
    )
    documents = SimpleDirectoryReader(input_files=[tf.name]).load_data()

index = VectorStoreIndex.from_documents(documents)
Settings.llm = OpenAI(model="gpt-3.5-turbo")

if __name__ == "__main__":
    query_engine = index.as_chat_engine()
    response_stream = query_engine.stream_chat("What did the author do growing up?")
    response_stream.print_response_stream()
    handler.print_trace_trees()

@dosubot added the size:XL label (This PR changes 500-999 lines, ignoring generated files) on May 23, 2024.

Review thread on the changed code:

from llama_index.core.instrumentation.span.base import BaseSpan
from llama_index.core.instrumentation.span.simple import SimpleSpan

# ContextVar for managing active spans
active_span_id: ContextVar[Optional[str]] = ContextVar("active_span_id")
@nerdai (Contributor):
Ha. This is all we need eh? I didn't know that a single global ContextVar can be used to manage span_id's across both async tasks as well as threads. This definitely cleans up the manual shenanigans I resorted to using. Thanks!

@RogerHYang (author):
For threads, it also needs that Thread wrapper I added. For async it's automatic.
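(For reference, a minimal sketch of what such a context-propagating Thread wrapper can look like; the class name and constructor signature below are illustrative and not necessarily the exact shape of llama_index.core.types.Thread.)

import threading
from contextvars import copy_context
from typing import Any, Callable, Optional

class ContextThread(threading.Thread):
    """Runs its target inside a copy of the caller's context, so ContextVars
    such as the active span id set by the caller remain visible in the thread."""

    def __init__(
        self,
        target: Callable[..., Any],
        args: tuple = (),
        kwargs: Optional[dict] = None,
        **thread_kwargs: Any,
    ) -> None:
        super().__init__(**thread_kwargs)
        # copy_context() runs in the parent thread, capturing its ContextVar values.
        self._ctx = copy_context()
        self._target_fn = target
        self._target_args = args
        self._target_kwargs = kwargs or {}

    def run(self) -> None:
        # Execute the target inside the captured context in the new thread.
        self._ctx.run(self._target_fn, *self._target_args, **self._target_kwargs)

Swapping this in for threading.Thread is what lets a sub-span started inside the thread see the parent's active span id.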

@nerdai (Contributor):
Right. I saw that was really nice. So we should resort to using llama_index.core.types.Thread from now on (CC: @logan-markewich).


Another review thread, on these lines of the diff:

token = active_span_id.set(id_)
parent_id = token.old_value
@nerdai (Contributor), May 23, 2024:
@RogerHYang: I'm running into some trouble when trying to execute your code with the Basic Usage Notebook, particularly when it gets to Streaming section.

token.old_value returns the Token.MISSING marker object, which would indicate that the context var had not been set before (https://docs.python.org/3/library/contextvars.html#contextvars.Token.old_value), but I don't believe that to be true.

Any ideas as to why I might be running into this? The async stream_chat version works fine...

@RogerHYang (author):
I'll look into it. I suspect it's because of the async nature of the notebook. That's my oversight.

@nerdai (Contributor):
No worries at all! Thanks very much.

@RogerHYang (author):
OK. The root cause of the issue is that the installed llama-index-agent-openai package does not include the changes from this PR, so it is still using the regular Thread. Therefore the contextvar value of None didn't get carried across the thread barrier when the OpenAI agent runs. I have updated this PR to handle Token.MISSING so that it is backward-compatible.
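(For reference, a sketch of what handling Token.MISSING can look like; the helper below is illustrative, not the exact code in this PR.)

from contextvars import ContextVar, Token
from typing import Optional

active_span_id: ContextVar[Optional[str]] = ContextVar("active_span_id")

def enter_span(span_id: str) -> Optional[str]:
    # Set the new active span and recover the previous value from the token.
    token = active_span_id.set(span_id)
    parent_id = token.old_value
    # If the variable was never set in this context (e.g. the caller used a
    # plain Thread, so the parent's context was not carried over), old_value
    # is the Token.MISSING sentinel; treat that as "no parent".
    if parent_id is Token.MISSING:
        parent_id = None
    return parent_id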

@nerdai (Contributor):
Ahh, makes sense. Thanks Roger for digging into that!

@RogerHYang (author):
This PR already updated all the import Thread statements via search and replace.

@nerdai (Contributor):
Ah okay. Great, thanks!

@RogerHYang (author):
I should have also bumped the minimum version requirement on llama-index-core. Sorry about that.

@RogerHYang (author):
Let me submit a new PR for the version bump.

@nerdai (Contributor):
Thanks. That's my miss too for not noticing.

@nerdai (Contributor) left a review:
Thanks again @RogerHYang for this!

I had tried using Thread and context.run() before (not nearly as nice as what you have here, though) and remember having some issues at that time with the async version (I believe).

Really happy about this PR and the really great code improvements you made here! 🙏

@dosubot added the lgtm label (This PR has been approved by a maintainer) on May 29, 2024.
@nerdai merged commit 1127908 into run-llama:main on May 29, 2024. 8 checks passed.
@nerdai (Contributor) commented May 29, 2024:

FYI: going to cut a new release of llama-index later in the day.

DarkLight1337 added a commit to DarkLight1337/llama_index that referenced this pull request on Jun 12, 2024.
Mateusz-Switala pushed a commit to Mateusz-Switala/llama_index that referenced this pull request on Jun 13, 2024:

fix: simplify ContextVar and fix sub-span attribution when delegated to thread (run-llama#13700)

* fix: simplify contextvar
* test trees in parallel
* fix Token.MISSING