Queue up log items when Client is not ready #36

znation · 2025-06-06T23:22:32Z

Instead of blocking while waiting for a Space to initialize, keep the client as None and attempt to initialize it lazily on each log call. When a client is successfully initialized, the log call will flush the queue.

Fixes #28

abidlabs · 2025-06-11T01:27:15Z

Cool @znation! This is definitely an improvement over the current behavior, but it's still somewhat blocking because it relies on the timeout from Client.__init__ kicking in. You'll see here, for example, that there's still a big lag between the first and second epochs while the Client is still instantiating, which is not present after the subsequent epochs (once the Client has instantiated):

Screen.Recording.2025-06-10.at.6.04.06.PM.mov

Instead, I wonder if we can offload the Client instantiation to a background thread, and have log() check a shared flag or future to determine if the client is ready. Something along these lines:

        # In __init__ (requires `import threading` and `from collections import deque`):
        self._client_ready = threading.Event()  # set once the client is usable
        self._client_lock = threading.Lock()  # guards self._client
        self._client = client  # Might be None
        self.queued_logs = deque()

        if self._client is None:
            # Instantiate the Client off the main thread so __init__ returns immediately
            threading.Thread(target=self._init_client_background, daemon=True).start()
        else:
            self._client_ready.set()

    def _init_client_background(self):
        try:
            client = Client(self.url, verbose=False)
            with self._client_lock:
                self._client = client
            self._client_ready.set()
        except Exception as e:
            print(f"[trackio] Failed to initialize client in background: {e}")
            # Do not set _client_ready; leave it unset so log() knows to queue

One other thing to keep in mind is that the experiment might finish before the Client has instantiated, so perhaps the background thread should also log everything currently in the queue once the Client has instantiated. In that case the queue would be accessed from multiple threads, so we should guard it with a threading.Lock.
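To make the queue-draining idea concrete, here is a minimal sketch, not the actual trackio code. `QueuedLogger`, `client_factory`, and the `sent` list are hypothetical names used only for illustration; the "client" is any callable that accepts a log item. A single lock guards both the client reference and the queue, and the slow construction happens outside the lock so `log()` never waits on it.

```python
import threading
from collections import deque


class QueuedLogger:
    """Hypothetical stand-in for a run object; not trackio's real class."""

    def __init__(self, client_factory):
        self._client_factory = client_factory  # assumed: slow callable returning a client
        self._client = None
        self._lock = threading.Lock()  # guards both _client and _queue
        self._queue = deque()
        self.sent = []  # records delivered items, for demonstration only
        self._thread = threading.Thread(
            target=self._init_client_background, daemon=True
        )
        self._thread.start()

    def _init_client_background(self):
        client = self._client_factory()  # slow call, made without holding the lock
        with self._lock:
            self._client = client
            # Drain everything queued so far, preserving insertion order.
            while self._queue:
                self.sent.append(client(self._queue.popleft()))

    def log(self, item):
        with self._lock:
            if self._client is None:
                self._queue.append(item)  # client not ready: enqueue and return
            else:
                self.sent.append(self._client(item))
```

Because the drain runs while the lock is held, a `log()` call that arrives mid-drain waits briefly and is then sent after the queued items, which keeps ordering intact at the cost of a short wait.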

znation · 2025-06-11T07:59:35Z

@abidlabs I debated doing that (see comment thread) but I guess I should. I'll update this PR to use a background thread and drain the queue when the client is instantiated.

znation · 2025-06-11T08:02:36Z

I don't think we need a threading.Event, since we don't want any thread to block waiting for another: log calls should proceed and either enqueue or use the non-null client. So I think the threading.Lock is sufficient.

abidlabs · 2025-06-11T19:30:30Z

Thanks @znation, sorry I missed the thread earlier. The approach you suggested sounds good to me.

znation · 2025-06-20T09:52:23Z

@abidlabs I implemented the threading.Lock suggestion, and now neither init nor log blocks while the client is instantiating. Please re-review at your convenience. Thanks!

trackio/run.py

abidlabs · 2025-06-20T13:50:00Z

Hmm I'm seeing a

[trackio] Failed to initialize client in background: can't create new thread at interpreter shutdown

error when running e.g. python examples/deploy-on-spaces.py

znation · 2025-06-20T20:30:54Z

Thanks for finding that @abidlabs, apparently I should test in scripts in addition to a Jupyter notebook. I'll fix that.

znation · 2025-06-21T00:26:25Z

Thanks @abidlabs, I think with this commit it should work from both a script and a Jupyter notebook. To avoid errors and data loss, we have to block at some point until the client is instantiated, so I do this in the finish call, which ensures the queue is flushed by the time finish returns.
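A rough sketch of the "block only in finish" idea, under stated assumptions: `Run` here is a hypothetical class, `connect` is an assumed callable standing in for client construction, and `delivered` just records what would have been sent. It is not the code in this PR.

```python
import threading
import time
from collections import deque


class Run:
    """Hypothetical sketch; names do not mirror trackio's actual Run class."""

    def __init__(self, connect):
        self._connect = connect  # assumed: slow callable returning a client
        self._client = None
        self._lock = threading.Lock()
        self._queue = deque()
        self.delivered = []  # stands in for items sent to the server
        self._init_thread = threading.Thread(target=self._init_client, daemon=True)
        self._init_thread.start()

    def _init_client(self):
        client = self._connect()
        with self._lock:
            self._client = client
            while self._queue:  # flush whatever was queued in the meantime
                self.delivered.append(self._queue.popleft())

    def log(self, metrics):
        with self._lock:
            if self._client is None:
                self._queue.append(metrics)
            else:
                self.delivered.append(metrics)

    def finish(self, timeout=None):
        # The only blocking point: wait for the background thread, which
        # flushes the queue before it exits.
        self._init_thread.join(timeout)
        with self._lock:
            return len(self._queue)  # items still unsent (0 if the flush ran)


def slow_connect():
    time.sleep(0.2)  # simulate a Space that takes a while to start
    return object()


run = Run(slow_connect)
run.log({"epoch": 1})
run.log({"epoch": 2})
remaining = run.finish()
```

Without the `join`, a short script could reach the end of the program while the daemon thread is still connecting, and the queued items would never be sent.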

pyproject.toml

abidlabs · 2025-06-22T01:44:42Z

Nice @znation! Code looks good. I tested with python examples/deploy-on-spaces.py, and it didn't block code execution while waiting for the Space to start.

The only thing I noticed was that it kept printing [trackio] Failed to initialize client in background: The read operation timed out every couple of seconds, while the Space was starting.

In the end, my console looked like this:

[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Successfully initialized client in background.

imo it'd be better to suppress these. otherwise, lgtm
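One possible way to quiet the repeated messages, sketched under assumptions rather than taken from this PR: route per-attempt failures through the standard `logging` module at DEBUG level so they stay hidden by default, and surface only the final error. `connect_with_retry`, `flaky_connect`, and the logger name are hypothetical.

```python
import logging
import time

logger = logging.getLogger("trackio_sketch")  # hypothetical logger name


def connect_with_retry(connect, attempts=5, delay=0.0):
    """Call `connect` until it succeeds, logging failures at DEBUG only."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:
            last_error = exc
            # Hidden unless the user opts in to DEBUG output.
            logger.debug("client init attempt %d failed: %s", attempt, exc)
            time.sleep(delay)
    raise last_error  # all attempts failed: surface the last error once


calls = {"count": 0}


def flaky_connect():
    # Simulates a Space that times out three times before it is ready.
    calls["count"] += 1
    if calls["count"] <= 3:
        raise TimeoutError("The read operation timed out")
    return "client"


result = connect_with_retry(flaky_connect)
```

Anyone debugging a slow Space can still see each attempt by enabling DEBUG logging for that logger.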

abidlabs · 2025-06-23T08:32:34Z

Just a heads up that trackio.import_csv now includes these two calls as well:

        deploy.create_space_if_not_exists(space_id, dataset_id)
        deploy.upload_db_to_space(project, space_id)

which we should queue

Instead of blocking while waiting for a Space to initialize, keep the client as None and attempt to initialize it lazily on each log call. When a client is successfully initialized, the log call will flush the queue.

Fixes gradio-app#28

ruff format
ruff check --fix
ruff check --fix
Init client on background thread
ruff check --fix
ruff format
Update trackio/run.py
Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
Update trackio/run.py
Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
Wait for background thread on `finish`

In case we still haven't created the client by the time `finish` is called, block and wait for the thread to finish which will flush the queue. Otherwise it's likely a script will exit without having flushed the queue.

ruff format

znation requested review from Saba9 and abidlabs June 6, 2025 23:22

znation force-pushed the zn/pr-28-queue-logs branch 3 times, most recently from fd6380c to a5598fb Compare June 20, 2025 09:49

abidlabs reviewed Jun 20, 2025

trackio/run.py (outdated)

abidlabs reviewed Jun 20, 2025

trackio/run.py (outdated)

znation requested a review from abidlabs June 21, 2025 00:25

abidlabs reviewed Jun 22, 2025

pyproject.toml (outdated)

abidlabs approved these changes Jun 22, 2025

abidlabs mentioned this pull request Jun 23, 2025

Allow importing Trackio dashboard from a CSV #62

Merged

znation force-pushed the zn/pr-28-queue-logs branch from d0713e2 to dd1415f Compare June 24, 2025 07:34

znation added 3 commits June 24, 2025 00:37

remove backoff dependency

8adc1fc

Remove unneeded logging

e0c9ba9

Add wait_until_space_exists, call from deploy.py

8c69e21

znation merged commit 6651342 into gradio-app:main Jun 24, 2025
2 checks passed

znation deleted the zn/pr-28-queue-logs branch June 24, 2025 08:14
