@znation
Collaborator

@znation znation commented Jun 6, 2025

Instead of blocking while waiting for a Space to initialize, keep the client as None and attempt to initialize it lazily on each log call. When a client is successfully initialized, the log call will flush the queue.

Fixes #28

@znation znation requested review from Saba9 and abidlabs June 6, 2025 23:22
@abidlabs
Member

Cool @znation! This is definitely an improvement over the current behavior, but it's still somewhat blocking, as it relies on the timeout from Client.__init__ kicking in. You'll see here, for example, that there's still a big lag between the first and second epochs while the Client is still instantiating, which is not present in subsequent epochs (after the Client has instantiated):

Screen.Recording.2025-06-10.at.6.04.06.PM.mov

Instead, I wonder if we can offload the Client instantiation to a background thread, and have log() check a shared flag or future to determine if the client is ready. Something along these lines:

        self._client_ready = threading.Event()
        self._client_lock = threading.Lock()
        self._client = client  # Might be None
        self.queued_logs = deque()

        if self._client is None:
            threading.Thread(target=self._init_client_background, daemon=True).start()
        else:
            self._client_ready.set()

    def _init_client_background(self):
        try:
            client = Client(self.url, verbose=False)
            with self._client_lock:
                self._client = client
            self._client_ready.set()
        except Exception as e:
            print(f"[trackio] Failed to initialize client in background: {e}")
            # Do not set _client_ready, keep it false so log() knows to queue

One other thing to keep in mind is that the experiment might finish before the Client has instantiated, so perhaps the background thread should also log everything currently in the queue once the Client has instantiated. In that case the queue would be accessed from multiple threads, so we should guard it with a threading.Lock.
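A minimal sketch of the drain step suggested above, under stated assumptions: `make_client`, the `state` dict layout, and the client's `send` method are hypothetical names for illustration, and `FakeClient` stands in for the real `Client` so the example runs without a Space.

```python
import threading
from collections import deque


def init_client_and_drain(make_client, state):
    """Create the client, then flush queued logs and publish it under one lock.

    `make_client` and the `state` dict layout are assumptions for this sketch.
    """
    client = make_client()  # slow step, done without holding the lock
    with state["lock"]:
        # Drain first, then publish, so queued entries keep their order
        # relative to anything logged after the client becomes visible.
        while state["queue"]:
            client.send(state["queue"].popleft())  # `send` is a hypothetical method
        state["client"] = client


class FakeClient:
    """Test double that records what it was asked to send."""

    def __init__(self):
        self.sent = []

    def send(self, item):
        self.sent.append(item)


state = {
    "lock": threading.Lock(),
    "queue": deque([{"loss": 0.9}, {"loss": 0.7}]),
    "client": None,
}
worker = threading.Thread(
    target=init_client_and_drain, args=(FakeClient, state), daemon=True
)
worker.start()
worker.join()
```

Holding the lock while draining keeps ordering simple, at the cost of briefly blocking any concurrent log call for the duration of the flush.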

@znation
Collaborator Author

znation commented Jun 11, 2025

@abidlabs I debated doing that (see comment thread) but I guess I should. I'll update this PR to use a background thread and drain the queue when the client is instantiated.

@znation
Collaborator Author

znation commented Jun 11, 2025

I don't think we need a threading.Event since we don't want to block any thread waiting for any other -- log calls should proceed and either enqueue or use the non-null client, so I think the threading.Lock is sufficient.
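To illustrate the lock-only idea, here is a rough sketch in which `log` never waits on another thread: it takes the lock just long enough to read the client reference and either enqueue or send. `LazyLogger`, `set_client`, and the callable client are assumptions for illustration, not trackio's API.

```python
import threading
from collections import deque


class LazyLogger:
    """Illustrative only; attribute names follow the snippet earlier in the thread."""

    def __init__(self):
        self._client = None  # published later by whoever finishes initialization
        self._client_lock = threading.Lock()
        self.queued_logs = deque()

    def set_client(self, client):
        # A fuller version would also drain queued_logs here, under the same lock.
        with self._client_lock:
            self._client = client

    def log(self, metrics):
        # The lock is held only to read the reference and, if the client is
        # missing, to append to the queue. No Event.wait() is involved.
        with self._client_lock:
            client = self._client
            if client is None:
                self.queued_logs.append(metrics)
                return False
        client(metrics)  # hypothetical: the client is any callable in this sketch
        return True


logger = LazyLogger()
sent = []
first = logger.log({"step": 0})   # no client yet, so this is queued
logger.set_client(sent.append)
second = logger.log({"step": 1})  # client is available, so this is sent
```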

@abidlabs
Member

Thanks @znation, sorry I missed the thread earlier, but the approach you suggested sounds good to me.

@znation znation force-pushed the zn/pr-28-queue-logs branch 3 times, most recently from fd6380c to a5598fb Compare June 20, 2025 09:49
@znation
Collaborator Author

znation commented Jun 20, 2025

@abidlabs I implemented the threading.Lock suggestion, and now neither init nor log blocks while the client is instantiating. Please re-review at your convenience. Thanks!

@abidlabs
Member

Hmm I'm seeing a

[trackio] Failed to initialize client in background: can't create new thread at interpreter shutdown

error when running e.g. python examples/deploy-on-spaces.py

@znation
Collaborator Author

znation commented Jun 20, 2025

Thanks for finding that @abidlabs, apparently I should test from scripts in addition to a Jupyter notebook. I'll fix that.

@znation znation requested a review from abidlabs June 21, 2025 00:25
@znation
Collaborator Author

znation commented Jun 21, 2025

Thanks @abidlabs, I think with this commit it should work from both a script and Jupyter notebook. In order to avoid errors and data loss we have to block at some point and make sure the client is instantiated, so I have done this on the finish call to make sure the queue flushes by the end of finish.
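A sketch of blocking only in `finish`, assuming a hypothetical `Run` class whose background thread drains the queue once the client exists. The `make_client` factory and the callable client are stand-ins, and this is not the merged trackio code.

```python
import threading
import time
from collections import deque


class Run:
    """Hypothetical sketch; not the merged trackio implementation."""

    def __init__(self, make_client):
        self._client = None
        self._client_lock = threading.Lock()
        self.queued_logs = deque()
        # daemon=True so an abandoned run cannot keep the interpreter alive;
        # finish() joins the thread explicitly.
        self._init_thread = threading.Thread(
            target=self._init_client_background, args=(make_client,), daemon=True
        )
        self._init_thread.start()

    def _init_client_background(self, make_client):
        client = make_client()  # may take a while if the Space is starting
        with self._client_lock:
            while self.queued_logs:
                client(self.queued_logs.popleft())
            self._client = client

    def log(self, metrics):
        with self._client_lock:
            if self._client is None:
                self.queued_logs.append(metrics)
                return
            client = self._client
        client(metrics)

    def finish(self):
        # The one place that blocks: wait for the background thread so the
        # queue is flushed before the script exits.
        self._init_thread.join()


sent = []


def slow_client():
    time.sleep(0.2)  # simulates a Space that is still starting
    return sent.append


run = Run(slow_client)
run.log({"epoch": 1})  # returns immediately; queued
run.log({"epoch": 2})
run.finish()           # blocks until the queue has been flushed
```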

@abidlabs
Member

abidlabs commented Jun 22, 2025

Nice @znation! Code looks good. I tested with python examples/deploy-on-spaces.py, and it didn't block code execution while waiting for the Space to start.

The only thing I noticed was that it kept printing [trackio] Failed to initialize client in background: The read operation timed out every couple of seconds, while the Space was starting.

In the end, my console looked like this:

[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Failed to initialize client in background: The read operation timed out
[trackio] Successfully initialized client in background.

imo it'd be better to suppress these. otherwise, lgtm
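One possible way to suppress the intermediate messages, sketched under the assumption that initialization runs in a retry loop: remember the last error and report only if every attempt fails. The function name, `attempts`, `delay`, and `report` parameters are hypothetical.

```python
import time


def init_with_quiet_retries(make_client, attempts=5, delay=0.0, report=print):
    """Retry `make_client`, reporting at most one line instead of one per failure."""
    last_error = None
    for _ in range(attempts):
        try:
            client = make_client()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            time.sleep(delay)
            continue
        return client
    # Only reached when every attempt failed.
    report(f"[trackio] Failed to initialize client in background: {last_error}")
    return None


calls = {"n": 0}
messages = []


def flaky_client():
    # Fails three times, then succeeds, like a Space that is still starting.
    calls["n"] += 1
    if calls["n"] < 4:
        raise TimeoutError("The read operation timed out")
    return "client"


result = init_with_quiet_retries(flaky_client, report=messages.append)
```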

@abidlabs
Member

Just a heads up that trackio.import_csv now includes these two calls as well:

        deploy.create_space_if_not_exists(space_id, dataset_id)
        deploy.upload_db_to_space(project, space_id)

which we should queue

Instead of blocking while waiting for a Space to initialize, keep the
client as None and attempt to initialize it lazily on each log call.
When a client is successfully initialized, the log call will flush the
queue.

Fixes gradio-app#28

ruff format

ruff check --fix

ruff check --fix

Init client on background thread

ruff check --fix

ruff format

Update trackio/run.py

Co-authored-by: Abubakar Abid <abubakar@huggingface.co>

Update trackio/run.py

Co-authored-by: Abubakar Abid <abubakar@huggingface.co>

Wait for background thread on `finish`

In case we still haven't created the client by the time `finish` is
called, block and wait for the thread to finish which will flush the
queue. Otherwise it's likely a script will exit without having flushed
the queue.

ruff format
@znation znation force-pushed the zn/pr-28-queue-logs branch from d0713e2 to dd1415f Compare June 24, 2025 07:34
@znation znation merged commit 6651342 into gradio-app:main Jun 24, 2025
2 checks passed
@znation znation deleted the zn/pr-28-queue-logs branch June 24, 2025 08:14
Development

Successfully merging this pull request may close these issues.

Queue /logs until they can be called on the remote Space

2 participants