Conversation

tswast
Collaborator

@tswast tswast commented Sep 25, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes b/409390651
Towards b/409104302
🦕

@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Sep 25, 2025

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label Sep 25, 2025
tswast and others added 4 commits September 26, 2025 15:07
* feat: Render more BigQuery events in progress bar

This change updates bigframes/formatting_helpers.py to render more event types from bigframes/core/events.py.

Specifically, it adds rendering support for:
- BigQueryRetryEvent
- BigQueryReceivedEvent
- BigQueryFinishedEvent
- BigQueryUnknownEvent

This provides users with more detailed feedback during query execution in both notebook (HTML) and terminal (plaintext) environments.

Unit tests have been added to verify the rendering of each new event type.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Sep 26, 2025
@tswast tswast marked this pull request as ready for review September 26, 2025 18:06
@tswast tswast requested review from a team as code owners September 26, 2025 18:06
@tswast tswast requested a review from shuoweil September 26, 2025 18:06
@tswast tswast requested review from TrevorBergeron and removed request for shuoweil September 26, 2025 19:39
@tswast
Collaborator Author

tswast commented Sep 26, 2025

Tested manually with several ways of launching queries, all of which now show some output:

[image]

Comment on lines +79 to +80
class ExecutionStarted(Event):
    pass
Contributor

Should we have an execution_id or similar so we can correlate all the events tied to a single request?

Collaborator Author

That could help if we start doing async / background query execution. I don't think it's needed right now, though.


@dataclasses.dataclass(frozen=True)
class Subscriber:
    callback_ref: weakref.ref
Contributor

I think it's more intuitive to just keep subscribers alive. What is the scenario we are imagining? I could imagine this ref being deleted even while the subscriber is still alive, because they created an ephemeral function that went out of scope, even though the target of that function is still around.
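
The concern can be reproduced with the standard library alone. The following is a minimal sketch, not code from the PR: `Widget` and `on_event` are hypothetical names. In CPython, a plain `weakref.ref` to a bound method is dead almost immediately, because each attribute access creates a fresh bound-method object, whereas `weakref.WeakMethod` follows the lifetime of the underlying instance.

```python
import gc
import weakref


class Widget:  # hypothetical subscriber, not from the PR
    def on_event(self, event):
        return f"got {event}"


widget = Widget()

# Each access to widget.on_event builds a new bound-method object. Nothing
# else holds a strong reference to it, so the weak reference goes dead even
# though `widget` itself is still alive.
plain_ref = weakref.ref(widget.on_event)
gc.collect()  # for interpreters without immediate reference counting
plain_is_dead = plain_ref() is None

# WeakMethod re-creates the bound method for as long as the instance lives.
method_ref = weakref.WeakMethod(widget.on_event)
alive_result = method_ref()("event")

del widget
gc.collect()
method_is_dead = method_ref() is None
```

So a subscriber that passes `self.on_event` to a publisher that stores a plain weak reference may never receive an event, which is the unintuitive result described above.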

Collaborator Author

> What is the scenario we are imagining?

The anywidget table widgets. When they re-run the cell, the TableWidget we create is no longer needed and should go out of scope, but as far as I know, we don't really have an opportunity to unsubscribe at that time.

Contributor

I really do think the weakref here will have quite unintuitive results. There are other options. The main thing is that we want to unsubscribe before the callback becomes invalid, most crucially because it points at resources that no longer exist. The subscriber itself should be able to time this best, and the right moment may not be correlated with Python object cleanup.

Collaborator Author

Hmmm, maybe anywidget could make some short-lived subscribers before/after a call to to_pandas_batches()? I can give it a try.
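
One way to picture the short-lived subscriber idea is a context manager that subscribes on entry and unsubscribes on exit. This is only a sketch under assumptions: the `Publisher` below is a simplified stand-in rather than the PR's class, and `subscribed` is a hypothetical helper name.

```python
import contextlib


class Publisher:  # simplified stand-in, not the PR's implementation
    def __init__(self):
        self._callbacks = []

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def unsubscribe(self, callback):
        self._callbacks.remove(callback)

    def publish(self, event):
        # Iterate over a copy so the list may change during delivery.
        for callback in list(self._callbacks):
            callback(event)


@contextlib.contextmanager
def subscribed(publisher, callback):
    """Keep `callback` subscribed only for the duration of the block."""
    publisher.subscribe(callback)
    try:
        yield
    finally:
        # Runs even if the body raises, so no stale callback is left behind.
        publisher.unsubscribe(callback)


publisher = Publisher()
seen = []
with subscribed(publisher, seen.append):
    publisher.publish("during")  # stands in for work such as fetching batches
publisher.publish("after")
```

With this shape the subscriber controls its own lifetime explicitly, instead of relying on garbage collection to drop the callback.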

callback(event)


publisher = Publisher()
Contributor

Do we absolutely need global state? And if we do need it, can it at least be passed into sessions as a field?

Collaborator Author

I suspect we don't have a need for it right now, as we've moved the cell-level visualization to bigframes anywidget mode. I can try to refactor.

        self._subscribers: List[Subscriber] = []
        self._subscribers_lock = threading.Lock()

    def subscribe(self, callback):
Contributor

Should subscribers specify what they are subscribing to, in terms of an event type enum or class? Otherwise, new event types will change the behavior of existing subscribers.

Collaborator Author

I don't see a need for this right now. The only purpose so far is progress bars, which should receive everything.

Contributor

Yeah, I guess as long as this is internal-use only, we can tack on features like this later, and refactor callers as needed.
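
If filtering were added later, it might look something like the following sketch. The `event_types` parameter is an assumption and not part of the PR, and the event classes are empty stand-ins that only reuse names mentioned in this thread.

```python
class Event:
    """Base class; a stand-in for the events module in the PR."""


class ExecutionStarted(Event):
    pass


class BigQueryRetryEvent(Event):
    pass


class Publisher:  # simplified stand-in, without locking
    def __init__(self):
        self._subscribers = []  # list of (callback, event_types) pairs

    def subscribe(self, callback, event_types=None):
        # event_types is a hypothetical optional tuple of Event classes;
        # None means "deliver everything", which suits a progress bar.
        self._subscribers.append((callback, event_types))

    def publish(self, event):
        for callback, event_types in list(self._subscribers):
            if event_types is None or isinstance(event, event_types):
                callback(event)


publisher = Publisher()
everything, retries_only = [], []
publisher.subscribe(everything.append)
publisher.subscribe(retries_only.append, event_types=(BigQueryRetryEvent,))

publisher.publish(ExecutionStarted())
publisher.publish(BigQueryRetryEvent())
```

Because the filter defaults to `None`, existing callers that want every event would not need to change.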

class Publisher:
    def __init__(self):
        self._subscribers: List[Subscriber] = []
        self._subscribers_lock = threading.Lock()
Contributor

A single lock on a global publisher could cause unexpected cross-session contention.
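
A rough sketch of the per-session alternative, assuming each session constructs its own publisher. The `Session` class here is a hypothetical stand-in, not the bigframes `Session`, and the `Publisher` is simplified.

```python
import threading


class Publisher:  # simplified stand-in for the class under review
    def __init__(self):
        self._subscribers = []
        self._subscribers_lock = threading.Lock()

    def subscribe(self, callback):
        with self._subscribers_lock:
            self._subscribers.append(callback)

    def publish(self, event):
        with self._subscribers_lock:
            subscribers = list(self._subscribers)
        for callback in subscribers:
            callback(event)


class Session:  # hypothetical stand-in, not the bigframes Session
    def __init__(self):
        # Each session owns its publisher, and therefore its own lock.
        self._publisher = Publisher()


session_a = Session()
session_b = Session()
seen_a, seen_b = [], []
session_a._publisher.subscribe(seen_a.append)
session_b._publisher.subscribe(seen_b.append)

session_a._publisher.publish("query started in A")
```

Events and lock acquisitions in one session then never touch the other session's subscriber list.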

@GarrettWu GarrettWu removed their assignment Sep 30, 2025
@tswast
Collaborator Author

tswast commented Oct 6, 2025

e2e failure: FAILED tests/system/large/blob/test_function.py::test_blob_transcribe[gemini-2.0-flash-lite-001-False] appears to be unrelated

        self._datset_ref = bf_io_bigquery.create_bq_dataset_reference(
            self.bqclient,
            location=self._location,
            publisher=self._publisher,
Contributor

Why publish here but not in, e.g., create_temp_table? How do we determine which types of jobs we publish events for?

Collaborator Author

The real answer is that create_temp_table isn't calling start_query_with_client, so it didn't show up in my refactor. As discussed in chat, it also breaks some of the assumptions about a query being associated with an execution.

In the interest of transparency, we probably should be notifying the user when we run such queries, though. I'll do a search for client.query / client.query_and_wait and see what I find.

        array_value: bigframes.core.ArrayValue,
        execution_spec: ex_spec.ExecutionSpec,
    ) -> executor.ExecuteResult:
        self._publisher.publish(bigframes.core.events.ExecutionStarted())
Contributor

No action now, but maybe we just end up doing a decorator/wrapper for the execute method if we want to generalize across executor types?
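
A sketch of what such a wrapper could look like, under stated assumptions: `publishes_execution_events`, `ExecutionFinished`, `RecordingPublisher`, and `FakeExecutor` are hypothetical names for illustration, and only `ExecutionStarted` appears in the diff.

```python
import functools


class ExecutionStarted:  # name taken from the PR; fields omitted here
    pass


class ExecutionFinished:  # hypothetical counterpart, not confirmed by the PR
    pass


class RecordingPublisher:  # test double that records event type names
    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(type(event).__name__)


def publishes_execution_events(method):
    """Wrap an executor method so events are published around the call."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self._publisher.publish(ExecutionStarted())
        try:
            return method(self, *args, **kwargs)
        finally:
            # Published even when the wrapped method raises.
            self._publisher.publish(ExecutionFinished())

    return wrapper


class FakeExecutor:  # hypothetical executor type
    def __init__(self, publisher):
        self._publisher = publisher

    @publishes_execution_events
    def execute(self, array_value):
        return f"result for {array_value}"


publisher = RecordingPublisher()
result = FakeExecutor(publisher).execute("array")
```

Any executor type that stores a `_publisher` attribute could reuse the decorator without repeating the publish calls in its body.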

        return subscriber

    def unsubscribe(self, subscriber: Subscriber):
        self._subscribers.remove(subscriber)
Contributor

Might cause errors if done concurrently with iteration?
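
One common way to guard against this, sketched here with a simplified stand-in `Publisher` rather than the PR's class, is to take the lock in `unsubscribe` and to have `publish` iterate over a snapshot taken under the same lock. Callbacks run outside the lock, so a callback that unsubscribes itself neither deadlocks nor disturbs the iteration.

```python
import threading


class Publisher:  # simplified stand-in for the class under review
    def __init__(self):
        self._subscribers = []
        self._subscribers_lock = threading.Lock()

    def subscribe(self, callback):
        with self._subscribers_lock:
            self._subscribers.append(callback)

    def unsubscribe(self, callback):
        with self._subscribers_lock:
            self._subscribers.remove(callback)

    def publish(self, event):
        # Copy under the lock, then deliver without holding it.
        with self._subscribers_lock:
            snapshot = list(self._subscribers)
        for callback in snapshot:
            callback(event)


publisher = Publisher()
calls = []


def one_shot(event):
    calls.append(("one_shot", event))
    publisher.unsubscribe(one_shot)  # mutates the list during delivery


def persistent(event):
    calls.append(("persistent", event))


publisher.subscribe(one_shot)
publisher.subscribe(persistent)
publisher.publish("first")
publisher.publish("second")
```

Without the snapshot, removing `one_shot` while iterating the live list would cause `persistent` to be skipped for the first event.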

    def __init__(self, callback: Callable[[Event], None], *, publisher: Publisher):
        self._publisher = publisher
        self._callback = callback
        self._subscriber_id = str(uuid.uuid4())
Contributor

Do we even need to convert to a string?

Collaborator Author

Looks like we don't. A UUID is hashable without converting it to a string.

In [1]: import uuid

In [2]: type(uuid.uuid4())
Out[2]: uuid.UUID

In [3]: hash(uuid.uuid4())
Out[3]: 695624583744666396

        return value._subscriber_id == self._subscriber_id

    def close(self):
        self._publisher.unsubscribe(self)
Contributor

Another alternative to explicitly removing from subscriber list in a blocking way is to just flag oneself as closed, and the publisher can then remove at its convenience. I think this approach works fine though

Collaborator Author

I'm wary of that because it would mean a circular reference would hang around until the next time an event is published, but I suppose that'd be OK.
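
The trade-off can be made concrete with a small sketch of the flag-and-prune alternative. The classes are simplified stand-ins, and the `closed` attribute is an assumption, not the PR's code. A closed subscriber stays in the publisher's list, and so stays referenced, until the next `publish` call prunes it.

```python
import threading


class Subscriber:  # simplified stand-in
    def __init__(self, callback):
        self._callback = callback
        self.closed = False  # hypothetical flag

    def close(self):
        # Non-blocking: only mark the subscriber; the publisher prunes later.
        self.closed = True

    def __call__(self, event):
        if not self.closed:
            self._callback(event)


class Publisher:  # simplified stand-in
    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self, callback):
        subscriber = Subscriber(callback)
        with self._lock:
            self._subscribers.append(subscriber)
        return subscriber

    def publish(self, event):
        with self._lock:
            # Drop closed subscribers lazily, at the publisher's convenience.
            self._subscribers = [s for s in self._subscribers if not s.closed]
            live = list(self._subscribers)
        for subscriber in live:
            subscriber(event)


publisher = Publisher()
seen = []
subscriber = publisher.subscribe(seen.append)
publisher.publish(1)

subscriber.close()
count_after_close = len(publisher._subscribers)  # still referenced

publisher.publish(2)
count_after_publish = len(publisher._subscribers)  # pruned now
```

The closed subscriber receives nothing further, but the publisher holds on to it until another event arrives, which is the lingering reference mentioned above.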

@tswast tswast merged commit 1f48d3a into main Oct 8, 2025
19 of 25 checks passed
@tswast tswast deleted the b409390651-progress-bar branch October 8, 2025 16:04

Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: xl Pull request size is extra large.

3 participants