docs: remove import bigframes.pandas as bpd boilerplate from many samples #2147
base: main
Also fixes several constructors that didn't take a session, for compatibility with multi-session applications.
bigframes/conftest.py
doctest_namespace["np"] = np
doctest_namespace["pd"] = pd
doctest_namespace["pa"] = pa
doctest_namespace["bpd"] = polars_session
I wonder if we should instead just inject the Polars session as the global session? Not sure all the methods are the same, but I guess it works so far?
Unfortunately, there's quite a bit that isn't supported yet on the Polars session. Doing it this way means that we can override bpd to be the BQ version in the sample itself with a simple import.
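A minimal sketch of why the override works, using only the standard-library doctest module rather than pytest's doctest_namespace fixture. The names polars_stand_in and fake_bq_pandas are hypothetical stand-ins for the Polars test session and the BigQuery-backed bigframes.pandas: a name injected into the doctest globals acts as a default, and a sample's own `import ... as bpd` rebinds it. In this sketch each sample runs against its own copy of the namespace, so the rebinding does not leak into other samples.

```python
import doctest
import sys
import types

# Hypothetical stand-in for the Polars-backed test session injected as `bpd`.
polars_stand_in = types.SimpleNamespace(backend="polars")

# Hypothetical stand-in for the BigQuery-backed bigframes.pandas module,
# registered in sys.modules so a sample can import it by name.
fake_bq_pandas = types.ModuleType("fake_bq_pandas")
fake_bq_pandas.backend = "bigquery"
sys.modules["fake_bq_pandas"] = fake_bq_pandas

# Names every sample starts with, playing the role of doctest_namespace.
NAMESPACE = {"bpd": polars_stand_in}

# A sample that relies on the injected default.
DEFAULT_SAMPLE = """
>>> bpd.backend
'polars'
"""

# A sample that needs something the Polars session lacks, so it rebinds `bpd`.
OVERRIDE_SAMPLE = """
>>> import fake_bq_pandas as bpd
>>> bpd.backend
'bigquery'
"""


def run_sample(source: str, name: str) -> doctest.TestResults:
    """Parse and run one sample against a fresh copy of NAMESPACE."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(source, dict(NAMESPACE), name, None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)


default_result = run_sample(DEFAULT_SAMPLE, "default_sample")
override_result = run_sample(OVERRIDE_SAMPLE, "override_sample")

# The rebinding happened in the override sample's copy only.
assert NAMESPACE["bpd"] is polars_stand_in
```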
# These are included so that Session and bigframes.pandas can be used
# interchangeably.
Is this purely for doctests? If so, could we just inject the session for doctests, or are we trying to enable something else?
IMO, it's important for consistency. I actually uncovered a few cases where the session should have been supplied to things like to_datetime() with local data but wasn't.
) -> Union[pandas.Timestamp, datetime.datetime, bigframes.series.Series]:
return global_session.with_default_session(
bigframes.session.Session.to_datetime,
Not sure I understand this change.
to_datetime() has code paths that take local data. It was using the global session implicitly when it constructed the Series objects. Now it can take a session explicitly.
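A minimal sketch of the two pieces described here, using hypothetical, heavily simplified stand-ins (Session, Series, get_global_session, and this with_default_session are not the BigFrames implementations). It assumes with_default_session calls the unbound Session method with the global session as its first argument, so the module-level function delegates to the session method, and the session method passes session=self to anything it builds from local data instead of letting the constructor fall back to the global session.

```python
from typing import Any, Callable, List, Optional, TypeVar

T = TypeVar("T")


class Session:
    """Hypothetical, heavily simplified stand-in for bigframes.session.Session."""

    def __init__(self, name: str) -> None:
        self.name = name

    def to_datetime(self, values: List[str]) -> "Series":
        # Local-data path: bind the result to *this* session. Omitting
        # session=self would silently fall back to the global session.
        return Series(values, session=self)


_global_session: Optional[Session] = None


def get_global_session() -> Session:
    """Lazily create the process-wide default session."""
    global _global_session
    if _global_session is None:
        _global_session = Session("global")
    return _global_session


class Series:
    """Hypothetical stand-in that defaults to the global session."""

    def __init__(self, data: List[Any], session: Optional[Session] = None) -> None:
        self.data = list(data)
        self.session = session if session is not None else get_global_session()


def with_default_session(func: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Call an unbound Session method with the global session as `self`."""
    return func(get_global_session(), *args, **kwargs)


def to_datetime(values: List[str]) -> Series:
    """Module-level API, analogous to bigframes.pandas.to_datetime."""
    return with_default_session(Session.to_datetime, values)


# An explicit session stays attached; the module-level call uses the global one.
other = Session("other")
assert other.to_datetime(["2024-01-01"]).session is other
assert to_datetime(["2024-01-01"]).session is get_global_session()
```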
**Examples:**
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Why do we still need some of the bpd imports?
This is how I made some samples use the BQ session instead of the Polars session. In this case, hash() is unimplemented:
third_party/bigframes_vendored/pandas/core/generic.py ..........F. [100%]
================================================================ FAILURES ================================================================
______________________________ [doctest] third_party.bigframes_vendored.pandas.core.generic.NDFrame.sample _______________________________
558 dog 4 0 2
559 spider 8 0 1
560 fish 0 0 8
561 <BLANKLINE>
562 [4 rows x 3 columns]
563
564 Fetch one random row from the DataFrame (Note that we use `random_state`
565 to ensure reproducibility of the examples):
566
567 >>> df.sample(random_state=1)
UNEXPECTED EXCEPTION: NotImplementedError("Polars compiler hasn't implemented hash()")
Traceback (most recent call last):
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "<doctest third_party.bigframes_vendored.pandas.core.generic.NDFrame.sample[2]>", line 1, in <module>
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 197, in wrapper
raise e
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 182, in wrapper
return method(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/dataframe.py", line 794, in __repr__
pandas_df, row_count, query_job = self._block.retrieve_repr_request_results(
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/blocks.py", line 1658, in retrieve_repr_request_results
head_result = self.session._executor.execute(
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/testing/polars_session.py", line 48, in execute
lazy_frame: polars.LazyFrame = self.compiler.compile(array_value.node)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 577, in compile
return self.compile_node(node)
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
return self.compile_node(node.child).select(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 601, in compile_filter
return self.compile_node(node.child).filter(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 649, in compile_offsets
return self.compile_node(node.child).with_columns(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
return self.compile_node(node.child).select(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 601, in compile_filter
return self.compile_node(node.child).filter(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 649, in compile_offsets
return self.compile_node(node.child).with_columns(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
return self.compile_node(node.child).select(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 607, in compile_orderby
frame = self.compile_node(node.child)
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
return self.compile_node(node.child).select(
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 639, in compile_projection
new_col = self.expr_compiler.compile_expression(bound_expr).alias(name.sql)
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 180, in _
return self.compile_op(op, *args)
File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 184, in compile_op
raise NotImplementedError(f"Polars compiler hasn't implemented {op}")
NotImplementedError: Polars compiler hasn't implemented hash()
/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/third_party/bigframes_vendored/pandas/core/generic.py:567: UnexpectedException
Aside: As much as possible, I'd like to encourage us BigFrames devs to implement our ops in the Polars session as well as in BQ, so defaulting to Polars is a subtle nudge in that direction.
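The traceback suggests the Polars compiler dispatches on the op type (the functools _method frames come from singledispatchmethod) and raises NotImplementedError from compile_op when an op has no registered implementation. A minimal sketch of that pattern follows; ToyCompiler, AddOp, and HashOp are hypothetical and this is not the actual Polars compiler code. In this sketch, supporting a new op means registering one more implementation.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Op:
    """Base class for hypothetical scalar operations."""

    name: str = "op"

    def __str__(self) -> str:
        return f"{self.name}()"


@dataclass(frozen=True)
class AddOp(Op):
    name: str = "add"


@dataclass(frozen=True)
class HashOp(Op):
    name: str = "hash"


class ToyCompiler:
    """Toy backend that must register an implementation for every op."""

    @functools.singledispatchmethod
    def compile_op(self, op: Op, *args: Any) -> Callable[..., Any]:
        # Fallback used for any op type with no registered implementation.
        raise NotImplementedError(f"Toy compiler hasn't implemented {op}")

    @compile_op.register(AddOp)
    def _(self, op: AddOp, *args: Any) -> Callable[..., Any]:
        # Compile AddOp into a plain Python callable.
        return lambda left, right: left + right


compiler = ToyCompiler()
add = compiler.compile_op(AddOp())
print(add(2, 3))  # 5

try:
    compiler.compile_op(HashOp())
except NotImplementedError as exc:
    print(exc)  # Toy compiler hasn't implemented hash()
```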
bigframes/session/__init__.py
MultiIndex.from_tuples = bigframes.core.indexes.MultiIndex.from_tuples  # type: ignore
MultiIndex.from_frame = bigframes.core.indexes.MultiIndex.from_frame  # type: ignore
MultiIndex.from_arrays = bigframes.core.indexes.MultiIndex.from_arrays  # type: ignore
TODO: these should probably take a Session argument, too.
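One possible shape for this TODO, sketched with hypothetical stand-ins (this MultiIndex, _SessionMultiIndex, and Session are not the BigFrames classes). Assigning the shared class methods, as in the quoted lines, gives every session the same constructors with no way to know which session they belong to. Here the constructor instead accepts an optional keyword-only session, and each Session exposes a MultiIndex namespace whose constructors are pre-bound to it with functools.partial. Only from_tuples is shown; from_frame and from_arrays could follow the same pattern.

```python
import functools
from typing import Any, List, Optional, Sequence, Tuple


class MultiIndex:
    """Hypothetical stand-in for bigframes.core.indexes.MultiIndex."""

    def __init__(self, tuples: List[Tuple[Any, ...]], session: Optional["Session"]) -> None:
        self.tuples = tuples
        self.session = session

    @classmethod
    def from_tuples(
        cls,
        tuples: Sequence[Tuple[Any, ...]],
        *,
        session: Optional["Session"] = None,
    ) -> "MultiIndex":
        # A real implementation would fall back to the global session when
        # session is None; this sketch just records what it was given.
        return cls(list(tuples), session)


class _SessionMultiIndex:
    """Per-session namespace whose constructors are pre-bound to one session."""

    def __init__(self, session: "Session") -> None:
        self.from_tuples = functools.partial(MultiIndex.from_tuples, session=session)


class Session:
    """Hypothetical, heavily simplified session."""

    def __init__(self, name: str) -> None:
        self.name = name
        # session.MultiIndex.from_tuples(...) mirrors
        # bpd.MultiIndex.from_tuples(...) but stays bound to this session.
        self.MultiIndex = _SessionMultiIndex(self)


first, second = Session("first"), Session("second")
idx = second.MultiIndex.from_tuples([(1, "a"), (2, "b")])
assert idx.session is second and idx.tuples == [(1, "a"), (2, "b")]
```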
… tswast-doctest-boilerplate
dummy.pkl
Remove this
This reverts commit 765b678.
c3a23ea to 8b069e8
🦕