Skip to content

Conversation

tswast
Copy link
Collaborator

@tswast tswast commented Nov 19, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue 344019989 🦕

@tswast tswast requested review from a team as code owners November 19, 2024 17:15
@tswast tswast requested a review from chelsea-lin November 19, 2024 17:15
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Nov 19, 2024
as the values in the ``column_to_search`` column. Can only be set when query is a DataFrame.
top_k (int, default 10):
top_k (int):
Sepecifies the number of nearest neighbors to return. Default to 10.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it is a breaking change but it is not after digging into the BigQuery SQL document to confirm this. To ensure a smooth user experience, can we keep the default explicit to make the behavior immediately apparent without relying on external documentation? I am open to learning more your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user_brute_force is also applied for this comment. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, I want to keep BigFrames pretty lightweight. If users don't specify a value, I want to make sure we use the server-side default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this approach. It's important that the docstring clearly states the default value of 10, so users can easily understand the parameter's behavior without needing to consult server-side documentation. Because of that, I'm fine with the current implementation.

"""
import bigframes.series

if not fraction_lists_to_search and use_brute_force is True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this ValueError exception?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sudipto was encountering this error thinking it was a BigFrames bug. I'd rather remove the client-side check and let the server handle this case.

Copy link
Contributor

@chelsea-lin chelsea-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@tswast
Copy link
Collaborator Author

tswast commented Nov 21, 2024

e2e failure test_read_gbq_table_clustered_with_filter seems unrelated. Merging.

@tswast tswast merged commit 131edc3 into main Nov 21, 2024
22 of 23 checks passed
@tswast tswast deleted the fraction-to-search branch November 21, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants