[ENHANCEMENT] BigQuery integration should use a view instead of a table #2082
@alessandrolacorte Thank you for submitting this PR! We will review it this week.
@alessandrolacorte GE stores the results of a query in a table to make it faster to run queries against it (during validation). Do you know if using a view instead of a table will preserve the execution speed?
Hello! I can guarantee that the view will preserve the execution speed. We can go into the details of how BigQuery works and how Dremel (the BigQuery engine) generates the execution plan, but that might be overkill.
@alessandrolacorte Awesome - will merge and it will go out with today's release.
LGTM
…le (great-expectations#2082)
* Change from using a table to a view
* Updating changelog.rst
Co-authored-by: Eugene Mandel <eugene.mandel@gmail.com>
Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Changes proposed in this pull request:
The BigQuery integration should use a view instead of a table, for cost reasons. The current implementation creates a temporary table, which can be a copy of the original table. In BigQuery, you pay for the amount of data scanned, so for tables with terabytes of data this can become very expensive. The alternative is to use a view, which avoids the cost of duplicating the source table.
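To illustrate the difference, here is a minimal sketch of the two kinds of DDL statement involved. It is not the actual Great Expectations code: the helper name `build_staging_ddl` and the object name `my_dataset.ge_tmp` are hypothetical, and the sketch only builds the SQL string rather than calling BigQuery.

```python
def build_staging_ddl(name: str, query: str, use_view: bool = True) -> str:
    """Build a BigQuery standard SQL DDL statement exposing `query` as `name`.

    Hypothetical helper for illustration only; not part of Great Expectations.
    With use_view=True the statement only saves the query definition.
    With use_view=False it runs the query and stores the result as a table.
    """
    kind = "VIEW" if use_view else "TABLE"
    # Backticks quote the full dataset.object path in BigQuery standard SQL.
    return f"CREATE OR REPLACE {kind} `{name}` AS {query}"


if __name__ == "__main__":
    query = "SELECT * FROM `my_dataset.source_table`"
    print(build_staging_ddl("my_dataset.ge_tmp", query))
    print(build_staging_ddl("my_dataset.ge_tmp", query, use_view=False))
```

The view variant does not copy any rows when it is created, which is the cost saving this PR is after. Queries that later read from the view still scan the underlying source data.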