[ENHANCEMENT] BigQuery integration should use a view instead of a table #2082

alessandrolacorte
Contributor

Changes proposed in this pull request:

The BigQuery integration should use a view instead of a table, for cost reasons. The current implementation creates a temporary table, which can be a copy of the original table. In BigQuery, you pay for the amount of data scanned, so for tables with TBs of data this can become very expensive. The alternative is to use a view, which incurs no cost for duplicating the source table.
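
To illustrate the difference, here is a minimal sketch of the two kinds of statement involved. It is not the actual diff: the helper name `build_bigquery_statement` and the dataset/table names are hypothetical, and the exact SQL emitted by the integration may differ. It only shows that the same query can be exposed under a name either by copying its result into a table or by defining a view over it.

```python
def build_bigquery_statement(name: str, query: str, use_view: bool = True) -> str:
    """Return a BigQuery DDL statement that exposes `query` under `name`.

    Hypothetical helper for illustration only.
    With use_view=False the result of `query` is written to a new table,
    so the source data is scanned and copied. With use_view=True only a
    view definition is stored and no data is duplicated.
    """
    kind = "VIEW" if use_view else "TABLE"
    return f"CREATE OR REPLACE {kind} `{name}` AS {query}"


if __name__ == "__main__":
    # Assumed example names; substitute your own dataset and query.
    sql = "SELECT * FROM `my_dataset.big_table`"
    print(build_bigquery_statement("my_dataset.ge_tmp", sql, use_view=False))
    print(build_bigquery_statement("my_dataset.ge_tmp", sql, use_view=True))
```

The only change between the two statements is the `TABLE` or `VIEW` keyword; queries that later reference the name look the same in both cases.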

@alessandrolacorte alessandrolacorte marked this pull request as ready for review November 25, 2020 11:26
@eugmandel
Contributor

@alessandrolacorte Thank you for submitting this PR! We will review it this week.

@eugmandel
Contributor

@alessandrolacorte GE stores the results of a query in a table to make it faster to run queries against it (during validation). Do you know if using a view instead of a table will preserve the execution speed?

@alessandrolacorte
Contributor Author

> @alessandrolacorte GE stores the results of a query in a table to make it faster to run queries against it (during validation). Do you know if using a view instead of a table will preserve the execution speed?

Hello! I can guarantee that the view will preserve the execution speed. We could go into the details of how BigQuery works and how Dremel (the BigQuery engine) generates the execution plan, but that might be overkill.
In essence, views in BigQuery are not materialized and carry zero execution penalty: querying the view is as fast as querying the underlying table.

@eugmandel
Contributor

@alessandrolacorte Awesome - will merge and it will go out with today's release.

Contributor

@alexsherstinsky alexsherstinsky left a comment

LGTM

@alexsherstinsky alexsherstinsky merged commit bbace6f into great-expectations:develop Dec 15, 2020
alexsherstinsky added a commit to alexsherstinsky/great_expectations that referenced this pull request Feb 19, 2021
…le (great-expectations#2082)

* Change from using a table to a view

* Updating changelog.rst

Co-authored-by: Eugene Mandel <eugene.mandel@gmail.com>
Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>