
[FEATURE] Add BigQuery test coverage #3135

Closed

Conversation

@jdimatteo jdimatteo commented Jul 29, 2021

Add tests for BigQuery with the v3 API, as a first step toward #3132.

E.g.

$ export GOOGLE_APPLICATION_CREDENTIALS=/home/jdimatteo/Downloads/great-expectations-bigquery-ci-service-account.json
$ time pytest tests/test_definitions/test_expectations_cfe.py --bigquery -k bigquery --no-postgresql --no-spark -v
...
================================================================================= 169 passed, 264 skipped, 932 deselected, 155 warnings in 691.71s (0:11:31) =================================================================================

real	11m35.464s
user	0m30.987s
sys	0m3.213s

Changes proposed in this pull request:

  • Add BigQuery to v3 API expectation test suite

Open Questions / Todos

  1. How do we clean up the test BigQuery tables created during test execution, e.g. so we don't pay for storage while these randomly named tables accumulate?
  • Option A. Create the dataset with a default table expiration of 0.1 days, so temp tables are automatically deleted, and document this
  • Option B. Who cares? Let tables accumulate; the cost will likely be trivial (<$1 a month)
  • Option C. Delete at the end of the test run
    • Delete the dataset, all tables in the dataset, or just the tables that were created?
    • What happens if concurrent tests run against the same BigQuery project, e.g. if two different PRs have BigQuery checks running simultaneously?
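For Option A, the main wrinkle is units: as I understand it, BigQuery's REST API expresses a dataset's default table expiration in milliseconds (`defaultTableExpirationMs`), while the `bq mk --default_table_expiration` CLI flag takes seconds. A minimal sketch of deriving the 0.1-day value (hypothetical helpers, not part of this PR):

```python
# Sketch for Option A (hypothetical, not in this PR): compute a 0.1-day
# default table expiration in the units each BigQuery interface expects.

def expiration_ms(days: float) -> int:
    """Default table expiration in milliseconds (REST API field unit)."""
    return int(days * 24 * 60 * 60 * 1000)

def expiration_s(days: float) -> int:
    """Default table expiration in seconds (`bq mk --default_table_expiration` unit)."""
    return int(days * 24 * 60 * 60)

if __name__ == "__main__":
    # e.g. bq mk --dataset --default_table_expiration 8640 <project>:<temp_dataset>
    print(expiration_ms(0.1))  # 8640000
    print(expiration_s(0.1))   # 8640
```

With the expiration set on the dataset itself, every temp table created by the tests inherits it and BigQuery garbage-collects them without any teardown code.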
  2. What Google Cloud account should be used for hosting the BigQuery dataset and running the tests?
  • Option A. Use a specially created greatexpectationsbigqueryci@gmail.com account with @jdimatteo's credit card / billing
    • upload the service-account secret for access by the Azure pipeline
  • Option B. Use a Great Expectations managed Google Cloud account with a corporate credit card
  3. Are the tests too slow, and how could we speed them up?
  • Option A. Who cares? The BigQuery tests currently run in about 11 minutes, which is similar to the MySQL test runtime.
  • Option B. Run tests in parallel
  • Option C. Optimize the sequential run
    • e.g. creating the tables seems to take the bulk of the time -- maybe previously created tables could be re-used by including a hash of the contents in the table name?
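The table re-use idea in Option C could look roughly like this (hypothetical helper, not part of this PR): hash the test data into the table name, so identical data maps to the same table across runs and table creation can be skipped when the table already exists.

```python
# Sketch for Option C (hypothetical helper, not in this PR): derive a
# deterministic, content-addressed table name from the test data.
import hashlib
import json

def content_addressed_name(prefix: str, rows: list) -> str:
    """Return a table name that is stable for identical row contents, so a
    previously created table can be re-used instead of re-created."""
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{prefix}_{digest}"

if __name__ == "__main__":
    rows = [{"col_a": 1, "col_b": "x"}, {"col_a": 2, "col_b": "y"}]
    name = content_addressed_name("ge_tmp", rows)
    # Same rows always yield the same name; changed rows yield a new one,
    # so stale tables are never reused for different data.
    assert name == content_addressed_name("ge_tmp", rows)
    print(name)
```

One caveat: content-addressed tables would accumulate just like the current randomly named ones, so this would still need to be paired with one of the cleanup options from Question 1.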
  4. Should we add tests for the v2 API as well as the v3 API?
  • Option A. No, only focus on the v3 API.
  • Option B. Yes, don't merge this PR until we include tests for both the v2 and v3 APIs.
  • Option C. Add a todo for the v2 API, but focus on tests for the v3 API first and don't hold up this PR waiting for v2 API test coverage.

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

$ pytest tests/test_definitions/test_expectations_cfe.py --no-postgresql --no-spark -k expect_column_values_to_not_be_null -v
@netlify

netlify bot commented Jul 29, 2021

✔️ Deploy Preview for knoxpod ready!

🔨 Explore the source changes: 3b0246c

🔍 Inspect the deploy log: https://app.netlify.com/sites/knoxpod/deploys/610412f665f45d00087268df

😎 Browse the preview: https://deploy-preview-3135--knoxpod.netlify.app

@jdimatteo jdimatteo changed the title [FEATURE] bigquery tests [FEATURE] Add BigQuery test coverage Jul 29, 2021
jdimatteo added a commit to jdimatteo/great_expectations that referenced this pull request Jul 30, 2021
1. added documentation
2. using helper function _create_bigquery_engine with gcp project environment variable
3. added bigquery specific list in candidate_test_is_on_temporary_notimplemented_list_cfe
4. removed bigquery specific test files, so we can just use the already existing ones (along with bigquery specific notimplemented list per great-expectations#3)
Shinnnyshinshin pushed a commit that referenced this pull request Jul 30, 2021 (#3158)

* Merged in my work from #3135:

1. added documentation
2. using helper function _create_bigquery_engine with gcp project environment variable
3. added bigquery specific list in candidate_test_is_on_temporary_notimplemented_list_cfe
4. removed bigquery specific test files, so we can just use the already existing ones (along with bigquery specific notimplemented list per #3)

* enabled the expect_column_values_to_be_unique test, which passes now with Will's changes
@jdimatteo
Contributor Author

Closed in favor of #3158
