-
Notifications
You must be signed in to change notification settings - Fork 59
Multiple queries for integration test #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I made some comments about making this more easily extensible (especially for when we add macro substitutions). Please also ensure pylint checks pass (just run pylint gcp_variant_transforms, you may need to install pylint pip install pylint).
| input_pattern, | ||
| validation_query, | ||
| expected_query_result, | ||
| test_cases, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that this is inside the TestCase object, so it seems a bit odd to have a nested test_cases. Consider renaming this to assertions (both here and in the json files). The assertions name also matches python's unittest module (i.e. these are like assertEqual methods that are run after the test case is setup). I also think validation_query can be renamed to assertion_query (or just query) as a result.
| def validate_table(self): | ||
| """Runs a simple query against the output table and verifies aggregates.""" | ||
| client = bigquery.Client(project=self._project) | ||
| # TODO(bashir2): Create macros for common queries and add the option for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this TODO :)
| if len(rows) != 1: | ||
| raise TestCaseFailure('Expected one row in query result, got {}'.format( | ||
| for test_case in self._test_cases: | ||
| query = (" ").join(test_case['validation_query']).format(TABLE_NAME=self._table_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please use a single quote here (i.e. ' '.join() as we should be consistent in using single quotes in the file (should have been done in the original change as well).
| if len(rows) != 1: | ||
| raise TestCaseFailure('Expected one row in query result, got {}'.format( | ||
| for test_case in self._test_cases: | ||
| query = (" ").join(test_case['validation_query']).format(TABLE_NAME=self._table_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider creating an Assertion class. You can then have a run_assertion method in that class and move all of the logic from the for loop body to there (you'd need to pass BigQuery client to the constructor). In addition, you should make a QueryFormatter class that can take care of any macro substitutions and replacements of table_name (you can start with just replacement of table_name). You can either pass an instance of this formatter object to the Assertion class, or format the query outside of the class and just pass in the formatted query (I'm ok with both approaches). The nice thing about formatting the query inside the Assertion class is that you can use **kwargs and don't need to hard-code the JSON keys as you've done now (see how TestCase works).
469bc46 to
e9f0bf9
Compare
Pull Request Test Coverage Report for Build 358
💛 - Coveralls |
cac124a to
0e85924
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better! Thanks! Just a few more nits about commenting. Otherwise, LGTM.
| # having a list of queries instead of just one. | ||
| query = self._validation_query.format(TABLE_NAME=self._table_name) | ||
| query_job = client.query(query) | ||
| # TODO(bashir2): Create macros for common queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please change the TODO to be your username :)
| # TODO(bashir2): Create macros for common queries | ||
| query_formatter = QueryFormatter(self._table_name) | ||
| for assertion_config in self._assertion_configs: | ||
| query = query_formatter.format_query(assertion_config['query']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is a good way to verify that 'query' and 'expected_result' fields exist in the _validate_test method below (i.e. support nested keys). For now, I think you can just assert that query and expected_result are in assertion_config and add a TODO to move it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add a separate method _validate_assertion_config for now to verify that 'query' and 'expected_result' fields exist.
|
|
||
|
|
||
| class QueryFormatter(object): | ||
| """Formats a query.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a bit more description of what "format" does in a high level (e.g. replaces keywords such as TABLE_NAME and eventually macros).
| self._table_name = table_name | ||
|
|
||
| def format_query(self, query): | ||
| return (' ').join(query).format(TABLE_NAME=self._table_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a method comment with a more detailed description of what formatting does (you can enumerate all replacement logics; for now, it's just replacing TABLE_NAME) and an 'Args' section (mainly to explain why query is actually a list of strings rather than a single string).
See this for an example.
|
|
||
|
|
||
| class Assertion(object): | ||
| """Runs a simple query against the output table and verifies aggregates.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please rephrase the comment to something like "Runs a query and verifies that the output matches the expected result".
0e85924 to
5aee53d
Compare
arostamianfar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final nits about comments :)
| class QueryFormatter(object): | ||
| """Formats a query. | ||
| Replace keywords TABLE_NAME and eventually macros in the query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/Replace/Replaces (there is some styleguide reference about why verbs should be like this :p)
| """Replace TABLE_NAME in the query. | ||
| Args: | ||
| query (List[str]): a list of strings to be concatenated as one query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add 'Returns' section as well.
| self._table_name = table_name | ||
|
|
||
| def format_query(self, query): | ||
| """Replace TABLE_NAME in the query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Consider rephrasing as
"""Formats the given ``query``.
Formatting logic is as follows:
- Concatenates
``query``parts into one string. - Replaces
TABLE_NAMEwith the table associated for the query.
Args:
..query (List[str]): ...
Returns:
..Formatted query as a single string.
"""
5aee53d to
91eafcf
Compare
|
|
||
| def validate_table(self): | ||
| """Runs a simple query against the output table and verifies aggregates.""" | ||
| """Runs queries against the output table and verifies aggregates.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Now that the queries are general, change 'aggregates' to 'results'.
| Replaces TABLE_NAME with the table associated for the query. | ||
| Args: | ||
| query (List[str]): a list of strings to be concatenated as one query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of documenting the type in the 'Args:' section, please move them to a comment line after function definition, so line 197, 198 would like this:
def format_query(self, query):
# type: (List[str]) -> str
This is the new style we are adopting but I have not managed to update the whole code yet (see Issue #108 for details).
| Replaces keyword TABLE_NAME and eventually macros in the query. | ||
| """ | ||
|
|
||
| def __init__(self, table_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider documenting the type (with # type:, see next comment) for anything that is not "private". Since you already use IntelliJ with PyCharm, you should get nice warnings when there are type issues (we will enforce fixing these warnings later as a presubmit check). Check processed_variants.py for examples of these and what/how to add imports needed for type checking.
|
|
||
|
|
||
| def _validate_test(test, filename): | ||
| # TODO(yifangchen): validate 'query' and 'expected_result' fields exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: validate -> Validate
| "num_features": 3 | ||
| } | ||
| "assertion_configs": [ | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have the support, please add a second query to this config, just checking the aggregates like all other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. One of the next tasks is to design more test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but still it would have been nice to have at least one case that it does multiple queries in this very PR (specially because this one lacked the simple query in every other test). Anyways, up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| # TODO(yifangchen): Create macros for common queries | ||
| query_formatter = QueryFormatter(self._table_name) | ||
| for assertion_config in self._assertion_configs: | ||
| _validate_assertion_config(assertion_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the wrong place for validating the config. Note this is after the test is run and tables are created which is not quick, while these validations can be done before the test is run to quickly figure formatting issues in the test config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
|
|
||
| def _validate_test(test, filename): | ||
| # TODO(yifangchen): validate 'query' and 'expected_result' fields exist. | ||
| # For now, used a separate method _validate_assertion_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious why you have separated _validate_assertion_config from this function? Can't you do both type of validations here? This fixes the other comment I have made above about the config validation being done too late.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of not special-casing 'assertion_configs' here and have a more generic validation logic that supports nested keys. But I do see your point, so perhaps we can add the special logic for assertion_configs with a TODO to replace it with a more generic option.
| Formatted query as a single string. | ||
| """ | ||
|
|
||
| return (' ').join(query).format(TABLE_NAME=self._table_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a full class for this simple method seems a little bit of over-designing things. If you are expecting this to become more complex in future, maybe add a comment in either this or the __init__ function. Otherwise I prefer if we use a simple function (with no new class) for now and only change this when it is really needed (to think more concretely about this, it seems to me that your QueryFormatter instantiation on line 152 is just extra complexity which can be avoided).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial objective for this class is to add macros here for common queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I originally suggested this approach to have a more central place for doing more complex operations like macro support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I prefer to avoid pre-designing code for future needs and only do extra layers of encapsulation when there is a pressing need. But I understand that you are going to make these other changes very soon, so I suppose this is fine.
| assertion.run_assertion() | ||
|
|
||
|
|
||
| class Assertion(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: TableAssertion or QueryAssertion?
|
Thanks Allie for adding this support and my apologies for my comments coming in 3 emails, I am still getting used to the GitHub review process :-) |
|
No worries Bashir, thank you for your detailed comment! I appreciate all suggestions to help me improve my coding. :) |
91eafcf to
69f38c5
Compare
bashir2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor comments/replies; please feel free to submit this as is or with doing suggested changes (in that case, I can take another look).
| Formatted query as a single string. | ||
| """ | ||
|
|
||
| return (' ').join(query).format(TABLE_NAME=self._table_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I prefer to avoid pre-designing code for future needs and only do extra layers of encapsulation when there is a pressing need. But I understand that you are going to make these other changes very soon, so I suppose this is fine.
| "num_features": 3 | ||
| } | ||
| "assertion_configs": [ | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but still it would have been nice to have at least one case that it does multiple queries in this very PR (specially because this one lacked the simple query in every other test). Anyways, up to you.
|
Done. PTAL :) |
arostamianfar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks! Last round of nits, I promise :)
|
|
||
| def format_query(self, query): | ||
| # type: (List[str]) -> str | ||
| """Formats the given ''query''. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these should be backquotes :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
| """Formats the given ''query''. | ||
| Formatting logic is as follows: | ||
| Concatenates ''query'' parts into one string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please add a '-' in front of these listed items (i think they would show up as actual lists in pydoc generator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| } | ||
| } | ||
| ] | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please add a new line here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
PTAL. |
arostamianfar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
2. Update the Assertion constructor.
1. Change the quotes to backquotes. 2. Add a newline at the end of one JSON file.
c5c76bf to
f86adb6
Compare
|
Synced with the upstream. PTAL. |
* Add multiple query support to integration tests googlegenomics#120. update all tests in integration small_tests to support multiple query. update the validate_table in run_tests, add a loop for all test cases in the test file. changed the required keys for the .json file. Remove "validation_query" and "expected_query_result", and add "test_cases". Ran ./deploy_and_run_tests.sh and all integration tests passed. Update the development guide doc (googlegenomics#124) Update the development guide doc. Add IntelliJ IDE setup. Add more details. Added an INFO message for the full command. Tested: Ran manually and checked the new log message. Uses the macros to replace the common queries. (googlegenomics#127) Define NUM_ROWS, SUM_START, SUM_END in QueryFormatter, and replaces them in the query to avoid duplicate code. TESTED: deploy_and_run_tests. Define SchemaDescriptor class. Provides serialization and lookup API for type/mode of schema fields. Tested: unit test
* Add multiple query support to integration tests googlegenomics#120. update all tests in integration small_tests to support multiple query. update the validate_table in run_tests, add a loop for all test cases in the test file. changed the required keys for the .json file. Remove "validation_query" and "expected_query_result", and add "test_cases". Ran ./deploy_and_run_tests.sh and all integration tests passed.
* Add multiple query support to integration tests googlegenomics#120. update all tests in integration small_tests to support multiple query. update the validate_table in run_tests, add a loop for all test cases in the test file. changed the required keys for the .json file. Remove "validation_query" and "expected_query_result", and add "test_cases". Ran ./deploy_and_run_tests.sh and all integration tests passed.
Add multiple query support to integration tests (#120).
Ran ./deploy_and_run_tests.sh and all integration tests passed.