@allieychen
Contributor

Add multiple query support to integration tests (#120).

  • Update all tests in integration small_tests to support multiple queries.
  • Update validate_table in run_tests to loop over all test cases in the test file.
  • Change the required keys for the .json file: remove "validation_query" and "expected_query_result", and add "test_cases".

Ran ./deploy_and_run_tests.sh and all integration tests passed.

Contributor

@arostamianfar left a comment

Thanks! I made some comments about making this more easily extensible (especially for when we add macro substitutions). Please also ensure pylint checks pass: just run pylint gcp_variant_transforms (you may need to install pylint first with pip install pylint).

input_pattern,
validation_query,
expected_query_result,
test_cases,
Contributor

I just noticed that this is inside the TestCase object, so it seems a bit odd to have a nested test_cases. Consider renaming this to assertions (both here and in the json files). The assertions name also matches python's unittest module (i.e. these are like assertEqual methods that are run after the test case is setup). I also think validation_query can be renamed to assertion_query (or just query) as a result.

def validate_table(self):
"""Runs a simple query against the output table and verifies aggregates."""
client = bigquery.Client(project=self._project)
# TODO(bashir2): Create macros for common queries and add the option for
Contributor

Please update this TODO :)

if len(rows) != 1:
raise TestCaseFailure('Expected one row in query result, got {}'.format(
for test_case in self._test_cases:
query = (" ").join(test_case['validation_query']).format(TABLE_NAME=self._table_name)
Contributor

nit: please use single quotes here (i.e. ' '.join()), as we should be consistent in using single quotes in the file (this should have been done in the original change as well).

if len(rows) != 1:
raise TestCaseFailure('Expected one row in query result, got {}'.format(
for test_case in self._test_cases:
query = (" ").join(test_case['validation_query']).format(TABLE_NAME=self._table_name)
Contributor

Consider creating an Assertion class. You can then have a run_assertion method in that class and move all of the logic from the for loop body to there (you'd need to pass BigQuery client to the constructor). In addition, you should make a QueryFormatter class that can take care of any macro substitutions and replacements of table_name (you can start with just replacement of table_name). You can either pass an instance of this formatter object to the Assertion class, or format the query outside of the class and just pass in the formatted query (I'm ok with both approaches). The nice thing about formatting the query inside the Assertion class is that you can use **kwargs and don't need to hard-code the JSON keys as you've done now (see how TestCase works).
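
For illustration, the structure suggested above might look roughly like the sketch below. It is not the code that was merged. It assumes the formatter is passed into the Assertion constructor (one of the two options mentioned), that each assertion config has exactly the keys 'query' and 'expected_result' (the names used later in this review), and that the client exposes query(sql).result() the way the google-cloud-bigquery client does. The comparison of result columns against expected_result is a guess at what "verifies" means here.

```python
class TestCaseFailure(Exception):
  """Raised when an assertion against the output table fails."""


class QueryFormatter(object):
  """Joins query parts into one string and replaces TABLE_NAME."""

  def __init__(self, table_name):
    self._table_name = table_name

  def format_query(self, query):
    # type: (List[str]) -> str
    # `query` is a list because long queries are split across JSON lines.
    return ' '.join(query).format(TABLE_NAME=self._table_name)


class Assertion(object):
  """Runs a query and verifies that the output matches the expected result."""

  def __init__(self, client, query_formatter, query, expected_result):
    # `query` and `expected_result` arrive via **assertion_config, so the
    # JSON keys are not hard-coded at the call site.
    self._client = client
    self._query = query_formatter.format_query(query)
    self._expected_result = expected_result

  def run_assertion(self):
    rows = list(self._client.query(self._query).result())
    if len(rows) != 1:
      raise TestCaseFailure(
          'Expected one row in query result, got {}'.format(len(rows)))
    row = rows[0]
    # Assumption: every expected key is a column name in the single row.
    for key, expected in self._expected_result.items():
      if row[key] != expected:
        raise TestCaseFailure('Column {} mismatch: expected {}, got {}'.format(
            key, expected, row[key]))
```

With this shape, the loop body in validate_table reduces to something like Assertion(client, query_formatter, **assertion_config).run_assertion(), and an unexpected JSON key fails immediately with a TypeError from the constructor.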

@coveralls

coveralls commented Feb 26, 2018

Pull Request Test Coverage Report for Build 358

  • 7 of 29 (24.14%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.4%) to 90.335%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| gcp_variant_transforms/testing/integration/run_tests.py | 7 | 29 | 24.14% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| gcp_variant_transforms/testing/integration/run_tests.py | 1 | 24.21% |

Totals Coverage Status
Change from base Build 345: -0.4%
Covered Lines: 3075
Relevant Lines: 3404

💛 - Coveralls

@allieychen force-pushed the multiple-queries branch 2 times, most recently from cac124a to 0e85924, February 26, 2018 19:41
Contributor

@arostamianfar left a comment

Looks much better! Thanks! Just a few more nits about commenting. Otherwise, LGTM.

# having a list of queries instead of just one.
query = self._validation_query.format(TABLE_NAME=self._table_name)
query_job = client.query(query)
# TODO(bashir2): Create macros for common queries
Contributor

nit: please change the TODO to be your username :)

# TODO(bashir2): Create macros for common queries
query_formatter = QueryFormatter(self._table_name)
for assertion_config in self._assertion_configs:
query = query_formatter.format_query(assertion_config['query'])
Contributor

I wonder if there is a good way to verify that 'query' and 'expected_result' fields exist in the _validate_test method below (i.e. support nested keys). For now, I think you can just assert that query and expected_result are in assertion_config and add a TODO to move it there.

Contributor Author

I added a separate method, _validate_assertion_config, for now to verify that the 'query' and 'expected_result' fields exist.



class QueryFormatter(object):
"""Formats a query."""
Contributor

Please add a bit more description of what "format" does in a high level (e.g. replaces keywords such as TABLE_NAME and eventually macros).

self._table_name = table_name

def format_query(self, query):
return (' ').join(query).format(TABLE_NAME=self._table_name)
Contributor

Please add a method comment with a more detailed description of what formatting does (you can enumerate all replacement logics; for now, it's just replacing TABLE_NAME) and an 'Args' section (mainly to explain why query is actually a list of strings rather than a single string).
See this for an example.



class Assertion(object):
"""Runs a simple query against the output table and verifies aggregates."""
Contributor

nit: please rephrase the comment to something like "Runs a query and verifies that the output matches the expected result".

Contributor

@arostamianfar left a comment

Final nits about comments :)

class QueryFormatter(object):
"""Formats a query.
Replace keywords TABLE_NAME and eventually macros in the query.
Contributor

nit: s/Replace/Replaces (there is some styleguide reference about why verbs should be like this :p)

"""Replace TABLE_NAME in the query.
Args:
query (List[str]): a list of strings to be concatenated as one query.
Contributor

Please add 'Returns' section as well.

self._table_name = table_name

def format_query(self, query):
"""Replace TABLE_NAME in the query.
Contributor

nit: Consider rephrasing as
"""Formats the given ``query``.

Formatting logic is as follows:

  • Concatenates ``query`` parts into one string.
  • Replaces TABLE_NAME with the table associated for the query.

Args:
..query (List[str]): ...
Returns:
..Formatted query as a single string.
"""


def validate_table(self):
"""Runs a simple query against the output table and verifies aggregates."""
"""Runs queries against the output table and verifies aggregates."""
Member

nit: Now that the queries are general, change 'aggregates' to 'results'.

Replaces TABLE_NAME with the table associated for the query.
Args:
query (List[str]): a list of strings to be concatenated as one query.
Member

@bashir2 Feb 27, 2018

Instead of documenting the type in the 'Args:' section, please move it to a comment line after the function definition, so lines 197 and 198 would look like this:

def format_query(self, query):
# type: (List[str]) -> str

This is the new style we are adopting but I have not managed to update the whole code yet (see Issue #108 for details).

Replaces keyword TABLE_NAME and eventually macros in the query.
"""

def __init__(self, table_name):
Member

@bashir2 Feb 27, 2018

Consider documenting the type (with # type:, see next comment) for anything that is not "private". Since you already use IntelliJ with PyCharm, you should get nice warnings when there are type issues (we will enforce fixing these warnings later as a presubmit check). Check processed_variants.py for examples of these and what/how to add imports needed for type checking.



def _validate_test(test, filename):
# TODO(yifangchen): validate 'query' and 'expected_result' fields exist.
Member

nit: validate -> Validate

"num_features": 3
}
"assertion_configs": [
{
Member

Now that we have the support, please add a second query to this config, just checking the aggregates like all other tests.

Contributor Author

Correct. One of the next tasks is to design more test cases.

Member

Agreed, but it still would have been nice to have at least one case that does multiple queries in this very PR (especially because this one lacked the simple query that every other test has). Anyway, up to you.

Contributor Author

Done.

# TODO(yifangchen): Create macros for common queries
query_formatter = QueryFormatter(self._table_name)
for assertion_config in self._assertion_configs:
_validate_assertion_config(assertion_config)
Member

I think this is the wrong place for validating the config. Note that this runs after the test has run and the tables have been created, which is not quick, whereas these validations can be done before the test is run to quickly catch formatting issues in the test config.

Contributor Author

Good point!


def _validate_test(test, filename):
# TODO(yifangchen): validate 'query' and 'expected_result' fields exist.
# For now, used a separate method _validate_assertion_config
Member

I am curious why you have separated _validate_assertion_config from this function. Can't you do both types of validation here? This would also fix the other comment I made above about the config validation being done too late.

Contributor

I was thinking of not special-casing 'assertion_configs' here and instead having more generic validation logic that supports nested keys. But I do see your point, so perhaps we can add the special logic for assertion_configs with a TODO to replace it with a more generic option.
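
One way the more generic nested-key validation could be sketched is shown below. This is a hypothetical design, not something from the PR: the function name find_missing_keys, the REQUIRED_KEYS spec format (None for a leaf key, a nested dict for a key whose value is a dict or a list of dicts), and the reported path strings are all assumptions. Only 'input_pattern', 'assertion_configs', 'query' and 'expected_result' are key names taken from this thread.

```python
# Hypothetical spec: None marks a leaf key; a dict describes the keys
# required inside a nested dict, or inside each item of a nested list.
REQUIRED_KEYS = {
    'input_pattern': None,
    'assertion_configs': {'query': None, 'expected_result': None},
}


def find_missing_keys(config, spec, path=''):
  """Returns dotted paths of all required keys missing from `config`."""
  missing = []
  for key, sub_spec in spec.items():
    key_path = path + key
    if key not in config:
      missing.append(key_path)
      continue
    if sub_spec is None:
      continue
    value = config[key]
    if isinstance(value, list):
      # Apply the nested spec to every item, e.g. each assertion config.
      for index, item in enumerate(value):
        missing.extend(find_missing_keys(
            item, sub_spec, '{}[{}].'.format(key_path, index)))
    else:
      missing.extend(find_missing_keys(value, sub_spec, key_path + '.'))
  return missing
```

Because this only inspects the parsed JSON, it could be called from _validate_test while the test files are loaded, before any pipeline is run or any table is created, which would also address the concern about validation happening too late.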

Formatted query as a single string.
"""

return (' ').join(query).format(TABLE_NAME=self._table_name)
Member

Having a full class for this simple method seems a little bit of over-designing things. If you are expecting this to become more complex in future, maybe add a comment in either this or the __init__ function. Otherwise I prefer if we use a simple function (with no new class) for now and only change this when it is really needed (to think more concretely about this, it seems to me that your QueryFormatter instantiation on line 152 is just extra complexity which can be avoided).

Contributor Author

The initial objective for this class is to add macros here for common queries.

Contributor

Correct. I originally suggested this approach to have a more central place for doing more complex operations like macro support.

Member

In general, I prefer to avoid pre-designing code for future needs and only do extra layers of encapsulation when there is a pressing need. But I understand that you are going to make these other changes very soon, so I suppose this is fine.

assertion.run_assertion()


class Assertion(object):
Member

nit: TableAssertion or QueryAssertion?

@bashir2
Member

bashir2 commented Feb 27, 2018

Thanks Allie for adding this support and my apologies for my comments coming in 3 emails, I am still getting used to the GitHub review process :-)

@allieychen
Contributor Author

No worries Bashir, thank you for your detailed comment! I appreciate all suggestions to help me improve my coding. :)

bashir2 previously approved these changes Feb 28, 2018
Member

@bashir2 left a comment

Just some minor comments/replies; please feel free to submit this as is or after making the suggested changes (in that case, I can take another look).

@allieychen
Contributor Author

allieychen commented Feb 28, 2018

Done. PTAL :)

bashir2 previously approved these changes Feb 28, 2018

Contributor

@arostamianfar left a comment

Nice! Thanks! Last round of nits, I promise :)


def format_query(self, query):
# type: (List[str]) -> str
"""Formats the given ''query''.
Contributor

nit: these should be backquotes :)

Contributor Author

Done. Thanks!

"""Formats the given ''query''.
Formatting logic is as follows:
Concatenates ''query'' parts into one string.
Contributor

nit: please add a '-' in front of these listed items (I think they would show up as actual lists in the pydoc generator).

Contributor Author

Done.

}
}
]
}
Contributor

nit: please add a new line here

Contributor Author

Done.

@allieychen
Contributor Author

PTAL.

arostamianfar previously approved these changes Mar 1, 2018
Contributor

@arostamianfar left a comment

LGTM! Thanks!

@allieychen changed the title from "Multiple queries" to "Multiple queries for integration test" Mar 1, 2018
2. Update the Assertion constructor.
1. Change the quotes to backquotes.
2. Add a newline at the end of one JSON file.
@allieychen
Contributor Author

Synced with the upstream. PTAL.

@allieychen allieychen merged commit 027500c into googlegenomics:master Mar 1, 2018
nmousavi pushed a commit to nmousavi/gcp-variant-transforms that referenced this pull request Mar 9, 2018
* Add multiple query support to integration tests googlegenomics#120.
update all tests in integration small_tests to support multiple query.
update the validate_table in run_tests, add a loop for all test cases in the test file.
changed the required keys for the .json file. Remove "validation_query" and "expected_query_result", and add "test_cases".
Ran ./deploy_and_run_tests.sh and all integration tests passed.

Update the development guide doc (googlegenomics#124)

Update the development guide doc.

Add IntelliJ IDE setup.
Add more details.

Added an INFO message for the full command.

Tested: Ran manually and checked the new log message.

Uses the macros to replace the common queries. (googlegenomics#127)

Define NUM_ROWS, SUM_START, SUM_END in QueryFormatter, and replaces them in the query to avoid duplicate code.

TESTED:
deploy_and_run_tests.

Define SchemaDescriptor class.

Provides serialization and lookup API for type/mode of schema fields.

Tested:
  unit test
@allieychen allieychen deleted the multiple-queries branch March 13, 2018 14:14
mhsaul pushed a commit to mhsaul/gcp-variant-transforms that referenced this pull request Mar 29, 2018
* Add multiple query support to integration tests googlegenomics#120.
update all tests in integration small_tests to support multiple query.
update the validate_table in run_tests, add a loop for all test cases in the test file.
changed the required keys for the .json file. Remove "validation_query" and "expected_query_result", and add "test_cases".
Ran ./deploy_and_run_tests.sh and all integration tests passed.
4 participants