Multiple queries for integration test #121

allieychen · 2018-02-23T22:03:38Z

Add multiple query support to integration tests (#120).

Update all tests in integration small_tests to support multiple queries.
Update validate_table in run_tests to loop over all test cases in the test file.
Change the required keys for the .json file: remove "validation_query" and "expected_query_result", and add "test_cases".

Ran ./deploy_and_run_tests.sh and all integration tests passed.
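
For illustration, the key change described above might look roughly like the sketch below. Only the key names validation_query, expected_query_result, and test_cases come from this description; the test_name key, the query text, the num_rows column, and the migrate_config helper are hypothetical and are not taken from the PR.

```python
import json

# Hypothetical old-style config: one query and one expected result per file.
OLD_CONFIG = {
    "test_name": "example",  # hypothetical key, for illustration only
    "validation_query": ["SELECT COUNT(0) AS num_rows", "FROM {TABLE_NAME}"],
    "expected_query_result": {"num_rows": 3},
}


def migrate_config(old):
  """Moves the single query/result pair into a one-element test_cases list."""
  new = {key: value for key, value in old.items()
         if key not in ("validation_query", "expected_query_result")}
  new["test_cases"] = [{
      "validation_query": old["validation_query"],
      "expected_query_result": old["expected_query_result"],
  }]
  return new


NEW_CONFIG = migrate_config(OLD_CONFIG)
print(json.dumps(NEW_CONFIG, indent=2, sort_keys=True))
```

Additional queries would then simply be further entries in the test_cases list.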

arostamianfar

Thanks! I made some comments about making this more easily extensible (especially for when we add macro substitutions). Please also ensure pylint checks pass (just run pylint gcp_variant_transforms; you may need to install pylint first with pip install pylint).

arostamianfar · 2018-02-24T23:39:01Z

gcp_variant_transforms/testing/integration/run_tests.py

               input_pattern,
-               validation_query,
-               expected_query_result,
+               test_cases,


I just noticed that this is inside the TestCase object, so it seems a bit odd to have a nested test_cases. Consider renaming this to assertions (both here and in the JSON files). The name assertions also matches Python's unittest module (i.e. these are like assertEqual methods that are run after the test case is set up). I also think validation_query can be renamed to assertion_query (or just query) as a result.

arostamianfar · 2018-02-24T23:46:09Z

gcp_variant_transforms/testing/integration/run_tests.py

  def validate_table(self):
    """Runs a simple query against the output table and verifies aggregates."""
    client = bigquery.Client(project=self._project)
    # TODO(bashir2): Create macros for common queries and add the option for


Please update this TODO :)

arostamianfar · 2018-02-24T23:51:45Z

gcp_variant_transforms/testing/integration/run_tests.py

-    if len(rows) != 1:
-      raise TestCaseFailure('Expected one row in query result, got {}'.format(
+    for test_case in self._test_cases:
+      query = (" ").join(test_case['validation_query']).format(TABLE_NAME=self._table_name)


nit: please use single quotes here (i.e. ' '.join()), as we should be consistent in using single quotes in the file (this should have been done in the original change as well).

arostamianfar · 2018-02-24T23:56:17Z

gcp_variant_transforms/testing/integration/run_tests.py

-    if len(rows) != 1:
-      raise TestCaseFailure('Expected one row in query result, got {}'.format(
+    for test_case in self._test_cases:
+      query = (" ").join(test_case['validation_query']).format(TABLE_NAME=self._table_name)


Consider creating an Assertion class. You can then have a run_assertion method in that class and move all of the logic from the for loop body to there (you'd need to pass BigQuery client to the constructor). In addition, you should make a QueryFormatter class that can take care of any macro substitutions and replacements of table_name (you can start with just replacement of table_name). You can either pass an instance of this formatter object to the Assertion class, or format the query outside of the class and just pass in the formatted query (I'm ok with both approaches). The nice thing about formatting the query inside the Assertion class is that you can use **kwargs and don't need to hard-code the JSON keys as you've done now (see how TestCase works).
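
A minimal sketch of the structure suggested above, not the code that was merged. It assumes the later key names query and expected_result, a single-row result, and dict-like rows. AssertionFailure and FakeClient are hypothetical stand-ins (for TestCaseFailure and bigquery.Client) so that the example runs without BigQuery.

```python
class QueryFormatter(object):
  """Joins query parts and substitutes TABLE_NAME (macros could come later)."""

  def __init__(self, table_name):
    self._table_name = table_name

  def format_query(self, query):
    # `query` is a list of strings, as in the JSON test files.
    return ' '.join(query).format(TABLE_NAME=self._table_name)


class AssertionFailure(Exception):
  """Hypothetical stand-in for the TestCaseFailure used in run_tests.py."""


class Assertion(object):
  """Runs one query and checks the single result row against expectations."""

  def __init__(self, client, query_formatter, query, expected_result):
    self._client = client
    self._query = query_formatter.format_query(query)
    self._expected_result = expected_result

  def run_assertion(self):
    rows = list(self._client.query(self._query).result())
    if len(rows) != 1:
      raise AssertionFailure(
          'Expected one row in query result, got {}'.format(len(rows)))
    row = rows[0]
    for key, expected in self._expected_result.items():
      if row[key] != expected:
        raise AssertionFailure(
            'Column {} mismatch: expected {}, got {}'.format(
                key, expected, row[key]))


class _FakeJob(object):
  """Mimics only the result() call of a query job."""

  def __init__(self, rows):
    self._rows = rows

  def result(self):
    return self._rows


class FakeClient(object):
  """In-memory stand-in for bigquery.Client so the sketch runs offline."""

  def __init__(self, rows):
    self._rows = rows
    self.queries = []

  def query(self, query):
    self.queries.append(query)
    return _FakeJob(self._rows)


client = FakeClient([{'num_rows': 3}])
config = {'query': ['SELECT COUNT(0) AS num_rows', 'FROM {TABLE_NAME}'],
          'expected_result': {'num_rows': 3}}
# The JSON keys map directly onto constructor arguments via **kwargs.
Assertion(client, QueryFormatter('my_dataset.my_table'), **config).run_assertion()
```

Passing the config with **config is what avoids hard-coding the JSON keys inside the loop body.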

coveralls · 2018-02-26T18:15:50Z

Pull Request Test Coverage Report for Build 358

7 of 29 (24.14%) changed or added relevant lines in 1 file are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-0.4%) to 90.335%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
gcp_variant_transforms/testing/integration/run_tests.py	7	29	24.14%

Files with Coverage Reduction	New Missed Lines	%
gcp_variant_transforms/testing/integration/run_tests.py	1	24.21%

Totals
Change from base Build 345:	-0.4%
Covered Lines:	3075
Relevant Lines:	3404

💛 - Coveralls

arostamianfar

Looks much better! Thanks! Just a few more nits about commenting. Otherwise, LGTM.

arostamianfar · 2018-02-26T21:08:07Z

gcp_variant_transforms/testing/integration/run_tests.py

-    # having a list of queries instead of just one.
-    query = self._validation_query.format(TABLE_NAME=self._table_name)
-    query_job = client.query(query)
+    # TODO(bashir2): Create macros for common queries


nit: please change the TODO to be your username :)

arostamianfar · 2018-02-26T21:10:10Z

gcp_variant_transforms/testing/integration/run_tests.py

+    # TODO(bashir2): Create macros for common queries
+    query_formatter = QueryFormatter(self._table_name)
+    for assertion_config in self._assertion_configs:
+      query = query_formatter.format_query(assertion_config['query'])


I wonder if there is a good way to verify that 'query' and 'expected_result' fields exist in the _validate_test method below (i.e. support nested keys). For now, I think you can just assert that query and expected_result are in assertion_config and add a TODO to move it there.

I added a separate method, _validate_assertion_config, for now to verify that the 'query' and 'expected_result' fields exist.

arostamianfar · 2018-02-26T21:11:38Z

gcp_variant_transforms/testing/integration/run_tests.py

+
+
+class QueryFormatter(object):
+  """Formats a query."""


Please add a bit more description of what "format" does in a high level (e.g. replaces keywords such as TABLE_NAME and eventually macros).

arostamianfar · 2018-02-26T21:14:21Z

gcp_variant_transforms/testing/integration/run_tests.py

+    self._table_name = table_name
+
+  def format_query(self, query):
+    return (' ').join(query).format(TABLE_NAME=self._table_name)


Please add a method comment with a more detailed description of what formatting does (you can enumerate all replacement logics; for now, it's just replacing TABLE_NAME) and an 'Args' section (mainly to explain why query is actually a list of strings rather than a single string).
See this for an example.

arostamianfar · 2018-02-26T21:18:33Z

gcp_variant_transforms/testing/integration/run_tests.py

+
+
+class Assertion(object):
+  """Runs a simple query against the output table and verifies aggregates."""


nit: please rephrase the comment to something like "Runs a query and verifies that the output matches the expected result".

arostamianfar

Final nits about comments :)

arostamianfar · 2018-02-27T17:35:41Z

gcp_variant_transforms/testing/integration/run_tests.py

+class QueryFormatter(object):
+  """Formats a query.
+
+  Replace keywords TABLE_NAME and eventually macros in the query.


nit: s/Replace/Replaces (there is some styleguide reference about why verbs should be like this :p)

arostamianfar · 2018-02-27T17:37:05Z

gcp_variant_transforms/testing/integration/run_tests.py

+    """Replace TABLE_NAME in the query.
+
+    Args:
+      query (List[str]): a list of strings to be concatenated as one query.


Please add 'Returns' section as well.

arostamianfar · 2018-02-27T17:46:32Z

gcp_variant_transforms/testing/integration/run_tests.py

+    self._table_name = table_name
+
+  def format_query(self, query):
+    """Replace TABLE_NAME in the query.


nit: Consider rephrasing as
"""Formats the given ``query``.

Formatting logic is as follows:

Concatenates ``query`` parts into one string.

Replaces TABLE_NAME with the table associated for the query.

Args:
..query (List[str]): ...
Returns:
..Formatted query as a single string.
"""

bashir2 · 2018-02-27T20:33:15Z

gcp_variant_transforms/testing/integration/run_tests.py


  def validate_table(self):
-    """Runs a simple query against the output table and verifies aggregates."""
+    """Runs queries against the output table and verifies aggregates."""


nit: Now that the queries are general, change 'aggregates' to 'results'.

bashir2 · 2018-02-27T20:40:24Z

gcp_variant_transforms/testing/integration/run_tests.py

+      Replaces TABLE_NAME with the table associated for the query.
+
+    Args:
+      query (List[str]): a list of strings to be concatenated as one query.


Instead of documenting the type in the 'Args:' section, please move it to a comment line right after the function definition, so lines 197 and 198 would look like this:

def format_query(self, query):
  # type: (List[str]) -> str

This is the new style we are adopting but I have not managed to update the whole code yet (see Issue #108 for details).
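
A small self-contained sketch of this type-comment style. It uses a hypothetical free function with an extra table_name parameter rather than the QueryFormatter method, purely so that it runs on its own.

```python
from typing import List  # imported only so type checkers can resolve List


def format_query(query, table_name):
  # type: (List[str], str) -> str
  """Joins ``query`` parts and replaces TABLE_NAME with ``table_name``."""
  return ' '.join(query).format(TABLE_NAME=table_name)


print(format_query(['SELECT *', 'FROM {TABLE_NAME}'], 'my_dataset.my_table'))
# prints: SELECT * FROM my_dataset.my_table
```

The type comment is ignored at runtime; it only informs tools such as PyCharm's inspections.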

bashir2 · 2018-02-27T20:41:43Z

gcp_variant_transforms/testing/integration/run_tests.py

+  Replaces keyword TABLE_NAME and eventually macros in the query.
+  """
+
+  def __init__(self, table_name):


Consider documenting the type (with # type:, see next comment) for anything that is not "private". Since you already use IntelliJ with PyCharm, you should get nice warnings when there are type issues (we will enforce fixing these warnings later as a presubmit check). Check processed_variants.py for examples of these and what/how to add imports needed for type checking.

bashir2 · 2018-02-27T20:42:30Z

gcp_variant_transforms/testing/integration/run_tests.py



 def _validate_test(test, filename):
+  # TODO(yifangchen): validate 'query' and 'expected_result' fields exist.


nit: validate -> Validate

bashir2 · 2018-02-27T20:44:08Z

gcp_variant_transforms/testing/integration/small_tests/valid_4_2_VEP.json

-    "num_features": 3
-  }
+  "assertion_configs": [
+    {


Now that we have the support, please add a second query to this config, just checking the aggregates like all other tests.

Correct. One of the next tasks is to design more test cases.

Agreed, but it still would have been nice to have at least one case that does multiple queries in this very PR (especially because this one lacked the simple query that every other test has). Anyway, up to you.

bashir2 · 2018-02-27T20:54:59Z

gcp_variant_transforms/testing/integration/run_tests.py

+    # TODO(yifangchen): Create macros for common queries
+    query_formatter = QueryFormatter(self._table_name)
+    for assertion_config in self._assertion_configs:
+      _validate_assertion_config(assertion_config)


I think this is the wrong place for validating the config. Note that this runs after the test has run and the tables have been created, which is not quick, whereas these validations can be done before the test is run to quickly catch formatting issues in the test config.

Good point!
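
To make the ordering concern concrete, here is a sketch of validating every config before any test is started. The function names validate_all and run_all, the test_name key, and the run_test callback are hypothetical; only assertion_configs, query, and expected_result come from this PR.

```python
def validate_all(test_configs):
  """Checks every assertion config up front; returns a list of error strings."""
  errors = []
  for test in test_configs:
    name = test.get('test_name', '<unnamed>')
    for i, assertion_config in enumerate(test.get('assertion_configs', [])):
      for key in ('query', 'expected_result'):
        if key not in assertion_config:
          errors.append(
              '{}: assertion_configs[{}] is missing "{}"'.format(name, i, key))
  return errors


def run_all(test_configs, run_test):
  """Fails fast on config errors, then runs each (slow) test."""
  errors = validate_all(test_configs)
  if errors:
    raise ValueError('\n'.join(errors))
  for test in test_configs:
    run_test(test)
```

With this ordering, a malformed config is reported before any pipeline is launched or any table is created.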

bashir2 · 2018-02-27T20:56:43Z

gcp_variant_transforms/testing/integration/run_tests.py


 def _validate_test(test, filename):
+  # TODO(yifangchen): validate 'query' and 'expected_result' fields exist.
+  # For now, used a separate method _validate_assertion_config


I am curious why you have separated _validate_assertion_config from this function. Can't you do both types of validation here? That would also address my other comment above about the config validation being done too late.

I was thinking of not special-casing 'assertion_configs' here and instead having more generic validation logic that supports nested keys. But I do see your point, so perhaps we can add the special logic for assertion_configs with a TODO to replace it with a more generic option.

bashir2 · 2018-02-27T21:05:24Z

gcp_variant_transforms/testing/integration/run_tests.py

+      Formatted query as a single string.
+    """
+
+    return (' ').join(query).format(TABLE_NAME=self._table_name)


Having a full class for this simple method seems like over-design. If you expect this to become more complex in the future, maybe add a comment saying so in either this function or __init__. Otherwise I would prefer a simple function (with no new class) for now, changing it only when really needed (concretely, it seems to me that your QueryFormatter instantiation on line 152 is just extra complexity that can be avoided).

The initial objective for this class is to add macros here for common queries.

Correct. I originally suggested this approach to have a more central place for doing more complex operations like macro support.

In general, I prefer to avoid pre-designing code for future needs and only do extra layers of encapsulation when there is a pressing need. But I understand that you are going to make these other changes very soon, so I suppose this is fine.

bashir2 · 2018-02-27T21:06:41Z

gcp_variant_transforms/testing/integration/run_tests.py

+      assertion.run_assertion()
+
+
+class Assertion(object):


nit: TableAssertion or QueryAssertion?

bashir2 · 2018-02-27T21:11:16Z

Thanks Allie for adding this support and my apologies for my comments coming in 3 emails, I am still getting used to the GitHub review process :-)

allieychen · 2018-02-27T21:16:37Z

No worries Bashir, thank you for your detailed comment! I appreciate all suggestions to help me improve my coding. :)

bashir2

Just some minor comments/replies; please feel free to submit this as is or after making the suggested changes (in that case, I can take another look).

bashir2 · 2018-02-28T22:04:56Z

gcp_variant_transforms/testing/integration/run_tests.py

+      Formatted query as a single string.
+    """
+
+    return (' ').join(query).format(TABLE_NAME=self._table_name)


In general, I prefer to avoid pre-designing code for future needs and only do extra layers of encapsulation when there is a pressing need. But I understand that you are going to make these other changes very soon, so I suppose this is fine.

bashir2 · 2018-02-28T22:06:25Z

gcp_variant_transforms/testing/integration/small_tests/valid_4_2_VEP.json

-    "num_features": 3
-  }
+  "assertion_configs": [
+    {


Agreed, but it still would have been nice to have at least one case that does multiple queries in this very PR (especially because this one lacked the simple query that every other test has). Anyway, up to you.

allieychen · 2018-02-28T22:28:29Z

Done. PTAL :)

arostamianfar

Nice! Thanks! Last round of nits, I promise :)

arostamianfar · 2018-03-01T15:37:19Z

gcp_variant_transforms/testing/integration/run_tests.py

+
+  def format_query(self, query):
+    # type: (List[str]) -> str
+    """Formats the given ''query''.


nit: these should be backquotes :)

Done. Thanks!

arostamianfar · 2018-03-01T15:37:51Z

gcp_variant_transforms/testing/integration/run_tests.py

+    """Formats the given ''query''.
+
+    Formatting logic is as follows:
+      Concatenates ''query'' parts into one string.


nit: please add a '-' in front of these listed items (i think they would show up as actual lists in pydoc generator).

arostamianfar · 2018-03-01T15:38:37Z

gcp_variant_transforms/testing/integration/small_tests/valid_4_0.json

+      }
+    }
+  ]
+}


nit: please add a new line here

allieychen · 2018-03-01T16:14:59Z

PTAL.

arostamianfar

LGTM! Thanks!

allieychen · 2018-03-01T16:48:48Z

Synced with the upstream. PTAL.

* Add multiple query support to integration tests googlegenomics#120.
  update all tests in integration small_tests to support multiple query.
  update the validate_table in run_tests, add a loop for all test cases in the test file.
  changed the required keys for the .json file. Remove "validation_query" and "expected_query_result", and add "test_cases".
  Ran ./deploy_and_run_tests.sh and all integration tests passed.
  Update the development guide doc (googlegenomics#124)
  Update the development guide doc. Add IntelliJ IDE setup. Add more details.
  Added an INFO message for the full command.
  Tested: Ran manually and checked the new log message.
  Uses the macros to replace the common queries. (googlegenomics#127)
  Define NUM_ROWS, SUM_START, SUM_END in QueryFormatter, and replaces them in the query to avoid duplicate code.
  TESTED: deploy_and_run_tests.
  Define SchemaDescriptor class. Provides serialization and lookup API for type/mode of schema fields.
  Tested: unit test

allieychen requested a review from arostamianfar February 23, 2018 22:03

arostamianfar suggested changes Feb 26, 2018

arostamianfar requested a review from bashir2 February 26, 2018 06:41

allieychen closed this Feb 26, 2018

allieychen reopened this Feb 26, 2018

allieychen force-pushed the multiple-queries branch from 469bc46 to e9f0bf9 Compare February 26, 2018 18:12

allieychen force-pushed the multiple-queries branch 2 times, most recently from cac124a to 0e85924 Compare February 26, 2018 19:41

arostamianfar suggested changes Feb 26, 2018

allieychen force-pushed the multiple-queries branch from 0e85924 to 5aee53d Compare February 27, 2018 14:26

arostamianfar suggested changes Feb 27, 2018

allieychen force-pushed the multiple-queries branch from 5aee53d to 91eafcf Compare February 27, 2018 19:06

bashir2 reviewed Feb 27, 2018

allieychen force-pushed the multiple-queries branch from 91eafcf to 69f38c5 Compare February 28, 2018 14:36

bashir2 previously approved these changes Feb 28, 2018

allieychen dismissed bashir2’s stale review via 50f1b7a February 28, 2018 22:25

bashir2 previously approved these changes Feb 28, 2018

arostamianfar suggested changes Mar 1, 2018

allieychen dismissed bashir2’s stale review via c5c76bf March 1, 2018 16:12

arostamianfar previously approved these changes Mar 1, 2018

allieychen changed the title ~~Multiple queries~~ Multiple queries for integration test Mar 1, 2018

allieychen added 2 commits March 1, 2018 11:43

1bc1eda  Add multiple query support to integration tests googlegenomics#120.
615f7b9  multiple queries

allieychen added 4 commits March 1, 2018 11:43

841e293  remove unnecessary files
db13f4d  1. Rename assertions to assertion_configs. 2. Update the Assertion constructor.
f7bfeb3  Add one more assertion config in valid_4_2_VEP.
f86adb6  Correct some format issues. 1. Change the quotes to backquotes. 2. Add a newline at the end of one JSON file.

allieychen dismissed arostamianfar’s stale review via f86adb6 March 1, 2018 16:43

allieychen force-pushed the multiple-queries branch from c5c76bf to f86adb6 Compare March 1, 2018 16:43

arostamianfar approved these changes Mar 1, 2018

allieychen merged commit 027500c into googlegenomics:master Mar 1, 2018

allieychen deleted the multiple-queries branch March 13, 2018 14:14


