@allieychen
Contributor

Define NUM_ROWS, SUM_START, and SUM_END in QueryFormatter, and replace them in the query to avoid duplicate code.

TESTED:
deploy_and_run_tests.

@allieychen
Contributor Author

I'd like advice on lines 213 and 214 in run_tests.py: replacing TABLE_NAME twice there is quite ugly.
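For context, here is a short illustration of why a single str.format call does not handle both replacements. It assumes, as in this PR, that a macro's SQL itself contains a {TABLE_NAME} placeholder; 'my_table' is a made-up table name.

```python
# The macro's SQL itself contains a {TABLE_NAME} placeholder.
NUM_ROWS = 'SELECT COUNT(0) AS num_rows FROM {TABLE_NAME}'

# One format() call substitutes NUM_ROWS, but str.format does not re-scan
# the substituted text, so the inner {TABLE_NAME} survives unchanged.
one_pass = '{NUM_ROWS}'.format(NUM_ROWS=NUM_ROWS, TABLE_NAME='my_table')

# A second call is needed to fill in the table name.
two_pass = one_pass.format(TABLE_NAME='my_table')

print(one_pass)  # SELECT COUNT(0) AS num_rows FROM {TABLE_NAME}
print(two_pass)  # SELECT COUNT(0) AS num_rows FROM my_table
```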

@coveralls

coveralls commented Mar 2, 2018

Pull Request Test Coverage Report for Build 379

  • 8 of 14 (57.14%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 90.202%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
gcp_variant_transforms/testing/integration/run_tests.py | 8 | 14 | 57.14%

Files with Coverage Reduction | New Missed Lines | %
gcp_variant_transforms/testing/integration/run_tests.py | 1 | 26.6%

Totals Coverage Status
Change from base Build 374: -0.1%
Covered Lines: 3084
Relevant Lines: 3419

💛 - Coveralls

- Replaces TABLE_NAME with the table associated for the query.
"""
return (' ').join(query).format(TABLE_NAME=self._table_name)
return (' ').join(query).format(NUM_ROWS=self.NUM_ROWS,
Contributor

Consider breaking this up into two helper methods: _replace_macros and _replace_variables.
So it would be: return self._replace_variables(self._replace_macros(' '.join(query)))
You can optionally create meaningful variable names for the intermediate strings.
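Below is a minimal sketch of the two-helper split suggested above. It assumes a plain dict of macros keyed by name and that a macro always makes up the whole query. _MACROS, 'my_table', and this cut-down QueryFormatter are hypothetical, not the PR's code. The order matters: str.format does not re-scan substituted text, so a {TABLE_NAME} inside a macro is only filled by the later _replace_variables pass.

```python
from typing import List  # noqa: F401 (referenced by the type comment)

# Hypothetical macro table; names follow the ones used in this PR.
_MACROS = {
    'NUM_ROWS_QUERY': 'SELECT COUNT(0) AS num_rows FROM {TABLE_NAME}',
}


class QueryFormatter(object):
  """Formats integration-test queries for a single table."""

  def __init__(self, table_name):
    self._table_name = table_name

  def format_query(self, query):
    # type: (List[str]) -> str
    joined_query = ' '.join(query)
    query_with_macros = self._replace_macros(joined_query)
    return self._replace_variables(query_with_macros)

  def _replace_macros(self, query):
    # A query consisting solely of a macro name is swapped for its SQL.
    return _MACROS.get(query, query)

  def _replace_variables(self, query):
    # Runs after macro expansion, so placeholders inside macros get filled.
    return query.format(TABLE_NAME=self._table_name)
```

For example, QueryFormatter('my_table').format_query(['NUM_ROWS_QUERY']) should give 'SELECT COUNT(0) AS num_rows FROM my_table', and a regular query such as ['SELECT *', 'FROM {TABLE_NAME}'] passes through the macro step untouched.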

Contributor Author

Done.

Replaces keyword TABLE_NAME and macros in the query.
"""

NUM_ROWS = 'SELECT COUNT(0) AS num_rows FROM {TABLE_NAME}'
Contributor

nit: to make these distinct from the rest of the variables (e.g. TABLE_NAME), what do you think of adding a _QUERY suffix to all of them?

Contributor Author

Done.

Replaces keyword TABLE_NAME and macros in the query.
"""

NUM_ROWS = 'SELECT COUNT(0) AS num_rows FROM {TABLE_NAME}'
Contributor

I think it's cleaner to create an enum of these macros. It provides a natural grouping of these constants and also makes it easier to loop through them (e.g. adding a new macro is as easy as adding a new entry to the Enum).
So, it would be something like:

import enum

class _QueryMacros(enum.Enum):  # nested under the QueryFormatter class
  NUM_ROWS_QUERY = 'SELECT ...'
  SUM_START_QUERY = 'SELECT ...'

def _replace_macros(query):
  # Returns the macro's query if `query` is a macro name; otherwise returns
  # `query` unchanged.
  for macro in _QueryMacros:
    if macro.name == query:
      return macro.value
  return query

This also means that you'd no longer need the surrounding {} brackets in the test def, which further distinguishes macros from regular variable names.
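Here is a runnable variant of this suggestion, nested inside a hypothetical, cut-down QueryFormatter. It swaps the loop for Enum's built-in lookup by member name (_QueryMacros[name]), which raises KeyError for unknown names; the loop above works just as well. The SUM_END_QUERY text is an assumption inferred from the sum_end column in the test JSON.

```python
import enum


class QueryFormatter(object):
  """Hypothetical formatter; only the macro handling is sketched here."""

  class _QueryMacros(enum.Enum):
    NUM_ROWS_QUERY = 'SELECT COUNT(0) AS num_rows FROM {TABLE_NAME}'
    SUM_START_QUERY = (
        'SELECT SUM(start_position) AS sum_start FROM {TABLE_NAME}')
    # Assumed by analogy with SUM_START_QUERY.
    SUM_END_QUERY = 'SELECT SUM(end_position) AS sum_end FROM {TABLE_NAME}'

  def _replace_macros(self, query):
    # type: (str) -> str
    # Enum lookup by member name; anything that is not a macro name is a
    # regular query and is returned unchanged.
    try:
      return self._QueryMacros[query].value
    except KeyError:
      return query
```

With this layout, adding a macro is a one-line change to the Enum, and _replace_macros needs no edits.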

Contributor Author

Thanks for such a detailed comment! DONE.

Contributor Author

@allieychen allieychen Mar 2, 2018

I actually got a warning on "for macro in _QueryMacros" that says "Expected collections.Iterable, got Type[_QueryMacros] instead". Any idea how to fix this? I found one workaround that loops through __members__.items(), but I don't think it is as readable as the current code.

" SUM(start_position) AS sum_start, ",
" SUM(end_position) AS sum_end ",
"FROM {TABLE_NAME}"
"{NUM_ROWS}"
Contributor

nit: I think it's cleaner to just put these on one line now that you don't have multiple lines.
e.g.

{
  "query": [ "NUM_ROWS_QUERY" ],
  "expected_result": { "num_rows": 5 }
},
{
  "query": [ "SUM_START_QUERY" ],
  "expected_result": { "sum_start": 55 }
},

Contributor Author

Good point!


def format_query(self, query):
# type: (List[str]) -> str
# type: (list[str]) -> str
Contributor

Interesting. Is the correct format list instead of List?

Contributor Author

I think so; List actually gives a warning.

Contributor

@bashir2 FYI, looks like we should be using list instead of List

Member

Not really. I think IntelliJ is smart enough to understand list, but AFAIK the right choice according to PEP 484 is List. You have been getting warnings for List because you have not imported it from the typing module; check processed_variant.py for an example.
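Here is a minimal illustration of the PEP 484 type-comment form with List imported from the typing module. This format_query is a stripped-down stand-in that only joins the parts, not the PR's method.

```python
from typing import List  # Without this import, IDEs flag List as undefined.


def format_query(query):
  # type: (List[str]) -> str
  """Joins query parts with single spaces."""
  return ' '.join(query)
```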

Contributor Author

I see. I will correct it.


Formatting logic is as follows:
- Concatenates ``query`` parts into one string.
- Replaces macro NUM_ROWS_QUERY/SUM_START_QUERY/SUM_END_QUERY with the
Member

Consider not listing the members of _QueryMacros here, so that adding a new query or renaming one in _QueryMacros does not require updating this documentation (an update that will probably be missed, leaving the documentation obsolete in the future).

Contributor Author

Done.

@allieychen
Contributor Author

PTAL.

Contributor

@arostamianfar arostamianfar left a comment

Looks great! Thanks! Just some minor nits.


class _QueryMacros(enum.Enum):
NUM_ROWS_QUERY = 'SELECT COUNT(0) AS num_rows FROM {TABLE_NAME}'
SUM_START_QUERY = ('SELECT SUM(start_position) AS sum_start FROM {'
Contributor

nit: consider the following format as it's more readable (in general, try to avoid breaking lines as much as possible).

   SUM_START_QUERY = (
       'SELECT SUM(start_position) AS sum_start FROM {TABLE_NAME}')

Contributor Author

Thanks, done.

- Replaces TABLE_NAME with the table associated for the query.
"""
return (' ').join(query).format(TABLE_NAME=self._table_name)
return self._replace_variables(self._replace_macros((' ').join(query)))
Contributor

nit: do you need the parentheses around ' '? i.e. does ..._macros(' '.join(query))) work?
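For reference, this quick check shows that the parentheses only group the expression. (' ') evaluates to the same string as ' ', since a one-element tuple would need a trailing comma, so both spellings give the same result. The sample query parts are illustrative.

```python
query = ['SELECT COUNT(0) AS num_rows', 'FROM {TABLE_NAME}']

# Parentheses around a single expression only group it; without a trailing
# comma they do not create a tuple, so both calls are str.join on ' '.
with_parens = (' ').join(query)
without_parens = ' '.join(query)

print(with_parens == without_parens)  # True
```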

Contributor Author

Done.

"""Formats a query.
Replaces keyword TABLE_NAME and eventually macros in the query.
Replaces macros and variable TABLE_NAME in the query.
Contributor

nit: by the same token as Bashir's comment, please remove 'table_name' from here as well (i.e. just "Replaces macros and variables in the query").

Contributor Author

Done.

@allieychen
Contributor Author

Thanks. PTAL :)

Contributor

@arostamianfar arostamianfar left a comment

LGTM

@allieychen allieychen merged commit b8a6c8c into googlegenomics:master Mar 5, 2018
@allieychen allieychen deleted the query-macros branch March 5, 2018 18:26
nmousavi pushed a commit to nmousavi/gcp-variant-transforms that referenced this pull request Mar 9, 2018
* Add multiple query support to integration tests googlegenomics#120.
update all tests in integration small_tests to support multiple query.
update the validate_table in run_tests, add a loop for all test cases in the test file.
changed the required keys for the .json file. Remove "validation_query" and "expected_query_result", and add "test_cases".
Ran ./deploy_and_run_tests.sh and all integration tests passed.

Update the development guide doc (googlegenomics#124)

Update the development guide doc.

Add IntelliJ IDE setup.
Add more details.

Added an INFO message for the full command.

Tested: Ran manually and checked the new log message.

Uses the macros to replace the common queries. (googlegenomics#127)

Define NUM_ROWS, SUM_START, SUM_END in QueryFormatter, and replaces them in the query to avoid duplicate code.

TESTED:
deploy_and_run_tests.

Define SchemaDescriptor class.

Provides serialization and lookup API for type/mode of schema fields.

Tested:
  unit test
mhsaul pushed a commit to mhsaul/gcp-variant-transforms that referenced this pull request Mar 29, 2018
Define NUM_ROWS, SUM_START, SUM_END in QueryFormatter, and replaces them in the query to avoid duplicate code.

TESTED:
deploy_and_run_tests.
mhsaul pushed a commit to mhsaul/gcp-variant-transforms that referenced this pull request Mar 29, 2018
Define NUM_ROWS, SUM_START, SUM_END in QueryFormatter, and replaces them in the query to avoid duplicate code.

TESTED:
deploy_and_run_tests.
4 participants