-
Notifications
You must be signed in to change notification settings - Fork 59
Add integration tests for some options #106 #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add integration tests for some options #106 #130
Conversation
Pull Request Test Coverage Report for Build 447
💛 - Coveralls |
678c5a3 to
96f2563
Compare
1. reference_names: tested on valid-4.2.vcf. Used "20 Y" as the reference_names. 2. split_alternate_allele_info_fields: set this field to false and validate there is a column AF in BQ. 3. allow_malformed_records: created a new file "invalid-4.0-POS-empty" based on "valid-4.0.vcf" by removing the POS value for the first entry. 4. representative_header_file: created a new file "invalid-4.0-AF-field-removed.vcf" by removing "AF" field from "valid-4.0.vcf". 5. merge_option_none: created a new folder containing three new .vcf files for merge test. By default, there is no merge. Verified that it successfully read three files in the folder and form one BQ Table. 6. merge_option_move_to_calls: set variant_merge_stragegy to MOVE_TO_CALLS. Verified that the three files are merged on the keys. 7. info_keys_to_move_to_calls_regex: set this field to be "NS" and verified thatthe call.NS value is copied from each VCF file. 8. copy_quality_to_calls: set to true. Verified that call.quality is copied from each VCF file. 9. copy_filter_to_calls: set to true. Verified that call.filter is copied from each VCF file. All the created .vcf files are uploaded to the gcp-variant-transforms-test. Integration tests are passed.
96f2563 to
5f54d7d
Compare
arostamianfar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Allie! This looks great!
Could you please also add a short README.md file to the data/vcf folder that summarizes the contents and purpose of each file/folder?
| "table_name": "merge_option_info_keys_to_move_to_calls_regex", | ||
| "input_pattern": "gs://gcp-variant-transforms-testfiles/small_tests/merge/*.vcf", | ||
| "variant_merge_strategy": "MOVE_TO_CALLS", | ||
| "info_keys_to_move_to_calls_regex": "NS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should be ^NS$ as otherwise it would match anything that has "NS" as a substring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| }, | ||
| { | ||
| "query": [ | ||
| "SELECT call.NS AS num_samples ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please keep this as "NS" (num_samples can get confused with num_rows or other inferred results).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
018f6a0 to
704a0a0
Compare
|
Thanks for the comment. PTAL. |
arostamianfar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Allie! Looks good. I think the readme needs a bit more love, so I'll let Bashir make comments as necessary.
bashir2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these tests; some minor comments.
|
|
||
| `invalid-4.0-AF-field-removed.vcf` is created by removing `AF` field definition | ||
| from the meta-information based on `valid-4.0.vcf`. It is used to test `AF` | ||
| field can be parsed correctly given a representative_header_file contains `AF`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "contains" -> "that contains" OR "containing"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected.
|
|
||
| `invalid-4.0-POS-empty.vcf` is created based on `valid-4.0.vcf` by removing the | ||
| POS value for the first entry. It is used to test when `allow_malformed_records` | ||
| is enabled, failed VCF record reads will not raise errors and the BQ can still |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "BQ can still generate" -> "BigQuery table can still be generated"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected.
| current folder. | ||
|
|
||
| `valid-4.0.vcf`, `valid-4.0.vcf.gz`, `valid-4.0.vcf.bz2` are used to test | ||
| version 4.0 uncompressed, gzip, bzip VCF file, respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify what do you mean by ver. 4.0, i.e., the VCF format version number. Maybe it is good to have a link to this page too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| @@ -0,0 +1,27 @@ | |||
| This file summarizes the contents and the purpose for each files/folder within | |||
| current folder. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say "current folder", does it mean that you intend to copy valid-4.* files here as well? They are not copied in this PR, are they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new .vcf files are added in the folder testing\data.vcf\, which already inlcudes valid-4.* files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, that's my bad, I should have copied the valid-4.2_VEP.vcf test file to here as well.
| The folder `merge` is created to test the merge options. Three .vcf files are | ||
| created. `merge1.vcf` contains two samples, while `merge2.vcf` and `merge3.vcf` | ||
| contain one other sample, respectively. When MERGE_TO_CALLS is selected, the | ||
| variant call where `POS = 14370` is meant to merge across three files, while the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "where" -> "with" (also in the next line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Corrected.
| "WHERE start_position = 14369 AND call.name ='NA00001' ", | ||
| "AND 'q10' IN UNNEST (call.filter)" | ||
| ], | ||
| "expected_result": {"num_rows": 1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having 4 queries returning "1" each, what do you think about having a single query and instead of COUNT, selects call.name with an ORDER BY at the end (to make the output unique); for the expected_results you then verity the list of call names, i.e., "NA*"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For one thing, I think it is expected to have only one row in the query result. Another concern I have is that if we use call.name instead of some queries based on call.filter, the test will pass with/without --copy_filter_to_calls.
| "query": [ | ||
| "SELECT COUNT(0) AS num_rows ", | ||
| "FROM {TABLE_NAME} AS t, t.call as call ", | ||
| "WHERE start_position = 14369 AND call.name ='NA00001' ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider verifying position 1234567 as well since you expect different number of calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not trying to test the number of calls. Instead, what I try to test is that we have a column call.filter in the BQ table when copy_filter_to_calls is set to true, and the value is copied from the original vcf file. Am I in the wrong direction?
Do you mean add those test cases for move_to_calls test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, I misunderstood what you are trying to do; and yes, both for this comment and the next one, I was thinking about testing MOVE_TO_CALLS merge strategy and checking that calls are merged. Up to you whether you want to test call names in that test or not, I am okay either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I have added a test case for merge_option_move _to_calls to check that the calls are merged for one specific position. Meanwhile, I added a similar test case in merge_option_none to validate that the calls are not merged.
bashir2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| @@ -0,0 +1,27 @@ | |||
| This file summarizes the contents and the purpose for each files/folder within | |||
| current folder. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, that's my bad, I should have copied the valid-4.2_VEP.vcf test file to here as well.
| "query": [ | ||
| "SELECT COUNT(0) AS num_rows ", | ||
| "FROM {TABLE_NAME} AS t, t.call as call ", | ||
| "WHERE start_position = 14369 AND call.name ='NA00001' ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, I misunderstood what you are trying to do; and yes, both for this comment and the next one, I was thinking about testing MOVE_TO_CALLS merge strategy and checking that calls are merged. Up to you whether you want to test call names in that test or not, I am okay either way.
|
BTW, consider referencing Issue #106 in the PR description to indicate under that issue that this PR is for fixing that. |
6600826 to
078a5f8
Compare
|
PTAL :) |
| "SELECT COUNT(0) AS num_rows FROM {TABLE_NAME} ", | ||
| "WHERE start_position = 14369" | ||
| ], | ||
| "expected_result": {"num_rows": 1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same start_position, you can also count number of calls and verify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean SELECT COUNT(0) AS num_rows FROM {TABLE_NAME} AS t, t.call AS call WHERE start_position = 14369?
In this case, there is no difference with or without merge, i.e, the expected rows will be 3 for both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, they are the same. I meant that in combination with what you already have, i.e., first you are counting that there is only one row in the table with start_position = 14369 and then you verify that on that single row, there are three calls.
Not a big deal either way, so please feel free to submit as is, if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Thanks for the details.
| "SELECT COUNT(0) AS num_rows FROM {TABLE_NAME} ", | ||
| "WHERE start_position = 14369" | ||
| ], | ||
| "expected_result": {"num_rows": 1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, they are the same. I meant that in combination with what you already have, i.e., first you are counting that there is only one row in the table with start_position = 14369 and then you verify that on that single row, there are three calls.
Not a big deal either way, so please feel free to submit as is, if you prefer.
…mics#130) Add integration test for the following options: - reference_names: tested on valid-4.2.vcf. Used "20 Y" as the reference_names. - split_alternate_allele_info_fields: set this field to false and validate there is a column AF in BQ. - allow_malformed_records: created a new file "invalid-4.0-POS-empty" based on "valid-4.0.vcf" by removing the POS value for the first entry. - representative_header_file: created a new file "invalid-4.0-AF-field-removed.vcf" by removing "AF" field from "valid-4.0.vcf". - merge_option_none: created a new folder containing three new .vcf files for merge test. By default, there is no merge. Verified that it successfully read three files in the folder and form one BQ Table. - merge_option_move_to_calls: set variant_merge_stragegy to MOVE_TO_CALLS. Verified that the three files are merged on the keys. - info_keys_to_move_to_calls_regex: set this field to be "NS" and verified thatthe call.NS value is copied from each VCF file. - copy_quality_to_calls: set to true. Verified that call.quality is copied from each VCF file. - copy_filter_to_calls: set to true. Verified that call.filter is copied from each VCF file. - All the created .vcf files are uploaded to the gcp-variant-transforms-test. - Integration tests are passed. Issue: googlegenomics#106
…mics#130) Add integration test for the following options: - reference_names: tested on valid-4.2.vcf. Used "20 Y" as the reference_names. - split_alternate_allele_info_fields: set this field to false and validate there is a column AF in BQ. - allow_malformed_records: created a new file "invalid-4.0-POS-empty" based on "valid-4.0.vcf" by removing the POS value for the first entry. - representative_header_file: created a new file "invalid-4.0-AF-field-removed.vcf" by removing "AF" field from "valid-4.0.vcf". - merge_option_none: created a new folder containing three new .vcf files for merge test. By default, there is no merge. Verified that it successfully read three files in the folder and form one BQ Table. - merge_option_move_to_calls: set variant_merge_stragegy to MOVE_TO_CALLS. Verified that the three files are merged on the keys. - info_keys_to_move_to_calls_regex: set this field to be "NS" and verified thatthe call.NS value is copied from each VCF file. - copy_quality_to_calls: set to true. Verified that call.quality is copied from each VCF file. - copy_filter_to_calls: set to true. Verified that call.filter is copied from each VCF file. - All the created .vcf files are uploaded to the gcp-variant-transforms-test. - Integration tests are passed. Issue: googlegenomics#106
Add integration test for the following options:
Issue: #106