Add integration tests for some options #106 #130
@@ -0,0 +1,30 @@

```markdown
This file summarizes the contents and purpose of each file and folder in this
folder.

`valid-4.0.vcf`, `valid-4.0.vcf.gz`, and `valid-4.0.vcf.bz2` are used to test
Variant Call Format (VCF) version 4.0 files in uncompressed, gzip, and bzip2
form, respectively. For more details on the VCF specification for each version,
please refer to the [VCF Specification](https://samtools.github.io/hts-specs/).

`valid-4.1-large.vcf` and `valid-4.1-large.vcf.gz` are used to test version
4.1 VCF files in uncompressed and gzip form, respectively.

`valid-4.2.vcf` and `valid-4.2.vcf.gz` are used to test version 4.2 VCF files
in uncompressed and gzip form, respectively.

`invalid-4.0-AF-field-removed.vcf` is created from `valid-4.0.vcf` by removing
the `AF` field definition from the meta-information. It is used to test that the
`AF` field can still be parsed correctly when a `representative_header_file`
containing the `AF` definition is provided.

`invalid-4.0-POS-empty.vcf` is created from `valid-4.0.vcf` by removing the
`POS` value of the first record. It is used to test that, when
`allow_malformed_records` is enabled, records that fail to parse do not raise
errors and the BigQuery table is still generated.

The folder `merge` is created to test the merge options. It contains three .vcf
files: `merge1.vcf` contains two samples, while `merge2.vcf` and `merge3.vcf`
each contain one different sample. When `MOVE_TO_CALLS` is selected, the
record at `POS = 14370` is meant to be merged across all three files, while the
record at `POS = 1234567` is designed to be merged across `merge1.vcf` and
`merge2.vcf` only.
```
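The intended grouping of the `merge` folder can be checked by hand. The Python sketch below is a simplification and not the Variant Transforms merge implementation: it assumes records are merged when `CHROM`, `POS`, `REF` and `ALT` all match and ignores every other field. The resulting counts (7 input records, 4 merged rows) line up with the `num_rows` values expected by the merge test configs in this PR.

```python
from collections import defaultdict

# (source file, CHROM, POS, REF, ALT) for every record in the `merge` folder.
RECORDS = [
    ("merge1.vcf", "20", 14370, "G", "A"),
    ("merge1.vcf", "20", 17290, "T", "A"),
    ("merge1.vcf", "19", 1234567, "GTCT", "G,GTACT"),
    ("merge2.vcf", "20", 14370, "G", "A"),
    ("merge2.vcf", "20", 17330, "T", "A"),
    ("merge2.vcf", "19", 1234567, "GTCT", "G,GTACT"),
    ("merge3.vcf", "20", 14370, "G", "A"),
]


def group_for_move_to_calls(records):
    """Groups source files by a simplified merge key (CHROM, POS, REF, ALT)."""
    groups = defaultdict(list)
    for source, chrom, pos, ref, alt in records:
        groups[(chrom, pos, ref, alt)].append(source)
    return dict(groups)


merged = group_for_move_to_calls(RECORDS)
print(len(RECORDS), len(merged))  # 7 4
print(merged[("20", 14370, "G", "A")])
print(merged[("19", 1234567, "GTCT", "G,GTACT")])
```

The two `POS` values called out in the README are the only keys shared by more than one file; `17290` and `17330` stay as separate rows.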
@@ -0,0 +1,22 @@

```text
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
19 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
```
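To see why this file needs a representative header, one can compare the INFO IDs declared in the meta-information with the keys the records actually use. The standalone Python sketch below does that on a trimmed, tab-separated copy of the file (only the INFO header lines and two records are kept); it is an illustration, not part of the PR. Only `AF` is used without a definition.

```python
import re

# Trimmed copy of invalid-4.0-AF-field-removed.vcf (INFO headers plus two records).
VCF_TEXT = """\
##fileformat=VCFv4.0
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2
20\t1230237\t.\tT\t.\t47\tPASS\tNS=3;DP=13;AA=T
"""

INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def undefined_info_keys(vcf_text):
    """Returns INFO keys used in data records but not declared in the header."""
    declared, used = set(), set()
    for line in vcf_text.splitlines():
        match = INFO_ID.match(line)
        if match:
            declared.add(match.group(1))
        elif line and not line.startswith("#"):
            info = line.split("\t")[7]  # INFO is the eighth column.
            # Flags such as DB have no "=value" part; keep only the key.
            used.update(item.split("=", 1)[0] for item in info.split(";"))
    return used - declared


print(undefined_info_keys(VCF_TEXT))  # {'AF'}
```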
@@ -0,0 +1,23 @@

```text
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
19 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
```
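The behavior this file exercises can be approximated as "drop any record whose `POS` is not an integer when malformed records are allowed, otherwise fail". The Python sketch below is a hypothetical stand-in for `allow_malformed_records` (the error class and helper names are made up, and the real pipeline's parsing is more involved). Assuming BigQuery stores a 0-based `start_position` and `end_position = start_position + len(REF)`, it reproduces the 4 rows and the `sum_start`/`sum_end` values expected by the `option-allow-malformed-records` config.

```python
# (CHROM, POS, REF) of each record in invalid-4.0-POS-empty.vcf; the first POS is empty.
RECORDS = [
    ("20", "", "G"),
    ("20", "17330", "T"),
    ("20", "1110696", "A"),
    ("20", "1230237", "T"),
    ("19", "1234567", "GTCT"),
]


class MalformedRecordError(ValueError):
    """Hypothetical error raised for a record that cannot be parsed."""


def to_row(chrom, pos, ref):
    """Converts one record to (reference_name, start_position, end_position)."""
    if not pos.isdigit():
        raise MalformedRecordError(f"invalid POS {pos!r} on {chrom}")
    start = int(pos) - 1  # VCF POS is 1-based; assume a 0-based start.
    return chrom, start, start + len(ref)


def load(records, allow_malformed_records):
    rows = []
    for record in records:
        try:
            rows.append(to_row(*record))
        except MalformedRecordError:
            # With the option enabled, the bad record is dropped instead of failing.
            if not allow_malformed_records:
                raise
    return rows


rows = load(RECORDS, allow_malformed_records=True)
print(len(rows), sum(r[1] for r in rows), sum(r[2] for r in rows))  # 4 3592826 3592833
```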
@@ -0,0 +1,20 @@

```text
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002
20 14370 rs6054257 G A 10 q10 NS=2;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51
20 17290 . T A 3 q10 NS=2;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3
19 1234567 microsat1 GTCT G,GTACT 50 PASS NS=2;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2
```
@@ -0,0 +1,21 @@

```text
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00003
20 14370 rs6054257 G A 29 PASS NS=1;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=1;DP=11;AF=0.017 GT:GQ:DP:HQ 0/0:41:3
19 1234567 microsat2 GTCT G,GTACT 50 PASS NS=1;DP=9;AA=G GT:GQ:DP 1/1:40:3
```
@@ -0,0 +1,19 @@

```text
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00004
20 14370 rs6054257 G A 30 PASS NS=1;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51
```
@@ -0,0 +1,58 @@

```json
{
  "test_name": "merge-option-copy-filter-to-calls",
  "table_name": "merge_option_copy_filter_to_calls",
  "input_pattern": "gs://gcp-variant-transforms-testfiles/small_tests/merge/*.vcf",
  "variant_merge_strategy": "MOVE_TO_CALLS",
  "copy_filter_to_calls": true,
  "runner": "DataflowRunner",
  "assertion_configs": [
    {
      "query": ["NUM_ROWS_QUERY"],
      "expected_result": {"num_rows": 4}
    },
    {
      "query": ["SUM_START_QUERY"],
      "expected_result": {"sum_start": 1283553}
    },
    {
      "query": ["SUM_END_QUERY"],
      "expected_result": {"sum_end": 1283560}
    },
    {
      "query": [
        "SELECT COUNT(0) AS num_rows ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00001' ",
```

**Member:** Consider verifying position 1234567 as well, since you expect a different number of calls there.

**Contributor (author):** I am not trying to test the number of calls. Instead, what I am trying to test is that the BQ table has a `call.filter` column when … Do you mean adding those test cases for …

**Member:** Thanks for clarifying, I misunderstood what you are trying to do; and yes, both for this comment and the next one, I was thinking about testing …

**Contributor (author):** Thanks. I have added a test case for …

```json
        "AND 'q10' IN UNNEST (call.filter)"
      ],
      "expected_result": {"num_rows": 1}
```

**Member:** Instead of having 4 queries returning "1" each, what do you think about having a single query and instead of …

**Contributor (author):** For one thing, I think it is expected to have only one row in the query result. Another concern I have is that if we use …

```json
    },
    {
      "query": [
        "SELECT COUNT(0) AS num_rows ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00002' ",
        "AND 'q10' IN UNNEST (call.filter)"
      ],
      "expected_result": {"num_rows": 1}
    },
    {
      "query": [
        "SELECT COUNT(0) AS num_rows ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00003' ",
        "AND 'PASS' IN UNNEST (call.filter)"
      ],
      "expected_result": {"num_rows": 1}
    },
    {
      "query": [
        "SELECT COUNT(0) AS num_rows ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00004' ",
        "AND 'PASS' IN UNNEST (call.filter)"
      ],
      "expected_result": {"num_rows": 1}
    }
  ]
}
```
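The four per-sample `call.filter` assertions in this config differ only in the sample name and the expected FILTER value, which comes straight from the input files (`merge1.vcf` is `q10` at `POS` 14370, `merge2.vcf` and `merge3.vcf` are `PASS`). As a hypothetical helper, not part of the PR or of the test runner, they could be generated from a small table of expectations; the output has the same shape as the JSON entries.

```python
import json

# Expected FILTER per sample at POS 14370, taken from the merge input files.
EXPECTED_FILTER = {
    "NA00001": "q10",   # merge1.vcf
    "NA00002": "q10",   # merge1.vcf
    "NA00003": "PASS",  # merge2.vcf
    "NA00004": "PASS",  # merge3.vcf
}


def filter_assertion(sample, filter_value, start_position=14369):
    """Builds one assertion config in the same shape as the JSON entries."""
    return {
        "query": [
            "SELECT COUNT(0) AS num_rows ",
            "FROM {TABLE_NAME} AS t, t.call as call ",  # placeholder kept literally
            f"WHERE start_position = {start_position} AND call.name ='{sample}' ",
            f"AND '{filter_value}' IN UNNEST (call.filter)",
        ],
        "expected_result": {"num_rows": 1},
    }


configs = [filter_assertion(s, f) for s, f in EXPECTED_FILTER.items()]
print(json.dumps(configs[0], indent=2))
```

Keeping one query per sample, as the PR does, also means a failing assertion points directly at the sample whose filter was not copied.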
@@ -0,0 +1,54 @@

```json
{
  "test_name": "merge-option-copy-quality-to-calls",
  "table_name": "merge_option_copy_quality_to_calls",
  "input_pattern": "gs://gcp-variant-transforms-testfiles/small_tests/merge/*.vcf",
  "variant_merge_strategy": "MOVE_TO_CALLS",
  "copy_quality_to_calls": true,
  "runner": "DataflowRunner",
  "assertion_configs": [
    {
      "query": ["NUM_ROWS_QUERY"],
      "expected_result": {"num_rows": 4}
    },
    {
      "query": ["SUM_START_QUERY"],
      "expected_result": {"sum_start": 1283553}
    },
    {
      "query": ["SUM_END_QUERY"],
      "expected_result": {"sum_end": 1283560}
    },
    {
      "query": [
        "SELECT call.quality AS quality ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00001'"
      ],
      "expected_result": {"quality": 10.0}
    },
    {
      "query": [
        "SELECT call.quality AS quality ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00002'"
      ],
      "expected_result": {"quality": 10.0}
    },
    {
      "query": [
        "SELECT call.quality AS quality ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00003'"
      ],
      "expected_result": {"quality": 29.0}
    },
    {
      "query": [
        "SELECT call.quality AS quality ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00004'"
      ],
      "expected_result": {"quality": 30.0}
    }
  ]
}
```
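The expected `call.quality` values follow from the `QUAL` column of the `POS` 14370 record in each input file: 10 in `merge1.vcf`, 29 in `merge2.vcf` and 30 in `merge3.vcf`. The Python sketch below illustrates that mapping under the assumption that `copy_quality_to_calls` gives every call the `QUAL` of the record it came from before merging; leaving call quality unset when the option is off is a choice made only for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# POS 14370 record of each merge input file: (QUAL, sample names).
RECORDS_AT_14370: List[Tuple[float, List[str]]] = [
    (10.0, ["NA00001", "NA00002"]),  # merge1.vcf
    (29.0, ["NA00003"]),             # merge2.vcf
    (30.0, ["NA00004"]),             # merge3.vcf
]


def merged_call_qualities(records, copy_quality_to_calls: bool) -> Dict[str, Optional[float]]:
    """Maps each call name to the quality it carries after the merge.

    With copy_quality_to_calls, a call keeps the QUAL of its own source record;
    otherwise this sketch leaves the call quality unset (None).
    """
    qualities: Dict[str, Optional[float]] = {}
    for qual, samples in records:
        for sample in samples:
            qualities[sample] = qual if copy_quality_to_calls else None
    return qualities


print(merged_call_qualities(RECORDS_AT_14370, copy_quality_to_calls=True))
# {'NA00001': 10.0, 'NA00002': 10.0, 'NA00003': 29.0, 'NA00004': 30.0}
```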
@@ -0,0 +1,54 @@

```json
{
  "test_name": "merge-option-info-keys-to-move-to-calls-regex",
  "table_name": "merge_option_info_keys_to_move_to_calls_regex",
  "input_pattern": "gs://gcp-variant-transforms-testfiles/small_tests/merge/*.vcf",
  "variant_merge_strategy": "MOVE_TO_CALLS",
  "info_keys_to_move_to_calls_regex": "^NS$",
  "runner": "DataflowRunner",
  "assertion_configs": [
    {
      "query": ["NUM_ROWS_QUERY"],
      "expected_result": {"num_rows": 4}
    },
    {
      "query": ["SUM_START_QUERY"],
      "expected_result": {"sum_start": 1283553}
    },
    {
      "query": ["SUM_END_QUERY"],
      "expected_result": {"sum_end": 1283560}
    },
    {
      "query": [
        "SELECT call.NS AS NS ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00001'"
      ],
      "expected_result": {"NS": 2}
    },
    {
      "query": [
        "SELECT call.NS AS NS ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00002'"
      ],
      "expected_result": {"NS": 2}
    },
    {
      "query": [
        "SELECT call.NS AS NS ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00003'"
      ],
      "expected_result": {"NS": 1}
    },
    {
      "query": [
        "SELECT call.NS AS NS ",
        "FROM {TABLE_NAME} AS t, t.call as call ",
        "WHERE start_position = 14369 AND call.name ='NA00004'"
      ],
      "expected_result": {"NS": 1}
    }
  ]
}
```
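`info_keys_to_move_to_calls_regex` takes a regular expression, and the anchored pattern `^NS$` selects exactly the `NS` key (not, say, a hypothetical `NSX`). The Python sketch below assumes, for illustration only, that keys are tested with `re.search` and that matching INFO fields are copied onto every call that came from the same source record while the rest stay on the variant. It reproduces the expected `call.NS` values: 2 for the `merge1.vcf` samples and 1 for the others.

```python
import re
from typing import Dict, List, Tuple

# INFO fields and samples of the POS 14370 record in each merge input file.
RECORDS_AT_14370: List[Tuple[Dict[str, object], List[str]]] = [
    ({"NS": 2, "DP": 14, "AF": 0.5, "DB": True, "H2": True}, ["NA00001", "NA00002"]),
    ({"NS": 1, "DP": 14, "AF": 0.5, "DB": True, "H2": True}, ["NA00003"]),
    ({"NS": 1, "DP": 14, "AF": 0.5, "DB": True, "H2": True}, ["NA00004"]),
]


def move_info_to_calls(records, keys_regex):
    """Returns ({call name: moved INFO fields}, INFO keys left on the variant)."""
    pattern = re.compile(keys_regex)
    calls: Dict[str, Dict[str, object]] = {}
    variant_keys = set()
    for info, samples in records:
        moved = {k: v for k, v in info.items() if pattern.search(k)}
        variant_keys.update(k for k in info if k not in moved)
        for sample in samples:
            calls[sample] = dict(moved)
    return calls, variant_keys


calls, variant_keys = move_info_to_calls(RECORDS_AT_14370, r"^NS$")
print({name: fields["NS"] for name, fields in calls.items()})
# {'NA00001': 2, 'NA00002': 2, 'NA00003': 1, 'NA00004': 1}
print(sorted(variant_keys))  # ['AF', 'DB', 'DP', 'H2']
```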
@@ -0,0 +1,28 @@

```json
{
  "test_name": "merge-option-move-to-calls",
  "table_name": "merge_option_move_to_calls",
  "input_pattern": "gs://gcp-variant-transforms-testfiles/small_tests/merge/*.vcf",
  "runner": "DataflowRunner",
  "variant_merge_strategy": "MOVE_TO_CALLS",
  "assertion_configs": [
    {
      "query": ["NUM_ROWS_QUERY"],
      "expected_result": {"num_rows": 4}
    },
    {
      "query": ["SUM_START_QUERY"],
      "expected_result": {"sum_start": 1283553}
    },
    {
      "query": ["SUM_END_QUERY"],
      "expected_result": {"sum_end": 1283560}
    },
    {
      "query": [
        "SELECT COUNT(0) AS num_rows FROM {TABLE_NAME} ",
        "WHERE start_position = 14369"
      ],
      "expected_result": {"num_rows": 1}
```

**Member:** For the same …

**Contributor (author):** Do you mean …

**Member:** Correct, they are the same. I meant that in combination with what you already have, i.e., first you are counting that there is only one row in the table with … Not a big deal either way, so please feel free to submit as is, if you prefer.

**Contributor (author):** Got it! Thanks for the details.

```json
    }
  ]
}
```
@@ -0,0 +1,27 @@

```json
{
  "test_name": "merge-option-none",
  "table_name": "merge_option_none",
  "input_pattern": "gs://gcp-variant-transforms-testfiles/small_tests/merge/*.vcf",
  "runner": "DataflowRunner",
  "assertion_configs": [
    {
      "query": ["NUM_ROWS_QUERY"],
      "expected_result": {"num_rows": 7}
    },
    {
      "query": ["SUM_START_QUERY"],
      "expected_result": {"sum_start": 2546857}
    },
    {
      "query": ["SUM_END_QUERY"],
      "expected_result": {"sum_end": 2546870}
    },
    {
      "query": [
        "SELECT COUNT(0) AS num_rows FROM {TABLE_NAME} ",
        "WHERE start_position = 14369"
      ],
      "expected_result": {"num_rows": 3}
    }
  ]
}
```
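The `sum_start`/`sum_end` expectations in the merge configs can be reproduced from the input files. The SQL behind `SUM_START_QUERY` and `SUM_END_QUERY` is not part of this diff, so the Python sketch below uses plain sums and assumes `start_position = POS - 1` and `end_position = start_position + len(REF)` (none of these records sets `END`). Deduplicating on `(CHROM, POS, REF)` stands in for `MOVE_TO_CALLS`, which is only valid here because matching records also share `ALT`.

```python
# (CHROM, POS, REF) of every record in the merge input files.
RECORDS = [
    ("20", 14370, "G"), ("20", 17290, "T"), ("19", 1234567, "GTCT"),  # merge1.vcf
    ("20", 14370, "G"), ("20", 17330, "T"), ("19", 1234567, "GTCT"),  # merge2.vcf
    ("20", 14370, "G"),                                               # merge3.vcf
]


def to_interval(chrom, pos, ref):
    """Converts a 1-based VCF POS to a 0-based, half-open [start, end) interval."""
    start = pos - 1
    return chrom, start, start + len(ref)


def sums(rows):
    """Returns (sum of start positions, sum of end positions)."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


unmerged = [to_interval(*r) for r in RECORDS]  # no merge strategy: one row per record
merged = sorted(set(unmerged))                 # simplified MOVE_TO_CALLS: one row per site

print(len(unmerged), sums(unmerged))  # 7 (2546857, 2546870)
print(len(merged), sums(merged))      # 4 (1283553, 1283560)
```

The three rows at `start_position = 14369` in the unmerged case are the ones the last assertion of `merge-option-none` counts.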
@@ -0,0 +1,21 @@

```json
{
  "test_name": "option-allow-malformed-records",
  "table_name": "option_allow_malformed_records",
  "input_pattern": "gs://gcp-variant-transforms-testfiles/small_tests/invalid-4.0-POS-empty.vcf",
  "allow_malformed_records": true,
  "runner": "DataflowRunner",
  "assertion_configs": [
    {
      "query": ["NUM_ROWS_QUERY"],
      "expected_result": {"num_rows": 4}
    },
    {
      "query": ["SUM_START_QUERY"],
      "expected_result": {"sum_start": 3592826}
    },
    {
      "query": ["SUM_END_QUERY"],
      "expected_result": {"sum_end": 3592833}
    }
  ]
}
```
When you say "current folder", does it mean that you intend to copy the valid-4.* files here as well? They are not copied in this PR, are they?

The new .vcf files are added in the folder `testing\data.vcf\`, which already includes the valid-4.* files.

Oh, I see, that's my bad; I should have copied the valid-4.2_VEP.vcf test file here as well.