Closed
Labels: bug (Something isn't working)
Description
Here is a test file with one row of data in which the second DQB1 typing is a floating-point number.
pyard-reduce-csv does not recognize this as an error and even puts the float into the glstring!
input data
Note that the second DQB1 typing (R_DQB1_TYP2) is a floating-point number:
$ cat data/test.csv
id,self_report_race,self_report_ethnicity,genomic_ancestry,R_A_TYP1,R_A_TYP2,R_B_TYP1,R_B_TYP2,R_C_TYP1,R_C_TYP2,R_DPB1_TYP1,R_DPB1_TYP2,R_DQB1_TYP1,R_DQB1_TYP2,R_DRB1_TYP1,R_DRB1_TYP2,R_DRBX_1
16,WHITE,NOT HISPANIC,EUR,02:01,02:01,13:02,50:01:00,06:02,06:02,04:02,02:01,02:02,0.478472222,07:01,13:01,DRB3*01:01
pyard-reduce-csv
It runs to completion and reports "No Errors":
$ ~/src/git/py-ard/scripts/pyard-reduce-csv -c conf/test.json
Using config file: conf/test.json
Column:R_A_TYP1 =>
A*02:01 => A*02:01
Column:R_A_TYP2 =>
A*02:01 => A*02:01
Column:R_B_TYP1 =>
B*13:02 => B*13:02
Column:R_B_TYP2 =>
B*50:01:00 => B*50:01
Column:R_C_TYP1 =>
C*06:02 => C*06:02
Column:R_C_TYP2 =>
C*06:02 => C*06:02
Column:R_DRB1_TYP1 =>
DRB1*07:01 => DRB1*07:01
Column:R_DRB1_TYP2 =>
DRB1*13:01 => DRB1*13:01
Column:R_DQB1_TYP1 =>
DQB1*02:02 => DQB1*02:01
Column:R_DQB1_TYP2 =>
No Errors
Saved result to file:data/test.gl.csv.gz
In the log, the R_DQB1_TYP2 column is silently shown as empty.
output
But in the output file, both the R_DQB1_TYP2 column and the glstring contain DQB10.478472222:
$ gzcat data/test.gl.csv.gz
id,R_A_TYP1,R_A_TYP2,R_B_TYP1,R_B_TYP2,R_C_TYP1,R_C_TYP2,R_DQB1_TYP1,R_DQB1_TYP2,R_DRB1_TYP1,R_DRB1_TYP2,patient_gl
16,A*02:01,A*02:01,B*13:02,B*50:01,C*06:02,C*06:02,DQB1*02:01,DQB10.478472222,DRB1*07:01,DRB1*13:01,A*02:01+A*02:01^B*13:02+B*50:01^C*06:02+C*06:02^DRB1*07:01+DRB1*13:01^DQB1*02:01+DQB10.478472222
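One way to catch this kind of value before reduction is a pre-check on each typing. The following is only a minimal sketch, not part of py-ard: the helper name `is_decimal_typing` and the pattern are hypothetical, and it assumes that a value consisting of a plain decimal number (optionally prefixed with `LOCUS*`) can never be a valid typing.

```python
import re

# Assumption: a typing is suspect if, after an optional "LOCUS*" prefix,
# it is a plain decimal number such as "0.478472222".
DECIMAL_VALUE = re.compile(r"^(?:[A-Za-z0-9]+\*)?\d*\.\d+$")


def is_decimal_typing(value: str) -> bool:
    """Return True when value looks like a decimal number, not an HLA typing."""
    return bool(DECIMAL_VALUE.match(value.strip()))


# Values taken from the test row in this issue.
for typing in ["02:02", "50:01:00", "0.478472222", "DQB1*0.478472222"]:
    print(typing, is_decimal_typing(typing))
```

The check is deliberately narrow: it flags only decimal numbers, so colon-delimited names and digit-only values such as `0201` pass through untouched.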
config file
$ cat conf/test.json
{
  "in_csv_filename": "data/test.csv",
  "out_csv_filename": "data/test.gl.csv",
  "columns_from_csv": [
    "id",
    "R_A_TYP1",
    "R_A_TYP2",
    "R_B_TYP1",
    "R_B_TYP2",
    "R_C_TYP1",
    "R_C_TYP2",
    "R_DRB1_TYP1",
    "R_DRB1_TYP2",
    "R_DQB1_TYP1",
    "R_DQB1_TYP2"
  ],
  "locus_column_mapping": {
    "patient": {
      "A": [
        "R_A_TYP1",
        "R_A_TYP2"
      ],
      "B": [
        "R_B_TYP1",
        "R_B_TYP2"
      ],
      "C": [
        "R_C_TYP1",
        "R_C_TYP2"
      ],
      "DRB1": [
        "R_DRB1_TYP1",
        "R_DRB1_TYP2"
      ],
      "DQB1": [
        "R_DQB1_TYP1",
        "R_DQB1_TYP2"
      ]
    }
  },
  "redux_type": "lgx",
  "reduce_serology": false,
  "reduce_v2": true,
  "convert_v2_to_v3": false,
  "reduce_2field": true,
  "reduce_3field": true,
  "reduce_P": true,
  "reduce_XX": true,
  "reduce_MAC": true,
  "locus_in_allele_name": false,
  "keep_locus_in_allele_name": true,
  "output_file_format": "csv",
  "new_column_for_redux": false,
  "map_drb345_to_drbx": false,
  "apply_compression": "gzip",
  "generate_glstring": true,
  "verbose_log": true
}
I also tried adding the locus to the HLA fields and got the same result for this input file:
id,self_report_race,self_report_ethnicity,genomic_ancestry,R_A_TYP1,R_A_TYP2,R_B_TYP1,R_B_TYP2,R_C_TYP1,R_C_TYP2,R_DPB1_TYP1,R_DPB1_TYP2,R_DQB1_TYP1,R_DQB1_TYP2,R_DRB1_TYP1,R_DRB1_TYP2,R_DRBX_1
16,WHITE,NOT HISPANIC,EUR,A*02:01,A*02:01,B*13:02,B*50:01:00,C*06:02,C*06:02,DPB1*04:02,DPB1*02:01,DQB1*02:02,DQB1*0.478472222,DRB1*07:01,DRB1*13:01,DRB3*01:01
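Until the script reports such values itself, the input file could be scanned before running pyard-reduce-csv. This is a rough sketch under stated assumptions, not py-ard code: the function `find_decimal_typings` is hypothetical, it uses only the Python standard library, and it flags only cells that are plain decimal numbers with or without a `LOCUS*` prefix. The sample is a trimmed version of the rows above.

```python
import csv
import io
import re

# Assumption: a decimal number, optionally prefixed with "LOCUS*", is never a valid typing.
DECIMAL_VALUE = re.compile(r"^(?:[A-Za-z0-9]+\*)?\d*\.\d+$")


def find_decimal_typings(csv_text, columns):
    """Return (id, column, value) for every listed column holding a decimal number."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in columns:
            value = (row.get(column) or "").strip()
            if DECIMAL_VALUE.match(value):
                problems.append((row["id"], column, value))
    return problems


# Trimmed to the DQB1 columns of the second input file in this issue.
SAMPLE = (
    "id,R_DQB1_TYP1,R_DQB1_TYP2\n"
    "16,DQB1*02:02,DQB1*0.478472222\n"
)

print(find_decimal_typings(SAMPLE, ["R_DQB1_TYP1", "R_DQB1_TYP2"]))
# [('16', 'R_DQB1_TYP2', 'DQB1*0.478472222')]
```

The column list would come from the locus columns named in the config file; columns missing from the CSV are simply skipped.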