
pyard-reduce-csv doesn't validate #339

@mmaiers-nmdp

Here is a test file with one row of data where the second DQB1 type is a floating-point number.

pyard-reduce-csv does not recognize this as an error and even puts the float into the glstring!

input data

Note that the second DQB1 type is a floating-point number.

$ cat data/test.csv
id,self_report_race,self_report_ethnicity,genomic_ancestry,R_A_TYP1,R_A_TYP2,R_B_TYP1,R_B_TYP2,R_C_TYP1,R_C_TYP2,R_DPB1_TYP1,R_DPB1_TYP2,R_DQB1_TYP1,R_DQB1_TYP2,R_DRB1_TYP1,R_DRB1_TYP2,R_DRBX_1
16,WHITE,NOT HISPANIC,EUR,02:01,02:01,13:02,50:01:00,06:02,06:02,04:02,02:01,02:02,0.478472222,07:01,13:01,DRB3*01:01
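
For illustration, here is a minimal sketch of the kind of pre-check that would flag this row before it is reduced. It is not py-ard's validation logic: `ALLELE_RE` and `find_invalid_typings` are hypothetical names, and the pattern is a deliberately simplified assumption about what a typing looks like (optional locus prefix, colon-separated numeric fields, optional letter code in the second field, optional expression suffix). Serology and other valid notations would need their own handling.

```python
import csv
import io
import re

# Assumed, simplified pattern (not py-ard's own rules): optional "LOCUS*" prefix,
# a numeric first field, a second field that is numeric or a letter code,
# up to two more numeric fields, and an optional expression suffix.
ALLELE_RE = re.compile(
    r"^(?:[A-Z0-9]+\*)?\d{2,4}:(?:\d{2,4}|[A-Z]{2,})(?::\d{2,4}){0,2}[NLSCAQ]?$"
)


def find_invalid_typings(csv_text, typing_columns):
    """Return (row id, column, value) for every non-empty value failing ALLELE_RE."""
    invalid = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in typing_columns:
            value = row[column].strip()
            if value and not ALLELE_RE.match(value):
                invalid.append((row["id"], column, value))
    return invalid


# Two columns taken from the test row above.
SAMPLE = """id,R_DQB1_TYP1,R_DQB1_TYP2
16,02:02,0.478472222
"""

print(find_invalid_typings(SAMPLE, ["R_DQB1_TYP1", "R_DQB1_TYP2"]))
# [('16', 'R_DQB1_TYP2', '0.478472222')]
```

A check along these lines rejects the float whether or not the locus prefix is present, since `DQB1*0.478472222` fails the same pattern.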

pyard-reduce-csv

It runs with "no errors"

$ ~/src/git/py-ard/scripts/pyard-reduce-csv -c conf/test.json
Using config file: conf/test.json
Column:R_A_TYP1 =>
	A*02:01 => A*02:01
Column:R_A_TYP2 =>
	A*02:01 => A*02:01
Column:R_B_TYP1 =>
	B*13:02 => B*13:02
Column:R_B_TYP2 =>
	B*50:01:00 => B*50:01
Column:R_C_TYP1 =>
	C*06:02 => C*06:02
Column:R_C_TYP2 =>
	C*06:02 => C*06:02
Column:R_DRB1_TYP1 =>
	DRB1*07:01 => DRB1*07:01
Column:R_DRB1_TYP2 =>
	DRB1*13:01 => DRB1*13:01
Column:R_DQB1_TYP1 =>
	DQB1*02:02 => DQB1*02:01
Column:R_DQB1_TYP2 =>
No Errors
Saved result to file:data/test.gl.csv.gz

In the log, R_DQB1_TYP2 is silently shown as empty.

output

But in the output column and in the glstring we find DQB10.478472222:

$ gzcat data/test.gl.csv.gz
id,R_A_TYP1,R_A_TYP2,R_B_TYP1,R_B_TYP2,R_C_TYP1,R_C_TYP2,R_DQB1_TYP1,R_DQB1_TYP2,R_DRB1_TYP1,R_DRB1_TYP2,patient_gl
16,A*02:01,A*02:01,B*13:02,B*50:01,C*06:02,C*06:02,DQB1*02:01,DQB10.478472222,DRB1*07:01,DRB1*13:01,A*02:01+A*02:01^B*13:02+B*50:01^C*06:02+C*06:02^DRB1*07:01+DRB1*13:01^DQB1*02:01+DQB10.478472222
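
Until the tool reports this itself, the output can be checked after the fact. The sketch below splits a glstring on the standard GL String delimiters and lists any component that does not look like a locus-prefixed allele. `malformed_alleles` is a hypothetical helper and the pattern is a simplified assumption, so it may need loosening for MAC codes or other notations that legitimately appear in the output.

```python
import re

# Assumed, simplified shape of a reduced allele: LOCUS*field:field[:field[:field]]
# with an optional expression suffix.
ALLELE_RE = re.compile(r"^[A-Z0-9]+\*\d{2,4}(?::\d{2,4}){1,3}[NLSCAQ]?$")


def malformed_alleles(glstring):
    """Split a glstring on its delimiters and return components failing ALLELE_RE."""
    components = re.split(r"[\^|+~/]", glstring)
    return [c for c in components if not ALLELE_RE.match(c)]


# The patient_gl value from the output above.
GL = (
    "A*02:01+A*02:01^B*13:02+B*50:01^C*06:02+C*06:02"
    "^DRB1*07:01+DRB1*13:01^DQB1*02:01+DQB10.478472222"
)

print(malformed_alleles(GL))
# ['DQB10.478472222']
```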

config file

$ cat conf/test.json 
{
  "in_csv_filename": "data/test.csv",
  "out_csv_filename": "data/test.gl.csv",
  "columns_from_csv": [
    "id",
    "R_A_TYP1",
    "R_A_TYP2",
    "R_B_TYP1",
    "R_B_TYP2",
    "R_C_TYP1",
    "R_C_TYP2",
    "R_DRB1_TYP1",
    "R_DRB1_TYP2",
    "R_DQB1_TYP1",
    "R_DQB1_TYP2"
  ],  
  "locus_column_mapping": {
    "patient": {
      "A": [
        "R_A_TYP1",
        "R_A_TYP2"
      ],
      "B": [
        "R_B_TYP1",
        "R_B_TYP2"
      ],
      "C": [
        "R_C_TYP1",
        "R_C_TYP2"
      ],
      "DRB1": [
        "R_DRB1_TYP1",
        "R_DRB1_TYP2"
      ],
      "DQB1": [
        "R_DQB1_TYP1",
        "R_DQB1_TYP2"
      ]
    }
  },
  "redux_type": "lgx",
  "reduce_serology": false,
  "reduce_v2": true,
  "convert_v2_to_v3": false,
  "reduce_2field": true,
  "reduce_3field": true,
  "reduce_P": true,
  "reduce_XX": true,
  "reduce_MAC": true,
  "locus_in_allele_name": false,
  "keep_locus_in_allele_name": true,
  "output_file_format": "csv",
  "new_column_for_redux": false,
  "map_drb345_to_drbx": false,
  "apply_compression": "gzip",
  "generate_glstring": true,
  "verbose_log": true
}

I also tried adding the locus to the HLA fields and got the same result for this input file:

id,self_report_race,self_report_ethnicity,genomic_ancestry,R_A_TYP1,R_A_TYP2,R_B_TYP1,R_B_TYP2,R_C_TYP1,R_C_TYP2,R_DPB1_TYP1,R_DPB1_TYP2,R_DQB1_TYP1,R_DQB1_TYP2,R_DRB1_TYP1,R_DRB1_TYP2,R_DRBX_1
16,WHITE,NOT HISPANIC,EUR,A*02:01,A*02:01,B*13:02,B*50:01:00,C*06:02,C*06:02,DPB1*04:02,DPB1*02:01,DQB1*02:02,DQB1*0.478472222,DRB1*07:01,DRB1*13:01,DRB3*01:01

Labels: bug (Something isn't working)