Metadata General Sanity Checks

lindaxiang edited this page Sep 8, 2022 · 4 revisions
Validation Script Validation Rule Note
c110_rg_id_uniqueness Each read_group has a unique submitter_read_group_id value
c120_permissible_char_in_rg_id submitter_read_group_id must conform to the regex: ^[a-zA-Z0-9\-_:\.]+$
c130_one_sample Metadata payload contains only one sample
c140_platform_unit_uniqueness Each read_group has a unique platform_unit value
c150_rg_count_match The number of read_groups equals read_group_count
c160_file_r1_r2_check Each read_group record has an is_paired_end value
- If is_paired_end==True: file_r1 and file_r2 must both be populated
- If is_paired_end==False: only file_r1 must be populated
c170_fq_uniqueness_in_rgs Each FASTQ file provided in read_groups is uniquely named
c180_file_uniqueness Each entry in the files section is uniquely named
c190_no_extra_files Each entry in the files section is referenced in the read_groups section, and no additional files exist
c200_rg_id_in_bam_uniqueness Each read_group has a unique read_group_id_in_bam value
c210_no_path_in_filename Each file name consists of only the basename, with no path
c220_no_rg_id_in_bam_for_fq When a read_group consists of FASTQ files, read_group_id_in_bam is not populated
c230_files_info_data_category Verifies each entry of files has: 'info':{'data_category': 'Sequencing_Reads'}
c240_submitter_rg_id_collide_with_rg_id_in_bam When a read_group's read_group_id_in_bam is not provided, the corresponding submitter_read_group_id does not match any other read_group_id_in_bam value within the same BAM file
c250_file_data_type Each entry of files contains 'dataType':'Submitted Reads'
c260_filename_pattern Files are appropriately typed and conform to the following regex: ^[A-Za-z0-9]{1}[A-Za-z0-9_\.\-]*\.(bam|fq\.gz|fastq\.gz|fq\.bz2|fastq\.bz2)$
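
To make a few of the rules above concrete, here is a minimal Python sketch of checks c110, c120, c140, c150, c160, c180, c210 and c260. It is not the actual validation scripts. The payload layout (a dict with `read_groups` and `files` lists), the `fileName` key, and the helper names `duplicates` and `check_metadata` are assumptions made for illustration; the field names and both regexes are taken from the rules listed above.

```python
import os
import re
from collections import Counter

# Regexes copied from rules c120 and c260.
RG_ID_PATTERN = re.compile(r"^[a-zA-Z0-9\-_:\.]+$")
FILENAME_PATTERN = re.compile(
    r"^[A-Za-z0-9]{1}[A-Za-z0-9_\.\-]*\.(bam|fq\.gz|fastq\.gz|fq\.bz2|fastq\.bz2)$"
)


def duplicates(values):
    """Return the values that occur more than once, in first-seen order."""
    return [v for v, n in Counter(values).items() if n > 1]


def check_metadata(metadata):
    """Hypothetical helper: return a list of error strings (empty if all pass)."""
    errors = []
    read_groups = metadata.get("read_groups", [])
    files = metadata.get("files", [])

    # c110 / c140: these fields must be unique across read groups.
    for code, field in (("c110", "submitter_read_group_id"),
                        ("c140", "platform_unit")):
        dups = duplicates(rg.get(field) for rg in read_groups)
        if dups:
            errors.append(f"{code}: duplicate {field}: {dups}")

    # c120: only permissible characters in submitter_read_group_id.
    for rg in read_groups:
        rg_id = str(rg.get("submitter_read_group_id"))
        if not RG_ID_PATTERN.match(rg_id):
            errors.append(f"c120: invalid submitter_read_group_id: {rg_id!r}")

    # c150: declared count must equal the number of read groups.
    if metadata.get("read_group_count") != len(read_groups):
        errors.append("c150: read_group_count does not match read_groups")

    # c160: file_r1/file_r2 must agree with is_paired_end.
    for rg in read_groups:
        paired = rg.get("is_paired_end")
        r1, r2 = rg.get("file_r1"), rg.get("file_r2")
        if paired is True:
            ok = bool(r1) and bool(r2)
        elif paired is False:
            ok = bool(r1) and not r2
        else:
            ok = False  # is_paired_end missing or not a boolean
        if not ok:
            errors.append(
                f"c160: file_r1/file_r2 inconsistent with is_paired_end "
                f"in {rg.get('submitter_read_group_id')!r}"
            )

    # 'fileName' is an assumed key for the name of each files entry.
    names = [str(f.get("fileName")) for f in files]

    # c180: file names must be unique.
    dups = duplicates(names)
    if dups:
        errors.append(f"c180: duplicate file names: {dups}")

    for name in names:
        # c210: basename only, no directory component.
        if os.path.basename(name) != name:
            errors.append(f"c210: file name contains a path: {name!r}")
        # c260: name and extension must match the permitted pattern.
        if not FILENAME_PATTERN.match(name):
            errors.append(f"c260: file name does not match pattern: {name!r}")

    return errors
```

Calling `check_metadata(payload)` on a parsed metadata payload returns an empty list when these checks pass. Each error string starts with the code of the rule that failed, so a caller can group failures by rule.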