Only add genomes to final RKI FASTA and report.csv that meet QC criteria #31

hoelzer · 2021-01-22T18:06:14Z

Question/Thought

I think that currently, in #28, all final consensus sequences are added to all_genomes.fasta and rki_report.csv?

While this is fine (all information is then in one place), we could also consider writing only those sequences to these summary files that meet the consensus QC. Otherwise, they might be rejected anyway when submitted to RKI.

However, people might also want to work internally with sequences that don't meet the QC criteria, and would miss them if they are not part of the summary files?

And: when the QC thresholds change, this must also be reflected in poreCov, so that it does not reject sequences that would actually pass the later QC.


replikation · 2021-01-22T18:18:02Z

Oh, good point, I forgot about that. Are the RKI criteria correct now via president? Or can I take them straight from the "valid" column?

replikation · 2021-01-22T18:18:54Z

I need to extract the names of the "valid" sequences. I think this can also be done based on channels.

hoelzer · 2021-01-22T18:25:54Z

Should be fine for now, but might be subject to changes over the next days/weeks :)

Maybe you can write to the summary rki folder:

genomes_valid.fasta
genomes_invalid.fasta
rki_valid_report.csv (which then contains only the IDs of the valid genomes?)

That way, it should also be clear to users what they can expect in each file?

replikation · 2021-01-22T20:44:14Z

Good idea. Yeah, I think I'll add that tomorrow or so.

replikation · 2021-01-23T12:53:05Z

rki/
    valid/
        genomes_valid.fasta
        rki_valid_report.csv
    invalid/
        genomes_invalid.fasta
        rki_invalid_report.csv
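
For illustration only, here is a minimal Python sketch of how a split into this layout could be done outside the pipeline. It is not how poreCov does it (the thread suggests Nextflow channels for that). The column names `sample` and `valid`, the "True"/"False" values, and the function names `read_fasta` and `split_by_qc` are all assumptions made for the example, as is the rule that sequences missing from the report count as invalid.

```python
import csv
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_by_qc(fasta_path, report_path, out_dir,
                id_col="sample", valid_col="valid"):
    """Write rki/valid and rki/invalid FASTA and report files under out_dir.

    id_col and valid_col are hypothetical column names; adjust them to the
    actual QC report.
    """
    with open(report_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    fieldnames = list(rows[0].keys()) if rows else [id_col, valid_col]
    is_valid = {r[id_col]: r[valid_col].strip().lower() == "true" for r in rows}

    counts = {}
    for label in ("valid", "invalid"):
        want = label == "valid"
        target = Path(out_dir) / "rki" / label
        target.mkdir(parents=True, exist_ok=True)

        # Report rows for this class only.
        with open(target / f"rki_{label}_report.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(r for r in rows if is_valid[r[id_col]] == want)

        # Sequences for this class; the ID is the first token of the header.
        n = 0
        with open(target / f"genomes_{label}.fasta", "w") as handle:
            for header, seq in read_fasta(fasta_path):
                seq_id = header.split(" ", 1)[0]
                # Assumption: IDs absent from the report are treated as invalid.
                if is_valid.get(seq_id, False) == want:
                    handle.write(f">{header}\n{seq}\n")
                    n += 1
        counts[label] = n
    return counts
```

Calling `split_by_qc("all_genomes.fasta", "rki_report.csv", ".")` would then create the `rki/valid/` and `rki/invalid/` folders shown above and return the number of sequences written to each.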

hoelzer · 2021-01-27T17:05:10Z

@replikation is there still an output multi-FASTA file that contains ALL sequences? We would actually need that as well for downstream processing. However, for other uses the direct split into valid and invalid is good, so what I can also do is:

cat rki/valid/genomes_valid.fasta rki/invalid/genomes_invalid.fasta > all_fasta

for my downstream tasks

hoelzer added the "question" label (Further information is requested) Jan 22, 2021

replikation self-assigned this Jan 22, 2021

replikation added the "bug" label (Something isn't working) Jan 22, 2021

replikation linked a pull request Jan 27, 2021 that will close this issue

Dev replikation #32

Merged

2 tasks

replikation closed this as completed in #32 Jan 27, 2021
