Only add genomes to final RKI FASTA and report.csv that meet QC criteria #31

Closed
hoelzer opened this issue Jan 22, 2021 · 6 comments · Fixed by #32
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@hoelzer
Collaborator

hoelzer commented Jan 22, 2021

Question/Thought

I think that, as of #28, all final consensus sequences are added to all_genomes.fasta and rki_report.csv?

While this is fine (all information is then in one place), we could also consider writing only those sequences that meet the consensus QC to these summary files. Otherwise, they might be rejected anyway when submitted to RKI.

However, people might also want to work internally with sequences that don't meet the QC criteria, and they would miss them if those sequences are not part of the summary files.

And: when the QC thresholds change, this must also be reflected in poreCov, so that we don't reject sequences that would actually pass the later QC.
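
One way to keep changing thresholds manageable is to hold them in a single parameter object rather than hard-coding them where sequences are filtered. The sketch below is only an illustration in Python: the metric names (`identity`, `n_fraction`), the class `QCThresholds`, and the function `passes_qc` are hypothetical and do not describe RKI's actual criteria, president's output, or poreCov's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    """Hypothetical container for QC cut-offs; values must be supplied by the caller."""
    min_identity: float    # minimum identity to the reference, as a fraction (0..1)
    max_n_fraction: float  # maximum fraction of N bases allowed, as a fraction (0..1)


def passes_qc(identity, n_fraction, thresholds):
    """Return True if a consensus sequence meets both (assumed) criteria."""
    return (
        identity >= thresholds.min_identity
        and n_fraction <= thresholds.max_n_fraction
    )
```

With this layout, a threshold change means constructing a different `QCThresholds` object (for example from a pipeline parameter), and the filtering code itself stays untouched.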

@hoelzer hoelzer added the question Further information is requested label Jan 22, 2021
@replikation
Owner

oh good point. forgot about that. are the RKI criteria correct now via president? or can I just use the "valid" column directly?

@replikation replikation self-assigned this Jan 22, 2021
@replikation replikation added the bug Something isn't working label Jan 22, 2021
@replikation
Owner

need to extract the names of the "valid" sequences. i think this can also be done based on channels

@hoelzer
Collaborator Author

hoelzer commented Jan 22, 2021

Should be fine for now, but might be subject to changes over the next days/weeks :)

Maybe you can write the following files to the rki summary folder:

  • genomes_valid.fasta
  • genomes_invalid.fasta
  • rki_valid_report.csv (which then contains only the IDs of the valid genomes?)

That way, it should also be clear to users what to expect in each file.

@replikation
Owner

good idea. yeah, i think i'll add that tomorrow or so

@replikation
Owner

rki/
    valid/
        genomes_valid.fasta
        rki_valid_report.csv
    invalid/
        genomes_invalid.fasta
        rki_invalid_report.csv
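
A minimal sketch of how the FASTA part of this split could look, written in Python rather than with the Nextflow channels mentioned above. It assumes a tab-separated QC table with one row per genome; the column names `ID` and `valid`, the `True`/`False` encoding, and all function names are assumptions for illustration, not president's documented output format or poreCov's actual code. The report CSVs could be split by the same set of IDs.

```python
import csv
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_valid_ids(qc_table, id_col="ID", valid_col="valid"):
    """Collect the IDs marked as valid (column names are assumed)."""
    with open(qc_table, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {
            row[id_col]
            for row in reader
            if row[valid_col].strip().lower() == "true"
        }


def split_genomes(all_fasta, qc_table, outdir):
    """Write <outdir>/valid and <outdir>/invalid FASTA files; return record counts."""
    valid_ids = load_valid_ids(qc_table)
    outdir = Path(outdir)
    counts = {"valid": 0, "invalid": 0}
    handles = {}
    for status in counts:
        (outdir / status).mkdir(parents=True, exist_ok=True)
        handles[status] = open(outdir / status / f"genomes_{status}.fasta", "w")
    try:
        for header, seq in read_fasta(all_fasta):
            # The sequence ID is taken to be the first word of the header.
            seq_id = header.split()[0] if header.strip() else ""
            status = "valid" if seq_id in valid_ids else "invalid"
            handles[status].write(f">{header}\n{seq}\n")
            counts[status] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Any genome whose ID is missing from the QC table ends up in `invalid/`, which is the conservative choice if the valid files are meant for submission.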

@replikation replikation linked a pull request Jan 27, 2021 that will close this issue
2 tasks
@hoelzer
Collaborator Author

hoelzer commented Jan 27, 2021

@replikation is there still an output multi-FASTA file that contains ALL sequences? We would need that as well for downstream processing. For other uses, though, the direct split into valid and invalid is good, so what I can also do is:

cat rki/valid/genomes_valid.fasta rki/invalid/genomes_invalid.fasta > all_fasta 

for my downstream tasks
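
The same concatenation can be sketched in Python, with an added check that no sequence ID appears in both files. The function name `merge_fastas` and the duplicate check are illustrative additions and not part of poreCov.

```python
def merge_fastas(inputs, output):
    """Concatenate FASTA files into one file and return the number of records.

    Raises ValueError if the same sequence ID (first word of a header)
    occurs more than once across the inputs.
    """
    seen = set()
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                line = ""
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        seq_id = fields[0] if fields else ""
                        if seq_id in seen:
                            raise ValueError(f"duplicate sequence ID: {seq_id}")
                        seen.add(seq_id)
                    out.write(line)
                # Make sure the next file starts on a fresh line.
                if line and not line.endswith("\n"):
                    out.write("\n")
    return len(seen)
```

Called as `merge_fastas(["rki/valid/genomes_valid.fasta", "rki/invalid/genomes_invalid.fasta"], "all_fasta")`, it produces the same combined file as the `cat` command, assuming both inputs exist and end with a newline.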

2 participants