Remove replacement of dashes in sample name with underscores #234

Closed
peterk87 opened this issue Aug 4, 2021 · 5 comments
Labels
enhancement Improvement for existing functionality

peterk87 commented Aug 4, 2021

Is your feature request related to a problem? Please describe

I'd like dashes to be preserved as dashes in sample names.

Describe the solution you'd like

Any of the following or similar solutions would be great:

  • Workflow param to skip replacing dashes
  • Sample name invalid characters regex workflow param (e.g. --sample_name_invalid_chars "\W")
  • Sample name "whitelist" regex of valid characters allowed workflow param (e.g. --sample_name_regex "[\w\-]+")
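
To make the second and third options concrete, here is a minimal sketch of how regex-based sample name handling could look. The function names `sanitize_sample_name` and `is_valid_sample_name`, their defaults, and the idea of passing the workflow param values straight into them are assumptions for illustration only; none of this is existing pipeline code.

```python
import re


def sanitize_sample_name(sample, invalid_chars=r"[^\w\-]", replacement="_"):
    """Replace every character matching `invalid_chars` with `replacement`.

    Hypothetical helper: `invalid_chars` stands in for the value of a
    param such as --sample_name_invalid_chars. The default keeps dashes.
    """
    return re.sub(invalid_chars, replacement, sample)


def is_valid_sample_name(sample, valid_regex=r"[\w\-]+"):
    """Return True if the whole sample name matches `valid_regex`.

    Hypothetical helper: `valid_regex` stands in for the value of a
    param such as --sample_name_regex.
    """
    return re.fullmatch(valid_regex, sample) is not None


if __name__ == "__main__":
    # Default pattern: dashes survive, spaces are replaced.
    print(sanitize_sample_name("sample-01 A"))
    # Passing "\W" treats the dash as invalid as well.
    print(sanitize_sample_name("sample-01 A", invalid_chars=r"\W"))
    print(is_valid_sample_name("sample-01"))
```

With the defaults above, dashes are kept and only other non-word characters are replaced; passing `\W` as the invalid pattern also replaces dashes, which is equivalent to the current behaviour for names that contain them.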

Describe alternatives you've considered

Commenting out the following code:

    if sample.find("-") != -1:
        print(
            f"WARNING: Dashes have been replaced by underscores for sample: {sample}"
        )
        sample = sample.replace("-", "_")

peterk87 added the enhancement label Aug 4, 2021

drpatelh (Member) commented Aug 5, 2021

Hi @peterk87. I had to add this sanity renaming because QUAST internally converts dashes to underscores, which breaks the collection and reporting of the summary metrics generated by MultiQC, including the consensus-type metrics for the variant calling. This is why it made sense to do the renaming at the outset rather than end up with weird discrepancies in sample names. It took me a while to track this down and provide a solution because it wasn't an obvious one.

Maybe we should push this issue upstream to QUAST and we can remove this conversion altogether? If you try to run one of the QUAST processes manually using dashes in the names you will see what I mean.

I am not really working at the mo and won't be until September so will try and catch up when I get a moment. PRs welcome too :)

peterk87 (Author) commented Aug 5, 2021

Thanks for clarifying @drpatelh!

I think pushing the issue upstream to QUAST is a great idea. I'll look into what QUAST is doing and see if the conversion can be made optional or modified.

peterk87 (Author) commented Aug 5, 2021

Issue created for QUAST: ablab/quast#179

heuermh (Contributor) commented Jul 7, 2022

Note that this issue was fixed upstream in ablab/quast@3d32101 and released in version 5.2.0, which is available from bioconda/biocontainers: https://quay.io/repository/biocontainers/quast?tab=tags

drpatelh added this to the 2.5 milestone Jul 8, 2022
drpatelh changed the title from "Option to disable replacement of sample name dashes with underscores or change sample name validation" to "Remove replacement of dashes in sample name with underscores" Jul 11, 2022
drpatelh added a commit to drpatelh/nf-core-viralrecon that referenced this issue Jul 11, 2022

drpatelh (Member) commented

I have confirmed that this works as expected 🎉
