Dealing with lower case nucleotides: document and potentially auto-convert #1341

Open
corneliusroemer opened this issue Nov 17, 2023 · 0 comments
Labels
proposal Proposals that warrant further discussion

corneliusroemer commented Nov 17, 2023

Current Behavior

Our auspice.json schema expects nucleotides to be upper case and fails if mutations are lower case:

    "description": "nucleotide mutations",
    "type": "array",
    "items": {
        "type": "string",
        "pattern": "^[ATCGNYRWSKMDVHB-][0-9]+[ATCGNYRWSKMDVHB-]$"
    }

We don't seem to validate this consistently before export, though, and that can cause subtle bugs because our code often implicitly assumes nucleotides are upper case.
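To illustrate the failure mode, here is a minimal sketch that applies the pattern from the schema excerpt to a few mutation strings. The function name `is_valid_nuc_mutation` and the example mutations are made up for illustration; this is not how augur itself runs schema validation.

```python
import re

# Pattern copied verbatim from the schema excerpt above.
NUC_MUTATION = re.compile(r"^[ATCGNYRWSKMDVHB-][0-9]+[ATCGNYRWSKMDVHB-]$")


def is_valid_nuc_mutation(mutation: str) -> bool:
    """Hypothetical helper: True if the string matches the schema pattern."""
    return NUC_MUTATION.search(mutation) is not None


# Illustrative inputs only, not taken from a real dataset.
examples = ["A123T", "a123t", "A123t", "-45G"]
results = {m: is_valid_nuc_mutation(m) for m in examples}

for mutation, ok in results.items():
    print(f"{mutation}: {'valid' if ok else 'rejected'}")
```

Any lower-case character on either side of the position is enough for the pattern to reject the mutation, while the upper-cased form of the same string passes.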

It would be good to either validate this in the relevant augur commands or auto-convert to upper case. I think auto-conversion is fine as long as it's documented.
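As a rough sketch of what documented auto-conversion could look like, the helper below upper-cases mutation strings and emits a single warning when anything was changed. The name `normalize_nuc_mutations` and the warning text are assumptions for illustration, not existing augur code.

```python
import warnings


def normalize_nuc_mutations(mutations):
    """Hypothetical helper: upper-case mutation strings and warn if any changed."""
    normalized = [m.upper() for m in mutations]
    # Collect the inputs that were actually altered so the warning is informative.
    changed = [m for m, n in zip(mutations, normalized) if m != n]
    if changed:
        warnings.warn(
            f"Converted {len(changed)} lower-case nucleotide mutation(s) "
            f"to upper case, e.g. {changed[0]!r}"
        )
    return normalized


print(normalize_nuc_mutations(["a123t", "G5C"]))
```

Warning rather than converting silently keeps the behavior visible to users, which is one way to meet the "as long as it's documented" condition; a strict mode that raises instead would be the validation alternative.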

I don't know exactly which tools emit lower case, but I have occasionally encountered it.

This is related to #1323, which does auto-conversion, but only in one place. Other places might benefit from auto-conversion too.

Nextclade silently converts everything to upper case.

corneliusroemer added the bug and proposal labels and removed the bug label on Nov 17, 2023.