Allow deletions and insertions in virus properties input files #1126

Open
ammaraziz opened this issue Mar 16, 2023 · 2 comments
Labels
good first issue (Good for newcomers) · help wanted (Extra attention is needed) · needs triage (Mark for review and label assignment) · t:feat (Type: request of a new feature, functionality, enhancement)

Comments

@ammaraziz

Virus properties currently support labeling of substitutions only. Some mutations of interest, such as those linked to antiviral resistance or those in the N gene, are deletions.

My use case for virus properties might be outside the scope of what was envisioned. Essentially, I have a set of genetic mutations (SNPs, indels) that are linked to some phenotype I am interested in (antiviral resistance, N gene mutations affecting rapid antigen tests). I would like to use Nextclade to identify these mutations.

Thanks!

@ammaraziz added the good first issue, help wanted, needs triage, and t:feat labels on Mar 16, 2023
@corneliusroemer
Member

Thanks for the suggestion. That's a reasonable extension. One limitation we have at the moment is that we don't have a "private deletions" feature yet.

Labeled mutations are a subset of private mutations, so to stay consistent with that logic we'd first need to support private deletions. That is not at all unreasonable, and it is actually something I've been thinking about as well.

One challenge with deletions is that if you keep them as single bases, there can be overwhelmingly many of them. If you store them as ranges instead, an artefactual difference of a single base in an indel can make a large difference to the result.

I'm leaning towards using ranges nonetheless.

To be symmetric, it would make sense to add private insertions.
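
The trade-off between single-base and range representations described above can be illustrated with a small sketch. This is not Nextclade code: the function name `collapse_to_ranges` and the coordinates are hypothetical, chosen only to show how a one-base difference in a deletion call behaves under each representation.

```python
def collapse_to_ranges(positions):
    """Collapse deleted positions into sorted, inclusive (start, end) ranges."""
    ranges = []
    for pos in sorted(set(positions)):
        if ranges and pos == ranges[-1][1] + 1:
            # Position is adjacent to the current range: extend it.
            ranges[-1] = (ranges[-1][0], pos)
        else:
            # Gap found: start a new range.
            ranges.append((pos, pos))
    return ranges


# Hypothetical example: the same deletion, called one base shorter in one sequence.
full_call = list(range(100, 109))   # positions 100..108
short_call = list(range(100, 108))  # positions 100..107

# Per-base view: many entries, but the two calls still agree on 8 of 9 bases.
shared_bases = set(full_call) & set(short_call)
print(len(shared_bases))  # 8

# Range view: one compact entry each, but an exact comparison no longer matches.
ranges_match = collapse_to_ranges(full_call) == collapse_to_ranges(short_call)
print(ranges_match)  # False
```

A range-based design would therefore probably need some tolerance when comparing ranges (for example, overlap-based matching), which is a design choice this sketch does not attempt to settle.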

@corneliusroemer
Member

Having read your ubio post @ammaraziz, I think you are suggesting two things:

  1. Extend the concept of labeled private mutations (substitutions) to labeled private indels. For this, we first need to introduce the concept/feature of private indels, and then split the private indels into "reversions", "labeled", and "neither of the two", just as we currently do with substitutions.

  2. Extend the concept of labeled mutations beyond private mutations (private mutations are those that differ from the nearest-neighbour sequence on the reference tree). This makes sense when you are looking at broader trends and are less concerned with an individual sequence and whether it is of good quality (private mutations originated as a QC metric, and the labeled/reversion feature is an extension that still comes from the QC perspective).
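
A minimal sketch of the three-way split from point 1, applied to deletions. Everything here is an assumption for illustration: the `Deletion` type, the function `split_private_deletions`, the label table, and in particular the definition of a deletion "reversion" (a deletion present on the nearest tree node but absent from the query). The feature does not exist yet, so none of this reflects actual Nextclade behaviour.

```python
from typing import NamedTuple


class Deletion(NamedTuple):
    start: int  # inclusive, hypothetical coordinates
    end: int    # inclusive


def split_private_deletions(query, parent, labels):
    """Split private deletions into reversions, labeled, and unlabeled.

    query  -- deletions found in the query sequence
    parent -- deletions on the nearest-neighbour node of the reference tree
    labels -- hypothetical mapping from Deletion to a list of label strings
    """
    query, parent = set(query), set(parent)
    # Assumed definition: the parent has the deletion, the query does not.
    reversions = sorted(parent - query)
    # Deletions the query has gained relative to its nearest neighbour.
    gained = sorted(query - parent)
    labeled = {d: labels[d] for d in gained if d in labels}
    unlabeled = [d for d in gained if d not in labels]
    return {"reversions": reversions, "labeled": labeled, "unlabeled": unlabeled}


parent = [Deletion(100, 108)]
query = [Deletion(200, 202), Deletion(300, 305)]
labels = {Deletion(200, 202): ["hypothetical-resistance-marker"]}

result = split_private_deletions(query, parent, labels)
print(result["reversions"])  # [Deletion(start=100, end=108)]
print(result["labeled"])     # {Deletion(start=200, end=202): ['hypothetical-resistance-marker']}
print(result["unlabeled"])   # [Deletion(start=300, end=305)]
```

Exact range matching is used here for brevity; as noted earlier in the thread, a single-base difference in a range would defeat it.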

We already calculate some custom global metrics (that is, metrics not restricted to private mutations), such as:

Reporting the presence of certain mutations (irrespective of whether they are private or not) for antiviral resistance, RAT escape, etc. would make sense.
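
Such a report could, in principle, be a simple lookup against all mutations of a sequence relative to the reference, with no reference to the tree at all. The sketch below assumes a hypothetical mutation notation (`sub:` and `del:` prefixes) and a hypothetical table of mutations of interest; neither is an existing Nextclade format.

```python
def report_mutations_of_interest(query_mutations, mutations_of_interest):
    """Return, per phenotype, the mutations of interest present in the query.

    query_mutations       -- all mutations of the query relative to the reference,
                             private or not (hypothetical string notation)
    mutations_of_interest -- hypothetical mapping: phenotype -> list of mutations
    """
    present = set(query_mutations)
    report = {}
    for phenotype, mutations in mutations_of_interest.items():
        hits = sorted(present & set(mutations))
        if hits:
            report[phenotype] = hits
    return report


# Hypothetical inputs, for illustration only.
mutations_of_interest = {
    "antiviral resistance": ["sub:A500T", "del:200-202"],
    "RAT escape": ["del:300-305"],
}
query_mutations = ["sub:C10T", "del:200-202"]

report = report_mutations_of_interest(query_mutations, mutations_of_interest)
print(report)  # {'antiviral resistance': ['del:200-202']}
```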


2 participants