--fill-gaps eventually adds a furin cleavage site to everyone #6

Closed
babarlelephant opened this issue Feb 21, 2021 · 2 comments
Labels
enhancement New feature or request

babarlelephant commented Feb 21, 2021

Hi, augur align is being called with --fill-gaps, which just does

 _seq = _seq.replace('-', 'N')

so in the Nextstrain tree every sequence ends up with a furin cleavage site (FCS).
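A minimal sketch of what that replacement does to an alignment, for illustration only. The sequences and the helper name `fill_gaps` are made up; only the `replace('-', 'N')` call comes from the line quoted above.

```python
def fill_gaps(seq: str) -> str:
    """Mimic the quoted replacement: every gap character becomes an ambiguous base."""
    return seq.replace('-', 'N')


# Toy aligned sequences (hypothetical, not real H5 data).
# The second one carries a 12-nt deletion where the first has an insertion.
flank_5 = "AGAGAG"
flank_3 = "GGATTA"
aligned = {
    "with_insertion": flank_5 + "AGAAGAAGAAAA" + flank_3,
    "with_deletion": flank_5 + "-" * 12 + flank_3,
}

filled = {name: fill_gaps(seq) for name, seq in aligned.items()}

# After filling, the deletion is no longer recorded as a gap. It looks like
# 12 unknown bases, which a later step that infers ambiguous nucleotides
# is free to resolve to the inserted sequence.
print(filled["with_deletion"])
```

Once the gaps are `N`s, nothing downstream can tell a true deletion from missing data.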

I don't have access to filtered_H5Nx_HA.fasta and metadata-with-clade_H5Nx_HA.tsv, so I can't propose a change to the Snakemake workflow (with a Python script) that would preserve the FCS indels, but if I had them I would be happy to give it a try.

The only experiment I did was to download a few thousand H5Nx sequences from GenBank, label as LPAI those with a deletion at the FCS, remove the FCS region from the alignment, and run MEGA. The LPAI sequences came out quite clustered together.
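The labelling and masking steps of that experiment could look roughly like the sketch below. It is not the script that was actually used: the function name `label_and_strip`, the `"unlabeled"` tag, the region coordinates, and the toy sequences are all assumptions for illustration.

```python
def label_and_strip(aligned, start, end):
    """Label sequences with a gap in aligned[start:end] as LPAI, then cut that region out.

    `aligned` maps sequence names to aligned nucleotide strings of equal length.
    `start` and `end` are hypothetical alignment coordinates of the FCS region.
    """
    labels = {}
    stripped = {}
    for name, seq in aligned.items():
        region = seq[start:end]
        # A deletion at the cleavage site shows up as gap characters.
        labels[name] = "LPAI" if "-" in region else "unlabeled"
        # Drop the region so tree building cannot use it.
        stripped[name] = seq[:start] + seq[end:]
    return labels, stripped


# Toy alignment (hypothetical): columns 3..5 stand in for the FCS region.
toy = {
    "seq_a": "AAACCCGGG",
    "seq_b": "AAA---GGG",
}
labels, stripped = label_and_strip(toy, 3, 6)
```

The stripped alignment would then go to the tree-building tool, with the labels used only for colouring the result.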

@babarlelephant babarlelephant added the enhancement New feature or request label Feb 21, 2021
@jameshadfield
Member

I think this is due to augur translate inferring ambiguous nucleotides by default, and it could be remedied by adding --keep-ambiguous to the ancestral rule.

@lmoncla
Collaborator

lmoncla commented Nov 23, 2021

Thanks for flagging this! This is definitely an error caused by the combination of --fill-gaps in augur align and not having --keep-ambiguous in augur translate. This has been fixed with this commit, which does a few things:

  1. Removes --fill-gaps in augur align
  2. Adds --keep-ambiguous in augur translate
  3. Adds a new rule, rule cleavage-site, which reads in the HA alignment, translates it, and infers whether the 4 amino acids immediately preceding HA2 encode a furin cleavage motif. Here, I've defined a furin cleavage motif as the sequence R-X-K/R-R, where X is any amino acid. This results in 2 new colorby options: furin cleavage motif, which can be present or absent, and cleavage site sequence, which displays the sequence of the 4 amino acids immediately preceding HA2.
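The motif check described in item 3 can be sketched as follows. This is not the code from the commit: the function names, the use of a regular expression, and locating the start of HA2 by searching for the fusion-peptide prefix `GLFGAI` are assumptions made for illustration, and the two example peptides are toy inputs.

```python
import re

# R-X-K/R-R, where X is any of the 20 standard amino acids.
FURIN_MOTIF = re.compile(r"R[ACDEFGHIKLMNPQRSTVWY][KR]R")

# Assumption: HA2 is located by the start of its fusion peptide.
HA2_PREFIX = "GLFGAI"


def cleavage_site(ha_protein: str) -> str:
    """Return the 4 amino acids immediately preceding HA2."""
    ha2_start = ha_protein.find(HA2_PREFIX)
    if ha2_start < 4:
        raise ValueError("HA2 start not found, or fewer than 4 residues precede it")
    return ha_protein[ha2_start - 4:ha2_start]


def has_furin_motif(site: str) -> bool:
    """True if the 4-residue site matches R-X-K/R-R exactly."""
    return FURIN_MOTIF.fullmatch(site) is not None


# Toy inputs: a polybasic site and a site without the motif.
polybasic = "PQRERRRKKR" + "GLFGAIAGFIE"
monobasic = "PQRETR" + "GLFGAIAGFIE"

for name, protein in [("polybasic", polybasic), ("monobasic", monobasic)]:
    site = cleavage_site(protein)
    status = "present" if has_furin_motif(site) else "absent"
    print(name, site, status)
```

The two returned values map onto the two colorby options: `has_furin_motif` gives present or absent, and `cleavage_site` gives the 4-residue sequence.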

As expected, we now display that some high-path H5N1 and H5Nx lineages have furin cleavage sites, but that the vast majority of H9N2 and H7N9 sequences do not.

@lmoncla lmoncla closed this as completed Nov 23, 2021