chronic_infection_python

Overview

This application was developed by the Computational Analysis, Modelling and Evolutionary Outcomes (CAMEO) pillar of Canada's Coronavirus Variants Rapid Response Network (CoVaRR-Net). Data analysis, code and maintenance of the application are conducted by Erin E. Gill, Fiona S.L. Brinkman, and Sarah Otto.

Given a user-provided set of SARS-CoV-2 nucleotide mutations, this application compares the probability of generating this set from the following four distributions:

Mutations observed during the first nine months of the pandemic, before Variants of Concern (VOCs) arose (global pre-VOC distribution)
Mutations observed during the Omicron era (global Omicron distribution)
Mutations observed in chronic infections (chronic distribution)
Mutations observed in zoonotic spillovers from humans to white-tailed deer (deer distribution)

In addition, the application will inform the user if the mutation pattern is:

Consistent with molnupiravir use (via examination of the transition:transversion ratio)
Consistent with a mutator lineage (contains a mutation in nsp14 that is known to increase the mutation rate of the lineage)

See the Application Notes tab for more information.

Background

SARS-CoV-2 evolution exhibits a strong clock-like signature with mutational changes accumulating over time, but this pattern is punctuated by “saltational changes”, where lineages appear with a higher number of mutations than expected from their divergence time from other lineages (Neher (2022)). Such unusual lineages are thought to reflect long passage times within immunocompromised individuals, sharing many of the same signatures seen in chronic infections (Harari et al. (2022)).

When unusual lineages arise, however, it is challenging to know the evolutionary history leading to the observed genomic changes. Other processes, including passage through animals (Bashor et al. (2021), Naderi et al. (2023)), mutator lineages with error-prone polymerases (Takada et al. (2023)), and exposure to mutagens such as molnupiravir (Gruber et al. (2024)), can also leave unusual genomic signatures.

Given a user-provided set of nucleotide mutations defining an unusual lineage of SARS-CoV-2, this application compares the probability of generating this set from the following four distributions:

The list of mutations observed during the first nine months of the pandemic, prior to the spread of VOCs (Harari et al. (2022)).
The list of mutations observed in Omicron-era sequences, compiled by Harari et al. (2022) from sequences with submission dates up to 25 May 2022.
The list of mutations compiled from 27 chronic infections of immunocompromised individuals (Harari et al. (2022)).
The list of mutations inferred from 109 separate zoonotic spillovers from humans to white-tailed deer (Feng et al. (2023)).

Harari et al. demonstrate that the lineage-defining mutation patterns in SARS-CoV-2 genomes sequenced from chronic infections differ from those in SARS-CoV-2 genomes sequenced around the globe at the start of the pandemic (before the rise of Variants of Concern (VOCs)). They also analyzed lineage-defining mutation patterns in VOCs, and concluded that “mutations in chronic infections are predictive of lineage-defining mutations of VOCs”.

Feng et al. sequenced hundreds of SARS-CoV-2 samples obtained from white-tailed deer in the United States. They observed Alpha, Gamma, Delta and Omicron VOCs and determined that the deer infections arose from a minimum of 109 separate transmission events from humans. In addition, the deer were then able to transmit the virus to each other. Deer infections resulted in three documented human zoonoses. The SARS-CoV-2 virus displayed specific adaptation patterns in deer, which differ from adaptations seen in humans.

In addition, the app informs the user whether the data contain signals consistent with:

Past molnupiravir use: The transition-to-transversion ratio of mutations in the focal lineage is calculated and compared to a background ratio of ~2:1 for SARS-CoV-2 and to a ratio of ~14:1 under molnupiravir treatment, as indicated by case-control cohort studies (Gruber et al. (2024)). A high ratio may thus suggest past exposure to molnupiravir or a similar factor inducing transitions.
Mutator lineages: Mutator alleles may contribute to the unusual features of a lineage by increasing the rate and type of mutation. Known mutators have been observed in nsp14 within the ExoN proofreading domain of SARS-CoV-2. P203L in nsp14 was associated with an elevated substitution rate in phylogenetic analyses and was confirmed to double the mutation rate when passaged through hamsters (Takada et al. (2023)). Mutations F60S and C39F in nsp14 were associated with a 22-fold and 6-fold higher substitution rate in phylogenetic analyses (Mack et al. (2023)). We considered mutations at sites 39, 60, and 203 in nsp14 to be known mutators and mutations in sites 90, 92, 191, 268, and 273, which fall within the ExoN proofreading domain of nsp14, to be potential mutators.
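
The transition-to-transversion ratio described above can be illustrated with a short Python sketch. This is not the application's own code: the function name `ts_tv_ratio` and the rule of skipping indels and bare positions are assumptions made for illustration. The two reference ratios are the approximate values quoted in the text.

```python
# Approximate reference ratios quoted in the text (Gruber et al. (2024)).
BACKGROUND_RATIO = 2.0      # ~2:1 for SARS-CoV-2 in general
MOLNUPIRAVIR_RATIO = 14.0   # ~14:1 under molnupiravir treatment

BASES = set("ACGT")
# Transitions swap purine with purine (A<->G) or pyrimidine with pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(mutations):
    """Count transitions and transversions among substitutions such as 'C897T'.

    Hypothetical helper: entries without a reference and alternate base
    (bare positions, indels such as 'ins21608') are skipped.
    """
    ts = tv = 0
    for mut in mutations:
        mut = mut.strip().upper()
        if len(mut) < 3:
            continue
        ref, alt = mut[0], mut[-1]
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    ratio = ts / tv if tv else float("inf")
    return ts, tv, ratio


ts, tv, ratio = ts_tv_ratio(["C897T", "G3431A", "A7842G", "C8293A", "ins21608", "897"])
print(f"{ts} transitions, {tv} transversions, ratio {ratio:.1f}:1")
```

A ratio well above the background value and closer to the molnupiravir value would be the kind of signal the text describes. How the application itself decides what counts as "high" is not specified here.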

Table 1: Mutator Sites. Known and Potential mutator sites (denoted by “Confirmed” and “Potential” in the “Site Type” column, respectively) are listed in the table below. Known sites have been confirmed experimentally, and the specific amino acid / nucleotide changes leading to mutator phenotypes are shown. Potential sites lie within the ExoN proofreading domain of nsp14 (as shown in Mack et al. 2023). The wild type amino acids, their positions within the mature nsp14 protein, encoding nucleotides and genomic locations are shown for these sites, but changes that would lead to mutator phenotypes have not been confirmed.

Gene	Amino Acid Change	Nucleotide Change	Site Type	Reference
nsp14	C39F	G18,155T	Confirmed	(Mack et al. 2023)
nsp14	F60S	T18,218C	Confirmed	(Mack et al. 2023)
nsp14	P203L	C18,647T	Confirmed	(Takada et al. 2023)
nsp14	D90	18,307-18,309 (GAT)	Potential	(Mack et al. 2023)
nsp14	E92	18,313-18,315 (GAG)	Potential	(Mack et al. 2023)
nsp14	E191	18,610-18,612 (GAG)	Potential	(Mack et al. 2023)
nsp14	H268	18,841-18,843 (CAT)	Potential	(Mack et al. 2023)
nsp14	D273	18,856-18,858 (GAT)	Potential	(Mack et al. 2023)
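
As a rough illustration of how Table 1 could be used, the Python sketch below flags user mutations that hit the listed sites. The coordinates are copied from the table. The function name `flag_mutator_sites` and the matching rules are assumptions: a confirmed site is flagged only when the exact nucleotide change matches, and a potential site is flagged for any mutation whose position falls inside the listed codon.

```python
import re

# Confirmed mutator changes from Table 1 (commas removed from coordinates).
CONFIRMED_MUTATORS = {
    "G18155T": "C39F",
    "T18218C": "F60S",
    "C18647T": "P203L",
}

# Codons of potential mutator sites from Table 1, as genomic position ranges.
POTENTIAL_CODONS = {
    "D90": range(18307, 18310),
    "E92": range(18313, 18316),
    "E191": range(18610, 18613),
    "H268": range(18841, 18844),
    "D273": range(18856, 18859),
}


def flag_mutator_sites(mutations):
    """Return (mutation, nsp14 site, site type) for each mutation in Table 1."""
    hits = []
    for mut in mutations:
        key = mut.strip().upper()
        if key in CONFIRMED_MUTATORS:
            hits.append((mut, CONFIRMED_MUTATORS[key], "Confirmed"))
            continue
        match = re.search(r"\d+", key)
        if match is None:
            continue
        position = int(match.group())
        for site, codon in POTENTIAL_CODONS.items():
            if position in codon:
                hits.append((mut, site, "Potential"))
    return hits


print(flag_mutator_sites(["C18647T", "G18308A", "C897A"]))
```

Under these rules a bare position such as 18647 is not reported as confirmed, because the nucleotide change is needed to tell whether it produces the listed amino acid change.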

Application Use

This application accepts a list of comma separated nucleotide positions in a SARS-CoV-2 genome where lineage-defining mutations occur. Lineage-defining mutations are the subset of mutations in a lineage that have occurred since divergence from the larger SARS-CoV-2 tree. A list of lineage-defining mutations (the “mutation set”) for pangolin-designated SARS-CoV-2 lineages can be found here.

The application determines the likelihood of observing the mutation set as a random draw from each distribution (chronic infection, deer-specific mutations, global (pre-VOC) and global (Omicron era)). The log likelihood of observing the mutation set from each distribution is displayed (in natural log units).

Because the mutational data sets are sparse, the method bins sites across the genome when calculating likelihoods. The user can choose the binning scheme: genes, genes with the spike protein split into regions of interest, 500-nucleotide windows, or 1000-nucleotide windows. For a given bin choice, the log-likelihood of drawing the user-defined mutation set from each distribution is calculated from the multinomial distribution as:

sum(log(((distribution bin counts + 1) / sum(distribution bin counts + 1))^user bin counts))

Adding one to each bin count ensures that no bin is empty, so every bin has a non-zero probability and the logarithm is always defined.
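
The calculation above can be sketched in Python for the fixed-width window option. This is a minimal illustration, not the application's code. The names `bin_index` and `log_likelihood`, the window boundaries (positions 1-500 in the first bin, 501-1000 in the second, and so on) and the default genome length of 29,903 (the Wuhan-Hu-1 reference) are assumptions.

```python
import math
from collections import Counter


def bin_index(position, window=500):
    """Map a 1-based genomic position to a 0-based window index."""
    return (position - 1) // window


def log_likelihood(user_positions, distribution_positions,
                   genome_length=29903, window=500):
    """Log-likelihood (natural log) of the user's mutations under one distribution.

    Implements sum(user_count * log((dist_count + 1) / sum(dist_count + 1)))
    over bins, which equals the formula above because log(p^n) = n * log(p).
    """
    n_bins = bin_index(genome_length, window) + 1
    dist_counts = Counter(bin_index(p, window) for p in distribution_positions)
    user_counts = Counter(bin_index(p, window) for p in user_positions)
    # Adding 1 to each of the n_bins bins adds n_bins to the denominator.
    total = sum(dist_counts.values()) + n_bins
    return sum(n * math.log((dist_counts[b] + 1) / total)
               for b, n in user_counts.items())


# Toy example: a 1000-nucleotide "genome" with two 500-nucleotide bins.
toy = log_likelihood([100, 700], [10, 20, 600], genome_length=1000, window=500)
print(round(toy, 4))
```

The multinomial coefficient is left out, as in the formula above. It depends only on the user's bin counts, so it is the same for every distribution and does not affect the comparison between them.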

Notes on Input

Your list can be formatted with or without nucleotide abbreviations. e.g. C897A, G3431T, A7842G, C8293T,... OR 897, 3431, 7842, 8293,...
These coordinates MUST be genomic coordinates, not gene coordinates like S:G107Y.
Indels should be reported by including the first position only, e.g. ins21608, NOT ins21608TCATGCCGCTGT.
If you have an unaligned SARS-CoV-2 genome sequence and would like to use this tool, you must first place it into a phylogeny so that you can detect lineage-defining mutations. To get started, you may wish to access the tools associated with the UCSC SARS-CoV-2 Genome Browser.
If you would like to convert gene coordinates to nucleotide coordinates, try using Theo Sanderson’s tool.
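
To make the accepted input formats concrete, here is a hedged Python sketch of a parser for the comma-separated list. It is not the application's parser. The function name `parse_positions`, the regular expression, and the upper bound of 29,903 (the Wuhan-Hu-1 reference length) are assumptions. Entries with a gene prefix such as S:G107Y are rejected because they are not genomic coordinates.

```python
import re

GENOME_LENGTH = 29903  # assumed reference length (Wuhan-Hu-1)

# Optional letters, then the position, then optional letters:
# matches 'C897A', '897' and 'ins21608', but not 'S:G107Y'.
_ENTRY = re.compile(r"[A-Za-z]*(\d+)[A-Za-z]*")


def parse_positions(text):
    """Extract genomic positions from a comma-separated mutation list."""
    positions = []
    for raw in text.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate trailing commas
        match = _ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"Not a genomic coordinate: {entry!r}")
        position = int(match.group(1))
        if not 1 <= position <= GENOME_LENGTH:
            raise ValueError(f"Position out of range: {entry!r}")
        positions.append(position)
    return positions


print(parse_positions("C897A, G3431T, 7842, ins21608"))
```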

Feedback

We're pleased to accept any feedback you have. You can submit an issue in the GitHub repository here. You can also email questions, comments or suggestions to erin.gill81(at)gmail.com. You can also leave comments in the Discussions tab.
