Likely artefactual SNPs T76A, T78G in ~4k sequences from CNR Virus des Infections Respiratoires, France from 2022-10 to 2023-12 #1

Open
corneliusroemer opened this issue Jan 21, 2024 · 0 comments


Collection date range: 2022-10 to 2023-12
Submission date range: 2022-11-01 to 2023-12-22
Submitter: Hadrien Regue
Submitting institution: CNR Virus des Infections Respiratoires, France SUD
Country: France
Total number affected: ~4100
Status: Submitter has been contacted via GISAID message (2023-12-02)
GISAID query: 76A,78G

Small subset of GISAID accessions
EPI_ISL_15810455
EPI_ISL_15810437
EPI_ISL_15810436
EPI_ISL_15810434
EPI_ISL_15810433
EPI_ISL_15810432
EPI_ISL_15810431
EPI_ISL_15810429
EPI_ISL_15810402
EPI_ISL_15810400
EPI_ISL_15810397
EPI_ISL_15810396
EPI_ISL_15810393
EPI_ISL_15810392
EPI_ISL_15810391
EPI_ISL_15810387
EPI_ISL_15810385
EPI_ISL_15810384
EPI_ISL_15810382
EPI_ISL_15810381
EPI_ISL_15810371
EPI_ISL_15810368
EPI_ISL_15810355
EPI_ISL_15810354
EPI_ISL_15810353
EPI_ISL_15810352
EPI_ISL_15810351
EPI_ISL_15810350
EPI_ISL_15810349
EPI_ISL_15810346
EPI_ISL_15810345
EPI_ISL_15810344
EPI_ISL_15810343
EPI_ISL_15810342
EPI_ISL_15810334
EPI_ISL_15810331
EPI_ISL_15810326
EPI_ISL_18676891
EPI_ISL_18676901
EPI_ISL_18676904
EPI_ISL_18676907
EPI_ISL_18676908
EPI_ISL_18676912
EPI_ISL_18676930
EPI_ISL_18676934
EPI_ISL_18676942
EPI_ISL_18676947
EPI_ISL_18676952
EPI_ISL_18676972
EPI_ISL_18676984
EPI_ISL_18677004

During routine sequence review, I noticed that many sequences across many lineages carry the double nucleotide mutation T76A, T78G.

This double mutation is likely a sequencing artefact for the following reasons:

  • It appears almost solely in sequences from one submitting lab. Were it real, it would appear in sequences from a variety of labs.
  • It appears and disappears fairly suddenly, which is typical of artefacts: they don't grow and decline like real lineages. Instead, they can appear suddenly with a new primer set or assembly software and disappear when those are changed.
  • It appears across the entire phylogenetic tree, i.e. it is very homoplasic. Were the double SNP real, it would be concentrated in one or a few lineages.
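The three signals above can be checked mechanically on a metadata table. Below is a minimal Python sketch, not the method actually used for this issue: the `Record` schema, the `artefact_signals` helper, and the toy rows are all hypothetical and invented for illustration. It only summarises lab concentration, lineage spread, and the first and last collection dates among sequences carrying both substitutions; deciding what counts as suspicious is left to the reader.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One sequence's metadata (hypothetical schema, not GISAID's export format)."""
    lab: str
    lineage: str
    collected: date
    substitutions: frozenset  # nucleotide substitutions, e.g. {"T76A", "T78G"}


def artefact_signals(records, pair=("T76A", "T78G")):
    """Summarise lab, lineage, and date spread of sequences carrying all of `pair`."""
    carriers = [r for r in records if set(pair) <= r.substitutions]
    if not carriers:
        return None
    top_lab, top_count = Counter(r.lab for r in carriers).most_common(1)[0]
    return {
        "n_carriers": len(carriers),
        "top_lab": top_lab,
        # Share of carriers from the single most common lab (1.0 = one lab only).
        "top_lab_share": top_count / len(carriers),
        # Number of distinct lineages the pair shows up in (high = homoplasic).
        "n_lineages": len({r.lineage for r in carriers}),
        "first_seen": min(r.collected for r in carriers),
        "last_seen": max(r.collected for r in carriers),
    }


# Toy rows, invented for illustration only.
toy = [
    Record("Lab A", "BQ.1.1", date(2022, 10, 5), frozenset({"T76A", "T78G"})),
    Record("Lab A", "XBB.1.5", date(2023, 3, 1), frozenset({"T76A", "T78G"})),
    Record("Lab A", "EG.5.1", date(2023, 12, 1), frozenset({"T76A", "T78G", "C241T"})),
    Record("Lab B", "XBB.1.5", date(2023, 3, 2), frozenset({"C241T"})),
]
signals = artefact_signals(toy)
```

A `top_lab_share` close to 1 combined with a large `n_lineages` matches the pattern described in the list: one lab, many unrelated lineages.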

The double mutation T76A, T78G appears almost solely in France, with a very sudden onset and disappearance:
image

It is very homoplasic, occurring in many clades:
image

See cov-Spectrum

I messaged the submitters through GISAID's "contact submitter" field.

Full set of accessions: epi_isls.txt
