jbloom/SARS-CoV-2_Shen_et_al

Analysis of SARS-CoV-2 deep sequencing from Shen et al (2020)

Background

The reported collection dates for the eight samples are inconsistent. They were originally described as collected from December 18 to 29, 2019, but at some post-peer-review stage after the early access version was posted, this was changed to January 2020. Specifically:

Final published version says January 2020

The final version of Shen et al (2020) was published in Clinical Infectious Diseases on May-5-2020. That final version (archived on the Wayback Machine here) states: "Eight COVID-19 pneumonia samples were collected from hospitals in Wuhan on January 2020".

PubMed Central version says December 18 to 29, 2019

However, the PubMed Central version of Shen et al (2020) states: "Eight COVID-19 pneumonia samples were collected from hospitals in Wuhan from December 18 to 29, 2019". This PubMed Central version appears to have been built from the original online early access version that Clinical Infectious Diseases posted on March-9-2020. I archived the PubMed Central version on the Wayback Machine here, and also downloaded a copy of the PDF here.

Early access version from journal says December 18 to 29, 2019

The overall article history provided by Clinical Infectious Diseases indicates the paper was received on Feb-18-2020, accepted on Feb-25-2020, first published on March-9-2020, and corrected and typeset on May-5-2020. I was able to find the Clinical Infectious Diseases early access version archived on the Wayback Machine here, and it also describes the samples as being from "December 18 to 29, 2019".

The supplementary data links for the early access version are archived here. However, the links themselves are broken, so I cannot access the supplementary data for the early access version.

Analysis pipeline

The analysis pipeline in Snakefile gets the accessions from both the SRA-uploaded BAMs and the SRA files, aligns them to Wuhan-Hu-1, and calls variants. Before doing this, it splits the data into separate files / Illumina runs and analyzes those independently.

The final variant calls are in results/aggregated_variants/all_samples.csv.

Then the Jupyter notebook analysis.ipynb is run manually, outside the snakemake pipeline, to look at mutations. Although some of the samples have a lot of mutations, the mutations are confusing and don't seem to be mutations toward more ancestral variants (nothing at 8782, 18060, 28144, or 29095).
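
The site check mentioned above can be sketched in a few lines of Python. This is only an illustration, not the code in analysis.ipynb: the function name `variants_at_sites`, the column names `sample` and `site`, and the toy table are all assumptions, and the real layout of results/aggregated_variants/all_samples.csv may differ.

```python
import csv
import io

# Sites named in the text above (assumed to be 1-based Wuhan-Hu-1 coordinates).
SITES_OF_INTEREST = {8782, 18060, 28144, 29095}


def variants_at_sites(csv_file, sites=SITES_OF_INTEREST, site_col="site"):
    """Return the rows of a variant table whose position is in `sites`.

    `site_col` is a hypothetical column name; adjust it to the real table.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row[site_col]) in sites]


# Made-up toy table standing in for the real aggregated variant calls.
toy_csv = io.StringIO(
    "sample,site,reference,alternate\n"
    "sample_A,241,C,T\n"
    "sample_A,8782,C,T\n"
    "sample_B,23403,A,G\n"
)
hits = variants_at_sites(toy_csv)
print([(row["sample"], row["site"]) for row in hits])
```

To run the same check on the real output, pass an open file handle for results/aggregated_variants/all_samples.csv instead of the toy table and set `site_col` to whatever the position column is actually called. An empty result corresponds to the "nothing at" observation above.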
