Analysis of SARS-CoV-2 deep sequencing from Shen et al (2020)

Background

There are discrepancies in the reported collection dates for the eight samples. They were originally described as collected from December 18 to 29, 2019, but at some post-peer-review stage after the early access version was posted, this was changed to January 2020. Specifically:

Final published version says January 2020

The final version of Shen et al (2020) was published in Clinical Infectious Diseases on May-5-2020. That final version (archived on the Wayback Machine here) states that "Eight COVID-19 pneumonia samples were collected from hospitals in Wuhan on January 2020".

PubMed Central version says December 18 to 29, 2019

However, the PubMed Central version of Shen et al (2020) states that "Eight COVID-19 pneumonia samples were collected from hospitals in Wuhan from December 18 to 29, 2019". This PubMed Central version appears to have been built from the original early access version in Clinical Infectious Diseases, which was posted on March-9-2020. I archived the PubMed Central version on the Wayback Machine here, and also downloaded a copy of the PDF here.

Early access version from journal says December 18 to 29, 2019

The article history provided by Clinical Infectious Diseases indicates that the paper was received on Feb-18-2020, accepted on Feb-25-2020, first published on March-9-2020, and corrected and typeset on May-5-2020. I found the early access version from Clinical Infectious Diseases archived on the Wayback Machine here. It also describes the samples as collected from "December 18 to 29, 2019".

The supplementary data links for the early access version are archived here. However, the links themselves are broken, so I could not access the supplementary data for the early access version.

Analysis pipeline

The analysis pipeline in Snakefile retrieves the accessions from both the BAM files and the SRA files uploaded to the SRA, aligns them to Wuhan-Hu-1, and calls variants. Before doing this, it splits the data into separate files / Illumina runs and analyzes each independently.
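The per-run split described above can be illustrated with a minimal sketch. It is not the logic in the Snakefile. It assumes reads still carry Illumina Casava 1.8+ style names (`@instrument:run:flowcell:lane:tile:x:y ...`) and groups them by (instrument, run, flowcell, lane). Reads downloaded from the SRA often have rewritten names, in which case this hypothetical helper raises an error rather than guessing. The function names `parse_fastq`, `run_key`, and `split_by_run` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]    # (header, sequence, quality)
RunKey = Tuple[str, str, str, str]  # (instrument, run, flowcell, lane)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes one record per step;
    # an incomplete trailing record is silently dropped.
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, qual


def run_key(header: str) -> RunKey:
    """Extract (instrument, run, flowcell, lane), assuming a Casava 1.8+ read name."""
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError(f"not a Casava 1.8+ read name: {header!r}")
    instrument, run, flowcell, lane = fields[:4]
    return instrument, run, flowcell, lane


def split_by_run(lines: Iterable[str]) -> Dict[RunKey, List[Record]]:
    """Group FASTQ records by Illumina run so each group can be analyzed independently."""
    groups: Dict[RunKey, List[Record]] = defaultdict(list)
    for record in parse_fastq(lines):
        groups[run_key(record[0])].append(record)
    return dict(groups)
```

Each group could then be written to its own file and aligned separately. This keeps a problem in one sequencing run from contaminating the variant calls of the others.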

The final variant calls are in results/aggregated_variants/all_samples.csv.
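A minimal sketch of summarizing that CSV with the standard library follows. The column names `sample` and `frequency`, and the 0.5 threshold, are assumptions made for illustration. Check the actual header of results/aggregated_variants/all_samples.csv before using it.

```python
import csv
from collections import Counter
from typing import Iterable


def count_mutations_per_sample(lines: Iterable[str], min_frequency: float = 0.5) -> Counter:
    """Count variant calls per sample at or above min_frequency.

    The 'sample' and 'frequency' column names are hypothetical; adjust them
    to match the real header of all_samples.csv.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        if float(row["frequency"]) >= min_frequency:
            counts[row["sample"]] += 1
    return counts
```

For example, `with open("results/aggregated_variants/all_samples.csv", newline="") as f: print(count_mutations_per_sample(f))` prints a per-sample tally of the calls that pass the threshold.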

The Jupyter notebook analysis.ipynb is then run manually, outside the Snakemake pipeline, to examine the mutations. Although some samples have many mutations, these are confusing and do not appear to be mutations toward more ancestral variants (there are no calls at sites 8782, 18060, 28144, or 29095).
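The check for those four sites can be sketched as follows. This is not code taken from analysis.ipynb. It assumes each variant call is a mapping with hypothetical `sample` and `site` keys, such as the rows produced by `csv.DictReader` on the aggregated CSV. It reports which of the listed sites, if any, have a call in each sample.

```python
from typing import AbstractSet, Dict, Iterable, Mapping, Set

# The four sites listed in the text; per the text, none of the samples
# have calls at these positions.
ANCESTRAL_SITES: AbstractSet[int] = frozenset({8782, 18060, 28144, 29095})


def calls_at_sites(
    rows: Iterable[Mapping[str, str]],
    sites: AbstractSet[int] = ANCESTRAL_SITES,
) -> Dict[str, Set[int]]:
    """Map each sample to the subset of `sites` where it has a variant call.

    Samples with no calls at any of the sites are omitted, so an empty
    dict means no calls at any listed site in any sample.
    """
    hits: Dict[str, Set[int]] = {}
    for row in rows:
        site = int(row["site"])  # hypothetical column name
        if site in sites:
            hits.setdefault(row["sample"], set()).add(site)
    return hits
```

An empty result on the full set of calls would match the observation above.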

jbloom/SARS-CoV-2_Shen_et_al

Folders and files

Latest commit

History

Repository files navigation

Analysis of SARS-CoV-2 deep sequencing from Shen et al (2020)

Background

Final published version says January 2020

PubMed Central version says December 18 to 29, 2019

Early access version from journal says December 18 to 29, 2019

Analysis pipeline

About

Resources

Stars

Watchers

Forks

Languages