rhysinward/Mexico_subsampling

Mexico_subsampling

Repository description:

This code and data were used for the analysis presented in “Comparing the evolutionary dynamics of predominant SARS-CoV-2 virus lineages co-circulating in Mexico” (link to preprint).

The repository contains the following elements:

  1. Data

Contains some of the data used for the analysis (migration data and sequence data have been excluded due to permissions):

  • Sequence metadata (downloaded from GISAID on 11th December 2021)
  • Sequence data
  • Reference sequence
  • EPI_ISL_summary_table
  2. Data Processing script

Contains the R scripts needed to process the sequence metadata and sequence data.

  3. Treemmer Script

Contains a modified version of the Treemmer Python script that allows pairs to be protected.

  4. Results

Contains the collated and processed results of the analysis.

Running the code

In this example we show how to run the process for Delta.

Run the R script Code/updtaed_delta_sampling.R

  • The output is a migration-informed sequence dataset.
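The R script itself defines what "migration-informed" means here. Purely as an illustration, one common way to turn per-origin weights (for example, relative traveller volumes) into per-origin sample sizes is proportional allocation with largest-remainder rounding. The function name, the origin labels, and the weights below are hypothetical and are not taken from Code/updtaed_delta_sampling.R.

```python
from math import floor


def allocate_samples(weights, total):
    """Split `total` samples across origins in proportion to `weights`.

    Hypothetical helper: `weights` maps an origin label to a non-negative
    weight. Rounding uses the largest-remainder method, so the returned
    counts always sum to `total`.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    # Exact (fractional) share for each origin.
    exact = {k: total * w / weight_sum for k, w in weights.items()}
    # Start from the integer part of each share.
    counts = {k: floor(v) for k, v in exact.items()}
    leftover = total - sum(counts.values())

    # Hand the remaining slots to the largest fractional parts
    # (ties broken alphabetically so the result is deterministic).
    by_remainder = sorted(exact, key=lambda k: (-(exact[k] - counts[k]), k))
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Made-up weights for three hypothetical origins.
example = allocate_samples({"origin_a": 5, "origin_b": 3, "origin_c": 2}, 7)
```

Largest-remainder rounding is used in this sketch because it keeps the total fixed, which plain rounding of each share does not guarantee.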

Align sequences to reference sequence using minimap2 embedded within Pangolin

conda activate pangolin
pangolin --alignment gisaid_hcov-19_2022_01_20_09ta_processedNames.fas

Quality control through the NextClade pipeline

nextclade \
  --in-order \
  --input-fasta data/sars-cov-2/sequences.aln.fasta \
  --output-tsv output/nextclade.tsv \
  --input-dataset data/sars-cov-2 \
  --output-tree output/nextclade.auspice.json \
  --output-dir output/ \
  --output-basename nextclade

Run the R script Code/Quality_Control

  • The output is quality-controlled, migration-informed sequence data.
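The exact filtering rules live in Code/Quality_Control. As a rough sketch of the kind of filter involved, the Python below reads a Nextclade TSV and keeps the names of sequences whose overall QC status is acceptable. It assumes the standard Nextclade columns seqName and qc.overallStatus. The function name and the choice to keep only "good" sequences are assumptions, not the criteria used in the repository.

```python
import csv
import io


def passing_sequences(tsv_text, allowed=("good",)):
    """Return sequence names whose Nextclade QC status is in `allowed`.

    Hypothetical helper: `tsv_text` is the content of a Nextclade TSV
    (tab-separated, with a header row containing `seqName` and
    `qc.overallStatus`).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["seqName"]
        for row in reader
        if row["qc.overallStatus"] in allowed
    ]


# Tiny made-up table standing in for output/nextclade.tsv.
example_tsv = (
    "seqName\tclade\tqc.overallStatus\n"
    "seq_1\t21J\tgood\n"
    "seq_2\t21J\tbad\n"
    "seq_3\t21J\tmediocre\n"
)
kept = passing_sequences(example_tsv)
```

To apply it to a real run, read the file produced by the nextclade command (for example with open("output/nextclade.tsv").read()) and pass the text to the function. Passing allowed=("good", "mediocre") would give a more permissive filter.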

If you want to use phylodynamic subsampling:

Create an ML tree using IQ-TREE (the alignment file goes after -s)

iqtree -s -m GTR+I+G -alrt 1000

Run the R script Code/treemmer_metadata.R

  • The output is the metadata needed to run Treemmer.

Run the Python script Code/Treemmer_v0.3_mod.py

python3 Code/Treemmer_v0.3_mod.py -lm Data/treemmer_metadata_519.txt -lmc Data/metadata_proportions.txt Data/Mex_B_1_1_519_12_07_2021.fasta.treefile -pp -X 4000
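Treemmer reduces a tree by repeatedly removing one leaf from the closest pair of neighbouring leaves until a stopping condition is met, which here appears to be the leaf count given with -X. The sketch below illustrates that greedy idea on a plain pairwise distance table rather than a real tree. The function name, the distance table, and the protected argument are hypothetical. In particular, protected is only a stand-in for shielding chosen sequences and does not reproduce how Code/Treemmer_v0.3_mod.py protects pairs or uses the metadata files.

```python
from itertools import combinations


def prune_to_size(distances, target, protected=frozenset()):
    """Greedily drop redundant leaves until `target` leaves remain.

    Hypothetical helper: `distances` maps frozenset({a, b}) to the
    distance between leaves a and b and must cover every pair.
    Leaves named in `protected` are never removed.
    """
    leaves = set()
    for pair in distances:
        leaves |= pair

    while len(leaves) > target:
        # Consider only pairs where at least one leaf may be removed.
        candidates = [
            (distances[frozenset((a, b))], a, b)
            for a, b in combinations(sorted(leaves), 2)
            if not (a in protected and b in protected)
        ]
        if not candidates:
            break  # everything left is protected
        _, a, b = min(candidates)
        # Remove the unprotected member of the closest pair
        # (the second name when both are removable).
        leaves.remove(b if b not in protected else a)
    return leaves


# Made-up distances for four leaves: A/B and C/D are near-duplicates.
example_distances = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 1.0,
    frozenset(("A", "D")): 1.2,
    frozenset(("B", "C")): 1.1,
    frozenset(("B", "D")): 1.3,
    frozenset(("C", "D")): 0.2,
}
reduced = prune_to_size(example_distances, target=2)
```

In this toy example one member of each near-duplicate pair is removed, which is the behaviour that makes this kind of pruning useful for cutting redundancy while keeping diversity. The real tool works on tree topology and branch lengths, so results on actual data will differ from this simplification.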

Run the R script Code/prune_treemmer.R

Re-run IQ-TREE to get the final ML tree
