Snakemake pipeline to reproduce the whole-genome GWAS (wg-GWAS) analysis of the genetic determinants of *E. coli* bloodstream infections (BSI), using BSI and commensal isolates. Users wishing to replicate this study should place the data containing the phenotype and covariates in the `data/` directory. Further details can be found in the preprint.
The input assemblies can be found under the following BioProject accessions:

| Phenotype | BioProject accessions |
|---|---|
| BSI | PRJEB39260, PRJEB35745 |
| Commensal | PRJEB38489, PRJEB44819, PRJEB44872, PRJEB39252, PRJEB55584 |

The FASTA files should be placed in the `data/fastas` directory, and the annotated GFF files in the `data/gffs` directory.
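Before running the pipeline it can help to check that every assembly has its annotation. The script below is a minimal sketch that creates both input directories and reports FASTA files without a matching GFF. It assumes files are named `<sample>.fasta` and `<sample>.gff`; the pipeline itself may expect other extensions, so adjust the patterns if needed.

```shell
#!/bin/sh
# Sketch: create the input directories and report assemblies that lack a
# matching annotation.
# ASSUMPTION: files are named <sample>.fasta and <sample>.gff; the pipeline
# may use different extensions or naming.
set -eu

mkdir -p data/fastas data/gffs

missing=0
for fasta in data/fastas/*.fasta; do
  [ -e "$fasta" ] || continue   # the glob matched no files
  sample=$(basename "$fasta" .fasta)
  if [ ! -e "data/gffs/${sample}.gff" ]; then
    echo "no GFF found for ${sample}" >&2
    missing=$((missing + 1))
  fi
done
echo "assemblies without a GFF: ${missing}"
```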
To make a dry run of the analysis:

```shell
snakemake --use-conda --cores 36 -n -p
```
When run without the `-n` flag, Snakemake will install the appropriate packages for each step as conda environments.
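The `--cores 36` value above reflects one particular machine. As a sketch, assuming `snakemake` and conda are on `PATH`, the wrapper below uses however many cores the system reports and only drops `-n` when asked. `DRY_RUN` is a hypothetical variable introduced for this script; it is not an option the pipeline reads.

```shell
#!/bin/sh
# Sketch: run the pipeline on every core the machine reports instead of a
# fixed 36. DRY_RUN is a hypothetical variable for this script only; set
# DRY_RUN=0 for the real run.
set -u

CORES=$(getconf _NPROCESSORS_ONLN)
DRY_RUN=${DRY_RUN:-1}

set -- --use-conda --cores "$CORES" -p
if [ "$DRY_RUN" = "1" ]; then
  set -- "$@" -n   # -n: list the jobs without running them
fi

if command -v snakemake >/dev/null 2>&1; then
  snakemake "$@"
else
  echo "snakemake is not on PATH; would run: snakemake $*" >&2
fi
```

Saved as, say, `run.sh` (a hypothetical name), `sh run.sh` performs the dry run and `DRY_RUN=0 sh run.sh` performs the real run that creates the conda environments.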
A symbolic link to a directory containing the eggnog-mapper database should be placed at `data/eggnog-mapper`, as well as one to the unzipped FASTA file from UniRef50 (`data/uniref50.fasta`).
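One way to set up both links is sketched below. The source paths are placeholders (overridable through the hypothetical `EGGNOG_DB_DIR` and `UNIREF50_FASTA` variables) and should point at your own copies; the script decompresses a gzipped UniRef50 download only if the unzipped file is not already there.

```shell
#!/bin/sh
# Sketch: link the external databases into data/. Both source paths are
# placeholders; point them at your own copies.
set -eu

EGGNOG_DB_DIR=${EGGNOG_DB_DIR:-/path/to/eggnog-mapper-db}
UNIREF50_FASTA=${UNIREF50_FASTA:-/path/to/uniref50.fasta}

# UniRef50 is distributed gzipped; decompress it once if needed
if [ -f "${UNIREF50_FASTA}.gz" ] && [ ! -f "$UNIREF50_FASTA" ]; then
  gunzip -c "${UNIREF50_FASTA}.gz" > "$UNIREF50_FASTA"
fi

mkdir -p data
# -s: symbolic link, -f: replace an existing link,
# -n: do not descend into an existing link to a directory
ln -sfn "$EGGNOG_DB_DIR" data/eggnog-mapper
ln -sfn "$UNIREF50_FASTA" data/uniref50.fasta
```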
Judit Burgaya (BurgayaVentura.Judit@mh-hannover.de)