This is more or less a one-stop shop for every result generated in Patel et al., 2024. The Snakefile can generate conditional frequency spectra from empirical data as well as from theoretical models, using either SLiM simulations or fastDTWF, for both generic two-population models and the Jouganous et al. 2017 out-of-Africa model (with some slight modifications).
data/gwas contains GWAS summary statistics for the 106 complex traits analyzed in this paper. The summary statistics were curated by Yuval Simons, Hakhamanesh Mostafavi, and Julie Zhu based on the Neale lab UK Biobank GWAS.

data/distributions contains all frequency spectra analyzed in this paper, if you want to tinker with some distributions.

scripts/probability_dist.py is really nice if you need to do any kind of manipulation with (discretized) frequency spectra or other probability distributions.

scripts/empirical_cfs.py will generate conditional frequency spectra for whatever empirical data you want to analyze.
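As a rough illustration of what a conditional frequency spectrum is, here is a minimal sketch: take variants whose frequency in a conditioning population falls in a given bin, and tabulate the distribution of their allele counts in a target sample. This is not the code in scripts/empirical_cfs.py, which may differ in important ways; the function name, its arguments, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def conditional_frequency_spectrum(variants, freq_lo, freq_hi, n_target):
    """Hypothetical helper, not part of this repository.

    variants: iterable of (conditioning_freq, target_allele_count) pairs.
    Returns a list of length n_target + 1 whose k-th entry is the fraction of
    variants with k copies in the target sample, among variants whose
    conditioning frequency lies in [freq_lo, freq_hi).
    """
    counts = Counter(
        target_count
        for cond_freq, target_count in variants
        if freq_lo <= cond_freq < freq_hi
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no variants fall in the conditioning bin")
    return [counts.get(k, 0) / total for k in range(n_target + 1)]


# Toy data (made up): frequency in the conditioning population, and allele
# count in a target sample of 4 chromosomes.
toy_variants = [(0.010, 0), (0.012, 1), (0.011, 0), (0.300, 5), (0.013, 2)]
cfs = conditional_frequency_spectrum(toy_variants, 0.005, 0.015, n_target=4)
```

The variant at frequency 0.3 falls outside the bin and is ignored, so the spectrum is normalized over the four remaining variants.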
- data/CADD_bestfit - B-statistics, downloadable from the Sella lab GitHub (note: these are in GRCh37 so everything must be in GRCh37 unless you want to convert the B-maps)
- data/combined_gwas.txt - table of GWAS summary statistics with columns SNP (chr:bp:ref:alt), effect, and trait_idx
- data/snp_ID_conversion_table.txt - table mapping SNP (chr:bp:ref:alt) to rsID
- data/1KG_sample_info.txt - table mapping 1K Genomes sample IDs to population labels
- data/1KGenomes - 1K Genomes GRCh37 VCFs for each chromosome
- data/variation_feature.txt.gz - table mapping rsIDs to ancestral allele states, downloadable from Ensembl
- data/freq_WB - table mapping SNP (chr:bp:ref:alt) to alternate allele frequencies in UK Biobank White British (or whatever your GWAS population is)
- SLiM simulations need to be run separately (see scripts/slim) before running the Snakefile (using Snakemake to spawn 200k+ jobs is just bad practice)
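Several of the tables listed above key variants by a chr:bp:ref:alt string. The sketch below shows one way to split such an ID and attach rsIDs to GWAS rows. It assumes tab-delimited files with header rows, and the conversion-table column names (SNP, rsID) are guesses; only the GWAS column names SNP, effect, and trait_idx come from the description above. The helper names and the toy rows (including the placeholder rsID) are made up for illustration.

```python
import csv
import io


def parse_snp_id(snp_id):
    """Split a 'chr:bp:ref:alt' string into (chrom, position, ref, alt)."""
    chrom, pos, ref, alt = snp_id.split(":")
    return chrom, int(pos), ref, alt


def load_rsid_map(handle):
    """Read a SNP -> rsID table (assumed tab-delimited, columns SNP and rsID)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["SNP"]: row["rsID"] for row in reader}


def annotate_gwas(handle, rsid_map):
    """Yield GWAS rows with the SNP ID split out and the rsID attached.

    Rows without an entry in rsid_map get rsid=None rather than being dropped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        chrom, pos, ref, alt = parse_snp_id(row["SNP"])
        yield {
            "chrom": chrom,
            "pos": pos,
            "ref": ref,
            "alt": alt,
            "effect": float(row["effect"]),
            "trait_idx": int(row["trait_idx"]),  # assumed to be an integer index
            "rsid": rsid_map.get(row["SNP"]),
        }


# In-memory stand-ins for the real files; all values are placeholders.
conversion_text = "SNP\trsID\n1:12345:A:G\trs0000001\n"
gwas_text = (
    "SNP\teffect\ttrait_idx\n"
    "1:12345:A:G\t0.02\t3\n"
    "2:67890:C:T\t-0.01\t3\n"
)

rsid_map = load_rsid_map(io.StringIO(conversion_text))
rows = list(annotate_gwas(io.StringIO(gwas_text), rsid_map))
```

With real data you would pass open file handles instead of io.StringIO objects. Since the B-maps are in GRCh37, the positions parsed here would need to be GRCh37 coordinates as well.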