humanXsweeps

Code for the analysis of X chromosomes from the Simons Genome Diversity Project (SGDP). Once cloned, the repository folder must be renamed to kmt to match the file paths used in the workflows. The following subfolders must also be created in the repository folder:

mkdir steps results figures
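
The setup above can also be checked from Python. This is a minimal sketch, not part of the repository: the function name prepare_repo is hypothetical, and it only assumes the folder name (kmt) and the three subfolders named above.

```python
from pathlib import Path

# Subfolders the workflows expect inside the repository folder.
REQUIRED_SUBFOLDERS = ("steps", "results", "figures")


def prepare_repo(repo_dir):
    """Create the required subfolders and report whether the folder is named kmt.

    Hypothetical helper for illustration; it does not rename anything.
    """
    repo = Path(repo_dir)
    for name in REQUIRED_SUBFOLDERS:
        # exist_ok makes the call safe to repeat on an already prepared folder.
        (repo / name).mkdir(parents=True, exist_ok=True)
    return repo.resolve().name == "kmt"
```

Calling prepare_repo(".") from inside the repository folder creates any missing subfolders and returns False if the folder still needs to be renamed.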

Conda environments

simons

This is the main Python 3 environment for the workflows. You can create it like this:

conda env create -f simons.yml

simons_jupyter

This is the main Python 3 environment for the Jupyter analysis. You can create it like this:

conda env create -f simons_jupyter.yml

argweaver

A separate Python 2 environment that is only used from within the GWF workflow (see below). You can create it like this:

conda env create -f argweaver.yml

This environment provides the argweaver executables and the argweaver Python library, which must be downloaded and installed using pip. The library requires compbio, which is installed the same way.

For some reason I had to compile these executables manually to keep them from dumping core:

g++ argweaver/src/argweaver/*.cpp argweaver/src/smc2bed.cpp -o ~/anaconda3/envs/argweaver/bin/smc2bed
g++ argweaver/src/argweaver/*.cpp argweaver/src/arg-summarize.cpp -o ~/anaconda3/envs/argweaver/bin/arg-summarize
cd argweaver ; pip install .

Note that you need to change the option in the workflow from --resume to --overwrite if you want to redo the analysis.

If resampling is aborted, the stats file in each sampling directory may be corrupt. Delete the last sample (here 2000):

find steps/argweaver/samples -name '*2000.smc.gz' -exec rm {} \;

and roll back the stats file to the correct iteration (here 1900):

find steps/argweaver/samples -name '*[01].stats' | python scripts/roll_back_argweaver_stats_files.py 1900
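
What rolling back a stats file means can be sketched as follows. This is not the code in roll_back_argweaver_stats_files.py. It is an illustration that assumes the stats file is tab-separated with a header line and an iteration column named iter; both the function name and that column name are assumptions.

```python
def roll_back_stats(lines, last_iter, iter_column="iter"):
    """Return the header plus every complete row with iteration <= last_iter.

    `lines` is the content of one stats file as a list of lines, header first.
    The tab-separated layout and the "iter" column name are assumptions.
    """
    header = lines[0].rstrip("\n").split("\t")
    col = header.index(iter_column)
    kept = [lines[0]]
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(header):
            continue  # drop a truncated row left behind by an aborted run
        try:
            iteration = int(fields[col])
        except ValueError:
            continue  # drop a row whose iteration value is not a number
        if iteration <= last_iter:
            kept.append(line)
    return kept
```

With last_iter set to 1900, rows for iteration 2000 and any half-written final row are dropped, so the file matches the samples that remain on disk.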

GWF workflows

Main analysis of SGDP:

gwf -f workflow_simons.py run

Additional analysis of 1000 Genomes:

gwf -f workflow_1000genomes.py run

Data analysis and visualization using jupyter notebooks

All notebooks are in the notebooks folder. They should be run in the order in which they are numbered.

For easy use of Jupyter on a Slurm computing cluster, use slurm_jupyter.
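
A plain alphabetical listing can put 10 before 2, so it may help to sort by the leading number. The sketch below assumes that each notebook filename starts with its number; the actual naming scheme is not stated here, and the function name is hypothetical.

```python
import re
from pathlib import Path


def notebooks_in_order(folder="notebooks"):
    """Return the .ipynb paths in `folder`, sorted by leading number, then name.

    Files without a leading number are placed last. The numeric-prefix
    naming is an assumption about how the notebooks are numbered.
    """
    def sort_key(path):
        match = re.match(r"(\d+)", path.name)
        number = int(match.group(1)) if match else float("inf")
        return (number, path.name)

    return sorted(Path(folder).glob("*.ipynb"), key=sort_key)
```

Printing the names from notebooks_in_order() gives the order in which to open and run the notebooks.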

kaspermunch/humanXsweeps
