-
Notifications
You must be signed in to change notification settings - Fork 2
Basic usage
Jade Cheng edited this page Sep 16, 2018
·
1 revision
We need to have FASTA files. The sequences need to match in length and names. Let's use the Isolation model as an example here.
$ ls
a.fasta b.fasta
$ cat a.fasta | wc -c
1010013
$ cat b.fasta | wc -c
1010013
$ head *.fasta -n 3
==> a.fasta <==
> input_1_1
aaaaaaaaaaaaaaaaaXaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
==> b.fasta <==
> input_1_1
aaaaaaaaaaaaaaaaaaaaaaaaa-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaCaaGaaaaaa
We need to prepare the ZipHMM data directory according to a model specification.
$ Jocx.py init . iso a.fasta b.fasta
# creating uncompressed sequence file
# using output directory "./ziphmm_iso_a_b"
# parsing "a.fasta"
# parsing "b.fasta"
# comparing sequence "input_1_1"
# sequence length: 1000000
# creating "./ziphmm_iso_a_b/input_1_1.ziphmm"
# Creating 5-state alignment in directory: ./ziphmm_iso_a_b/input_1_1.ziphmm
$ ls
a.fasta b.fasta ziphmm_iso_a_b
$ find ziphmm_iso_a_b/
ziphmm_iso_a_b/
ziphmm_iso_a_b/input_1_1.ziphmm
ziphmm_iso_a_b/input_1_1.ziphmm/data_structure
ziphmm_iso_a_b/input_1_1.ziphmm/nStates2seq
ziphmm_iso_a_b/input_1_1.ziphmm/nStates2seq/5.seq
ziphmm_iso_a_b/input_1_1.ziphmm/original_sequence
We can now execute CoalHMM given the matching model and one of the three optimizers, NM, PSO, and GA.
$ Jocx.py run . iso nm 0.001 2000 0.5
# algorithm = _NMOptimiser
# timeout = None
# max_executions = 1
#
# 2016-11-20 22:08:07.350329
#
# param0 = (0.0005000000000000001, 0.002)
# param1 = (999.9999999999998, 4000.0)
# param2 = (0.25, 1.0)
#
# execution state score param0 param1 param2
0 init -34547.3651534 0.001095650748 1589.08255478 0.283742230795
1 fmin-in -34645.0430766 0.000712774345468 1029.01830743 0.813449518812
1 fmin-cb -34589.6802608 0.000748413062741 926.116476683 0.854121994752
1 fmin-cb -34589.6802608 0.000748413062741 926.116476683 0.854121994752
1 fmin-cb -34533.1270287 0.000831570069713 857.515256188 0.840564502772
:
$ Jocx.py run . iso pso 0.001 2000 0.5
# algorithm = _PSOptimiser
# timeout = None
# max_iterations = 50
# particle_count = 50
# max_initial_velocity = 0.02
# omega = 0.9
# phi_particle = 0.3
# phi_swarm = 0.1
#
# 2016-11-20 22:20:20.968101
#
# param0 = (0.0005000000000000001, 0.002)
# param1 = (999.9999999999998, 4000.0)
# param2 = (0.25, 1.0)
#
#
# PARTICLES FOR ITERATION 1
# swarm_fitness = -34370.3728435
# best_average_fitness = -34865.0449053
# best_minimum_fitness = -35829.4334913
# best_maximum_fitness = -34370.3728435
# current_average_fitness = -34865.0449053
# current_minimum_fitness = -35829.4334913
# current_maximum_fitness = -34370.3728435
#
# gen idv fitness param0 param1 param2 best_fitness best-param0 best-param1 best-param2
1 0 -34370.37284352 0.00181842 1182.56549665 0.31842394 -34370.37284352 0.00181842 1182.56549665 0.31842394
1 1 -34372.85439833 0.00184368 1172.75314657 0.64961210 -34372.85439833 0.00184368 1172.75314657 0.64961210
1 2 -34373.94788272 0.00195363 1070.88584882 0.65517884 -34373.94788272 0.00195363 1070.88584882 0.65517884
1 3 -34393.65611850 0.00157531 1165.64018865 0.36541651 -34393.65611850 0.00157531 1165.64018865 0.36541651
1 4 -34399.70364572 0.00154114 1083.40096385 0.71295224 -34399.70364572 0.00154114 1083.40096385 0.71295224
:
Here is the output of the '--help' option. It explains all of the supported models and the required order of the FASTA files supplied for each model.
USAGE:
python Jocx.py init [options] <exp-folder> <model> <fasta1> <fasta2> ...
python Jocx.py run <exp-folder> <model> <optimizer> <range1> <range2> ...
COMMANDS
init Initializes alignments and zipped sequences needed to run an
experiment. This command must be executed once before using
the 'run' command.
run Runs an experiment. Before running an experiment, the
appropriate directories must be initialized using the 'init'
command.
OPTIONS
--chunk-size Indicates the next argument is the maximum size of an
individual alignment produced by the ZipHMM library; units
are in symbols. If unspecified, the chunk size is set to the
size of the entire sequence.
--help Prints this help message and exits
ARGUMENTS
exp-folder The path to the experiments folder
fasta1 The path to the first FASTA-formatted file
fasta2 The path to the second FASTA-formatted file
model The model to use for the experiment; e.g. 'iso' or 'iim'
optimizer The optimizer to use for the experiment; e.g. 'nm' or 'pso'
range1 The range for the first parameter (see README for details)
range2 The range for the second parameter
DESCRIPTION
This program executes CoalHMM with a specified model and optimizer using a
given sequence of data in the format of ZipHMM directories. The program
prints to standard output the progression of the estimated parameters and
the corresponding log likelihood.
ISOLATION MODEL (iso)
*
/ \ tau
A B
3 params -> tau, coal_rate, recomb_rate
2 seqs -> A, B
1 group -> AB
ISOLATION INITIAL MIGRATION MODEL (iim)
*
/ \
/<->\ mig_time
/ \ iso_time
A B
5 params -> iso_time, mig_time, coal_rate, recomb_rate, mig_rate
2 seqs -> A, B
1 group -> AB
ISOLATION 3 HMM MODEL (iso-3hmm)
*
/ \ tau
A1 B1
A2 B2
3 params -> tau, coal_rate, recomb_rate
4 seqs -> A1, A2, B1, B2
3 groups -> A1B1, A1A2, B1B2
ISOLATION INITIAL MIGRATION 3 HMM MODEL (iim-3hmm)
*
/ \
/<->\ mig_time
/ \ iso_time
A1 B1
A2 B2
5 params -> iso_time, mig_time, coal_rate, recomb_rate, mig_rate
4 seqs -> A1, A2, B1, B2
3 groups -> A1B1, A1A2, B1B2
THREE POP ADMIX MODEL (admix)
*
/ \
/ \ mig_time
/_____\
admix_prop / <-| \ iso_time
A C B
5 params -> iso_time, mig_time, coal_rate, recomb_rate, admix_prop
3 seqs -> A, B, C
3 groups -> AC, BC, AB
THREE POP ADMIX 2 3 MODEL (admix23)
*
/ \ greedy1_time_1a
buddy23_time_1a /\ \
/ \_/\ buddy23_time_2a
admix_prop / <-| \ iso_time
A C B
7 params -> iso_time, buddy23_time_1a, buddy23_time_2a, greedy1_time_1a,
coal_rate, recomb_rate, admix_prop
3 seqs -> A, B, C
3 groups -> AC, BC, AB
ONE POP ADMIX 2 3 MODEL (admix23-1pop)
*
/ \ greedy1_time_1a
buddy23_time_1a /\ \
/ \_/\ buddy23_time_2a
admix_prop / <-| \ iso_time
C1
C2
5 params -> iso_time, mig_time, coal_rate, recomb_rate, admix_prop
2 seqs -> C1, C2
1 groups -> C1C2
TWO POP ADMIX 2 3 MODEL (admix23-2pop)
*
/ \ greedy1_time_1a
buddy23_time_1a /\ \
/ \_/\ buddy23_time_2a
admix_prop / <-| \ iso_time
A1 C1
A2 C2
6 params -> iso_time, buddy23_time, greedy1_time, coal_rate, recomb_rate,
admix_prop
4 seqs -> A1, A2, C1, C2
3 groups -> A1C1, A1A2, C1C2
TWO POP ADMIX 2 3 ONE SAMPLE MODEL (admix23-2pop-1sample)
*
/ \ greedy1_time_1a
buddy23_time_1a /\ \
/ \_/\ buddy23_time_2a
admix_prop / <-| \ iso_time
A C
6 params -> iso_time, buddy23_time, greedy1_time, coal_rate, recomb_rate,
admix_prop
2 seqs -> A, C
1 group -> AC
THREE POP ADMIX 2 3 MODEL 6 HMM (admix23-6hmm)
*
/ \ greedy1_time_1a
buddy23_time_1a /\ \
/ \_/\ buddy23_time_2a
admix_prop / <-| \ iso_time
A1 C1 B1
A2 C2 B2
7 params -> iso_time, buddy23_time_1a, buddy23_time_2a, greedy1_time_1a,
coal_rate, recomb_rate, admix_prop
6 seqs -> A1, A2, B1, B2, C1, C2
6 groups -> A1C1, B1C1, A1B1, A1A2, B1B2, C1C2
THREE POP ADMIX 2 3 MODEL 15 HMM (admix23-15hmm)
*
/ \ greedy1_time_1a
buddy23_time_1a /\ \
/ \_/\ buddy23_time_2a
admix_prop / <-| \ iso_time
A1 C1 B1
A2 C2 B2
7 params -> iso_time, buddy23_time_1a, buddy23_time_2a, greedy1_time_1a,
coal_rate, recomb_rate, admix_prop
6 seqs -> A1, A2, B1, B2, C1, C2
15 groups-> A1C1, A1C2, A2C1, A2C2, B1C1
B1C2, B2C1, B2C2, A1B1, A1B2
A2B1, A2B2, A1A2, B1B2, C1C2
EXAMPLES
$ python Jocx.py init .../ziphmm iso .../seq1.fa .../seq2.fa
$ python Jocx.py run .../ziphmm iso ga
# algorithm = _GAOptimiser
# timeout = None
# elite_count = 1
:
# gen idv fitness param0 param1 param2
1 1 -38786.09710120 0.00083995 164.91715870 0.01344298
1 2 -59912.38796110 0.00098375 31.17524014 0.36805062
:
BUGS
Report all bugs to Jade Cheng <ycheng@cs.au.dk>
Copyright (c) 2015-2018 Jade Cheng