# https://shahlab.ca/projects/apolloh/
# Function apolloh(infile,cnfile,paramset,outparam,outfile)

# INPUTS:  
# infile         Tab-delimited input file containing allelic counts
#                  from the tumour at positions determined as heterozygous
#                  from the normal genome. 
#                  No header line is assumed.
#                 6 columns:
#                     1) chr (integer; 'X' and 'Y' strings can be used)
#                     2) position
#                     3) reference base (can be arbitrary; not used)
#                     4) reference count
#                     5) non-reference base (can be arbitrary; not used)
#                     6) non-reference count 
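A minimal Python sketch of reading this 6-column file. The helper names (`parse_chrom`, `read_allelic_counts`) are hypothetical and this is not APOLLOH's own reader. It assumes that 'X'/'Y' are recoded to 23/24, mirroring the output convention described for `outfile`. It also assumes the allelic ratio is the reference count divided by the total depth.

```python
import csv
import io

# Assumption: 'X'/'Y' are recoded to 23/24, mirroring the outfile convention.
SEX_CHROMS = {"X": 23, "Y": 24}


def parse_chrom(label):
    """Return an integer chromosome index for labels such as '7', 'X' or 'Y'."""
    label = label.strip().upper()
    return SEX_CHROMS[label] if label in SEX_CHROMS else int(label)


def read_allelic_counts(handle):
    """Read the header-less, 6-column, tab-delimited allelic count file.

    Hypothetical helper. The allelic ratio is assumed to be
    reference count / total depth.
    """
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        chrom, pos, _ref_base, ref_count, _nonref_base, nonref_count = row[:6]
        ref_count, nonref_count = int(ref_count), int(nonref_count)
        depth = ref_count + nonref_count
        records.append({
            "chr": parse_chrom(chrom),
            "position": int(pos),
            "ref_count": ref_count,
            "nonref_count": nonref_count,
            "depth": depth,
            "allelic_ratio": ref_count / depth if depth else float("nan"),
        })
    return records


example = "1\t1000\tA\t12\tG\t8\nX\t5000\tC\t20\tT\t0\n"
rows = read_allelic_counts(io.StringIO(example))
print(rows[0]["allelic_ratio"], rows[1]["chr"])  # 0.6 23
```

The base columns (3 and 5) are read but discarded, matching the note that they are not used.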

# cnfile         Tab-delimited input copy number segment prior file.
#                   The accepted format is the output from HMMcopy,
#                   a read-depth-based tool for analyzing copy number in
#                   tumour-normal sequenced genomes.
#                   However, copy number segments from any source can be used.
#                  8 columns:
#                     1) id (can be arbitrary, not used)
#                     2) chr
#                     3) start  
#                     4) stop
#                     5) Number of 1kb intervals (can be arbitrary; not used)
#                     6) median log2 ratio (normal and tumour) for segment
#                     7) HMM state: 1=HOMD (0 copies), 2=HEMD (1 copy),
#                         3=NEUT (2 copies), 4=GAIN (3 copies),
#                         5=AMP (4 copies), 6=HLAMP (5+ copies)
#                         Note that for AMP and HLAMP, relative numbers of copies
#                         can be used (e.g. GAIN is 3-4 copies, AMP is 5-6 copies,
#                         HLAMP is 7+ copies)
#                     8) CN state (can be arbitrary; not used)
#                   If cnfile='0', a copy number of 2 (diploid) is assumed
#                     for all positions.
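The following Python sketch shows one way to turn this segment file into a per-position copy number lookup. The helper names are hypothetical and this is not APOLLOH's implementation. Passing `None` stands in for `cnfile='0'`. Two more assumptions are flagged in the comments: HLAMP ("5+ copies") is mapped to 5, and positions that fall outside every segment default to diploid.

```python
import bisect
import csv
import io

# Column 7 HMM state -> copy number, as listed above. State 6 (HLAMP) means
# "5+ copies"; 5 is used here as a lower bound (an assumption).
HMM_STATE_TO_COPIES = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}


def read_cn_segments(handle):
    """Parse the 8-column segment file into {chr: [(start, stop, copies), ...]}."""
    segments = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[2].strip().isdigit():
            continue  # skip blank lines or a header line, if present
        chrom, start, stop, state = row[1], int(row[2]), int(row[3]), int(row[6])
        segments.setdefault(chrom, []).append((start, stop, HMM_STATE_TO_COPIES[state]))
    for segs in segments.values():
        segs.sort()
    return segments


def copy_number_at(segments, chrom, position, default=2):
    """Copy number of the segment covering `position`.

    `segments=None` stands in for cnfile='0' (diploid everywhere). Positions
    not covered by any segment also fall back to `default` (an assumption).
    """
    if segments is None:
        return default
    segs = segments.get(chrom, [])
    starts = [s[0] for s in segs]
    i = bisect.bisect_right(starts, position) - 1  # last segment starting <= position
    if i >= 0 and segs[i][0] <= position <= segs[i][1]:
        return segs[i][2]
    return default


seg_text = "1\t1\t1\t50000\t50\t-0.9\t2\t1\n2\t1\t50001\t90000\t40\t0.5\t4\t3\n"
segs = read_cn_segments(io.StringIO(seg_text))
print(copy_number_at(segs, "1", 1000), copy_number_at(segs, "1", 60000))  # 1 3
print(copy_number_at(None, "1", 1000))  # 2
```

Sorting the segments by start lets each lookup use a binary search (`bisect`) instead of a linear scan over all segments.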

# paramset       Parameter initialization file, a MATLAB binary (.mat) file.
#                       This file contains the model and settings parameters
#                       needed to run the program.
#                       See examples in "<$install_dir>/APOLLOH_0.1.0/parameters/".

# outfile        Tab-delimited output file for position-level results. 
#                   9 columns:
#                      1) chr ('X' and 'Y' will be output as 23 and 24)
#                      2) position
#                      3) reference count
#                      4) non-reference count
#                      5) total depth
#                      6) allelic ratio
#                      7) copy number (from input)
#                      8) APOLLOH genotype state
#                      9) Zygosity state.
#                   N additional columns:
#                      posterior marginal probabilities (responsibilities) for
#                      each APOLLOH genotype state. 
#                  Zygosity states are:
#                     DLOH=deletion-LOH (state 1)
#                     NLOH=copy-neutral-LOH (states 2,4)
#                     ALOH=amplified-LOH (states 5,8,9,13,14,19)
#                     HET=heterozygous (states 3,6,7)
#                     ASCNA=allele-specific-amplification (states 10,12,15,18)
#                     BCNA=balanced-amplification (states 11,16,17)
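The state-to-zygosity grouping above can be written as a lookup table. In this Python sketch the table is transcribed directly from the list. The `call_position` helper is hypothetical, and choosing the state with the highest posterior marginal (responsibility) is an assumption, not necessarily how APOLLOH assigns column 8.

```python
# Zygosity label for each APOLLOH genotype state, transcribed from the list above.
ZYGOSITY_STATES = {
    "DLOH": (1,),
    "NLOH": (2, 4),
    "ALOH": (5, 8, 9, 13, 14, 19),
    "HET": (3, 6, 7),
    "ASCNA": (10, 12, 15, 18),
    "BCNA": (11, 16, 17),
}
STATE_TO_ZYGOSITY = {
    state: label for label, states in ZYGOSITY_STATES.items() for state in states
}


def call_position(responsibilities):
    """Return (state, zygosity) for the most probable genotype state.

    `responsibilities` holds one posterior probability per state, in order
    1..N. Taking the maximum posterior marginal is an assumption here.
    """
    best_index = max(range(len(responsibilities)), key=responsibilities.__getitem__)
    state = best_index + 1  # genotype states are numbered from 1
    return state, STATE_TO_ZYGOSITY[state]


resp = [0.0] * 19
resp[6] = 0.9  # state 7
resp[1] = 0.1  # state 2
print(call_position(resp))  # (7, 'HET')
```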

#                   Segments are defined as runs of consecutive positions
#                    sharing the same marginal zygosity state (DLOH, NLOH,
#                    ALOH, HET, BCNA or ASCNA); this implementation does not
#                    output segments. An external Perl script handles this:
#                    "<$install_dir>/APOLLOH_0.1.0/scripts/createSingleSegFileFromAPOLLOH.pl"
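The segmentation step amounts to run-length grouping of consecutive zygosity calls. The Python sketch below shows that idea with a hypothetical `segments_from_calls` helper. It is not a port of `createSingleSegFileFromAPOLLOH.pl`, whose exact behaviour and output columns are not described here.

```python
from itertools import groupby


def segments_from_calls(calls):
    """Merge consecutive positions with the same chromosome and zygosity label.

    `calls` is an iterable of (chr, position, zygosity) tuples, already
    sorted by chromosome and position. Returns tuples of
    (chr, start, end, zygosity, number_of_positions).
    """
    segments = []
    for (chrom, label), group in groupby(calls, key=lambda c: (c[0], c[2])):
        positions = [c[1] for c in group]
        segments.append((chrom, positions[0], positions[-1], label, len(positions)))
    return segments


calls = [(1, 100, "HET"), (1, 200, "HET"), (1, 300, "NLOH"), (2, 50, "NLOH")]
for seg in segments_from_calls(calls):
    print(seg)
# (1, 100, 200, 'HET', 2)
# (1, 300, 300, 'NLOH', 1)
# (2, 50, 50, 'NLOH', 1)
```

Grouping on the (chromosome, label) pair ensures that a run never crosses a chromosome boundary, even when the label stays the same.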

# outparam       Tab-delimited output file storing the converged parameters
#                        after model training with the Expectation Maximization (EM)
#                        algorithm.
#                      1) Number of iterations 
#                      2) Global normal contamination parameter 
#                      3) Binomial parameters for each HMM class/state.

At most positions in the genome, the base is the same across individuals. However, a small percentage of positions may carry different bases (usually one of two, for instance ‘A’ or ‘G’), and these positions are called ‘single nucleotide polymorphisms’ or ‘SNPs’. When the genomic copies inherited from each parent carry different bases at a SNP, the region is said to be heterozygous at that position. Because most chromosomes in somatic cells are paired, SNP positions can potentially be heterozygous. However, one parental copy of a region is sometimes lost, leaving the region with a single copy. A single copy cannot be heterozygous at SNP positions, so the region shows loss of heterozygosity (LOH). LOH caused by the loss of one parental copy of a region is also called hemizygosity of that region.
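The following toy Python illustration makes the idea concrete. The helper names are hypothetical and this is not APOLLOH code. It represents the copies of a region at one SNP as a list of bases. It ignores sequencing noise and normal-cell contamination, both of which pull observed allelic ratios away from these ideal values.

```python
def is_heterozygous(copies):
    """A SNP site is heterozygous when the copies present carry different bases."""
    return len(set(copies)) > 1


def reference_fraction(copies, ref_base):
    """Fraction of copies carrying the reference base (pure tumour, no normal cells)."""
    return copies.count(ref_base) / len(copies)


paired = ["A", "G"]  # one copy inherited from each parent
hemizygous = ["A"]   # the parental copy carrying 'G' has been lost
print(is_heterozygous(paired), reference_fraction(paired, "A"))          # True 0.5
print(is_heterozygous(hemizygous), reference_fraction(hemizygous, "A"))  # False 1.0
```

Losing one copy moves the expected reference fraction from 0.5 to 1.0 (or 0.0, if the reference-carrying copy is lost). This shift is what the allelic counts in `infile` are used to detect.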

# files: