# EIGENSTRAT / SMARTPCA

Goal: To examine patterns of population structure in duplicated loci.

### Bring:
1. *.haplotypes.tsv file (produced by [Stacks](http://creskolab.uoregon.edu/stacks/))
    * used for dominant coding of each allele

2. genotype file in *.ped format
    * normal codominant genotypes of biallelic SNPs, used to compare 
    

### Take away:
* PCAs of genetic data:
    - full data coded as codominant genotypes (normal bi-allelic SNPs)
    - full data coded as dominant alleles
    - paralogs coded as dominant alleles

### Programs used:
* [EIGENSOFT](http://www.hsph.harvard.edu/alkes-price/software/) (specifically SMARTPCA)
 - Patterson N, Price AL, Reich D (2006) Population Structure and Eigenanalysis. PLoS Genet 2(12): e190. doi: 10.1371/journal.pgen.0020190
    
### Steps
1. a
2. b
3. c

#### Python imports

In [1]:
import os.path
import collections
import numpy as np
import pandas as pd
import random
from IPython.core.pylabtools import figsize
import matplotlib.pyplot as plt
import seaborn as sns

#### Dominance coding
Haplotypes is a class to store the haplotypes assigned to a group of indidividuals at a particular catalog ID.  Each catalog ID is split into one 'locus' per observed haplotyped.  These loci are named as "catID_haplotype"

Genotypes of these dominant loci are scored as: 
    - 1 = allele is present
    - 0 = allele is absent
    - 9 = no call

In [2]:
class Haplotypes(object):
    def __init__(self, catID, ind_names, haplotypes, missing_codes = ["", "-", "No Call", 'Invalid', 'consensus']):
        self.catID = catID
        self.missing_codes = missing_codes
        self.haplotype_of_ind = collections.OrderedDict(zip(ind_names, haplotypes))
        self.set_alleles()
        self.split_catID_haplotypes()
        
    def set_alleles(self):    
        seen = set()
        for ind, haplo in self.haplotype_of_ind.items():
            if haplo in self.missing_codes:
                haplo = "-"
            else:
                haplo_set = frozenset(haplo.split("/"))
                for allele in haplo_set:
                    if allele not in self.missing_codes:
                        seen.add(allele)
        self.alleles = seen
        
    def split_catID_haplotypes(self):
        self.dom_coding_of_allele = dict()
        for allele in self.alleles:
            dom_coding = list()
            for xx in self.haplotype_of_ind.values():
                if xx is '-':
                    dom_coding.append(9)
                elif allele in xx.split('/'):
                    dom_coding.append(1)
                else:
                    dom_coding.append(0)
            self.dom_coding_of_allele['{}_{}'.format(self.catID, allele)] = dom_coding      

Trying out the Haplotypes class

In [3]:
INDNAMES = ['CMHAMM10_0002','CMHAMM10_0005','CMHAMM10_0008','CMHAMM10_0011','CMHAMM10_0012',
            'CMHAMM10_0014','CMHAMM10_0015','CMHAMM10_0016', 'CMHAMM10_0017','CMHAMM10_0018','CMHAMM10_0022',
            'CMHAMM10_0024']
HAPLOS = ['GG/TA','GG/TA','GG/TT','GG/TA','GG/TA','GG/TA','GG/TA','GG/TA','GG/TA','GG/TA','GG/TA','-']

AA = Haplotypes(catID='01', ind_names=INDNAMES, haplotypes=HAPLOS)

for ind, haplotypes in AA.haplotype_of_ind.items():
    print ind, haplotypes
    
print '\n'

for locus, genotypes in AA.dom_coding_of_allele.items():
    print locus, genotypes

CMHAMM10_0002 GG/TA
CMHAMM10_0005 GG/TA
CMHAMM10_0008 GG/TT
CMHAMM10_0011 GG/TA
CMHAMM10_0012 GG/TA
CMHAMM10_0014 GG/TA
CMHAMM10_0015 GG/TA
CMHAMM10_0016 GG/TA
CMHAMM10_0017 GG/TA
CMHAMM10_0018 GG/TA
CMHAMM10_0022 GG/TA
CMHAMM10_0024 -


01_TT [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 9]
01_GG [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9]
01_TA [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 9]


#### creating the input files for SMARTPCA - dominance coding
This function will encode genotypes as a set of dominant alleles.

- haplotypes_in = haplotypes.tsv (from Stacks)

- indiv_in = ped file of the individuals to retain

- catID_in = map or snplist file of the catIDs to retain

- genotype_out = genotype input for SMARTPCA

- genotype_out = map input for SMARTPCA (list of loci/alleles)

In [4]:
def write_dom_EIGENSTRAT_files(haplotypes_in, ind_in, catID_in, genotype_out, map_out):
    with open(ind_in) as ind_INFILE:
        ind_to_keep = [line.split()[1] for line in ind_INFILE]
        print('Inds to keep: {}'.format(len(ind_to_keep)))

    with open(catID_in) as catID_INFILE:
        try: # this works on map files - catId is found by spliting the snpID
            catId_to_keep = [line.split()[1].split('_')[0] for line in catID_INFILE]
        except IndexError: # this works on snplist files - each line is a catID to keep
            catId_to_keep = [line.strip() for line in catID_INFILE]
        print('catIDs to keep: {}'.format(len(catId_to_keep)))

    with open(haplotypes_in) as INFILE:
        HEADER = next(INFILE)
        ind_names = HEADER.strip().split('\t')[2:]
        ind_keeps = [xx in ind_to_keep for xx in ind_names]
        kept_names = [xx for xx in ind_names if xx in ind_to_keep]
        with open(genotype_out, 'w') as geno_OUTFILE:
            with open(map_out, 'w') as map_OUTFILE:
                for line in INFILE:
                    catID = line.split('\t')[0]
                    if catID in catId_to_keep:
                        haplos = line.strip().split('\t')[2:]
                        kept_haplos = [xx for (cnt,xx) in enumerate(haplos) if ind_keeps[cnt]]
                        haps = Haplotypes(catID, kept_names, kept_haplos)
                        for allele, calls in haps.dom_coding_of_allele.items():
                            geno_OUTFILE.write(''.join([str(xx) for xx in calls]))
                            geno_OUTFILE.write('\n')
                            map_OUTFILE.write('1\t{}\t1\t1\n'.format(allele))
    return(len(catId_to_keep))

Notice, these are based on **haplotypes** - rather than individual snps the codominant data is based on.


# Data sets

listed as: 'LOCUS_SET' . 'ENCODING' . 'FULL_SUBSAMPLE' . 'EXT'

LOCUS_SETS:
    1. complete (codominant)
    2. complete (dominant alleles)
    3. mapped non-paralogs (codominant)
    4. mapped non-paralogs (dominant alleles)
    5. mapped paralogs (dominant alleles)
    
    and subsamples of each down to match the number of loci in smallest set (mapped paralogs)


Copy data into local dir

In [5]:
base_haplotypes_file = os.path.join('data', 'batch_4', 'all.haplotypes.tsv')
base_individual_file = os.path.join('data', 'batch_4', 'pop_genotypes', 'complete.ped')
complete_loci_file = os.path.join('data', 'batch_4', 'pop_genotypes', 'complete.map')
mapped_loci_file = os.path.join('data', 'batch_4', 'pop_genotypes', 'on_map.map')

In [6]:
import shutil

shutil.copy(base_haplotypes_file, os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.haplotypes.tsv'))
shutil.copy(base_individual_file, os.path.join('results', 'batch_4', 'EIGENSOFT', 'individuals.ped'))
shutil.copy(complete_loci_file,   os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.map'))

shutil.copy(mapped_loci_file , os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.map'))

#### edit the complete.map file, setting chrom=0 to chrom 40
SMARTPCA removes loci with chrom=0

In [7]:
with open(os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.map')) as INFILE:
    with open(os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.map.tmp'), 'w') as OUTFILE:
        for line in INFILE:
            if int(line.strip().split()[0]) == 0:
                OUTFILE.write('40\t{}\t{}\t{}\n'.format(line.strip().split()[1], line.strip().split()[2], line.strip().split()[3]))
            else:
                OUTFILE.write(line)

shutil.copy(os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.map.tmp'), 
            os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.map'))


In [8]:
write_dom_EIGENSTRAT_files(
    haplotypes_in = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.haplotypes.tsv'),
    ind_in        = os.path.join('results', 'batch_4', 'EIGENSOFT', 'individuals.ped'),
    catID_in      = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.map'),
    genotype_out  = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.dom.txt'),
    map_out       = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.dom.map')
)

Inds to keep: 174
catIDs to keep: 13407


13407

#### Set of all mapped loci mapped loci

In [9]:
write_dom_EIGENSTRAT_files(
    haplotypes_in = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.haplotypes.tsv'),
    ind_in        = os.path.join('results', 'batch_4', 'EIGENSOFT', 'individuals.ped'),
    catID_in      = os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.map'),
    genotype_out  = os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.dom.txt'),
    map_out       = os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.dom.map')
)

Inds to keep: 174
catIDs to keep: 7259


7259

# Pick out paralogs

In [10]:
linkage_map = pd.read_csv(os.path.join('linkage_map', 'LEPmap', 'with_paralogs', 'final', 'PS_chum_map_2015.txt'), sep = '\t')
linkage_map.head(3)

Unnamed: 0,contig,resolved_locus,stacks_CatID,stacks_SNP,LEPname,LEP_LG,cM_OLD,paper1_LG,cM
0,c4311,4311,28282,28282_88,4584,1,0,1,0
1,c4311,4311,28282,28282_91,4584,1,0,1,0
2,c56875,56875,39970,39970_17,7377,1,0,1,0


In [11]:
paralogs = pd.read_csv(os.path.join('linkage_map','chum_paralogs.txt'), header = None)
paralogs.columns = ['old_CatID']
paralogs['contig'] = ['c' + str(xx) for xx in paralogs['old_CatID']]
paralogs['paralog'] = True

In [12]:
catID_paralogs = set(pd.merge(linkage_map, paralogs)['stacks_CatID'])

In [13]:
with open(os.path.join('linkage_map','mapped_chum_paralogs.catIDs'), 'w') as OUTFILE:
    for xx in catID_paralogs:
        OUTFILE.write('{}\n'.format(xx))

#### Just the mapped paralogs

In [14]:
num_subsample = write_dom_EIGENSTRAT_files(
    haplotypes_in = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.haplotypes.tsv'),
    ind_in        = os.path.join('results', 'batch_4', 'EIGENSOFT', 'individuals.ped'),
    catID_in      = os.path.join('linkage_map', 'mapped_chum_paralogs.catIDs'),
    genotype_out  = os.path.join('results', 'batch_4', 'EIGENSOFT', 'paralogs.dom.txt'),
    map_out       = os.path.join('results', 'batch_4', 'EIGENSOFT', 'paralogs.dom.map')
)

Inds to keep: 174
catIDs to keep: 1214


# Subsampling

In [15]:
def subsample_map_file(infile, outfile, num_keep = 1000):
    with open(infile) as INFILE:
        num_loci = sum(1 for line in INFILE)
    lines_to_keep = random.sample(range(num_loci), num_keep)
    with open(outfile, 'w') as OUTFILE:
        with open(infile) as INFILE:
            idx = 0
            for line in INFILE:
                if idx in lines_to_keep:
                    OUTFILE.write(line)
                idx += 1

In [16]:
subsample_map_file(infile  = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.map'), 
                   outfile = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.subsample.map'), 
                   num_keep = num_subsample)

In [17]:
subsample_map_file(infile  = os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.map'), 
                   outfile = os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.subsample.map'), 
                   num_keep = num_subsample)

In [18]:
def subsample_DOMINANT_map_file(infile, outfile, num_keep = 1000):
    with open(infile) as INFILE:
        loci = set([int(line.split()[1].split('_')[0]) for line in INFILE])
    loci_to_keep = random.sample(loci, num_keep)
    #print(loci_to_keep)
    with open(outfile, 'w') as OUTFILE:
        with open(infile) as INFILE:
            for line in INFILE:
                current_locus = int(line.split()[1].split('_')[0])
                if current_locus in loci_to_keep:
                    OUTFILE.write(line)

In [19]:
subsample_DOMINANT_map_file(
        infile = os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.dom.map'), 
        outfile =  os.path.join('results', 'batch_4', 'EIGENSOFT', 'complete.dom.subsample.map'), 
        num_keep = num_subsample)

In [20]:
subsample_DOMINANT_map_file(
        infile = os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.dom.map'), 
        outfile =  os.path.join('results', 'batch_4', 'EIGENSOFT', 'on_map.dom.subsample.map'), 
        num_keep = num_subsample)

In [21]:
def subsample_DOMINANT_allele_file(infile, outfile, old_map, new_map):
    with open(old_map) as OLD_MAP:
        with open(new_map) as NEW_MAP:
            OLD_LINES = OLD_MAP.readlines()
            NEW_LINES = NEW_MAP.readlines()
            lines_to_keep = ([OLD_LINES.index(xx) for xx in NEW_LINES])
    with open(infile) as INFILE:
        with open(outfile, 'w') as OUTFILE:
            line_index = 0
            for line in INFILE:
                if line_index in lines_to_keep:
                    OUTFILE.write(line)
                line_index += 1
        
            

In [22]:
subsample_DOMINANT_allele_file('/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.txt',
                               '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.txt', 
                               '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.map',
                              '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.map')

In [23]:
subsample_DOMINANT_allele_file('/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.txt',
                               '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.txt', 
                               '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.map',
                              '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.map')

### subsample the genotype file using PLINK

In [24]:
univseral_plink_commands = "--allow-extra-chr --allow-no-sex --write-snplist --autosome-num 50"

#### subsample complete

In [25]:
!plink --file results/batch_4/EIGENSOFT/complete {univseral_plink_commands} \
    --thin-count {num_subsample} --recode --out results/batch_4/EIGENSOFT/complete.codom.subsample

PLINK v1.90b3q 64-bit (29 May 2015)        https://www.cog-genomics.org/plink2
(C) 2005-2015 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to results/batch_4/EIGENSOFT/complete.codom.subsample.log.
Options in effect:
  --allow-extra-chr
  --allow-no-sex
  --autosome-num 50
  --file results/batch_4/EIGENSOFT/complete
  --out results/batch_4/EIGENSOFT/complete.codom.subsample
  --recode
  --thin-count 1214
  --write-snplist

32127 MB RAM detected; reserving 16063 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (13407 variants, 174 samples).
--file: results/batch_4/EIGENSOFT/complete.codom.subsample-temporary.bed +
results/batch_4/EIGENSOFT/complete.codom.subsample-temporary.bim +
results/batch_4/EIGENSOFT/complete.codom.subsample-temporary.fam written.
13407 variants loaded from .bim file.
174 samples (0 males, 0 females, 174 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
results/batch_4/EIGENS

#### subsample on_map

In [26]:
!plink --file results/batch_4/EIGENSOFT/complete --not-chr 40 \
    --exclude results/batch_4/EIGENSOFT/paralog_SNPs.txt {univseral_plink_commands} \
    --thin-count {num_subsample} --recode --out results/batch_4/EIGENSOFT/on_map.codom.subsample

PLINK v1.90b3q 64-bit (29 May 2015)        https://www.cog-genomics.org/plink2
(C) 2005-2015 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to results/batch_4/EIGENSOFT/on_map.codom.subsample.log.
Options in effect:
  --allow-extra-chr
  --allow-no-sex
  --autosome-num 50
  --exclude results/batch_4/EIGENSOFT/paralog_SNPs.txt
  --file results/batch_4/EIGENSOFT/complete
  --not-chr 40
  --out results/batch_4/EIGENSOFT/on_map.codom.subsample
  --recode
  --thin-count 1214
  --write-snplist

32127 MB RAM detected; reserving 16063 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (7259 variants, 174 samples).
--file: results/batch_4/EIGENSOFT/on_map.codom.subsample-temporary.bed +
results/batch_4/EIGENSOFT/on_map.codom.subsample-temporary.bim +
results/batch_4/EIGENSOFT/on_map.codom.subsample-temporary.fam written.
7259 variants loaded from .bim file.
174 samples (0 males, 0 females, 174 ambiguous) loaded from

#### TODO: I want to set up a way to better manage SmartPCA runs 
I want to be able to:
   * downsample
   * subset loci by paralog/non

### Steps to run SmartPCA

   * create / subsample files

   * create input files
       * genotype file
       * snp file
       * individual file
   * create parfiles for each analysis
   * call SmartPca

Subsample the map files, retianing XX loci at random

In [27]:
#cd {os.path.join('data','batch_4','pop_genotypes')}

## Run Smartpca
Smartpca, within EIGENSOFT, is used for PCA analysis. Smartpca uses a parameter file to specify program options.

#### Notes

   * You may need to check your in input \*.map files, EIGENSOFT is excluding loci on chr=0.  If you want to retain unplaced loci, set chr (column 1) to 1.

   * smartpca also complains about the  6th column of the .ped file (phenotype) when the value is '-9' (ie. missing).  We can set this to 1 or 0 to remove this error, specifing cases or controls. see [FAQ](http://www.hsph.harvard.edu/alkes-price/eigensoft-frequently-asked-questions/)
   * Patterson (2006) notes that the normalization ('usenorm') should **not** be applied to dominance-coded data.


In [28]:
SMARTPCA_path = '/home/ipseg/Programs/EIGENSOFT/EIG6.0.1/bin/smartpca'

#### write parfile(s)

In [29]:
def function_to_write_parfiles(base_dir):
    ## TODO, this is a placeholder
    pass

In [30]:
complete_codom_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.parfile'

with open(complete_codom_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.ped
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/individuals.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.ind_outliers

fastmode:      NO
usenorm:       YES
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

In [31]:
complete_dom_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.parfile'

with open(complete_dom_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.txt
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/individuals.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.ind_outliers

fastmode:      NO
usenorm:       NO
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

In [32]:
on_map_codom_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.parfile'

with open(on_map_codom_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.ped
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.ind_outliers

fastmode:      NO
usenorm:       YES
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 0 
numoutlierevec: 0 
lsqproject: NO \
    ''')

In [33]:
on_map_dom_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.parfile'

with open(on_map_dom_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.txt
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.ind_outliers

fastmode:      NO
usenorm:       NO
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

In [34]:
paralogs_dom_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.parfile'

with open(paralogs_dom_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.txt
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.map
indivname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.ind_outliers

fastmode:      NO
usenorm:       NO
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

## subsampled data

In [35]:
complete_codom_subsample_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.parfile'

with open(complete_codom_subsample_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.ped
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.ind_outliers

fastmode:      NO
usenorm:       YES
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

In [36]:
complete_dom_subsample_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.parfile'

with open(complete_dom_subsample_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.txt
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.ind_outliers

fastmode:      NO
usenorm:       NO
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

In [37]:
on_map_codom_subsample_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.parfile'

with open(on_map_codom_subsample_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.ped
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.ind_outliers

fastmode:      NO
usenorm:       YES
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

In [38]:
on_map_dom_subsample_PARFILE_path = '/home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.parfile'

with open(on_map_dom_subsample_PARFILE_path, 'w') as OUTFILE:
    OUTFILE.write('''\
genotypename:  /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.txt
snpname:       /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.map
indivname:     /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.ped

evecoutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.evec
evaloutname:    /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.eval
grmoutname:      /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.rel
snpweightoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.snpweights
deletesnpoutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.snpremoved
outlieroutname: /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.ind_outliers

fastmode:      NO
usenorm:       NO
missingmode:   NO
outliermode:   2 
numchrom:      50
numoutlieriter: 5 
numoutlierevec: 10 
lsqproject: NO \
    ''')

#### call SMARTPCA

In [39]:
!{SMARTPCA_path} -p {complete_codom_PARFILE_path} > \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.logfile

In [40]:
!{SMARTPCA_path} -p {complete_dom_PARFILE_path} > \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.logfile

In [41]:
!{SMARTPCA_path} -p {on_map_codom_PARFILE_path}> \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.logfile

In [42]:
!{SMARTPCA_path} -p {on_map_dom_PARFILE_path} > \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.logfile

In [43]:
!{SMARTPCA_path} -p {paralogs_dom_PARFILE_path}> \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/paralogs.dom.logfile

#### on subsample

In [44]:
!{SMARTPCA_path} -p {complete_codom_subsample_PARFILE_path}> \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.codom.subsample.logfile

In [45]:
!{SMARTPCA_path} -p {on_map_codom_subsample_PARFILE_path}> \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.codom.subsample.logfile

In [46]:
!{SMARTPCA_path} -p {complete_dom_subsample_PARFILE_path}> \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/complete.dom.subsample.logfile

In [47]:
!{SMARTPCA_path} -p {on_map_dom_subsample_PARFILE_path}> \
 /home/ipseg/Desktop/waples/chum_populations/results/batch_4/EIGENSOFT/on_map.dom.subsample.logfile