# Filtering .vcf file from ipyrad

This notebook details the secondary filtering I apply to a .vcf file output directly by ipyrad. I use this notebook to:  
1) filter out individuals with more than a set threshold of missing data,  
2) filter out loci missing in more than a set percentage of samples (note: ipyrad does this on a locus basis, but with indels and Ns there can still be sites that are missing in many samples),  
3) filter for a minimum minor allele frequency,  
4) filter out loci with excess heterozygosity within populations based on Hardy-Weinberg equilibrium \*,  
5) filter out loci significantly out of H-W equilibrium within populations,  
6) filter for only biallelic SNPs,  
7) use Python code to select 1 SNP per GBS locus for analyses like PCA that require an "unlinked" dataset.


\*Note: I set the *max_shared_Hs_locus* parameter in ipyrad to 1.0 so it does not filter for excess heterozygotes across samples. 

In [1]:
%%sh
date "+%D"

07/25/18


In [2]:
pwd

u'/home/ksilliman/Projects/Phylo_Ostrea/Analysis'

### Filtering out individuals

Keep only polymorphic biallelic SNPs at sites genotyped in at least 60% of individuals (--max-missing 0.60, i.e. sites with more than 40% missing data are removed). BC4_13_C3 is removed for now because its assembly needs to be redone.

In [3]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps
vcftools --vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf --remove-indv BC4_13_C3 --recode \
--recode-INFO-all --max-missing 0.60 --min-alleles 2 --max-alleles 2 --out $suffix-m60


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.6
	--out Making_Files/OL-final-c85-t88-Breps-m60
	--recode
	--remove-indv BC4_13_C3

Excluding individuals in 'exclude' list
After filtering, kept 136 out of 137 Individuals
Outputting VCF file...
After filtering, kept 72831 out of a possible 114545 Sites
Run Time = 12.00 seconds


Make a file with the proportion of missing data for each individual.

In [4]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps-m60
vcftools --vcf $suffix.recode.vcf --missing-indv --out $suffix


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/OL-final-c85-t88-Breps-m60.recode.vcf
	--missing-indv
	--out Making_Files/OL-final-c85-t88-Breps-m60

After filtering, kept 136 out of 136 Individuals
Outputting Individual Missingness
After filtering, kept 72831 out of a possible 72831 Sites
Run Time = 1.00 seconds


In [5]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps-m60
head $suffix.imiss

INDV	N_DATA	N_GENOTYPES_FILTERED	N_MISS	F_MISS
BC1_10_C6	72831	0	4173	0.057297
BC1_11_C2	72831	0	48234	0.662273
BC1_12_C4	72831	0	42160	0.578874
BC1_1_C2	72831	0	45598	0.62608
BC1_20_C6	72831	0	6472	0.0888633
BC1_22_C7	72831	0	2267	0.0311269
BC1_4_C3	72831	0	6824	0.0936964
BC1_7_C5	72831	0	23808	0.326894
BC1_8_C4	72831	0	15100	0.207329


Plot the percent missingness across individuals. Need to redo without plotly.

In [6]:
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
import plotly.graph_objs as go
py.sign_in('ksil91', 'ycvvzZQxVMU8Sg9wVQBH')
import pandas

In [9]:
imiss = np.genfromtxt('Making_Files/OL-final-c85-t88-Breps-m60.imiss', names=True,dtype=None)
data = [
    go.Histogram(
        x=imiss[["F_MISS"]],autobinx = False,
        xbins=dict(
            start=0,
            end=1,
            size=0.05
        )
    )
]
layout = go.Layout(
    title='Proportion missing data, ',bargap=0.1)
fig = go.Figure(data=data,layout=layout)
py.iplot(fig)

PlotlyRequestError: Account limit reached: Your account is limited to creating 25 charts. To continue, you can override or delete existing charts or you can upgrade your account at: https://plot.ly/products/cloud
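
Since the plotly call above hit the account limit, one way to redo the plot without plotly is a plain matplotlib histogram. This is a minimal sketch, assuming the .imiss file is tab-delimited with an F_MISS column as in the head output above. The function names and the output file name are hypothetical and not part of the original notebook.

```python
import csv

def read_f_miss(imiss_path):
    """Return the F_MISS column of a vcftools .imiss file as a list of floats."""
    with open(imiss_path) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [float(row["F_MISS"]) for row in reader]

def plot_missingness(f_miss, out_png, bin_width=0.05):
    """Save a histogram of per-individual missingness with fixed-width bins on [0, 1]."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; drop this line to display inline
    import matplotlib.pyplot as plt

    n_bins = int(round(1.0 / bin_width))
    edges = [i * bin_width for i in range(n_bins + 1)]
    fig, ax = plt.subplots()
    ax.hist(f_miss, bins=edges, rwidth=0.9)
    ax.set_xlabel("Proportion missing data (F_MISS)")
    ax.set_ylabel("Number of individuals")
    fig.savefig(out_png)
    plt.close(fig)
```

With the files from this notebook, the call would look like `plot_missingness(read_f_miss("Making_Files/OL-final-c85-t88-Breps-m60.imiss"), "imiss_hist.png")`, where the PNG name is a placeholder.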

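Write a list of individuals missing data at more than 50% of sites (these will be excluded), and count the individuals missing data at fewer than 50% of sites (these will be kept).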

In [35]:
imissDF = pandas.DataFrame(imiss)
imissDF[imissDF.F_MISS > 0.50].INDV.to_csv("Making_Files/OL-c85-t88-Breps_imiss50.txt",sep=" ",index=False)
len(imissDF[imissDF.F_MISS < 0.50].INDV.values)

105

In [36]:
#Individuals to be excluded
%cat Making_Files/OL-c85-t88-Breps_imiss50.txt

BC1_11_C2
BC1_12_C4
BC1_1_C2
BC2_3_C7
BC2_6_C2
BC3_1_C2
BC3_9_C2
BC4_6_C2
BC4_7_C2
BC4_9_C2
CA1_5_C5
CA1_9_C7
CA2_10_C3
CA3_4_C1
CA4_16_C5
CA5_8_C1
CA6_13_C1
CA6_2_C1
CA7_5_C1
OR1_11_C1
OR1_5_C4
WA10_15_C5
WA10_2_C2
WA10_8_C2
WA11_13_C5
WA11_20_C5
WA13_2_C2
WA1_10_C3
WA1_11_C6
WA1_3_C7
WA9_8_C3


Use vcftools to remove individuals with greater than 50% missingness. Also filter for polymorphic loci and loci found in at least 80% of individuals. This is the full SNP dataset, before filtering for minor allele frequency and Hardy-Weinberg equilibrium.

In [37]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps
vcftools --vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf --remove-indv BC4_13_C3 \
--remove Making_Files/OL-c85-t88-Breps_imiss50.txt --recode --recode-INFO-all \
--min-alleles 2 --max-alleles 2 --max-missing 0.80 \
--out ${suffix}-m80x50


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf
	--remove Making_Files/OL-c85-t88-Breps_imiss50.txt
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.8
	--out Making_Files/OL-final-c85-t88-Breps-m80x50
	--recode
	--remove-indv BC4_13_C3

Excluding individuals in 'exclude' list
After filtering, kept 105 out of 137 Individuals
Outputting VCF file...
After filtering, kept 48844 out of a possible 114545 Sites
Run Time = 6.00 seconds


Write files with the sample name and either the population (.pop) or all strata information (.strata). These are used for the heterozygosity filtering downstream and to convert the .vcf file to a .str file in PGDSpider. The code relies on each sample name starting with its population code, followed by an underscore (for example, BC1_10_C6 belongs to BC1). Because it reads the .imiss file created earlier, the output also lists some samples that have since been filtered out; this is not a problem for downstream analyses.

In [47]:
IN = open("Making_Files/OL-final-c85-t88-Breps-m60.imiss","r")
OUT_strata = open("Making_Files/OL-c85-t88-Breps.strata","w")
OUT_pop = open("Making_Files/OL-c85-t88-Breps.pop","w")
loc_dict = {'BC1':['Victoria', 'Victoria', 'Puget+Victoria', '48.435667', '-123.377909'],
            'BC2':['Klaskino', 'Klaskino', 'NWBC', '50.298667', '-127.723633'],
            'BC3':['Barkeley_Sound', 'Barkeley_Sound', 'NWBC', '49.01585', '-125.314167'],
            'BC4': ['Ladysmith', 'Ladysmith', 'Ladysmith', '49.011383' ,'-123.8357'],
            'WA12': ['Discovery_Bay', 'Discovery_Bay', 'Puget+Victoria', '47.9978', '-122.8824'],
            'WA11': ['Liberty_Bay', 'Liberty_Bay', 'Puget+Victoria', '47.7375', '-122.6507'],
            'WA13': ['North_Bay', 'North_Bay', 'Puget+Victoria', '47.3925', '-122.8138'],
            'WA10': ['Triton_Cove', 'Triton_Cove', 'Puget+Victoria', '47.6131', '-122.982'],
            'WA1': ['Willapa', 'North_Willapa', 'Willapa', '46.624772', '-123.9887916'],
            'WA9': ['Willapa', 'South_Willapa', 'Willapa', '46.44', '-124.004'],
            'OR3': ['Netarts', 'Netarts', 'Oregon', '45.3911556', '-123.9559028'],
            'OR2': ['Yaquina', 'Yaquina', 'Oregon', '44.579539', '-123.995749'],
            'OR1': ['Coos', 'Coos', 'Willapa', '43.3559861', '-124.1931639'],
            'CA6': ['Humboldt', 'Humboldt', 'NoCal', '40.8557972', '-124.0974611'],
            'CA4': ['Tomales', 'Tomales', 'NoCal', '38.117549', '-122.874497'],
            'CA2': ['San_Francisco', 'SF_PointOrient', 'NoCal', '37.955067', '-122.421800'],
            'CA3': ['San_Francisco', 'SF_Candlestick', 'NoCal', '37.708665', '-122.377607'],
            'CA5': ['Elkhorn_Slough', 'Elkhorn_Slough', 'NoCal', '36.8398194', '-121.7427806'],
            'CA7': ['Mugu_Lagoon', 'Mugu_Lagoon', 'SoCal', '34.101914', '-119.10434'],
            'CA1': ['San_Diego', 'San_Diego', 'SoCal', '32.602500', '-117.118889']}
next(IN)  # skip the .imiss header line
OUT_strata.write("INDIVIDUALS\tSTRATA\tLOCATION\tREGION\tLATITUDE\tLONGITUDE\tLIBRARY\n")
for line in IN:
    name = line.split()[0]
    pop = name.split("_")[0]
    library = name.split("_")[2]
    OUT_strata.write(name+"\t"+'\t'.join(map(str,loc_dict[pop]))+"\t"+library+"\n")
    OUT_pop.write(name+"\t"+loc_dict[pop][0]+"\n")
    
IN.close()
OUT_strata.close()
OUT_pop.close()

In [39]:
%%sh
head -n 15 Making_Files/OL-c85-t88-Breps.strata

INDIVIDUALS	STRATA	LOCATION	REGION	NS	LATITUDE	LONGITUDE	LIBRARY
BC1_10_C6	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C6
BC1_11_C2	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C2
BC1_12_C4	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C4
BC1_1_C2	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C2
BC1_20_C6	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C6
BC1_22_C7	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C7
BC1_4_C3	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C3
BC1_7_C5	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C5
BC1_8_C4	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C4
BC1_9_C5	Victoria	Victoria	Puget+	North	48.435667	-123.377909	C5
BC2_10_C5	Klaskino	Klaskino	NW_BC	North	50.298667	-127.723633	C5
BC2_11_C5	Klaskino	Klaskino	NW_BC	North	50.298667	-127.723633	C5
BC2_12_C6	Klaskino	Klaskino	NW_BC	North	50.298667	-127.723633	C6
BC2_13_C4	Klaskino	Klaskino	NW_BC	North	50.298667	-127.723633	C4


### Filtering loci by departures from Hardy-Weinberg

Here I filter out loci with excess heterozygosity in at least 2 populations, based on Hardy-Weinberg equilibrium and a p-value cutoff of 0.05. The script takes a .vcf file and the .pop file just created as input. It is a slightly modified version of a script from [Jon Puritz's GitHub](https://github.com/jpuritz/dDocent/blob/master/scripts/filter_hwe_by_pop.pl), written by Chris Hollenbeck. My modified script is on my GitHub.

In [40]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps-m80x50
mkdir Making_Files/hwe_m80x50
#Filtering out loci that depart HWE in at least 2 populations with a p-value cutoff of 0.05
../Methods/Scripts/filter_hwe_by_pop.pl -v $suffix.recode.vcf \
-p Making_Files/OL-c85-t88-Breps.pop -h 0.05 -c 0.16 -o $suffix-hwPbi
mv *.hwe Making_Files/hwe_m80x50
rm *.inds

Processing population: Barkeley_Sound (7 inds)
Processing population: Coos (7 inds)
Processing population: Discovery_Bay (4 inds)
Processing population: Elkhorn_Slough (6 inds)
Processing population: Humboldt (8 inds)
Processing population: Klaskino (10 inds)
Processing population: Ladysmith (9 inds)
Processing population: Liberty_Bay (8 inds)
Processing population: Mugu_Lagoon (10 inds)
Processing population: Netarts (7 inds)
Processing population: North_Bay (6 inds)
Processing population: San_Diego (8 inds)
Processing population: San_Francisco (6 inds)
Processing population: Tomales (5 inds)
Processing population: Triton_Cove (10 inds)
Processing population: Victoria (10 inds)
Processing population: Willapa (9 inds)
Processing population: Yaquina (6 inds)
Outputting results of HWE test for filtered loci to 'filtered.hwe'
Kept 48796 of a possible 48844 loci (filtered 48 loci)
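
For intuition about what "excess heterozygosity" means for a single biallelic SNP in one population, the observed heterozygote count can be compared with the count expected under Hardy-Weinberg proportions (2pq times the number of genotyped individuals). The sketch below is an illustration only: it is not the test that filter_hwe_by_pop.pl performs, it gives no p-value, and the function name and genotype counts are made up for the example.

```python
def expected_heterozygotes(n_hom_ref, n_het, n_hom_alt):
    """Expected heterozygote count under Hardy-Weinberg proportions."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    # Frequency of the reference allele from the genotype counts.
    p = (2.0 * n_hom_ref + n_het) / (2.0 * n)
    q = 1.0 - p
    return 2.0 * p * q * n

# Hypothetical population of 10 individuals, 9 of them heterozygous.
observed = 9
expected = expected_heterozygotes(1, 9, 0)
print("observed hets: %d, expected under HWE: %.2f" % (observed, expected))
```

In this made-up case p = 0.55, so about 4.95 heterozygotes are expected while 9 are observed. A large surplus like this is the kind of pattern the filter is meant to catch.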


The script only filters on a site-by-site basis. In order to throw out any loci that had a SNP with excess heterozygosity (as these may be paralogs), I make a file with the locus ids to then submit to vcftools.

In [41]:
#Make files of bad loci (that have at least one site with excess heterozygotes) to copy/paste in vcftools. 
IN = open('Making_Files/hwe_m80x50/exclude.hwe', "r")
OUT = open('Making_Files/hwe_m80x50/badchrom.txt', "w")
exset = set()
for line in IN:
    chrom = line.split()[0]
    if chrom not in exset:
        exset.add(chrom)
        OUT.write(" --not-chr "+str(chrom))
OUT.close()
IN.close()
print("m80x50: "+str(len(exset)))

m80x50: 43


Vcftools won't take a file of locus names to remove (which is annoying), so I copy and paste the locus names with a --not-chr call into vcftools.

In [42]:
%cat Making_Files/hwe_m80x50/badchrom.txt

 --not-chr locus_207453 --not-chr locus_276810 --not-chr locus_42160 --not-chr locus_29279 --not-chr locus_127570 --not-chr locus_211763 --not-chr locus_40333 --not-chr locus_423422 --not-chr locus_52606 --not-chr locus_380560 --not-chr locus_172864 --not-chr locus_153744 --not-chr locus_129120 --not-chr locus_380572 --not-chr locus_401349 --not-chr locus_125726 --not-chr locus_218417 --not-chr locus_357318 --not-chr locus_220030 --not-chr locus_151068 --not-chr locus_253330 --not-chr locus_357576 --not-chr locus_18461 --not-chr locus_255531 --not-chr locus_101129 --not-chr locus_313478 --not-chr locus_355504 --not-chr locus_347558 --not-chr locus_346121 --not-chr locus_379373 --not-chr locus_131141 --not-chr locus_309065 --not-chr locus_55452 --not-chr locus_140939 --not-chr locus_92709 --not-chr locus_204332 --not-chr locus_357682 --not-chr locus_206197 --not-chr locus_217492 --not-chr locus_280192 --not-chr locus_97057 --not-chr locus_423209 --not-chr locus_48250
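
The copy-and-paste step could be avoided by building the vcftools command in Python from the same set of locus names. This is a rough sketch, assuming vcftools is on the PATH. The helper name build_not_chr_command and the file names in the example call are hypothetical, and only flags that already appear in this notebook are used.

```python
def build_not_chr_command(vcf_path, out_prefix, bad_loci, extra_args=()):
    """Return a vcftools argument list that drops every locus in bad_loci."""
    cmd = ["vcftools", "--vcf", vcf_path, "--recode", "--recode-INFO-all"]
    # One --not-chr flag per unique locus name, in sorted order.
    for locus in sorted(set(bad_loci)):
        cmd.extend(["--not-chr", locus])
    cmd.extend(extra_args)
    cmd.extend(["--out", out_prefix])
    return cmd

cmd = build_not_chr_command(
    "in.recode.vcf",   # placeholder input path
    "out-filt",        # placeholder output prefix
    ["locus_42160", "locus_29279", "locus_42160"],
    extra_args=["--min-alleles", "2", "--max-alleles", "2"],
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.check_call(cmd)`. Passing the arguments as a list avoids shell quoting problems with very long command lines.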

In [43]:
%%sh
#-filt are all SNPs, not filtering for minor allele frequency
suffix=Making_Files/OL-final-c85-t88-Breps-m80x50
vcftools --vcf $suffix-hwPbi.recode.vcf --recode --recode-INFO-all --not-chr locus_207453 --not-chr locus_276810 --not-chr locus_42160 --not-chr locus_29279 --not-chr locus_127570 --not-chr locus_211763 --not-chr locus_40333 --not-chr locus_423422 --not-chr locus_52606 --not-chr locus_380560 --not-chr locus_172864 --not-chr locus_153744 --not-chr locus_129120 --not-chr locus_380572 --not-chr locus_401349 --not-chr locus_125726 --not-chr locus_218417 --not-chr locus_357318 --not-chr locus_220030 --not-chr locus_151068 --not-chr locus_253330 --not-chr locus_357576 --not-chr locus_18461 --not-chr locus_255531 --not-chr locus_101129 --not-chr locus_313478 --not-chr locus_355504 --not-chr locus_347558 --not-chr locus_346121 --not-chr locus_379373 --not-chr locus_131141 --not-chr locus_309065 --not-chr locus_55452 --not-chr locus_140939 --not-chr locus_92709 --not-chr locus_204332 --not-chr locus_357682 --not-chr locus_206197 --not-chr locus_217492 --not-chr locus_280192 --not-chr locus_97057 --not-chr locus_423209 --not-chr locus_48250 \
--max-alleles 2 --min-alleles 2 --out $suffix-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/OL-final-c85-t88-Breps-m80x50-hwPbi.recode.vcf
	--not-chr locus_101129
	--not-chr locus_125726
	--not-chr locus_127570
	--not-chr locus_129120
	--not-chr locus_131141
	--not-chr locus_140939
	--not-chr locus_151068
	--not-chr locus_153744
	--not-chr locus_172864
	--not-chr locus_18461
	--not-chr locus_204332
	--not-chr locus_206197
	--not-chr locus_207453
	--not-chr locus_211763
	--not-chr locus_217492
	--not-chr locus_218417
	--not-chr locus_220030
	--not-chr locus_253330
	--not-chr locus_255531
	--not-chr locus_276810
	--not-chr locus_280192
	--not-chr locus_29279
	--not-chr locus_309065
	--not-chr locus_313478
	--not-chr locus_346121
	--not-chr locus_347558
	--not-chr locus_355504
	--not-chr locus_357318
	--not-chr locus_357576
	--not-chr locus_357682
	--not-chr locus_379373
	--not-chr locus_380560
	--not-chr locus_380572
	--not-chr locus_401349
	--not-chr locus_40333
	--not-

In [44]:
%%sh
#-maf025 is filter for minor allele frequency of 2.5%
suffix=Making_Files/OL-final-c85-t88-Breps-m80x50
vcftools --vcf $suffix-hwPbi.recode.vcf --recode --recode-INFO-all --not-chr locus_207453 --not-chr locus_276810 --not-chr locus_42160 --not-chr locus_29279 --not-chr locus_127570 --not-chr locus_211763 --not-chr locus_40333 --not-chr locus_423422 --not-chr locus_52606 --not-chr locus_380560 --not-chr locus_172864 --not-chr locus_153744 --not-chr locus_129120 --not-chr locus_380572 --not-chr locus_401349 --not-chr locus_125726 --not-chr locus_218417 --not-chr locus_357318 --not-chr locus_220030 --not-chr locus_151068 --not-chr locus_253330 --not-chr locus_357576 --not-chr locus_18461 --not-chr locus_255531 --not-chr locus_101129 --not-chr locus_313478 --not-chr locus_355504 --not-chr locus_347558 --not-chr locus_346121 --not-chr locus_379373 --not-chr locus_131141 --not-chr locus_309065 --not-chr locus_55452 --not-chr locus_140939 --not-chr locus_92709 --not-chr locus_204332 --not-chr locus_357682 --not-chr locus_206197 --not-chr locus_217492 --not-chr locus_280192 --not-chr locus_97057 --not-chr locus_423209 --not-chr locus_48250 \
--max-alleles 2 --min-alleles 2 --maf 0.025 --out $suffix-maf025


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/OL-final-c85-t88-Breps-m80x50-hwPbi.recode.vcf
	--not-chr locus_101129
	--not-chr locus_125726
	--not-chr locus_127570
	--not-chr locus_129120
	--not-chr locus_131141
	--not-chr locus_140939
	--not-chr locus_151068
	--not-chr locus_153744
	--not-chr locus_172864
	--not-chr locus_18461
	--not-chr locus_204332
	--not-chr locus_206197
	--not-chr locus_207453
	--not-chr locus_211763
	--not-chr locus_217492
	--not-chr locus_218417
	--not-chr locus_220030
	--not-chr locus_253330
	--not-chr locus_255531
	--not-chr locus_276810
	--not-chr locus_280192
	--not-chr locus_29279
	--not-chr locus_309065
	--not-chr locus_313478
	--not-chr locus_346121
	--not-chr locus_347558
	--not-chr locus_355504
	--not-chr locus_357318
	--not-chr locus_357576
	--not-chr locus_357682
	--not-chr locus_379373
	--not-chr locus_380560
	--not-chr locus_380572
	--not-chr locus_401349
	--not-chr locus_40333
	--not-

### Subset one SNP per GBS locus

In [25]:
## Code to subset one SNP per GBS locus from a VCF file. Chooses the SNP
## with the highest sample coverage. If there is a tie, chooses the 1st SNP in the locus (may change to random).
## May be specific to the VCF format output by ipyrad.
## This is also available in script form on GitHub as subsetSNPs.py

def subsetSNPs(inputfile,outputfile):
    import linecache
    locidict = {}
    lineNum = []
    IN = open(inputfile, "r")
    OUT = open(outputfile, "w")

    n = 1  # 1-based line number in the input file (header lines included)
    for line in IN:
        if "#" not in line:
            linelist = line.split()
            loci = linelist[0]
            #Index 7 (the 8th column) is the INFO column of the VCF file; NS is assumed to be its first field
            NS = float(linelist[7].split(";")[0].split("=")[1])
            if loci not in locidict:
                locidict[loci] = [NS,n]
            else:
                if locidict[loci][0] < NS:
                    locidict[loci] = [NS,n]
        else:
            OUT.write(line)
        n += 1
    IN.close()
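    # n is now the number of lines in the file plus one, header lines included, so
    # "Total SNPS" overstates the number of SNP records by the header line count plus one.
    print("Total SNPS: "+str(n)+"\nUnlinked SNPs: "+str(len(locidict.keys())))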

    for locus in sorted(locidict.keys()):
        line = linecache.getline(inputfile, locidict[locus][1])
        OUT.write(line)
    OUT.close()
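
To make the selection rule concrete, here is a small in-memory sketch of the same idea on a toy VCF. It is not the subsetSNPs function above: the helper name best_snp_per_locus and the toy records are invented for illustration, it looks up NS by key in the INFO column instead of assuming it is the first field, and it returns records in file order, not sorted by locus name.

```python
def best_snp_per_locus(vcf_lines):
    """Keep, per CHROM value, the first record with the highest NS in INFO."""
    header, best, order = [], {}, []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        locus = fields[0]
        # Parse INFO ("NS=90;DP=300") into a dict so NS can be found by key.
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        ns = float(info["NS"])
        if locus not in best:
            order.append(locus)
            best[locus] = (ns, line)
        elif ns > best[locus][0]:  # strict ">" keeps the first SNP on ties
            best[locus] = (ns, line)
    return header + [best[locus][1] for locus in order]

# Toy records (made up): two loci with two SNPs each.
toy = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "locus_1\t5\t.\tA\tG\t13\tPASS\tNS=90;DP=300",
    "locus_1\t9\t.\tC\tT\t13\tPASS\tNS=100;DP=320",
    "locus_2\t3\t.\tG\tA\t13\tPASS\tNS=80;DP=250",
    "locus_2\t7\t.\tT\tC\t13\tPASS\tNS=80;DP=260",
]
kept = best_snp_per_locus(toy)
print("\n".join(kept))
```

For locus_1 the second SNP wins because more samples cover it (NS=100 versus 90). For locus_2 the two SNPs tie, so the first one is kept.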


In [45]:
#No maf filtering
infile = "Making_Files/OL-final-c85-t88-Breps-m80x50-filt.recode.vcf"
outfile = "Inputs/OL-c85t8-m80x50-u.vcf"
subsetSNPs(infile,outfile)
#Maf of 2.5%
infile = "Making_Files/OL-final-c85-t88-Breps-m80x50-maf025.recode.vcf"
outfile = "Inputs/OL-c85t8-m80x50-maf025-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 48567
Unlinked SNPs: 11192
Total SNPS: 17027
Unlinked SNPs: 7414


## Making an outlier-only .vcf
Once outlier loci have been identified, I use VCFtools to split the dataset: one file with the outlier loci removed (-neutU) and one with only the outlier SNPs (-outU).
### Union

In [57]:
IN = open('Outlier/m80x50maf025filt-pcaQ_OF_BS-union.snp', "r")
INL = open('Outlier/m80x50maf025filt-pcaQ_OF_BS-union.txt',"r")
OUT = open('Making_Files/m80x50maf025filt-pcaQ_OF_BS-union.badchrom', "w")
OUTg = open('Making_Files/m80x50maf025filt-pcaQ_OF_BS-union.goodchrom', "w")
exset = set()
for line in INL:
    chrom = line.strip()
    if chrom not in exset:
        exset.add(chrom)
        OUT.write(" --not-chr locus_"+str(chrom))
print(len(exset))
x = 0
for line in IN:
    chrom = line.strip().split("_")[1]
    snp = line.strip().split("_")[3]
    OUTg.write("locus_"+chrom+"\t"+snp+"\n")
    x += 1
print(x)
OUT.close()
OUTg.close()
IN.close()
INL.close()

576
854
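
The .goodchrom file written above is what --positions consumes later on: one SNP per line, with the chromosome (here the locus name) and the position separated by a tab. The sketch below shows the conversion on a single made-up SNP name. The locus_<id>_pos_<n> layout is only an assumption inferred from the split("_") indices in the cell above, since the .snp file itself is not shown, and the function name is hypothetical.

```python
def snp_name_to_position_line(snp_name):
    """Turn a name with four '_'-separated parts into 'locus_<id><TAB><position>'."""
    parts = snp_name.strip().split("_")
    # parts[1] is the locus id and parts[3] the SNP position, as in the cell above.
    return "locus_" + parts[1] + "\t" + parts[3]

# Made-up SNP name following the assumed layout.
example = snp_name_to_position_line("locus_100568_pos_34\n")
print(example)
```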


In [None]:
%cat Making_Files/m80x50maf025filt-pcaQ_OF_BS-union.badchrom

In [None]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps-m80x50-maf025
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --not-chr locus_100568 --not-chr locus_100957 --not-chr locus_128049 --not-chr locus_202687 --not-chr locus_214762 --not-chr locus_217068 --not-chr locus_217430 --not-chr locus_22471 --not-chr locus_23130 --not-chr locus_28586 --not-chr locus_29312 --not-chr locus_308418 --not-chr locus_34532 --not-chr locus_346887 --not-chr locus_356067 --not-chr locus_35810 --not-chr locus_370301 --not-chr locus_38660 --not-chr locus_421469 --not-chr locus_100043 --not-chr locus_100364 --not-chr locus_100411 --not-chr locus_100504 --not-chr locus_100808 --not-chr locus_100960 --not-chr locus_10098 --not-chr locus_101215 --not-chr locus_102571 --not-chr locus_108914 --not-chr locus_109152 --not-chr locus_111738 --not-chr locus_112451 --not-chr locus_113470 --not-chr locus_114599 --not-chr locus_116394 --not-chr locus_117964 --not-chr locus_11853 --not-chr locus_119053 --not-chr locus_119579 --not-chr locus_123136 --not-chr locus_124098 --not-chr locus_124135 --not-chr locus_124358 --not-chr locus_125648 --not-chr locus_126964 --not-chr locus_127011 --not-chr locus_127647 --not-chr locus_127763 --not-chr locus_128645 --not-chr locus_128722 --not-chr locus_128892 --not-chr locus_129172 --not-chr locus_129561 --not-chr locus_130727 --not-chr locus_130733 --not-chr locus_135065 --not-chr locus_135149 --not-chr locus_136543 --not-chr locus_138836 --not-chr locus_139083 --not-chr locus_139609 --not-chr locus_140076 --not-chr locus_142035 --not-chr locus_142044 --not-chr locus_142778 --not-chr locus_145002 --not-chr locus_147430 --not-chr locus_148130 --not-chr locus_148498 --not-chr locus_148663 --not-chr locus_15031 --not-chr locus_151776 --not-chr locus_153087 --not-chr locus_155580 --not-chr locus_155720 --not-chr locus_156010 --not-chr locus_157034 --not-chr locus_157162 --not-chr locus_157193 --not-chr locus_157620 --not-chr locus_158201 --not-chr locus_158317 --not-chr locus_158825 --not-chr locus_159067 --not-chr 
locus_159889 --not-chr locus_159941 --not-chr locus_160301 --not-chr locus_160524 --not-chr locus_161375 --not-chr locus_161778 --not-chr locus_163432 --not-chr locus_165360 --not-chr locus_168170 --not-chr locus_168716 --not-chr locus_169007 --not-chr locus_169036 --not-chr locus_171090 --not-chr locus_176767 --not-chr locus_176971 --not-chr locus_177822 --not-chr locus_178139 --not-chr locus_182349 --not-chr locus_187593 --not-chr locus_188623 --not-chr locus_189683 --not-chr locus_190601 --not-chr locus_190976 --not-chr locus_192639 --not-chr locus_196985 --not-chr locus_197922 --not-chr locus_202226 --not-chr locus_203381 --not-chr locus_203647 --not-chr locus_203738 --not-chr locus_204061 --not-chr locus_204175 --not-chr locus_204364 --not-chr locus_204421 --not-chr locus_204530 --not-chr locus_205521 --not-chr locus_205564 --not-chr locus_206044 --not-chr locus_206410 --not-chr locus_206625 --not-chr locus_206668 --not-chr locus_206785 --not-chr locus_206906 --not-chr locus_207071 --not-chr locus_207632 --not-chr locus_207914 --not-chr locus_211071 --not-chr locus_211464 --not-chr locus_212017 --not-chr locus_213868 --not-chr locus_21391 --not-chr locus_213983 --not-chr locus_21411 --not-chr locus_214825 --not-chr locus_215150 --not-chr locus_215183 --not-chr locus_21551 --not-chr locus_216159 --not-chr locus_216347 --not-chr locus_216622 --not-chr locus_216782 --not-chr locus_217386 --not-chr locus_218451 --not-chr locus_218462 --not-chr locus_218611 --not-chr locus_21970 --not-chr locus_219795 --not-chr locus_219854 --not-chr locus_22028 --not-chr locus_22053 --not-chr locus_22067 --not-chr locus_220773 --not-chr locus_222016 --not-chr locus_222895 --not-chr locus_22340 --not-chr locus_224420 --not-chr locus_22483 --not-chr locus_22552 --not-chr locus_22578 --not-chr locus_226298 --not-chr locus_226633 --not-chr locus_227395 --not-chr locus_227598 --not-chr locus_22856 --not-chr locus_23042 --not-chr locus_23289 --not-chr locus_233438 --not-chr locus_235514 
--not-chr locus_23563 --not-chr locus_236163 --not-chr locus_23707 --not-chr locus_23820 --not-chr locus_238327 --not-chr locus_238895 --not-chr locus_239109 --not-chr locus_24203 --not-chr locus_243326 --not-chr locus_24478 --not-chr locus_24745 --not-chr locus_247498 --not-chr locus_248943 --not-chr locus_249277 --not-chr locus_249586 --not-chr locus_249990 --not-chr locus_250285 --not-chr locus_250740 --not-chr locus_251406 --not-chr locus_252249 --not-chr locus_252994 --not-chr locus_25365 --not-chr locus_254320 --not-chr locus_254642 --not-chr locus_25585 --not-chr locus_25667 --not-chr locus_258392 --not-chr locus_259858 --not-chr locus_260072 --not-chr locus_262003 --not-chr locus_263310 --not-chr locus_26421 --not-chr locus_264373 --not-chr locus_26502 --not-chr locus_265624 --not-chr locus_26845 --not-chr locus_26921 --not-chr locus_269318 --not-chr locus_27026 --not-chr locus_270943 --not-chr locus_27170 --not-chr locus_273091 --not-chr locus_273990 --not-chr locus_274111 --not-chr locus_274413 --not-chr locus_277053 --not-chr locus_277560 --not-chr locus_277576 --not-chr locus_278553 --not-chr locus_278643 --not-chr locus_27944 --not-chr locus_27989 --not-chr locus_281304 --not-chr locus_283589 --not-chr locus_284643 --not-chr locus_285131 --not-chr locus_28767 --not-chr locus_289083 --not-chr locus_28951 --not-chr locus_29026 --not-chr locus_290498 --not-chr locus_290543 --not-chr locus_290548 --not-chr locus_290600 --not-chr locus_295177 --not-chr locus_298293 --not-chr locus_29879 --not-chr locus_29894 --not-chr locus_299660 --not-chr locus_303036 --not-chr locus_303501 --not-chr locus_303695 --not-chr locus_304188 --not-chr locus_304265 --not-chr locus_304596 --not-chr locus_305839 --not-chr locus_306647 --not-chr locus_306754 --not-chr locus_308637 --not-chr locus_308832 --not-chr locus_30905 --not-chr locus_309216 --not-chr locus_309357 --not-chr locus_309540 --not-chr locus_309854 --not-chr locus_309961 --not-chr locus_310690 --not-chr 
locus_310907 --not-chr locus_311109 --not-chr locus_311248 --not-chr locus_311295 --not-chr locus_312053 --not-chr locus_312763 --not-chr locus_313145 --not-chr locus_313251 --not-chr locus_313299 --not-chr locus_313936 --not-chr locus_31473 --not-chr locus_316248 --not-chr locus_317962 --not-chr locus_31875 --not-chr locus_32128 --not-chr locus_321861 --not-chr locus_32386 --not-chr locus_325982 --not-chr locus_326653 --not-chr locus_327475 --not-chr locus_328619 --not-chr locus_33006 --not-chr locus_330282 --not-chr locus_330491 --not-chr locus_331870 --not-chr locus_332300 --not-chr locus_332625 --not-chr locus_33272 --not-chr locus_333236 --not-chr locus_33342 --not-chr locus_335315 --not-chr locus_336143 --not-chr locus_33617 --not-chr locus_33681 --not-chr locus_338116 --not-chr locus_33961 --not-chr locus_340260 --not-chr locus_340582 --not-chr locus_34068 --not-chr locus_341658 --not-chr locus_341756 --not-chr locus_344579 --not-chr locus_344766 --not-chr locus_34581 --not-chr locus_345922 --not-chr locus_346248 --not-chr locus_347990 --not-chr locus_348611 --not-chr locus_34877 --not-chr locus_352893 --not-chr locus_354963 --not-chr locus_354978 --not-chr locus_35529 --not-chr locus_355309 --not-chr locus_355642 --not-chr locus_355723 --not-chr locus_355940 --not-chr locus_356037 --not-chr locus_356136 --not-chr locus_356170 --not-chr locus_35619 --not-chr locus_356425 --not-chr locus_356944 --not-chr locus_357468 --not-chr locus_357930 --not-chr locus_358031 --not-chr locus_358060 --not-chr locus_358511 --not-chr locus_358569 --not-chr locus_358571 --not-chr locus_358852 --not-chr locus_359245 --not-chr locus_361253 --not-chr locus_363103 --not-chr locus_363802 --not-chr locus_364232 --not-chr locus_365216 --not-chr locus_36554 --not-chr locus_367642 --not-chr locus_368612 --not-chr locus_37060 --not-chr locus_37161 --not-chr locus_37284 --not-chr locus_37373 --not-chr locus_37494 --not-chr locus_379975 --not-chr locus_380069 --not-chr locus_380430 
--not-chr locus_38077 --not-chr locus_381233 --not-chr locus_382237 --not-chr locus_382688 --not-chr locus_382965 --not-chr locus_383242 --not-chr locus_384589 --not-chr locus_38491 --not-chr locus_385266 --not-chr locus_385558 --not-chr locus_385656 --not-chr locus_38587 --not-chr locus_386273 --not-chr locus_386278 --not-chr locus_38641 --not-chr locus_386802 --not-chr locus_387661 --not-chr locus_387784 --not-chr locus_388113 --not-chr locus_388514 --not-chr locus_388998 --not-chr locus_389053 --not-chr locus_389163 --not-chr locus_389273 --not-chr locus_389953 --not-chr locus_390262 --not-chr locus_390335 --not-chr locus_390632 --not-chr locus_390965 --not-chr locus_391048 --not-chr locus_39129 --not-chr locus_39221 --not-chr locus_392380 --not-chr locus_392485 --not-chr locus_392753 --not-chr locus_393164 --not-chr locus_393668 --not-chr locus_393858 --not-chr locus_396273 --not-chr locus_396693 --not-chr locus_397334 --not-chr locus_397681 --not-chr locus_397723 --not-chr locus_397793 --not-chr locus_397896 --not-chr locus_398552 --not-chr locus_39876 --not-chr locus_399614 --not-chr locus_399663 --not-chr locus_400112 --not-chr locus_400399 --not-chr locus_40066 --not-chr locus_401671 --not-chr locus_401730 --not-chr locus_402417 --not-chr locus_403155 --not-chr locus_403249 --not-chr locus_403442 --not-chr locus_40345 --not-chr locus_403562 --not-chr locus_403859 --not-chr locus_404121 --not-chr locus_40418 --not-chr locus_404788 --not-chr locus_405822 --not-chr locus_406512 --not-chr locus_406656 --not-chr locus_40665 --not-chr locus_407089 --not-chr locus_40729 --not-chr locus_40749 --not-chr locus_40753 --not-chr locus_407635 --not-chr locus_407813 --not-chr locus_407965 --not-chr locus_408106 --not-chr locus_408130 --not-chr locus_408460 --not-chr locus_40973 --not-chr locus_409748 --not-chr locus_410306 --not-chr locus_410591 --not-chr locus_410618 --not-chr locus_41301 --not-chr locus_41325 --not-chr locus_414770 --not-chr locus_415004 --not-chr 
locus_415225 --not-chr locus_416619 --not-chr locus_418710 --not-chr locus_419663 --not-chr locus_42009 --not-chr locus_421053 --not-chr locus_421522 --not-chr locus_421661 --not-chr locus_421678 --not-chr locus_421916 --not-chr locus_421923 --not-chr locus_422008 --not-chr locus_422034 --not-chr locus_422166 --not-chr locus_422230 --not-chr locus_422436 --not-chr locus_422492 --not-chr locus_422879 --not-chr locus_422894 --not-chr locus_423346 --not-chr locus_423769 --not-chr locus_42633 --not-chr locus_45456 --not-chr locus_46486 --not-chr locus_50025 --not-chr locus_50420 --not-chr locus_51245 --not-chr locus_51814 --not-chr locus_53585 --not-chr locus_54198 --not-chr locus_55987 --not-chr locus_6537 --not-chr locus_65465 --not-chr locus_67593 --not-chr locus_69127 --not-chr locus_70871 --not-chr locus_71617 --not-chr locus_71822 --not-chr locus_72896 --not-chr locus_74406 --not-chr locus_76315 --not-chr locus_77644 --not-chr locus_81257 --not-chr locus_84263 --not-chr locus_86381 --not-chr locus_86941 --not-chr locus_87567 --not-chr locus_93224 --not-chr locus_93836 --not-chr locus_94096 --not-chr locus_94509 --not-chr locus_95374 --not-chr locus_95762 --not-chr locus_95912 --not-chr locus_96385 --not-chr locus_96504 --not-chr locus_96660 --not-chr locus_96739 --not-chr locus_96877 --not-chr locus_99833 --not-chr locus_99874 --not-chr locus_100164 --not-chr locus_100799 --not-chr locus_101638 --not-chr locus_101905 --not-chr locus_122972 --not-chr locus_125825 --not-chr locus_127315 --not-chr locus_139215 --not-chr locus_141297 --not-chr locus_149622 --not-chr locus_151147 --not-chr locus_151549 --not-chr locus_154256 --not-chr locus_154465 --not-chr locus_154915 --not-chr locus_157027 --not-chr locus_165391 --not-chr locus_165850 --not-chr locus_166705 --not-chr locus_167415 --not-chr locus_172575 --not-chr locus_182197 --not-chr locus_186436 --not-chr locus_187544 --not-chr locus_192199 --not-chr locus_205384 --not-chr locus_206271 --not-chr locus_206297 
--not-chr locus_206599 --not-chr locus_210978 --not-chr locus_21400 --not-chr locus_216893 --not-chr locus_223485 --not-chr locus_224630 --not-chr locus_228030 --not-chr locus_238442 --not-chr locus_24005 --not-chr locus_244176 --not-chr locus_257367 --not-chr locus_276886 --not-chr locus_28409 --not-chr locus_298193 --not-chr locus_307014 --not-chr locus_30713 --not-chr locus_30965 --not-chr locus_310097 --not-chr locus_310461 --not-chr locus_312479 --not-chr locus_312607 --not-chr locus_312754 --not-chr locus_32356 --not-chr locus_327047 --not-chr locus_333629 --not-chr locus_334079 --not-chr locus_335010 --not-chr locus_344955 --not-chr locus_34539 --not-chr locus_34543 --not-chr locus_349673 --not-chr locus_355847 --not-chr locus_356043 --not-chr locus_356426 --not-chr locus_35825 --not-chr locus_362562 --not-chr locus_37570 --not-chr locus_380646 --not-chr locus_382951 --not-chr locus_38335 --not-chr locus_384012 --not-chr locus_392024 --not-chr locus_39782 --not-chr locus_400056 --not-chr locus_40141 --not-chr locus_402943 --not-chr locus_41343 --not-chr locus_42075 --not-chr locus_45729 --not-chr locus_55869 --not-chr locus_78715 --not-chr locus_83874 --not-chr locus_96906 \
--max-alleles 2 --min-alleles 2 --out $suffix-neutU

1,631 SNPs excluded; 15,384 kept.

In [60]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps-m80x50-maf025
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --positions Making_Files/m80x50maf025filt-pcaQ_OF_BS-union.goodchrom \
--max-alleles 2 --min-alleles 2 --out $suffix-outU


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/OL-final-c85-t88-Breps-m80x50-maf025.recode.vcf
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--out Making_Files/OL-final-c85-t88-Breps-m80x50-maf025-outU
	--positions Making_Files/m80x50maf025filt-pcaQ_OF_BS-union.goodchrom
	--recode

After filtering, kept 105 out of 105 Individuals
Outputting VCF file...
After filtering, kept 854 out of a possible 17015 Sites
Run Time = 0.00 seconds


In [61]:
infile = "Making_Files/OL-final-c85-t88-Breps-m80x50-maf025-neutU.recode.vcf"
outfile = "Inputs/OL-c85t8-m80x50-maf025-neutU-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 15396
Unlinked SNPs: 6838
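
The total printed by `subsetSNPs` (15396) is slightly higher than the 15,384 sites noted above for the same file. One possible explanation, not verified here against the `subsetSNPs` definition, is that header lines are included in the total. A minimal sketch for cross-checking a file, where `count_vcf_lines` and the toy VCF lines are illustrative and not part of the original notebook:

```python
def count_vcf_lines(lines):
    """Return (n_header, n_records) for an iterable of VCF lines."""
    n_header = 0
    n_records = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            n_header += 1  # '##' meta lines and the '#CHROM' column line
        else:
            n_records += 1  # one line per site
    return n_header, n_records


# Toy input (made up for illustration): 2 header lines, 2 sites.
example = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "locus_1\t5\t.\tA\tG\t13\tPASS\t.\tGT\t0/1\n",
    "locus_1\t9\t.\tC\tT\t13\tPASS\t.\tGT\t0/0\n",
]
print(count_vcf_lines(example))
```

For a real file, something like `count_vcf_lines(open(infile))` would give both counts in one pass.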


### Intersect 2

In [48]:
#Make files of outlier loci (loci that contain at least one outlier SNP) to copy/paste into vcftools, plus a positions file of the outlier SNPs.
IN = open('Outlier/m80x50maf025filt-pcaQ_OF_BS-isect2.snp', "r")
INL = open('Outlier/m80x50maf025filt-pcaQ_OF_BS-isect2.txt',"r")
OUT = open('Making_Files/m80x50maf025filt-pcaQ_OF_BS-isect2.badchrom', "w")
OUTg = open('Making_Files/m80x50maf025filt-pcaQ_OF_BS-isect2.goodchrom', "w")
exset = set()
for line in INL:
    chrom = line.strip()
    if chrom not in exset:
        exset.add(chrom)
        OUT.write(" --not-chr locus_"+str(chrom))
print(len(exset))
x = 0
for line in IN:
    chrom = line.strip().split("_")[1]
    snp = line.strip().split("_")[3]
    OUTg.write("locus_"+chrom+"\t"+snp+"\n")
    x += 1
print(x)
OUT.close()
OUTg.close()
IN.close()
INL.close()

101
136


In [49]:
%cat Making_Files/m80x50maf025filt-pcaQ_OF_BS-isect2.badchrom

 --not-chr locus_100568 --not-chr locus_100957 --not-chr locus_109152 --not-chr locus_111738 --not-chr locus_117964 --not-chr locus_124135 --not-chr locus_128049 --not-chr locus_128645 --not-chr locus_140076 --not-chr locus_142778 --not-chr locus_145002 --not-chr locus_147430 --not-chr locus_148130 --not-chr locus_148498 --not-chr locus_159889 --not-chr locus_160301 --not-chr locus_163432 --not-chr locus_168170 --not-chr locus_190976 --not-chr locus_202687 --not-chr locus_203738 --not-chr locus_206410 --not-chr locus_207632 --not-chr locus_214762 --not-chr locus_214825 --not-chr locus_21551 --not-chr locus_217068 --not-chr locus_217430 --not-chr locus_219854 --not-chr locus_220773 --not-chr locus_222016 --not-chr locus_224420 --not-chr locus_22471 --not-chr locus_227395 --not-chr locus_227598 --not-chr locus_23130 --not-chr locus_23289 --not-chr locus_236163 --not-chr locus_238327 --not-chr locus_243326 --not-chr locus_24478 --not-chr locus_248943 --not-chr locus_25585 --not-chr locus_

In [None]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps-m80x50-maf025
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --not-chr locus_100568 --not-chr locus_100957 --not-chr locus_109152 --not-chr locus_111738 --not-chr locus_117964 --not-chr locus_124135 --not-chr locus_128049 --not-chr locus_128645 --not-chr locus_140076 --not-chr locus_142778 --not-chr locus_145002 --not-chr locus_147430 --not-chr locus_148130 --not-chr locus_148498 --not-chr locus_159889 --not-chr locus_160301 --not-chr locus_163432 --not-chr locus_168170 --not-chr locus_190976 --not-chr locus_202687 --not-chr locus_203738 --not-chr locus_206410 --not-chr locus_207632 --not-chr locus_214762 --not-chr locus_214825 --not-chr locus_21551 --not-chr locus_217068 --not-chr locus_217430 --not-chr locus_219854 --not-chr locus_220773 --not-chr locus_222016 --not-chr locus_224420 --not-chr locus_22471 --not-chr locus_227395 --not-chr locus_227598 --not-chr locus_23130 --not-chr locus_23289 --not-chr locus_236163 --not-chr locus_238327 --not-chr locus_243326 --not-chr locus_24478 --not-chr locus_248943 --not-chr locus_25585 --not-chr locus_258392 --not-chr locus_259858 --not-chr locus_262003 --not-chr locus_263310 --not-chr locus_27026 --not-chr locus_274413 --not-chr locus_27989 --not-chr locus_281304 --not-chr locus_285131 --not-chr locus_28586 --not-chr locus_289083 --not-chr locus_290548 --not-chr locus_29312 --not-chr locus_306647 --not-chr locus_308418 --not-chr locus_31473 --not-chr locus_338116 --not-chr locus_341756 --not-chr locus_34532 --not-chr locus_34581 --not-chr locus_346248 --not-chr locus_346887 --not-chr locus_348611 --not-chr locus_355940 --not-chr locus_356067 --not-chr locus_35619 --not-chr locus_356425 --not-chr locus_357468 --not-chr locus_35810 --not-chr locus_361253 --not-chr locus_364232 --not-chr locus_370301 --not-chr locus_37161 --not-chr locus_380069 --not-chr locus_381233 --not-chr locus_382237 --not-chr locus_38587 --not-chr locus_38660 --not-chr locus_386802 --not-chr locus_396273 --not-chr locus_397723 --not-chr locus_397793 
--not-chr locus_398552 --not-chr locus_402417 --not-chr locus_403442 --not-chr locus_40345 --not-chr locus_406656 --not-chr locus_41325 --not-chr locus_421469 --not-chr locus_421522 --not-chr locus_422492 --not-chr locus_422894 --not-chr locus_423769 --not-chr locus_45456 --not-chr locus_70871 --not-chr locus_76315 --not-chr locus_84263 --not-chr locus_99874 \
--max-alleles 2 --min-alleles 2 --out $suffix-neutI2

Filtered out 289 SNPs; kept 16,726 SNPs.
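
Rather than pasting the contents of the `.badchrom` file into the cell, the same command could be assembled in Python. A minimal sketch, assuming one locus number per line as in the `.txt` input read above; `not_chr_args` and `build_vcftools_cmd` are hypothetical helpers, the file names in the example are placeholders, and only vcftools flags already used in this notebook appear:

```python
def not_chr_args(loci):
    """Return ['--not-chr', 'locus_<n>', ...], dropping blanks and duplicates."""
    seen = set()
    args = []
    for locus in loci:
        locus = locus.strip()
        if not locus or locus in seen:
            continue
        seen.add(locus)
        args.extend(["--not-chr", "locus_" + locus])
    return args


def build_vcftools_cmd(vcf, out_prefix, loci):
    """Assemble the argument list for a biallelic, locus-excluding recode."""
    return (["vcftools", "--vcf", vcf, "--recode", "--recode-INFO-all"]
            + not_chr_args(loci)
            + ["--max-alleles", "2", "--min-alleles", "2", "--out", out_prefix])


# Placeholder file names; the third locus repeats the first and is skipped.
cmd = build_vcftools_cmd("input.recode.vcf", "input-neutI2",
                         ["100568", "100957", "100568\n"])
print(" ".join(cmd))
```

The list could then be handed to `subprocess.call(cmd)` if vcftools is on the PATH. Because duplicates are dropped, merging two filter lists does not repeat flags.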

In [62]:
%%sh
suffix=Making_Files/OL-final-c85-t88-Breps-m80x50-maf025
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --positions Making_Files/m80x50maf025filt-pcaQ_OF_BS-isect2.goodchrom \
--max-alleles 2 --min-alleles 2 --out $suffix-outI2


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/OL-final-c85-t88-Breps-m80x50-maf025.recode.vcf
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--out Making_Files/OL-final-c85-t88-Breps-m80x50-maf025-outI2
	--positions Making_Files/m80x50maf025filt-pcaQ_OF_BS-isect2.goodchrom
	--recode

After filtering, kept 105 out of 105 Individuals
Outputting VCF file...
After filtering, kept 136 out of a possible 17015 Sites
Run Time = 0.00 seconds


In [63]:
infile = "Making_Files/OL-final-c85-t88-Breps-m80x50-maf025-neutI2.recode.vcf"
outfile = "Inputs/OL-c85t8-m80x50-maf025-neutI2-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 16738
Unlinked SNPs: 7313


## Making files for Puget Sound and British Columbia only

In [17]:
#Making a BC/Puget+ pop file
IN = open("Making_Files/OL-final-c85-t88-Breps-m50.imiss","r")
OUT_strata = open("Making_Files/BCWA-c85-t88-Breps.strata","w")
OUT_pop = open("Making_Files/BCWA-c85-t88-Breps.pop","w")
loc_dict = {'BC1':['Victoria', 'Victoria', 'Puget+', 'North', '48.435667', '-123.377909'],
            'BC2':['Klaskino', 'Klaskino', 'NW_BC', 'North', '50.298667', '-127.723633'],
            'BC3':['Barkeley_Sound', 'Barkeley_Sound', 'NW_BC', 'North', '49.01585', '-125.314167'],
            'BC4': ['Ladysmith', 'Ladysmith', 'Puget+', 'North','49.011383' ,'-123.8357'],
            'WA12': ['Discovery_Bay', 'Discovery_Bay', 'Puget+', 'North', '47.9978', '-122.8824'],
            'WA11': ['Liberty_Bay', 'Liberty_Bay', 'Puget+', 'North', '47.7375', '-122.6507'],
            'WA13': ['North_Bay', 'North_Bay', 'Puget+', 'North','47.3925', '-122.8138'],
            'WA10': ['Triton_Cove', 'Triton_Cove', 'Puget+', 'North', '46.6131', '-122.982']}
next(IN)  # skip the .imiss header line
OUT_strata.write("INDIVIDUALS\tSTRATA\tLOCATION\tREGION\tNS\tLATITUDE\tLONGITUDE\tLIBRARY\n")
for line in IN:
    name = line.split()[0]
    pop = name.split("_")[0]
    if pop in loc_dict.keys():
        library = name.split("_")[2]
        OUT_strata.write(name+"\t"+'\t'.join(map(str,loc_dict[pop]))+"\t"+library+"\n")
        OUT_pop.write(name+"\t"+loc_dict[pop][0]+"\n")
    
IN.close()
OUT_strata.close()
OUT_pop.close()
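
The loop above relies on sample names of the form `<population>_<individual>_<library>`, e.g. `BC4_13_C3`. A minimal sketch of how one name becomes a strata row; `strata_row` is an illustrative helper, not part of the original notebook, and the dictionary holds just one entry copied from the cell above:

```python
def strata_row(name, loc_dict):
    """Return the tab-separated strata line for one sample,
    or None if its population code is not in loc_dict."""
    parts = name.split("_")
    pop = parts[0]       # e.g. 'BC4'
    library = parts[2]   # e.g. 'C3'
    if pop not in loc_dict:
        return None
    return "\t".join([name] + [str(v) for v in loc_dict[pop]] + [library])


loc_dict = {'BC4': ['Ladysmith', 'Ladysmith', 'Puget+', 'North',
                    '49.011383', '-123.8357']}

row = strata_row("BC4_13_C3", loc_dict)
print(row)
# A sample from a population outside the dictionary is skipped.
print(strata_row("CA1_5_C5", loc_dict))
```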

In [18]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps
vcftools --vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf --keep Making_Files/BCWA-c85-t88-Breps.pop \
--recode --recode-INFO-all --max-missing 0.60 --min-alleles 2 --max-alleles 2 --out $suffix-m60


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf
	--keep Making_Files/BCWA-c85-t88-Breps.pop
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.6
	--out Making_Files/BCWA-c85-t88-Breps-m60
	--recode

Keeping individuals in 'keep' list
After filtering, kept 64 out of 137 Individuals
Outputting VCF file...
After filtering, kept 76213 out of a possible 114545 Sites
Run Time = 9.00 seconds


In [19]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps
vcftools --vcf $suffix-m60.recode.vcf  --missing-indv --out $suffix


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/BCWA-c85-t88-Breps-m60.recode.vcf
	--missing-indv
	--out Making_Files/BCWA-c85-t88-Breps

After filtering, kept 64 out of 64 Individuals
Outputting Individual Missingness
After filtering, kept 76213 out of a possible 76213 Sites
Run Time = 1.00 seconds


In [32]:
imiss = np.genfromtxt('Making_Files/BCWA-c85-t88-Breps.imiss', names=True,dtype=None)
imissDF = pandas.DataFrame(imiss)
len(imissDF[imissDF.F_MISS < 0.65].INDV.values)

59

In [33]:
imissDF[imissDF.F_MISS > 0.65].INDV.to_csv("Making_Files/BCWA-c85-t88-Breps_imiss65.txt",sep=" ",index=False)
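
The two cells above count samples with `F_MISS < 0.65` and write out those with `F_MISS > 0.65`, so a sample at exactly 0.65 would be in neither set and would not be removed. A dependency-free sketch of the same split, assuming the standard vcftools `.imiss` columns; `split_by_missingness` is a hypothetical helper, and the sample names and values below are made up for illustration:

```python
def split_by_missingness(imiss_lines, threshold=0.65):
    """Split sample IDs into (keep, remove) lists using the F_MISS column.
    Samples with F_MISS strictly above the threshold go to `remove`."""
    lines = iter(imiss_lines)
    header = next(lines).split()
    i_name = header.index("INDV")
    i_fmiss = header.index("F_MISS")
    keep, remove = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if float(fields[i_fmiss]) > threshold:
            remove.append(fields[i_name])
        else:
            keep.append(fields[i_name])
    return keep, remove


# Made-up example in .imiss layout.
example = (
    "INDV\tN_DATA\tN_GENOTYPES_FILTERED\tN_MISS\tF_MISS\n"
    "S1_1_C1\t100\t0\t10\t0.10\n"
    "S1_2_C1\t100\t0\t70\t0.70\n"
    "S2_1_C2\t100\t0\t65\t0.65\n"
)
keep, remove = split_by_missingness(example.splitlines())
print(keep)
print(remove)
```

Writing `remove` one ID per line, with no header row, gives a file in the layout that `--remove` reads.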

In [44]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps

vcftools --vcf $suffix-m60.recode.vcf --remove-indv BC4_6_C2 \
--remove Making_Files/BCWA-c85-t88-Breps_imiss65.txt --recode --recode-INFO-all \
--min-alleles 2 --max-alleles 2 --max-missing 0.60 \
--out ${suffix}-m60x65


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/BCWA-c85-t88-Breps-m60.recode.vcf
	--remove Making_Files/BCWA-c85-t88-Breps_imiss65.txt
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.6
	--out Making_Files/BCWA-c85-t88-Breps-m60x65
	--recode
	--remove-indv BC4_6_C2

Excluding individuals in 'exclude' list
After filtering, kept 59 out of 64 Individuals
Outputting VCF file...
After filtering, kept 75491 out of a possible 76213 Sites
Run Time = 5.00 seconds


In [46]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps

vcftools --vcf $suffix-m60.recode.vcf --remove-indv BC4_6_C2 \
--remove Making_Files/BCWA-c85-t88-Breps_imiss65.txt --recode --recode-INFO-all \
--min-alleles 2 --max-alleles 2 --max-missing 0.80 \
--out ${suffix}-m80x65


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/BCWA-c85-t88-Breps-m60.recode.vcf
	--remove Making_Files/BCWA-c85-t88-Breps_imiss65.txt
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.8
	--out Making_Files/BCWA-c85-t88-Breps-m80x65
	--recode
	--remove-indv BC4_6_C2

Excluding individuals in 'exclude' list
After filtering, kept 59 out of 64 Individuals
Outputting VCF file...
After filtering, kept 27494 out of a possible 76213 Sites
Run Time = 2.00 seconds


Using the heterozygosity filters (the `--not-chr` locus lists) derived from the full dataset.

In [47]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps-m60x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --not-chr locus_355504 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_394022 --not-chr locus_40333 --not-chr locus_333125 --not-chr locus_25870 --not-chr locus_289022 --not-chr locus_380572 --not-chr locus_55452 --not-chr locus_174781 --not-chr locus_101129 --not-chr locus_18461 --not-chr locus_23480 --not-chr locus_387974 --not-chr locus_153744 --not-chr locus_347558 --not-chr locus_172352 --not-chr locus_396787 --not-chr locus_255531 --not-chr locus_407412 --not-chr locus_140939 --not-chr locus_218417 --not-chr locus_42160 --not-chr locus_217492 --not-chr locus_125726 --not-chr locus_393720 --not-chr locus_220030 --not-chr locus_151068 --not-chr locus_409958 --not-chr locus_50784 --not-chr locus_206197 --not-chr locus_420365 --not-chr locus_401349 --not-chr locus_241939 --not-chr locus_258320 --not-chr locus_207453 --not-chr locus_380560 --not-chr locus_97057 --not-chr locus_131141 --not-chr locus_423209 --not-chr locus_286392 --not-chr locus_252249 --not-chr locus_172864 --not-chr locus_196985 --not-chr locus_157163 --not-chr locus_41857 --not-chr locus_357318 --not-chr locus_333703 --not-chr locus_276810 --not-chr locus_379373 --not-chr locus_92709 \
--not-chr locus_333703 --not-chr locus_276810 --not-chr locus_407412 --not-chr locus_380560 --not-chr locus_286392 --not-chr locus_347558 --not-chr locus_40333 --not-chr locus_387974 --not-chr locus_220030 --not-chr locus_203783 --not-chr locus_421342 --not-chr locus_50784 --not-chr locus_380572 --not-chr locus_41857 --not-chr locus_131141 --not-chr locus_218417 --not-chr locus_219923 --not-chr locus_125726 --not-chr locus_266793 --not-chr locus_97057 --not-chr locus_333125 --not-chr locus_18461 --not-chr locus_54888 --not-chr locus_92709 --not-chr locus_355504 --not-chr locus_140939 --not-chr locus_215692 --not-chr locus_207453 --not-chr locus_31565 --not-chr locus_357318 --not-chr locus_153744 --not-chr locus_216221 --not-chr locus_206197 --not-chr locus_101129 --not-chr locus_25870 --not-chr locus_401349 --not-chr locus_393720 --not-chr locus_119972 --not-chr locus_62278 --not-chr locus_423209 --not-chr locus_217492 --not-chr locus_42160 --not-chr locus_396787 --not-chr locus_55452 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_172864 --not-chr locus_379373 \
--max-alleles 2 --min-alleles 2 --out $suffix-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/BCWA-c85-t88-Breps-m60x65.recode.vcf
	--not-chr locus_101129
	--not-chr locus_119972
	--not-chr locus_125726
	--not-chr locus_131141
	--not-chr locus_140939
	--not-chr locus_151068
	--not-chr locus_153744
	--not-chr locus_157163
	--not-chr locus_172352
	--not-chr locus_172864
	--not-chr locus_174781
	--not-chr locus_18461
	--not-chr locus_196985
	--not-chr locus_203783
	--not-chr locus_206197
	--not-chr locus_207453
	--not-chr locus_215692
	--not-chr locus_216221
	--not-chr locus_217492
	--not-chr locus_218417
	--not-chr locus_219923
	--not-chr locus_220030
	--not-chr locus_23480
	--not-chr locus_241939
	--not-chr locus_252249
	--not-chr locus_253330
	--not-chr locus_255531
	--not-chr locus_258320
	--not-chr locus_25870
	--not-chr locus_266793
	--not-chr locus_276810
	--not-chr locus_286392
	--not-chr locus_289022
	--not-chr locus_31565
	--not-chr locus_333125
	--not-chr locus_3

In [48]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps-m80x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --not-chr locus_355504 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_394022 --not-chr locus_40333 --not-chr locus_333125 --not-chr locus_25870 --not-chr locus_289022 --not-chr locus_380572 --not-chr locus_55452 --not-chr locus_174781 --not-chr locus_101129 --not-chr locus_18461 --not-chr locus_23480 --not-chr locus_387974 --not-chr locus_153744 --not-chr locus_347558 --not-chr locus_172352 --not-chr locus_396787 --not-chr locus_255531 --not-chr locus_407412 --not-chr locus_140939 --not-chr locus_218417 --not-chr locus_42160 --not-chr locus_217492 --not-chr locus_125726 --not-chr locus_393720 --not-chr locus_220030 --not-chr locus_151068 --not-chr locus_409958 --not-chr locus_50784 --not-chr locus_206197 --not-chr locus_420365 --not-chr locus_401349 --not-chr locus_241939 --not-chr locus_258320 --not-chr locus_207453 --not-chr locus_380560 --not-chr locus_97057 --not-chr locus_131141 --not-chr locus_423209 --not-chr locus_286392 --not-chr locus_252249 --not-chr locus_172864 --not-chr locus_196985 --not-chr locus_157163 --not-chr locus_41857 --not-chr locus_357318 --not-chr locus_333703 --not-chr locus_276810 --not-chr locus_379373 --not-chr locus_92709 \
--not-chr locus_333703 --not-chr locus_276810 --not-chr locus_407412 --not-chr locus_380560 --not-chr locus_286392 --not-chr locus_347558 --not-chr locus_40333 --not-chr locus_387974 --not-chr locus_220030 --not-chr locus_203783 --not-chr locus_421342 --not-chr locus_50784 --not-chr locus_380572 --not-chr locus_41857 --not-chr locus_131141 --not-chr locus_218417 --not-chr locus_219923 --not-chr locus_125726 --not-chr locus_266793 --not-chr locus_97057 --not-chr locus_333125 --not-chr locus_18461 --not-chr locus_54888 --not-chr locus_92709 --not-chr locus_355504 --not-chr locus_140939 --not-chr locus_215692 --not-chr locus_207453 --not-chr locus_31565 --not-chr locus_357318 --not-chr locus_153744 --not-chr locus_216221 --not-chr locus_206197 --not-chr locus_101129 --not-chr locus_25870 --not-chr locus_401349 --not-chr locus_393720 --not-chr locus_119972 --not-chr locus_62278 --not-chr locus_423209 --not-chr locus_217492 --not-chr locus_42160 --not-chr locus_396787 --not-chr locus_55452 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_172864 --not-chr locus_379373 \
--max-alleles 2 --min-alleles 2 --out $suffix-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/BCWA-c85-t88-Breps-m80x65.recode.vcf
	--not-chr locus_101129
	--not-chr locus_119972
	--not-chr locus_125726
	--not-chr locus_131141
	--not-chr locus_140939
	--not-chr locus_151068
	--not-chr locus_153744
	--not-chr locus_157163
	--not-chr locus_172352
	--not-chr locus_172864
	--not-chr locus_174781
	--not-chr locus_18461
	--not-chr locus_196985
	--not-chr locus_203783
	--not-chr locus_206197
	--not-chr locus_207453
	--not-chr locus_215692
	--not-chr locus_216221
	--not-chr locus_217492
	--not-chr locus_218417
	--not-chr locus_219923
	--not-chr locus_220030
	--not-chr locus_23480
	--not-chr locus_241939
	--not-chr locus_252249
	--not-chr locus_253330
	--not-chr locus_255531
	--not-chr locus_258320
	--not-chr locus_25870
	--not-chr locus_266793
	--not-chr locus_276810
	--not-chr locus_286392
	--not-chr locus_289022
	--not-chr locus_31565
	--not-chr locus_333125
	--not-chr locus_3

In [49]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps-m60x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all \
--maf 0.025 --max-alleles 2 --min-alleles 2 --out $suffix-maf025-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/BCWA-c85-t88-Breps-m60x65.recode.vcf
	--recode-INFO-all
	--maf 0.025
	--max-alleles 2
	--min-alleles 2
	--out Making_Files/BCWA-c85-t88-Breps-m60x65-maf025-filt
	--recode

After filtering, kept 59 out of 59 Individuals
Outputting VCF file...
After filtering, kept 22293 out of a possible 75491 Sites
Run Time = 2.00 seconds


In [50]:
%%sh
suffix=Making_Files/BCWA-c85-t88-Breps-m80x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all \
--maf 0.025 --max-alleles 2 --min-alleles 2 --out $suffix-maf025-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/BCWA-c85-t88-Breps-m80x65.recode.vcf
	--recode-INFO-all
	--maf 0.025
	--max-alleles 2
	--min-alleles 2
	--out Making_Files/BCWA-c85-t88-Breps-m80x65-maf025-filt
	--recode

After filtering, kept 59 out of 59 Individuals
Outputting VCF file...
After filtering, kept 6721 out of a possible 27494 Sites
Run Time = 0.00 seconds


In [51]:
infile = "Making_Files/BCWA-c85-t88-Breps-m60x65-filt.recode.vcf"
outfile = "Inputs/BCWA-c85-t88-Breps-m60x65-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 75153
Unlinked SNPs: 16868


In [64]:
infile = "Making_Files/BCWA-c85-t88-Breps-m60x65-maf025-filt.recode.vcf"
outfile = "Inputs/BCWA-c85-t88-Breps-m60x65-maf025-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 22305
Unlinked SNPs: 10063


In [65]:
infile = "Making_Files/BCWA-c85-t88-Breps-m80x65-filt.recode.vcf"
outfile = "Inputs/BCWA-c85-t88-Breps-m80x65-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 27366
Unlinked SNPs: 6408


In [52]:
infile = "Making_Files/BCWA-c85-t88-Breps-m80x65-maf025-filt.recode.vcf"
outfile = "Inputs/BCWA-c85-t88-Breps-m80x65-maf025-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 6733
Unlinked SNPs: 3348


## Making files for California only

In [53]:
#Making a CA pop file
IN = open("Making_Files/OL-final-c85-t88-Breps-m50.imiss","r")
OUT_strata = open("Making_Files/CA-c85-t88-Breps.strata","w")
OUT_pop = open("Making_Files/CA-c85-t88-Breps.pop","w")
loc_dict = {'CA6': ['Humboldt', 'Humboldt', 'Humboldt', 'South', '40.8557972', '-124.0974611'],
            'CA4': ['Tomales', 'Tomales', 'NoCal', 'South', '38.117549', '-122.874497'],
            'CA2': ['SF_PointOrient', 'SF_PointOrient', 'NoCal', 'South', '37.955067', '-122.421800'],
            'CA3': ['SF_Candlestick', 'SF_Candlestick', 'NoCal', 'South', '37.708665', '-122.377607'],
            'CA5': ['Elkhorn_Slough', 'Elkhorn_Slough', 'NoCal', 'South', '36.8398194', '-121.7427806'],
            'CA7': ['Mugu_Lagoon', 'Mugu_Lagoon', 'SoCal', 'South', '34.101914', '-119.10434'],
            'CA1': ['San_Diego', 'San_Diego', 'SoCal', 'South', '32.602500', '-117.118889']}
next(IN)  # skip the .imiss header line
OUT_strata.write("INDIVIDUALS\tSTRATA\tLOCATION\tREGION\tNS\tLATITUDE\tLONGITUDE\tLIBRARY\n")
for line in IN:
    name = line.split()[0]
    pop = name.split("_")[0]
    if pop in loc_dict.keys():
        library = name.split("_")[2]
        OUT_strata.write(name+"\t"+'\t'.join(map(str,loc_dict[pop]))+"\t"+library+"\n")
        OUT_pop.write(name+"\t"+loc_dict[pop][0]+"\n")
    
IN.close()
OUT_strata.close()
OUT_pop.close()

In [54]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps
vcftools --vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf --keep Making_Files/CA-c85-t88-Breps.pop \
--recode --recode-INFO-all --max-missing 0.60 --min-alleles 2 --max-alleles 2 --out $suffix-m60


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Assembly/OL-final-c85-t88-Breps-m50_outfiles/OL-final-c85-t88-Breps-m50.vcf
	--keep Making_Files/CA-c85-t88-Breps.pop
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.6
	--out Making_Files/CA-c85-t88-Breps-m60
	--recode

Keeping individuals in 'keep' list
After filtering, kept 40 out of 137 Individuals
Outputting VCF file...
After filtering, kept 76323 out of a possible 114545 Sites
Run Time = 8.00 seconds


In [55]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps
vcftools --vcf $suffix-m60.recode.vcf  --missing-indv --out $suffix


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/CA-c85-t88-Breps-m60.recode.vcf
	--missing-indv
	--out Making_Files/CA-c85-t88-Breps

After filtering, kept 40 out of 40 Individuals
Outputting Individual Missingness
After filtering, kept 76323 out of a possible 76323 Sites
Run Time = 0.00 seconds


In [56]:
imiss = np.genfromtxt('Making_Files/CA-c85-t88-Breps.imiss', names=True,dtype=None)
imissDF = pandas.DataFrame(imiss)
len(imissDF[imissDF.F_MISS < 0.65].INDV.values)

37

In [57]:
imissDF[imissDF.F_MISS > 0.65].INDV.to_csv("Making_Files/CA-c85-t88-Breps_imiss65.txt",sep=" ",index=False)

In [58]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps

vcftools --vcf $suffix-m60.recode.vcf --remove-indv CA7_5_C1 --remove-indv CA1_5_C5 --remove-indv CA3_4_C1 \
--remove Making_Files/CA-c85-t88-Breps_imiss65.txt --recode --recode-INFO-all \
--min-alleles 2 --max-alleles 2 --max-missing 0.60 \
--out ${suffix}-m60x65


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/CA-c85-t88-Breps-m60.recode.vcf
	--remove Making_Files/CA-c85-t88-Breps_imiss65.txt
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.6
	--out Making_Files/CA-c85-t88-Breps-m60x65
	--recode
	--remove-indv CA1_5_C5
	--remove-indv CA3_4_C1
	--remove-indv CA7_5_C1

Excluding individuals in 'exclude' list
After filtering, kept 37 out of 40 Individuals
Outputting VCF file...
After filtering, kept 74999 out of a possible 76323 Sites
Run Time = 2.00 seconds


In [59]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps

vcftools --vcf $suffix-m60.recode.vcf --remove-indv CA7_5_C1 --remove-indv CA1_5_C5 --remove-indv CA3_4_C1 \
--remove Making_Files/CA-c85-t88-Breps_imiss65.txt --recode --recode-INFO-all \
--min-alleles 2 --max-alleles 2 --max-missing 0.80 \
--out ${suffix}-m80x65


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/CA-c85-t88-Breps-m60.recode.vcf
	--remove Making_Files/CA-c85-t88-Breps_imiss65.txt
	--recode-INFO-all
	--max-alleles 2
	--min-alleles 2
	--max-missing 0.8
	--out Making_Files/CA-c85-t88-Breps-m80x65
	--recode
	--remove-indv CA1_5_C5
	--remove-indv CA3_4_C1
	--remove-indv CA7_5_C1

Excluding individuals in 'exclude' list
After filtering, kept 37 out of 40 Individuals
Outputting VCF file...
After filtering, kept 31932 out of a possible 76323 Sites
Run Time = 1.00 seconds


In [60]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps-m60x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --not-chr locus_355504 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_394022 --not-chr locus_40333 --not-chr locus_333125 --not-chr locus_25870 --not-chr locus_289022 --not-chr locus_380572 --not-chr locus_55452 --not-chr locus_174781 --not-chr locus_101129 --not-chr locus_18461 --not-chr locus_23480 --not-chr locus_387974 --not-chr locus_153744 --not-chr locus_347558 --not-chr locus_172352 --not-chr locus_396787 --not-chr locus_255531 --not-chr locus_407412 --not-chr locus_140939 --not-chr locus_218417 --not-chr locus_42160 --not-chr locus_217492 --not-chr locus_125726 --not-chr locus_393720 --not-chr locus_220030 --not-chr locus_151068 --not-chr locus_409958 --not-chr locus_50784 --not-chr locus_206197 --not-chr locus_420365 --not-chr locus_401349 --not-chr locus_241939 --not-chr locus_258320 --not-chr locus_207453 --not-chr locus_380560 --not-chr locus_97057 --not-chr locus_131141 --not-chr locus_423209 --not-chr locus_286392 --not-chr locus_252249 --not-chr locus_172864 --not-chr locus_196985 --not-chr locus_157163 --not-chr locus_41857 --not-chr locus_357318 --not-chr locus_333703 --not-chr locus_276810 --not-chr locus_379373 --not-chr locus_92709 \
--not-chr locus_333703 --not-chr locus_276810 --not-chr locus_407412 --not-chr locus_380560 --not-chr locus_286392 --not-chr locus_347558 --not-chr locus_40333 --not-chr locus_387974 --not-chr locus_220030 --not-chr locus_203783 --not-chr locus_421342 --not-chr locus_50784 --not-chr locus_380572 --not-chr locus_41857 --not-chr locus_131141 --not-chr locus_218417 --not-chr locus_219923 --not-chr locus_125726 --not-chr locus_266793 --not-chr locus_97057 --not-chr locus_333125 --not-chr locus_18461 --not-chr locus_54888 --not-chr locus_92709 --not-chr locus_355504 --not-chr locus_140939 --not-chr locus_215692 --not-chr locus_207453 --not-chr locus_31565 --not-chr locus_357318 --not-chr locus_153744 --not-chr locus_216221 --not-chr locus_206197 --not-chr locus_101129 --not-chr locus_25870 --not-chr locus_401349 --not-chr locus_393720 --not-chr locus_119972 --not-chr locus_62278 --not-chr locus_423209 --not-chr locus_217492 --not-chr locus_42160 --not-chr locus_396787 --not-chr locus_55452 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_172864 --not-chr locus_379373 \
--max-alleles 2 --min-alleles 2 --out $suffix-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/CA-c85-t88-Breps-m60x65.recode.vcf
	--not-chr locus_101129
	--not-chr locus_119972
	--not-chr locus_125726
	--not-chr locus_131141
	--not-chr locus_140939
	--not-chr locus_151068
	--not-chr locus_153744
	--not-chr locus_157163
	--not-chr locus_172352
	--not-chr locus_172864
	--not-chr locus_174781
	--not-chr locus_18461
	--not-chr locus_196985
	--not-chr locus_203783
	--not-chr locus_206197
	--not-chr locus_207453
	--not-chr locus_215692
	--not-chr locus_216221
	--not-chr locus_217492
	--not-chr locus_218417
	--not-chr locus_219923
	--not-chr locus_220030
	--not-chr locus_23480
	--not-chr locus_241939
	--not-chr locus_252249
	--not-chr locus_253330
	--not-chr locus_255531
	--not-chr locus_258320
	--not-chr locus_25870
	--not-chr locus_266793
	--not-chr locus_276810
	--not-chr locus_286392
	--not-chr locus_289022
	--not-chr locus_31565
	--not-chr locus_333125
	--not-chr locus_333

In [61]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps-m80x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all --not-chr locus_355504 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_394022 --not-chr locus_40333 --not-chr locus_333125 --not-chr locus_25870 --not-chr locus_289022 --not-chr locus_380572 --not-chr locus_55452 --not-chr locus_174781 --not-chr locus_101129 --not-chr locus_18461 --not-chr locus_23480 --not-chr locus_387974 --not-chr locus_153744 --not-chr locus_347558 --not-chr locus_172352 --not-chr locus_396787 --not-chr locus_255531 --not-chr locus_407412 --not-chr locus_140939 --not-chr locus_218417 --not-chr locus_42160 --not-chr locus_217492 --not-chr locus_125726 --not-chr locus_393720 --not-chr locus_220030 --not-chr locus_151068 --not-chr locus_409958 --not-chr locus_50784 --not-chr locus_206197 --not-chr locus_420365 --not-chr locus_401349 --not-chr locus_241939 --not-chr locus_258320 --not-chr locus_207453 --not-chr locus_380560 --not-chr locus_97057 --not-chr locus_131141 --not-chr locus_423209 --not-chr locus_286392 --not-chr locus_252249 --not-chr locus_172864 --not-chr locus_196985 --not-chr locus_157163 --not-chr locus_41857 --not-chr locus_357318 --not-chr locus_333703 --not-chr locus_276810 --not-chr locus_379373 --not-chr locus_92709 \
--not-chr locus_333703 --not-chr locus_276810 --not-chr locus_407412 --not-chr locus_380560 --not-chr locus_286392 --not-chr locus_347558 --not-chr locus_40333 --not-chr locus_387974 --not-chr locus_220030 --not-chr locus_203783 --not-chr locus_421342 --not-chr locus_50784 --not-chr locus_380572 --not-chr locus_41857 --not-chr locus_131141 --not-chr locus_218417 --not-chr locus_219923 --not-chr locus_125726 --not-chr locus_266793 --not-chr locus_97057 --not-chr locus_333125 --not-chr locus_18461 --not-chr locus_54888 --not-chr locus_92709 --not-chr locus_355504 --not-chr locus_140939 --not-chr locus_215692 --not-chr locus_207453 --not-chr locus_31565 --not-chr locus_357318 --not-chr locus_153744 --not-chr locus_216221 --not-chr locus_206197 --not-chr locus_101129 --not-chr locus_25870 --not-chr locus_401349 --not-chr locus_393720 --not-chr locus_119972 --not-chr locus_62278 --not-chr locus_423209 --not-chr locus_217492 --not-chr locus_42160 --not-chr locus_396787 --not-chr locus_55452 --not-chr locus_48250 --not-chr locus_253330 --not-chr locus_172864 --not-chr locus_379373 \
--max-alleles 2 --min-alleles 2 --out $suffix-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/CA-c85-t88-Breps-m80x65.recode.vcf
	--not-chr locus_101129
	--not-chr locus_119972
	--not-chr locus_125726
	--not-chr locus_131141
	--not-chr locus_140939
	--not-chr locus_151068
	--not-chr locus_153744
	--not-chr locus_157163
	--not-chr locus_172352
	--not-chr locus_172864
	--not-chr locus_174781
	--not-chr locus_18461
	--not-chr locus_196985
	--not-chr locus_203783
	--not-chr locus_206197
	--not-chr locus_207453
	--not-chr locus_215692
	--not-chr locus_216221
	--not-chr locus_217492
	--not-chr locus_218417
	--not-chr locus_219923
	--not-chr locus_220030
	--not-chr locus_23480
	--not-chr locus_241939
	--not-chr locus_252249
	--not-chr locus_253330
	--not-chr locus_255531
	--not-chr locus_258320
	--not-chr locus_25870
	--not-chr locus_266793
	--not-chr locus_276810
	--not-chr locus_286392
	--not-chr locus_289022
	--not-chr locus_31565
	--not-chr locus_333125
	--not-chr locus_333
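
Exclusion lists like the `--not-chr` list above are easier to maintain when they are generated rather than pasted by hand. Below is a minimal sketch, assuming the loci flagged in each population are already held in Python lists. The helper `not_chr_args` and the variables `pop_a_loci` and `pop_b_loci` are hypothetical names, not part of this notebook, and the example loci are just a few taken from the command above.

```python
def not_chr_args(*locus_lists):
    """Merge locus lists, drop repeats (keeping first-seen order),
    and return the matching vcftools --not-chr flags as one string."""
    seen = set()
    flags = []
    for loci in locus_lists:
        for locus in loci:
            if locus not in seen:
                seen.add(locus)
                flags.append("--not-chr " + locus)
    return " ".join(flags)

# Hypothetical input: a handful of loci flagged in two populations.
pop_a_loci = ["locus_355504", "locus_48250", "locus_40333"]
pop_b_loci = ["locus_40333", "locus_203783", "locus_355504"]

# Each locus appears once in the result, even if flagged in both lists.
print(not_chr_args(pop_a_loci, pop_b_loci))
```

The resulting string can be pasted into the `vcftools` call in a `%%sh` cell. Repeated `--not-chr` flags are harmless, so removing them only keeps the command readable.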

In [62]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps-m60x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all \
--maf 0.025 --max-alleles 2 --min-alleles 2 --out $suffix-maf025-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/CA-c85-t88-Breps-m60x65.recode.vcf
	--recode-INFO-all
	--maf 0.025
	--max-alleles 2
	--min-alleles 2
	--out Making_Files/CA-c85-t88-Breps-m60x65-maf025-filt
	--recode

After filtering, kept 37 out of 37 Individuals
Outputting VCF file...
After filtering, kept 38869 out of a possible 74999 Sites
Run Time = 2.00 seconds


In [63]:
%%sh
suffix=Making_Files/CA-c85-t88-Breps-m80x65
vcftools --vcf $suffix.recode.vcf --recode --recode-INFO-all \
--maf 0.025 --max-alleles 2 --min-alleles 2 --out $suffix-maf025-filt


VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Making_Files/CA-c85-t88-Breps-m80x65.recode.vcf
	--recode-INFO-all
	--maf 0.025
	--max-alleles 2
	--min-alleles 2
	--out Making_Files/CA-c85-t88-Breps-m80x65-maf025-filt
	--recode

After filtering, kept 37 out of 37 Individuals
Outputting VCF file...
After filtering, kept 16142 out of a possible 31932 Sites
Run Time = 1.00 seconds
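
To see what `--maf 0.025` means in allele counts for this dataset, here is a small arithmetic sketch. It assumes diploid genotypes and my reading of the VCFtools manual: a site is kept when its minor allele frequency is greater than or equal to the threshold, with the frequency computed over non-missing allele copies only. The helper `min_minor_allele_count` is a hypothetical name used for illustration.

```python
import math

def min_minor_allele_count(n_called, maf=0.025):
    """Smallest minor-allele count that reaches `maf` at a diploid site
    where `n_called` individuals have a genotype (2 * n_called allele copies)."""
    return int(math.ceil(maf * 2 * n_called))

# 37 individuals were kept; sites with missing data have fewer called genotypes.
for n_called in (37, 30, 23):
    print("%d called individuals: minor allele count >= %d"
          % (n_called, min_minor_allele_count(n_called)))
```

With all 37 individuals called there are 74 allele copies, and 0.025 * 74 = 1.85, so the minor allele has to appear at least twice. Under these assumptions the filter removes singletons at such sites.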


In [66]:
infile = "Making_Files/CA-c85-t88-Breps-m60x65-filt.recode.vcf"
outfile = "Inputs/CA-c85-t88-Breps-m60x65-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 74670
Unlinked SNPs: 16699


In [67]:
infile = "Making_Files/CA-c85-t88-Breps-m60x65-maf025-filt.recode.vcf"
outfile = "Inputs/CA-c85-t88-Breps-m60x65-maf025-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 38881
Unlinked SNPs: 13298


In [68]:
infile = "Making_Files/CA-c85-t88-Breps-m80x65-filt.recode.vcf"
outfile = "Inputs/CA-c85-t88-Breps-m80x65-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 31758
Unlinked SNPs: 7322


In [69]:
infile = "Making_Files/CA-c85-t88-Breps-m80x65-maf025-filt.recode.vcf"
outfile = "Inputs/CA-c85-t88-Breps-m80x65-maf025-u.vcf"
subsetSNPs(infile,outfile)

Total SNPS: 16154
Unlinked SNPs: 5742
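
For reference, the idea behind selecting one SNP per GBS locus can be sketched as below. This is not the `subsetSNPs` function used in the cells above. It is a minimal stand-in called `pick_one_snp_per_locus` (a hypothetical name), and it assumes the CHROM column holds the locus name (as the `locus_N` values passed to `--not-chr` suggest) and that one SNP is drawn at random per locus. The tiny VCF in `demo` is made up for illustration.

```python
import random

def pick_one_snp_per_locus(vcf_lines, seed=None):
    """Return (header_lines, chosen_records) with one randomly chosen
    record per distinct CHROM value, in order of first appearance."""
    rng = random.Random(seed)
    header, by_locus, order = [], {}, []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)      # keep meta and column header lines
            continue
        if not line.strip():
            continue                 # skip blank lines
        locus = line.split("\t", 1)[0]
        if locus not in by_locus:
            by_locus[locus] = []
            order.append(locus)
        by_locus[locus].append(line)
    chosen = [rng.choice(by_locus[locus]) for locus in order]
    return header, chosen

# Made-up example: two SNPs on locus_1, one SNP on locus_2.
demo = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "locus_1\t5\t.\tA\tG\t13\tPASS\t.\tGT\t0/1\n",
    "locus_1\t40\t.\tC\tT\t13\tPASS\t.\tGT\t0/0\n",
    "locus_2\t12\t.\tG\tA\t13\tPASS\t.\tGT\t1/1\n",
]
header, chosen = pick_one_snp_per_locus(demo, seed=1)
print("Total SNPs: %d" % (len(demo) - len(header)))
print("Unlinked SNPs: %d" % len(chosen))
```

Passing a fixed `seed` makes the random draw repeatable, which helps when the "unlinked" files need to be regenerated later.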
