## Converting ipyrad .u.str output for adegenet 

You need three files: the .u.str file from ipyrad, a file listing the population of each sample, and a file assigning an integer to each population. Because my sample names follow the format Population_ID, I created the sample-population file with a simple Python script that takes any of the stats output files from Steps 2-6. You could also make this file manually, with one column of sample IDs and one column of the corresponding population IDs.

In [None]:
# %load /home/ksil91/Projects/Ostrea/makePopFile.py
##create population file for ipyrad. Can be used with ipyrad stats files from steps 2-6.

import sys

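def makePopFile(infile, outfile):
    # The population ID is the part of the sample name before the first underscore.
    with open(infile, "r") as IN, open(outfile, "w") as OUT:
        for line in IN:
            fields = line.split()
            if not fields:
                continue  # skip blank lines instead of failing on them
            sampleID = fields[0]
            popID = sampleID.split("_")[0]
            OUT.write(sampleID + " " + popID + "\n")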


def main(argv):
    #get arguments from command line
    inf = argv[1]
    outf = argv[2]
    makePopFile(inf,outf)

if __name__ == "__main__":
    status = main(sys.argv)
    sys.exit(status)



The file should look like this, with the sample ID first, then a space or tab, then the population ID:

In [11]:
%%sh 
head /home/ksil91/Projects/Ostrea/over10k_popfile.txt 

clusters_total clusters
BC1_1 BC1
BC1_10w_6 BC1
BC1_11 BC1
BC1_12 BC1
BC1_19 BC1
BC1_2 BC1
BC1_20 BC1
BC1_22 BC1
BC1_5 BC1


Extra lines, such as the `clusters_total clusters` line above, are fine: my subsequent code only looks up sample IDs that appear in the .str file.
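
The naming rule is easy to check on a few names before running the script on a whole stats file. The snippet below is a small sketch of that rule only; `pop_from_sample` is a hypothetical helper name, not part of makePopFile.py. The first two names are samples from the listing above, and the third is the first word of the extra line in that listing (presumably a column header from the stats file).

```python
def pop_from_sample(sample_id):
    """Return the text before the first underscore (hypothetical helper)."""
    return sample_id.split("_")[0]

# Two sample names from the listing above, plus the first word of the
# extra non-sample line that ended up in the population file.
for name in ["BC1_1", "BC1_10w_6", "clusters_total"]:
    print("%s %s" % (name, pop_from_sample(name)))
```

Only the first underscore matters, so `BC1_10w_6` still maps to `BC1`. The same rule turns `clusters_total` into `clusters`, which is how that extra line gets into the file.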

For .str files, you need to code the population as an integer. For this, I created a file with the population ID string, the desired integer, and then optional columns of other info, like the full name of the population or GPS coordinates. Editing the integers in this file lets you try different groupings of populations, for example by giving two populations the same integer.

In [12]:
%%sh
head /home/ksil91/Projects/Ostrea/Pop2Int.txt

BC1 4 Victoria
BC2 1 Klaskino
BC3 2 Barkeley_Sound
BC4 3 Ladysmith
WA12 5 Discovery_Bay
WA11 6 Liberty_Bay
WA13 7 North_Bay
WA10 8 Triton_Cove
WA1 9 North_Willapa
WA9 9 South_Willpa
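
If you would rather not type this file from scratch, a first draft can be generated from the sample-population file and then edited by hand. This is a minimal sketch, not the way the file above was made: the function name `make_pop2int` is made up for illustration, it numbers populations in order of first appearance, and it writes only the two required columns.

```python
def make_pop2int(popfile, outfile):
    """Write one 'popID integer' line per unique population in popfile.

    Assumes popfile has the sample ID in column 1 and the population ID
    in column 2, separated by whitespace.
    """
    pop_ids = []
    with open(popfile) as infile:
        for line in infile:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or one-column lines
            if fields[1] not in pop_ids:
                pop_ids.append(fields[1])
    with open(outfile, "w") as out:
        # Integers start at 1 and follow the order of first appearance.
        for number, pop_id in enumerate(pop_ids, start=1):
            out.write("%s %d\n" % (pop_id, number))
    return pop_ids
```

Calling `make_pop2int("over10k_popfile.txt", "Pop2Int_draft.txt")` (the output name is hypothetical) would give a starting point. Any stray non-sample line in the population file, such as `clusters_total clusters`, shows up as its own population and should be deleted from the draft before renumbering or adding the optional columns.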


Then I wrote a script that inserts the population integer of each sample as the second column of the ipyrad .str output.

In [None]:
# %load /home/ksil91/Projects/Ostrea/AddPopsStr.py
import sys

def addPops(str_infile, popfile, outfile, pop2int):
    IN = open(str_infile, "r")
    OUT = open(outfile, "w")
    pops = open(popfile, "r")
    pop2int = open(pop2int, "r")
    popdict = {}
    pop2intdict = {}
    for line in pop2int:
        popID = line.split()[0]
        intID = line.strip().split()[1]
        pop2intdict[popID] = intID
    for line in pops:
        sampleID = line.split()[0]
        popID = line.strip().split()[1]
        popdict[sampleID] = popID
    for line in IN:
        linelist = line.split()
        intID = pop2intdict[popdict[linelist[0]]]
        linelist.insert(1, intID)
        # write() works in both Python 2 and 3; "print >> OUT" is Python 2 only.
        OUT.write("\t".join(str(e) for e in linelist) + "\n")
    IN.close()
    OUT.close()
    pops.close()
    pop2int.close()

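def main(argv):
    # Command-line arguments, in order:
    #   1. input .str file from ipyrad
    #   2. output .str file (with the population column added)
    #   3. sample-population file
    #   4. population-to-integer file
    inf = argv[1]
    outf = argv[2]
    popf = argv[3]
    pop2int = argv[4]
    addPops(inf, popf, outf, pop2int)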

if __name__ == "__main__":
    status = main(sys.argv)
    sys.exit(status)



In [14]:
%%sh
head /home/ksil91/Projects/Ostrea/over10k-min75H32pops.u.str | cut -f 1-4

BC1_1	4	3	0
BC1_1	4	3	0
BC1_10w_6	4	3	0
BC1_10w_6	4	3	0
BC1_11	4	3	0
BC1_11	4	3	0
BC1_12	4	3	0
BC1_12	4	3	0
BC1_19	4	-9	0
BC1_19	4	-9	0
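
Before loading the file into adegenet, it can help to confirm that it has the expected shape. The sketch below is not part of the original workflow and rests on several assumptions: whitespace-separated columns, no header row, the sample ID in column 1, the population integer in column 2, and exactly two rows per sample, as in the listing above. The function name `summarize_str` is hypothetical.

```python
def summarize_str(str_file):
    """Return (n_individuals, n_loci, sorted population integers).

    Raises ValueError if rows differ in length or if any sample does
    not have exactly two rows.
    """
    rows_per_sample = {}
    pops = set()
    n_loci = None
    with open(str_file) as infile:
        for line in infile:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            sample = fields[0]
            rows_per_sample[sample] = rows_per_sample.get(sample, 0) + 1
            pops.add(int(fields[1]))
            loci_here = len(fields) - 2  # columns after sample ID and population
            if n_loci is None:
                n_loci = loci_here
            elif loci_here != n_loci:
                raise ValueError("row for %s has %d loci, expected %d"
                                 % (sample, loci_here, n_loci))
    bad = sorted(s for s, n in rows_per_sample.items() if n != 2)
    if bad:
        raise ValueError("samples without exactly two rows: " + ", ".join(bad))
    return len(rows_per_sample), n_loci, sorted(pops)
```

The individual and locus counts are the values adegenet's `read.structure` asks for through its `n.ind` and `n.loc` arguments, so having them on hand saves a round of trial and error when importing.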
