jgi_fosmid_id2plate
nielshanson edited this page Feb 22, 2014 · 2 revisions
An overview of the two fosmid-end naming conventions and a method for reconciling them, so that JGI fosmid-end names are compatible with fosmid_qc.py, which automatically matches fosmids to their original fosmid-ends.
This is a four-step process, starting from an original fosmid-end library file from the JGI, e.g., FGYA.fasta:
- Extract the header names from the fosmid-end library using strip_fastas.py, producing a file like FGYA.fasta.names.txt:
python strip_fastas.py -i ../fosmid_qc/data/EndSequenceDB/FGYA.fasta -o .
nielsh$ head FGYA.fasta.names.txt
FGYA1000.b1
FGYA1000.g1
FGYA1001.b1
FGYA1001.g1
FGYA1002.b1
- Use the JGI's own id2plate3a.pl script to decode the plate and well position of each name:
perl id2plate3a.pl FGYA.fasta.names.txt > FGYA.id2plate.txt
nielsh$ head FGYA.id2plate.txt
FGYA1000.b1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1000.g1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1001.b1 Well A12 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1001.g1 Well A12 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1002.b1 Well C12 for sequential 384-well Plate 2 (PGF Plate 9)
- Process the output of id2plate3a.pl with process_id2plate3a.py to produce a file like FGYA.id2plate.txt.map.txt:
python process_id2plate3a.py -i FGYA.id2plate.txt -o .
nielsh$ head FGYA.id2plate.txt.map.txt
FGYA1000.b1 FGYA-2.CPR_O10
FGYA1000.g1 FGYA-2.CPF_O10
FGYA1001.b1 FGYA-2.CPR_A12
FGYA1001.g1 FGYA-2.CPF_A12
FGYA1002.b1 FGYA-2.CPR_C12
- Finally, combine the FGYA.id2plate.txt.map.txt file with the original .fasta file using fasta_remapper.py to create a .new.fasta file like FGYA.fasta.new.fasta, in which all header names follow the fosmid_qc convention:
python fasta_remapper.py -i ../fosmid_qc/data/EndSequenceDB/FGYA.fasta -m FGYA.id2plate.txt.map.txt -o .
nielsh$ head FGYA.fasta.new.fasta
>FGYA-2.CPR_O10
CACCAAGGCATACGCATTGGTCCTCGTTTGATATCCTGCTTCTCCCTCGTTCCATTGAGGACACTCTCTACCTCGCGAGGGCAGATGCGCTTGCACGGACGAAACACACAAGCCCCAATGCGTGCCACCACACTGGGTCAGAGGCAGCTCAATCTAGAGATGCCGCGTATCTGCGGAGGCGCCAATCACTCGATCAATGCACATAAACCAGGAAATCACACTGTTGTGTTCAAGAAAACTCTTTCTTACTATAGTGTATGTGGCGGGGAAGGCCGAGGGCAACGTTTCAAACGGTATAACCCTTCGTCGTTTCGTCCGTCCACAATATCCATGCGAAAGGGGGAAAGGGAAAACATATGTAGTGTATGCAGTATACTTCCAAACACGTCGTGTTCAGAATGGTGGAGCCGAGTAACTGTCCCGCGGTGCGGCACCAGCGGGTGTGAAGGGGAAATGCTCTCGCAACTGTGTTGAAAAAGAGGAGGGAAGCACGCGTGTGGTGTGGGG
>FGYA-2.CPF_O10
CGTGTGTGTGTGTGCGTGTATGCACGGGCACCCGCATGTCCATCAAACACGGAGTGGTTAGGGGTATTGCCTGGTTGACTGACTGACTGGCTGGACCCCCCCCCCAACCCCCCCTCCCCACTCCCTCATCCTACACTCCCTCCACCTCTCTACCCCATCCCACCGTGTGTGTGCGTGCGCAGTCCCTAGCCTGCCTTCTCCTCGCTGCCGGCCGATTCTGCTCGCACCGCCTGATTCGTCTGTCTGTCGAGGCTCGTGCGTCTCTACCATACGTGCCGCCCCTCAATGGCGACCACAAGGGACCGCAGGGCCGTCGGCTCACAATGGATCCAGCCAGTTTCCGTGGCCTGAACTTTGGCGGCGAGTCGTGTGGGGGTG
Your collection of .new.fasta files should now be compatible with fosmid_qc.py and ready to be added to your fosmid-end library.
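For readers who want to see the renaming rule itself, here is a minimal Python sketch of what steps 3 and 4 appear to do. It is inferred only from the sample output shown above and is not the code of process_id2plate3a.py or fasta_remapper.py. The function names id2plate_to_name and remap_fasta are hypothetical. The sketch assumes that .b1 reads map to CPR and .g1 reads to CPF, that the plate number in the new name is the sequential 384-well plate number, and that library names consist of uppercase letters only.

```python
import re

# Pattern for one line of id2plate3a.pl output, inferred from the sample:
# "FGYA1000.b1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)"
LINE_RE = re.compile(
    r"^(?P<lib>[A-Z]+)(?P<num>\d+)\.(?P<dir>[bg])\d+\s+"
    r"Well\s+(?P<well>[A-P]\d{1,2})\s+"
    r"for sequential 384-well Plate\s+(?P<plate>\d+)"
)

# Assumption taken from the sample map file: .b1 -> CPR, .g1 -> CPF.
DIRECTION = {"b": "CPR", "g": "CPF"}


def id2plate_to_name(line):
    """Return (old_name, new_name) for one id2plate line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    old_name = line.split()[0]
    new_name = "{0}-{1}.{2}_{3}".format(
        m.group("lib"), m.group("plate"), DIRECTION[m.group("dir")], m.group("well")
    )
    return old_name, new_name


def remap_fasta(fasta_lines, name_map):
    """Yield FASTA lines with header names replaced via name_map.

    Headers without an entry in name_map are passed through unchanged;
    for mapped headers, any description after the name is dropped.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            old = fields[0] if fields else ""
            yield ">" + name_map.get(old, line[1:])
        else:
            yield line


if __name__ == "__main__":
    sample = "FGYA1000.b1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)"
    print(id2plate_to_name(sample))  # -> ('FGYA1000.b1', 'FGYA-2.CPR_O10')
```

Returning None for lines that do not match lets a caller log and skip unexpected id2plate output rather than writing a malformed name into the map file.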