
jgi_fosmid_id2plate

nielshanson edited this page Feb 22, 2014 · 2 revisions

An overview of the problem of the two fosmid-end naming conventions, and a method to reconcile them so that fosmid-end libraries are compatible with the fosmid_qc.py program, which automatically matches fosmids to their original fosmid-ends.

This is a four-step process that starts from an original fosmid-end library file from the JGI, e.g., FGYA.fasta.

  • Strip the header names of the fosmid-end libraries using strip_fastas.py, producing a file like FGYA.fasta.names.txt
python strip_fastas.py -i ../fosmid_qc/data/EndSequenceDB/FGYA.fasta -o .

nielsh$ head FGYA.fasta.names.txt
FGYA1000.b1
FGYA1000.g1
FGYA1001.b1
FGYA1001.g1
FGYA1002.b1
  • Use the JGI's own id2plate3a.pl script to decode the plate and well position of each read
perl id2plate3a.pl FGYA.fasta.names.txt > FGYA.id2plate.txt

nielsh$ head FGYA.id2plate.txt
FGYA1000.b1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1000.g1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1001.b1 Well A12 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1001.g1 Well A12 for sequential 384-well Plate 2 (PGF Plate 9)
FGYA1002.b1 Well C12 for sequential 384-well Plate 2 (PGF Plate 9)
  • Process the output of id2plate3a.pl with process_id2plate3a.py to produce a file like FGYA.id2plate.txt.map.txt
python process_id2plate3a.py -i FGYA.id2plate.txt -o .

nielsh$ head FGYA.id2plate.txt.map.txt
FGYA1000.b1	FGYA-2.CPR_O10
FGYA1000.g1	FGYA-2.CPF_O10
FGYA1001.b1	FGYA-2.CPR_A12
FGYA1001.g1	FGYA-2.CPF_A12
FGYA1002.b1	FGYA-2.CPR_C12
  • Finally, apply the FGYA.id2plate.txt.map.txt mapping to the original .fasta file with fasta_remapper.py to create a .new.fasta file like FGYA.fasta.new.fasta, in which all header names follow the fosmid_qc convention.
python fasta_remapper.py -i ../fosmid_qc/data/EndSequenceDB/FGYA.fasta -m FGYA.id2plate.txt.map.txt -o .

nielsh$ head FGYA.fasta.new.fasta
>FGYA-2.CPR_O10
CACCAAGGCATACGCATTGGTCCTCGTTTGATATCCTGCTTCTCCCTCGTTCCATTGAGGACACTCTCTACCTCGCGAGGGCAGATGCGCTTGCACGGACGAAACACACAAGCCCCAATGCGTGCCACCACACTGGGTCAGAGGCAGCTCAATCTAGAGATGCCGCGTATCTGCGGAGGCGCCAATCACTCGATCAATGCACATAAACCAGGAAATCACACTGTTGTGTTCAAGAAAACTCTTTCTTACTATAGTGTATGTGGCGGGGAAGGCCGAGGGCAACGTTTCAAACGGTATAACCCTTCGTCGTTTCGTCCGTCCACAATATCCATGCGAAAGGGGGAAAGGGAAAACATATGTAGTGTATGCAGTATACTTCCAAACACGTCGTGTTCAGAATGGTGGAGCCGAGTAACTGTCCCGCGGTGCGGCACCAGCGGGTGTGAAGGGGAAATGCTCTCGCAACTGTGTTGAAAAAGAGGAGGGAAGCACGCGTGTGGTGTGGGG
>FGYA-2.CPF_O10
CGTGTGTGTGTGTGCGTGTATGCACGGGCACCCGCATGTCCATCAAACACGGAGTGGTTAGGGGTATTGCCTGGTTGACTGACTGACTGGCTGGACCCCCCCCCCAACCCCCCCTCCCCACTCCCTCATCCTACACTCCCTCCACCTCTCTACCCCATCCCACCGTGTGTGTGCGTGCGCAGTCCCTAGCCTGCCTTCTCCTCGCTGCCGGCCGATTCTGCTCGCACCGCCTGATTCGTCTGTCTGTCGAGGCTCGTGCGTCTCTACCATACGTGCCGCCCCTCAATGGCGACCACAAGGGACCGCAGGGCCGTCGGCTCACAATGGATCCAGCCAGTTTCCGTGGCCTGAACTTTGGCGGCGAGTCGTGTGGGGGTG
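
The renaming logic behind the last two steps can be sketched roughly as follows. This is not the actual code of process_id2plate3a.py or fasta_remapper.py; the function names `id2plate_to_map` and `remap_headers` are hypothetical, and the line pattern and the `b` to CPR / `g` to CPF pairing are inferred only from the sample output shown above, so other libraries may need adjustments.

```python
import re

# Pattern inferred from the sample id2plate3a.pl output above (an assumption):
#   FGYA1000.b1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)
LINE_RE = re.compile(
    r"^(?P<name>(?P<lib>[A-Z]+)\d+\.(?P<end>[bg])1)\s+"
    r"Well\s+(?P<well>[A-P]\d{1,2})\s+"
    r"for sequential 384-well Plate\s+(?P<plate>\d+)"
)

# In the sample map file, .b1 reads become CPR and .g1 reads become CPF.
END_TAG = {"b": "CPR", "g": "CPF"}


def id2plate_to_map(line):
    """Return (old_name, new_name) for one id2plate line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    new_name = "{}-{}.{}_{}".format(
        m.group("lib"), m.group("plate"), END_TAG[m.group("end")], m.group("well")
    )
    return m.group("name"), new_name


def remap_headers(fasta_lines, mapping):
    """Yield FASTA lines (without newlines), renaming headers found in mapping.

    Assumes the read name is the first whitespace-separated token of the header.
    Headers with no entry in mapping are passed through unchanged.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            old = fields[0] if fields else ""
            yield ">" + mapping[old] if old in mapping else line
        else:
            yield line
```

For example, `id2plate_to_map("FGYA1000.b1 Well O10 for sequential 384-well Plate 2 (PGF Plate 9)")` gives the pair `("FGYA1000.b1", "FGYA-2.CPR_O10")`, which matches the first line of the sample map file.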

Your collection of .new.fasta files should now be compatible with fosmid_qc.py and ready to be added to your fosmid-end library.
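
Before adding a remapped file to the library, it may be worth checking that every header was actually renamed. The sketch below is one possible check, not part of fosmid_qc; the helper name `nonconforming_headers` is hypothetical, and the regular expression is only a guess at the fosmid_qc convention based on the sample headers above (such as FGYA-2.CPR_O10).

```python
import re

# Guess at the fosmid_qc naming convention, based only on the samples above:
# <LIBRARY>-<plate>.<CPF|CPR>_<row A-P><column>
NAME_RE = re.compile(r"[A-Z]+-\d+\.CP[FR]_[A-P]\d{1,2}")


def nonconforming_headers(fasta_lines):
    """Return the header names that do not match the assumed convention."""
    bad = []
    for line in fasta_lines:
        if line.startswith(">"):
            name = line[1:].strip()
            if NAME_RE.fullmatch(name) is None:
                bad.append(name)
    return bad
```

Running it over the lines of a file such as FGYA.fasta.new.fasta and getting an empty list back would suggest that no old-style names like FGYA1000.b1 were left behind.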
