# Genbank phylogeny from COI sequences

Complete mitogenome genbank records were downloaded for 4 ant species, each from a different subfamily:
  - ***Pseudomyrmex gracilis*** (Pseudomyrmecinae)
  - ***Formica fusca*** (Formicinae)
  - ***Linepthema humile*** (Dolichoderinae)
  - ***Solenopsis invicta*** (Myrmicinae)

In [4]:
# importing everything we'll need
from Bio import SeqIO
import os
import glob

Now, we'll be using SeqIO to obtain the COI nucleotide sequence for each species and save it into separate files:

In [41]:
def create_dir(dir_name):
    os.makedirs(os.path.dirname(dir_name), exist_ok=True)

def extract_COI(gb_file):
    for record in SeqIO.parse(gb_file, "genbank"):
        species_name = record.annotations.get('organism').replace(" ", "_")
        filename = "./coi_seqs/{}_coi.fa".format(species_name)
        create_dir(filename)
        with open(filename, "w") as coi_file:
            for gene in record.features:
                if gene.type in ["CDS"] and gene.qualifiers.get('gene')[0] in ['COX1', 'COI']:
                    header = "{}-{}".format(species_name, gene.qualifiers.get('gene')[0])
                    sequence = gene.location.extract(record.seq) # Mas seq pode ter stop codon truncado
                    if len(sequence) % 3 == 1:
                        sequence += "AA" #Resto 1 - Precisa adicionar 'AA'
                    elif len(sequence) % 3 == 2:
                        sequence += "A" #Resto 2 - Precisa adicionar 'A'
                    coi_file.write(">{}\n{}\n".format(header, sequence))

for gb_file in glob.glob("./ant_mitogenomes/*.gb"):
    extract_COI(gb_file)

Running blastn with the COI sequences against NCBI's formicidae sequences:

In [None]:
%%bash
mkdir blast_results
for coi in ./coi_seqs/*; do
    echo "Running blast search for $coi..." && 
    blastn -query $coi -db nr -remote -entrez_query "Formicidae [Organism]" > ./blast_results/$(basename $coi .fa).blast;
done

In [57]:
%%bash
mkdir blast_results
if [ ! -f "36668.txids"]; then
    echo "Getting taxidlist..." && get_species_taxids.sh -t 36668 > 36668.txids
fi

for coi in ./coi_seqs/*; do
    echo "Running blast search for $coi..." && 
    blastn -query $coi -db nr -taxidlist 36668.txids -outfmt 7 -out OUTPUT.tab > ./blast_results/$(basename $coi .fa).blast;
done
#blastn -query $coi -db nr -remote -entrez_query "Formicidae [Organism]" > ./blast_results/$(basename $coi .fa).blast;

Running blast search for ./coi_seqs/Formica_fusca_coi.fa...
Running blast search for ./coi_seqs/Linepithema_humile_coi.fa...
Running blast search for ./coi_seqs/Pseudomyrmex_gracilis_coi.fa...
Running blast search for ./coi_seqs/Solenopsis_invicta_coi.fa...


mkdir: cannot create directory ‘blast_results’: File exists
bash: line 2: [: missing `]'
BLAST query/options error: Invalid taxidlist file 
Please refer to the BLAST+ user manual.
BLAST query/options error: Invalid taxidlist file 
Please refer to the BLAST+ user manual.
BLAST query/options error: Invalid taxidlist file 
Please refer to the BLAST+ user manual.
BLAST query/options error: Invalid taxidlist file 
Please refer to the BLAST+ user manual.


CalledProcessError: Command 'b'mkdir blast_results\nif [ ! -f "36668.txids"]; then\n    echo "Getting taxidlist..." && get_species_taxids.sh -t 36668 > 36668.txids\nfi\n\nfor coi in ./coi_seqs/*; do\n    echo "Running blast search for $coi..." && \n    blastn -query $coi -db nr -taxids 36668.txids -outfmt 7 -out OUTPUT.tab > ./blast_results/$(basename $coi .fa).blast;\ndone\n#blastn -query $coi -db nr -remote -entrez_query "Formicidae [Organism]" > ./blast_results/$(basename $coi .fa).blast;\n'' returned non-zero exit status 1.