# Introduction

This notebook demonstrates the effect of setting different values for the --max_gap_between_tblastn_hsps option of the sum_fwd_srch command. This option sets the maximum number of base pairs allowed between potential exons (TBLASTN HSPs) for them to be considered part of the same gene. Higher values (e.g., 10,000bp) accommodate genes with very long introns, while lower values (e.g., 1,000bp) allow homologous genes that are adjacent on a genomic sequence to be detected as distinct genes rather than merged into one.
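
The trade-off described above can be illustrated with a minimal sketch. This is not AMOEBAE's own implementation: the function name `cluster_hsps`, the example coordinates, and the exact gap rule (distance from the furthest end seen so far in a cluster to the start of the next HSP, ignoring strand) are all assumptions made for illustration.

```python
def cluster_hsps(hsps, max_gap):
    """Group (start, end) HSP intervals into clusters of potential exons.

    Assumed rule: an HSP joins the current cluster when its start lies at most
    max_gap bases beyond the furthest end seen so far in that cluster.
    """
    clusters = []
    cluster_end = None
    for start, end in sorted(hsps):
        if clusters and start - cluster_end <= max_gap:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


# Hypothetical HSP coordinates: two neighbouring "genes" about 3,000 bp apart.
example_hsps = [(1000, 1200), (1500, 1800), (4800, 5000), (5300, 5600)]
print(len(cluster_hsps(example_hsps, max_gap=10000)))  # 1: both genes merged
print(len(cluster_hsps(example_hsps, max_gap=1000)))   # 2: genes kept apart
```

With the larger gap all four intervals fall into one cluster, while the smaller gap splits them at the 3,000 bp intergenic stretch. That is the behaviour this notebook sets out to compare on real loci.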

In this demonstration, *Arabidopsis thaliana* genes encoding 1) the Adaptor Protein 2 complex alpha subunit (AT5G22770 and AT5G22780) and 2) the Coatomer Protein I complex beta subunit (AT4G31480 and AT4G31490) are used as examples, because in both cases the gene paralogues are positioned adjacent to each other in the genome (on chromosome 5 and chromosome 4, respectively).

These gene loci can be easily visualized using NCBI's genome viewer:
https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_000001735.4

# Preliminary steps

## Import some basic python modules

In [1]:
import os
import sys
import time
import platform
import subprocess
from Bio import SeqIO
from Bio import Entrez
import glob
from Bio.Blast import NCBIXML
import pandas as pd
from IPython.display import display, HTML, Image
sys.path.append('/opt/amoebae')

## Record name of this notebook

In [2]:
%%javascript
// Define relative path to current notebook file.
var nb = IPython.notebook;
var kernel = IPython.notebook.kernel;
var command = "NOTEBOOK_PATH = '" + nb.notebook_path + "'";
kernel.execute(command);

<IPython.core.display.Javascript object>

In [7]:
# Define path name of current notebook file.
current_notebook = os.path.basename(NOTEBOOK_PATH)
print("Notebook name:\n", current_notebook)

Notebook name:
 demo_max_gap_between_tblastn_hsps_option.ipynb


## Record the specific version of AMOEBAE code used

In [8]:
# Record git repository version information.
wd = ["/opt/amoebae"]
script_dir = wd[0] 
git_hash = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=script_dir).decode().strip()
git_branch = subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"], cwd=script_dir).decode().strip()
print('\nGit repository (code) version: ' + git_hash + ' (branch name: ' + git_branch + ')\n')


Git repository (code) version: b1f12fb92c94e5165a87d3bdb7a8d774bac37f82 (branch name: master)



## Make a subdirectory to store output.

In [9]:
subdir = current_notebook.rsplit('.', 1)[0] + '_output'

In [10]:
%%bash -s "$subdir"
# Create the output subdirectory; -p avoids an error if it already exists.
mkdir -p "$1"

In [11]:
%cd {subdir}

/opt/200328_ALYS_Fix_missing_tblastn_hits/demo_max_gap_between_tblastn_hsps_option_output


# Set up sequence databases for searching

## Download peptide and nucleotide sequences for the *Arabidopsis thaliana* genome

In [12]:
%%time

# Initiate a list of file paths for downloaded sequence and annotation files.
datafile_path_list = []

# Define a dictionary of source URLs and new filenames for sequence and annotation files.
# Note that the filenames (besides extension) are the species name with underscores instead of spaces.
datafile_dict = {"Arabidopsis_thaliana.faa": "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_protein.faa.gz",
                 "Arabidopsis_thaliana.fna": "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.fna.gz",
                 "Arabidopsis_thaliana.gff3": "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.gff.gz",
          }

# Make a new temporary directory to store data files.
temp_db_dir_name = 'temporary_db_dir'
if not os.path.isdir(temp_db_dir_name):
    os.mkdir(temp_db_dir_name)

# Download all the data files via NCBI's FTP server.
for filename in datafile_dict.keys():
    url = datafile_dict[filename]
    filepath = os.path.join(temp_db_dir_name, filename)
    if not os.path.isfile(filepath):
        # Use check_call so that a failed download or decompression raises an error.
        subprocess.check_call(['curl', url, '--output', filepath + '.gz'])
        subprocess.check_call(['gunzip', filepath + '.gz'])

CPU times: user 952 µs, sys: 19.8 ms, total: 20.7 ms
Wall time: 46.7 s


## Initiate a data directory structure
The 'mkdatadir' command generates a directory structure and spreadsheets for
storing formatted sequence files and the metadata for each sequence file. It
takes a single argument: the path that you want your new directory to be
written to.

In [13]:
%env DATADIR=AMOEBAE_Data

env: DATADIR=AMOEBAE_Data


In [14]:
%%bash
amoebae mkdatadir $DATADIR


        
        To allow AMOEBAE scripts to locate your new data directory, change the
        value of the root_amoebae_data_dir variable in the settings.py file to
        the full path to the directory:

        AMOEBAE_Data
        


In [15]:
# Check that the path indicated in the settings file is correct.
import settings
print(settings.root_amoebae_data_dir)
assert settings.root_amoebae_data_dir == "AMOEBAE_Data"

AMOEBAE_Data


## Prepare databases for searching

In [16]:
%%bash
SECONDS=0

for X in temporary_db_dir/*; do amoebae add_to_dbs $X; done

ELAPSED="Preparing sequence databases for searching took the following amount of time: $(($SECONDS / 3600))hrs $((($SECONDS / 60) % 60))min $(($SECONDS % 60))sec"
echo $ELAPSED



Building a new DB, current time: 03/29/2020 18:10:07
New DB name:   /opt/200328_ALYS_Fix_missing_tblastn_hits/demo_max_gap_between_tblastn_hsps_option_output/AMOEBAE_Data/Genomes/Arabidopsis_thaliana.faa
New DB title:  AMOEBAE_Data/Genomes/Arabidopsis_thaliana.faa
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 48265 sequences in 4.96952 seconds.


Creating SSI index for AMOEBAE_Data/Genomes/Arabidopsis_thaliana.faa...    done.
Indexed 48265 sequences (48265 names).
SSI index written to file AMOEBAE_Data/Genomes/Arabidopsis_thaliana.faa.ssi


Building a new DB, current time: 03/29/2020 18:10:18
New DB name:   /opt/200328_ALYS_Fix_missing_tblastn_hits/demo_max_gap_between_tblastn_hsps_option_output/AMOEBAE_Data/Genomes/Arabidopsis_thaliana.fna
New DB title:  AMOEBAE_Data/Genomes/Arabidopsis_thaliana.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 7 sequences in 2.85738 

In [17]:
%%bash
# List the databases now accessible by AMOEBAE.
amoebae list_dbs

Arabidopsis_thaliana.faa
Arabidopsis_thaliana.fna


# Set up queries

## Enter your email to access the NCBI protein database via NCBI Entrez

In [18]:
# To be prompted for your email address interactively, uncomment the following line.
#Entrez.email = input("Enter your email address here: ")  # Tell NCBI who you are.

# To run all cells at once without a prompt, set the address directly instead.
#Entrez.email = "yourname@email.com"
Entrez.email = "lael@ualberta.ca"

## Download single-sequence queries

In [19]:
%%time

# Define a dictionary with NCBI sequence accessions as keys and filenames to write
# the corresponding sequences to as values.
query_dict = {"NP_851058.1": "AP2alpha_Athaliana_NP_851058.1_query.faa",
              "NP_001320104.1": "COPIbeta_Athaliana_NP_001320104.1_query.faa"
          }

# Make a new temporary directory to store sequence files.
temp_query_dir_name = 'temporary_query_dir'
if not os.path.isdir(temp_query_dir_name):
    os.mkdir(temp_query_dir_name)

# Loop over keys in the query_dict dictionary.
for accession in query_dict.keys():
    # Retrieve the corresponding filename from the dictionary.
    filename = query_dict[accession]
    filepath = os.path.join(temp_query_dir_name, filename)
    # Only download sequences that have not already been downloaded.
    if not os.path.isfile(filepath):
        # Download the sequence from NCBI via Entrez, using the Biopython module.
        net_handle = Entrez.efetch(db="protein", id=accession, rettype="fasta", retmode="text")
        # Write the sequence to file; the with-block closes the file even on error.
        with open(filepath, "w") as out_handle:
            out_handle.write(net_handle.read())
        net_handle.close()
    # Check that the sequence was actually downloaded.
    assert os.path.isfile(filepath), """The sequence with the following accession could not be downloaded from NCBI: %s\n
    Try re-running this cell.""" % accession

CPU times: user 34.4 ms, sys: 9.39 ms, total: 43.8 ms
Wall time: 1.25 s
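
The assertion in the cell above only confirms that a file exists, so an empty or malformed file would still pass. A stricter check could look like the following sketch. The helper name `looks_like_fasta` and the throwaway file names are hypothetical, and the check is deliberately simple: a `>` header line followed by at least one sequence line.

```python
import os
import tempfile


def looks_like_fasta(path):
    """Return True if the file has a '>' header line followed by sequence text."""
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    return len(lines) >= 2 and lines[0].startswith(">") and not lines[1].startswith(">")


# Demonstrate on throwaway files rather than on real downloads.
with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.faa")
    empty = os.path.join(tmp_dir, "empty.faa")
    with open(good, "w") as handle:
        handle.write(">hypothetical_protein\nMTGMRGLSVF\n")
    open(empty, "w").close()
    print(looks_like_fasta(good))   # True
    print(looks_like_fasta(empty))  # False
```

In the download loop, such a function could be called on `filepath` in place of, or in addition to, `os.path.isfile(filepath)`.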


## Prepare single-sequence queries for searching

In [20]:
%%bash
SECONDS=0

for QUERYFILE in temporary_query_dir/*.faa; do amoebae add_to_queries $QUERYFILE; done

ELAPSED="Preparing query sequences for searching took the following amount of time: $(($SECONDS / 3600))hrs $((($SECONDS / 60) % 60))min $(($SECONDS % 60))sec"
echo $ELAPSED

Preparing query sequences for searching took the following amount of time: 0hrs 0min 2sec


In [21]:
%%bash
amoebae list_queries

AP2alpha_Athaliana_NP_851058.1_query.faa
COPIbeta_Athaliana_NP_001320104.1_query.faa


# Run forward searches

In [22]:
%env SRCHRESDIR=AMOEBAE_Search_Results_1

env: SRCHRESDIR=AMOEBAE_Search_Results_1


In [23]:
%%bash
# Make a new directory to contain search results.
mkdir $SRCHRESDIR
# Write query and database list files.
amoebae list_queries > $SRCHRESDIR/queries.txt
amoebae list_dbs > $SRCHRESDIR/databases.txt

In [24]:
%%bash
# Optional. Get the help output for the setup_fwd_srch command.
amoebae setup_fwd_srch -h

usage: amoebae [-h] [--outdir OUTDIR] srch_dir query_list_file db_list_file

Make a directory in which to write output files from similarity searches.

positional arguments:
  srch_dir         Path to directory that will contain output directory as a
                   subdirectory.
  query_list_file  Path to file with list of queries to search with.
  db_list_file     Path to file with list of databases to search with.

optional arguments:
  -h, --help       show this help message and exit
  --outdir OUTDIR  Path to directory to put search results into (so that this
                   step can be piped together with other commands). (default:
                   None)

Note: Use the bash script to run forward searches on a remote server.


In [25]:
%env FWDSRCHDIR=fwd_srch_1

env: FWDSRCHDIR=fwd_srch_1


In [26]:
%%bash
# Set up forward searches.
amoebae setup_fwd_srch $SRCHRESDIR\
                       $SRCHRESDIR/queries.txt\
                       $SRCHRESDIR/databases.txt\
                       --outdir $SRCHRESDIR/$FWDSRCHDIR

In [27]:
%%bash
SECONDS=0

# Run forward searches. This could take a while.
amoebae run_fwd_srch $SRCHRESDIR/$FWDSRCHDIR

ELAPSED="Running forward searches took the following amount of time: $(($SECONDS / 3600))hrs $((($SECONDS / 60) % 60))min $(($SECONDS % 60))sec"
echo $ELAPSED

Running forward searches took the following amount of time: 0hrs 0min 6sec


# Summarize forward search results

Now we can generate a summary of the raw output files. Important criteria may be customized here as well, specifically the forward search E-value threshold and the maximum number of nucleotide bases allowed between TBLASTN HSPs for them to be considered part of the same gene (view optional arguments via the -h option).
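
As a rough illustration of the E-value criterion, the sketch below labels hits as positive (`+`) or negative (`-`), mirroring the labels used in the summary CSV column "Positive/redundant (+) or negative (-) hit based on E-value criterion". The function name, the sample E-values, the threshold, and the choice of an inclusive comparison are assumptions, not details taken from the AMOEBAE code.

```python
def classify_by_evalue(evalues, max_evalue):
    """Label each E-value '+' if it is at or below the threshold, else '-'."""
    return ["+" if evalue <= max_evalue else "-" for evalue in evalues]


# Hypothetical E-values and threshold, chosen only for illustration.
sample_evalues = [0.0, 2.1e-108, 6.1e-20, 1.1e-05]
print(classify_by_evalue(sample_evalues, max_evalue=1e-10))
# ['+', '+', '+', '-']
```

A stricter (smaller) threshold reduces false positives at the cost of missing divergent homologues. The E-value threshold and the gap setting are independent and can be tuned separately.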

In [28]:
%%bash
amoebae sum_fwd_srch -h

usage: amoebae [-h] [--max_evalue MAX_EVALUE]
               [--max_gap_between_tblastn_hsps MAX_GAP_BETWEEN_TBLASTN_HSPS]
               [--do_not_use_exonerate]
               [--exonerate_score_threshold EXONERATE_SCORE_THRESHOLD]
               [--max_hits_to_sum MAX_HITS_TO_SUM]
               fwd_srch_out csv_file

Append information about forward searches to csv summary file (this is used to
organize reverse searches). For TBLASTN searches (protein queries, nucleotide
target sequences), HSPs are clustered into groups that are close enough within
the target sequence to potentially represent exons from the same coding
sequence. The nucleotide subsequences in which these clusters of HSPs are
found are then analyzed using exonerate to identify and translate potential
exons, in "protein2genome" mode, because exonerate, unlike TBLASTN, attempts
to identify exon boundaries, yielding translations that are less likely to
include translations of non-coding regions outside exons (which mig

Summarize forward searches using 100,000bp as the value for the --max_gap_between_tblastn_hsps option:

In [29]:
%%time
# Summarize forward search results in a CSV file.
# ***Note that only the top 5 hits for each individual search will be reported, as specified here. 
# This is simply to save time, and previous analyses have confirmed that the number of positive hits will not exceed 5 for any of the searches.
!amoebae sum_fwd_srch $SRCHRESDIR/$FWDSRCHDIR\
                     $SRCHRESDIR/$FWDSRCHDIR'_sum1.csv'\
                     --max_gap_between_tblastn_hsps 100000 \
                     --max_hits_to_sum 5
                    



            improve translation of sequences identified by TBLASTN. If you do not
            want to do this, then use the --do_not_use_exonerate option.


Result 1 of 4
Extracting information from search result file AP2alpha_Athaliana_NP_851058.1_query__Arabidopsis_thaliana_faa_srch_out.txt
Result 2 of 4
Extracting information from search result file AP2alpha_Athaliana_NP_851058.1_query__Arabidopsis_thaliana_fna_srch_out.txt

	Search program was tblastn.
	Checking number of distinct genes represented by HSPs.

	Query: NP_851058.1
	Hit 1: NC_003076.8 "NC_003076.8 Arabidopsis thaliana chromosome 5 sequence"
	HSP positions in subject sequence (1 dot = 179836 bp):
	 0                                                                                                                                                    26975502
	 v                                                                                                                                                    v
	 ............

	Hit 1 HSP cluster 1:
	HSP positions in subject sequence (1 dot = 119 bp):
	7579846                                                                                                                                              7597828
	v                                                                                                                                                    v
	......................................................................................................................................................
	#.....................................................................................................................................................  7579846..7579996, minus, 7.2242e-18
	......................................................................................................................................................  7580088..7580175, minus, 7.2242e-18
	.............................................................................

	Hit 2 HSP cluster 1:
	HSP positions in subject sequence (1 dot = 9 bp):
	11360098                                                                                                                                             11361493
	v                                                                                                                                                    v
	......................................................................................................................................................
	#################################################################.....................................................................................  11360098..11360707, plus, 6.08597e-20
	.............................................................................................................................########################.  11361265..11361493, plus, 1.07255e-05


	Hit 2 HSP cluster 2:
	HSP positions in subject sequence (1 dot = 6 bp):
	180381

	Hit 2 HSP cluster 1:
	HSP positions in subject sequence (1 dot = 5 bp):
	18038121                                                                                                                                             18038937
	v                                                                                                                                                    v
	......................................................................................................................................................
	######################################################################################################################################################  18038121..18038937, minus, 5.98903e-05


Could not identify FASTA sequence in exonerate output file AMOEBAE_Search_Results_1/fwd_srch_1/COPIbeta_Athaliana_NP_001320104.1_query__Arabidopsis_thaliana_fna_srch_out_subject_subseq_NC_003070.9_18038121-18038937_exonerate_out.txt
Writing dataframe to csv file


Forwa
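
The HSP position diagrams in the output above scale a genomic span to a fixed number of columns, marking columns that overlap an HSP with `#`. The sketch below shows one way such a track could be drawn. It is an assumption-laden illustration (function name, 150-column width, and overlap rule are all hypothetical), so its output may differ by a column or so from what AMOEBAE prints.

```python
def render_hsp_track(span_start, span_end, hsp_start, hsp_end, width=150):
    """Return `width` characters: '#' where the HSP overlaps a column, '.' elsewhere."""
    bp_per_column = (span_end - span_start) / width
    track = []
    for column in range(width):
        column_start = span_start + column * bp_per_column
        column_end = column_start + bp_per_column
        overlaps = column_start < hsp_end and column_end > hsp_start
        track.append("#" if overlaps else ".")
    return "".join(track)


# Coordinates taken from the "Hit 2 HSP cluster 1" listing above.
track = render_hsp_track(11360098, 11361493, 11361265, 11361493)
print(track)
print(track.count("#"), "of", len(track), "columns covered")
```

Reading the diagrams this way makes it easier to see at a glance whether the HSPs in a cluster are tightly packed (likely one gene) or separated by long unmarked stretches (possibly two adjacent genes merged by a large gap setting).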

In [30]:
# Load data from the CSV file using the pandas library.
df = pd.read_csv(os.path.join(os.environ['SRCHRESDIR'],os.environ['FWDSRCHDIR']) + '_sum1.csv_out.csv')
# Display the data in an HTML table.
display(HTML(df.to_html()))

Unnamed: 0,Query title,Query file,Query species (if applicable),Query database name,Query accession (if applicable),Query description,Query length,Subject database species (if applicable),Subject database file,Forward search method,Forward hit rank,Forward hit score,Forward hit score difference from top hit score,Forward hit E-value (top HSP),Forward hit E-value (top HSP) order of magnitude difference compared to top hit,Forward hit length,Forward hit length as a percentage of query length,Forward hit percent query cover,Forward hit accession,Forward hit description,Forward hit sequence,Forward hit coordinates of subsequence(s) that align(s) to query,Forward hit description of subsequence(s) that align(s) to query,Forward hit subsequence(s) that align(s) to query,Proximity (bp) to end of subject sequence (if searching in nucleotide sequences),Positive/redundant (+) or negative (-) hit based on E-value criterion
0,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,1,5396.0,0,0.0,0,1012,100,100,NP_851058.1,"""NP_851058.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_851058.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
1,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,2,5396.0,0,0.0,0,1012,100,100,NP_851057.1,"""NP_851057.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_851057.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
2,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,3,5396.0,0,0.0,0,1012,100,100,NP_197669.1,"""NP_197669.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_197669.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
3,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,4,5396.0,0,0.0,0,1012,100,100,NP_001330971.1,"""NP_001330971.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_001330971.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
4,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,5,5396.0,0,0.0,0,1012,100,100,NP_001330970.1,"""NP_001330970.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_001330970.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
5,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,6,5396.0,0,0.0,0,1012,100,100,NP_001330969.1,"""NP_001330969.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_001330969.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
6,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,1,211.075,325,2.135e-108,0,1012,100,100,NC_003076.8,"""NC_003076.8 Arabidopsis thaliana chromosome 5 sequence""",-,"[[7579847,7579997],[7580090,7580169],[7580330,7580401],[7580488,7580568],[7580777,7580902],[7580980,7581066],[7581171,7581335],[7581440,7581520],[7581811,7582044],[7582165,7582329],[7582430,7582550],[7582660,7582764],[7582846,7582952],[7583429,7583494],[7583578,7583683],[7583826,7583914],[7584048,7584129],[7584676,7584773],[7584873,7584974],[7585363,7585455],[7585536,7585607],[7585806,7586013],[7586094,7586191],[7586274,7586336],[7586550,7586675],[7587027,7587158],[7587901,7588026]]","NC_003076.8 [[7579847,7579997],[7580090,7580169],[7580330,7580401],[7580488,7580568],[7580777,7580902],[7580980,7581066],[7581171,7581335],[7581440,7581520],[7581811,7582044],[7582165,7582329],[7582430,7582550],[7582660,7582764],[7582846,7582952],[7583429,7583494],[7583578,7583683],[7583826,7583914],[7584048,7584129],[7584676,7584773],[7584873,7584974],[7585363,7585455],[7585536,7585607],[7585806,7586013],[7586094,7586191],[7586274,7586336],[7586550,7586675],[7587027,7587158],[7587901,7588026]]",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIV
PVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,7579846,+
7,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,2,98.5969,437,6.08597e-20,88,76,8,8,NC_003070.9,"""NC_003070.9 Arabidopsis thaliana chromosome 1 sequence""",-,"[[11361266,11361493]]","NC_003070.9 [[11361266,11361493]]",VATVQQDPDDTLKRKTFELLYKMTKSSNVEVIVDRMIDYMISINDNHYKTEIASRCVELAEQFAPSNQWFIQVASQ,11360098,+
8,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,3,87.8113,448,1.05159e-16,92,22,2,2,NC_003070.9,"""NC_003070.9 Arabidopsis thaliana chromosome 1 sequence""",-,"[[18039208,18039273]]","NC_003070.9 [[18039208,18039273]]",RESSMSSSSTSIMDNLFQRSLE,12388569,+
9,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,4,77.7962,458,1.21404e-13,95,99,10,10,NC_003070.9,"""NC_003070.9 Arabidopsis thaliana chromosome 1 sequence""",-,"[[8441772,8442068]]","NC_003070.9 [[8441772,8442068]]",MIRAIRACKTAAEERAVVRKECADIRALINEDDPHDRHRNLAKLMFIHMLGYPTHFGQMECLKLIASPGFPEKRIGYLGLMLLLDERQEVLMLVTNSLK,8441771,+


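Summarize the forward searches again, this time using 1,000 bp as the value for the --max_gap_between_tblastn_hsps option. Note that there are now two TBLASTN hits listed for AP2 alpha on chromosome 5 and two for COPI beta on chromosome 4, because the HSPs belonging to the adjacent paralogues are no longer grouped into a single hit.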
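To make the effect of the gap threshold concrete, the following is a minimal sketch of gap-based grouping of coordinate ranges. It is not AMOEBAE's actual implementation: the function name `cluster_ranges` is hypothetical, and the way the gap is measured (start of one range minus the rightmost end seen so far) is an assumption. The coordinates are a subset of those listed for the two chromosome 5 TBLASTN rows in the summary table for this run, used here as stand-ins for HSP positions.

```python
def cluster_ranges(ranges, max_gap):
    """Group (start, end) ranges into clusters (hypothetical helper).

    A range joins the current cluster when its start lies within max_gap bp
    of the rightmost end seen so far; otherwise it starts a new cluster.
    """
    clusters = []
    cluster_end = None  # rightmost coordinate of the current cluster
    for start, end in sorted(ranges):
        if clusters and start - cluster_end <= max_gap:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


# Last three ranges of the first chromosome 5 locus, followed by the
# first three ranges of the second (adjacent) locus.
ranges = [
    (7586550, 7586675), (7587027, 7587158), (7587901, 7588026),
    (7590103, 7590256), (7590343, 7590422), (7590538, 7590609),
]

for max_gap in (1000, 10000):
    clusters = cluster_ranges(ranges, max_gap)
    print(max_gap, "bp ->", len(clusters), "cluster(s)")
```

In this subset the largest gap within either locus is 743 bp, while the gap between the two loci is 2,077 bp, so a threshold of 1,000 bp separates them into two clusters and a threshold of 10,000 bp merges them into one.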

In [31]:
%%time
# Summarize forward search results in a CSV file.
# Note: only the top 5 hits for each individual search will be reported, as specified here.
# This is simply to save time; previous analyses have confirmed that the number of positive hits will not exceed 5 for any of the searches.
!amoebae sum_fwd_srch $SRCHRESDIR/$FWDSRCHDIR\
                     $SRCHRESDIR/$FWDSRCHDIR'_sum2.csv'\
                     --max_gap_between_tblastn_hsps 1000 \
                     --max_hits_to_sum 5
                    



            improve translation of sequences identified by TBLASTN. If you do not
            want to do this, then use the --do_not_use_exonerate option.


Result 1 of 4
Extracting information from search result file AP2alpha_Athaliana_NP_851058.1_query__Arabidopsis_thaliana_faa_srch_out.txt
Result 2 of 4
Extracting information from search result file AP2alpha_Athaliana_NP_851058.1_query__Arabidopsis_thaliana_fna_srch_out.txt

	Search program was tblastn.
	Checking number of distinct genes represented by HSPs.

	Query: NP_851058.1
	Hit 1: NC_003076.8 "NC_003076.8 Arabidopsis thaliana chromosome 5 sequence"
	HSP positions in subject sequence (1 dot = 179836 bp):
	 0                                                                                                                                                    26975502
	 v                                                                                                                                                    v
	 ............

	Hit 1 HSP cluster 1:
	HSP positions in subject sequence (1 dot = 54 bp):
	7579846                                                                                                                                              7588026
	v                                                                                                                                                    v
	......................................................................................................................................................
	##....................................................................................................................................................  7579846..7579996, minus, 7.2242e-18
	....#.................................................................................................................................................  7580088..7580175, minus, 7.2242e-18
	........#.....................................................................

	Hit 2 HSP cluster 1:
	HSP positions in subject sequence (1 dot = 9 bp):
	11360098                                                                                                                                             11361493
	v                                                                                                                                                    v
	......................................................................................................................................................
	#################################################################.....................................................................................  11360098..11360707, plus, 6.08597e-20
	.............................................................................................................................########################.  11361265..11361493, plus, 1.07255e-05


	Hit 2 HSP cluster 2:
	HSP positions in subject sequence (1 dot = 6 bp):
	180381

Could not identify FASTA sequence in exonerate output file AMOEBAE_Search_Results_1/fwd_srch_1/COPIbeta_Athaliana_NP_001320104.1_query__Arabidopsis_thaliana_fna_srch_out_subject_subseq_NC_003070.9_18038121-18038937_exonerate_out.txt
Writing dataframe to csv file


Forward search results written/appended to
                spreadsheet:

	AMOEBAE_Search_Results_1/fwd_srch_1_sum2.csv

CPU times: user 1.42 s, sys: 690 ms, total: 2.11 s
Wall time: 50.2 s
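The observation that the 1,000 bp setting yields two TBLASTN hits for AP2 alpha on chromosome 5 can also be checked by counting rows per query and subject accession. The sketch below uses only the standard library and a small hand-abbreviated CSV that keeps three of the real column names from the summary table; the helper name `count_tblastn_hits` is hypothetical, and the real file would be read from disk rather than from a string.

```python
import csv
import io
from collections import Counter

# Abbreviated rows based on the 1,000 bp summary table; all other columns
# of the real CSV are omitted here for brevity.
toy_csv = """Query title,Forward search method,Forward hit accession
AP2alpha,blastp 2.10.0+,NP_851058.1
AP2alpha,tblastn 2.10.0+,NC_003076.8
AP2alpha,tblastn 2.10.0+,NC_003076.8
AP2alpha,tblastn 2.10.0+,NC_003070.9
AP2alpha,tblastn 2.10.0+,NC_003070.9
"""


def count_tblastn_hits(csv_text):
    """Count TBLASTN rows per (query title, hit accession) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Forward search method"].startswith("tblastn"):
            counts[(row["Query title"], row["Forward hit accession"])] += 1
    return counts


hit_counts = count_tblastn_hits(toy_csv)
for (query, accession), n in sorted(hit_counts.items()):
    print(query, accession, n)
```

A count of 2 for NC_003076.8 (chromosome 5) corresponds to the two adjacent AP2 alpha loci being reported as separate hits.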


In [32]:
# Load data from the CSV file using the pandas library.
df = pd.read_csv(os.path.join(os.environ['SRCHRESDIR'], os.environ['FWDSRCHDIR']) + '_sum2.csv_out.csv')
# Display the data in an HTML table.
display(HTML(df.to_html()))

Unnamed: 0,Query title,Query file,Query species (if applicable),Query database name,Query accession (if applicable),Query description,Query length,Subject database species (if applicable),Subject database file,Forward search method,Forward hit rank,Forward hit score,Forward hit score difference from top hit score,Forward hit E-value (top HSP),Forward hit E-value (top HSP) order of magnitude difference compared to top hit,Forward hit length,Forward hit length as a percentage of query length,Forward hit percent query cover,Forward hit accession,Forward hit description,Forward hit sequence,Forward hit coordinates of subsequence(s) that align(s) to query,Forward hit description of subsequence(s) that align(s) to query,Forward hit subsequence(s) that align(s) to query,Proximity (bp) to end of subject sequence (if searching in nucleotide sequences),Positive/redundant (+) or negative (-) hit based on E-value criterion
0,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,1,5396.0,0,0.0,0,1012,100,100,NP_851058.1,"""NP_851058.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_851058.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
1,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,2,5396.0,0,0.0,0,1012,100,100,NP_851057.1,"""NP_851057.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_851057.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
2,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,3,5396.0,0,0.0,0,1012,100,100,NP_197669.1,"""NP_197669.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_197669.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
3,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,4,5396.0,0,0.0,0,1012,100,100,NP_001330971.1,"""NP_001330971.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_001330971.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
4,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,5,5396.0,0,0.0,0,1012,100,100,NP_001330970.1,"""NP_001330970.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_001330970.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
5,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.faa,blastp 2.10.0+,6,5396.0,0,0.0,0,1012,100,100,NP_001330969.1,"""NP_001330969.1 alpha-adaptin [Arabidopsis thaliana]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,"[[0,1012]]","""NP_001330969.1 alpha-adaptin [Arabidopsis thaliana] [[0, 
1012]]""",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIVPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,-,+
6,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,1,211.075,325,2.135e-108,0,1012,100,100,NC_003076.8,"""NC_003076.8 Arabidopsis thaliana chromosome 5 sequence""",-,"[[7579847,7579997],[7580090,7580169],[7580330,7580401],[7580488,7580568],[7580777,7580902],[7580980,7581066],[7581171,7581335],[7581440,7581520],[7581811,7582044],[7582165,7582329],[7582430,7582550],[7582660,7582764],[7582846,7582952],[7583429,7583494],[7583578,7583683],[7583826,7583914],[7584048,7584129],[7584676,7584773],[7584873,7584974],[7585363,7585455],[7585536,7585607],[7585806,7586013],[7586094,7586191],[7586274,7586336],[7586550,7586675],[7587027,7587158],[7587901,7588026]]","NC_003076.8 [[7579847,7579997],[7580090,7580169],[7580330,7580401],[7580488,7580568],[7580777,7580902],[7580980,7581066],[7581171,7581335],[7581440,7581520],[7581811,7582044],[7582165,7582329],[7582430,7582550],[7582660,7582764],[7582846,7582952],[7583429,7583494],[7583578,7583683],[7583826,7583914],[7584048,7584129],[7584676,7584773],[7584873,7584974],[7585363,7585455],[7585536,7585607],[7585806,7586013],[7586094,7586191],[7586274,7586336],[7586550,7586675],[7587027,7587158],[7587901,7588026]]",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYLDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTISTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAMVLADQQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSNEQHGPVGAEGVPDEVDGSAIV
PVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGANVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSESTGAILCLARIETDPADRTQLRMTVGTGDPTLTFELKEFIKEQLITVPMGSRALVPAAGPAPPVAQPPSPAALADDPGAMLAGLL,7579846,+
7,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,2,211.46,325,3.04809e-108,0,1013,100,100,NC_003076.8,"""NC_003076.8 Arabidopsis thaliana chromosome 5 sequence""",-,"[[7590103,7590256],[7590343,7590422],[7590538,7590609],[7590696,7590776],[7590996,7591121],[7591209,7591295],[7591400,7591564],[7591668,7591748],[7592051,7592284],[7592405,7592569],[7592670,7592790],[7592900,7593004],[7593086,7593192],[7593365,7593430],[7593515,7593620],[7593743,7593831],[7593933,7594014],[7594586,7594683],[7594797,7594898],[7595141,7595233],[7595308,7595379],[7595545,7595752],[7595834,7595931],[7596017,7596079],[7596295,7596420],[7596793,7596924],[7597703,7597828]]","NC_003076.8 [[7590103,7590256],[7590343,7590422],[7590538,7590609],[7590696,7590776],[7590996,7591121],[7591209,7591295],[7591400,7591564],[7591668,7591748],[7592051,7592284],[7592405,7592569],[7592670,7592790],[7592900,7593004],[7593086,7593192],[7593365,7593430],[7593515,7593620],[7593743,7593831],[7593933,7594014],[7594586,7594683],[7594797,7594898],[7595141,7595233],[7595308,7595379],[7595545,7595752],[7595834,7595931],[7596017,7596079],[7596295,7596420],[7596793,7596924],[7597703,7597828]]",MTGMRGLSVFISDVRNCQNKEAERLRVDKELGNIRTCFKNEKVLTPYKKKKYVWKMLYIHMLGYDVDFGHMEAVSLISAPKYPEKQVGYIVTSCLLNENHDFLKLAINTVRNDIIGRNETFQCLALTLVGNIGGRDFAESLAPDVQKLLISSSCRPLVRKKAALCLLRLFRKNPDAVNVDGWADRMAQLLDERDLGVLTSSTSLLVALVSNNHEAYSSCLPKCVKILERLARNQDVPQEYTYYGIPSPWLQVKAMRALQYFPTIEDPSTRKALFEVLQRILMGTDVVKNVNKNNASHAVLFEALSLVMHLDAEKEMMSQCVALLGKFISVREPNIRYLGLENMTRMLMVTDVQDIIKKHQSQIITSLKDPDISIRRRALDLLYGMCDVSNAKDIVEELLQYLSTAEFSMREELSLKAAILAEKFAPDLSWYVDVILQLIDKAGDFVSDDIWFRVVQFVTNNEDLQPYAASKAREYMDKIAIHETMVKVSAYILGEYGHLLARQPGCSASELFSILHEKLPTVSTPTIPILLSTYAKLLMHAQPPDPELQKKVWAVFKKYESCIDVEIQQRAVEYFELSKKGPAFMDVLAEMPKFPERQSSLIKKAENVEDTADQSAIKLRAQQQPSNAIVLADPQPVNGAPPPLKVPILSGSTDPESVARSLSHPNGTLSNIDPQTPSPDLLSDLLGPLAIEAPPGAVSYEQHGPVGAEGVPDEIDGSAI
VPVEEQTNTVELIGNIAERFHALCLKDSGVLYEDPHIQIGIKAEWRGHHGRLVLFMGNKNTSPLTSVQALILPPAHLRLDLSPVPDTIPPRAQVQSPLEVMNIRPSRDVAVLDFSYKFGTNVVSAKLRIPATLNKFLQPLQLTSEEFFPQWRAISGPPLKLQEVVRGVRPLALPEMANLFNSFHVTICPGLDPNPNNLVASTTFYSETTGAMLCLARIETDPADRTQLRLTVGSGDPTLTFELKEFIKEQLITIPMGSRALVPAAGPAPSPAVQPPSPAALADDPGAMLAGLL,7590102,+
8,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,3,98.5969,437,6.08597e-20,88,76,8,8,NC_003070.9,"""NC_003070.9 Arabidopsis thaliana chromosome 1 sequence""",-,"[[11361266,11361493]]","NC_003070.9 [[11361266,11361493]]",VATVQQDPDDTLKRKTFELLYKMTKSSNVEVIVDRMIDYMISINDNHYKTEIASRCVELAEQFAPSNQWFIQVASQ,11360098,+
9,AP2alpha,AP2alpha_Athaliana_NP_851058.1_query.faa,-,-,NP_851058.1,alpha-adaptin [Arabidopsis thaliana],1012,Arabidopsis thaliana,Arabidopsis_thaliana.fna,tblastn 2.10.0+,4,87.8113,448,1.05159e-16,92,22,2,2,NC_003070.9,"""NC_003070.9 Arabidopsis thaliana chromosome 1 sequence""",-,"[[18039208,18039273]]","NC_003070.9 [[18039208,18039273]]",RESSMSSSSTSIMDNLFQRSLE,12388569,+
