Script that labels phylogenetic clades based on the clade of the nearest neighbor using patristic distances determined from the tree.

octoFLU: Automated classification to evolutionary origin of influenza A virus gene sequences detected in U.S. swine



Determines evolutionary origin of influenza A virus genes through inference of maximum likelihood tree and then assignment of a defined genetic clade based on nearest neighbor determined by patristic distances.

This tool has been tested on swine H1 and H3 data collected from 2010 to present. Sequences from other serotypes, sequences that are too short, or sequences collected outside North America may generate incorrect results. We suggest you use the IRD Sequence Annotation tool prior to running this pipeline.

We also recommend that output from the automatic classification be interpreted conservatively; more comprehensive phylogenetic analyses may be required for accurate determination of evolutionary history. This pipeline generates a phylogeny using a limited set of reference sequences and annotates the queries based upon the "nearest neighbor." If query sequences are very dissimilar to the annotated reference set (e.g., swine H1 sequences from the 1990s, or swine data collected in Europe or Asia), they are likely to be misclassified.
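The nearest-neighbor idea described above can be illustrated with a small, self-contained sketch. This is not the pipeline's own code (the repository uses an R script for this step); the toy tree, node names, and function names below are hypothetical. It assumes the tree is given as a list of branches with lengths, computes patristic distances (the sum of branch lengths along the path between two tips), and assigns the query the clade of the closest reference tip.

```python
from collections import defaultdict


def patristic_distances(edges, source):
    """Return the summed branch length from `source` to every node in a tree.

    `edges` is a list of (node_a, node_b, branch_length) tuples. Because a
    tree has exactly one path between any two nodes, a simple traversal is
    enough to obtain the patristic distance.
    """
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))

    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor, length in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                stack.append(neighbor)
    return dist


def nearest_neighbor_clade(edges, query, reference_clades):
    """Assign `query` the clade of the reference tip with the smallest distance."""
    dist = patristic_distances(edges, query)
    nearest = min(reference_clades, key=lambda ref: dist[ref])
    return nearest, reference_clades[nearest], dist[nearest]


# Hypothetical toy tree: internal nodes n1 and n2, three references, one query.
edges = [
    ("n1", "ref_pdm", 0.02),
    ("n1", "QUERY_1", 0.01),
    ("n1", "n2", 0.05),
    ("n2", "ref_alpha", 0.03),
    ("n2", "ref_delta1", 0.04),
]
reference_clades = {"ref_pdm": "pdm", "ref_alpha": "alpha", "ref_delta1": "delta1"}

nearest, clade, distance = nearest_neighbor_clade(edges, "QUERY_1", reference_clades)
print(nearest, clade, round(distance, 3))
```

The sketch also shows why dissimilar queries are risky: a query is always assigned the clade of some reference, even when the smallest distance is large, so reporting the distance alongside the label is one way to flag doubtful assignments.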

If you use this pipeline or the curated reference datasets in your work, please cite this:

Chang, J.+, Anderson, T.K.+, Zeller, M.A.+, Gauger, P.C., Vincent, A.L. octoFLU: Automated classification to evolutionary origin of influenza A virus gene sequences detected in U.S. swine. +These authors contributed equally.


Input

Unaligned FASTA file of query sequences, with headers that identify each sequence (e.g., strain name with protein segment identifier).
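Because sequences that are too short may be misclassified, it can help to inspect the query file before running the pipeline. The following is a minimal sketch, not part of octoFLU, that reads FASTA-formatted lines and reports the length of each sequence. The headers and sequences in the example are made up, and no length cutoff is implied; choosing one is left to the user.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def sequence_lengths(lines):
    """Return {header: sequence length} for every record."""
    return {header: len(seq) for header, seq in read_fasta(lines).items()}


# Hypothetical two-record query file; real gene segments are far longer.
example = """>strain_one_H1
ATGAAGGCAA
TACTAGTAGT
>strain_two_PB1
ATGGATGTCA
""".splitlines()

lengths = sequence_lengths(example)
for header, length in lengths.items():
    print(header, length)
```

With a real file, the same functions can be applied to an open file handle, for example `sequence_lengths(open("query.fasta"))`.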


Output

  • Text output stating the query name, protein symbol, genetic clade or evolutionary lineage.
  • Text output holding the query name and top BLASTn hit.
  • Inferred maximum likelihood trees with reference gene sets and queries.


bash sample_data/query_sample.fasta

Running the pipeline

Edit the paths in You will need to have an installation of

# Connect your reference dataset here

# Connect your programs here, can use full path names
MAFFT=`which mafft`

Then run the pipeline

bash sample_data/query_sample.fasta

The output is written to a *_Final_Output.txt file and a *_output folder. Any trees generated are listed and named by protein symbol, and blast_output.txt lists the query genes with their top BLASTn hits.

The main bottleneck is waiting for trees to run in FastTree (installing the multi-threaded version helps). A sampling of the output is included, split by ....

bash sample_data/query.fasta

less query.fasta_Final_Output.txt

QUERY_MH540411_A/swine/Iowa/A02169143/2018		    H1	pdm		1A.3.3.2 
QUERY_MH595470_A/swine/South_Dakota/A02170160/2018	H1	delta1	1B.2.2.2 
QUERY_MH595472_A/swine/Illinois/A02170163/2018		H1	alpha	1A.1.1 
QUERY_MH546131_A/swine/Minnesota/A01785562/2018		H3	2010-human_like	3.2010.1 
QUERY_MH561745_A/swine/Minnesota/A01785568/2018		H3	2010-human_like	3.2010.1 
QUERY_MH551260_A/swine/Iowa/A02016898/2018			H3	2010-human_like	3.2010.1 
QUERY_MH551259_A/swine/Iowa/A02016897/2018			N1	classicalSwine 
QUERY_MH561752_A/swine/Minnesota/A01785574/2018		N1	classicalSwine 
QUERY_MH551263_A/swine/Minnesota/A02016891/2018		N1	classicalSwine 
QUERY_MK024152_A/swine/Minnesota/A01785613/2018		N2	1998 
QUERY_MH976804_A/swine/Michigan/A01678583/2018		N2	1998
QUERY_MH595471_A/swine/South_Dakota/A02170160/2018	N2	2002 
QUERY_MH922882_A/swine/Ohio/18TOSU4536/2018		M	pdm 
QUERY_MK321295_A/swine/Florida/A01104129/2018	M	pdm
QUERY_MK129490_A/swine/Illinois/A02170163/2018	M	pdm
QUERY_MK185286_A/swine/Iowa/A02016889/2018	PB1	TRIG 
QUERY_MK185322_A/swine/Iowa/A02169143/2018	PB1	pdm
QUERY_MK039744_A/swine/Iowa/A02254795/2018	PB1	TRIG
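The final output shown above is whitespace-delimited, so it is straightforward to load for downstream summaries. The sketch below is one possible way to do that and is not part of the pipeline. It assumes each line holds the query name, the protein symbol, the clade, and sometimes a fourth column; that fourth column is not labeled in this README, so the field name `extra_label` is a placeholder.

```python
from collections import Counter


def parse_final_output(lines):
    """Split each output line into query, segment, clade, and optional 4th column."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        records.append({
            "query": fields[0],
            "segment": fields[1],
            "clade": fields[2],
            # Placeholder name: the fourth column is unlabeled in the README.
            "extra_label": fields[3] if len(fields) > 3 else None,
        })
    return records


# Four lines taken from the sample output above.
sample = [
    "QUERY_MH540411_A/swine/Iowa/A02169143/2018\t\tH1\tpdm\t\t1A.3.3.2 ",
    "QUERY_MH546131_A/swine/Minnesota/A01785562/2018\t\tH3\t2010-human_like\t3.2010.1 ",
    "QUERY_MH551259_A/swine/Iowa/A02016897/2018\t\t\tN1\tclassicalSwine ",
    "QUERY_MK024152_A/swine/Minnesota/A01785613/2018\t\tN2\t1998 ",
]

records = parse_final_output(sample)
counts = Counter((r["segment"], r["clade"]) for r in records)
for (segment, clade), n in sorted(counts.items()):
    print(segment, clade, n)
```

Splitting on any run of whitespace, rather than on single tabs, tolerates the uneven tab padding and trailing spaces visible in the sample.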


Docker

Start the Docker daemon and navigate to your query file location.

cd mydataset/
docker pull flucrew/octoflu
docker run -it -v ${PWD}:/data flucrew/octoflu:latest /bin/bash

From inside the Docker container you should be able to run the pipeline. Remember to copy files to /data so that they remain available on your computer after the container exits.

docker > bash sample_data/query_sample.fasta
docker > cp -rf query_sample.fasta_output /data/.
docker > exit 


Singularity

Singularity and Docker are friends. A Singularity image can be built using singularity pull.

singularity pull docker://flucrew/octoflu

Future Considerations

  • Reannotate the tree with NN-clades for ease of use.
  • Integrate the mergeGC.R script to combine gene assignments to a whole genome constellation descriptor.
  • Annotate input sequences with gene classification, and use these designations in the inferred phylogenetic trees.
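As a rough illustration of the second item, per-gene assignments could be grouped by strain so that all genes from one virus appear together. The sketch below is hypothetical and does not reproduce what mergeGC.R does. It assumes query identifiers follow the `QUERY_<accession>_<strain>` pattern seen in the sample output, and the function names are invented for this example.

```python
from collections import defaultdict


def strain_name(query_id):
    """Drop the assumed 'QUERY_<accession>_' prefix, keeping the strain name."""
    parts = query_id.split("_", 2)
    if len(parts) == 3 and parts[0] == "QUERY":
        return parts[2]
    return query_id


def merge_by_strain(assignments):
    """Group (query_id, segment, clade) tuples into {strain: {segment: clade}}."""
    merged = defaultdict(dict)
    for query_id, segment, clade in assignments:
        merged[strain_name(query_id)][segment] = clade
    return dict(merged)


# Assignments taken from the sample output above.
assignments = [
    ("QUERY_MH540411_A/swine/Iowa/A02169143/2018", "H1", "pdm"),
    ("QUERY_MK185322_A/swine/Iowa/A02169143/2018", "PB1", "pdm"),
    ("QUERY_MH595470_A/swine/South_Dakota/A02170160/2018", "H1", "delta1"),
    ("QUERY_MH595471_A/swine/South_Dakota/A02170160/2018", "N2", "2002"),
]

for strain, genes in merge_by_strain(assignments).items():
    print(strain, genes)
```

Limiting the split to the first two underscores keeps strain names that contain underscores, such as South_Dakota, intact.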