A Bayesian APOBEC3G-induced hypermutation analysis tool implemented in Python.
The following examples illustrate some of the basic usage of hyperfreq.
Note that if one navigates to the root of this code base, these commands should all execute using the test data in
To run an analysis, use the `hyperfreq analyze` command.
Running `hyperfreq analyze -h` will give you a full list of options.
Here are some examples to get you started and give you a rough sense of hyperfreq's capabilities.
    # Simple analysis comparing each sequence to a consensus sequence constructed from the entire alignment
    hyperfreq analyze tests/data/alignment.fasta

    # Instead, compare each sequence to the consensus for a cluster specified in a clusters file
    hyperfreq analyze tests/data/alignment.fasta -c tests/data/clusters.csv

    # Specify the reference sequence(s) you want each sequence to be compared to
    hyperfreq analyze tests/data/alignment.fasta -r tests/data/ref_seqs.fasta
These commands will all output data to a file named
The prefix and location of this file can be specified using the
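As a rough illustration of what "comparing each sequence to a consensus" means in the first example above, here is a minimal sketch that builds a majority-rule consensus from a toy alignment and lists where one sequence differs from it. This is an assumption about the general idea only: the function names and the toy alignment are made up, and hyperfreq's own consensus construction may differ.

```python
from collections import Counter


def majority_consensus(sequences):
    """Return the most common character in each alignment column."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def differences(sequence, consensus):
    """List (position, consensus_base, sequence_base) wherever the two differ."""
    return [
        (i, c, s)
        for i, (c, s) in enumerate(zip(consensus, sequence))
        if c != s
    ]


# Made-up toy alignment; positions are 0-based in this sketch
alignment = ["GGATGG", "GGATGG", "AGATAG"]
consensus = majority_consensus(alignment)
print(consensus)                              # GGATGG
print(differences(alignment[2], consensus))   # [(0, 'G', 'A'), (4, 'G', 'A')]
```

Both differences here are G to A changes, which is the kind of signal the analysis is looking for.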
Multiple pattern analysis
By default, these analyses look for GG context hypermutation, suggestive of APOBEC3G activity.
One can specify multiple contexts for analysis using the
Pattern options include
Pattern | Activity
--------|---------
`GG` | A3G activity
`GA` | A3F (and other A3) activity in humans
`GR` | combined A3G and A3F activity (as often observed in hypermutated HIV)
`GM` | rhesus macaque A3DE activity (as observed in XMRV and SFV infections)
`GV` | combined rhesus A3DE and A3G activity
Note that R, M and V are IUPAC degenerate codes for A or G; A or C; and A, C or G, respectively.
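To make the degenerate codes concrete, the following sketch expands a pattern into the dinucleotide contexts it covers and counts G to A changes in those contexts between a reference and a query sequence. It is illustrative only and not hyperfreq's implementation: the function names are hypothetical, the context is read from the reference sequence (an assumption), and no statistics are computed.

```python
# Subset of the IUPAC nucleotide codes used by the patterns above
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "M": "AC", "V": "ACG",
}


def expand_pattern(pattern):
    """Expand a two-letter pattern such as 'GR' into concrete dinucleotides."""
    first, second = pattern
    return [a + b for a in IUPAC[first] for b in IUPAC[second]]


def count_context_mutations(reference, query, pattern):
    """Count G-to-A changes at reference sites whose dinucleotide matches pattern."""
    contexts = set(expand_pattern(pattern))
    count = 0
    for i in range(len(reference) - 1):
        in_context = reference[i:i + 2] in contexts
        if in_context and reference[i] == "G" and query[i] == "A":
            count += 1
    return count


print(expand_pattern("GR"))   # ['GA', 'GG']
print(expand_pattern("GV"))   # ['GA', 'GC', 'GG']

# Made-up sequences: G-to-A changes at positions 0 (GG context) and 3 (GA context)
print(count_context_mutations("GGAGA", "AGAAA", "GG"))   # 1
print(count_context_mutations("GGAGA", "AGAAA", "GR"))   # 2
```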
When running multiple patterns, the `call.csv` file contains a column called `call_pattern`, which represents the pattern in which the evidence of hypermutation appears to be strongest.
Other data in the `call.csv` file consists of counts and statistics specifically for the pattern considered the call pattern.
If the `-F/--full-output` flag is specified, a separate file is output for each pattern analyzed (for example
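Once a multi-pattern run has finished, the `call_pattern` column can be summarized with a few lines of Python. The sketch below (written for Python 3) tallies how many sequences were assigned to each call pattern. Only the `call_pattern` column name comes from the description above; the `sequence` column and the rows are made-up stand-ins for a real `call.csv` file.

```python
import csv
import io
from collections import Counter


def tally_call_patterns(handle):
    """Count how many rows of a call.csv-style file fall under each call_pattern."""
    reader = csv.DictReader(handle)
    return Counter(row["call_pattern"] for row in reader)


# Made-up data standing in for a real call.csv file; the `sequence`
# column name is an assumption for illustration.
sample = io.StringIO(
    "sequence,call_pattern\n"
    "seq1,GG\n"
    "seq2,GA\n"
    "seq3,GG\n"
)
counts = tally_call_patterns(sample)
print(counts["GG"], counts["GA"])   # 2 1
```

With a real output file, pass an open file handle in place of the `io.StringIO` object.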
Splitting sequences for HM free alignments
Given an alignment, we can cut out sites/columns suspected of hypermutation by using the
Running this command requires specifying an alignment and a CSV file with a column named `column`, specifying which positions in the alignment are to be cut out.
In addition to the
`hyperfreq analyze` also produces a `hyperfreq_analysis.sites.csv` file which has such a column, as well as information regarding which sequences were hypermutated at which sequence positions.
    # for an alignment with hypermutated columns removed
    hyperfreq split alignment.fasta hypermutated_columns.csv
For more detailed usage information, run `hyperfreq split -h` at the command line.
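Conceptually, cutting columns out of an alignment amounts to dropping the same positions from every sequence. The sketch below shows that idea on a toy alignment held in a dict. It is not how `hyperfreq split` is implemented; the function name is hypothetical, and the 0-based column indexing is an assumption that may not match the indexing used in the CSV file.

```python
def drop_columns(sequences, columns):
    """Return copies of the aligned sequences with the given columns removed.

    `sequences` maps names to equal-length strings; `columns` holds the
    0-based positions to remove (the indexing convention is an assumption).
    """
    drop = set(columns)
    return {
        name: "".join(base for i, base in enumerate(seq) if i not in drop)
        for name, seq in sequences.items()
    }


# Made-up toy alignment in which column 1 is suspected of hypermutation
alignment = {"ref": "AGGT", "seq1": "AAGT"}
print(drop_columns(alignment, [1]))   # {'ref': 'AGT', 'seq1': 'AGT'}
```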
If you want to write your own scripts, you can do so by importing the appropriate modules
    from hyperfreq import Alignment
    from Bio import SeqIO

    # Create a hyperfreq alignment object
    seqs = SeqIO.parse('some_file.fasta', 'fasta')
    aln = Alignment(seqs)

    # Obtain an analysis generator which can be iterated over.
    analysis = aln.analyze()

    # Iterate over each sequence in the analysis, and do whatever you like!
    for seq_result in analysis:
        # Formatting into a single string prints the same on Python 2 and 3
        print("%s hm status: %s" % (seq_result['sequence'], seq_result['hm_pos']))
It's also possible to define your own mutation patterns using the
It may be possible in the future to specify patterns more flexibly via the CLI, but for now, doing so requires using this code base as a library in your own scripts.
If having this functionality available via the command line is important to you, please submit an issue, and we'll see what we can do.
It is also now possible to run a `hyperfreq analyze` command with the `-N / --interactive` flag.
This causes the program to load up an interactive python session with the `analysis` generator, instead of simply writing the results to file.
You can enter `dir()` to see what namespaces and data have been included for you, and load any libraries (numpy, biopython, etc.) that might be helpful.
Do note, though, that `analysis` is a generator, so if you need to make more than one pass through the results interactively, you'll need to put them into a list.
analysis = [result for result in analysis]
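The reason for this is that a Python generator can only be consumed once. The short sketch below demonstrates the behavior with a stand-in generator; `fake_analysis` and its result dicts are made up for illustration and are not part of hyperfreq.

```python
def fake_analysis():
    """Stand-in for the analysis generator; yields made-up result dicts."""
    for name in ["seq1", "seq2"]:
        yield {"sequence": name}


analysis = fake_analysis()
first_pass = [result["sequence"] for result in analysis]
second_pass = [result["sequence"] for result in analysis]  # already exhausted
print(first_pass)    # ['seq1', 'seq2']
print(second_pass)   # []

# Materializing the generator into a list allows repeated passes
analysis = list(fake_analysis())
pass_one = [result["sequence"] for result in analysis]
pass_two = [result["sequence"] for result in analysis]
print(pass_one == pass_two)   # True
```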
Currently, Hyperfreq has only been tested on Linux systems, but it should be possible to get it set up on OSX fairly easily. If you do get it set up on OSX or Windows and have any tips to share, please feel free to add a page to the wiki.
Hyperfreq depends on the following python libraries
We recommend installing betarat and alnclst first, using the directions in the links above. Since alnclst requires biopython, you should now only have to install fisher. Assuming you have pip and the python-dev libraries installed (which you should after following the instructions above), you should be able to run
    # (sudo may not be necessary, depending on how you set up your python environment)
    sudo pip install fisher
With that out of the way...
Now you should be able to download and install hyperfreq
    # I like to download things to a src directory in my home folder
    mkdir -p ~/src
    cd ~/src

    # Download and unzip the source code
    wget https://github.com/fhcrc/hyperfreq/archive/betarat-refactor.zip -O hyperfreq.zip
    unzip hyperfreq.zip
    # GitHub branch archives unpack into a <repo>-<branch> directory
    cd hyperfreq-betarat-refactor

    # Install (sudo may not be necessary, depending on how you set up your python environment)
    sudo python setup.py install
And there you have it!
You can try running `hyperfreq -h` from the command line to test your installation.
If you have any trouble installing, please submit an issue, so we can try to help and update the documentation.