GPU-accelerated FFT-based multiple sequence aligner

gpuFFTMSA

GPU-accelerated FFT-based multiple sequence aligner.

Requirements:

The CPU portion of the code requires only Python, NumPy, and pylab (used for debugging).

The GPU portion requires all of the above, plus PyCUDA and pyfft.cuda.
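
As a rough illustration of the CPU/GPU split described above, the sketch below checks whether the optional GPU packages can be found before choosing a code path. The helper names `has_module` and `gpu_stack_available` are hypothetical and are not part of this repository; the import names `pycuda` and `pyfft` are assumed from the package names listed above.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for broken installs or modules without a usable spec.
        return False


def gpu_stack_available():
    """Hypothetical check: both GPU dependencies must be importable."""
    return has_module("pycuda") and has_module("pyfft")


if __name__ == "__main__":
    if gpu_stack_available():
        print("PyCUDA and pyfft found: the -g/--usegpu option should be usable")
    else:
        print("GPU packages not found: use the CPU path (NumPy only)")
```

This only confirms that the packages are installed. It does not verify that a CUDA-capable device or driver is present.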

Usage:

  • If pyFFTalign.py is run without arguments, it takes its default inputs from ./data.
usage: pyFFTalign.py [-h] [-i INPUT_GENOME] [-s INPUT_SEQS]
                     [--logFile LOGFILE] [-l LOGLEVEL] [-g] [-e] [--verify]

given a directory of input genomes and sequences it will try to match up each
sequence to its genome

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_GENOME, --inputgenome INPUT_GENOME
                        Input genome file or dir of fna files (Default:
                        ./data/sampleGenome.fna)
  -s INPUT_SEQS, --inputseqs INPUT_SEQS
                        Input sequence file (Default: ./data/sampleGenome.seq)
  --logFile LOGFILE     output to log file (Default: False)
  -l LOGLEVEL, --log LOGLEVEL
                        Log level, use DEBUG of more output (Default: INFO)
  -g, --usegpu          gpu option (Default: False)
  -e, --chopefficient   chop efficiently (Default: False)
  --verify              verify successfull transcription (Default: False)
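
For readers who want to see how the options above fit together, here is a minimal argparse sketch that reproduces the same flags and defaults. It is reconstructed from the help text only, so the `dest` names (`input_genome`, `input_seqs`, `logfile`, `loglevel`) and the function name `build_parser` are assumptions and may differ from the actual code in pyFFTalign.py.

```python
import argparse


def build_parser():
    """Approximate reconstruction of the command-line interface shown above."""
    parser = argparse.ArgumentParser(
        prog="pyFFTalign.py",
        description="Match each input sequence to its genome.",
    )
    parser.add_argument("-i", "--inputgenome", dest="input_genome",
                        default="./data/sampleGenome.fna",
                        help="Input genome file or dir of fna files")
    parser.add_argument("-s", "--inputseqs", dest="input_seqs",
                        default="./data/sampleGenome.seq",
                        help="Input sequence file")
    parser.add_argument("--logFile", dest="logfile", default=False,
                        help="output to log file")
    parser.add_argument("-l", "--log", dest="loglevel", default="INFO",
                        help="Log level")
    # The three switches below take no value; they default to False.
    parser.add_argument("-g", "--usegpu", action="store_true",
                        help="gpu option")
    parser.add_argument("-e", "--chopefficient", action="store_true",
                        help="chop efficiently")
    parser.add_argument("--verify", action="store_true",
                        help="verify transcription")
    return parser


# Example: request the GPU path with debug logging, keeping default inputs.
args = build_parser().parse_args(["-g", "-l", "DEBUG"])
print(args.usegpu, args.loglevel, args.input_genome)
```

The equivalent shell invocation would be `python pyFFTalign.py -g -l DEBUG`.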

Note: input files are expected in the following formats:

*.fna => INPUT_GENOME:
>some title
ATATTTTTTCTTGTTTTTTATATCCACAAACTCTTTTCGTACTTTTACACAGTATATCGTGTTGTGGACA
ATTTTATTCCACAAGGTATTGATTTTGTGGATAACTTTCTTAATTTCATTGCTATAGCTACTTTTTTTTG
ATATTATAGTTGTGTTTTCACTTTGAATAAGTTTTCCACATCTTTATCTTATCCACAATTTGTGTATAAC
ATGTGGACAGTTTTAATCACATGTGGGTAAATGATTATCCACATTTGCTTTTTTGTCGAAAACCCTATCT

*.seq => INPUT_SEQS:
some title|ATATTATAGTTGTGTTTTCACTTTGAATAAGTTTTCCACATCTTTATCTTATCCACAATTTGTGTATAAC
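
The two formats above can be read with a few lines of Python. The following is a minimal sketch, not the repository's own reader: the function names `parse_fna` and `parse_seq` are hypothetical, and it assumes that a .fna file may hold several `>`-titled records and that a .seq file holds one `title|sequence` record per line.

```python
def parse_fna(text):
    """Parse FASTA-style text into a list of (title, sequence) tuples."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title closes the previous record, if there was one.
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line.upper())
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def parse_seq(text):
    """Parse 'title|SEQUENCE' lines into a list of (title, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        title, _, seq = line.partition("|")
        records.append((title.strip(), seq.strip().upper()))
    return records


genome_records = parse_fna(">some title\nATATTTTTTC\nTTGTTTTTTA\n")
read_records = parse_seq("some title|TTCTTG\n")
print(genome_records)
print(read_records)
```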

Code layout:

  • pyFFTalign.py
    • Provides basic command-line parsing and a wrapper around the other two modules.
    • Reads in the genome (long sequence)
    • Reads in the sequences (shorter sequences)
  • dataObj.py
    • Data container for raw and transcribed sequences
    • Transcribes sequences on object creation
    • Provides methods for returning padded raw and transcribed sequences
  • aligner.py
    • CPU and GPU correlation functions
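
To illustrate the general idea behind the layout above (transcribe, pad, correlate), here is a CPU-only NumPy sketch of FFT-based cross-correlation between a genome and a shorter sequence. It is not the code in aligner.py or dataObj.py. The one-hot encoding per base, the power-of-two padding, and the names `one_hot` and `best_offset` are assumptions made for this example; the repository's actual transcription scheme may differ.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (4, len(seq)) array, one row per base."""
    arr = np.zeros((len(BASES), len(seq)))
    for row, base in enumerate(BASES):
        arr[row] = [1.0 if ch == base else 0.0 for ch in seq]
    return arr


def best_offset(genome, read):
    """Return (offset, matches) for the best ungapped placement of `read`."""
    if not read or len(read) > len(genome):
        raise ValueError("read must be non-empty and no longer than genome")
    # Zero-pad to a power of two so circular correlation does not wrap around.
    n = len(genome) + len(read) - 1
    size = 1 << (n - 1).bit_length()
    g = np.fft.rfft(one_hot(genome), size, axis=1)
    r = np.fft.rfft(one_hot(read), size, axis=1)
    # Correlation theorem: corr[k] = sum_j genome[j + k] * read[j].
    # Summing the four base channels counts matching positions per offset.
    corr = np.fft.irfft(g * np.conj(r), size, axis=1).sum(axis=0)
    # Keep only offsets where the read lies fully inside the genome.
    scores = np.rint(corr[: len(genome) - len(read) + 1]).astype(int)
    offset = int(np.argmax(scores))
    return offset, int(scores[offset])


offset, matches = best_offset("GGGACGTGGG", "ACGT")
print(offset, matches)
```

With one-hot channels, the score at each offset is simply the number of matching bases, so a perfect match scores the full length of the shorter sequence. A GPU version would replace the NumPy FFT calls with a GPU FFT library while keeping the same structure.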

More information: