fold-fastalike

This parallelized Python tool reads in a fasta-like file and performs various RNA folding analyses.

Overview

The tool reads in a fasta-like file and performs various RNA folding analyses using the ViennaRNA toolset.

Currently, it's capable of:

Generating minimum free energy (MFE) structures in dot-bracket format (--fold flag)
Computing base-pairing probabilities for all possible i, j partners (--fold flag)
Computing overall base-pairing probability profiles (Prob(paired) / Prob(unpaired)) (--fold flag)
Computing Shannon entropy profiles (--fold flag)
Producing pairing probability "dot-plots" in PDF format (--fold flag)
Generating maximum expected accuracy (MEA) structures in dot-bracket format (--MEA flag)
Computing unpaired segment probabilities (the probability of an unpaired run of length L at each position) (--lunp [LUNP] flag)

All outputs will be saved to a folder that contains a folder for each transcript in the provided fasta-like file. Within each of these folders will be the generated files.

It does not currently support the incorporation of structure profiling data (reactivities).

Installation

No automated installation routines are currently configured, but it's recommended to add the folder containing the fold-fastalike module to your $PYTHONPATH so that the tool can be called from anywhere. If you don't add the module to your path, you'll only be able to call the module when your current working directory is this repository.

The tool has a set of required dependecies in order to properly function:

Python 3 with numpy and tqdm installed
ViennaRNA command line tools (RNAfold, RNAplfold, etc.)
ps2pdf14 (comes preinstalled on some systems, otherwise most easily installed as a part of Ghostscript with HomeBrew)

Usage

fold-fastalike is a Python module that can be run with the syntax python -m fold-fastalike [input] [OPTIONS]. The input should be a fasta-like file containing a set of RNA sequences for processing (see sample_data). Any type of FASTA-formatted file should work so long as the transcript names are preceeded by > at the beginning of the line and the nucleotide sequence is provided below it.

Run python -m fold-fastalike --version to check that your Python interpreter is functional and the correct version of the module is returned.

Run python -m fold-fastalike -h to see available options.

usage: fold-fastalike [-h] [--version] [--output output] [--lunp LUNP] [--MEA]
                      [--fold] [--tasks TASKS]
                      input

Fold and process information for a batch of sequences in a fasta-like file.

positional arguments:
  input            the input fasta-like file to process.

optional arguments:
  -h, --help       show this help message and exit
  --version        show program's version number and exit
  --output output  the output directory to use when generating outputs for
                   each sequence. Defaults to a date-stamped folder with the
                   prefix outputs in the same location as the command was
                   called
  --lunp LUNP      compute unpaired run probabilities up to length LUNP
  --MEA            compute Maximum Expected Accuracy (MEA) structures
  --fold           compute MFE structure and partition function.
  --tasks TASKS    Number of parallel tasks to use. Default is to use all
                   available.

Examples

The simplest use of the module is via the --fold flag. This activates the MFE and base pair probability routines.

For example:

python -m fold-fastalike sample_data/test_sequences.fasta --fold

This will create a folder called outputs-{%date%}_{%time%} and four folders within this folder, one for each transcript in the used file. Each folder will have:

{%name%}.fasta >> Single-transcript fasta file
{%name%}_dotplot.pdf >> Dot-plot in PDF format
{%name%}_dotplot.ps >> Dot-plot is PostScript format
{%name%}_mfe.dot >> MFE structure in dot-bracket format
{%name%}_bpdists.txt >> Pairing probabilities (Prob(i, j)) in text format
{%name%}_profiles.txt >> Overall pairing probabilities (P(paired)) and entropy profiles

Compute MEA structures

To compute MFE structures and a PDF dot-plot for each transcript in the provided sample data, run

python -m fold-fastalike sample_data/test_sequences.fasta --MEA

Outputs will be produced with the same folder structure as before, and each folder will have:

{%name%}.fasta >> Single-transcript fasta file
{%name%}_mea.dot >> MFE structure in dot-bracket format

Compute unpaired run probabilities

To compute the probabilities of unpaired runs up to a certain length across the provided transcripts, run

python -m fold-fastalike sample_data/test_sequences.fasta --lunp [LUNP]

For instance, to compute probabilities up to a length of 10 nt, we would run:

python -m fold-fastalike sample_data/test_sequences.fasta --lunp 10

Outputs will be produced with the same folder structure as before, and each folder will have:

{%name%}.fasta >> Single-transcript fasta file
{%name%}_unpaired_run_probs.ps >> Unpaired run probabilities in PostScript format

Notes

The program will, by default, create a new output folder each time that it's called. To instead save outputs to a predefined location, use the --output [output] flag to specify an output directory.
Base pairing probabilities of the form Prob(i,j) are typically encoded as the square-root of the actual value. Note the headers of the relevant files and be sure to convert to the raw probabilities by squaring when necessary.
The routines used to post-process some ViennaRNA outputs are also available within the module as standalone scripts.
- python -m fold-fastalike.postscript2bpdists [PS_FILE_IN] [TXT_FILE_OUT]
- python -m fold-fastalike.bpdists2overall [TXT_FILE_IN] [TXT_FILE_OUT]

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
fold-fastalike		fold-fastalike
sample_data		sample_data
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pytest.ini		pytest.ini
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fold-fastalike

fold-fastalike

sample_data

sample_data

tests

tests

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

pytest.ini

pytest.ini

requirements.txt

requirements.txt

Repository files navigation

fold-fastalike

Overview

Installation

Usage

Examples

Compute MEA structures

Compute unpaired run probabilities

Notes

About

Releases

Packages

Languages

License

peradecki/fold-fastalike

Folders and files

Latest commit

History

Repository files navigation

fold-fastalike

Overview

Installation

Usage

Examples

Compute MEA structures

Compute unpaired run probabilities

Notes

About

Resources

License

Stars

Watchers

Forks

Languages