remove_blocks_from_aln

mask, remove or keep regions in an alignment

Introduction

remove_blocks_from_aln is a tool to mask, remove or keep regions in an alignment. The regions are defined in an EMBL style tab-delimited file.

Installation

Details for installing remove_blocks_from_aln are provided below. If you encounter an issue when installing remove_blocks_from_aln, please contact your local system administrator. If you encounter a bug, please log it on the issues page or email us at path-help@sanger.ac.uk.

From source

This installation requires Python 2.7, so create an environment that uses the Python 2 interpreter before running any of the python commands below. For instance, you can create such an environment with conda:

conda create -n my_env python=2.7
conda activate my_env

Download the latest release from this GitHub repository, or clone the repository. Then run the tests:

python setup.py test

If the tests all pass, install:

python setup.py install

Usage

remove_blocks_from_aln.py [options]

	-a <file name>     alignment file name
	-o <file name>     output file name
	-t <file name>     tab file name (containing regions to keep/remove)
	-r <name>          reference name (optional, but required if there are gaps in the reference sequence relative to the tab file)
	-k                 keep regions in tab file (default is to mask them)
	-c                 cut regions in tab file (default is to mask them)
	-R                 do not remove blocks from reference sequence (default is to remove from all sequences)
	-s <N|X|?|->       Symbol to use for removed regions (default = N)
	-h                 Show help menu
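To make the difference between masking, cutting and keeping concrete, here is a minimal Python sketch. It is not the tool's own code: `apply_regions` is a hypothetical helper, coordinates are assumed to be 1-based and inclusive as in EMBL feature locations, and "keep" is assumed to mean retaining only the listed regions and dropping everything else. The `-r` and `-R` options are not modelled.

```python
def apply_regions(sequence, regions, mode="mask", symbol="N"):
    """Illustrative helper (hypothetical, not part of remove_blocks_from_aln).

    regions is a list of (start, end) pairs, assumed 1-based and inclusive.
    """
    # Mark every alignment column that falls inside at least one region.
    inside = [False] * len(sequence)
    for start, end in regions:
        for i in range(max(start, 1) - 1, min(end, len(sequence))):
            inside[i] = True

    if mode == "mask":
        # Default behaviour: replace the regions with the chosen symbol.
        return "".join(symbol if hit else base
                       for base, hit in zip(sequence, inside))
    if mode == "cut":
        # Like -c: drop the regions, shortening the sequence.
        return "".join(base for base, hit in zip(sequence, inside) if not hit)
    if mode == "keep":
        # Like -k (assumed meaning): retain only the regions.
        return "".join(base for base, hit in zip(sequence, inside) if hit)
    raise ValueError("mode must be 'mask', 'cut' or 'keep'")


seq = "AAAATTTTCCCCGGGG"
masked = apply_regions(seq, [(5, 8)])              # AAAANNNNCCCCGGGG
cut = apply_regions(seq, [(5, 8)], mode="cut")     # AAAACCCCGGGG
kept = apply_regions(seq, [(5, 8)], mode="keep")   # TTTT
```

Masking preserves the alignment length, so column positions still match the original coordinates; cutting and keeping change the length.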

Inputs

Alignment file

The alignment file must be in FASTA format:

>sequence1
AAAATTTTCCCCGGGG
>sequence2
TTTTGGGGAAAACCCC
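If you want to check an alignment before running the tool, the following sketch reads FASTA text and confirms that all sequences have the same length. `read_fasta` is a hypothetical helper written for this example, not something the tool provides.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (name, sequence) pairs.

    Hypothetical helper for illustration only.
    """
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = """>sequence1
AAAATTTTCCCCGGGG
>sequence2
TTTTGGGGAAAACCCC
"""
records = read_fasta(example.splitlines())

# In an alignment, every sequence should have the same number of columns.
lengths = set(len(seq) for _, seq in records)
aligned = len(lengths) == 1
```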

Tab file

The tab file should emulate the EMBL file format, using only feature (FT) lines to define your regions:

FT   misc_feature   5070..5095
FT                   /note="26 bp;mummer_exact"
FT   misc_feature   8163..8182
FT                   /note="20 bp;mummer_exact"
FT   misc_feature   9569..9588
FT                   /note="20 bp;mummer_exact"

Note: while complemented regions can be handled, joins cannot and should be split into separate regions.
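As a rough illustration of how such a file can be read, the sketch below extracts (start, end) pairs from FT lines, accepts `complement(...)` locations, and rejects joins. `parse_regions` and its regular expression are assumptions made for this example; the tool's own parser may behave differently.

```python
import re

# Accepts "5070..5095" and "complement(5070..5095)".
LOCATION = re.compile(r"^(?:complement\()?(\d+)\.\.(\d+)\)?$")


def parse_regions(lines):
    """Return a list of (start, end) pairs from EMBL-style FT lines.

    Hypothetical helper for illustration only.
    """
    regions = []
    for line in lines:
        fields = line.split()
        # Feature lines look like: FT <feature key> <location>.
        # Qualifier lines (FT /note="...") carry no location and are skipped.
        if len(fields) < 3 or fields[0] != "FT" or fields[1].startswith("/"):
            continue
        location = fields[2]
        if "join" in location:
            raise ValueError("joins must be split into separate regions: " + location)
        match = LOCATION.match(location)
        if match is None:
            raise ValueError("unsupported location: " + location)
        start, end = sorted(int(value) for value in match.groups())
        regions.append((start, end))
    return regions


tab_lines = [
    'FT   misc_feature   5070..5095',
    'FT                   /note="26 bp;mummer_exact"',
    'FT   misc_feature   8163..8182',
    'FT                   /note="20 bp;mummer_exact"',
    'FT   misc_feature   9569..9588',
    'FT                   /note="20 bp;mummer_exact"',
]
regions = parse_regions(tab_lines)
```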

License

remove_blocks_from_aln is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page or email path-help@sanger.ac.uk.