GitHub - richarddurbin/hexamer: find likely coding segments in DNA using composition-normalised hexamer tables

hexamer and hextable
--------------------

hextable makes files of statistics that hexamer uses to scan for
likely coding regions.  The principle is to use 6mers, but to avoid
deriving any information from base composition.  I therefore normalise
the frequencies of each 6mer by dividing by the total frequency of all
6mers with the same base composition.
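
The normalisation described above can be sketched roughly as follows.
This is an illustration only, not the code in hextable.c: the function
names, the two-bit packing of bases into a table index, and the choice
to count 6mers in steps of three bases (codon pairs) are all
assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define NHEX 4096   /* 4^6 possible 6mers */

/* Map a base to 0..3.  Anything that is not a, g or t (including n)
   is treated as c, following the NB in this README. */
static int base_code(char b)
{
    switch (b) {
    case 'a': case 'A': return 0;
    case 'g': case 'G': return 2;
    case 't': case 'T': return 3;
    default:            return 1;
    }
}

/* Pack six bases into an index 0..4095, two bits per base.
   (Assumed encoding; the real table order may differ.) */
int hex_index(const char *s)
{
    int i, idx = 0;
    for (i = 0; i < 6; i++)
        idx = (idx << 2) | base_code(s[i]);
    return idx;
}

/* Key for a base composition: the counts of a, c, g and t (each 0..6)
   packed as a base-7 number.  6mers that are rearrangements of each
   other share a key. */
int comp_key(int idx)
{
    int i, n[4] = {0, 0, 0, 0};
    for (i = 0; i < 6; i++) {
        n[idx & 3]++;
        idx >>= 2;
    }
    return ((n[0] * 7 + n[1]) * 7 + n[2]) * 7 + n[3];
}

/* Add the in-frame 6mers of one coding sequence to count[],
   stepping three bases at a time (assumed). */
void count_hexamers(const char *seq, double count[NHEX])
{
    size_t i, len;
    assert(seq != NULL);
    len = strlen(seq);
    for (i = 0; i + 6 <= len; i += 3)
        count[hex_index(seq + i)] += 1.0;
}

/* Divide each 6mer count by the total count of all 6mers with the
   same base composition.  Classes with no observations are left at 0. */
void normalise_by_composition(const double count[NHEX], double norm[NHEX])
{
    static double total[7 * 7 * 7 * 7];
    int i;
    memset(total, 0, sizeof total);
    for (i = 0; i < NHEX; i++)
        total[comp_key(i)] += count[i];
    for (i = 0; i < NHEX; i++) {
        double t = total[comp_key(i)];
        norm[i] = (t > 0.0) ? count[i] / t : 0.0;
    }
}
```

After this step a 6mer is only compared against rearrangements of
itself, so a sequence that is simply rich in one base gains nothing
from its composition alone.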

The input of hextable is a FASTA file of coding sequences in frame.
The -o file output is an ASCII list of 4096 floating point numbers
giving log likelihood ratio scores in bits.  The output on stdout is a
summary of the information content of the table, indicating how
discriminative it is likely to be.  The output of hexamer is in GFF
format (http://www.sanger.ac.uk/Users/rd/gff.html).
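
For anyone wanting to use a table from their own code, a minimal
sketch of reading the 4096 numbers and adding up scores might look
like this.  The function names are hypothetical, and the order of the
entries in the file is not specified here, so the example takes table
indices directly rather than assuming how 6mers map to them.  How
hexamer itself combines scores into reported segments is not shown.

```c
#include <assert.h>
#include <stdio.h>

#define NHEX 4096   /* one score per possible 6mer */

/* Read NHEX whitespace-separated floating point scores from fp.
   Returns 1 on success, 0 if the file is short or malformed. */
int read_hex_table(FILE *fp, double score[NHEX])
{
    int i;
    assert(fp != NULL);
    for (i = 0; i < NHEX; i++)
        if (fscanf(fp, "%lf", &score[i]) != 1)
            return 0;
    return 1;
}

/* Sum the scores (in bits) for n table indices.  Because the entries
   are log likelihood ratios, adding them corresponds to multiplying
   the underlying ratios. */
double sum_bits(const double score[NHEX], const int *idx, int n)
{
    double total = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        assert(idx[i] >= 0 && idx[i] < NHEX);
        total += score[idx[i]];
    }
    return total;
}
```

A positive total means the 6mers seen are, on balance, more typical of
the coding training set than of the comparison model; a negative total
means the opposite.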

Type "make" to build the programs, and "make clean" to remove them.

Example usage:

	hextable -o worm.hex worm.coding
	hexamer -T 20 worm.hex AH6.dna

NB these programs assume that sequences contain only a, c, g and t.
Any n found in a sequence is converted to c.

Richard Durbin (rd@sanger.ac.uk) 9/95-4/98

PS 30/3/99 The original version of hexamer had some initialisation
bugs, which have been fixed today.