calculate_4mer_freq.py

KanHC edited this page Aug 18, 2013 · 8 revisions

About

This script calculates tetra-nucleotide (4-mer) frequencies for one or more .fasta files and writes all the calculated values to a tab-delimited text file (.tsv).

Common Usage

$ python calculate_4mer_freq.py -i *fasta -o 4mer_matrix.tsv

where,

  • *fasta is a Unix glob pattern selecting one or more FASTA files (.fa, .fas, .fasta, .fna or .f)
  • 4mer_matrix.tsv is the name of the result matrix containing the tetramer counts for each input file
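To make the idea of a tetramer count concrete, here is a minimal sketch of counting overlapping 4-mers in a single sequence. It is not taken from calculate_4mer_freq.py: the helper name count_tetramers is hypothetical, and it is an assumption that windows overlap, that 4-mers containing characters other than A, T, C and G are skipped, and that reverse complements are not merged. The A, T, C, G ordering is chosen only to mirror the row order of the example output on this page.

```python
from collections import Counter
from itertools import product


def count_tetramers(sequence):
    """Count overlapping 4-mers in one sequence (hypothetical helper).

    Assumption: windows containing anything other than A, T, C or G
    (for example N) are ignored.
    """
    seq = sequence.upper()
    # Pre-fill all 256 possible 4-mers with zero so every row exists.
    counts = Counter({"".join(p): 0 for p in product("ATCG", repeat=4)})
    # Slide a window of width 4 along the sequence, one base at a time.
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    return counts
```

Calling count_tetramers on the concatenated sequences of one .fasta file would give one column of such a matrix under these assumptions; the real script may treat scaffold boundaries or ambiguous bases differently.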

Example

Using the example .fasta files in the /calculate_4mer_freq/example/ directory of this repo, one can create a matrix of tetra-nucleotide frequencies, 4mer_matrix.tsv, by running the following command in the /calculate_4mer_freq/ directory:

$ python calculate_4mer_freq.py -i example/* -o 4mer_matrix.tsv
4mer_matrix.tsv
	A27018-scaffolds.fasta	A27019-scaffolds.fasta	A27021-scaffolds.fasta	A27020-scaffolds.fasta
AAAA	146	169	214	498
AAAT	93	121	139	303
AAAC	68	68	86	164
AAAG	86	72	120	187
AATA	91	76	115	268
AATT	87	94	136	243
AATC	48	54	71	116
AATG	58	47	94	163
AACA	77	68	97	164
AACT	62	58	71	129
AACC	60	54	55	91
AACG	22	37	60	54
AAGA	78	62	133	184
AAGT	60	50	53	115
AAGC	74	58	57	82
AAGG	59	49	70	93
ATAA	87	70	106	257
ATAT	89	63	121	205
ATAC	57	42	52	47
...
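The matrix above holds raw counts, so columns from assemblies of different sizes are not directly comparable. The following sketch shows one way to load such a file and convert each column to relative frequencies. It assumes the layout shown above (an empty first header cell, one column per input file, integer counts); the function names read_matrix and to_relative are hypothetical and are not part of the script.

```python
import csv


def read_matrix(handle):
    """Parse a tab-delimited 4-mer matrix into {file_name: {kmer: count}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first header cell is empty in the example
    counts = {name: {} for name in samples}
    for row in reader:
        # Skip blank or incomplete lines, such as a trailing "...".
        if len(row) != len(header):
            continue
        kmer = row[0]
        for name, value in zip(samples, row[1:]):
            counts[name][kmer] = int(value)
    return counts


def to_relative(kmer_counts):
    """Divide each count by the column total so the values sum to 1."""
    total = sum(kmer_counts.values())
    return {k: (v / total if total else 0.0) for k, v in kmer_counts.items()}
```

A possible usage, assuming the output file from the command above is in the current directory:

```python
with open("4mer_matrix.tsv", newline="") as handle:
    matrix = read_matrix(handle)
freqs = {name: to_relative(col) for name, col in matrix.items()}
```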