Motif Independent Metric


[MIM: Motif Independent Metric - Luca Pinello]

This software calculates a measure of sequence specificity called the Motif Independent Metric (MIM).

[INSTALLATION AND REQUIREMENTS]

In order to use the MIM software you need:

-Python 2.7 (http://www.python.org/getit/)

and the following modules:

-Numpy (http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/)
-Scipy (http://sourceforge.net/projects/scipy/files/scipy/0.9.0/)

Note:
If you are using the 64-bit Windows version of Python, you must download the 64-bit builds of these modules from:

http://www.lfd.uci.edu/~gohlke/pythonlibs/

[USAGE]

The program takes as input:

1) A valid BED file (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) containing the coordinates of your sequences in a reference genome

2) The FASTA files of that reference genome (http://hgdownload.cse.ucsc.edu/downloads.html)
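
As a rough illustration of the first input, the sketch below reads the three mandatory BED columns (chromosome, start, end). BED start coordinates are 0-based and end coordinates are exclusive, so the length of each sequence is end - start. The function name read_bed_intervals is hypothetical and is not part of mim.py, which may parse the file differently.

```python
def read_bed_intervals(lines):
    """Parse BED lines into (chrom, start, end) tuples.

    Hypothetical helper for illustration only; not part of mim.py.
    """
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines the format allows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        intervals.append((chrom, start, end))
    return intervals


example = ["track name=example", "chr1\t100\t150", "chr2\t2000\t2040\tpeak_2"]
for chrom, start, end in read_bed_intervals(example):
    # Start is 0-based and end is exclusive, so the length is end - start.
    print("%s:%d-%d length=%d" % (chrom, start, end, end - start))
```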

To launch the program from a shell type:

python mim.py bed_file.bed /path_to_your_genome_directory/

To see a list of the optional arguments with a short explanation of each, run the program without arguments:

python mim.py

When it finishes, the program reports the MIM value for the sequences extracted from the BED file, together with its p-value.

[TESTING EXAMPLE]

1) Download the human genome (hg18) FASTA files from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromFa.zip
and extract all the files from chromFa.zip into a directory (for example hg18_directory).

2) Run the program with the following command:

python mim.py sample_hg18.bed hg18_directory

Notes: If you want to build a more reliable null model, you can use the parameter --null_rep to increase the sampling accuracy (the default value is 1000):

python mim.py sample_hg18.bed hg18_directory --null_rep 5000
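
To see why more null replicates can help, consider a common way of estimating an empirical p-value: count how many null values are at least as large as the observed one. This is only a sketch under that assumption. This README does not say how mim.py computes its p-value, and empirical_p_value is a hypothetical name. With n replicates, the smallest p-value such an estimate can report is 1/(n+1), so more replicates give a finer resolution.

```python
def empirical_p_value(observed, null_values):
    """Fraction of null values >= observed, with a +1 correction.

    Assumed estimator for illustration; mim.py may use a different one.
    """
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1.0) / (len(null_values) + 1.0)


# Smallest p-value this estimator can report for each number of replicates.
for n_rep in (1000, 5000):
    print("%d replicates -> minimum p = %.6f" % (n_rep, 1.0 / (n_rep + 1)))
```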

You can also use a null model that shuffles the input sequences instead of randomly sampling sequences from the genome, using the optional parameter --null_model:

python mim.py sample_hg18.bed hg18_directory --null_model shuffle
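
The difference between the two kinds of null model can be sketched as follows. Shuffling permutes the characters of each input sequence, so the null sequence keeps the same length and base composition. Random sampling instead draws a window of the same length from somewhere else. Both functions below are hypothetical illustrations on toy strings; they do not reproduce the code in mim.py, whose shuffling and sampling details are not described in this README.

```python
import random


def shuffled_null(sequence, rng):
    """Return a random permutation of the characters of `sequence`."""
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


def sampled_null(genome, length, rng):
    """Return a random window of `length` characters from `genome`."""
    start = rng.randint(0, len(genome) - length)  # randint is inclusive
    return genome[start:start + length]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sequence = "ACGTACGGTTAC"
toy_genome = "TTGACCAGTAGGCATTACGATCGATCGGATTA"

print(shuffled_null(sequence, rng))
print(sampled_null(toy_genome, len(sequence), rng))
```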