Motif Independent Metric
Python
Switch branches/tags
Nothing to show
Latest commit 1eded1d Oct 4, 2011 @lucapinello cleanup
Permalink
Failed to load latest commit information.
README cleanup Oct 3, 2011
bioutilities.py update DOC3 Oct 3, 2011
mim.py cleanup Sep 30, 2011
progressbar.py initial release Sep 30, 2011
sample_hg18.bed initial release Sep 30, 2011
utils.py cleanup Sep 30, 2011

README

[MIM: Motif Independet Metric - Luca Pinello]

This software calculates a measure of sequence specificity called Motif Independent Metric. 

[INSTALLATION AND REQUIREMENTS]

In order to use the MIM software you need:

-Python 2.7(http://www.python.org/getit/) 

and the following modules:

-Numpy (http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/)
-Scipy (http://sourceforge.net/projects/scipy/files/scipy/0.9.0/)

Note: 
If you are using the Windows 64 bit version of Python, you must download the 64 bit version of the modules from:

http://www.lfd.uci.edu/~gohlke/pythonlibs/

[USAGE]

The program takes as input:

1) A proper BED file (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) containing the coordinates of your sequences for some reference genome

2) The fasta files of your reference genome (http://hgdownload.cse.ucsc.edu/downloads.html)

To launch the program from a shell type:

python mim.py bed_file.bed /path_to_your_genome_directory/

To see a list of the optional arguments with a short explanation, please just type:

python mim.py

At the end of the execution the program will report the MIM value for the sequences extracted from the bed file and its p-value.

[TESTING EXAMPLE]

1) Download the human genome (hg18) fasta files from here: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromFa.zip
and extract all the files from chromFa.zip in a directory (for example hg18_directory).

2) Run the program with the following command:

python sample_hg18.bed hg18_directory

Notes: If you want to build a more realiable null model, you can use the parameter --null_rep to increase the sampling accuracy (the default value is 1000):

python sample_hg18.bed hg18_directory --null_rep 5000

You can also use a null model that shuffle the input sequences instead of random sampling sequence from genome with the optional parameter --null_model:

python sample_hg18.bed hg18_directory --null_model shuffle