Skip to content

A re-implementation of the minimum redundancy maximum relevance (mRMR) feature selection algorithm with emphasis on greatly increased perfomance (1000x or greater on large data sets) and an improved user interface.

License

Notifications You must be signed in to change notification settings

rlichtenwalter/mRMR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Improved mRMR is a re-implementation of the minimum redundancy maximum relevance (mRMR) feature selection algorithm with emphasis on greatly increased perfomance (1000x or greater on large data sets) and an improved user interface. There are no disadvantages to using this utility as opposed to the original release by Hanchuan Peng, but benefits include:

- results identical to original mRMR implementation by Hanchuan Peng, excluding statistically inconsequential preservation of rank corresponce in the case of metric ties
- incorporation of all improvements from the Fast-MRMR implementation by Sergio Ramírez
- additional performance improvements, such as avoiding computing mutual information for zero-entropy attributes, and careful selection of and usage of data structures
- output for each attribute includes its selection rank, entropy, mutual information with the class attribute, and mRMR score in an easily parsed format friendly to downstream manipulation
- operates directly on original textual data, requiring no transformation into a one-time binary representation
- robust data set parser fails gracefully with bad input and reports the location of the first error
- modular support in the code for arbitrary discretization routines, with several examples already provided and implemented
- support to output the result of parsing and discretization so that it can be verified and analyzed with external tools
- supports stream-based processing, and can operate equally well reading a data set from standard input, in a pipeline, from a named pipe, or with process substitution
- standard GNU getopt_long POSIX-compliant option processing, including full-featured -h/--help capability, informative error messages, graceful failure, and sensible defaults
- high-quality C++14 compliant code base


BUILING
===============
1. To build, enter project directory and type 'make'.
2. To run from project directory, type './mrmr -h' to get usage information and additional help.

EXAMPLE USAGE
===============
The following two commands are equivalent in effect when run from the project top-level directory.

< example.tsv bin/mrmr
bin/mrmr -t '\t' -c 1 -d 'truncate' example.tsv

Notes: Implemented discretization functionality is minimal. Feature values are expected to be or to discretize to be contiguous integers starting from 0, but this is not currently checked.

VERSION HISTORY
===============
0.93 (beta)
	- Added header for easier usage as library principally in support of Python bindings.

0.92 (beta)
	- Added example data and updated README with example usage.
	- Added note about expected feature value ranges.

0.91 (beta)
	- Refactored discretization code.
	- Added 'truncate' discretization procedure and make it the default.
	- Added support for new warning log level and made it the default.
	- Now warn in case a discretization method is not explicitly chosen.
	- Implemented attribute domain translation and integer overflow detection.

0.9 (beta)
	- Support for delimiter specification.
	- Minor improvement to log handling.
	- Small incidental code changes.

0.1 (beta)
	- Initial release.

About

A re-implementation of the minimum redundancy maximum relevance (mRMR) feature selection algorithm with emphasis on greatly increased perfomance (1000x or greater on large data sets) and an improved user interface.

Resources

License

Stars

Watchers

Forks

Packages

No packages published