Pre-compiled 64-bit executables are provided here for Linux, macOS and Windows. Older releases can be found on the release page.
What does this tool do?
grex is a small command-line utility that is meant to simplify the often complicated and tedious task of creating regular expressions. It does so by automatically generating regular expressions from user-provided input strings.
In the current version, grex generates the most specific regular expression possible which exactly matches the given input only and nothing else. This is and always will be the default setting. In later releases, the tool will be able to create more generalized expressions by using wildcards. These generalization features will have to be explicitly enabled by respective command-line flags and options.
Current features:
- character classes
- detection of common prefixes and suffixes
- alternation using the | operator
- optionality using the ? quantifier
- concatenation of all of the former
- reading input strings from the command-line or from a file
How to install?
scoop install grex
brew tap pemistahl/formulas
brew install grex
Alternatively, you can download the self-contained executable for your platform above and put it in a place of your choice. grex is also hosted on crates.io, the official Rust package registry. If you are a Rust developer and already have the Rust toolchain installed, you can install by compiling from source using cargo, the Rust package manager:
cargo install grex
How to use?
$ grex -h
grex 0.2.0
Peter M. Stahl <firstname.lastname@example.org>
grex generates regular expressions from user-provided input strings.

USAGE:
    grex <INPUT>... --file <FILE>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -f, --file <FILE>    Reads input strings from a file with each string on a separate line

ARGS:
    <INPUT>...    One or more strings separated by blank space
The quickest way is to provide input strings on the command line, separated by spaces:
$ grex a ab abc
^a(bc?)?$
If an input string contains space characters, it needs to be surrounded by quotation marks:
$ grex "I ♥ cake" "I ♥ cookies"
^I ♥ c(ookies|ake)$
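One way to sanity-check a generated expression is to run it against the original inputs and a few near-misses. The sketch below does this with Python's standard `re` module, assuming the expression's syntax carries over to Python unchanged. The helper name `matches_exactly` is hypothetical and not part of grex.

```python
import re

def matches_exactly(pattern, accepted, rejected):
    """Return True if pattern matches every accepted string and no rejected one."""
    regex = re.compile(pattern)
    return (all(regex.search(s) for s in accepted)
            and not any(regex.search(s) for s in rejected))

# Expressions copied from the grex output above; the rejected strings are
# made-up near-misses chosen for illustration.
first_ok = matches_exactly(r"^a(bc?)?$", ["a", "ab", "abc"], ["", "b", "ac", "abcd"])
second_ok = matches_exactly(
    r"^I ♥ c(ookies|ake)$",
    ["I ♥ cake", "I ♥ cookies"],
    ["I ♥ cak", "I ♥ cookie", "I love cake"],
)
print(first_ok, second_ok)
```

Note that Python's `$` also matches just before a trailing newline, so inputs ending in a line break would need `\Z` or `re.fullmatch` for a strict check.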
Every generated regular expression is surrounded by the anchors ^ and $ so that it does not accidentally match substrings. Unicode symbols which do not belong to the ASCII character set are not escaped by default because programming languages use different notations for Unicode escape sequences. Support for different escape sequence notations via command-line options is planned.
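The effect of the anchors can be seen by searching a longer string with and without them. This is a minimal sketch using Python's `re` module; the sample text is invented for illustration.

```python
import re

text = "xxabcxx"  # contains "abc" only as a substring

# Without anchors the pattern is free to match anywhere inside the text.
unanchored = re.search(r"a(bc?)?", text)
# With anchors the whole string has to fit the pattern.
anchored = re.search(r"^a(bc?)?$", text)

print(unanchored is not None, anchored is not None)

# Python's escape notation for the heart symbol; other languages spell
# the same code point differently, which is why grex leaves it unescaped.
print("\u2665" == "♥")
```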
grex operates on grapheme clusters rather than on Unicode scalar values, so a grapheme cluster that consists of more than one scalar value is treated as a single unit. The letter y̆ in the following example consists of the Unicode symbols U+0079 (Latin Small Letter Y) and U+0306 (Combining Breve). Therefore, it cannot be part of the character class, which holds single characters only.
$ grex y̆ a z
^[az]|y̆$
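The difference between a visible letter and its scalar values can be inspected with Python's standard `unicodedata` module. This sketch only illustrates the Unicode side of the example; Python's standard library does not segment grapheme clusters itself, and nothing here reflects how grex does it internally.

```python
import unicodedata

letter = "y\u0306"  # renders as a single letter: y with a breve

# len() counts code points, not visible characters.
code_point_count = len(letter)
names = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in letter]

print(code_point_count)
for name in names:
    print(name)
```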
Input strings can be read from a file as well. Every file must be encoded as UTF-8 and every input string must be on a separate line:
$ grex -f my-input-file.txt
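An input file in the expected layout can be produced from any program. The following sketch writes one string per line as UTF-8 with Python's `pathlib`; the sample strings are reused from the example above and the file name matches the command shown.

```python
from pathlib import Path

strings = ["I ♥ cake", "I ♥ cookies"]

# One input string per line, explicitly encoded as UTF-8.
path = Path("my-input-file.txt")
path.write_text("\n".join(strings) + "\n", encoding="utf-8")

# Read the file back to confirm the layout.
print(path.read_text(encoding="utf-8").splitlines())
```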
Some more examples:
$ grex a b c
^[a-c]$
$ grex a c d e f
^[ac-f]$
$ grex a b x de
^de|[abx]$
$ grex 1 3 4 5 6
^[13-6]$
$ grex a b bc
^bc?|a$
$ grex a b bcd
^b(cd)?|a$
$ grex abx cdx
^(ab|cd)x$
$ grex 3.5 4.5 4,5
^3\.5|4[,.]5$
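The character classes in these examples suggest that runs of three or more consecutive characters are written as a range, while shorter runs are listed individually. The sketch below reproduces that formatting in Python. The rule is inferred from the output above rather than taken from grex's source, and the function name `char_class` is hypothetical.

```python
def char_class(chars):
    """Render single characters as a bracket expression, collapsing runs of 3+.

    Assumption: characters with a special meaning inside brackets
    (such as ']' or '^') are not escaped in this sketch.
    """
    codes = sorted({ord(c) for c in chars})
    parts = []
    i = 0
    while i < len(codes):
        j = i
        # Extend j to the end of the run of consecutive code points starting at i.
        while j + 1 < len(codes) and codes[j + 1] == codes[j] + 1:
            j += 1
        if j - i >= 2:
            # Three or more consecutive code points become a range.
            parts.append(f"{chr(codes[i])}-{chr(codes[j])}")
        else:
            # Shorter runs are listed character by character.
            parts.extend(chr(c) for c in codes[i:j + 1])
        i = j + 1
    return "[" + "".join(parts) + "]"

for sample in ["abc", "acdef", "abx", "13456", ",."]:
    print(sample, char_class(sample))
```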
How does it work?
A deterministic finite automaton (DFA) is created from the input strings.
The number of states and transitions between states in the DFA is reduced by applying Hopcroft's DFA minimization algorithm.
The minimized DFA is expressed as a system of linear equations which are solved with Brzozowski's algebraic method, resulting in the final regular expression.
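The first two steps can be sketched in a few lines of Python. The DFA is built as a trie over the input strings, and equivalent states are then merged. Note the substitutions: instead of Hopcroft's algorithm, this sketch uses a simple bottom-up merge that only works for acyclic automata, it iterates over code points rather than grapheme clusters, and the conversion to a regular expression with Brzozowski's method is left out. All names are hypothetical and unrelated to grex's code.

```python
def build_trie(words):
    """Build an acyclic DFA (a trie) that accepts exactly the given words."""
    states = [{"accept": False, "next": {}}]  # state 0 is the start state
    for word in words:
        current = 0
        for ch in word:
            if ch not in states[current]["next"]:
                states.append({"accept": False, "next": {}})
                states[current]["next"][ch] = len(states) - 1
            current = states[current]["next"][ch]
        states[current]["accept"] = True
    return states

def count_minimized_states(states):
    """Count states left after merging those that accept the same suffixes."""
    canonical = {}  # signature -> id of the merged state
    memo = {}       # original state id -> id of the merged state

    def visit(state_id):
        if state_id in memo:
            return memo[state_id]
        state = states[state_id]
        # Two states are equivalent if they agree on acceptance and on
        # where every outgoing transition leads after merging.
        transitions = tuple(sorted((ch, visit(t)) for ch, t in state["next"].items()))
        signature = (state["accept"], transitions)
        memo[state_id] = canonical.setdefault(signature, len(canonical))
        return memo[state_id]

    visit(0)
    return len(canonical)

trie = build_trie(["abx", "cdx"])
print(len(trie), count_minimized_states(trie))
```

For the inputs abx and cdx, the merge joins the states reached after ab and cd, which corresponds to the shared suffix x in the generated expression ^(ab|cd)x$.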
Do you want to contribute?
grex is still in a very early stage of development, but I encourage you to contribute nevertheless. Do you have ideas for cool features? Or have you found any bugs so far? Feel free to open an issue or send a pull request. It's very much appreciated. :-)