Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Regular Expression Intersection
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
src/intersectx
test
.gitignore
README.md
project.clj

README.md

Intersectx

Intersectx is a Clojure/JVM library that test whether regular expressions define intersecting languages, that is, if exists at least one string matched by two given regular expressions.

Example

user=> (use 'intersectx.core)
nil
user=> (intersect? "a" "b")
false
user=> (intersect? "a" "a*")
true
user=> (intersect? "example" "exa[^m]ple*")
false
user=> (intersect? "example" "exa[mno]ple*")
true
user=> (intersect? "exa.+" "exa")
false
user=> (intersect? "exa.+" "example")
true
user=> (intersect? "exa.*" "exa")
true

Supported features

  • Literals
  • Escaped characters
  • Standard regular language constructs: "|", "?", "*" and "+" and parenthesis grouping.
  • General quantifiers: {n}, {n,m}, {n,}
  • Dot wildcards
  • Simple character classes: [abc]
  • Simple negated characted classes: [^abc]
  • Ranges in character classes: [a-z]
  • Lookahead (positive and negative), provided they appear in the top-level, outside any grouping
  • Special character classes: \w, \s, \d.

Not (yet) supported

  • Negated special character classes: \W, \S, \D.
  • Special character classes inside regular character classes: [\d\s], [\D]
  • Lookaround in arbitrary positions
  • Lookbehind

Internals

The library parses the regular expressions and builds a NFA (Nondeterministic Finite Automaton) using a variation of the Thompson algorithm. Then uses the "powerset construction" to build a DFA (Deterministic Finite Automaton). The DFA of each regular expression are intersected using a cartesian product machine construction, and then tested against the empty language. In order to support a Perl-like flavor of regular expressions, some additional processing is done: wildcards are expanded into disjunctions and lookaround is transformed into the intersection and union of simpler regular expressions.

Something went wrong with that request. Please try again.