Configuration

Marc-Olivier Buob edited this page May 13, 2022 · 1 revision

The entry point is the pattern_clustering/boost.py file.

It contains the pattern_distance and pattern_clustering Python functions, which wrap the corresponding underlying C++ functions. Both functions take a set of patterns as parameters. Each pattern is identified by a string name, characterized by a deterministic finite automaton (DFA), and weighted by a density. Intuitively, the density of a pattern characterizes how strict it is; for instance, a float is stricter than an alphanumeric string.
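To build intuition for the notion of density, here is a toy illustration that uses only the Python standard library. It is not the library's language_density function, whose exact definition and scale are not described on this page. The function name toy_density, the tiny alphabet, and the definition used (fraction of fixed-length strings accepted by a regular expression) are assumptions made for this sketch.

```python
import re
from itertools import product


def toy_density(regex: str, alphabet: str, length: int) -> float:
    """Fraction of strings of the given length over `alphabet` fully matched by `regex`.

    Hypothetical helper for illustration only; it is NOT the definition
    used by pattern_clustering.language_density.
    """
    pattern = re.compile(regex)
    total = len(alphabet) ** length
    accepted = sum(
        1
        for chars in product(alphabet, repeat=length)
        if pattern.fullmatch("".join(chars))
    )
    return accepted / total


# Toy alphabet: two digits, one letter, and the dot.
ALPHABET = "01a."
float_like = toy_density(r"[01]+\.[01]+", ALPHABET, 3)  # digits, a dot, digits
alnum_like = toy_density(r"[01a]+", ALPHABET, 3)        # digits and letters only
```

Under this toy definition, the float-like pattern accepts fewer strings than the alphanumeric one, so it gets a smaller value. The library's own densities may be defined and scaled differently.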

As an end user, you only have to decide how to name your patterns and define the corresponding regular expression for each of them. Note that many common patterns are already defined in pattern_clustering.patterns.

To define customized patterns:

  • Choose a string identifying this pattern.
  • Compute its DFA from its regular expression using pybgl.compile_dfa and insert it in a map_name_dfa dictionary.
  • Compute its density using pattern_clustering.language_density and insert it in a map_name_density dictionary.
  • Call make_densities(map_name_dfa, map_name_density) to obtain a densities list.
  • Call pattern_distance / pattern_clustering, passing map_name_dfa and densities as arguments.
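The steps above can be sketched as a small helper. To keep the example runnable without the library installed, the three library functions are passed in as arguments instead of being imported. The helper name prepare_patterns and the map_name_re input dictionary are hypothetical. The sketch also assumes that compile_dfa takes a regular expression string and that language_density takes the resulting DFA as its only argument; check both against the library before relying on this.

```python
def prepare_patterns(map_name_re, compile_dfa, language_density, make_densities):
    """Build the (map_name_dfa, densities) pair from named regular expressions.

    Hypothetical helper. The callables are expected to behave like
    pybgl.compile_dfa, pattern_clustering.language_density and make_densities;
    their exact signatures are assumptions, not confirmed by this page.
    """
    map_name_dfa = {}
    map_name_density = {}
    for name, regex in map_name_re.items():
        # Compute the DFA of the pattern from its regular expression.
        dfa = compile_dfa(regex)
        map_name_dfa[name] = dfa
        # Compute the density of the pattern (assumed to take the DFA).
        map_name_density[name] = language_density(dfa)
    # Obtain the densities list from the two dictionaries.
    densities = make_densities(map_name_dfa, map_name_density)
    return map_name_dfa, densities
```

With the real library, the three callables would be pybgl.compile_dfa, pattern_clustering.language_density, and make_densities. The returned map_name_dfa and densities are then the values to pass to pattern_distance or pattern_clustering; any other arguments these functions expect are not covered here.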