3.2 Pattern Object Schema

Gabe Stocco edited this page Aug 30, 2022 · 3 revisions

The pattern object is the mechanism a rule exposes to detect an actual issue in code. It is used both as part of the base Rule Object Schema and as part of the optional Conditions arrays within a rule. More than one pattern may be necessary to describe the text to search for in a given rule. Multiple patterns may also be defined for a rule to improve readability and to assign a different confidence level to each pattern based on its precision or granularity.

If further refinement is necessary, the conditions array uses pattern objects to find additional patterns that must, or must not, be present for the initial match to be considered vulnerable.

Schema

{
    "pattern": type=string  
    Required Value 

    "type": type=enum (string)  
    Required Value     

    "modifiers": type=array of enum values (string)  
    Optional Value (default is no flags)     

    "scopes": type=array of enum (string)  
    Optional Value, default is "code"  

    "confidence": type=enum (string)  
    Optional Value, default is Unspecified

    "xpaths": type=array of values (string)  
    Optional Value

    "jsonpaths": type=array of values (string)  
    Optional Value
 
    "_comment": type=string  
    Optional Value  
}
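As a concrete illustration of the schema above, the following sketch parses a complete pattern object and checks that the two required keys are present. The sample values and the `missing_required` helper are illustrative assumptions and are not part of DevSkim itself.

```python
import json

# A hypothetical pattern object using most of the keys from the schema.
PATTERN_JSON = r'''
{
    "pattern": "CC_(MD2|MD4|MD5|SHA1)",
    "type": "regex",
    "modifiers": ["i"],
    "scopes": ["code"],
    "confidence": "High",
    "_comment": "flags weak hash algorithm constants"
}
'''

# Only "pattern" and "type" are marked Required in the schema.
REQUIRED_KEYS = ("pattern", "type")


def missing_required(pattern_obj):
    """Return the required keys that are absent from a pattern object."""
    return [key for key in REQUIRED_KEYS if key not in pattern_obj]


pattern_obj = json.loads(PATTERN_JSON)
print(missing_required(pattern_obj))          # []
print(missing_required({"pattern": "rand"}))  # ['type']
```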

Definitions of each Key/Value Pair

pattern

String representing the pattern to match. Usually a regular expression, but can also be a simple string. The format of the pattern (regex, string, etc.) is specified by the type value (see below). Since rules are implemented in JSON, certain characters (quotes, backslashes, etc.) must be escaped.

  • Example: "pattern" : "CC_(MD2|MD4|MD5|SHA1)"
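The escaping requirement is easiest to see with a regex that contains backslashes. The sketch below uses Python's standard `json` module to show the escaped form that would go into a rule file; the `strcpy` regex is a made-up example, not a pattern taken from DevSkim's rule set.

```python
import json
import re

# The regex as an author would write it outside JSON: a word boundary,
# the function name, optional whitespace, and a literal open parenthesis.
raw_regex = r"\bstrcpy\s*\("

# json.dumps produces the escaped form that belongs in the rule file:
# every backslash is doubled.
encoded = json.dumps({"pattern": raw_regex})
print(encoded)  # {"pattern": "\\bstrcpy\\s*\\("}

# Parsing the JSON yields the original regex again.
decoded = json.loads(encoded)["pattern"]
assert decoded == raw_regex
assert re.search(decoded, "strcpy (dst, src);") is not None
```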

type

Accepted values: regex, regex-word, string, substring
Specifier for the format of the pattern. It makes the intent of certain patterns clearer to a human and makes them easier to write. The behaviors are as follows:

  • regex: specifies that the pattern is a regular expression in standard JavaScript/C# syntax
  • regex-word: behind the scenes, a \b prefix and suffix are added to the pattern so that it matches a full word. This makes the pattern a little easier to read by allowing the author to omit the \b's. For example, a pattern of "rand" is easier to read than "\brand\b": the former makes it immediately clear that the author intended to find the word rand rather than brand.
  • string: searches for the raw text on a word boundary. Equivalent to regex-escaping the text and then treating it like regex-word.
  • substring: searches for the raw text, but doesn't look for word boundaries. Equivalent to simply regex-escaping the pattern.

Behind the scenes each of these is transformed into a regular expression. An author could opt to only use the regex type, manually adding \b's instead of using regex-word, or escaping all of the characters if doing a string or substring search. The additional types exist for convenience and to improve human readability, rather than providing functionality that cannot be achieved with a standard regular expression.

  • example: "type" : "regex"
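The transformation described above can be sketched in a few lines. This is an approximation written with Python's `re` module under the assumption that the behavior is exactly as documented; the `to_regex` helper is hypothetical, and the actual engine uses .NET regular expressions and may differ in details.

```python
import re


def to_regex(pattern, pattern_type):
    """Translate a pattern and its type into a plain regular expression string."""
    if pattern_type == "regex":
        return pattern                              # used as written
    if pattern_type == "regex-word":
        return r"\b" + pattern + r"\b"              # add word boundaries
    if pattern_type == "string":
        return r"\b" + re.escape(pattern) + r"\b"   # escape, then add boundaries
    if pattern_type == "substring":
        return re.escape(pattern)                   # escape only
    raise ValueError("unknown pattern type: " + pattern_type)


print(re.search(to_regex("rand", "regex-word"), "x = rand();") is not None)  # True
print(re.search(to_regex("rand", "regex-word"), "brand = 1") is not None)    # False
print(re.search(to_regex("rand", "substring"), "brand = 1") is not None)     # True
print(re.search(to_regex("a.b", "string"), "axb") is not None)               # False
```

The last line shows why string differs from regex-word: the dot is escaped, so it only matches a literal dot.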

modifiers

Accepted values: i, d, m
Regular expression modifiers that further control the behavior of the search pattern. The modifiers are:

  • i: case insensitive search
  • d: dot match all
  • m: multiline search

Modifiers can be used with all pattern types (see above).

  • example: "modifiers" : ["i", "m"]
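To show what the modifiers change in practice, the sketch below maps them onto the closest flags in Python's `re` module. The mapping and the `combine_modifiers` helper are assumptions made for illustration; the real engine applies the equivalent .NET regex options.

```python
import re

# Assumed mapping from the documented modifiers to Python's re flags.
MODIFIER_FLAGS = {
    "i": re.IGNORECASE,  # case insensitive search
    "d": re.DOTALL,      # "." also matches newline characters
    "m": re.MULTILINE,   # "^" and "$" match at the start/end of every line
}


def combine_modifiers(modifiers):
    """Combine a list of modifier letters into a single flags value."""
    flags = 0
    for modifier in modifiers:
        flags |= MODIFIER_FLAGS[modifier]
    return flags


text = "int x;\nPASSWORD = 1;\n"
# Without modifiers, "^" only matches at the start of the text and case matters.
print(re.search(r"^password", text) is not None)                                 # False
# With "i" and "m", the second line matches.
print(re.search(r"^password", text, combine_modifiers(["i", "m"])) is not None)  # True
```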

scopes

accepted values: code, html, comment, all
Array that specifies what "part" of a source file the finding should be in. If left absent, the rules engine assumes the scope is code. The scopes are:

  • code: sections of a source file that represent executable logic of that language (i.e. that part of the file is NOT a comment, nor, in the case of some languages, is it embedded html)
  • comment: non-executable documentation that the devs leave in source code to explain the code to themselves or other devs (e.g. /*I am a comment */). This scope is useful if setting up manual-review rules to spot comments like "To-do: add security".
  • html: some languages, such as PHP, allow the intermingling of HTML and server side code within the same file. Providing a scope of html restricts a pattern to just the html portions of the source file (though not html written out via echo or a similar response writer), whereas a scope of code excludes any findings in those areas
  • all: look everywhere in the source file

Important: if no language is specified at the parent rule level, then the scope at the pattern level won't be applied, because the scope type varies by language.

In the specific case of scanning a pure html file (.htm, .html, etc.), both the scope code and html will apply equally (since the code "type" is html). However, when writing rules to spot problems in html it is suggested to use a scope of html, and not use any applies_to in the parent rule. This will apply the rule to any file that contains html so that if a new server side language that allows intermingling with html comes along the rule doesn't need to be updated.

This value is an array to cover the off chance that a rule applies to multiple (but not all) scopes, most likely both code and html. In most cases a pattern applies to only one scope, so the array will contain a single item.

  • example: "scopes" : ["html"]
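The effect of scopes can be sketched as a filter applied to regex matches. The example below is a deliberately simplified assumption: it only recognizes C-style block comments, ignores the html scope, and uses a hypothetical `find_in_scopes` helper. The real engine relies on per-language comment definitions.

```python
import re

# Simplification: only /* ... */ block comments count as the "comment" scope.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def find_in_scopes(regex, source, scopes=("code",)):
    """Return matches of regex whose start falls inside one of the requested scopes."""
    comment_spans = [m.span() for m in COMMENT_RE.finditer(source)]

    def scope_of(position):
        in_comment = any(start <= position < end for start, end in comment_spans)
        return "comment" if in_comment else "code"

    return [
        m.group(0)
        for m in re.finditer(regex, source)
        if "all" in scopes or scope_of(m.start()) in scopes
    ]


source = "/* To-do: add security */\nlogin(user);\n"
print(find_in_scopes(r"To-do", source))               # [] (default scope is code)
print(find_in_scopes(r"To-do", source, ["comment"]))  # ['To-do']
print(find_in_scopes(r"login", source, ["comment"]))  # []
print(find_in_scopes(r"login", source, ["all"]))      # ['login']
```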

confidence

One of the values of the Confidence enum: Unspecified, Low, Medium, High.

xpaths

When xpaths is set, the RuleProcessor will try to query each document for each of the xpaths specified, and the pattern will be applied to the result instead of the whole file. Note that for xml documents with namespaces you may need to write your xpath query like /*[local-name(.)='project']/*[local-name(.)='properties']/*[local-name(.)='java.version'].
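The idea of applying a pattern to a query result rather than to the whole file can be sketched with Python's standard `xml.etree.ElementTree`. Note the assumptions: ElementTree supports only a limited XPath subset (it has no `local-name()` function), the sample document has no namespace, and `match_in_xpaths` is a hypothetical helper rather than the RuleProcessor's API.

```python
import re
import xml.etree.ElementTree as ET


def match_in_xpaths(pattern, xml_text, xpaths):
    """Apply pattern only to the text of elements selected by each xpath."""
    root = ET.fromstring(xml_text)
    hits = []
    for xpath in xpaths:
        for element in root.findall(xpath):
            value = element.text or ""
            if re.search(pattern, value):
                hits.append(value)
    return hits


POM = """<project>
  <name>runs on 1.7 and later</name>
  <properties>
    <java.version>1.7</java.version>
  </properties>
</project>"""

# Only the selected element is searched, so the "1.7" inside <name> is ignored.
print(match_in_xpaths(r"1\.[0-7]", POM, ["./properties/java.version"]))  # ['1.7']
print(match_in_xpaths(r"1\.[0-7]", POM, ["./properties/missing"]))       # []
```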

jsonpaths

When jsonpaths is set, the RuleProcessor will try to query each document for each of the jsonpaths specified, and the pattern will be applied to the result instead of the whole file.
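The same idea applies to JSON documents. Python's standard library has no JSONPath implementation, so the sketch below assumes a tiny hypothetical resolver, `resolve_simple_path`, that understands only dotted paths such as `$.a.b`. Real JSONPath supports far more (wildcards, filters, array indices), and the sample document is invented.

```python
import json
import re


def resolve_simple_path(document, path):
    """Resolve a dotted path such as "$.a.b"; return None when a key is missing."""
    current = document
    for key in path.lstrip("$").strip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def match_in_jsonpaths(pattern, json_text, jsonpaths):
    """Apply pattern only to the values selected by each path."""
    document = json.loads(json_text)
    hits = []
    for path in jsonpaths:
        value = resolve_simple_path(document, path)
        if value is not None and re.search(pattern, str(value)):
            hits.append(str(value))
    return hits


PACKAGE = '{"name": "demo-3.0.0", "dependencies": {"lodash": "3.0.0"}}'
print(match_in_jsonpaths(r"^3\.", PACKAGE, ["$.dependencies.lodash"]))  # ['3.0.0']
print(match_in_jsonpaths(r"^3\.", PACKAGE, ["$.name"]))                 # []
```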

_comment

Optional string that lets the author of a rule leave comments or notes for others reading the json file. Since the json format doesn't provide native comment syntax, this gives a place to explain things like complicated regex logic.

  • Example: "_comment" : "this ugly regex is a catchall for all of the banned c functions that don't otherwise have their own rule"