Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
110 lines (90 sloc) 3.54 KB

regex -- Regular Expressions

Every programmer should learn to use these, but approach them with the respect of learning another programming language, because that's really what they are. Get a good book.

There are two popular regex styles:

  • Perl Compatible Regular Expressions (PCRE)
  • POSIX

PCREs in some form are used natively in some languages, e.g.:

  • javascript
  • python
  • java

Beware that even PCREs amongst languages have slight feature differences and quirks:

  • what . actually matches
  • multiple-line support
  • back references
  • advanced features

Review

PCRE basics:

  • Characters or Tokens
    • any literal (with some exceptions)
    • escape special characters or rules when you want to use literally
    • Specials:
      • . any character
      • ^ beginning of line
      • $ end of line
  • Sets are groups (or ranges) of characters
    • [ ] positive set
    • [^ ] negative set
    • Use - in a set to denote range between characters
    • character classes are sugar for sets:
      • \d equivalent to [0-9]
      • \w equivalent to [a-zA-Z0-9_]
      • \s equivalent to [ \t\r\n\f]
  • Groups
    • Use these to group or capture
    • () capturing group
    • (?:) non-capturing group
    • (?P<var>) named capturing group (named var)
  • Alternation (Boolean or) |
    • Separate items, sets or unions
  • Quanifiers
    • {M} exactly M times
    • {M,} mininum M times
    • {M,N} minimum M maximum N times
    • "greedy quantifiers" are just sugar:
      • * equivalent to {0,}
      • + equivalent to {1,}
      • ? equivalent to {0,1}
  • Assertions
    • Lookahead
      • x(?=y) match x only if it is followed by y
      • x(?!y) match x only if it is not followed by y
    • Lookbehind
      • (?<=y)x match x only if it follows y
      • (?<!y)x match x only if it does not follow y

Implementations

  • JavaScript is missing quite a few PCRE features:
    • lookbehind assertions (maybe in future)
    • named groups
    • (and more)

Notes on Bash

Bash globbing (filename expansions) look similar to regex but are not. They are still very useful:

  • ? single character wildcard
  • * multi character wildcard
  • [ ] bracket expression
  • [^ ] negated bracket expression
  • [-] options range
    • ^ negate range (place inside bracket)

TODO:

REFERENCES

Globbing. Regular Expressions. Advanced Bash-Scripting Guide.

Filenames and Pathnames in Shell: How to do it Correctly. David A. Wheeler. 2016-05-04.

Perl Style Regular Expressions in Prolog. Robert D. Cameron. CMPT 384 Lecture Notes. 1999.

Regular Expression. Wikipedia.

RegExp. MDN.

RegExp lookbehind assertions. Yang Guo. V8 JavaScript Engine. 2016-02-26.