Stam Kapetanakis edited this page Jul 16, 2024 · 23 revisions

Introduction

A regex (regular expression) is a string that describes a specific pattern of text. LiveCode supports a limited version of regex through a small set of built-in functions. This primer is not an in-depth tutorial; it covers the basics needed to use regex effectively in LiveCode.

Regex can appear impenetrable because of its terse, shorthand notation. For example, a post on Stack Overflow suggests the pattern below as a good regex for validating an email address against the RFC 5322 standard. It is eye-watering because it covers every conceivable permutation of a valid email address, including an IP address as the server part:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

This may seem overly complex, and it is tempting to think that a simpler validation could be written in LiveCodeScript. But consider even the simple rules: no two consecutive dots (..) in any part of the address, exactly one “@” symbol, only valid characters in the local and domain portions, and a valid top-level domain. Allowing for all the legal email format permutations rapidly becomes a complex task, and one too unwieldy to do well with standard LiveCodeScript.

Regex is more or less a universal format, which means that solutions for almost any search pattern are available online. There are a few different “flavours” of regex with a small number of differences between them, so a pattern found online may need modification to work with LiveCode, which implements the PCRE flavour.

Within LiveCode, a regex pattern is passed as a parameter to match and replace functions:

  • matchText() returns true if a match is present and can place capture groups into variables
  • replaceText() returns a string in which every match of a pattern has been replaced with a string literal
  • matchChunk() returns true if a match is present and can place the start and end offsets of capture groups into variables
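
As a rough illustration of the three functions listed above, here is a minimal sketch in LiveCode Script. The handler name demoRegexBasics, the variable names and the sample strings are all made up for this example. The email pattern is deliberately simplistic and is not the RFC 5322 pattern shown earlier.

```livecode
-- Hypothetical handler: call it from the message box or a button script.
command demoRegexBasics
   local tLocalPart, tDomain, tTidy, tText, tStart, tEnd, tReport

   -- matchText() returns true or false; the text matched by each
   -- parenthesised capture group is placed in the extra variables, in order.
   -- Deliberately naive pattern: anything, an @ sign, then anything.
   if matchText("jane@example.com", "^([^@]+)@(.+)$", tLocalPart, tDomain) then
      put "local part:" && tLocalPart & return after tReport
      put "domain:" && tDomain & return after tReport
   end if

   -- replaceText() returns a new string in which every match of the
   -- pattern is replaced by the literal replacement string.
   put replaceText("too    many   spaces", " +", " ") into tTidy
   put "tidied:" && tTidy & return after tReport

   -- matchChunk() also returns true or false, but fills the variables
   -- with the start and end character positions of each capture group.
   put "hello world" into tText
   if matchChunk(tText, "(wor)", tStart, tEnd) then
      put "matched chunk:" && char tStart to tEnd of tText after tReport
   end if

   -- With no destination, put shows the result in the message box.
   put tReport
end demoRegexBasics
```

Note that matchText() and matchChunk() report success as a boolean and hand back their details through the variables passed in, so they are normally used inside an if condition as above.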