# Regular expressions (regex)

It can sometimes be useful to make comparisons based on substrings of column values. For example, a sensible approach to comparing postcodes is to consider their constituent components (e.g. area, district, etc.) by extracting them as substrings of the full postcode (see the postcode comparison template for more details).

The `regex_extract` option allows users to do just this by defining a regular expression pattern. The substring matching the pattern is extracted from each column value, and the comparison is evaluated on the extracted substrings. It is available for string comparisons and levels, including exact match (see examples below).
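
As a rough illustration of the idea, the sketch below uses Python's built-in `re` module rather than Splink itself. The pattern `^[A-Z]{1,2}` for the postcode area and the helper `extract_area` are assumptions made for this example. Splink evaluates the pattern in the SQL backend, so its exact behaviour may differ.

```python
import re

# Assumed pattern: one or two leading capital letters (the postcode area)
AREA_PATTERN = re.compile(r"^[A-Z]{1,2}")


def extract_area(postcode):
    """Return the leading letters of a postcode, or None if there are none."""
    match = AREA_PATTERN.search(postcode)
    return match.group(0) if match else None


left, right = "SW1A 1AA", "SW1P 3BT"

# Compare the full values, then compare only the extracted areas
print(left == right)
print(extract_area(left), extract_area(right))
print(extract_area(left) == extract_area(right))
```

The two full postcodes differ, but an exact match on the extracted area treats them as agreeing.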

Further regex functionality is provided by the `valid_string_regex` option, a feature of null comparison levels. It allows users to provide a regular expression pattern defining a valid string format. Any value that does not match the pattern is treated as null (see example below).
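
The effect of treating badly formatted values as null can be sketched in plain Python. The postcode pattern and the helper `null_if_invalid` below are assumptions for illustration only and are not part of Splink.

```python
import re

# Assumed pattern for a well-formed UK postcode such as "SW1A 1AA"
VALID_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}")


def null_if_invalid(value):
    """Return value unchanged if it matches the valid format, otherwise None."""
    if value is None or VALID_POSTCODE.fullmatch(value) is None:
        return None
    return value


# Show which raw values survive and which are treated as null
for raw in ["SW1A 1AA", "M1 1AE", "unknown", ""]:
    print(repr(raw), "->", repr(null_if_invalid(raw)))
```

Values mapped to `None` here correspond to those that would fall into the null level of the comparison.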

## Examples of using `regex_extract`

An example of using `regex_extract` with an exact match comparison on a postcode column. Here the area part of the postcode is extracted and the exact match is then evaluated on the extracted values. Printing the comparison's `human_readable_description` shows the comparison that is built.

In [None]:
import splink.duckdb.duckdb_comparison_library as cl

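# Illustrative pattern (an assumption): one or two leading letters, i.e. the postcode area
pc_comparison = cl.exact_match("postcode", regex_extract="^[A-Z]{1,2}")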
print(pc_comparison.human_readable_description)

An example of using `regex_extract` with a Jaro-Winkler comparison on a name column.

In [None]:
import splink.duckdb.duckdb_comparison_library as cl

# Jaro-Winkler comparison evaluated on the part of name matched by regex_extract
name_comparison = cl.jaro_winkler_at_thresholds("name", regex_extract="")
print(name_comparison.human_readable_description)

An example of using `regex_extract` with a Levenshtein comparison on a name column.

In [None]:
import splink.duckdb.duckdb_comparison_library as cl

# Levenshtein comparison evaluated on the part of name matched by regex_extract
name_comparison = cl.levenshtein_at_thresholds("name", regex_extract="")
print(name_comparison.human_readable_description)
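
To see why extraction can matter for a fuzzy comparison, the following sketch computes an edit distance before and after extracting a substring, using only the Python standard library. The pattern (the first run of letters in the name) and both helper functions are hypothetical and are not taken from Splink.

```python
import re


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


# Hypothetical pattern: the first run of letters, i.e. the first name token
FIRST_TOKEN = re.compile(r"^[A-Za-z]+")


def first_token(name):
    """Return the first run of letters in name, or an empty string."""
    match = FIRST_TOKEN.search(name)
    return match.group(0) if match else ""


left, right = "Jonathan Smith", "Jonathon Smyth-Jones"

# Distance on the full strings, then on the extracted first tokens only
print(levenshtein(left, right))
print(levenshtein(first_token(left), first_token(right)))
```

The distance on the extracted tokens depends only on the part of the value the pattern selects, so differences elsewhere in the string no longer affect which threshold is met.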

The postcode comparison template provides an example of a comparison that uses the `regex_extract` option in multiple exact match levels to build a comparison with levels of increasing “looseness”.
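
The idea of levels of increasing looseness can be sketched with the standard library alone. The level names and patterns below are assumptions chosen for illustration and may not match those used by the template.

```python
import re

# Assumed patterns, ordered from strictest to loosest
LEVELS = [
    ("full postcode", r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$"),
    ("sector", r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9]"),
    ("district", r"^[A-Z]{1,2}[0-9][A-Z0-9]?"),
    ("area", r"^[A-Z]{1,2}"),
]


def extract(pattern, value):
    """Return the substring of value matched by pattern, or None."""
    match = re.search(pattern, value)
    return match.group(0) if match else None


def first_matching_level(left, right):
    """Return the name of the strictest level at which both extracts agree."""
    for name, pattern in LEVELS:
        left_part, right_part = extract(pattern, left), extract(pattern, right)
        if left_part is not None and left_part == right_part:
            return name
    return "else"


for other in ["SW1A 1AA", "SW1A 1AB", "SW1A 2AA", "SW1P 3BT", "M1 1AE"]:
    print(other, "->", first_matching_level("SW1A 1AA", other))
```

Each pair is assigned to the first level whose extracted substrings agree, which mirrors how comparison levels are evaluated in order.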

## Example of using `valid_string_regex`

A simple comparison including a null level featuring the `valid_string_regex` option. Here an exact match on the postcode column is being performed with the...

In [None]:
import splink.duckdb.duckdb_comparison_library as cl

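# Assumes exact_match accepts valid_string_regex and passes it to the null level;
# the pattern defining a valid postcode is left unspecified here
pc_comparison = cl.exact_match("postcode", valid_string_regex="")
print(pc_comparison.human_readable_description)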