# Flags

Today's Python regular expression lesson is on regex flags. These optional flags modify how a pattern is matched, for example by making the match case-insensitive or by changing what `.`, `^`, and `$` match.

In [1]:
import re

Each flag covered here is accessible through two names: a full name and a one-letter abbreviation.
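
As a quick check of the two-name idea, the sketch below compares a full name with its abbreviation and uses both in the same call. The sample string is made up for illustration.

```python
import re

# The short name refers to the same flag as the full name.
same_flag = re.I == re.IGNORECASE

# Either spelling can be passed to the flags argument.
short_result = re.findall("py", "Py PY py", flags=re.I)
full_result = re.findall("py", "Py PY py", flags=re.IGNORECASE)

print(same_flag)
print(short_result == full_result)
```

Which spelling to use is a matter of taste; the full names tend to be easier to read in shared code.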

Below is a table of the main flags:

| Full name    | Short name | Description                                                                   |
|--------------|------------|-------------------------------------------------------------------------------|
| `ASCII`      | `A`        | Makes `\w`, `\b`, `\d`, `\s` (and their negations) match ASCII only           |
| `DOTALL`     | `S`        | Makes `.` match any character, including newline                              |
| `IGNORECASE` | `I`        | Makes matches case-insensitive                                                |
| `LOCALE`     | `L`        | Makes `\w`, `\b`, and case-insensitive matching locale-dependent (bytes only) |
| `MULTILINE`  | `M`        | Makes `^` and `$` also match at the start and end of each line                |
| `VERBOSE`    | `X`        | Allows whitespace and comments in the pattern for readability                 |

We will be discussing `DOTALL`, `IGNORECASE`, `MULTILINE`, and `VERBOSE` in detail.
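
The `ASCII` flag is not covered in detail in this lesson, but a short sketch may help make its table entry concrete. The sample text is an assumption chosen to contain non-ASCII letters.

```python
import re

text = "caf\u00e9 na\u00efve 42"  # "café naïve 42", written with escapes

# By default, \w matches Unicode word characters in string patterns.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```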

## `re.DOTALL`

By default, `.` matches any character except the newline character (`\n`). The **DOTALL** flag makes `.` match every character, including `\n`. This is particularly useful when a match needs to span several lines.

In [2]:
print(re.findall("P.+", "Python\nProgramming"))
print(re.findall("P.+", "Python\nProgramming", flags=re.DOTALL))
print(re.findall("P.+", "Python\nProgramming", flags=re.S))

['Python', 'Programming']
['Python\nProgramming']
['Python\nProgramming']
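
A typical use is capturing a block of text that spans several lines. The sketch below uses a made-up HTML-like snippet; it is only an illustration, not a recommended way to parse HTML.

```python
import re

html = "<p>first line\nsecond line</p>"  # hypothetical sample text

# Without DOTALL, .*? cannot cross the newline, so nothing matches.
without_flag = re.search(r"<p>(.*?)</p>", html)

# With DOTALL, .*? can cross the newline and reach the closing tag.
with_flag = re.search(r"<p>(.*?)</p>", html, flags=re.DOTALL)

print(without_flag)
print(repr(with_flag.group(1)))
```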


## `re.IGNORECASE`

By default, regex matches are case-sensitive, so `code`, `CODE`, and `cOdE` are treated as three different strings.

However, in some situations, we do not care about whether a character is in upper case or lower case. In such a scenario, we can use the `re.IGNORECASE` flag to turn on case-insensitive mode.

In [3]:
print(re.findall("code", "code CODE cOdE CoDe"))
print(re.findall("code", "code CODE cOdE CoDe", flags=re.IGNORECASE))

['code']
['code', 'CODE', 'cOdE', 'CoDe']


In [4]:
print(re.findall("[a-z]+", "FutureProgrammer"))
print(re.findall("[a-z]+", "FutureProgrammer", flags=re.I))

print(re.findall("[A-Z]+", "FutureProgrammer"))
print(re.findall("[A-Z]+", "FutureProgrammer", flags=re.I))

['uture', 'rogrammer']
['FutureProgrammer']
['F', 'P']
['FutureProgrammer']
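
Case-insensitive matching is handy when checking user input. The helper below, `is_yes`, is a hypothetical name introduced only for this sketch.

```python
import re

def is_yes(answer):
    """Return True if the answer is 'y' or 'yes' in any letter case."""
    return re.fullmatch(r"y(es)?", answer, flags=re.IGNORECASE) is not None

print(is_yes("YES"))
print(is_yes("y"))
print(is_yes("yesterday"))
```

`re.fullmatch` is used here so that longer words that merely start with "yes" are rejected.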


## `re.MULTILINE`

By default, the anchor `^` matches only at the beginning of the **string**, and `$` matches only at the end of the string (or just before a newline that ends the string).

When the `re.MULTILINE` flag is used, `^` also matches at the beginning of each **line** (immediately after each newline), and `$` also matches at the end of each line (immediately before each newline).

In [5]:
print(re.findall(r"^\w+", "Regular\nExpressions"))
print(re.findall(r"^\w+", "Regular\nExpressions", flags=re.MULTILINE))

['Regular']
['Regular', 'Expressions']


In [6]:
print(re.findall(r"\w+$", "Regular\nExpressions"))
print(re.findall(r"\w+$", "Regular\nExpressions", flags=re.M))

['Expressions']
['Regular', 'Expressions']
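
The two anchors are often used together to match whole lines. The sketch below assumes a small sample text with a trailing newline, which also shows how `$` treats a newline at the very end of the string.

```python
import re

text = "alpha\nbeta\n"  # note the trailing newline

# Without MULTILINE, ^ only matches at index 0 and $ only near the end,
# so no single word can satisfy both anchors.
whole_lines_default = re.findall(r"^\w+$", text)

# With MULTILINE, every line is matched on its own.
whole_lines_multi = re.findall(r"^\w+$", text, flags=re.MULTILINE)

# Even without MULTILINE, $ matches just before a final newline.
last_word_default = re.findall(r"\w+$", text)

print(whole_lines_default)
print(whole_lines_multi)
print(last_word_default)
```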


## `re.VERBOSE`

Using the `re.VERBOSE` flag allows regular expression patterns to be formatted in more human-readable ways.

Some regular expressions are very complicated and hard to read. Here is how **verbose** mode makes patterns easier to read:
* **Whitespace is ignored** unless it is in a character class or escaped with a backslash
* **Comments are allowed**: a `#` starts a comment that runs to the end of the line, unless the `#` is in a character class or escaped with a backslash

In [7]:
pattern = re.compile(r"""
[a-z]{3}  # 3 letters
-
\d{4}  # 4 digits
""", flags=re.VERBOSE)
print(pattern.search("abc-1234"))

<re.Match object; span=(0, 8), match='abc-1234'>
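
The exceptions in the two rules matter when a pattern needs a literal space or a literal `#`. The following is a minimal sketch with a made-up input format ("number, space, hash, word").

```python
import re

pattern = re.compile(r"""
    \d+     # one or more digits
    [ ]     # a literal space, kept because it is inside a character class
    \#      # a literal hash sign, kept because it is escaped
    \w+     # one or more word characters
""", flags=re.VERBOSE)

match = pattern.fullmatch("42 #tag")
no_space = pattern.fullmatch("42#tag")

print(match)
print(no_space)
```

The indentation inside the triple-quoted string is ignored like any other whitespace in verbose mode.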


## Combining flags

What if we want to compile a regular expression pattern with more than one flag? Say, what if we want to write a regex in verbose mode, while ignoring whether letters are in upper case or lower case?

Multiple regex flags can be combined using the bitwise OR operator, `|`.

In [8]:
pattern = re.compile(r"""
[a-z]{3}  # 3 letters
-
\d{4}  # 4 digits
""", flags=re.VERBOSE | re.IGNORECASE)
print(pattern.search("AbC-1234"))

<re.Match object; span=(0, 8), match='AbC-1234'>
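
The same operator works with any pair of flags and with functions such as `re.findall`. The sketch below combines `re.IGNORECASE` and `re.MULTILINE` on a made-up log text.

```python
import re

log = "Error: disk full\ninfo: all good\nERROR: timeout"  # hypothetical log text

# Combine the two flags with bitwise OR.
flags = re.IGNORECASE | re.MULTILINE

# ^ and $ work per line, and "error" matches in any letter case.
error_lines = re.findall(r"^error.*$", log, flags=flags)
print(error_lines)

# The combined value can be tested for an individual flag with bitwise AND.
has_ignorecase = bool(flags & re.IGNORECASE)
has_dotall = bool(flags & re.DOTALL)
print(has_ignorecase, has_dotall)
```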


## Summary

That is all for this lesson on regex flags. You have now seen how `DOTALL`, `IGNORECASE`, `MULTILINE`, and `VERBOSE` change the way a pattern is matched or written, and how to combine several flags with `|`.