# Regex Cheatsheet
*Quick Reference*

## Anchors

|Anchors                |Description                                |
|-----------------------|-------------------------------------------|
| \A                    | match start of string                     |
| \Z                    | match end of string                       |
| ^                     | match start of line (needs re.MULTILINE)  |
| $                     | match end of line (needs re.MULTILINE)    |
| \b                    | start/end of words                        |
| \B                    | inverse of \b                             |
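
The difference between the string anchors and the line anchors is easiest to see on a multi-line string. The snippet below is a minimal sketch using Python's standard `re` module; the sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line\n"

# ^ and $ only act per line when re.MULTILINE is set.
start_default = re.findall(r"^\w+", text)                  # ['first']
start_multi = re.findall(r"^\w+", text, re.MULTILINE)      # ['first', 'second']
end_multi = re.findall(r"\w+$", text, re.MULTILINE)        # ['line', 'line']

# \A and \Z always refer to the whole string, whatever the flags.
a_multi = re.findall(r"\A\w+", text, re.MULTILINE)         # ['first']
z_match = re.search(r"line\Z", text)                       # None: text ends with "\n"
dollar_match = re.search(r"line$", text)                   # matches before the final "\n"

# \b is a word boundary, \B is any position that is not one.
whole_words = re.findall(r"\bline\b", "line inline line")  # ['line', 'line']
inner = re.findall(r"\Bline", "line inline line")          # ['line'] (inside "inline")

print(start_default, start_multi, end_multi, a_multi)
print(z_match, dollar_match.group(), whole_words, inner)
```

Note that without `re.MULTILINE`, `$` still matches just before a trailing newline, while `\Z` only matches at the very end of the string.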

## Groups

*Note: the ellipsis (...) is a placeholder for any pattern*

| Group         | Description           | Consumes Characters? |
|---------------|-----------------------|----------------------|
| (?:...)       | non-capturing group   |          ✔           |
| (?P<name>...) | named capturing group |          ✔           |
| (?=...)       | positive lookahead    |          ✘           |
| (?!...)       | negative lookahead    |          ✘           |
| (?<=...)      | positive lookbehind   |          ✘           |
| (?<!...)      | negative lookbehind   |          ✘           |
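
To make the "consumes characters" column concrete, here is a small sketch with Python's standard `re` module. The sample strings and variable names are invented for illustration; the point is that lookarounds check their context without including it in the match.

```python
import re

# Named capturing group: the matched text is available by name.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2021-07")
year, month = m.group("year"), m.group("month")       # '2021', '07'

# Non-capturing group: groups the alternation; with no capturing groups,
# findall returns the whole match.
units = re.findall(r"\d+(?:px|em)", "10px 2em 5pt")   # ['10px', '2em']

# Positive lookahead: "px" must follow, but it is not part of the match.
px_values = re.findall(r"\d+(?=px)", "10px 2em 5pt")  # ['10']

# Negative lookahead: the number must not be followed by a digit or "px".
not_px = re.findall(r"\d+(?!\d|px)", "10px 2em 5pt")  # ['2', '5']

# Positive / negative lookbehind: check what precedes the match.
dollars = re.findall(r"(?<=\$)\d+", "cost $30 or 40 EUR")        # ['30']
no_dollar = re.findall(r"(?<!\$)\b\d+", "cost $30 or 40 EUR")    # ['40']

print(year, month, units, px_values, not_px, dollars, no_dollar)
```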


## Character Classes

| Class             | Description                                                  |
|-------------------|--------------------------------------------------------------|
| [ABC]             | Match any character in the set                               |
| [^ABC]            | Match any character not in the set                           |
| [A-Z]             | Match any character in the range A to Z                      |
| .                 | Match any character except \n (unless re.DOTALL is set)      |
| \w                | Match word chars. [A-Za-z0-9_] under re.ASCII, else Unicode  |
| \W                | Negated \w. [^A-Za-z0-9_] under re.ASCII                     |
| \d                | Match digits. [0-9] under re.ASCII, else Unicode digits      |
| \D                | Negated \d. [^0-9] under re.ASCII                            |
| \s                | Match whitespace such as space, \t, \n, \r                   |
| [\uxxx-\uxxy]     | Match a character in range (see below)                       |
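
A few of these classes have pitfalls worth checking by hand. The following is a minimal sketch with Python's standard `re` module on made-up sample strings: a range like `[A-z]` is wider than it looks, and `\w`, `\d` and `.` depend on the flags in use.

```python
import re

# [A-z] spans code points 65-122, which includes [ \ ] ^ _ ` between Z and a.
loose = re.findall(r"[A-z]", "a_Z[^]")       # every character matches
strict = re.findall(r"[A-Za-z]", "a_Z[^]")   # ['a', 'Z']

# \w and \d are Unicode-aware for str patterns unless re.ASCII is given.
words_unicode = re.findall(r"\w+", "naïve café")           # ['naïve', 'café']
words_ascii = re.findall(r"\w+", "naïve café", re.ASCII)   # ['na', 've', 'caf']
digits_unicode = re.findall(r"\d", "7\u0663")              # '7' and ARABIC-INDIC DIGIT THREE
digits_ascii = re.findall(r"\d", "7\u0663", re.ASCII)      # ['7']

# . skips "\n" only; "\r" is matched. re.DOTALL makes it match "\n" as well.
dot_default = re.findall(r".", "a\r\nb")                   # ['a', '\r', 'b']
dot_all = re.findall(r".", "a\r\nb", re.DOTALL)            # ['a', '\r', '\n', 'b']

print(loose, strict, words_unicode, words_ascii)
print(digits_unicode, digits_ascii, dot_default, dot_all)
```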

### Import regex as re

[regex](https://pypi.org/project/regex/) is a third-party library that offers more advanced functionality than the standard `re` module.
It's mostly a drop-in replacement, so it's common to see:
```python
import regex as re
```
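
Because the two modules share most of their API, code that sticks to the common subset runs under either one. The sketch below assumes `regex` may or may not be installed and falls back to the standard library; the pattern and input string are invented for illustration.

```python
try:
    import regex as re  # third-party package: pip install regex
except ImportError:
    import re           # standard library fallback

# Only features available in both modules are used here.
pattern = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")
pairs = {m.group("key"): int(m.group("value"))
         for m in pattern.finditer("width=80 height=24")}
print(pairs)  # {'width': 80, 'height': 24}
```

The fallback only works for patterns that avoid `regex`-specific syntax such as `\p{...}`.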

Using [regex](https://pypi.org/project/regex/), we can also match on [Unicode Categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category)
and [Unicode Blocks](https://www.regular-expressions.info/unicode.html#bodytext:~:text=The%20Unicode%20standard%20divides%20the%20Unicode,future%20expansion%20of%20the%20Unicode%20standard.), neither of which the standard `re` module supports.
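
As a small illustration of a Unicode category, `\p{Lu}` matches any character whose General Category is "Letter, uppercase". The helper name `uppercase_letters` and the sample text are hypothetical, and the `unicodedata` fallback is only there so the sketch also runs where `regex` is not installed.

```python
import unicodedata

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None


def uppercase_letters(s):
    """Return the characters of s whose General Category is Lu."""
    if regex is not None:
        return regex.findall(r"\p{Lu}", s)
    # Fallback: look up the same category with the standard library.
    return [ch for ch in s if unicodedata.category(ch) == "Lu"]


print(uppercase_letters("Ünïcode Ωmega 42"))  # ['Ü', 'Ω']
```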


```python
import regex as re

# Test string: every printable character from U+0020 upwards
chars = "".join(chr(i) for i in range(32, 0x10ffff) if chr(i).isprintable())

# \p{InBasicLatin} matches characters in the Basic Latin block
result = re.findall(r"\p{InBasicLatin}", chars)
print(result[::8])  # print every 8th match
```

```
[' ', '(', '0', '8', '@', 'H', 'P', 'X', '`', 'h', 'p', 'x']
```
