# Regular Expression Patterns

- Regular expressions are a compact way to search, match, extract, and replace text by pattern.
- They are a domain-specific language (DSL) that is available in most programming languages; the examples below use Python's `re` module.

| character | description           | example pattern code | example match                  |
| --------- | --------------------- | -------------------- | ------------------------------ |
| \d        | a digit               | file\_\d\d           | file_25                        |
| \w        | a word character      | \w\w\w\w\w           | hello                          |
| \s        | whitespace character  | \w\s\w\s\w           | a b c                          |
| .         | any char but newline  | a.b.c                | aabbc                          |
| \D        | a non-digit           | \D\D\D\D             | abcd                           |
| \W        | a non-word character  | \W\W\W\W\W           | !@#$%                          |
| \S        | a non-whitespace char | \S\S\S\S             | abcd                           |
| ^         | start of string       | ^\w\w\w              | abc                            |
| $         | end of string         | \w\w\w$              | abc                            |
| \b        | word boundary         | \b\w\w\w\b           | abc                            |
| \B        | non-word boundary     | \B\w\w\w\B           | abc in xabcx                   |
| [abc]     | any of a, b, or c     | [abc][abc]           | aa or bb or cc                 |
| [^abc]    | not a, b, or c        | [^abc][^abc]         | dd or ee or ff                 |
| [a-z]     | characters a to z     | [a-z][a-z]           | aa or bb or cc                 |
| [^a-z]    | not characters a to z | [^a-z][^a-z]         | AA or BB or CC                 |
| [0-9]     | numbers 0 to 9        | [0-9][0-9]           | 11 or 22 or 33                 |
| [^0-9]    | not numbers 0 to 9    | [^0-9][^0-9]         | AA or BB or CC                 |
| (abc)     | capture group         | (\d\d)               | 12                             |
| (?:abc)   | non-capture group     | (?:\d\d)             | 12                             |
| a\|b      | a or b                | (a\|b)(a\|b)         | aa or ab or ba or bb           |
| a?        | zero or one of a      | \d\d?                | 12 or 1                        |
| a\*       | zero or more of a     | \d\d\*               | 12 or 123                      |
| a+        | one or more of a      | \d\d+                | 12 or 123                      |
| a{3}      | exactly 3 of a        | \d{3}                | 123                            |
| a{3,}     | 3 or more of a        | \d{3,}               | 123 or 1234                    |
| a{3,6}    | between 3 and 6 of a  | \d{3,6}              | 123 or 1234 or 12345 or 123456 |
| (?=abc)   | positive lookahead    | \w(?=o)              | g in good                      |
| (?!abc)   | negative lookahead    | \w(?!o)              | g in glad                      |
| (?<=abc)  | positive lookbehind   | (?<=o)\w             | l in gold                      |
| (?<!abc)  | negative lookbehind   | (?<!o)\w             | g in good                      |

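The boundary and lookaround rows are the easiest ones to misread, so here is a minimal sketch that checks them with `re.findall`. The sample strings (`good`, `gold`, `xabcx`, and so on) and the variable names are chosen only for illustration.

```python
import re

# Lookarounds test the surrounding text without including it in the match.
followed_by_o = re.findall(r'\w(?=o)', 'good')       # ['g', 'o']
not_followed_by_o = re.findall(r'\w(?!o)', 'gold')   # ['o', 'l', 'd']
preceded_by_o = re.findall(r'(?<=o)\w', 'gold')      # ['l']
not_preceded_by_o = re.findall(r'(?<!o)\w', 'good')  # ['g', 'o']

# \b matches at the edge of a word, \B matches everywhere else.
whole_words = re.findall(r'\b\w\w\w\b', 'abc abcd xyz')  # ['abc', 'xyz']
inside_word = re.findall(r'\B\w\w\w\B', 'xabcx')         # ['abc']

print(followed_by_o, not_followed_by_o, preceded_by_o, not_preceded_by_o)
print(whole_words, inside_word)
```

Note that `\B\w\w\w\B` finds nothing in a standalone three-letter word such as `abc`, because both ends of that word are word boundaries.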

In [None]:
import re
text = "my phone number is 123-456-7890"
phone = re.search(r'\d{3}-\d{3}-\d{4}', text)
print(phone)
print(phone.group())

In [None]:
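# compile the pattern once so it can be reused across searches
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

results = phone_pattern.search(text)
print(results.group())   # the whole match
print(results.group(1))  # first capture group (first three digits)
print(results.group(2))  # second capture group (middle three digits)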
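
Compiling pays off when the same pattern is applied to several strings. The sketch below assumes a small list of hypothetical messages (they are not part of the original notebook) and guards against `search` returning `None` when nothing matches.

```python
import re

phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

# Hypothetical sample inputs, for illustration only.
messages = [
    "call 123-456-7890 today",
    "no number here",
    "fax: 555-010-9999",
]

found = []
for message in messages:
    match = phone_pattern.search(message)
    if match is not None:              # search() returns None on no match
        found.append(match.groups())   # tuple of all capture groups

print(found)
# [('123', '456', '7890'), ('555', '010', '9999')]
```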

In [None]:
cat_text = "The cat in the hat sat flat"
print(re.search(r'cat|dog', cat_text))
print(re.findall(r'cat|dog', cat_text))
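
`re.search` stops at the first match, while `re.findall` returns every non-overlapping match as a list of strings. As a small sketch on the same sentence, the wildcard `.` and the word boundary `\b` from the table can be combined to pick out the words ending in "at"; the variable names are illustrative.

```python
import re

cat_text = "The cat in the hat sat flat"

first = re.search(r'cat|dog', cat_text)
print(first.span())  # (4, 7): start and end index of the first match

# Any single character followed by "at", anywhere in the text.
loose = re.findall(r'.at', cat_text)
print(loose)         # ['cat', 'hat', 'sat', 'lat']

# Same idea, but restricted to whole three-letter words.
strict = re.findall(r'\b\wat\b', cat_text)
print(strict)        # ['cat', 'hat', 'sat']
```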



In [None]:
phrase = "there are 3 numbers 34 inside 5 this sentence"
print(re.findall(r'[^\d]+', phrase)) # runs of non-digit characters (same as \D+)


In [None]:
test_phrase = "This is a string! But it has punctuation. How can we remove it?"
print(re.findall(r'[^!.? ]+', test_phrase)) # keep the words, dropping punctuation and spaces
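
The `findall` call above returns a list of words. To get the sentence back as a single string without punctuation, one option is `re.sub`, which replaces every match with a given string. This is a sketch that strips only the three characters `!`, `.` and `?`, not all punctuation.

```python
import re

test_phrase = "This is a string! But it has punctuation. How can we remove it?"

# Replace each listed punctuation character with an empty string.
cleaned = re.sub(r'[!.?]', '', test_phrase)
print(cleaned)
# This is a string But it has punctuation How can we remove it

# Joining the findall() words with spaces gives the same text here.
words = re.findall(r'[^!.? ]+', test_phrase)
print(' '.join(words) == cleaned)  # True
```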

In [None]:
text = 'only find the hyphen-words in this sentence. But you do not know how long-ish they are'
print(re.findall(r'\w+-\w+', text)) # find hyphenated words