# RegEx Basics

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.
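
As a minimal sketch of that containment check, `re.search` returns a match object when the pattern occurs anywhere in the string and `None` otherwise. The `contains` helper and the sample sentence are invented here for illustration.

```python
import re

def contains(pattern, text):
    """Return True if `pattern` matches anywhere in `text` (hypothetical helper)."""
    return re.search(pattern, text) is not None

print(contains(r"rain", "The rain in Spain"))  # True
print(contains(r"snow", "The rain in Spain"))  # False
```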

A Python-based tutorial can be found here:
<br>https://www.w3schools.com/python/python_regex.asp

This personal guide expands on lesser-known topics that come in handy, with examples.

## Metacharacters
Metacharacters are characters with a special meaning:

|Character|Description|Example|
|------|------|------|
| [ ] | A set of characters | "[a-m]" |
| \	| Signals a special sequence (can also be used to escape special characters) | "\d"|
| .	| Any character (except newline character) | "he..o" |
| ^	| Starts with | "^hello"|
| \$	| Ends with | "world$"|
| *	| Zero or more occurrences | "aix*"|
| +	| One or more occurrences | "aix+"|
| {}	| Exactly the specified number of occurrences | "al{2}"|
| \|	| Either or	|"falls\|stays"|
| () | Capture and group | "(very )+" |
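
Several rows of the table can be checked directly with `re.search`. This is only a sketch: the patterns come from the table, while the sample texts are invented for illustration.

```python
import re

# Each entry: (pattern, text that should match, text that should not).
# The sample texts are illustrative assumptions, not real data.
cases = [
    (r"[a-m]",       "hello",       "zzz"),          # set of characters
    (r"he..o",       "hello",       "heo"),          # any two characters
    (r"^hello",      "hello world", "say hello"),    # starts with
    (r"world$",      "hello world", "world peace"),  # ends with
    (r"al{2}",       "ball",        "bale"),         # exactly two "l"
    (r"falls|stays", "it stays",    "it rains"),     # either/or
]

for pattern, good, bad in cases:
    assert re.search(pattern, good) is not None
    assert re.search(pattern, bad) is None
```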

In [1]:
import re

In [2]:
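def regExMatch(txt, expression):
    """Print how `expression` matches `txt`, based on what re.findall returns."""
    # findall returns full matches, or group text when the pattern has groups
    x = re.findall(expression, txt)
    if len(x) == 1 and len(x[0]) == len(txt):
        # a single result as long as the input: the whole string matched
        print("Yes, it's an absolute match")
        print(x)
    elif len(x) == 1 and isinstance(x[0], tuple):
        # a single tuple: the pattern has several capture groups
        print("Yes, it's an absolute match with repeating characters")
        print(x)
    elif len(x) == 1:
        # a single result shorter than the input
        print("Yes, it's a partial match")
        print(x)
    elif x:
        print("Yes, it has multiple matches")
        print(x)
    else:
        print("No match")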
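
The branches above depend on the shape of what `re.findall` returns: full matches when the pattern has no capture group, the group's text when it has one, and tuples when it has several. A small sketch, using an invented sample string:

```python
import re

text = "x=1, y=22"

# No capture group: each item is the full matched text.
no_group = re.findall(r"\w=\d+", text)
# One capture group: each item is only that group's text.
one_group = re.findall(r"\w=(\d+)", text)
# Two capture groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\w)=(\d+)", text)

print(no_group)    # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```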

In [3]:
txt_list = ["!0212!",
            "Hi nice to meet you!",
            "email@gmail.com",
            "wow"]
for txt in txt_list:
    regExMatch(txt, "[^a-zA-Z0-9_]")

Yes, it has multiple matches
['!', '!']
Yes, it has multiple matches
[' ', ' ', ' ', ' ', '!']
Yes, it has multiple matches
['@', '.']
No match


In [4]:
txt_list = ["!0212!",
            "Hi nice to meet you!",
            "email@gmail.com",
            "wow"]
for txt in txt_list:
    regExMatch(txt, "[^a-zA-Z]+")

Yes, it's an absolute match
['!0212!']
Yes, it has multiple matches
[' ', ' ', ' ', ' ', '!']
Yes, it has multiple matches
['@', '.']
No match


In [5]:
txt_list = ["aba", "abababa", "aabbaa", "aabababa"]
for txt in txt_list:
    regExMatch(txt, "a(ab)*a")

No match
No match
Yes, it has multiple matches
['', '']
Yes, it's a partial match
['ab']
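
The last result looks partial only because `re.findall` returns the text of the capture group rather than the whole match. One way to see the whole match, sketched below, is to make the group non-capturing with `(?:...)` or to test with `re.fullmatch`.

```python
import re

txt = "aabababa"

# Capturing group: findall reports the last repetition of the group.
print(re.findall(r"a(ab)*a", txt))    # ['ab']
# Non-capturing group: findall reports the whole match.
print(re.findall(r"a(?:ab)*a", txt))  # ['aabababa']
# fullmatch succeeds only if the entire string fits the pattern.
print(re.fullmatch(r"a(ab)*a", txt) is not None)  # True
```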


In [6]:
txt = "abc"
regExMatch(txt, "ab+c")

Yes, it's an absolute match
['abc']


In [7]:
txt_list = ["azbcbc",
            "a bd",
            "a2bbbb"]
for txt in txt_list:
    regExMatch(txt, "a.[bc]+")

Yes, it's an absolute match
['azbcbc']
Yes, it's a partial match
['a b']
Yes, it's an absolute match
['a2bbbb']


In [8]:
txt = "abcxyz"
regExMatch(txt, "a.[bc]+")
regExMatch(txt, "abc|xyz")

Yes, it's a partial match
['abc']
Yes, it has multiple matches
['abc', 'xyz']


In [9]:
txt = "Butt="
regExMatch(txt, "[a-zA-Z]*[^,]=")

Yes, it's an absolute match
['Butt=']


In [10]:
txt_list = ["very fat monster", 
            "fat tall monster", 
            "very fat fat tall monster", 
            "very very fat ugly monster"]
for txt in txt_list:
    regExMatch(txt, "(very )+(fat )?(tall|ugly) monster")

No match
No match
No match
Yes, it's an absolute match with repeating characters
[('very ', 'fat ', 'ugly')]
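
The quantifiers explain the three failures: `+` requires at least one "very ", `?` allows "fat " at most once, and one of "tall" or "ugly" must be present. A short sketch with `re.fullmatch`; the first sample string is invented for illustration:

```python
import re

pattern = r"(very )+(fat )?(tall|ugly) monster"

# "fat " is optional, so it may be missing entirely;
# a group that took no part in the match is reported as None.
m = re.fullmatch(pattern, "very very ugly monster")
print(m.groups())  # ('very ', None, 'ugly')

# "?" allows "fat " at most once, so a doubled "fat " fails.
print(re.fullmatch(pattern, "very fat fat tall monster"))  # None
```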


In [11]:
txt_list = ["an xml tag>", 
            "<opentag><closetag>"]
for txt in txt_list:
    regExMatch(txt, "<[^>]+>")

No match
Yes, it has multiple matches
['<opentag>', '<closetag>']
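
The negated set `[^>]` is what keeps each tag separate. For comparison, here is a sketch of the same search with a greedy `.+`, which runs on to the last `>` in the string:

```python
import re

txt = "<opentag><closetag>"

# Negated set: stops at the first ">" so each tag is matched separately.
print(re.findall(r"<[^>]+>", txt))  # ['<opentag>', '<closetag>']
# Greedy dot: runs to the last ">" and swallows both tags in one match.
print(re.findall(r"<.+>", txt))     # ['<opentag><closetag>']
```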


In [12]:
txt_list = ["foo=99andfoo",
            "foo=99and99",
            "99=99and99"]
for txt in txt_list:
    regExMatch(txt, r"(\w+)=(\d+)and\1")

Yes, it's an absolute match with repeating characters
[('foo', '99')]
No match
Yes, it's an absolute match with repeating characters
[('99', '99')]
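
Backreferences depend on how Python reads the string literal. In a normal (non-raw) literal, `"\1"` is an escape for the single character with code 1, so the regex engine never sees a backreference. A raw string such as `r"\1"` passes the backslash through unchanged. A brief sketch, with sample strings invented for illustration:

```python
import re

# In a normal string literal "\1" is one control character (code 1);
# in a raw string it is a backslash followed by the digit 1.
print(len("\1"), len(r"\1"))  # 1 2

# With a raw string, \1 refers back to whatever group 1 captured.
pattern = r"(\w+)=(\d+)and\1"
print(re.findall(pattern, "id=42andid"))  # [('id', '42')]
print(re.findall(pattern, "id=42andxx"))  # []
```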


In [13]:
txt_list = ["11wally99",
            "&roger123",
            "yoyoma 23456",
            "mello drama"]
for txt in txt_list:
    regExMatch(txt, r"\w+\s*\d+")

Yes, it's an absolute match
['11wally99']
Yes, it's a partial match
['roger123']
Yes, it's an absolute match
['yoyoma 23456']
No match


In [14]:
txt_list = ["11wally99",
            "&roger123",
            "yoyoma 23456",
            "mello drama"]
for txt in txt_list:
    regExMatch(txt, r"\w+\s*\D+")

Yes, it's a partial match
['11wally']
Yes, it's a partial match
['roger']
Yes, it's a partial match
['yoyoma ']
Yes, it's an absolute match
['mello drama']
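
The last two cells differ only in `\d` (a digit) versus `\D` (any character that is not a digit). The same lower-case/upper-case convention applies to `\w`/`\W` and `\s`/`\S`. A sketch with an invented sample string:

```python
import re

txt = "room 42, floor 7"

print(re.findall(r"\d+", txt))  # ['42', '7']
print(re.findall(r"\D+", txt))  # ['room ', ', floor ']
print(re.findall(r"\w+", txt))  # ['room', '42', 'floor', '7']
print(re.findall(r"\s", txt))   # [' ', ' ', ' ']
```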
