# Regular expressions

## Materials & Resources

| Material                                                                              |        Time |
|:--------------------------------------------------------------------------------------|------------:|
| [RegexOne - Learn Regular Expressions](https://regexone.com/lesson/introduction_abcs) | interactive |

### Optional Materials & Resources

| Material                                                                           |    Time |
|:-----------------------------------------------------------------------------------|--------:|
| [CS50 - Regular Expressions Tutorial](https://www.youtube.com/watch?v=9cCaCwBKf5Y) | 1:38:35 |
| [Regular-Expressions.info](https://www.regular-expressions.info/)                  | reading |

## Material Review

- What are regular expressions?
  <!--
    A regular expression (regex or regexp for short) is a special text string
    for describing a search pattern.
  -->
- Are regular expressions case sensitive?
  <!--
    Yes, matching is case sensitive by default, but you can turn that off
    with the i flag.
  -->
- How can you search for specific text?
  <!--
    Literal text such as "apple" or "pear" is itself a valid regular
    expression that matches exactly that text.
  -->
- How can you search for digits from 0 to 4?
  <!--
    You can use [0-4].
  -->
- How can you search for certain characters?
  <!--
    You can use character sets like [abc].
  -->
- How can you search for any digits?
  <!--
    You can use the \d token. In Unicode-aware regular expression libraries
    it also matches digits from other scripts, so it can differ from [0-9].
  -->
- How can you search for non-digits?
  <!--
    You can use the \D token.
  -->
- How can you search for any character?
  <!--
    You can use the . token. By default it does not match a newline.
  -->
- How can you search for any alphanumeric character?
  <!--
    You can use the \w token, which also matches the underscore.
  -->
- How can you search for repetitions?
  <!--
      a? - for zero or one of a
      a* - for zero or more of a
      a+ - for one or more of a
      a{3} - exactly 3 of a
      a{3,} - 3 or more of a
      a{3,6} - between 3 and 6 of a
  -->
- How can you search for any whitespace?
  <!--
    You can use the \s token.
  -->
- How can you find out if the string starts or ends with a specific word?
  <!--
    You can use the start of string (^) anchor and the end of string ($)
    anchor.
  -->
- What are capture groups?
  <!--
    They capture the text matched by the regex inside them into a numbered group
    that can be reused with a numbered backreference.
  -->
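
Several of the tokens from the review can be tried directly in Python's `re` module. The snippet below is a minimal sketch; the sample strings are made up for illustration, and the `\d` behaviour shown is specific to Python `str` patterns.

```python
import re

# Quantifiers: a{3,6} needs between 3 and 6 consecutive "a" characters.
assert re.fullmatch(r"a{3,6}", "aaaa")
assert not re.fullmatch(r"a{3,6}", "aa")

# Matching is case sensitive by default; re.IGNORECASE is the i flag.
assert re.fullmatch(r"apple", "APPLE") is None
assert re.fullmatch(r"apple", "APPLE", re.IGNORECASE)

# In Python, \d on str patterns matches any Unicode decimal digit,
# so it is broader than [0-9] unless re.ASCII is set.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
assert re.fullmatch(r"\d", arabic_three)
assert re.fullmatch(r"[0-9]", arabic_three) is None
assert re.fullmatch(r"\d", arabic_three, re.ASCII) is None

# Anchors: ^ pins the match to the start, $ to the end of the string.
assert re.search(r"^green", "green fox")
assert re.search(r"fox$", "green fox")

# A capture group stores part of the match; \1 refers back to it.
repeated = re.search(r"(\w+) \1", "this is is a test")
print(repeated.group(1))  # prints: is
```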

## Workshop

In [1]:
import re

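def test(regex, table, cG=False):
    """Match every key of `table` against `regex` and compare with the expected values.

    With cG=True the expected value of a hit is ("y", matched_text).
    """
    output = {}
    p = re.compile(regex)
    for text in table:
        # match() anchors only at the start of the string, not at the end
        obj = p.match(text)
        if obj:
            if cG:
                # group() returns the whole match, not a numbered capture group
                output[text] = ("y", obj.group())
            else:
                output[text] = "y"
        else:
            output[text] = "n"
    # The first printed value is True when the results equal the expectations
    print(output == table, "\n", output, "\n", table)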

- Reserved admin

Create a regular expression that matches the following words:

- Admin
- admin

In [2]:
p = "[aA]dmin"  # no comma: inside [] a comma is a literal character
p = re.compile(p)

print(p.match("admin"))
print(p.match("Admin"))

<re.Match object; span=(0, 5), match='admin'>
<re.Match object; span=(0, 5), match='Admin'>
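
Two details of this task are easy to miss: a comma inside a character set is matched literally, and `match` does not check the end of the string. The sketch below illustrates both with `re.fullmatch`; the helper name `is_reserved_admin` is hypothetical and not part of the workshop code.

```python
import re

def is_reserved_admin(word):
    # [aA] lists the two allowed first letters; fullmatch also rejects
    # longer words such as "administrator".
    return re.fullmatch(r"[aA]dmin", word) is not None

print(is_reserved_admin("admin"), is_reserved_admin("Admin"))
print(is_reserved_admin("administrator"), is_reserved_admin(",dmin"))

# For comparison: a set written with a comma accepts the comma itself.
print(re.fullmatch(r"[a,A]dmin", ",dmin") is not None)
```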


- Numbers up to 100

Create a regular expression that matches the integers from 0 to 100,
inclusive.

| Task  | Text |
|:------|-----:|
| Match |    0 |
| Match |    9 |
| Match |   55 |
| Match |  100 |
| Skip  |  101 |
| Skip  |   -4 |

In [3]:
table = {"0":"y", "9":"y", "55":"y", "100":"y", "101":"n", "-4":"n"}

In [4]:
p = r"^(100|[1-9]?\d)$"  # 100 itself, or a one- or two-digit number
test(p, table)

True 
 {'0': 'y', '9': 'y', '55': 'y', '100': 'y', '101': 'n', '-4': 'n'} 
 {'0': 'y', '9': 'y', '55': 'y', '100': 'y', '101': 'n', '-4': 'n'}
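
A six-row table can miss edge cases such as 110 or 990, so a range pattern is worth checking exhaustively. The sketch below does that for one possible solution; the pattern and the tested interval are assumptions chosen for illustration.

```python
import re

# Assumed pattern: either the literal 100, or one or two digits
# where a two-digit number may not start with 0.
pattern = re.compile(r"(100|[1-9]?\d)")

# Collect every integer in a wider interval whose text form matches fully.
accepted = [n for n in range(-10, 1000) if pattern.fullmatch(str(n))]

print(accepted == list(range(0, 101)))
```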


- Hungarian mobile numbers

Create a regular expression that matches the valid
[Hungarian mobile numbers][1].

[1]: https://en.wikipedia.org/wiki/Telephone_numbers_in_Hungary

| Task  |              Text |
|:------|------------------:|
| Match |   +36 20 473 2746 |
| Match |   +36 30 217 4912 |
| Match | 00 36 70 381 1288 |
| Match | 00 36 31 471 2818 |
| Skip  |  +36 20 3173 4717 |
| Skip  |  +36 102 237 1121 |
| Skip  |   +49 20 483 1273 |
| Skip  |    36 70 381 2183 |

In [5]:
mobn = {"+36 20 473 2746": "y",
        "+36 30 217 4912": "y",
        "00 36 70 381 1288": "y",
        "00 36 31 471 2818": "y",
        "+36 20 3173 4717": "n",
        "+36 102 237 1121": "n",
        "+49 20 483 1273": "n",
        "36 70 381 2183": "n"}

s = r"(\+|00 )36 \d{2} \d{3} \d{4}$"  # raw string, two-digit provider code, anchored at the end

test(s, mobn)

True 
 {'+36 20 473 2746': 'y', '+36 30 217 4912': 'y', '00 36 70 381 1288': 'y', '00 36 31 471 2818': 'y', '+36 20 3173 4717': 'n', '+36 102 237 1121': 'n', '+49 20 483 1273': 'n', '36 70 381 2183': 'n'} 
 {'+36 20 473 2746': 'y', '+36 30 217 4912': 'y', '00 36 70 381 1288': 'y', '00 36 31 471 2818': 'y', '+36 20 3173 4717': 'n', '+36 102 237 1121': 'n', '+49 20 483 1273': 'n', '36 70 381 2183': 'n'}
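
Once a number is known to be valid, named groups make its parts easy to read back. This is a minimal sketch; the group names `prefix`, `provider` and `subscriber` are illustrative, and the pattern only checks the shape of the number, not whether the provider code is actually assigned.

```python
import re

# Named groups label each part of the number.
mobile = re.compile(
    r"(?P<prefix>\+|00 )36 (?P<provider>\d{2}) (?P<subscriber>\d{3} \d{4})"
)

m = mobile.fullmatch("00 36 70 381 1288")
print(m.group("provider"), m.group("subscriber"))

# A three-digit provider code does not fit the shape.
print(mobile.fullmatch("+36 102 237 1121"))
```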


- GFA email address

Create a regular expression that matches all Green Fox Academy email addresses.

| Task    |                         Text | Capture Groups |
|:--------|-----------------------------:|----------------|
| Capture |     john@greenfoxacademy.com | `john`         |
| Capture | jane.doe@greenfoxacademy.com | `jane.doe`     |
| Capture |        jane@greenfox.academy | `jane`         |
| Skip    |                john@wick.com |                |
| Skip    |            jane@citromail.hu |                |
| Skip    |      janegreenfoxacademy.com |                |

In [6]:
gfa = {"john@greenfoxacademy.com":("y", "john"), 
       "jane.doe@greenfoxacademy.com":("y","jane.doe"), 
       "jane@greenfox.academy":("y","jane"),
       "john@wick.com":"n",
       "jane@citromail.hu":"n",
       "janegreenfoxacademy.com":"n"}

s = r"[\w.]+(?=@(greenfoxacademy\.com|greenfox\.academy)$)"  # lookahead keeps the domain out of the match

test(s, gfa, cG=True)

True 
 {'john@greenfoxacademy.com': ('y', 'john'), 'jane.doe@greenfoxacademy.com': ('y', 'jane.doe'), 'jane@greenfox.academy': ('y', 'jane'), 'john@wick.com': 'n', 'jane@citromail.hu': 'n', 'janegreenfoxacademy.com': 'n'} 
 {'john@greenfoxacademy.com': ('y', 'john'), 'jane.doe@greenfoxacademy.com': ('y', 'jane.doe'), 'jane@greenfox.academy': ('y', 'jane'), 'john@wick.com': 'n', 'jane@citromail.hu': 'n', 'janegreenfoxacademy.com': 'n'}
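
The same task can also be solved with an actual capture group instead of a lookahead. The following is a sketch under the assumption that only the two domains from the table are valid; `local_part` is a hypothetical helper name.

```python
import re

# Group 1 captures the part before the @; (?:...) groups without capturing.
gfa_pattern = re.compile(
    r"([\w.]+)@(?:greenfoxacademy\.com|greenfox\.academy)"
)

def local_part(address):
    # fullmatch requires the whole address to fit the pattern.
    m = gfa_pattern.fullmatch(address)
    return m.group(1) if m else None

print(local_part("jane.doe@greenfoxacademy.com"))
print(local_part("jane@greenfox.academy"))
print(local_part("john@wick.com"))
```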


- Mobile numbers

Create a regular expression that matches mobile numbers from any country other
than Hungary.

In [7]:
s = r"(\+|00 )(?!36\b)\d[\d ]*\d$"  # (?!36\b) rejects the Hungarian country code

print(re.match(s, "00 34 189503 06261"))

<re.Match object; span=(0, 18), match='00 34 189503 06261'>
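
The key technique here is the negative lookahead: it rejects a prefix without consuming any text. The sketch below assumes an international number is a `+` or `00 ` prefix, a country code of one to three digits, and space-separated digit blocks; that format is an illustrative simplification.

```python
import re

# (?!36\b) fails the match when the country code is exactly 36.
non_hungarian = re.compile(r"(?:\+|00 )(?!36\b)\d{1,3}(?: \d+)+")

samples = [
    "+49 20 483 1273",
    "00 34 189503 06261",
    "+36 20 473 2746",
    "00 36 70 381 1288",
]
results = {s: non_hungarian.fullmatch(s) is not None for s in samples}
print(results)
```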


- Image source

Create a regular expression that matches the source of an
[HTML image element][2].

[2]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img

| Task    |                                                Text | Capture Groups        |
|:--------|----------------------------------------------------:|-----------------------|
| Capture |                               `<img src="dog.png">` | `dog.png`             |
| Capture | `<img alt="Cat picture" src="./images/cat-01.png">` | `./images/cat-01.png` |
| Skip    |                 `<script src="jquery.js"></script>` |                       |

In [8]:
table = {'<img src="dog.png">':('y','dog.png'), 
         '<img alt="Cat picture" src="./images/cat-01.png">':('y','./images/cat-01.png'),
         '<script src="jquery.js"></script>':'n'}

pattern = r'<img\b[^>]*\bsrc="(?P<img>[^"]+)"'  # stay inside the <img ...> tag, capture the quoted src

output = {}
p = re.compile(pattern)
for text in table:
    obj = p.search(text)
    if obj:
        output[text] = ("y", obj.group("img"))
    else:
        output[text] = "n"

print(output == table, "\n", output, "\n", table)

True 
 {'<img src="dog.png">': ('y', 'dog.png'), '<img alt="Cat picture" src="./images/cat-01.png">': ('y', './images/cat-01.png'), '<script src="jquery.js"></script>': 'n'} 
 {'<img src="dog.png">': ('y', 'dog.png'), '<img alt="Cat picture" src="./images/cat-01.png">': ('y', './images/cat-01.png'), '<script src="jquery.js"></script>': 'n'}
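
With a capture group in place, `findall` can collect every image source from a longer snippet. This is only a sketch for simple, double-quoted attributes; the `html` string is made up, and a regular expression is not a full HTML parser.

```python
import re

# One capture group, so findall returns just the captured src values.
img_src = re.compile(r'<img\b[^>]*?\bsrc="([^"]+)"')

html = (
    '<p><img src="dog.png">'
    '<script src="jquery.js"></script>'
    '<img alt="Cat picture" src="./images/cat-01.png"></p>'
)

sources = img_src.findall(html)
print(sources)
```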


- Solve the [problems on RegexOne](https://regexone.com/problem/matching_decimal_numbers)
- Practice more on [HackerRank](https://www.hackerrank.com/domains/regex)