# Python Tutorial - Part 4
This tutorial is based on [Udemy Python Course](https://www.udemy.com/course/python-core-and-advanced)

*Section 20: Regular expressions*

## Regular expressions

Commonly used functions in Python's `re` module:
- match
- search
- findall
- split
- sub

### Sequence characters

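```
\d  any digit
\D  any non-digit character
\s  any whitespace character
\S  any non-whitespace character
\w  any word character (letter, digit, or underscore)
\W  any non-word character
\b  word boundary (the empty position between a word character and a non-word character)
\A  matches only at the start of the string
\Z  matches only at the end of the string
```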
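As a quick illustration of the table above, here is a minimal sketch that applies several sequence characters to one string. The sample sentence and variable names are made up for this example and are not part of the course material.

```python
import re

# Hypothetical sample text, chosen only to exercise the sequence characters.
sample = "Room 42, floor_3 bends and ends"

digits = re.findall(r'\d', sample)         # ['4', '2', '3']
words = re.findall(r'\w+', sample)         # ['Room', '42', 'floor_3', 'bends', 'and', 'ends']
spaces = re.findall(r'\s', sample)         # five single spaces

# Without \b, 'end' is also found inside 'bends'.
unbounded = re.findall(r'end\w*', sample)  # ['ends', 'ends']
# \b requires the match to start at a word boundary.
bounded = re.findall(r'\bend\w*', sample)  # ['ends']

# \A and \Z anchor to the very start and very end of the string.
starts_with_room = re.search(r'\ARoom', sample) is not None  # True
ends_with_ends = re.search(r'ends\Z', sample) is not None    # True

print(digits, words, len(spaces), unbounded, bounded)
```

Note that `\w` treats the underscore as a word character, which is why `floor_3` comes back as a single match.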

Regex patterns in this section are built from three kinds of elements:

1. Sequence characters
2. Quantifiers
3. Special characters

### Search

In [14]:
import re
str="Take up oNe one idea. One idea at a time. Of the wonderful world"
result1=re.search(r'o\w', str)
result2=re.search(r'o\w\w\w\w', str)
result3=re.search(r'o\w\w', str)
print("result1: "+result1.group())
print("result2: "+result2.group())
print("result3: "+result3.group())

result1: oN
result2: onder
result3: oNe
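The cell above calls `.group()` directly, which works because every pattern matches. When there is no match, `re.search` returns `None`, so calling `.group()` would raise an `AttributeError`. A minimal sketch of guarding against that, using a shortened version of the sentence and illustrative variable names:

```python
import re

text = "Take up oNe one idea."

# No 'z' appears in the text, so search returns None.
miss = re.search(r'z\w', text)
print(miss)  # None

# Check the result before using it.
hit = re.search(r'o\w\w', text)
if hit is not None:
    # group() is the matched text; span() gives its (start, end) positions.
    print(hit.group(), hit.span())  # oNe (8, 11)
```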


### FindAll and Match
```
findall: returns all non-overlapping matches as a list of strings; returns an empty list if nothing matches
match: tries the regex only at the beginning of the string and returns a match object; returns None if it does not match there
```

In [2]:
import re
str="Take up oNe one idea. one idea at a time. Of the wonderful world"
result1=re.search(r'o\w\w\w', str)
result2=re.findall(r'o\w\w', str)
result3=re.search(r'o\w\w', str)
result4=re.match(r'T\w\w', str)

# search
print(result1.group())
# findall does not need .group()
print(result2)
# search
print(result3.group())
# match
print(result4.group())

onde
['oNe', 'one', 'one', 'ond', 'orl']
oNe
Tak
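To make the difference between the three functions concrete, the following sketch runs a pattern that occurs in the string but not at its start, plus a pattern that does not occur at all. The variable names are illustrative only.

```python
import re

text = "Take up oNe one idea."

# match only looks at the beginning of the string.
at_start = re.match(r'T\w\w', text)      # match object for 'Tak'
not_at_start = re.match(r'o\w\w', text)  # None: the text does not start with 'o'

# search scans forward and returns the first match anywhere.
anywhere = re.search(r'o\w\w', text)     # match object for 'oNe'

# findall returns a list, which is empty when nothing matches.
no_hits = re.findall(r'z\w\w', text)     # []

print(at_start.group(), not_at_start, anywhere.group(), no_hits)
```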


### Split

In [35]:
import re
str="Take 1 up oNe 2 one idea 3 one idea at a time 4 Of the wonderful world"
result=re.split(r'\d+', str)
print(result)

['Take ', ' up oNe ', ' one idea ', ' one idea at a time ', ' Of the wonderful world']
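The pieces above keep the spaces that surrounded each number. Two variations that may be useful, sketched here with a shortened string and illustrative variable names: absorbing the whitespace into the separator, and wrapping the separator in a capturing group so it is kept in the result.

```python
import re

text = "Take 1 up oNe 2 one idea"

# Let the separator swallow the surrounding whitespace.
trimmed = re.split(r'\s*\d+\s*', text)
print(trimmed)  # ['Take', 'up oNe', 'one idea']

# A capturing group makes split return the separators as well.
with_separators = re.split(r'(\d+)', text)
print(with_separators)  # ['Take ', '1', ' up oNe ', '2', ' one idea']
```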


### Substitute

In [37]:
import re
str="Take up oNe one idea. one idea at a time. Of the wonderful world"
result=re.sub(r'one','two', str)
print(result)

Take up oNe two idea. two idea at a time. Of the wonderful world
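Notice that `oNe` was left untouched because matching is case-sensitive by default. The sketch below shows one way to handle that with the `re.IGNORECASE` flag, and uses `re.subn` to also report how many replacements were made. The strings and variable names are illustrative.

```python
import re

text = "Take up oNe one idea. one idea at a time."

# flags=re.IGNORECASE makes 'one' also match 'oNe'.
# subn returns a (new_string, number_of_replacements) tuple.
replaced, count = re.subn(r'one', 'two', text, flags=re.IGNORECASE)
print(replaced)  # Take up two two idea. two idea at a time.
print(count)     # 3

# \b on both sides restricts the replacement to the whole word 'one'.
whole_word = re.sub(r'\bone\b', 'two', "one bone")
print(whole_word)  # two bone
```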


### Quantifiers
Quantifiers specify how many times the preceding element may repeat.
```
+      one or more repetitions
          e.g: \d+ one or more digits
*      zero or more repetitions
?      zero or one repetitions
{m}    exactly m number of occurrences
{m,n}  m is min number of occurrences, n is the maximum
```

In [8]:
import re
str="Take up One idea. One idea at a time. Of the wonderful world Only"

# The labels are ordinary strings, so the backslash is doubled to print a literal \w.
print("\nre.findall(r'O\\w+', str)")
result=re.findall(r'O\w+', str)
print(result)
print("\nre.findall(r'o\\w+', str)")
result=re.findall(r'o\w+', str)
print(result)
print("\nre.findall(r'o\\w*', str)")
result=re.findall(r'o\w*', str)
print(result)
print("\nre.findall(r'o\\w?', str)")
result=re.findall(r'o\w?', str)
print(result)
print("\nre.findall(r'o\\w{3}', str)")
result=re.findall(r'o\w{3}', str)
print(result)
print("\nre.findall(r'O\\w{2}', str)")
result=re.findall(r'O\w{2}', str)
print(result)
print("\nre.findall(r'O\\w{1,2}', str)")
result=re.findall(r'O\w{1,2}', str)
print(result)
print("\nre.findall(r'O\\w{1,4}', str)")
result=re.findall(r'O\w{1,4}', str)
print(result)
print(result)


re.findall(r'O\w+', str)
['One', 'One', 'Of', 'Only']

re.findall(r'o\w+', str)
['onderful', 'orld']

re.findall(r'o\w*', str)
['onderful', 'orld']

re.findall(r'o\w?', str)
['on', 'or']

re.findall(r'o\w{3}', str)
['onde', 'orld']

re.findall(r'O\w{2}', str)
['One', 'One', 'Onl']

re.findall(r'O\w{1,2}', str)
['One', 'One', 'Of', 'Onl']

re.findall(r'O\w{1,4}', str)
['One', 'One', 'Of', 'Only']
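In the output above, `o\w+` and `o\w*` happen to return the same list because every lowercase `o` in that sentence is followed by more word characters. The difference shows up when an `o` ends a word. A small sketch with a made-up string:

```python
import re

text = "go to onward"

# + needs at least one word character after 'o', so the 'o' ending 'go' and 'to' is skipped.
one_or_more = re.findall(r'o\w+', text)
print(one_or_more)   # ['onward']

# * also accepts zero word characters, so a lone 'o' matches.
zero_or_more = re.findall(r'o\w*', text)
print(zero_or_more)  # ['o', 'o', 'onward']

# ? accepts at most one word character after 'o'.
zero_or_one = re.findall(r'o\w?', text)
print(zero_or_one)   # ['o', 'o', 'on']
```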


### Matching dates
```python
import re

str="22-09-2019"
# The function name is all lowercase: findall, not findAll.
re.findall(r'\d{1,2}-\d{1,2}-\d{1,4}',str)  # ['22-09-2019']
```

In [68]:
import re

str="Take up One 20-09-2019 idea. One idea at a time. 12-11-2019 Of Only"
result=re.findall(r'\d{1,2}-\d{1,2}-\d{1,4}',str)
print(result)

['20-09-2019', '12-11-2019']
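The pattern above only checks the shape of the text, so it would also accept something like `99-99-2019`. The sketch below is one possible way to tighten it: capturing groups pull out the parts, and `datetime.strptime` rejects impossible dates. It assumes the dates are written day-month-year, which the source does not state, and the names `date_pattern` and `valid_dates` are made up for this example.

```python
import re
from datetime import datetime

text = "Take up One 20-09-2019 idea. 12-11-2019 Of Only 99-99-2019"

# Three capturing groups; \b keeps the match from starting or ending inside a longer digit run.
date_pattern = r'\b(\d{2})-(\d{2})-(\d{4})\b'

# With groups in the pattern, findall returns one tuple of groups per match.
parts = re.findall(date_pattern, text)
print(parts)  # [('20', '09', '2019'), ('12', '11', '2019'), ('99', '99', '2019')]

# Assumption: day-month-year order. strptime raises ValueError for impossible dates.
valid_dates = []
for match in re.finditer(date_pattern, text):
    try:
        valid_dates.append(datetime.strptime(match.group(), "%d-%m-%Y").date())
    except ValueError:
        pass  # shaped like a date, but not a real one

print(valid_dates)  # [datetime.date(2019, 9, 20), datetime.date(2019, 11, 12)]
```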


### Special characters
```
\      escapes a special character so that it is matched literally
.      matches any character except a newline
^      matches at the beginning of the string
$      matches at the end of the string (the opposite of ^)
[..]   character set; [a-z] matches any single character from a to z
[^..]  negated set; [^0-6] matches any single character except 0 to 6
(..)   groups a sub-expression and captures what it matches
(A|B)  matches either regex A or regex B
```
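The entries in this table that the cells in this section do not otherwise exercise (sets with ranges, negated sets, escaping, and alternation) can be tried out as follows. This is only a sketch; the sample string and variable names are invented for illustration.

```python
import re

text = "cat bat rat 3.14 a+b 2019-09-20"

# . matches any single character (except a newline).
any_at = re.findall(r'.at', text)            # ['cat', 'bat', 'rat']

# [cb] matches one 'c' or 'b'; [^cb] matches one character that is neither.
cb_at = re.findall(r'[cb]at', text)          # ['cat', 'bat']
not_cb_at = re.findall(r'[^cb]at', text)     # ['rat']

# Ranges use a hyphen inside the brackets.
high_digits = re.findall(r'[^0-6]', "0123456789")  # ['7', '8', '9']

# A backslash turns the special characters . and + into literals.
decimal = re.findall(r'\d\.\d+', text)       # ['3.14']
plus_expr = re.findall(r'a\+b', text)        # ['a+b']

# (A|B) matches either alternative.
cat_or_rat = re.findall(r'(cat|rat)', text)  # ['cat', 'rat']

print(any_at, cb_at, not_cb_at, high_digits, decimal, plus_expr, cat_or_rat)
```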

In [24]:
import re

str="Take up One 20-09-2019 idea. One idea at a time. 12-11-2019 Of Only TammaT"
# ^ anchors the match at the beginning of the string
result=re.search(r'^T\w',str)
print(result.group())
result=re.search(r'^T\w*',str)
print(result.group())
# [ab] matches a single 'a' or 'b'; \w* then takes the rest of the word
result=re.findall(r'[ab]\w*',str)
print(result)

Ta
Take
['ake', 'a', 'a', 'at', 'a', 'ammaT']


In [35]:
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print(x.group())

x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())

x = re.split(r"\s", txt, maxsplit=1)
print("Split string at the first occurrence of white space:", x)

x = re.sub(r"\s", "_", txt, count=2)
print("Replace the first two occurrences:", x)

x = re.search(r"\bS\w+", txt)
print(x.span())

x = re.search(r"\sS\w+", txt)
print(x.span())

The rain in Spain
The first white-space character is located in position: 3
Split string at the first occurrence of white space: ['The', 'rain in Spain']
Replace the first two occurrences: The_rain_in Spain
(12, 17)
(11, 17)
