### Regular expressions

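- re.compile(): creates a compiled regex object that can be reused
- re.search(): finds the first occurrence of a pattern anywhere in a string
- re.match(): checks whether the pattern matches at the beginning of the string (use re.fullmatch() to require that the entire string conforms to the pattern)
- re.findall(): returns all non-overlapping matches in the string as a list, not just the first match
- match.group(): returns the matched text; it is a method of the match object, not a function of the re module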
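
A minimal sketch of how these functions differ on the same input. The sample string and variable names are illustrative assumptions, not part of the original notebook, and re.fullmatch() is included only for comparison with re.match().

```python
import re

text = "cat bat cat"  # illustrative sample string

# search() scans the whole string and returns the first match anywhere.
found = re.search(r"bat", text)

# match() only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"bat", text)   # "bat" is not at the start
prefix = re.match(r"cat", text)     # "cat" is at the start

# fullmatch() requires the entire string to conform to the pattern.
whole_fail = re.fullmatch(r"cat", text)
whole_ok = re.fullmatch(r"[a-z ]+", text)

# findall() returns every non-overlapping match as a list of strings.
all_cats = re.findall(r"cat", text)

print(found.group())     # bat
print(at_start)          # None
print(prefix.group())    # cat
print(whole_fail)        # None
print(whole_ok.group())  # cat bat cat
print(all_cats)          # ['cat', 'cat']
```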

- Searching with Regex
match = re.search(pattern,string)

### Pattern types (Character Classes)
\w: any single word character (for ASCII text, [a-zA-Z0-9_]: a letter, digit, or underscore)
\d: any numeric digit (for ASCII text, [0-9])
\s: any whitespace character (space, newline, tab)
\D: any character that is NOT a numeric digit
\W: any character that is NOT a letter, digit, or underscore
\S: any character that is NOT whitespace (space, tab, or newline)
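
The character classes above can be sketched against one sample string. The string and variable names here are made up for illustration.

```python
import re

sample = "Room 42, floor_3\t(east wing)"  # illustrative sample string

digits = re.findall(r"\d", sample)    # single digits
words = re.findall(r"\w+", sample)    # runs of word characters
spaces = re.findall(r"\s", sample)    # whitespace characters
non_word = re.findall(r"\W", sample)  # everything that is not \w

print(digits)    # ['4', '2', '3']
print(words)     # ['Room', '42', 'floor_3', 'east', 'wing']
print(spaces)    # [' ', ' ', '\t', ' ']
print(non_word)  # [' ', ',', ' ', '\t', '(', ' ', ')']
```

Note that each class matches a single character; it is the + quantifier in \w+ that turns it into a run of characters.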

In [None]:
### Repetition and Special Characters
+ : 1 or more
* : 0 or more
? : 0 or 1
{k}: exactly k occurrences
{m,n}: m to n occurrences, inclusive
. : matches any character except the newline (\n)
^ : start of the string
$ : end of the string
\ : escape character
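
A short sketch of the counted quantifiers, the dot, the anchors, and the escape character in action. The input strings are assumptions chosen for illustration.

```python
import re

# {k} matches exactly k repetitions; {m,n} matches between m and n.
exact = re.search(r"a{3}", "caaaat").group()
ranged = re.search(r"a{2,3}", "caaaat").group()

# . matches any character except a newline, so "c\nt" is skipped.
dot = re.findall(r"c.t", "cat cot c\nt cut")

# ^ and $ anchor the pattern to the start and end of the string.
anchored = re.search(r"^\d+$", "2024") is not None
not_anchored = re.search(r"^\d+$", "year 2024") is not None

# A backslash escapes a special character so it is matched literally.
literal_dots = re.findall(r"\.", "a.b.c")

print(exact)         # aaa
print(ranged)        # aaa
print(dot)           # ['cat', 'cot', 'cut']
print(anchored)      # True
print(not_anchored)  # False
print(literal_dots)  # ['.', '.']
```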

In [85]:
import re
example = "Welcome to the world of Python"
pattern = r'Python'
match = re.search(pattern,example)

print(match)
if match:
    print("found", match.group())
else:
    print("No match found")


<_sre.SRE_Match object; span=(24, 30), match='Python'>
found Python


In [99]:
message = 'my number is 123-4567'
# Create a regex object that defines the pattern we are looking for;
# the leading area-code group (\d\d\d-)? is optional
myregex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
# Search the string for the first occurrence of the pattern
match = myregex.search(message)
# group() returns the text that actually matched
print(match.group())


123-4567


In [116]:
### If the string contains multiple phone numbers, use findall

message = 'my number is 510-123-4567 and my office number is 510-555-6677'
# Create a regex object that defines the pattern we are looking for
myregex = re.compile(r'(\d\d\d-)?(\d\d\d-\d\d\d\d)')
# Find every occurrence of the pattern. Because the pattern contains two
# groups, findall returns a list of tuples, one tuple of group values per match
print(myregex.findall(message))


[('510-', '123-4567'), ('510-', '555-6677')]
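
When the full phone numbers are wanted rather than tuples of groups, two options are sketched below: a non-capturing group (?:...) so that findall returns whole matches, or finditer with group(). The \d{3} shorthand and the variable names are assumptions introduced for this example.

```python
import re

message = 'my number is 510-123-4567 and my office number is 510-555-6677'

# (?:...) groups without capturing, so findall returns the whole match.
whole = re.compile(r'(?:\d{3}-)?\d{3}-\d{4}')
numbers = whole.findall(message)

# finditer yields match objects; group() is the full matched text
# even when the pattern contains capturing groups.
grouped = re.compile(r'(\d{3}-)?(\d{3}-\d{4})')
full = [m.group() for m in grouped.finditer(message)]

print(numbers)  # ['510-123-4567', '510-555-6677']
print(full)     # ['510-123-4567', '510-555-6677']
```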


#### group()
Let's use groups to separate the area code from the phone number. Parentheses have a special meaning in a regex: they mark where a group starts and where it ends.

In [90]:
message = 'my number is 510-123-4567'
# Group 1 captures the area code, group 2 captures the rest of the number
myregex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
# Search the string for the first occurrence of the pattern
match = myregex.search(message)
match

<_sre.SRE_Match object; span=(13, 25), match='510-123-4567'>

In [91]:
match.group()

'510-123-4567'

In [92]:
match.group(1)

'510'

In [93]:
match.group(2)

'123-4567'
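
Beyond group(1) and group(2), a match object also offers groups() to get every group at once, and groups can be named with (?P<name>...). The sketch below uses the group names area and number, which are assumptions made for this example.

```python
import re

# Same pattern as above, with hypothetical group names added.
named = re.compile(r'(?P<area>\d{3})-(?P<number>\d{3}-\d{4})')
match = named.search('my number is 510-123-4567')

all_groups = match.groups()         # tuple of every captured group
area = match.group('area')          # access a group by name
number = match.group('number')
by_index = match.group(1, 2)        # several groups in one call

print(all_groups)  # ('510', '123-4567')
print(area)        # 510
print(number)      # 123-4567
print(by_index)    # ('510', '123-4567')
```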

- To match literal parentheses in a string, escape them with a backslash: \( and \)

In [112]:
message = "my number is 123-4567"

In [113]:
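# This pattern requires a literal "(ddd)-" prefix. The string below has no
# parenthesized area code, so search() returns None
myregex = re.compile(r'\(\d\d\d\)-(\d\d\d-\d\d\d\d)')
match = myregex.search('My number is 123-4567')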

In [114]:
match.group()

AttributeError: 'NoneType' object has no attribute 'group'
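
The error above arises because search() returned None: the string has no parenthesized area code. A sketch of the same idea on a string that does contain one follows. The input '(510)-123-4567' and the extra capturing group around the area code are assumptions added for illustration.

```python
import re

# \( and \) match literal parentheses; the inner (...) captures the digits.
strict = re.compile(r'\((\d\d\d)\)-(\d\d\d-\d\d\d\d)')

with_parens = strict.search('My number is (510)-123-4567')
without_parens = strict.search('My number is 123-4567')

print(with_parens.group())   # (510)-123-4567
print(with_parens.group(1))  # 510
print(without_parens)        # None
```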

In [78]:
# Make the parentheses, the area code, and the separator optional with ?
myregex = re.compile(r'\(?(\d\d\d)?\)?-?(\d\d\d-\d\d\d\d)')

In [83]:
match = myregex.search('My number is 123-4567')

In [84]:
match.group()

'123-4567'

- The pipe character (|) matches one of several alternatives

In [103]:
lang = re.compile(r'Pyt(hon|con|mon)')
match = lang.search('Pytmon is a wonderful language')
match.group()


'Pytmon'
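
To see which alternative matched, group(1) can be inspected. The sketch below also shows alternation without parentheses, where | splits the whole pattern; the second pattern and its input string are assumptions for illustration.

```python
import re

lang = re.compile(r'Pyt(hon|con|mon)')
match = lang.search('Pytmon is a wonderful language')

whole = match.group()    # the full matched text
suffix = match.group(1)  # the alternative that matched inside the group

# Without parentheses, | separates complete alternatives.
loose = re.compile(r'Python|Pytcon')
loose_match = loose.search('I attended Pytcon').group()

print(whole)        # Pytmon
print(suffix)       # mon
print(loose_match)  # Pytcon
```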

- If the regular expression cannot find the pattern, search() returns None. To verify that:


In [104]:
match = lang.search('Pytut is a wonderful language')


In [105]:
match is None

True
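
Since calling group() on None raises AttributeError, it is common to check the result first. A minimal sketch, assuming a hypothetical helper named matched_text that is not part of the re module:

```python
import re
from typing import Optional

lang = re.compile(r'Pyt(hon|con|mon)')

def matched_text(text: str) -> Optional[str]:
    """Hypothetical helper: return the matched text, or None if no match."""
    match = lang.search(text)
    if match is None:
        return None
    return match.group()

print(matched_text('Pytut is a wonderful language'))   # None
print(matched_text('Pytmon is a wonderful language'))  # Pytmon
```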

#### “?” zero or one time


In [106]:
myexpr = re.compile(r'Pyt(ho)?n')

In [109]:
match = myexpr.search('Pytn a wonderful language')

In [110]:
match.group()

'Pytn'

In [47]:
# (ho)? allows at most one "ho", so 'Pythohon' does not match and search() returns None
match = myexpr.search('Pythohon a wonderful language')
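
A sketch comparing zero, one, and two occurrences of the optional group. The 'Python' input is an assumption added for contrast with the two strings used above.

```python
import re

myexpr = re.compile(r'Pyt(ho)?n')

zero = myexpr.search('Pytn a wonderful language')      # group absent
one = myexpr.search('Python a wonderful language')     # group present once
two = myexpr.search('Pythohon a wonderful language')   # group twice: no match

print(zero.group(), zero.group(1))  # Pytn None
print(one.group(), one.group(1))    # Python ho
print(two)                          # None
```

When an optional group does not take part in the match, group(1) is None even though the overall match succeeds.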

In [None]:
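pattern = r'^M?M?M?$'        # zero to three M characters and nothing else
re.search(pattern, 'M')     # matches
re.search(pattern, 'MM')    # matches
re.search(pattern, 'MMM')   # matches
re.search(pattern, 'MMMM')  # None: more than three M characters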
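
The four searches above can be checked in one pass. This sketch also adds the empty string as a test input and compares against the {0,3} form of the pattern; both are assumptions introduced here for illustration.

```python
import re

pattern = r'^M?M?M?$'
candidates = ['', 'M', 'MM', 'MMM', 'MMMM']

# True where the pattern matches, False where search() returns None.
results = {s: re.search(pattern, s) is not None for s in candidates}

# The same constraint written with a counted quantifier.
counted = {s: re.search(r'^M{0,3}$', s) is not None for s in candidates}

print(results)
# {'': True, 'M': True, 'MM': True, 'MMM': True, 'MMMM': False}
print(results == counted)  # True
```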

#### Example: making the area code of a phone number optional

In [None]:
message1 = "My phone number is 123-4567"
message2 = "My phone number is 201-123-4567"
message3 = "My phone number is (201)-123-4567"


In [55]:
myphone = re.compile(r'\(?(\d\d\d)?(\))?-?\d\d\d-\d\d\d\d')

In [61]:
match = myphone.search("My phone number is 123-4567")

In [62]:
match.group()

'123-4567'
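
A sketch that runs the same pattern over all three messages defined above, collecting the full match and the captured area code for each.

```python
import re

myphone = re.compile(r'\(?(\d\d\d)?(\))?-?\d\d\d-\d\d\d\d')

messages = [
    "My phone number is 123-4567",
    "My phone number is 201-123-4567",
    "My phone number is (201)-123-4567",
]

matches = [myphone.search(m) for m in messages]
numbers = [m.group() for m in matches]      # full matched text
area_codes = [m.group(1) for m in matches]  # None when no area code

print(numbers)     # ['123-4567', '201-123-4567', '(201)-123-4567']
print(area_codes)  # [None, '201', '201']
```

Because every parenthesis is optional on its own, this pattern also accepts unbalanced input such as '(201-123-4567'.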

#### “*” zero or more times

In [63]:
myexpr = re.compile(r'Pyth(on)*')

In [66]:
match = myexpr.search("Welcome to the world of Pyth")
match.group()

'Pyth'

#### “+” one or more times


In [68]:
myexpr = re.compile(r'Pyth(on)+')

In [69]:
match = myexpr.search("Welcome to the world of Pyth")
match.group()

AttributeError: 'NoneType' object has no attribute 'group'

In [70]:
match = myexpr.search("Welcome to the world of Python")
match.group()

'Python'

In [71]:
match = myexpr.search("Welcome to the world of Pythononon")
match.group()

'Pythononon'
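
The three quantifiers can be compared side by side. This sketch uses re.fullmatch() instead of search() so that trailing text cannot hide a difference; that choice and the input list are assumptions made for the comparison.

```python
import re

patterns = {
    '?': r'Pyth(on)?',  # zero or one "on"
    '*': r'Pyth(on)*',  # zero or more "on"
    '+': r'Pyth(on)+',  # one or more "on"
}
inputs = ['Pyth', 'Python', 'Pythonon']

# For each quantifier, record whether each input matches in full.
table = {
    q: [re.fullmatch(p, s) is not None for s in inputs]
    for q, p in patterns.items()
}

for q, row in table.items():
    print(q, row)
# ? [True, True, False]
# * [True, True, True]
# + [False, True, True]
```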

- To match a specific number of repetitions, use {k}


In [None]:
myregex = re.compile(r'(Re){3}')

In [None]:
match = myregex.search("My matching string is ReReRe")

In [None]:
match.group()

- Range of repetitions


In [None]:
myregex = re.compile(r'(Re){3,5}')
match = myregex.search("My matching string is ReReReRe")
match.group()
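
A sketch of what the counted forms return. The inputs with two and six repetitions are assumptions added to show the lower and upper bounds.

```python
import re

exactly_three = re.compile(r'(Re){3}')
three_to_five = re.compile(r'(Re){3,5}')

three = exactly_three.search("My matching string is ReReRe").group()
too_few = exactly_three.search("My matching string is ReRe")
four = three_to_five.search("My matching string is ReReReRe").group()
capped = three_to_five.search("ReReReReReRe").group()  # six repetitions in input

print(three)    # ReReRe
print(too_few)  # None
print(four)     # ReReReRe
print(capped)   # ReReReReRe
```

With six repetitions in the input, the match stops after five because that is the upper bound of the range.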

- Regular expressions in Python are greedy by default, i.e. they try to match the longest possible string


In [None]:
mydigit = re.compile(r'(\d){3,5}')

In [None]:
match = mydigit.search('123456789')
match.group()

In [None]:
# A trailing ? makes the quantifier non-greedy: it matches the shortest possible string
mydigit = re.compile(r'(\d){3,5}?')
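
A sketch of the greedy and non-greedy forms on the same input. The HTML-like string in the second half is an assumption added to show the same effect with .+ and .+?.

```python
import re

greedy = re.compile(r'(\d){3,5}')
lazy = re.compile(r'(\d){3,5}?')

longest = greedy.search('123456789').group()  # takes as many digits as allowed
shortest = lazy.search('123456789').group()   # takes as few digits as allowed

# The same distinction with + and +?
tag_greedy = re.search(r'<.+>', '<b>bold</b>').group()
tag_lazy = re.search(r'<.+?>', '<b>bold</b>').group()

print(longest)     # 12345
print(shortest)    # 123
print(tag_greedy)  # <b>bold</b>
print(tag_lazy)    # <b>
```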