## Regular Expressions

### <font color ='brown'> Notes </font>

#### Format: match = re.search(pat, str) 

+ a, X, 9, < -- ordinary characters just match themselves exactly. 
+ The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) 
+ . (a period) -- matches any single character except newline '\n'
+ \w -- (lowercase w) matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_]. 
+ \b -- boundary between word and non-word
+ \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]. 
+ \S (upper case S) matches any non-whitespace character.
+ \t, \n, \r -- tab, newline, return
+ \d -- decimal digit [0-9] 

+ ^ = start, $ = end -- match the start or end of the string
+ \ -- inhibit the "specialness" of a character. 
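
The `\b`, `\S`, and backslash-escape entries in the list above are not exercised elsewhere in these notes. A minimal sketch of each follows; the sample strings are made up for illustration.

```python
import re

# \b marks a word boundary, so 'cat' only matches as a whole word,
# not the 'cat' inside 'concatenate'.
whole_word = re.search(r'\bcat\b', 'concatenate the cat files')
print(whole_word.start())

# \S+ grabs a run of non-whitespace characters, skipping the leading spaces.
first_token = re.search(r'\S+', '   hello world')
print(first_token.group())

# An unescaped '.' matches any character; '\.' matches only a literal dot.
loose = re.search(r'3.14', '3x14')
strict = re.search(r'3\.14', '3x14')
print(loose is not None, strict is not None)
```

Raw strings (the `r'...'` prefix) keep Python from interpreting the backslashes before `re` sees them.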

In [1]:
import re

In [3]:
match = re.search('ix','six')
print(match.group())

ix


In [5]:
match = re.search('ie','six')

if match:
    print('Found ', match.group())
else:
    print('No Match')

No Match


In [7]:
search_str = 'Patronising'
match = re.search('isi',search_str)

if match:
    print('Found ', match.group())
else:
    print('No Match')

Found  isi


In [8]:
## $ anchors the match to the end of the string
match = re.search('sing$',search_str)

if match:
    print('Found ', match.group())
else:
    print('No Match')

Found  sing


In [9]:
## ^ anchors the match to the start of the string
match = re.search('^Patr',search_str)

if match:
    print('Found ', match.group())
else:
    print('No Match')

Found  Patr


In [13]:
## . = any char but \n

match =  re.search('....','A2356789')
if match:
    print('Found ', match.group())
else:
    print('No Match')

Found  A235


+ \w -- (lowercase w) matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_]. 
+ \d -- decimal digit [0-9] 

In [16]:
## \w\w\w = three word characters in a row
match = re.search(r'\w\w\w', 'abc@xy.com')
if match:
    print('Found ', match.group())
else:
    print('No Match')

Found  abc

In [18]:
## \s = space
match = re.search(r'\d\d\d\s', '753 B.C') 

if match:
    print('Found ', match.group())
else:
    print('No Match')

Found  753 


#### Repetition

+ '+'   -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
+ '*'   -- 0 or more occurrences of the pattern to its left
+ '?'   -- match 0 or 1 occurrences of the pattern to its left
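
The `?` operator is not demonstrated in the cells of this section, so here is a small sketch with hypothetical sample words. It also shows that `+` is greedy: it consumes as many characters as it can.

```python
import re

# 'u?' makes the 'u' optional, so both spellings match.
spellings = [re.search(r'colou?r', word).group() for word in ['color', 'colour']]
print(spellings)

# '+' is greedy: \d+ takes the whole run of digits, not just the first one.
greedy = re.search(r'\d+', 'order 12345 shipped')
print(greedy.group())
```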


In [19]:
match = re.search(r'o+', 'hellooooo')
if match:
    print('Found ', match.group())

Found  ooooo


In [20]:
## o* may match zero characters, so the search succeeds with an empty match at position 0
match = re.search(r'o*', 'hell')
if match:
    print('Found ', match.group())

Found  


In [23]:
## \s* = zero or more whitespace chars
## Here look for 3 digits, possibly separated by whitespace.
match = re.search(r'\d\s*\d\s*\d', 'xx12   3xx')
if match:
    print('Found ', match.group())

Found  12   3


#### Matching emails

In [27]:
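my_str = 'My email is pink-elephant@google.com. Thank you'

## \w does not match '-', so the match starts after the hyphen.
## The unescaped '.' before 'com' matches any character, not just a dot.
match = re.search(r'\w+@\w+.com', my_str)
if match:
    print(match.group())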


elephant@google.com


In [30]:
my_str = 'My email is pink_elephant@google.com. Thank you'

## [\w.-] also allows '.' and '-' in the name; \. matches a literal dot
match = re.search(r'[\w.-]+@[\w.-]+\.com', my_str)
if match:
    print(match.group())

pink_elephant@google.com
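
Square brackets define a set of allowed characters, and inside them a dot is literal. The sketch below contrasts the plain `\w+` pattern with the bracketed one on a hyphenated address; the address is a made-up example.

```python
import re

text = 'write to jane-doe@example.com today'

# \w does not match '-', so the match can only start after the hyphen.
plain = re.search(r'\w+@\w+\.com', text).group()
print(plain)

# [\w.-] accepts word characters, '.', and '-', so the full name is kept.
bracketed = re.search(r'[\w.-]+@[\w.-]+\.com', text).group()
print(bracketed)
```

Placing `-` last inside the brackets keeps it from being read as a range such as `a-z`.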


#### Group Extraction

In [37]:
my_str = 'My email is pink.elephant@gmail.com. Thank you'

## Parentheses capture groups: group(1) is the user name, group(2) the domain
match = re.search(r'([\w.-]+)@([\w.-]+\.com)', my_str)
if match:
    print('user name: ', match.group(1))
    print('email provider: ', match.group(2))

user name:  pink.elephant
email provider:  gmail.com

In [33]:
## Suppose we have a text with many email addresses
my_str = 'Please contact support@gmail.com for any queries. You can also reach me at pink.elephant@gmail.com. Thank you'

## re.findall() returns a list of all the found email strings.
## The character class includes '.', so a sentence-ending period is captured too.
emails = re.findall(r'[\w\.-]+@[\w\.-]+', my_str)
for email in emails:
    print(email)


support@gmail.com
pink.elephant@gmail.com.
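
When the pattern contains groups, `re.findall()` returns a list of tuples, one tuple per match, rather than whole-match strings. A short sketch combining it with the grouping syntax from the previous cell; the sample text and the names `user` and `host` are illustrative only.

```python
import re

text = 'Contact support@gmail.com or pink.elephant@yahoo.com for help'

# Two groups in the pattern -> each result is a (user, host) tuple.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', text)
for user, host in pairs:
    print(user, host)
```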


### String Substitution

In [38]:
my_str = 'Anarkali'
print(re.sub(r'Anar',r'Champa',my_str))

Champakali


### Advanced

In [39]:
my_str = 'Please contact support@gmail.com for any queries. '

print(re.sub(r'([\w\.-]+)@([\w\.-]+\.com)',r'geethika@\2',my_str))

Please contact geethika@gmail.com for any queries. 


In [40]:
my_str = 'Please contact support@gmail.com for any queries. You can also reach me at pink.elephant@gmail.com. Thank you'
## re.sub(pat, replacement, str) -- returns new string with all replacements,
## \1 is group(1), \2 group(2) in the replacement
print(re.sub(r'([\w\.-]+)@([\w\.-]+)', r'\1@yahoo.com', my_str))
print(re.sub(r'([\w\.-]+)@([\w\.-]+)', r'blah@\2', my_str))

Please contact support@yahoo.com for any queries. You can also reach me at pink.elephant@yahoo.com Thank you
Please contact blah@gmail.com for any queries. You can also reach me at blah@gmail.com. Thank you
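
The first output line above loses the period before "Thank you" because `[\w\.-]+` also swallows the sentence-ending dot. One possible tightening, offered as a sketch and not as a complete email validator, requires every dot in the domain to be followed by more word characters. The `(?:...)` syntax is a non-capturing group, so `\1` and `\2` keep their meaning.

```python
import re

my_str = 'You can reach me at pink.elephant@gmail.com. Thank you'

# Domain = one label, then one or more '.label' parts.
# A trailing '.' with nothing after it is left out of the match.
pattern = r'([\w.-]+)@([\w-]+(?:\.[\w-]+)+)'

found = re.findall(pattern, my_str)
replaced = re.sub(pattern, r'\1@yahoo.com', my_str)
print(found)
print(replaced)
```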
