**Regular Expression Examples**

1. The match() function 
2. The search() function
3. The findall() function
4. The finditer() function
5. The sub() function
6. The subn() function
7. The start() and end() methods

See more examples in the Python [regular expression](https://docs.python.org/3/library/re.html) documentation.

In [1]:
import re

s1 = 'Python is an excellent language.'
s2 = 'I like the Python language. I use Python to build ML applications!'

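The `match()` function returns a match object only if the pattern matches at the beginning of the string; otherwise it returns `None`. Here the lower-case pattern does not match `s1`, which starts with an upper-case `P`.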

In [2]:
pattern = 'python'

print(re.match(pattern, s1))

None


`pattern` is in lower case, so the `re.IGNORECASE` flag is needed to match it against both lower-case and upper-case text.

In [3]:
re.match(pattern, s1, flags=re.IGNORECASE)

<re.Match object; span=(0, 6), match='Python'>
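
The case-insensitive flag can be spelled in more than one way. The sketch below is a small illustration, assuming the same `s1` string as above; the variable names `m_long`, `m_short`, and `m_inline` are chosen here for illustration only.

```python
import re

s1 = 'Python is an excellent language.'

# Three equivalent ways to request case-insensitive matching.
m_long = re.match('python', s1, flags=re.IGNORECASE)  # full flag name
m_short = re.match('python', s1, flags=re.I)          # short alias of the same flag
m_inline = re.match('(?i)python', s1)                 # inline flag inside the pattern

for m in (m_long, m_short, m_inline):
    print(m.group(0), m.span())
```

The inline form can be handy when the pattern is stored as plain text and no flags argument is available.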

Print the matched string and its position in the original string. The match object's `start()` and `end()` methods return the index where the match begins and the index just past where it ends.

In [4]:
m = re.match(pattern, s1, flags=re.IGNORECASE)
print(f'Found match {m.group(0)} ranging from index {m.start()} - {m.end()} in the string "{s1}"')

Found match Python ranging from index 0 - 6 in the string "Python is an excellent language."
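
Because the end index is exclusive, the two indices can be used directly as a slice. The following is a minimal sketch, assuming the same `s1` string; the search word `'excellent'` is picked here only as an example.

```python
import re

s1 = 'Python is an excellent language.'

m = re.search('excellent', s1)

# span() returns (start, end) as one tuple; end is one past the last matched character.
start, end = m.span()

# Slicing the original string with those indices gives back the matched text.
sliced = s1[start:end]
print(start, end, sliced)
```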


The `match()` function finds nothing when the pattern is not at the beginning of the string, as in `s2`. The call below returns `None`, so the notebook displays no output.

In [5]:
re.match(pattern, s2, re.IGNORECASE)

The `re` module also has functions that look beyond the start of the string. The `search()` function checks for a match anywhere in the string and returns the first one.

In [6]:
re.search(pattern, s2, re.IGNORECASE)

<re.Match object; span=(11, 17), match='Python'>
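
To make the difference in anchoring concrete, the sketch below runs `match()`, `search()`, and `fullmatch()` on one string. It is an illustration only, using a shortened sample string `text` that is not part of the notebook above.

```python
import re

text = 'I like the Python language.'

# match() only tries position 0, so it fails here.
at_start = re.match('python', text, flags=re.IGNORECASE)

# search() scans forward and returns the first match it finds.
anywhere = re.search('python', text, flags=re.IGNORECASE)

# fullmatch() succeeds only if the pattern covers the entire string.
whole = re.fullmatch('python', text, flags=re.IGNORECASE)

print(at_start)
print(anywhere.span())
print(whole)
```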

The `findall()` function returns all non-overlapping matches of `pattern` in `s2` as a list of strings.

In [7]:
re.findall(pattern, s2, re.IGNORECASE)

['Python', 'Python']
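
The shape of the `findall()` result depends on capture groups in the pattern. A short sketch, assuming the same `s2` string; the two patterns are made up here for illustration.

```python
import re

s2 = 'I like the Python language. I use Python to build ML applications!'

# One capture group: the list holds the text of that group only.
before_language = re.findall(r'(\w+) language', s2)

# Two capture groups: the list holds one tuple per match.
word_pairs = re.findall(r'(\w+) (Python)', s2)

print(before_language)
print(word_pairs)
```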

The `finditer()` function returns an iterator that yields match objects over all non-overlapping matches of the pattern in the string.

In [8]:
match_objs = re.finditer(pattern, s2, re.IGNORECASE)
match_objs

<callable_iterator at 0x295a93549d0>

In [9]:
print(f'String: {s2}')
for m in match_objs:
    print(f"Found match '{m.group(0)}' ranging from index {m.start()} - {m.end()}")

String: I like the Python language. I use Python to build ML applications!
Found match 'Python' ranging from index 11 - 17
Found match 'Python' ranging from index 34 - 40
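
An iterator from `finditer()` can be consumed only once. The sketch below, assuming the same `s2` string, collects the spans in a list and then shows that a second pass over the same iterator finds nothing; the names `spans` and `leftover` are illustrative.

```python
import re

s2 = 'I like the Python language. I use Python to build ML applications!'

match_iter = re.finditer('python', s2, flags=re.IGNORECASE)

# First pass: pull the (start, end) span out of every match object.
spans = [m.span() for m in match_iter]

# Second pass: the iterator is already exhausted, so nothing is left.
leftover = list(match_iter)

print(spans)
print(leftover)
```

Calling `finditer()` again creates a fresh iterator if the matches are needed a second time.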


Replace a pattern with another string. The `sub()` function replaces every occurrence of a pattern with a string or with the result of a function.

In [10]:
re.sub(pattern, 'Java', s2, flags=re.IGNORECASE)

'I like the Java language. I use Java to build ML applications!'
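
When the replacement is a function, `sub()` calls it once per match and inserts its return value. A minimal sketch, assuming the same `s2` string; the helper `shout` is a hypothetical name introduced here.

```python
import re

s2 = 'I like the Python language. I use Python to build ML applications!'

def shout(match):
    # Hypothetical helper: receives a match object and returns the replacement text.
    return match.group(0).upper()

result = re.sub('python', shout, s2, flags=re.IGNORECASE)
print(result)
```

Passing `flags` by keyword matters here, because the fourth positional parameter of `sub()` is `count`, not `flags`.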

The `subn()` function is similar to `sub()` but returns a tuple of the new string and the number of substitutions made.

In [11]:
print(re.subn(pattern, 'Java', s2, flags=re.IGNORECASE))
type(re.subn(pattern, 'Java', s2, flags=re.IGNORECASE))

('I like the Java language. I use Java to build ML applications!', 2)


tuple
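
The tuple can be unpacked directly, and the optional `count` argument caps how many replacements are made. A short sketch, assuming the same `s2` string; `new_text` and `n_replaced` are illustrative names.

```python
import re

s2 = 'I like the Python language. I use Python to build ML applications!'

# count=1 limits the substitution to the first occurrence.
new_text, n_replaced = re.subn('python', 'Java', s2, count=1, flags=re.IGNORECASE)

print(new_text)
print(n_replaced)
```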

Regular expressions also handle Unicode text.

In [12]:
s = 'H\u00e8llo! this is Python 🐍'
s

'Hèllo! this is Python 🐍'

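Match each run of word characters. In Python 3, `\w` matches Unicode letters such as `è` by default; the punctuation and the emoji are skipped.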

In [13]:
re.findall(r'\w+', s)

['Hèllo', 'this', 'is', 'Python']

Match words that start with an ASCII upper-case letter. The `re.UNICODE` flag is already the default for `str` patterns in Python 3, so passing it here is redundant but harmless.

In [14]:
re.findall(r"[A-Z]\w+", s, re.UNICODE)

['Hèllo', 'Python']
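
To see what Unicode matching changes, the same pattern can be run with the `re.ASCII` flag, which restricts `\w` to `[a-zA-Z0-9_]`. A minimal sketch, using a sample string without the emoji; the variable names are illustrative.

```python
import re

s = 'H\u00e8llo! this is Python'

# Default for str patterns: \w matches Unicode word characters, including 'è'.
unicode_words = re.findall(r'\w+', s)

# With re.ASCII, 'è' is no longer a word character, so 'Hèllo' splits in two.
ascii_words = re.findall(r'\w+', s, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```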