# Regex Notebook

Import the regex library _re_ as follows:

In [1]:
import re

Create a sample string to search in:

In [2]:
text1 = "This is a beautiful day"

## Regex Functions:

The `re` module provides several functions for searching, replacing, splitting and so on.
We'll see these functions in action here:

### The search() Function

This function searches for a pattern present _anywhere_ in the string and stops at the first match.

**Syntax:** ``re.search(pattern, string, flags=0)``

It returns a re.Match object (or `None` if nothing matches), which has a few interesting methods of its own, like:

- ``group()``
- ``groups()``
- ``start()``
- ``end()``
- ``span()``


And a few others.

In [3]:
re.search(r'is', text1)

<re.Match object; span=(2, 4), match='is'>

In [4]:
m = re.search(r'is', text1)
print(type(m))

<class 're.Match'>


#### The group() Method:

This method returns the matched string.

In [5]:
m.group()

'is'
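The method list above also mentions `groups()`, which only becomes useful once the pattern contains capturing groups. A minimal sketch, assuming a hypothetical two-group pattern that is not part of the original notebook:

```python
import re

text1 = "This is a beautiful day"

# Hypothetical pattern: two capturing groups, one word each
m2 = re.search(r'(\w+) (\w+)', text1)

whole = m2.group()    # the entire match, same as m2.group(0)
first = m2.group(1)   # text captured by the first group
both = m2.groups()    # tuple holding every captured group
print(whole, first, both)
```

Here `group()` with no argument still returns the whole match, while `groups()` returns only what the parentheses captured.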

#### The start(), end(), span() Methods:
These methods return:
 - The index where the match starts
 - The index just past the end of the match
 - The tuple (start, end)
 

In [6]:
m.start(), m.end(), m.span()

(2, 4, (2, 4))
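Because the end index is exclusive, the two values can be used directly as slice bounds on the original string. A small sketch of that relationship (the variable names `start`, `end` and `sliced` are illustrative only):

```python
import re

text1 = "This is a beautiful day"
m = re.search(r'is', text1)

# span() unpacks into slice bounds; end is one past the last matched character
start, end = m.span()
sliced = text1[start:end]
print(sliced, sliced == m.group())
```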

### The match() Function:
Very similar to ``re.search``, 
but this function looks for the pattern only at the beginning of the string.

**Syntax:** ``re.match(pattern, string, flags=0)``
<hr>

Here, we are searching for 'is' at the start of text1,
 but text1 begins with 'Th', so this returns `None`.

In [7]:
m = re.match(r'is', text1)
print(m)

None
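Since a failed match returns `None` rather than a re.Match object, calling `.group()` on the result directly would raise an `AttributeError`. A minimal sketch of the usual guard (the `result` variable and its message are illustrative):

```python
import re

text1 = "This is a beautiful day"
m = re.match(r'is', text1)

# Check for None before calling any Match method
if m is None:
    result = "no match at the start"
else:
    result = m.group()
print(result)
```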


But here, since 'Th' is present at the beginning of text1,
 it returns the expected re.Match object.

In [8]:
m = re.match(r'Th', text1)
print(m)

<re.Match object; span=(0, 2), match='Th'>


Since this is the same kind of re.Match object as seen previously, 
all of its methods work here as well.

In [9]:
m.group(), m.start(), m.end(), m.span()

('Th', 0, 2, (0, 2))

Also, ``span()`` returns a tuple, so you can access its elements by index, 
just like the elements of a list.

In [10]:
x = m.span()
print(x[0], x[1])

0 2


### The findall() Function:
This function returns a list of all non-overlapping matches, as strings.

In [11]:
re.findall(r'is', text1)

['is', 'is']
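Two details of `findall()` are easy to miss: matches never overlap, and when the pattern contains capturing groups the list holds the captured groups instead of the whole match. A short sketch, using a made-up string `'ababa'` and a hypothetical two-group pattern:

```python
import re

text1 = "This is a beautiful day"

# 'aba' occurs at index 0 and index 2, but the two occurrences overlap,
# so only the first one is reported
non_overlapping = re.findall(r'aba', 'ababa')

# With two capturing groups, each list item is a tuple of the groups
pairs = re.findall(r'(\w+) (\w+)', text1)
print(non_overlapping, pairs)
```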

In [12]:
text2 = "abbbaaabbbbabababa"

In [13]:
re.findall(r'ba', text2)

['ba', 'ba', 'ba', 'ba', 'ba']

In [14]:
# finditer() is like findall(), but yields re.Match objects one at a time instead of strings
mat = re.finditer(r'ba', text2)

In [15]:
for m in mat:
    print(m.group(), m.start(), m.end(), m.span())

ba 3 5 (3, 5)
ba 10 12 (10, 12)
ba 12 14 (12, 14)
ba 14 16 (14, 16)
ba 16 18 (16, 18)


In [16]:
# sub() replaces matches of the pattern; count=2 limits it to the first two matches
print(re.sub(r'ba', 'xy', text2, count=2))

abbxyaabbbxybababa
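For comparison, leaving out `count` replaces every match, and `re.subn()` additionally reports how many replacements were made. A brief sketch on the same string (the names `replaced_all` and `n` are illustrative):

```python
import re

text2 = "abbbaaabbbbabababa"

# Without count, every non-overlapping 'ba' is replaced
replaced_all = re.sub(r'ba', 'xy', text2)

# subn() returns a (new_string, number_of_replacements) tuple
new_text, n = re.subn(r'ba', 'xy', text2)
print(replaced_all, n)
```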


In [17]:
# compile() builds a reusable re.Pattern object from the pattern string
pat = re.compile(r'ba')
type(pat)

re.Pattern

In [18]:
# A compiled pattern can be passed wherever a pattern string is accepted
re.findall(pat, text2)

['ba', 'ba', 'ba', 'ba', 'ba']
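A compiled pattern also exposes the same operations as methods, which is the more common way to use it. A minimal sketch, assuming the same `pat` and `text2` as in the cells above:

```python
import re

text2 = "abbbaaabbbbabababa"
pat = re.compile(r'ba')

# Methods on the Pattern object mirror the module-level functions
via_method = pat.findall(text2)
via_function = re.findall(pat, text2)
first_span = pat.search(text2).span()
print(via_method == via_function, first_span)
```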

In [19]:
text3 = "akasad kadkad; asdadnnas; asdkakds: ajdasdj, sjdjdj; sisisiu;      hshs"

In [20]:
# Split on a space, ';', ':' or ',' followed by any run of whitespace
text3_list = re.split(r'[ ;:,]\s*', text3)

In [21]:
print(text3_list)

['akasad', 'kadkad', 'asdadnnas', 'asdkakds', 'ajdasdj', 'sjdjdj', 'sisisiu', 'hshs']
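The trailing `\s*` in the split pattern is what prevents empty strings when a separator is followed by several spaces. A small sketch on a made-up sample string, which also shows the optional `maxsplit` argument:

```python
import re

sample = "a;  b, c"  # hypothetical input, not from the notebook

# Without \s*, each extra space is treated as its own separator
naive = re.split(r'[ ;:,]', sample)

# With \s*, the whitespace after a separator is consumed along with it
clean = re.split(r'[ ;:,]\s*', sample)

# maxsplit=1 stops after the first split and keeps the rest intact
once = re.split(r'[ ;:,]\s*', sample, maxsplit=1)
print(naive, clean, once)
```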
