# Python regex examples

- Cheatsheet: https://www.debuggex.com/cheatsheet/regex/python

In [1]:
import re

| Function | Description |
|---|---|
| findall | Returns a list containing all matches |
| search | Returns a Match object if there is a match anywhere in the string |
| split | Returns a list where the string has been split at each match |
| sub | Replaces one or many matches with a string |

In [57]:
text = '''
RegExr was created by GSkinner.com, and is proudly hosted by Media Temple.

Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.

The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community, and view patterns you create or favorite in My Patterns.

www.website.com

Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.

'''

## Regex Functions

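**<code>compile()</code>**: Compiling a pattern with <code>re.compile()</code> and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program. The module-level functions also cache recently used patterns, so in practice the saving is often small.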

In [58]:
pattern = re.compile(r"\w+\.com")

In [59]:
pattern = re.compile(r"\w+\.com", flags=re.IGNORECASE)
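A minimal sketch of what <code>re.IGNORECASE</code> changes, using a made-up sample string rather than the text above. It also shows that a compiled pattern exposes the same functions as methods, so <code>pattern.findall(s)</code> gives the same result as <code>re.findall(pattern, s)</code>.

```python
import re

sample = "Visit Example.COM or example.com"  # hypothetical sample string

case_sensitive = re.compile(r"\w+\.com")
case_insensitive = re.compile(r"\w+\.com", flags=re.IGNORECASE)

# Without the flag, "\.com" only matches a lowercase "com"
print(case_sensitive.findall(sample))    # ['example.com']
# With re.IGNORECASE, "Example.COM" matches as well
print(case_insensitive.findall(sample))  # ['Example.COM', 'example.com']
# Method call and module-level function are equivalent
print(re.findall(case_insensitive, sample) == case_insensitive.findall(sample))  # True
```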

Raw strings (<code>r"..."</code>) are particularly handy for defining regex patterns: in a normal string literal you would have to escape every backslash by doubling it (<code>\\</code>), e.g. <code>"\\d+"</code> instead of <code>r"\d+"</code>.
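A short sketch of why this matters. The two literals below describe the same regex, while the word boundary <code>\b</code> silently turns into a backspace character when the raw-string prefix is forgotten. The sample strings are made up for illustration.

```python
import re

# The same regex written two ways
raw = r"\d+\.\d+"
escaped = "\\d+\\.\\d+"
print(raw == escaped)  # True

# In a raw string, \b reaches the regex engine as a word boundary
print(re.findall(r"\bcat\b", "cat concat cat"))  # ['cat', 'cat']
# In a normal string, "\b" is a backspace character, so nothing matches
print(re.findall("\bcat\b", "cat concat cat"))   # []
```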

**<code>findall()</code>**: Returns all non-overlapping matches as a list of strings

In [60]:
re.findall(pattern, text)

['GSkinner.com', 'website.com']
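One detail worth knowing: if the pattern contains capturing groups, <code>findall()</code> returns the captured groups instead of the whole match, and a tuple per match when there is more than one group. A sketch on a made-up string:

```python
import re

s = "GSkinner.com and website.com"

# No groups: whole matches
print(re.findall(r"\w+\.com", s))          # ['GSkinner.com', 'website.com']
# One group: only the captured part
print(re.findall(r"(\w+)\.com", s))        # ['GSkinner', 'website']
# Two groups: one tuple per match
print(re.findall(r"(\w+)\.(com)", s))      # [('GSkinner', 'com'), ('website', 'com')]
# A non-capturing group (?:...) keeps the whole-match behaviour
print(re.findall(r"\w+\.(?:com|org)", s))  # ['GSkinner.com', 'website.com']
```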

**<code>finditer(string[, pos[, endpos]])</code>**: Like <code>findall()</code>, it finds all non-overlapping matches in the string, but it returns an iterator of match objects instead of a list of strings.

In [61]:
list(re.finditer(pattern, text))

[<re.Match object; span=(23, 35), match='GSkinner.com'>,
 <re.Match object; span=(432, 443), match='website.com'>]
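Because each item is a match object, iterating over <code>finditer()</code> gives access to positions as well as the matched text. The <code>pos</code> and <code>endpos</code> arguments are only available on the compiled-pattern method and restrict the region that is searched. A sketch on a made-up string:

```python
import re

pattern = re.compile(r"\w+\.com", flags=re.IGNORECASE)
s = "a.com b.com c.com"

# Each item is a Match object with the matched text and its position
for m in pattern.finditer(s):
    print(m.group(), m.start(), m.end())
# a.com 0 5
# b.com 6 11
# c.com 12 17

# pos: start searching at index 6
print([m.group() for m in pattern.finditer(s, 6)])      # ['b.com', 'c.com']
# endpos: behave as if the string ended at index 11
print([m.group() for m in pattern.finditer(s, 0, 11)])  # ['a.com', 'b.com']
```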

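**<code>search()</code>**: Scans the text and returns a match object for the first occurrence of the pattern, or <code>None</code> if there is no match. Unlike <code>findall()</code>, it returns a match object rather than a list of strings.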

In [62]:
result = re.search(pattern, text)

result

<re.Match object; span=(23, 35), match='GSkinner.com'>

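Match objects are always truthy, and <code>search()</code> returns <code>None</code> when nothing matches, so the result can be tested directly in an <code>if</code> statement: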

In [63]:
if result:
    print('Found')
else:
    print('Not found')

Found


Match object methods and attributes:

In [64]:
result.span(), result.group()

((23, 35), 'GSkinner.com')

In [65]:
text[result.span()[0]:result.span()[1]]

'GSkinner.com'

- <code>.span()</code> returns a tuple containing the start and end positions of the match.
- <code>.string</code> returns the string passed into the function.
- <code>.group()</code> returns the part of the string where there was a match (<code>.group(n)</code> returns the n-th capturing group).
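A short sketch of these methods and attributes on a made-up string (not the <code>text</code> above). <code>.start()</code> and <code>.end()</code> return the two halves of <code>.span()</code> separately:

```python
import re

text = "Contact: GSkinner.com"  # hypothetical sample string
m = re.search(r"\w+\.com", text)

print(m.group())           # GSkinner.com  (same as m.group(0))
print(m.span())            # (9, 21)
print(m.start(), m.end())  # 9 21
print(m.string == text)    # True: .string is the string that was searched
# Slicing with the span recovers the matched text
print(text[m.start():m.end()] == m.group())  # True
```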

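**<code>match()</code>**: Unlike <code>search()</code>, it only looks for a match at the <b>beginning of the string</b> and returns <code>None</code> otherwise. Even with <code>re.MULTILINE</code>, it does not check the start of every line.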

In [67]:
re.match(pattern, text)  # None (nothing displayed): text starts with '\n', not with a .com address
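A sketch contrasting <code>match()</code>, <code>search()</code> and <code>fullmatch()</code> (the latter requires the entire string to match) on made-up strings:

```python
import re

s = "GSkinner.com hosts RegExr"
p = re.compile(r"\w+\.com")

print(p.match(s))                   # a match: the string starts with "GSkinner.com"
print(p.match("\n" + s))            # None: match() only tries position 0
print(p.search("\n" + s))           # a match: search() scans ahead
print(p.fullmatch(s))               # None: the whole string must match
print(p.fullmatch("GSkinner.com"))  # a match

# re.MULTILINE does not make match() try the start of each line
print(re.match(r"\w+\.com", "x\nGSkinner.com", flags=re.MULTILINE))  # None
```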

**<code>sub(pattern, repl, string, count=0, flags=0)</code>**: Substitution function. Returns the string obtained by replacing the leftmost non-overlapping occurrences of <code>pattern</code> in <code>string</code> with the replacement <code>repl</code>. If <code>count</code> is nonzero, at most that many occurrences are replaced.

In [68]:
re.sub(pattern, 'REPLACED.com', text)

'\nRegExr was created by REPLACED.com, and is proudly hosted by Media Temple.\n\nEdit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\n\nThe side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community, and view patterns you create or favorite in My Patterns.\n\nwww.REPLACED.com\n\nExplore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.\n\n'
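A few more ways to use <code>sub()</code>, sketched on a made-up string: limiting replacements with <code>count</code>, referring to a capturing group with <code>\1</code> in the replacement, and passing a function that receives each match object.

```python
import re

s = "GSkinner.com and website.com"
p = re.compile(r"(\w+)\.com")

# count=1 replaces only the first occurrence
print(p.sub("X.com", s, count=1))  # X.com and website.com
# \1 inserts the text captured by the first group
print(p.sub(r"\1.org", s))         # GSkinner.org and website.org
# repl can be a function that receives each Match object
print(p.sub(lambda m: m.group(1).upper() + ".com", s))  # GSKINNER.com and WEBSITE.com
```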

**<code>split(string, maxsplit=0)</code>**: Splits the string wherever the pattern matches and returns a list of the pieces

In [69]:
split_text = 'blasdf;::;safasdfs;::;asdsdsad;::;addd'
split_pattern = re.compile(r';::;')

In [73]:
re.split(split_pattern, split_text)

['blasdf', 'safasdfs', 'asdsdsad', 'addd']
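A sketch of two <code>split()</code> variations: <code>maxsplit</code> caps the number of splits, and a capturing group in the pattern keeps the separators in the result. The last line uses made-up input to show a separator that varies.

```python
import re

split_text = 'blasdf;::;safasdfs;::;asdsdsad;::;addd'

# maxsplit=1: split once, the remainder stays in the last element
print(re.split(r';::;', split_text, maxsplit=1))  # ['blasdf', 'safasdfs;::;asdsdsad;::;addd']
# A capturing group keeps the separators
print(re.split(r'(;::;)', 'a;::;b'))              # ['a', ';::;', 'b']
# Commas or semicolons with optional surrounding spaces
print(re.split(r'\s*[,;]\s*', 'a, b;c ,d'))       # ['a', 'b', 'c', 'd']
```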

**<code>Match.groups()</code>**: returns a tuple containing all the subgroups of the match

In [88]:
email_text = 'bla@gmail.com loremipsumbla dada@outlook.com asdddd'
email_pattern = r'([\w\.-]+)@([\w\.-]+)'

In [89]:
re.search(email_pattern, email_text).groups()

('bla', 'gmail.com')
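Individual groups can also be read with <code>group(n)</code>, where group 0 is the whole match. Named groups (<code>(?P&lt;name&gt;...)</code>) make the pattern self-documenting and can be collected with <code>groupdict()</code>. A sketch reusing the email example; the group names <code>user</code> and <code>domain</code> are chosen for illustration:

```python
import re

email_text = 'bla@gmail.com loremipsumbla dada@outlook.com asdddd'
email_pattern = r'([\w\.-]+)@([\w\.-]+)'

m = re.search(email_pattern, email_text)
print(m.group(0))  # bla@gmail.com  (whole match)
print(m.group(1))  # bla
print(m.group(2))  # gmail.com

# Named groups: same pattern, but groups can be accessed by name
named = re.compile(r'(?P<user>[\w.-]+)@(?P<domain>[\w.-]+)')
m = named.search(email_text)
print(m.group('user'))  # bla
print(m.groupdict())    # {'user': 'bla', 'domain': 'gmail.com'}
```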