# Regular Expressions: Regexes in Python (Part 2)

### `re` Module Functions

In addition to `re.search()`, the re module contains several other functions to help you perform regex-related tasks.

The available regex functions in the Python `re` module fall into the following three categories:
1. Searching functions
2. Substitution functions
3. Utility functions

### Searching Functions
Searching functions scan a search string for one or more matches of the specified regex:

|Function|	Description|
|---:|:-------------|
|re.search()|	Scans a string for a regex match|
|re.match()|	Looks for a regex match at the beginning of a string|
|re.fullmatch()|	Looks for a regex match on an entire string|
|re.findall()|	Returns a list of all regex matches in a string|
|re.finditer()|	Returns an iterator that yields regex matches from a string|

As you can see from the table, these functions are similar to one another. But each one tweaks the searching functionality in its own way.

## `re.search(<regex>, <string>, flags=0)`

Scans a string for a regex match.

`re.search(<regex>, <string>)` looks for any location in `<string>` where `<regex>` matches:



In [1]:
import re

In [2]:
re.search(r'(\d+)', 'foo123bar')

<re.Match object; span=(3, 6), match='123'>

In [3]:
re.search(r'[a-z]+', '123FOO456', flags=re.IGNORECASE)

<re.Match object; span=(3, 6), match='FOO'>

In [4]:
print(re.search(r'\d+', 'foo.bar'))

None


The function returns a match object if it finds a match and `None` otherwise.
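Because the return value may be `None`, code that uses the match usually checks it first. The following is a minimal sketch of that pattern; the helper name `first_number` is hypothetical and only serves as an illustration:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r'\d+', text)
    # re.search() returns None on failure, so test before using the match
    if m is None:
        return None
    return m.group()

print(first_number('foo123bar'))  # 123
print(first_number('foo.bar'))    # None
```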

## `re.match(<regex>, <string>, flags=0)`

Looks for a regex match at the beginning of a string.

This is identical to `re.search()`, except that `re.search()` returns a match if `<regex>` matches anywhere in `<string>`, whereas `re.match()` returns a match only if `<regex>` matches at the beginning of `<string>`:

In [5]:
re.search(r'\d+', '123foobar')
# re.search() scans the whole string for a match

<re.Match object; span=(0, 3), match='123'>

In [6]:
re.search(r'\d+', 'foo123bar')
# re.search() scans the whole string for a match

<re.Match object; span=(3, 6), match='123'>

In [7]:
re.match(r'\d+', '123foobar')
# matches because the digits are at the beginning

<re.Match object; span=(0, 3), match='123'>

In [8]:
print(re.match(r'\d+','foo123bar'))
# no match because the digits are in the middle

None


In the above example, `re.search()` matches when the digits are both at the beginning of the string and in the middle, but `re.match()` matches only when the digits are at the beginning.

Remember from the previous tutorial in this series that if `<string>` contains embedded newlines, then the MULTILINE flag causes `re.search()` to match the caret `(^)` anchor metacharacter either at the beginning of `<string>` or at the beginning of any line contained within `<string>`:

In [9]:
s = 'foo\nbar\nbaz'

In [10]:
re.search('^foo', s)

<re.Match object; span=(0, 3), match='foo'>

In [11]:
re.search('^bar', s, re.MULTILINE)


<re.Match object; span=(4, 7), match='bar'>

The MULTILINE flag does not affect re.match() in this way:

In [12]:
s = 'foo\nbar\nbaz'

In [13]:
re.match('^foo', s)

<re.Match object; span=(0, 3), match='foo'>

In [14]:
print(re.match('^bar', s, re.MULTILINE))
# MULTILINE does not change where re.match() anchors

None


Even with the `MULTILINE` flag set, `re.match()` will match the caret (^) anchor only at the beginning of `<string>`, not at the beginning of lines contained within `<string>`.

Note that, although it illustrates the point, the caret (`^`) anchor in the `re.match()` calls above is redundant. With `re.match()`, matches are essentially always anchored at the beginning of the string.

## `re.fullmatch(<regex>, <string>, flags=0)`

Looks for a regex match on an entire string.

This is similar to `re.search()` and `re.match()`, but `re.fullmatch()` returns a match only if `<regex>` matches `<string>` in its entirety:

In [15]:
print(re.fullmatch(r'\d+', '123foo'))
# no match: the regex has to match the entire string

None


In [16]:
print(re.fullmatch(r'\d+', 'foo123'))
# digits only at the end: still not the entire string

None


In [17]:
re.fullmatch(r'\d+', '123')
# the regex matches the entire string

<re.Match object; span=(0, 3), match='123'>

In [18]:
re.search(r'^\d+$', '123')

<re.Match object; span=(0, 3), match='123'>
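Anchoring a `re.search()` regex with `^` and `$`, as in the last call, behaves much like `re.fullmatch()`, but the two are not exactly interchangeable. The sketch below uses a sample string with a trailing newline (an assumption chosen for illustration) to show where they differ:

```python
import re

s = '123\n'  # digits followed by a trailing newline

# $ also matches just before a newline at the end of the string
m_dollar = re.search(r'^\d+$', s)

# fullmatch() must consume the whole string, and the newline is not a digit
m_full = re.fullmatch(r'\d+', s)

# \A and \Z anchor at the true start and end of the string
m_abs = re.search(r'\A\d+\Z', s)

print(m_dollar)
print(m_full)
print(m_abs)
```

Only the first call finds a match here, so `re.fullmatch()` is the safer choice when a trailing newline should be rejected.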

## `re.findall(<regex>, <string>, flags=0)`
Returns a list of all matches of a regex in a string.

`re.findall(<regex>, <string>)` returns a list of all non-overlapping matches of `<regex>` in `<string>`. It scans the search string from left to right and returns all matches in the order found:

In [19]:
re.findall(r'\w+', '...foo,,,,bar:%$baz//|')

['foo', 'bar', 'baz']

If `<regex>` contains a capturing group, then the return list contains only contents of the group, not the entire match:

In [20]:
re.findall(r'#(\w+)#', '#foo#.#bar#.#baz#')
# the hash (#) characters are matched but not returned, because they are outside the grouping parentheses

['foo', 'bar', 'baz']

In this case, the specified regex is `#(\w+)#`. The matching strings are `'#foo#'`, `'#bar#'`, and `'#baz#'`. But the hash (`#`) characters don’t appear in the return list because they’re outside the grouping parentheses.

If `<regex>` contains more than one capturing group, then `re.findall()` returns a list of tuples containing the captured groups.
<br>The length of each tuple is equal to the number of groups specified:

In [21]:
# i.e 1
re.findall(r'(\w+),(\w+)','foo,bar,baz,qux,quux,corge' )
# The regex contains two capturing groups, so re.findall()
# returns a list of three two-tuples,
# each containing two captured matches.

[('foo', 'bar'), ('baz', 'qux'), ('quux', 'corge')]

In [22]:
#i.e 2
re.findall(r'(\w+),(\w+),(\w+)','foo,bar,baz,qux,quux,corge') 
# The regex contains three groups, so the return value is a list of two three-tuples.

[('foo', 'bar', 'baz'), ('qux', 'quux', 'corge')]
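If grouping is needed only for structure and the whole match is wanted back, non-capturing groups avoid the tuple form. This is a small sketch reusing the sample string from above; the variable names `pairs` and `whole` are illustrative:

```python
import re

s = 'foo,bar,baz,qux,quux,corge'

# Capturing groups: each item is a tuple of the captured groups
pairs = re.findall(r'(\w+),(\w+)', s)

# Non-capturing groups: each item is the entire matched string
whole = re.findall(r'(?:\w+),(?:\w+)', s)

print(pairs)  # [('foo', 'bar'), ('baz', 'qux'), ('quux', 'corge')]
print(whole)  # ['foo,bar', 'baz,qux', 'quux,corge']
```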

## `re.finditer(<regex>, <string>, flags=0)`
Returns an iterator that yields regex matches.

`re.finditer(<regex>, <string>)` scans `<string>` for non-overlapping matches of `<regex>` and returns an iterator that yields the match objects from any it finds. It scans the search string from left to right and returns matches in the order it finds them:

In [23]:
it = re.finditer(r'\w+','...foo,,,,bar:%$baz//|')


In [24]:
print(it)

<callable_iterator object at 0x000002847CAA8D90>


In [25]:
next(it)
# the first call to next() finds 'foo'

<re.Match object; span=(3, 6), match='foo'>

In [26]:
next(it)
# second iter finds 'bar'

<re.Match object; span=(10, 13), match='bar'>

In [27]:
next(it)
# third iter finds 'baz'

<re.Match object; span=(16, 19), match='baz'>

In [28]:
# next(it)
# a fourth call would raise StopIteration,
# because the iterator is exhausted

In [29]:
for i in re.finditer(r'\w+', '...foo,,,,bar:%$baz//|'):
    print(i)

<re.Match object; span=(3, 6), match='foo'>
<re.Match object; span=(10, 13), match='bar'>
<re.Match object; span=(16, 19), match='baz'>


`re.findall()` and `re.finditer()` are very similar, but they differ in two respects:

1. `re.findall()` returns a list, whereas `re.finditer()` returns an iterator.

2. The items in the list that `re.findall()` returns are the actual matching strings, whereas the items yielded by the iterator that `re.finditer()` returns are match objects.
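The second difference matters when the position of each match is needed, since plain strings carry no location information. A brief sketch, reusing the sample string from the examples above:

```python
import re

s = '...foo,,,,bar:%$baz//|'

# findall() gives only the matched text
words = re.findall(r'\w+', s)

# finditer() gives match objects, so positions are available too
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\w+', s)]

print(words)  # ['foo', 'bar', 'baz']
print(spans)  # [('foo', 3, 6), ('bar', 10, 13), ('baz', 16, 19)]
```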

## Substitution Functions
Substitution functions replace portions of a search string that match a specified regex:

|Function|	Description|
|---:|:-------------|
|`re.sub()`|	Scans a string for regex matches, replaces the matching portions of the string with the specified replacement string, and returns the result|
|`re.subn()`|	Behaves just like `re.sub()` but also returns information regarding the number of substitutions made|

Both `re.sub()` and `re.subn()` create a new string with the specified substitutions and return it. The original string remains unchanged. (Remember that strings are immutable in Python, so it wouldn’t be possible for these functions to modify the original string.)

## `re.sub(<regex>, <repl>, <string>, count=0, flags=0)`
Returns a new string that results from performing replacements on a search string.

`re.sub(<regex>, <repl>, <string>)` finds the leftmost non-overlapping occurrences of `<regex>` in `<string>`, replaces each match as indicated by `<repl>`, and returns the result. `<string>` remains unchanged.

`<repl>` can be either a string or a function, as explained below.

## Substitution by String
If `<repl>` is a string, then `re.sub()` inserts it into `<string>` in place of any sequences that match `<regex>`:

In [30]:
s = 'foo.123.bar.789.baz'

In [31]:
re.sub(r'\d+','#', s)
# works like str.replace(), but with a regex:
# each run of digits is replaced by #

'foo.#.bar.#.baz'

In [32]:
re.sub('[a-z]+', '(*)', s)
# re.sub(<regex>, <repl>, <string>)
# each run of lowercase letters is replaced with (*)

'(*).123.(*).789.(*)'

`re.sub()` replaces numbered backreferences (`\<n>`) in `<repl>` with the text of the corresponding captured group. In the example below, the first and last words trade places:

In [33]:
re.sub(r'(\w+),bar,baz,(\w+)',
       r'\2,bar,baz,\1',
        'foo,bar,baz,qux')

'qux,bar,baz,foo'

Ambiguity can arise when a numbered backreference is immediately followed by a literal digit character. For example, suppose you have a string like `'foo 123 bar'` and want to add a `'0'` at the end of the digit sequence. You might try this:

In [34]:
# re.sub(r'(\d+)', r'\10', 'foo 123 bar')
# this raises an error:
# error: invalid group reference 10 at position 1

Alas, the regex parser in Python interprets `\10` as a backreference to the tenth captured group, which doesn’t exist in this case. Instead, you can use `\g<1>` to refer to the group:

In [35]:
re.sub(r'(\d+)', r'\g<1>7', 'foo 123 bar')

'foo 1237 bar'

In [36]:
re.sub(r'(\d+)', r'\g<1>0', 'foo 123 bar')

'foo 1230 bar'

The backreference `\g<0>` refers to the text of the entire match. This is valid even when there are no grouping parentheses in `<regex>`:

In [37]:
re.sub(r'\d+', r'/\g<0>/', 'foo 123 bar')

'foo /123/ bar'
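The `\g<...>` form also accepts a group name when the regex defines one with `(?P<name>...)`. The sketch below is illustrative; the group names `num`, `first`, and `second` are made up for this example:

```python
import re

# Refer to a named group in the replacement, then append a literal 0
appended = re.sub(r'(?P<num>\d+)', r'\g<num>0', 'foo 123 bar')

# Swap two named groups
swapped = re.sub(r'(?P<first>\w+),(?P<second>\w+)',
                 r'\g<second>,\g<first>',
                 'foo,bar')

print(appended)  # foo 1230 bar
print(swapped)   # bar,foo
```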

If `<regex>` specifies a zero-length match, then `re.sub()` will substitute `<repl>` into every character position in the string:

In [38]:
re.sub('x*', '-', 'foo')
# the regex x* matches any zero-length sequence,
# so re.sub() inserts the replacement string at every character position

'-f-o-o-'

In the example above, the regex `x*` matches any zero-length sequence, so `re.sub()` inserts the replacement string at every character position in the string: before the first character, between each pair of characters, and after the last character.

If `re.sub()` doesn’t find any matches, then it always returns `<string>` unchanged.

## Substitution by Function

If you specify `<repl>` as a function, then `re.sub()` calls that function for each match found. It passes each corresponding match object as an argument to the function to provide information about the match. The function return value then becomes the replacement string:

In [39]:
def f(match_obj):
    s = match_obj.group(0) # the matching string
    
    # s.isdigit() returns True if all characters in s are digits
    if s.isdigit():
        return str(int(s) * 10)
    else:
        return s.upper()
    

In [40]:
re.sub(r'\w+',f,'foo.10.bar.20.baz.30')


'FOO.100.BAR.200.BAZ.300'

### Understanding
In this example, `f()` gets called for each match. As a result, `re.sub()` converts each non-numeric portion of `<string>` to all uppercase and multiplies each numeric portion by 10.

### Limiting the Number of Replacements
If you specify a positive integer for the optional count parameter, then `re.sub()` performs at most that many replacements:

In [41]:
re.sub(r'\w+','xxXX','foo.ba.baz.qux')
# count is omitted here, so every run of word characters
# (separated by .) is replaced with xxXX

'xxXX.xxXX.xxXX.xxXX'
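The call above leaves `count` at its default of `0`, which means no limit. A short sketch of the same substitution with `count=2`, so that only the first two matches are replaced:

```python
import re

s = 'foo.ba.baz.qux'

unlimited = re.sub(r'\w+', 'xxXX', s)          # count defaults to 0: replace all
limited = re.sub(r'\w+', 'xxXX', s, count=2)   # replace at most two matches

print(unlimited)  # xxXX.xxXX.xxXX.xxXX
print(limited)    # xxXX.xxXX.baz.qux
```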

As with most re module functions, `re.sub()` accepts an optional `<flags>` argument as well.

## `re.subn(<regex>, <repl>, <string>, count=0, flags=0)`
Returns a new string that results from performing replacements on a search string and also returns the number of substitutions made.

`re.subn()` is identical to `re.sub()`, except that `re.subn()` returns a two-tuple consisting of the modified string and the number of substitutions made:



In [42]:
re.subn(r'\w+', 'xxx', 'foo.bar.baz.qux')

('xxx.xxx.xxx.xxx', 4)

The count is 4 because four runs of word characters were replaced with `xxx`.

In [43]:
re.subn(r'\w+','xxx', 'foo.bar.baz.qux', count=2)
# count limits how many matches are replaced
# here count=2, so only the first two matches
# are replaced with the replacement string

('xxx.xxx.baz.qux', 2)

In [44]:
# define the replacement function

def f(match_obj):
    m = match_obj.group(0)
    if m.isdigit():
        return str(int(m) * 10)
    else:
        return m.upper()

In [45]:
re.subn(r'\w+', f, 'foo.10.bar.20.baz.30')

('FOO.100.BAR.200.BAZ.300', 6)

In all other respects, `re.subn()` behaves just like `re.sub()`.

## Utility Functions
There are two remaining regex functions in the Python re module that you’ve yet to cover:

|Function|	Description|
|---:|:-------------|
|re.split()|	Splits a string into substrings using a regex as a delimiter|
|re.escape()|	Escapes characters in a regex|

These are functions that involve regex matching but don’t clearly fall into either of the categories described above.

## `re.split(<regex>, <string>, maxsplit=0, flags=0)`

Splits a string into substrings.

`re.split(<regex>, <string>)` splits `<string>` into substrings using `<regex>` as the delimiter and returns the substrings as a list.

The following example splits the specified string into substrings delimited by a comma (,), semicolon (;), or slash (/) character, surrounded by any amount of whitespace:

In [46]:
re.split(r'\s*[,;/]\s*', 'foo,bar ; baz / qux')
# split on a comma, semicolon, or slash;
# \s* absorbs any surrounding whitespace

['foo', 'bar', 'baz', 'qux']

If `<regex>` contains capturing groups, then the return list includes the matching delimiter strings as well:

In [47]:
re.split(r'(\s*[,;/]\s*)', 'foo,bar  ;  baz / qux')
# because of the capturing parentheses,
# the list contains the delimiters as well

['foo', ',', 'bar', '  ;  ', 'baz', ' / ', 'qux']

This time, the return list contains not only the substrings `'foo'`, `'bar'`, `'baz'`, and `'qux'` but also several delimiter strings:
* `','`
* `'  ;  '`
* `' / '`



This can be useful if you want to split `<string>` apart into delimited tokens, process the tokens in some way, then piece the string back together using the same delimiters that originally separated them:

In [48]:
string = 'foo,bar ; baz / qux'

In [49]:
regex = r'(\s*[,;/]\s*)'
# the parentheses make re.split() keep the delimiters in the result

In [50]:
a = re.split(regex, string)

In [51]:
# list of tokens and delimiters
a

['foo', ',', 'bar', ' ; ', 'baz', ' / ', 'qux']

In [52]:
# Enclose each token in angle brackets
for i, s in enumerate(a):
    
    # This will be True for the tokens but not delimiters
    if not re.fullmatch(regex,s):
        a[i] = f'<{s}>'
        


In [53]:
# Put the tokens back together using the same delimiters
''.join(a)

'<foo>,<bar> ; <baz> / <qux>'

If you need to use groups but don’t want the delimiters included in the return list, then you can use noncapturing groups:

In [54]:
string = 'foo,bar  ;  baz / que'
regex = r'(?:\s*[,;/]\s*)'
re.split(regex, string)

# \s* matches optional whitespace
# [,;/] matches any one of these characters

['foo', 'bar', 'baz', 'que']

If the optional maxsplit argument is present and greater than zero, then `re.split()` performs at most that many splits. The final element in the return list is the remainder of `<string>` after all the splits have occurred:

In [55]:
s = 'foo, bar, baz, qux, quux, corge'

In [56]:
re.split(r',\s*', s)

['foo', 'bar', 'baz', 'qux', 'quux', 'corge']

In [57]:
re.split(r',\s*', s, maxsplit=3)
# maxsplit=3 allows at most three splits:
# element 1: foo
# element 2: bar
# element 3: baz
# element 4 (the remainder): qux, quux, corge

['foo', 'bar', 'baz', 'qux, quux, corge']

If `<regex>` contains capturing groups so that the return list includes delimiters, and `<regex>` matches the start of `<string>`, then re.split() places an empty string as the first element in the return list. Similarly, the last item in the return list is an empty string if `<regex>` matches the end of `<string>`:
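A minimal sketch of this behavior, assuming a sample string of `'/foo/bar/baz/'` (chosen here for illustration) and a single captured slash as the delimiter:

```python
import re

parts = re.split(r'(/)', '/foo/bar/baz/')

# The delimiter matches at both ends, so the list starts and ends with ''
print(parts)  # ['', '/', 'foo', '/', 'bar', '/', 'baz', '/', '']
```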

In this case, the `<regex>` delimiter is a single slash (/) character. In a sense, then, there’s an empty string to the left of the first delimiter and to the right of the last one. So it makes sense that `re.split()` places empty strings as the first and last elements of the return list.

## `re.escape(<regex>)`

Escapes characters in a regex.

`re.escape(<regex>)` returns a copy of `<regex>` with each character that could have special meaning in a regex preceded by a backslash. (Before Python 3.7, it escaped every character other than ASCII letters, digits, and the underscore.)

This is useful if you’re calling one of the `re` module functions, and the `<regex>` you’re passing in has a lot of special characters that you want the parser to take literally instead of as metacharacters. It saves you the trouble of putting in all the backslash characters manually:

In [58]:
print(re.match('foo^bar(baz)|qux', 'foo^bar(baz)|qux'))
# there isn’t a match because the regex 'foo^bar(baz)|qux'
# contains special characters that behave as metacharacters

None


In [59]:
re.match(r'foo\^bar\(baz\)\|qux', 'foo^bar(baz)|qux')
# the special characters ^, (, ), and | are explicitly
# escaped with backslashes, so a match occurs

<re.Match object; span=(0, 16), match='foo^bar(baz)|qux'>

In [60]:
re.escape('foo^bar(baz)|qux') == r'foo\^bar\(baz\)\|qux'
# re.escape() produces the same escaped regex

True

In [61]:
re.match(re.escape('foo^bar(baz)|qux'), 'foo^bar(baz)|qux')
# so you can achieve the same effect using re.escape()

<re.Match object; span=(0, 16), match='foo^bar(baz)|qux'>
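A common use is searching for text that arrives at run time and should be treated literally. The following sketch assumes a hypothetical helper named `count_literal`; it is an illustration, not part of the `re` module:

```python
import re

def count_literal(needle, haystack):
    """Count non-overlapping occurrences of needle, treated as literal text."""
    pattern = re.escape(needle)  # neutralize any metacharacters in needle
    return len(re.findall(pattern, haystack))

text = 'a.b axb a.b'

print(count_literal('a.b', text))     # 2: the dot is matched literally
print(len(re.findall('a.b', text)))   # 3: an unescaped dot matches any character
```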

### Compiled Regex Objects in Python
The `re` module supports the capability to precompile a regex in Python into a **regular expression object** that can be repeatedly used later.

## `re.compile(<regex>, flags=0)`

Compiles a regex into a regular expression object.

`re.compile(<regex>)` compiles `<regex>` and returns the corresponding regular expression object. If you include a `<flags>` value, then the corresponding flags apply to any searches performed with the object.

There are two ways to use a compiled regular expression object. You can specify it as the first argument to the `re` module functions in place of `<regex>`:

`re_obj = re.compile(<regex>, <flags>)`
<br>`result = re.search(re_obj, <string>)`

You can also invoke a method directly from a regular expression object:

`re_obj = re.compile(<regex>, <flags>)`
<br>`result = re_obj.search(<string>)`

Both of the examples above are equivalent to this:

`result = re.search(<regex>, <string>, <flags>)`

Here’s one of the examples you saw previously, recast using a compiled regular expression object:

In [62]:
re.search(r'(\d+)', 'foo123bar')


<re.Match object; span=(3, 6), match='123'>

In [63]:
re_obj = re.compile(r'(\d+)')
# re.compile() stores the compiled regex in a variable that can stand in for <regex>

In [64]:
re.search(re_obj, 'foo123bar')
# here re_obj works as the <regex> 

<re.Match object; span=(3, 6), match='123'>

In [65]:
# i.e 1
r1 = re.search('ba[rz]', 'FOOBARBAZ', flags=re.I)
# this is without re.compile 

In [66]:
#i.e2 
re_obj = re.compile('ba[rz]', flags=re.I)
# create a regular expression object with re.compile();
# it can carry flags as well

In [67]:
# i.e 3
r2 = re.search(re_obj, 'FOOBARBAZ')
# pass the compiled object in place of <regex>

In [68]:
# i.e 4
r3 = re_obj.search('FOOBARBAZ')
# call the .search() method directly on the compiled object

In [69]:
print("printing 1st variable:{}".format(r1))

printing 1st variable:<re.Match object; span=(3, 6), match='BAR'>


In [70]:
print("printing 2nd variable:{}".format(r2))
# r2 came from passing the compiled object to re.search()

printing 2nd variable:<re.Match object; span=(3, 6), match='BAR'>


In [71]:
print("printing 3rd variable:{}".format(r3))
# r3 came from calling .search() on the compiled object

printing 3rd variable:<re.Match object; span=(3, 6), match='BAR'>


## Why Bother Compiling a Regex?
What good is precompiling? There are a couple of possible advantages.

If you use a particular regex in your Python code frequently, then precompiling allows you to separate out the regex definition from its uses. This enhances modularity. Consider this example:

In [72]:
s1, s2, s3, s4 = 'foo.bar', 'foo123bar', 'baz99', 'qux & grault'

In [73]:
print(re.search(r'\d+', s1))
# s1 contains no digits, so there is no match

None


In [74]:
re.search(r'\d+', s2)
# matches the digits in s2

<re.Match object; span=(3, 6), match='123'>

In [75]:
re.search(r'\d+', s3)
# matches the digits in s3

<re.Match object; span=(3, 5), match='99'>

In [76]:
re.search(r'\d+', s4)
# s4 does not contain any digits,
# so there is no match

Here, the regex `\d+` appears several times. If, in the course of maintaining this code, you decide you need a different regex, then you’ll need to change it in each location. That’s not so bad in this small example because the uses are close to one another. But in a larger application, they might be widely scattered and difficult to track down.

The following is more modular and more maintainable:

In [77]:
s1, s2, s3, s4 = 'foo.bar', 'foo123bar', 'baz99', 'qux & grault'
re_obj = re.compile(r'\d+')
# compile the regex once and reuse the object

In [78]:
# print(re.search('\d+', s1))
# vs 
re_obj.search(s1)

In [79]:
# re.search('\d+', s2)
# vs
re_obj.search(s2)

<re.Match object; span=(3, 6), match='123'>

In [80]:
# re.search('\d+', s3)
# vs
re_obj.search(s3)

<re.Match object; span=(3, 5), match='99'>

In [81]:
# re.search('\d+', s4)
# vs
re_obj.search(s4)

Then again, you can achieve similar modularity without precompiling by using variable assignment:

In [82]:
s1, s2, s3, s4 = 'foo.bar', 'foo123bar', 'baz99', 'qux & grault'
regex = r'\d+'

In [83]:
re.search(regex, s1)

In [84]:
re.search(regex, s2)

<re.Match object; span=(3, 6), match='123'>

In [85]:
re.search(regex, s3)

<re.Match object; span=(3, 5), match='99'>

In [86]:
re.search(regex, s4)


In theory, you might expect precompilation to result in faster execution time as well. Suppose you call `re.search()` many thousands of times on the same regex. It might seem like compiling the regex once ahead of time would be more efficient than recompiling it each of the thousands of times it’s used.



In practice, though, that isn’t the case. The truth is that the `re` module compiles and caches a regex when it’s used in a function call.

If the same regex is used subsequently in the same Python code, then it isn’t recompiled. The compiled value is fetched from cache instead. So the performance advantage is minimal.

All in all, there isn’t any immensely compelling reason to compile a regex in Python.

Like much of Python, it’s just one more tool in your toolkit that you can use if you feel it will improve the readability or structure of your code.
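One way to check the performance claim on a particular machine is to time both styles with `timeit`. This is only a rough sketch: the iteration count is arbitrary, and the numbers depend on the Python version and hardware, so no expected output is shown.

```python
import re
import timeit

text = 'foo123bar'
re_obj = re.compile(r'\d+')

# Module-level function: the regex is looked up in re's internal cache each call
t_module = timeit.timeit(lambda: re.search(r'\d+', text), number=100_000)

# Precompiled object: the method is called directly, with no cache lookup
t_compiled = timeit.timeit(lambda: re_obj.search(text), number=100_000)

print(f'module-level: {t_module:.3f}s, precompiled: {t_compiled:.3f}s')
```

Any gap that remains comes from the cache lookup and the extra function call, not from recompiling the regex.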

## Regular Expression Object Methods

A compiled regular expression object `re_obj` supports the following methods:

* `re_obj.search(<string>[, <pos>[, <endpos>]])`
* `re_obj.match(<string>[, <pos>[, <endpos>]])`
* `re_obj.fullmatch(<string>[, <pos>[, <endpos>]])`
* `re_obj.findall(<string>[, <pos>[, <endpos>]])`
* `re_obj.finditer(<string>[, <pos>[, <endpos>]])`

These all behave the same way as the corresponding `re` functions that you’ve already encountered, with the exception that they also support the optional `<pos>` and `<endpos>` parameters. If these are present, then the search only applies to the portion of `<string>` indicated by `<pos>` and `<endpos>`, which act the same way as indices in slice notation:

In [87]:
# ie.1

re_obj = re.compile(r'\d+')
s = 'foo123barbaz'

In [88]:
# ie.2
re_obj.search(s)

<re.Match object; span=(3, 6), match='123'>

In [89]:
# ie.3

s[6:9]

'bar'

In [90]:
# ie.4

print(re_obj.search(s, 6, 9))
# same range as the slice in ie.3, so only 'bar' is searched

None


In the above example, the regex is `\d+`, a sequence of digit characters. The `re_obj.search(s)` call searches all of `s`, so there’s a match. In the `re_obj.search(s, 6, 9)` call, the `<pos>` and `<endpos>` parameters effectively restrict the search to the substring starting with character 6 and going up to but not including character 9 (the substring `'bar'`), which doesn’t contain any digits.

If you specify `<pos>` but omit `<endpos>`, then the search applies to the substring from `<pos>` to the end of the string.

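Note that `<pos>` doesn’t move the caret (`^`) anchor: it still refers to the start of the entire string, not the start of the substring determined by `<pos>`. The end is treated differently: per the `re` documentation, a search with `<endpos>` behaves as if the string were `<endpos>` characters long, so the dollar sign (`$`) can match there. The following example shows the caret case: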

In [91]:
re_obj = re.compile('^bar')

In [92]:
s = 'foobarbaz'

In [93]:
s[3:]

'barbaz'

In [94]:
print(re_obj.search(s, 3))

None


Here, even though `'bar'` does occur at the start of the substring beginning at character 3, it isn’t at the start of the entire string, so the caret (^) anchor fails to match.
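The sketch below contrasts three related cases on the same string. It relies on the behavior described in the `re` documentation: `.match()` anchors at `<pos>`, and a search with `<endpos>` acts as if the string ended there. The variable names are illustrative.

```python
import re

s = 'foobarbaz'

# ^ is tied to the real start of the string, so it fails when pos=3
caret = re.compile(r'^bar').search(s, 3)

# .match() anchors at pos itself, so this succeeds
at_pos = re.compile(r'bar').match(s, 3)

# With endpos=6 the search acts as if s were 6 characters long,
# so $ can match right after 'bar'
dollar = re.compile(r'bar$').search(s, 0, 6)

print(caret)
print(at_pos)
print(dollar)
```

When a match must start exactly at `<pos>`, `.match()` is therefore a better fit than a caret in the regex.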

The following methods are available for a compiled regular expression object `re_obj` as well:



* `re_obj.split(<string>, maxsplit=0)`
* `re_obj.sub(<repl>, <string>, count=0)`
* `re_obj.subn(<repl>, <string>, count=0)`

These also behave analogously to the corresponding `re` functions, but they don’t support the `<pos>` and `<endpos>` parameters.

## Regular Expression Object Attributes

The re module defines several useful attributes for a compiled regular expression object:

|Attribute|	Meaning|
| ----------- | ----------- |
|`re_obj.flags`|	Any `<flags>` that are in effect for the regex|
|`re_obj.groups`|	The number of capturing groups in the regex|
|`re_obj.groupindex`|	A dictionary mapping each symbolic group name defined by the `(?P<name>)` construct (if any) to the corresponding group number|
|`re_obj.pattern`|	The `<regex>` pattern that produced this object|

The code below demonstrates some uses of these attributes:

In [95]:
# i.e 1
re_obj = re.compile(r'(?m)(\w+),(\w+)', re.I)
re_obj.flags

42

In [96]:
re.I|re.M|re.UNICODE
# the value of re_obj.flags is the bitwise OR
# of these three values, which equals 42

re.IGNORECASE|re.UNICODE|re.MULTILINE

In [98]:
re_obj.groups

2

In [99]:
re_obj.pattern

'(?m)(\\w+),(\\w+)'

In [100]:
re_obj = re.compile(r'(?P<w1>),(?P<w2>)')
# (?P<name>...) assigns a symbolic name to a group
# The value of the .groupindex attribute is technically
# an object of type mappingproxy,
# but it functions like a dictionary.

In [101]:
re_obj.groupindex

mappingproxy({'w1': 1, 'w2': 2})

In [102]:
re_obj.groupindex['w1']

1

In [103]:
re_obj.groupindex['w2']

2

Note that `.flags` includes any flags specified as arguments to `re.compile()`, any specified within the regex with the `(?flags)` metacharacter sequence, and any that are in effect by default.

In the regular expression object compiled above from `r'(?m)(\w+),(\w+)'` with `re.I`, there are three flags defined:

1. **re.I:** Specified as a `<flags>` value in the `re.compile()` call
2. **re.M:** Specified as `(?m)` within the regex
3. **re.UNICODE:** Enabled by default
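Since `.flags` is a combination of bit values, an individual flag can be tested with a bitwise AND. A short sketch using the same regex and flags as the example above:

```python
import re

re_obj = re.compile(r'(?m)(\w+),(\w+)', re.I)

has_ignorecase = bool(re_obj.flags & re.I)   # set through the flags argument
has_multiline = bool(re_obj.flags & re.M)    # set inline with (?m)
has_dotall = bool(re_obj.flags & re.S)       # never set

print(has_ignorecase, has_multiline, has_dotall)  # True True False
print(re_obj.flags == (re.I | re.M | re.U))       # True
```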

## Match Object Methods and Attributes

As you’ve seen, most functions and methods in the `re` module return a **match object** when there’s a successful match. Because a match object is truthy, you can use it in a conditional:

In [104]:
m = re.search('bar', "foo.bar.baz")

In [105]:
m
# the match covers only the text matched by the regex 'bar'

<re.Match object; span=(4, 7), match='bar'>

In [108]:
bool(m)
# bool() returns True when a match occurred

True

In [109]:
if re.search('bar', 'foo.bar.baz'):
    print('Found a match')

# the match object is truthy, so the condition succeeds

Found a match


But match objects also contain quite a bit of handy information about the match.

You’ve already seen some of it: the `span=` and `match=` data that the interpreter shows when it displays a match object.

You can obtain much more from a match object using its methods and attributes.

## Match Object Methods

The table below summarizes the methods that are available for a match object `match`:


|Method|	Returns|
| ----------- | ----------- |
|`match.group()`|	The specified captured group or groups from match|
|`match.__getitem__()`|	A captured group from match|
|`match.groups()`|	All the captured groups from match|
|`match.groupdict()`|	A dictionary of named captured groups from match|
|`match.expand()`|	The result of performing backreference substitutions from match|
|`match.start()`|	The starting index of match|
|`match.end()`|	The ending index of match|
|`match.span()`|	Both the starting and ending indices of match as a tuple|

The following sections describe these methods in more detail.

## `match.group([<group1>, ...])`
Returns the specified captured group(s) from a match.

For numbered groups, `match.group(n)` returns the `n`th group:

In [111]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [112]:
m.group(1)

'foo'

In [113]:
m.group(2)

'bar'

In [114]:
m.group(3)

'baz'

If you capture groups using `(?P<name><regex>)`, then `match.group(<name>)` returns the corresponding named group:

In [118]:
m = re.match(r'(?P<w1>\w+),(?P<w2>\w+),(?P<w3>\w+)','quux,corge,grault')


In [120]:
m.group('w1')

'quux'

In [121]:
m.group('w2')

'corge'

In [122]:
m.group('w3')

'grault'

With more than one argument, `.group()` returns a tuple of all the groups specified.

A given group can appear multiple times, and you can specify any captured groups in any order:

In [124]:
m = re.search(r'(\w+),(\w+),(\w+)','foo,bar,baz')

In [125]:
m.group(1, 3)

('foo', 'baz')

In [127]:
m.group(3, 3, 1, 1, 2, 2)
# a group can be requested as many times as you like

('baz', 'baz', 'foo', 'foo', 'bar', 'bar')

In [128]:
m = re.match(r'(?P<w1>\w+),(?P<w2>\w+),(?P<w3>\w+)','quux,corge,grault' )

In [129]:
m.group("w3", "w1", "w1", 'w2', 'w2')

('grault', 'quux', 'quux', 'corge', 'corge')

If you specify a group that’s out of range or nonexistent, then `.group()` raises an `IndexError` exception:

In [133]:
m = re.search(r'(\w+),(\w+),(\w+)','foo,bar,baz')
# m.group(4)
# there is no group 4, so this raises
# IndexError: no such group

In [134]:
m = re.match(r'(?P<w1>\w+),(?P<w2>\w+),(?P<w3>\w+)','quux,corge,grault' )

In [136]:
# m.group('foo')
# there is no group named 'foo', so this also raises IndexError
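When a group number or name is not known to be valid ahead of time, the `IndexError` can be caught. The helper below, `safe_group`, is a hypothetical name used only for this sketch:

```python
import re

def safe_group(match, group, default=None):
    """Return match.group(group), or default if no such group exists."""
    try:
        return match.group(group)
    except IndexError:
        return default

m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

print(safe_group(m, 2))             # bar
print(safe_group(m, 4))             # None
print(safe_group(m, 'foo', 'n/a'))  # n/a
```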

It’s possible for a regex in Python to match as a whole but to contain a group that doesn’t participate in the match. In that case, `.group()` returns `None` for the nonparticipating group. Consider this example:

In [147]:
m = re.search(r'(\w+),(\w+),(\w+)?', "foo,bar,")
# the trailing comma matters: the regex requires it,
# but nothing follows it for group 3 to match
# the question mark (?) quantifier follows the third group,
# so that group is optional and shows up as None

In [148]:
print(m)

<re.Match object; span=(0, 8), match='foo,bar,'>


In [149]:
m.group(1,2)

('foo', 'bar')

In [143]:
m.group(1,2,3)

('foo', 'bar', None)

This regex matches, as you can see from the match object. The first two captured groups contain `'foo'` and `'bar'`, respectively.

A question mark (`?`) quantifier metacharacter follows the third group, though, so that group is optional. A match will occur if there’s a third sequence of word characters following the second comma (`,`) but also if there isn’t.

In this case, there isn’t. So there is a match overall, but the third group doesn’t participate in it. As a result, `m.group(3)` is still defined and is a valid reference, but it returns `None`:

In [150]:
print(m.group(3))


None


It can also happen that a group participates in the overall match multiple times.

If you call `.group()` for that group number, then it returns only the part of the search string that matched the last time.

The earlier matches aren’t accessible:

In [151]:
m = re.match(r'(\w{3},)+','foo,bar,baz,qux')

In [152]:
m

<re.Match object; span=(0, 12), match='foo,bar,baz,'>

In [153]:
m.group(1)

'baz,'

In this example, the full match is `'foo,bar,baz,'`, as shown by the displayed match object.

Each of `'foo,'`, `'bar,'`, and `'baz,'` matches what’s inside the group, but `m.group(1)` returns only the last match, `'baz,'`.
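If every repetition is needed, one possible workaround is to run `re.findall()` with the inner regex over the text of the full match. This is a sketch of that idea rather than a feature of match objects:

```python
import re

m = re.match(r'(\w{3},)+', 'foo,bar,baz,qux')

last = m.group(1)                          # only the final repetition
every = re.findall(r'\w{3},', m.group())   # all repetitions in the full match

print(last)   # baz,
print(every)  # ['foo,', 'bar,', 'baz,']
```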

If you call `.group()` with an argument of `0` or no argument at all, then it returns the entire match:

In [156]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [157]:
m

<re.Match object; span=(0, 11), match='foo,bar,baz'>

In [158]:
m.group()

'foo,bar,baz'

In [159]:
# or
m.group(0)

'foo,bar,baz'

This is the same data the interpreter shows following `match=` when it displays the match object.

## `match.__getitem__(<grp>)`

Returns a captured group from a match.

`match.__getitem__(<grp>)` is identical to `match.group(<grp>)` and returns the single group specified by `<grp>`:

In [160]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [161]:
m.group(2)

'bar'

In [162]:
m.__getitem__(2)
# Same result as m.group(2). You probably wouldn't call this
# method directly, but you might call it indirectly.

'bar'

If `.__getitem__()` simply replicates the functionality of `.group()`, then why would you use it? You probably wouldn’t directly, but you might indirectly. Read on to see why.

## A Brief Introduction to Magic Methods

`.__getitem__()` is one of a collection of methods in Python called magic methods. These are special methods that the interpreter calls when a Python statement contains specific corresponding syntactical elements.

Note: Magic methods are also referred to as dunder methods because of the double underscore (`__`) at the beginning and end of the method name.

Later in this series, there are several tutorials on object-oriented programming. You’ll learn much more about magic methods there.

The particular syntax that `.__getitem__()` corresponds to is indexing with square brackets. For any object `obj`, whenever you use the expression `obj[n]`, behind the scenes Python quietly translates it to a call to `.__getitem__()`. The following expressions are effectively equivalent:

`obj[n]`
`obj.__getitem__(n)`

The syntax `obj[n]` is only meaningful if a `.__getitem__()` method exists for the class or type to which `obj` belongs. Exactly how Python interprets `obj[n]` will then depend on the implementation of `.__getitem__()` for that class.
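
To make this concrete, here is a small sketch with a made-up class. `Squares` is hypothetical and exists only for illustration; it defines `.__getitem__()` so that indexing returns the square of the index:

```python
class Squares:
    """Hypothetical example class: obj[n] returns n squared."""

    def __getitem__(self, n):
        return n * n


obj = Squares()

via_brackets = obj[4]              # square-bracket syntax
via_method = obj.__getitem__(4)    # explicit method call

print(via_brackets, via_method)    # 16 16
```

Both expressions end up running the same method, which is why they produce the same value.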

## Back to Match Objects
As of Python version 3.6, the `re` module does implement `.__getitem__()` for match objects. The implementation is such that `match.__getitem__(n)` is the same as `match.group(n)`.

The result of all this is that, instead of calling `.group()` directly, you can access captured groups from a match object using square-bracket indexing syntax:

In [163]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [164]:
m.group(2)

'bar'

In [165]:
m.__getitem__(2)

'bar'

In [166]:
m[2]

'bar'

This works with named captured groups as well:

In [167]:
m = re.match(
    r'foo,(?P<w1>\w+),(?P<w2>\w+),qux',
    'foo,bar,baz,qux')
m.group('w2')

'baz'

In [169]:
m["w2"]

'baz'

This is something you could achieve by calling `.group()` explicitly, but it’s a handy shortcut notation nonetheless.

When a programming language provides alternate syntax that isn’t strictly necessary but allows for the expression of something in a cleaner, easier-to-read way, it’s called syntactic sugar. For a match object, `match[n]` is syntactic sugar for `match.group(n)`.

## `match.groups(default=None)`
Returns all captured groups from a match.

`match.groups()` returns a tuple of all captured groups:

In [171]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [172]:
m.groups()

('foo', 'bar', 'baz')

As you saw previously, when a group in a regex in Python doesn’t participate in the overall match, `.group()` returns `None` for that group. By default, `.groups()` does likewise.

If you want `.groups()` to return something else in this situation, then you can use the default keyword argument:

In [178]:
m = re.search(r'(\w+),(\w+),(\w+)?', 'foo,bar,')
# The ? makes the third group optional, so the regex matches
# even though nothing follows the second comma.

In [175]:
m

<re.Match object; span=(0, 8), match='foo,bar,'>

In [176]:
print(m.group(3))

None


In [177]:

m.groups()

('foo', 'bar', None)

In [180]:
m.groups(default='---')
# Nonparticipating groups are reported as '---' instead of None.

('foo', 'bar', '---')

Here, the third `(\w+)` group doesn’t participate in the match because the question mark (`?`) metacharacter makes it optional, and the string `'foo,bar,'` doesn’t contain a third sequence of word characters.

By default, `m.groups()` returns `None` for the third group, as the first `m.groups()` result shows. In the result of `m.groups(default='---')`, you can see that specifying `default='---'` causes it to return the string `'---'` instead.

There isn’t any corresponding default keyword for `.group()`. It always returns `None` for nonparticipating groups.
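
If you want a fallback value for a single group anyway, you have to supply it yourself. The sketch below shows two possible ways to do that; neither is a feature of `.group()` itself, and the variable names are illustrative:

```python
import re

m = re.search(r'(\w+),(\w+),(\w+)?', 'foo,bar,')

# Option 1: substitute the fallback yourself.
value = m.group(3)
third = value if value is not None else '---'

# Option 2: let .groups() apply the default, then index the tuple.
# Tuple indices are zero-based, so group 3 sits at index 2.
third_via_groups = m.groups(default='---')[2]

print(third, third_via_groups)   # --- ---
```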

## `match.groupdict(default=None)`

Returns a dictionary of named captured groups.

`match.groupdict()` returns a dictionary of all named groups captured with the `(?P<name><regex>)` metacharacter sequence. The dictionary keys are the group names and the dictionary values are the corresponding group values:



In [183]:
m = re.match(
    r'foo,(?P<w1>\w+),(?P<w2>\w+),qux',
    'foo,bar,baz,qux')

In [184]:
m.groupdict()
# Returns a dictionary keyed by group name.

{'w1': 'bar', 'w2': 'baz'}

In [185]:
m.groupdict()['w2']

'baz'

As with `.groups()`, for `.groupdict()` the default argument determines the return value for nonparticipating groups:

In [187]:
m = re.match(
    r'foo,(?P<w1>\w+),(?P<w2>\w+)?,qux',
    'foo,bar,,qux')
# The ? after the second named group makes it optional.

In [188]:
m.groupdict()


{'w1': 'bar', 'w2': None}

You can change `None` to `'---'` by passing the `default='---'` argument:

In [190]:
m.groupdict(default='---')

{'w1': 'bar', 'w2': '---'}

Again, the final group `(?P<w2>\w+)` doesn’t participate in the overall match: the question mark (`?`) metacharacter makes it optional, and the search string has nothing between the last two commas. By default, `m.groupdict()` returns `None` for this group, but you can change it with the `default` argument.
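
One place where `.groupdict()` with a default is convenient is turning simple text records into dictionaries. The record format, the group names `name` and `city`, and the fallback `'unknown'` below are all assumptions made up for this sketch:

```python
import re

# Hypothetical record format: "<name>,<city>", where the city may be missing.
pattern = r'(?P<name>\w+),(?P<city>\w+)?'

rows = ['alice,paris', 'bob,']
parsed = [re.match(pattern, row).groupdict(default='unknown') for row in rows]

print(parsed)
# [{'name': 'alice', 'city': 'paris'}, {'name': 'bob', 'city': 'unknown'}]
```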

## `match.expand(<template>)`
Performs backreference substitutions from a match.

`match.expand(<template>)` returns the string that results from performing backreference substitution on `<template>`, exactly as `re.sub()` would do:

In [192]:
m = re.search(r'(\w+),(\w+),(\w+)','foo,bar,baz')

In [193]:
m

<re.Match object; span=(0, 11), match='foo,bar,baz'>

In [194]:
m.groups()

('foo', 'bar', 'baz')

In [195]:
m.expand(r'\2')

'bar'

In [197]:
m=re.search(r'(?P<num>\d+)','foo123qux')

In [198]:
m

<re.Match object; span=(3, 6), match='123'>

In [199]:
m.group(1)

'123'

In [202]:
m.expand(r'--- \g<num> ---')
# Works like the replacement string in re.sub():
# \g<num> is replaced by the text captured by the group named 'num',
# and the surrounding '--- ' and ' ---' are kept literally.

'--- 123 ---'

As the examples above show, this works for numeric backreferences such as `\2` and also for named backreferences such as `\g<num>`.
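
The relationship to `re.sub()` can be sketched as follows. The template `r'\3-\2-\1'` and the second search string are just examples chosen for illustration. Note one difference: `.expand()` returns only the filled-in template, while `re.sub()` also keeps the text around the match:

```python
import re

regex = r'(\w+),(\w+),(\w+)'
template = r'\3-\2-\1'

# When the match covers the whole string, both results agree.
s = 'foo,bar,baz'
expanded = re.search(regex, s).expand(template)
substituted = re.sub(regex, template, s)
print(expanded, substituted)   # baz-bar-foo baz-bar-foo

# When the match is only part of the string, re.sub() keeps the rest.
s2 = 'x foo,bar,baz y'
expanded2 = re.search(regex, s2).expand(template)
substituted2 = re.sub(regex, template, s2)
print(expanded2)               # baz-bar-foo
print(substituted2)            # x baz-bar-foo y
```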

## `match.start([<grp>])`

## `match.end([<grp>])`

Return the starting and ending indices of the match.

`match.start()` returns the index in the search string where the match begins, and `match.end()` returns the index immediately after where the match ends:

In [203]:
s = 'foo123bar456baz'

In [204]:
m = re.search(r'\d+', s)
m

<re.Match object; span=(3, 6), match='123'>

In [205]:
m.start()
# Index where the match begins

3

In [206]:
m.end()

6

When Python displays a match object, these are the values listed with the `span=` keyword, as shown in the `span=(3, 6)` output above. They behave like string-slicing values, so if you use them to slice the original search string, then you should get the matching substring:

In [207]:
m

<re.Match object; span=(3, 6), match='123'>

In [208]:
s[m.start():m.end()]

'123'

In [209]:
s[3:6]

'123'

`match.start(<grp>)` and `match.end(<grp>)` return the starting and ending indices of the substring matched by `<grp>`, which may be a numbered or named group:

In [210]:
s = 'foo123bar456baz'

In [218]:
m = re.search(r'(\d+)\D*(?P<num>\d+)', s)
# \D* matches zero or more non-digit characters between the two numbers.
# The regex defines two groups: group 1 is unnamed,
# and group 2 is named 'num'.

In [212]:
m.group(1)

'123'

In [213]:
m.start(1), m.end(1)

(3, 6)

In [214]:
s[m.start(1):m.end(1)]

'123'

In [215]:
m.group('num')

'456'

In [216]:
m.start('num'), m.end('num')

(9, 12)

In [217]:
s[m.start('num'):m.end('num')]

'456'

If the specified group matches a null string, then `.start()` and `.end()` are equal:

In [221]:
m = re.search(r'foo(\d*)bar', 'foobar')
# \d* matches zero or more digits, so group 1 can match an empty string.

In [223]:
m

<re.Match object; span=(0, 6), match='foobar'>

In [230]:
m[1]
# Group 1 matched the empty string.

''

In [222]:
m.start(1), m.end(1)


(3, 3)

This makes sense if you remember that `.start()` and `.end()` act like slicing indices. Any string slice where the beginning and ending indices are equal will always be an empty string.

A special case occurs when the regex contains a group that doesn’t participate in the match:

In [232]:
m = re.search(r'(\w+),(\w+),(\w+)?',"foo,bar,")

In [233]:
m

<re.Match object; span=(0, 8), match='foo,bar,'>

In [234]:
print(m.group(3))

None


In [235]:
m.start(3), m.end(3)

(-1, -1)

As you’ve seen previously, in this case the third group doesn’t participate. `m.start(3)` and `m.end(3)` aren’t really meaningful here, so they return `-1`.
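
One practical consequence is that slicing with these `-1` values silently produces an empty string, which looks like an empty match rather than a missing group. The sketch below guards against that; `group_slice` is a hypothetical helper written for this example, not part of the `re` module:

```python
import re

s = 'foo,bar,'
m = re.search(r'(\w+),(\w+),(\w+)?', s)


def group_slice(match, text, grp):
    """Hypothetical helper: return text[start:end] for a group, or None."""
    start, end = match.start(grp), match.end(grp)
    if start == -1:
        # The group did not participate, so there is nothing to slice.
        return None
    return text[start:end]


naive = s[m.start(3):m.end(3)]   # s[-1:-1] is an empty string
print(repr(naive))               # ''
print(group_slice(m, s, 2))      # bar
print(group_slice(m, s, 3))      # None
```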

## `match.span([<grp>])`
Returns both the starting and ending indices of the match.

`match.span()` returns both the starting and ending indices of the match as a tuple. If you specify `<grp>`, then the returned tuple applies to the given group:

In [236]:
s = 'foo123bar456baz'

In [237]:

m = re.search(r'(\d+)\D*(?P<num>\d+)', s)


In [238]:
m

<re.Match object; span=(3, 12), match='123bar456'>

In [239]:
m[0]

'123bar456'

In [240]:
m.span()

(3, 12)

In [241]:
m['num']

'456'

In [242]:
m.span('num')

(9, 12)

The following are effectively equivalent:

* `match.span(<grp>)`
* `(match.start(<grp>), match.end(<grp>))`

`match.span()` just provides a convenient way to obtain both `match.start()` and `match.end()` in one method call.
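
As a quick check of that equivalence, the following sketch loops over the whole match (group `0`), the numbered group, and the named group from the example above and compares the two forms:

```python
import re

s = 'foo123bar456baz'
m = re.search(r'(\d+)\D*(?P<num>\d+)', s)

for grp in (0, 1, 'num'):
    start, end = m.span(grp)
    # span() agrees with start() and end() for every group.
    assert (start, end) == (m.start(grp), m.end(grp))
    print(grp, (start, end), s[start:end])

# 0 (3, 12) 123bar456
# 1 (3, 6) 123
# num (9, 12) 456
```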

## Match Object Attributes

Like a compiled regular expression object, a match object also has several useful attributes available:

|Attribute|Meaning|
|---------|--------|
|match.pos<br>match.endpos|The effective values of the `<pos>` and `<endpos>` arguments for the match|
|match.lastindex|The index of the last captured group|
|match.lastgroup|The name of the last captured group|
|match.re|The compiled regular expression object for the match|
|match.string|The search string for the match|

The following sections provide more detail on these match object attributes.

## `match.pos`

## `match.endpos`

Contain the effective values of `<pos>` and `<endpos>` for the search.

Remember that some methods, when invoked on a compiled regex, accept optional `<pos>` and `<endpos>` arguments that limit the search to a portion of the specified search string. These values are accessible from the match object with the `.pos` and `.endpos` attributes:

In [243]:
re_obj = re.compile(r'\d+')

In [246]:
m = re_obj.search('foo123bar', 2, 7)
# 2 is <pos> and 7 is <endpos>

In [247]:
m

<re.Match object; span=(3, 6), match='123'>

In [248]:
m.pos, m.endpos

(2, 7)

If the `<pos>` and `<endpos>` arguments aren’t included in the call, either because they were omitted or because the function in question doesn’t accept them, then the `.pos` and `.endpos` attributes effectively indicate the start and end of the string:

In [249]:
re_obj = re.compile(r'\d+')

In [253]:
m = re_obj.search('foo123bar')
# <pos> and <endpos> are accepted here but not specified
m

<re.Match object; span=(3, 6), match='123'>

In [254]:
m.pos, m.endpos

(0, 9)

The `re_obj.search()` call above could take `<pos>` and `<endpos>` arguments, but they aren’t specified. A module-level call such as `re.search(r'\d+', 'foo123bar')` can’t take them at all. In either case, `m.pos` and `m.endpos` are `0` and `9`, the starting and ending indices of the search string `'foo123bar'`.
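
The module-level case is easy to check for yourself. This short sketch puts a module-level `re.search()` call next to a compiled-regex call with explicit bounds; the names `m1` and `m2` are illustrative:

```python
import re

s = 'foo123bar'

# Module-level function: there are no <pos>/<endpos> parameters.
m1 = re.search(r'\d+', s)

# Compiled-regex method with explicit <pos> and <endpos>.
m2 = re.compile(r'\d+').search(s, 2, 7)

print(m1.pos, m1.endpos)   # 0 9
print(m2.pos, m2.endpos)   # 2 7
```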

## `match.lastindex`

Contains the index of the last captured group.

`match.lastindex` is equal to the integer index of the last captured group:

In [255]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [256]:
m.lastindex

3

In [257]:
m[m.lastindex]

'baz'

In cases where the regex contains potentially nonparticipating groups, this allows you to determine how many groups actually participated in the match:

In [258]:
m = re.search(r'(\w+),(\w+),(\w+)?', 'foo,bar,baz')

In [259]:
m.groups()

('foo', 'bar', 'baz')

In [260]:
m.lastindex, m[m.lastindex]


(3, 'baz')

In [261]:
m = re.search(r'(\w+),(\w+),(\w+)?', 'foo,bar,')
# The third group is optional because of the ? quantifier.

In [262]:
m.groups()

('foo', 'bar', None)

In [264]:
m.lastindex, m[m.lastindex]
# Here, group 3 didn't participate, so it isn't counted as the last index.

(2, 'bar')

In the first example, the third group, which is optional because of the question mark (`?`) metacharacter, does participate in the match. But in the second example it doesn’t. You can tell because `m.lastindex` is `3` in the first case and `2` in the second.

There’s a subtle point to be aware of regarding `.lastindex`. It isn’t always the case that the last group to match is also the last group encountered syntactically:

In [265]:
m = re.match('((a)(b))', 'ab')

In [266]:
m.groups()

('ab', 'a', 'b')

In [267]:
m.lastindex

1

In [268]:
m[m.lastindex]

'ab'

The outermost group is `((a)(b))`, which matches `'ab'`. This is the first group the parser encounters, so it becomes group 1. But it’s also the last group to match, which is why `m.lastindex` is `1`.
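
A consequence is that `.lastindex` is not a general-purpose count of participating groups. If a count is what you need, one possible approach is to count the entries of `.groups()` that are not `None`. This is only a sketch, and the second regex is an extra example added for illustration:

```python
import re

# Nested groups: lastindex is 1, yet all three groups participated.
m1 = re.match(r'((a)(b))', 'ab')
count1 = sum(g is not None for g in m1.groups())
print(m1.lastindex, count1)   # 1 3

# Optional first group: lastindex is 2, yet only one group participated.
m2 = re.match(r'(a)?(b)', 'b')
count2 = sum(g is not None for g in m2.groups())
print(m2.lastindex, count2)   # 2 1
```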

## `match.lastgroup`

Contains the name of the last captured group.

If the last captured group originates from the `(?P<name><regex>)` metacharacter sequence, then `match.lastgroup` returns the name of that group:

In [269]:
s = 'foo123bar456baz'

In [270]:
m = re.search(r'(?P<n1>\d+)\D*(?P<n2>\d+)', s)

In [271]:
m.lastgroup

'n2'

`match.lastgroup` returns `None` if the last captured group isn’t a named group:

In [272]:
s = 'foo123bar456baz'

In [279]:
m = re.search(r'(\d+)\D*(\d+)', s)
# \D* matches the zero or more non-digit characters between the two numbers.
m

<re.Match object; span=(3, 12), match='123bar456'>

In [280]:
m.groups()


('123', '456')

In [281]:
print(m.lastgroup)

None


In [282]:
m = re.search(r'\d+\D*\d+', s)
# The parentheses are removed, so no groups are created.

In [283]:
m.groups()
# Empty tuple: the regex has no groups.

()

In [284]:
print(m.lastgroup)

None


As shown above, this can be either because the last captured group isn’t a named group or because there were no captured groups at all.
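
A common use of `.lastgroup` is a simple tokenizer: each alternative of the regex is a named group, and `.lastgroup` tells you which alternative matched. The token names and the input string below are assumptions chosen for this sketch:

```python
import re

# Hypothetical token names; each alternative is a named group.
token_re = re.compile(r'(?P<NUMBER>\d+)|(?P<WORD>[A-Za-z]+)|(?P<OP>[+*/-])')

# finditer() skips the spaces because no alternative matches them.
tokens = [(m.lastgroup, m.group()) for m in token_re.finditer('width * 2 + 10')]

print(tokens)
# [('WORD', 'width'), ('OP', '*'), ('NUMBER', '2'), ('OP', '+'), ('NUMBER', '10')]
```

Since exactly one named group participates in each match, `.lastgroup` is always the name of the alternative that succeeded.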

## `match.re`

Contains the regular expression object for the match.

`match.re` contains the regular expression object that produced the match. This is the same object you’d get if you passed the regex to `re.compile()`:

In [285]:
regex = r'(\w+),(\w+),(\w+)'
# The regex defines three groups.

In [286]:
m1 = re.search(regex, 'foo,bar,baz')

In [287]:
m1

<re.Match object; span=(0, 11), match='foo,bar,baz'>

In [288]:
m1.re

re.compile(r'(\w+),(\w+),(\w+)', re.UNICODE)

In [289]:
re_obj = re.compile(regex)
# Compile the same regex into a regular expression object.

In [290]:
re_obj

re.compile(r'(\w+),(\w+),(\w+)', re.UNICODE)

In [292]:
re_obj is m1.re

True

In [293]:
m2 = re_obj.search('qux,quux,corge')

In [294]:
m2

<re.Match object; span=(0, 14), match='qux,quux,corge'>

Remember from earlier that the `re` module caches regular expressions after it compiles them, so they don’t need to be recompiled if used again.

For that reason, as the identity comparison `re_obj is m1.re` shows, `re_obj` and `m1.re` are the exact same object. Because `m2` was produced by calling `re_obj.search()`, `m2.re` is that same object as well.

Once you have access to the regular expression object for the match, all of that object’s attributes are available as well:

In [295]:
m1.re.groups

3

In [296]:
m1.re.pattern

'(\\w+),(\\w+),(\\w+)'

In [297]:
m1.re.pattern == regex

True

In [298]:
m1.re.flags

32

You can also invoke any of the methods defined for a compiled regular expression object:

In [299]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [300]:
m.re

re.compile(r'(\w+),(\w+),(\w+)', re.UNICODE)

In [301]:
m.re.match('quux,corge,grault')

<re.Match object; span=(0, 17), match='quux,corge,grault'>

Here, `.match()` is invoked on `m.re` to perform another search using the same regex but on a different search string.
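
This can be handy when some code only has the match object and not the original pattern. As a rough sketch, the list `others` and the choice of `.fullmatch()` below are illustrative assumptions:

```python
import re

m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

# Reuse the regex that produced m without keeping the pattern string around.
others = ['quux,corge,grault', 'no commas here']
results = [m.re.fullmatch(text) is not None for text in others]

print(results)   # [True, False]
```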

## `match.string`
Contains the search string for a match.

`match.string` contains the search string that is the target of the match:

In [302]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,bar,baz')

In [303]:
m

<re.Match object; span=(0, 11), match='foo,bar,baz'>

In [304]:
m.string

'foo,bar,baz'

In [306]:
re_obj = re.compile(r'(\w+),(\w+),(\w+)')

In [307]:
m = re_obj.search('foo,bar,baz')

In [308]:
m.string

'foo,bar,baz'

As you can see from the example, the `.string` attribute is available when the match object derives from a compiled regular expression object as well.
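
Combined with `.span()`, `.string` lets you recover the text around a match from the match object alone. A minimal sketch, with `before` and `after` as illustrative names:

```python
import re

m = re.search(r'\d+', 'foo123bar456baz')

start, end = m.span()
before = m.string[:start]   # text preceding the match
after = m.string[end:]      # text following the match

print(repr(before), repr(m.group()), repr(after))
# 'foo' '123' 'bar456baz'
```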