A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

<b>Regexes in Python and Their Uses</b><br>
Imagine you have a string object s. Now suppose you need to write Python code to find out whether s contains the substring '123'. There are at least a couple of ways to do this. You could use the in operator:



In [1]:
s = 'foo123bar'
print('123' in s)

True


If you want to know not only whether '123' exists in s but also where it exists, then you can use .find() or .index(). Each of these returns the character position within s where the substring resides:

In [2]:
s = 'foo123bar'
print(s.find('123'))
print(s.index('123'))

3
3


In these examples, the matching is done by a straightforward character-by-character comparison. That will get the job done in many cases. But sometimes, the problem is more complicated than that.

For example, rather than searching for a fixed substring like '123', suppose you wanted to determine whether a string contains any three consecutive decimal digit characters, as in the strings 'foo123bar', 'foo456bar', '234baz', and 'qux678'.
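
Without regular expressions, this check has to be spelled out by hand. A minimal sketch, assuming a hypothetical helper named has_three_digits (not part of the original text), slides over the string and tests every three-character window. Note that str.isdigit() also accepts some non-ASCII digit characters, so this is only an approximation:

```python
def has_three_digits(s):
    """Return True if s contains three consecutive digit characters (hypothetical helper)."""
    # Test every window of three consecutive characters
    return any(s[i:i + 3].isdigit() for i in range(len(s) - 2))

for s in ['foo123bar', 'foo456bar', '234baz', 'qux678', '12foo34']:
    print(s, has_three_digits(s))
```

A regex expresses the same test as a single short pattern, which is what the rest of this tutorial builds up to.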

<b>The re Module</b><br>
Regex functionality in Python resides in a module named re. The re module contains many useful functions and methods, most of which you’ll learn about in the next tutorial in this series.

RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the re module:

In [3]:
import re

In [4]:
# Search the string to see if it starts with "Ali" and ends with "gelir":

txt = "Ali Huseynoglu bugun Bakiya gelir"
x = re.search("^Ali.*gelir$", txt)

if x is not None:
  print("YES! We have a match!")
else:
  print("No match")

YES! We have a match!


<b>RegEx Functions</b><br>
The re module offers a set of functions that allows us to search a string for a match:

findall -> Returns a list containing all matches


re.findall(regex, string) 

**returns a list** of all non-overlapping matches of regex in string. It scans the search string from left to right and returns all matches in the order found:

In [5]:
print(re.findall(r'\w+', '...Azərbaycan,,,,Ümid:%$salam//|'))

['Azərbaycan', 'Ümid', 'salam']


In [6]:
txt = "Əli Hüseynoğlu bugün Bakıya gəlir"
x = re.findall("Ba", txt)
print(x)

['Ba']


In [7]:
txt = "Əli Hüseynoğlu bugün Bakıya gəlir"
x = re.findall("Portugal", txt)
print(x)

[]


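Suppose we know that Əli travels by bus and we want to find the bus number. With \d{1,1}, each digit is matched separately; with \d{1,2}, the two digits are matched together: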

In [8]:
strings='Əli Hüseynoğlu bugün Bakıya 47 nömrəli avtobusla gəlir'
work=re.findall(r'\d{1,1}',strings)
print(work)


['4', '7']


In [9]:
strings='Əli Hüseynoğlu bugün Bakıya 47 nömrəli avtobusla gəlir'
work=re.findall(r'\d{1,2}',strings)
print(work)


['47']


In [10]:
name=re.findall(r'[a-z]*',strings)
print(name)

['', 'li', '', '', '', 'seyno', '', 'lu', '', 'bug', '', 'n', '', '', 'ak', '', 'ya', '', '', '', '', 'n', '', 'mr', '', 'li', '', 'avtobusla', '', 'g', '', 'lir', '']


In [11]:
name=re.findall(r'[AaBbCcÇçDdEeƏəFfGgĞğHhXxIıİiJjKkQqLlMmNnOoÖöPpRrSsŞşTtUuÜüVvYyZz]*',strings)
print(name)

['Əli', '', 'Hüseynoğlu', '', 'bugün', '', 'Bakıya', '', '', '', '', 'nömrəli', '', 'avtobusla', '', 'gəlir', '']
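
The empty strings in the two results above appear because * also matches zero characters at every position where no listed letter is found. A small sketch of two possible alternatives: using + so that at least one letter is required, and using the class [^\W\d_] (word characters that are neither digits nor underscores) so that the Azerbaijani letters do not have to be listed by hand. The variable names are chosen here for illustration:

```python
import re

strings = 'Əli Hüseynoğlu bugün Bakıya 47 nömrəli avtobusla gəlir'

# + requires at least one character, so no empty matches are produced
ascii_runs = re.findall(r'[a-z]+', strings)
print(ascii_runs)

# [^\W\d_] = any word character except digits and underscore, i.e. letters
words = re.findall(r'[^\W\d_]+', strings)
print(words)
```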


In [12]:
s='Əli Hüseynoğlu bugün Bakıya 47 nömrəli avtobusla gəlir'
re.findall(r'\w+', s)

['Əli', 'Hüseynoğlu', 'bugün', 'Bakıya', '47', 'nömrəli', 'avtobusla', 'gəlir']

In [13]:
# look for every word in a string
pattern = re.compile(r"\w+")
result = pattern.findall("Salam, Necəsən")
print (result)


['Salam', 'Necəsən']


In [14]:

# [Salam] is a character class (any one of S, a, l, m), not the word "Salam"
patt = re.compile(r"[Salam]*Necəsən")
res = patt.findall("SalamSalamSalamNecəsən")
print (res)



['SalamSalamSalamNecəsən']


In [15]:

patt = re.compile(r"Salam*Necəsən")
res = patt.findall("SalamSalamSalammmNecəsən")
print (res)



['SalammmNecəsən']
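
The two patterns above are easy to misread. [Salam]* repeats a character class, and Salam* repeats only the final 'm'. To repeat the whole word, one option is a non-capturing group, (?:...). The sketch below contrasts the character class with the group; the test string "maSSlaNecəsən" is made up for illustration:

```python
import re

text = "SalamSalamSalamNecəsən"
scrambled = "maSSlaNecəsən"

# Character class: any mix of the letters S, a, l, m is accepted
class_result = re.findall(r"[Salam]*Necəsən", scrambled)
print(class_result)

# Non-capturing group: zero or more repetitions of the whole word "Salam"
group_result = re.findall(r"(?:Salam)*Necəsən", text)
group_scrambled = re.findall(r"(?:Salam)*Necəsən", scrambled)
print(group_result)
print(group_scrambled)
```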


#### Regular Expressions ####
     ->Regular expressions are patterns that specify a matching rule.
     ->They generally contain a mix of literal text and special characters.
     ->The re module provides regular expression pattern matching and replacement.
     ## Regular expression pattern rules ##
        text    -Match literal text
        .       -Match any character except newline
        ^       -Match the start of a string (^ is the caret sign)
        $       -Match the end of a string
        *       -Match 0 or more repetitions
        +       -Match 1 or more repetitions
        ?       -Match 0 or 1 repetitions
        *?      -Match 0 or more, few as possible
        +?      -Match 1 or more, few as possible
        {m,n}   -Match m to n repetitions
        {m,n}?  -Match m to n repetitions, few as possible
        [...]   -Match a set of characters (may include ranges, e.g. [A-Z])
        [^...]  -Match characters not in set
        A | B   -Match A or B (| is the bar sign)
        (...)   -Match regex in parentheses as a group
      ## Special characters ## (Identifiers:)
      \number   -Matches the text matched by the group with that number
      \A        -Matches start of string
      \b        -Matches empty string at beginning or end of word
      \B        -Matches empty string not at begin or end of word
      \d        -Matches any decimal digit
      \D        -Matches any non-digit
      \s        -Matches any whitespace
      \S        -Matches any non-whitespace
      \w        -Matches any alphanumeric character or underscore
      \W        -Matches characters not in \w
      \Z        -Matches at end of string
      \\        -Literal backslash
      ## White Space characters: ##
      \n        -new line
      \s        -any whitespace character (space, tab, new line, ...)
      \t        -tab
      \x1b      -escape (\e is not supported by the re module)
      \f        -form feed
      \r        -return
      ## re module functions ##
        re.search(pattern, string)                   # Search for a match anywhere in string
        re.match(pattern, string)                    # Check for a match at the start of string
        re.split(pattern, string)                    # Split string on each regex match
        re.findall(pattern, string)                  # Find all matches
        re.sub(pattern, repl, string)                # Replace all matches with repl
## What we need
1- import re
2- re.findall(r'pattern here', string)              # findall takes two arguments (pattern, string)
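
The \number entry in the table is not used elsewhere in this tutorial, so here is a minimal sketch of it. The sample sentence is made up for illustration. The group (\w+) captures a word, and \1 then requires exactly the same text again, which is one way to spot an accidentally doubled word:

```python
import re

# (\w+) captures a word; \1 must then repeat exactly the same text
m = re.search(r'\b(\w+) \1\b', 'this is is a test')
doubled = m.group()    # the whole doubled word
captured = m.group(1)  # the text captured by group 1
print(doubled)
print(captured)
```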


re.finditer(regex, string, flags=0)

Returns an iterator that yields regex matches.

re.finditer(regex, string) scans string for non-overlapping matches of regex and returns an iterator that yields the match objects from any it finds. It scans the search string from left to right and returns matches in the order it finds them:

In [37]:
it = re.finditer(r'\w+', '...foo,,,,bar:%$baz//|')
print(next(it))
print(next(it))
print(next(it))


<re.Match object; span=(3, 6), match='foo'>
<re.Match object; span=(10, 13), match='bar'>
<re.Match object; span=(16, 19), match='baz'>


In [18]:
for i in re.finditer(r'\w+', '...foo,,,,bar:%$baz//|'):
    print(i)

<re.Match object; span=(3, 6), match='foo'>
<re.Match object; span=(10, 13), match='bar'>
<re.Match object; span=(16, 19), match='baz'>


re.findall() and re.finditer() are very similar, but they differ in two respects:

re.findall() returns a list, whereas re.finditer() returns an iterator.

The items in the list that re.findall() returns are the actual matching strings, whereas the items yielded by the iterator that re.finditer() returns are match objects.
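
A short sketch of that difference, using the same sample string as above (the variable names are chosen here for illustration). The match objects from re.finditer() carry positions as well as text, so they can be turned into (text, start, end) tuples:

```python
import re

text = '...foo,,,,bar:%$baz//|'

# findall: a plain list of the matching strings
strings_found = re.findall(r'\w+', text)

# finditer: match objects, from which both text and positions can be read
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\w+', text)]

print(strings_found)
print(spans)
```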

<b>search -></b> Returns a Match object if there is a match anywhere in the string

re.search(regex, string)

Scans a string for a regex match.

re.search(regex,string) scans string looking for the first location where the pattern regex matches. If a match is found, then re.search() returns a match object. Otherwise, it returns None.

re.search() also takes an optional third flags argument.

If there is more than one match, only the first occurrence of the match will be returned:

In [19]:
#Search for the first white-space character in the string:

txt = "Bu gün hava istidi"
x = re.search(r"\s", txt)
print(x)
print("The first white-space character is located in position:", x.start())

<re.Match object; span=(2, 3), match=' '>
The first white-space character is located in position: 2


In [21]:
print(txt[2:3])

 


In [20]:
print(x.start(),x.end())

2 3


In [22]:
print(txt[x.start():x.end()])

 


In [23]:
print(x)
print(x.re.pattern)
print(x.string)
print(x.start())
print(x.end())

<re.Match object; span=(2, 3), match=' '>
\s
Bu gün hava istidi
2
3


A match object is truthy, so you can use it in a Boolean context like a conditional statement:

In [24]:
if re.search('123', txt):
    print('Found a match.')
else:
    print('No match.')

No match.


In [25]:
# If no matches are found, the value None is returned:

#Example
# Make a search that returns no match:

txt = "Bu gün hava istidi"
x = re.search("soyuq", txt)
print(x)

None
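
Because re.search() can return None, calling a method such as .start() directly on the result is only safe once a match is known to exist. A minimal sketch of a guard, with the search words chosen for illustration:

```python
import re

txt = "Bu gün hava istidi"

results = {}
for word in ("hava", "soyuq"):
    m = re.search(word, txt)
    if m is None:
        # Calling m.span() here would raise AttributeError
        results[word] = None
        print(word, "-> no match")
    else:
        results[word] = m.span()
        print(word, "-> found at", m.span())
```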


<b>Python Regex Metacharacters</b><br>
The real power of regex matching in Python emerges when <regex> contains special characters called metacharacters. These have a unique meaning to the regex matching engine and vastly enhance the capability of the search.

Consider again the problem of how to determine whether a string contains any three consecutive decimal digit characters.

In [26]:
s = 'Bu gün hava 234 derecedi :D'
search = re.search('[0-9][0-9][0-9]', s)

print(search)



<re.Match object; span=(12, 15), match='234'>


[0-9] matches any single decimal digit character, that is, any character between '0' and '9', inclusive. The full expression [0-9][0-9][0-9] matches any sequence of three decimal digit characters. In this case, s matches because it contains three consecutive decimal digit characters, '234'.

In [27]:
print(re.search('[0-9][0-9][0-9]', 'qux678'))
print(re.search('[0-9][0-9][0-9]', 'foo456bar'))

<re.Match object; span=(3, 6), match='678'>
<re.Match object; span=(3, 6), match='456'>


In [28]:
print(re.search('[0-9][0-9][0-9]', '12foo34'))

None
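
Repeating the class three times works, but the {m} quantifier from the rules table can state the count directly. A small sketch comparing the two spellings on the strings used above. The expectation is that they find the same span, or both find nothing:

```python
import re

same = []
for s in ['foo123bar', 'qux678', '12foo34']:
    long_form = re.search('[0-9][0-9][0-9]', s)
    short_form = re.search('[0-9]{3}', s)
    # Compare the spans (None when there is no match)
    long_span = long_form.span() if long_form else None
    short_span = short_form.span() if short_form else None
    same.append(long_span == short_span)
    print(s, long_span, short_span)
```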


Take a look at another regex metacharacter. The dot (.) metacharacter matches any character except a newline, so it functions like a wildcard:

In [29]:
s = 'foo123bar'
print(re.search('1.3', s))

s = 'foo13bar'
print(re.search('1.3', s))

print(re.search('foo.bar', 'foobar'))
print(re.search('foo.bar', 'foo\nbar'))

<re.Match object; span=(3, 6), match='123'>
None
None
None


In the first example, the regex 1.3 matches '123' because the '1' and '3' match literally, and the . matches the '2'. Here, you’re essentially asking, “Does s contain a '1', then any character (except a newline), then a '3'?” The answer is yes for 'foo123bar' but no for 'foo13bar'.
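
The last two calls return None for different reasons: 'foobar' has no character between 'foo' and 'bar', and in 'foo\nbar' the character in between is a newline, which the dot skips by default. If the dot should match a newline as well, the re.DOTALL flag can be passed. A minimal sketch:

```python
import re

default = re.search('foo.bar', 'foo\nbar')             # . does not match '\n'
dotall = re.search('foo.bar', 'foo\nbar', re.DOTALL)   # . now matches '\n' too

print(default)
print(dotall)
```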

[]

Specifies a specific set of characters to match.

Characters contained in square brackets ([]) represent a character class—an enumerated set of characters to match from. A character class metacharacter sequence will match any single character contained in the class.

In [30]:
print(re.search('Cava[db]', 'Cavad'))
print(re.search('Cava[db]', 'Cavab'))

print(re.search('Cava[db]', 'Cavaz'))

<re.Match object; span=(0, 5), match='Cavad'>
<re.Match object; span=(0, 5), match='Cavab'>
None


The metacharacter sequence [db] matches any single 'd' or 'b' character. In the example, the regex Cava[db] matches both 'Cavad' and 'Cavab', but not 'Cavaz'.

A character class can also contain a range of characters separated by a hyphen (-), in which case it matches any single character within the range. For example, [a-z] matches any lowercase alphabetic character between 'a' and 'z', inclusive:

In [31]:
print(re.search('[a-z]', 'CAVad'))

<re.Match object; span=(3, 4), match='a'>


In [40]:
print(re.search('[0-9][0-9]', 'Cavad17il'))

<re.Match object; span=(5, 7), match='17'>


You can complement a character class by specifying ^ as the first character, in which case it matches any character that isn’t in the set. In the following example, [^0-9] matches any character that isn’t a digit:



In [33]:
print(re.search('[^0-9]', 'Cavad17il'))

<re.Match object; span=(0, 1), match='C'>


Here, the match object indicates that the first character in the string that isn’t a digit is 'C'.

\w


Match based on whether a character is a word character.

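\w matches any alphanumeric word character. Word characters are uppercase and lowercase letters, digits, and the underscore (_) character, so for ASCII text \w is essentially shorthand for [a-zA-Z0-9_]. In Python 3, \w also matches non-ASCII letters and digits by default, which is why it matched 'Azərbaycan' and 'Ümid' in the findall examples earlier: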

In [44]:
print(re.search(r'\w', '#(.a$@& '))

<re.Match object; span=(3, 4), match='a'>


In [35]:
print(re.search('[a-zA-Z0-9_]', '#(.a$@&'))

<re.Match object; span=(3, 4), match='a'>


\W is the opposite. It matches any non-word character and, for ASCII text, is equivalent to [^a-zA-Z0-9_]:

In [43]:
print(re.search(r'\W', 'a_1*3Qb'))

<re.Match object; span=(3, 4), match='*'>


In [42]:
print(re.search('[^a-zA-Z0-9_]', 'a_1*3Qb'))

<re.Match object; span=(3, 4), match='*'>
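
For str patterns in Python 3, \w and \W are Unicode-aware by default, so the bracket forms above are exact equivalents only for ASCII text. A short sketch of the difference, using the re.ASCII flag to restrict \w to [a-zA-Z0-9_]. The sample string is made up for illustration:

```python
import re

s = 'Əli gəlir'
unicode_words = re.findall(r'\w+', s)           # default: Unicode-aware
ascii_words = re.findall(r'\w+', s, re.ASCII)   # \w limited to [a-zA-Z0-9_]

print(unicode_words)
print(ascii_words)
```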


\d

Match based on whether a character is a decimal digit.

\d matches any decimal digit character. \D is the opposite. It matches any character that isn’t a decimal digit:

In [47]:
print(re.search(r'\d+', 'abc48685def'))

<re.Match object; span=(3, 8), match='48685'>


In [None]:
print(re.search(r'\D', '234Q678'))

\d is essentially equivalent to [0-9], and \D is equivalent to [^0-9].

\s matches any whitespace character:

In [49]:
print(re.search(r'\s', 'Cavad Muho Ela'))

<re.Match object; span=(5, 6), match=' '>


\S is the opposite of \s. It matches any character that isn’t whitespace:

In [48]:
print(re.search(r'\S', '   foo    '))


<re.Match object; span=(3, 4), match='f'>


The character class sequences \w, \W, \d, \D, \s, and \S can appear inside a square bracket character class as well:

In [50]:
print(re.search(r'[\d\w\s]', '---3---'))
print(re.search(r'[\d\w\s]', '---a---'))
print(re.search(r'[\d\w\s]', '--- ---'))

<re.Match object; span=(3, 4), match='3'>
<re.Match object; span=(3, 4), match='a'>
<re.Match object; span=(3, 4), match=' '>


backslash (\)

Removes the special meaning of a metacharacter.

In [51]:
print(re.search('.', 'foo.bar'))
print(re.search(r'\.', 'foo.bar'))
# r'.' is the same pattern as '.': a raw string alone does not escape the dot
print(re.search(r'.', 'foo.bar'))


<re.Match object; span=(0, 1), match='f'>
<re.Match object; span=(3, 4), match='.'>
<re.Match object; span=(0, 1), match='f'>


In the first call, the dot (.) functions as a wildcard metacharacter, which matches the first character in the string ('f'). In the second call, the . character is escaped by a backslash, so it isn’t a wildcard. It’s interpreted literally and matches the '.' at index 3 of the search string. The third call uses the raw string r'.', which contains no backslash, so it is the same pattern as in the first call and again matches 'f'.
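
When the text to search for comes from a variable and may contain metacharacters, escaping each one by hand is error-prone. The standard library function re.escape() adds the backslashes. A minimal sketch, with a made-up search string:

```python
import re

literal = '4.56'
pattern = re.escape(literal)   # backslash-escapes the dot
text = '4x56 and 4.56'

unescaped = re.search(literal, text)  # . is a wildcard and matches 'x'
escaped = re.search(pattern, text)    # only the literal text matches

print(pattern)
print(unescaped)
print(escaped)
```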

<a>Anchors</a><br>
Anchors are zero-width matches. They don’t match any actual characters in the search string, and they don’t consume any of the search string during parsing. Instead, an anchor dictates a particular location in the search string where a match must occur.

^
\A

Anchor a match to the start of <string>.

When the regex parser encounters ^ or \A, the parser’s current position must be at the beginning of the search string for it to find a match.

In other words, regex ^foo stipulates that 'foo' must be present not just any old place in the search string, but at the beginning:

In [53]:
print(re.search('^foo', 'foobar'))
print(re.search('^foo', 'barfoo'))

<re.Match object; span=(0, 3), match='foo'>
None


In [54]:
print(re.search(r'\Afoo', 'foobar'))
print(re.search(r'\Afoo', 'barfoo'))

<re.Match object; span=(0, 3), match='foo'>
None


$
\Z

Anchor a match to the end of <string>.

When the regex parser encounters $ or \Z, the parser’s current position must be at the end of the search string for it to find a match. Whatever precedes $ or \Z must constitute the end of the search string:



In [55]:
print(re.search('bar$', 'foobar'))
print(re.search('bar$', 'barfoo'))
print(re.search(r'bar\Z', 'foobar'))
print(re.search(r'bar\Z', 'barfoo'))

<re.Match object; span=(3, 6), match='bar'>
None
<re.Match object; span=(3, 6), match='bar'>
None
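
$ and \Z are not fully interchangeable: $ also matches just before a single trailing newline, while \Z matches only at the very end of the string. With the re.MULTILINE flag, ^ and $ additionally match at the start and end of every line. A short sketch of both points, with made-up sample strings:

```python
import re

s = 'foobar\n'
dollar = re.search(r'bar$', s)    # $ also matches just before a trailing newline
end_z = re.search(r'bar\Z', s)    # \Z matches only at the very end of the string

# With re.MULTILINE, ^ matches at the start of each line
lines = re.findall(r'^\w+', 'foo\nbar', re.MULTILINE)

print(dollar)
print(end_z)
print(lines)
```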


In [56]:
# Non-raw strings: Python turns '\b' into the ASCII backspace character (\x08) before re sees it
print(re.search('\b', '\bar'))


<re.Match object; span=(0, 1), match='\x08'>


\b

Anchors a match to a word boundary.

\b asserts that the regex parser’s current position must be at the beginning or end of a word. A word consists of a sequence of alphanumeric characters or underscores ([a-zA-Z0-9_]), the same as for the \w character class:

In [57]:
print(re.search('\\bbar', 'foo bar'))
print(re.search(r'\bbar', 'foo bar'))  # same pattern, written as a raw string
# In a non-raw string, '\b' alone would be the ASCII backspace (BS) character
print(re.search(r'\bbar', 'foo.bar'))
print(re.search(r'\bbar', 'foobar'))
print(re.search(r'foo\b', 'foo bar'))
print(re.search(r'foo\b', 'foo.bar'))
print(re.search(r'foo\b', 'foobar'))

<re.Match object; span=(4, 7), match='bar'>
<re.Match object; span=(4, 7), match='bar'>
<re.Match object; span=(4, 7), match='bar'>
None
<re.Match object; span=(0, 3), match='foo'>
<re.Match object; span=(0, 3), match='foo'>
None


In [61]:
print(re.search(r'\bbar\b', 'foo bar baz'))
print(re.search(r'\bbar\b', 'foo(bar)baz'))
print(re.search(r'\bbar\b', 'barfoobaz'))

<re.Match object; span=(4, 7), match='bar'>
<re.Match object; span=(4, 7), match='bar'>
None


\B

Anchors a match to a location that isn’t a word boundary.

\B does the opposite of \b. It asserts that the regex parser’s current position must not be at the start or end of a word:

In [59]:
print(re.search(r'\Bfoo\B', 'foo bar baz'))
print(re.search(r'\Bfoo\B', 'foo(bar)baz'))
print(re.search(r'\Bfoo\B', 'barfoobaz'))

None
None
<re.Match object; span=(3, 6), match='foo'>
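
A common use of \b is matching a whole word without also matching longer words that contain it. The sketch below uses a made-up sentence. \bcat\b finds only the standalone word, while \Bcat\b finds 'cat' only where it ends a longer word:

```python
import re

text = 'cat concat cat. category'

whole = re.findall(r'\bcat\b', text)    # standalone 'cat' only
inside = re.findall(r'\Bcat\b', text)   # 'cat' at the end of a longer word

print(whole)
print(inside)
```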


<a>Quantifiers</a><br>
A quantifier metacharacter immediately follows a portion of a <regex> and indicates how many times that portion must occur for the match to succeed.

*

Matches zero or more repetitions of the preceding regex.

For example, a* matches zero or more 'a' characters. That means it would match an empty string, 'a', 'aa', 'aaa', and so on.

In [65]:
print(re.search('foo-*bar', 'foobar'))
print(re.search('foo-*bar', 'foo-bar'))
print(re.search('foo-*bar', 'foo--bar'))

<re.Match object; span=(0, 6), match='foobar'>
<re.Match object; span=(0, 7), match='foo-bar'>
<re.Match object; span=(0, 8), match='foo--bar'>


In the first call, there are zero '-' characters between 'foo' and 'bar'. In the second there’s one, and in the third there are two. The metacharacter sequence -* matches in all three cases.

You’ll probably encounter the regex .* in a Python program at some point. This matches zero or more occurrences of any character. In other words, it essentially matches any character sequence up to a line break. (Remember that the . wildcard metacharacter doesn’t match a newline.)

In [63]:
# In this example, .* matches everything between 'foo' and 'bar':

print(re.search('foo.*bar', '# foo $qux@grault % bar #'))

<re.Match object; span=(2, 23), match='foo $qux@grault % bar'>


+

Matches one or more repetitions of the preceding regex.

This is similar to *, but the quantified regex must occur at least once:

In [66]:
print(re.search('foo-+bar', 'foobar'))
print(re.search('foo-+bar', 'foo-bar'))
print(re.search('foo-+bar', 'foo--bar'))

None
<re.Match object; span=(0, 7), match='foo-bar'>
<re.Match object; span=(0, 8), match='foo--bar'>


?

Matches zero or one repetitions of the preceding regex.

Again, this is similar to * and +, but in this case there’s only a match if the preceding regex occurs once or not at all:

In [67]:
print(re.search('foo-?bar', 'foobar'))
print(re.search('foo-?bar', 'foo-bar'))
print(re.search('foo-?bar', 'foo--bar'))

<re.Match object; span=(0, 6), match='foobar'>
<re.Match object; span=(0, 7), match='foo-bar'>
None


Here are some more examples showing the use of all three quantifier metacharacters:

In [68]:
print(re.match('foo[1-9]*bar', 'foobar'))
print(re.match('foo[1-9]*bar', 'foo42bar'))
print(re.match('foo[1-9]+bar', 'foobar'))
print(re.match('foo[1-9]+bar', 'foo42bar'))
print(re.match('foo[1-9]?bar', 'foobar'))
print(re.match('foo[1-9]?bar', 'foo42bar'))

<re.Match object; span=(0, 6), match='foobar'>
<re.Match object; span=(0, 8), match='foo42bar'>
None
<re.Match object; span=(0, 8), match='foo42bar'>
<re.Match object; span=(0, 6), match='foobar'>
None
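
The rules table also lists {m,n} and the "few as possible" variants such as *?, which the examples above do not cover. A minimal sketch, with made-up sample strings: a quantifier is greedy by default and takes as much as it can, while adding ? makes it take as little as it can.

```python
import re

html = '<a><b>'
greedy = re.search(r'<.*>', html).group()    # as many characters as possible
lazy = re.search(r'<.*?>', html).group()     # as few characters as possible

# {2,3} matches runs of two to three digits
counted = re.findall(r'\d{2,3}', '1 22 333 4444')

print(greedy)
print(lazy)
print(counted)
```

In '4444', the greedy {2,3} takes three digits, and the single digit left over is too short to match.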


In [69]:
print("White space char")
this=' this is \n \t  ood'
print(re.search('\n',this ))
print(re.search('\t',this ))

White space char
<re.Match object; span=(9, 10), match='\n'>
<re.Match object; span=(11, 12), match='\t'>


split -> Returns a list where the string has been split at each match

re.split(regex, string, maxsplit=0, flags=0)

Splits a string into substrings.

re.split(regex, string) splits string into substrings using regex as the delimiter and returns the substrings as a list.

The following example splits the specified string into substrings delimited by a comma (,), semicolon (;), or slash (/) character, surrounded by any amount of whitespace:

In [70]:
print(re.split(r'\s*[,;/]\s*', 'foo,bar  ;  baz / qux'))

['foo', 'bar', 'baz', 'qux']


In [71]:
# The split() function returns a list where the string has been split at each match:

# Split at each white-space character:

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']


In [72]:
# You can limit the number of splits by specifying the maxsplit parameter:

# Split the string only at the first 2 occurrences:

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=2)
print(x)

['The', 'rain', 'in Spain']


In [73]:
# First, let's split a string into words
pattern = re.compile(r"\W")  # Match any non-word (non-alphanumeric) character.
result = pattern.split("hello world")
print(result)  # The space is the only non-word character, so the split happens there.


['hello', 'world']


In [74]:

# Specifying the maxsplit parameter:
# split at most 2 times, then return the rest of the string as the last item.
print(pattern.split("I have a dream in code", 2))



['I', 'have', 'a dream in code']


In [77]:
# To keep the delimiter in the result too, wrap the pattern in a capturing group
p = re.compile(r"(-)")
print(p.split("hello-world"))  # the '-' delimiter appears in the list as well

['hello', '-', 'world']
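
One detail worth knowing when splitting: if the string starts or ends with a delimiter, re.split() returns an empty string at that end of the list. A short sketch, with a made-up input, that shows the effect and one possible way to filter the empty items out:

```python
import re

parts = re.split(r'\W+', 'hello, world!')
print(parts)   # the trailing '!' produces an empty string at the end

# Keep only the non-empty items
words = [p for p in parts if p]
print(words)
```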


sub -> Replaces one or many matches with a string

re.sub(regex, repl, string) finds the leftmost non-overlapping occurrences of regex in string, replaces each match as indicated by repl, and returns the result. string remains unchanged.

repl can be either a string or a function.

If repl is a string, then re.sub() inserts it into string in place of any sequences that match regex:

In [78]:
s = 'foo.123.bar.789.baz'
print(re.sub(r'\d+', '#', s))
print(re.sub('[a-z]+', '(*)', s))

foo.#.bar.#.baz
(*).123.(*).789.(*)
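
When repl is a function, re.sub() calls it with each match object and inserts whatever string it returns. A string repl can also refer back to captured groups with \1, \2, and so on. The sketch below shows both. The helper name double and the 'hello-world' input are assumptions made for illustration:

```python
import re

s = 'foo.123.bar.789.baz'

def double(match):
    """Hypothetical helper: return the matched number multiplied by two."""
    return str(int(match.group()) * 2)

# Function as repl: called once per match
doubled = re.sub(r'\d+', double, s)
print(doubled)

# String repl with backreferences: swap the two captured words
swapped = re.sub(r'(\w+)-(\w+)', r'\2-\1', 'hello-world')
print(swapped)
```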


In [79]:
# The sub() function replaces the matches with the text of your choice:

# Replace every white-space character with the number 9:

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

The9rain9in9Spain


In [None]:
# You can control the number of replacements by specifying the count parameter:

# Replace the first 2 occurrences:

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)

In [None]:
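# Replace each whitespace character with a hyphen
s = re.sub(r'\s', '-', 'whitespace by hyphen')
print(s)
# Remove runs of two or more whitespace characters
# (s no longer contains any whitespace, so it is printed unchanged)
print(re.sub(r'\s{2,}', '', s))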

In [None]:
pattern = re.compile(r"[0-9]+")
result = pattern.sub("-", "there is only 1 thing 2 do")
print (result)

# This example replaces each non-overlapping '00' with '-',
# resulting in '1--0'
print(re.sub('00', '-', '100000'))

Match Object<br>
A Match Object is an object containing information about the search and the result.

Note: If there is no match, the function returns None instead of a Match object.

re.match(regex, string, flags=0)

Looks for a regex match at the beginning of a string.

This is identical to re.search(), except that re.search() returns a match if regex matches anywhere in string, whereas re.match() returns a match only if regex matches at the beginning of string:

In [87]:
print(re.search(r'\d+', '123foobar'))
print(re.search(r'\d+', 'foo123bar'))
print(re.match(r'\d+', '123foobar'))
print(re.match(r'\d+', 'foo123bar'))

<re.Match object; span=(0, 3), match='123'>
<re.Match object; span=(3, 6), match='123'>
<re.Match object; span=(0, 3), match='123'>
None


In [88]:
print(re.match("you" , "you are not good."))
print(re.match("are" , "you are not good."))
print(re.match("good.", "you are not good."))


<re.Match object; span=(0, 3), match='you'>
None
None
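
re.match() only anchors the start of the string, so a match may still be followed by other text. If the whole string has to match the pattern, re.fullmatch() is one option. A minimal sketch comparing the two on made-up inputs:

```python
import re

results = {}
for s in ['123', '123foobar', 'foo123']:
    at_start = re.match(r'\d+', s)       # match at the beginning only
    whole = re.fullmatch(r'\d+', s)      # the entire string must match
    results[s] = (at_start is not None, whole is not None)
    print(s, at_start, whole)
```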


The Match object has properties and methods used to retrieve information about the search, and the result:

.span() returns a tuple containing the start-, and end positions of the match.<br>
.string returns the string passed into the function<br>
.group() returns the part of the string where there was a match


In [80]:
re.match(r'\d+', '123foobar').span()

(0, 3)

In [81]:
re.match(r'\d+', '123foobar').string

'123foobar'

In [82]:
re.match(r'\d+', '123foobar').group()

'123'

In [83]:
# Print the position (start- and end-position) of the first match occurrence.

# The regular expression looks for any word that starts with a lowercase "ə":

txt = "Bu gün hava əladı"
x = re.search(r"\bə\w+", txt)
print(x.span())

(12, 17)


In [84]:
# Print the string passed into the function:

txt = "Bu gün hava əladı"
x = re.search(r"\bə\w+", txt)
print(x.string)

Bu gün hava əladı
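
When a pattern contains groups, .group() can also return the individual parts of the match. The sketch below uses named groups, written (?P<name>...), on the bus sentence from earlier. The group names word and number are chosen here for illustration:

```python
import re

s = 'Əli Hüseynoğlu bugün Bakıya 47 nömrəli avtobusla gəlir'

# A word followed by a space and a number
m = re.search(r'(?P<word>\w+) (?P<number>\d+)', s)

whole = m.group()            # the entire match
number = m.group('number')   # one named group
parts = m.groupdict()        # all named groups as a dict

print(whole)
print(number)
print(parts)
```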


Pattern Rules

In [89]:
print(" pattern rules")

print(re.findall(r'xy*' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'xy+' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'.' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'^t' , 'this is string'))
print(re.findall(r'g$' , 'this is string'))
print(re.findall(r'xy?' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'yy' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'x{2,}' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'xy{1,3}' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'[a-x]' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'[^x]' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'[x^]' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'y|x' , 'xyyxxxxxyyyyxxxxxyy'))
print(re.findall(r'(yxx)' , 'xyyxxxxxyyyyxxxxxyy'))

 pattern rules
['xyy', 'x', 'x', 'x', 'x', 'xyyyy', 'x', 'x', 'x', 'x', 'xyy']
['xyy', 'xyyyy', 'xyy']
['x', 'y', 'y', 'x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'x', 'x', 'x', 'x', 'x', 'y', 'y']
['t']
['g']
['xy', 'x', 'x', 'x', 'x', 'xy', 'x', 'x', 'x', 'x', 'xy']
['yy', 'yy', 'yy', 'yy']
['xxxxx', 'xxxxx']
['xyy', 'xyyy', 'xyy']
['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']
['y', 'y', 'y', 'y', 'y', 'y', 'y', 'y']
['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']
['x', 'y', 'y', 'x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'x', 'x', 'x', 'x', 'x', 'y', 'y']
['yxx', 'yxx']


In [86]:
print('Special char')
strings='you are 47 -21 is 4.56 ab453457 not good. or'
print(re.findall(r'\Ayou',strings))
print(re.findall(r'\b',strings))
print(re.findall(r'\B',strings))
print(re.findall(r'\d',strings))
print(re.findall(r'\D',strings))
print(re.findall(r'\s',strings))
print(re.findall(r'\S',strings))
print(re.findall(r'\w',strings))
print(re.findall(r'\W',strings))
print(re.findall(r'or\Z',strings))

Special char
['you']
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['4', '7', '2', '1', '4', '5', '6', '4', '5', '3', '4', '5', '7']
['y', 'o', 'u', ' ', 'a', 'r', 'e', ' ', ' ', '-', ' ', 'i', 's', ' ', '.', ' ', 'a', 'b', ' ', 'n', 'o', 't', ' ', 'g', 'o', 'o', 'd', '.', ' ', 'o', 'r']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
['y', 'o', 'u', 'a', 'r', 'e', '4', '7', '-', '2', '1', 'i', 's', '4', '.', '5', '6', 'a', 'b', '4', '5', '3', '4', '5', '7', 'n', 'o', 't', 'g', 'o', 'o', 'd', '.', 'o', 'r']
['y', 'o', 'u', 'a', 'r', 'e', '4', '7', '2', '1', 'i', 's', '4', '5', '6', 'a', 'b', '4', '5', '3', '4', '5', '7', 'n', 'o', 't', 'g', 'o', 'o', 'd', 'o', 'r']
[' ', ' ', ' ', '-', ' ', ' ', '.', ' ', ' ', ' ', '.', ' ']
['or']
