*Regular Expression Module*

Regular expressions, often abbreviated as regex or regexp, describe text patterns that can be used to search, match, and manipulate strings. Python's `re` module implements them. Typical uses include validating user input, extracting specific information from a text, and replacing patterns within strings.

In [1]:
import re

str = "Python Course: Python Programming for Beginners | 40 Hours" # Sample string used in the following examples (note: the name str shadows Python's built-in str type)

In [2]:
# re.findall() method: returns a list containing all matches

result = re.findall("Python", str)

# re.split() method: returns a list where the string has been split at each match

result = re.split(" ", str) # splits on each space; the pattern r"\s" would split on any whitespace character

# re.sub() method: replaces one or many matches with a string

result = re.sub("Python", "Java", str)

# re.search() method: returns a Match object if there is a match anywhere in the string

result = re.search("Python", str)
# result = result.span()
# result = result.start()
# result = result.end()
# result = result.group()
result = result.string

print(result)

Python Course: Python Programming for Beginners | 40 Hours
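Because the cell above reuses `result` for every call, only the last value is printed. The sketch below keeps each return value under its own name so they can be compared. The variable names (`text`, `all_matches`, `parts`, `replaced`, `match`) are chosen here for illustration and are not part of the original notebook.

```python
import re

# Same sample string as above, stored under a name that does not shadow str
text = "Python Course: Python Programming for Beginners | 40 Hours"

all_matches = re.findall("Python", text)   # list of every match
parts = re.split(r"\s", text)              # split at each whitespace character
replaced = re.sub("Python", "Java", text)  # replace every match
match = re.search("Python", text)          # Match object for the first match

print(all_matches)
print(parts)
print(replaced)
print(match.span(), match.start(), match.end(), match.group())
```

`re.search()` returns `None` when nothing matches, so in real code it is worth checking the result before calling methods such as `span()` on it.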


In [18]:
"""

    [] - Matches any single character listed between the square brackets

         [abc] => a      : 1 match
                  ac     : 2 matches
                  Python : No match

         [a-e]  => [abcde]
         [1-5]  => [12345]
         [0-39] => [01239]

         [^abc] => any character except a, b, or c
         [^0-9] => non-digit characters

"""

result = re.findall("[abc]", str)
result = re.findall("[a-e]", str)
result = re.findall("[a-z]", str)
result = re.findall("[0-5]", str)
result = re.findall("[^python]", str)
result = re.findall("[^0-9]", str)

print(result)

['P', 'y', 't', 'h', 'o', 'n', ' ', 'C', 'o', 'u', 'r', 's', 'e', ':', ' ', 'P', 'y', 't', 'h', 'o', 'n', ' ', 'P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 'B', 'e', 'g', 'i', 'n', 'n', 'e', 'r', 's', ' ', '|', ' ', ' ', 'H', 'o', 'u', 'r', 's']
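The small cases listed in the note above can be checked directly. This is a minimal sketch; the short test strings and the variable names are illustrative assumptions, not part of the original notebook.

```python
import re

text = "Python Course: Python Programming for Beginners | 40 Hours"

# Each character in the set is matched individually
abc_in_ac = re.findall("[abc]", "ac")
abc_in_python = re.findall("[abc]", "Python")

# [0-39] means the range 0-3 plus the single character 9
mixed_range = re.findall("[0-39]", "0123456789")

# Ranges are case-sensitive: [a-e] does not match 'C' or 'B'
a_to_e = re.findall("[a-e]", text)

# A leading ^ inside the brackets negates the set
non_digits = re.findall("[^0-9]", text)

print(abc_in_ac, abc_in_python)
print(mixed_range)
print(a_to_e)
print("".join(non_digits))
```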


In [19]:
"""
    . - Matches any single character except a newline

        .. => a    : No match
              ab   : 1 match
              abc  : 1 match
              abcd : 2 matches

    
"""

result = re.findall(".", str) # Matches every character in the string, one at a time
result = re.findall("..", str) # Matches consecutive, non-overlapping pairs of characters
result = re.findall("Py..on", str) # Matches 'Py', any two characters, then 'on'

print(result)

['Python', 'Python']
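The match counts in the note above follow from the fact that matches do not overlap. The sketch below checks them, and also shows the newline exception using the standard `re.DOTALL` flag, which the original notebook does not mention. Variable names are illustrative.

```python
import re

# ".." consumes two characters per match, so matches never overlap
pairs_abcd = re.findall("..", "abcd")
pairs_abc = re.findall("..", "abc")   # the trailing 'c' has no partner
pairs_a = re.findall("..", "a")

# "." skips newlines unless the re.DOTALL flag is passed
without_flag = re.findall(".", "a\nb")
with_flag = re.findall(".", "a\nb", flags=re.DOTALL)

print(pairs_abcd, pairs_abc, pairs_a)
print(without_flag, with_flag)
```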


In [20]:
"""
    ^ - Does the string start with the specified characters?

    ^a => a:    1 match
          abc:  1 match
          bac:  No match

"""

result = re.findall("^P", str)

print(result)

['P']


In [22]:
"""
    $ - Does the string end with the specified characters?

    a$ => a      : 1 match
          lambda : 1 match
          Python : No match

"""

result = re.findall("s$", str)
result = re.findall("Hours$", str)

print(result)

['Hours']
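The two anchors, `^` for the start of the string and `$` for the end, can be compared side by side. A minimal sketch with illustrative variable names; the failing cases return an empty list rather than raising an error.

```python
import re

text = "Python Course: Python Programming for Beginners | 40 Hours"

starts_with_p = re.findall("^P", text)          # text begins with 'P'
starts_with_c = re.findall("^C", text)          # 'C' occurs, but not at the start
ends_with_hours = re.findall("Hours$", text)    # text ends with 'Hours'
ends_with_python = re.findall("Python$", text)  # 'Python' occurs, but not at the end

print(starts_with_p, starts_with_c)
print(ends_with_hours, ends_with_python)
```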


In [None]:
"""
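     * - Zero or more occurrences of the preceding character

         ma*n => mn     : 1 match
                 man    : 1 match
                 maaan  : 1 match
                 main   : No match ('a' is not followed by 'n')
"""

# "ma*n" matches 'm', then zero or more 'a' characters, then 'n'

"""
    + - One or more occurrences of the preceding character

        ma+n => mn     : No match
                man    : 1 match
                maaan  : 1 match
                main   : No match ('a' is not followed by 'n')
"""

# "ma+n" matches 'm', then at least one 'a', then 'n'

"""
    ? - Zero or one occurrence of the preceding character

        ma?n => mn     : 1 match
                man    : 1 match
                maaan  : No match (more than one 'a')
                main   : No match ('a' is not followed by 'n')
"""

# "ma?n" matches 'm' and 'n' with an optional single 'a' between them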
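This cell has no runnable code, so here is a minimal sketch that runs the quantifiers `*`, `+`, and `?` against the same four words used in the notes. The `words`, `patterns`, and `matches` names are assumptions made for illustration.

```python
import re

words = ["mn", "man", "maaan", "main"]
patterns = ["ma*n", "ma+n", "ma?n"]

# For each pattern, collect the words in which it finds a match
matches = {
    pattern: [word for word in words if re.search(pattern, word)]
    for pattern in patterns
}

for pattern, matched in matches.items():
    print(pattern, "->", matched)
```

None of the three patterns matches `main`, because the `i` breaks the required sequence between the `a` and the `n`.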

In [25]:
"""
    {} - Specifies how many times the preceding character must repeat.

        al{2}   => the 'l' character must be repeated exactly twice after the 'a' character.
        al{2,3} => the 'l' character must be repeated at least 2 and at most 3 times after the 'a' character.
        [0-9]{2,4} => numbers with at least 2 and at most 4 digits.
"""

result = re.findall("m{2}", str)
result = re.findall("[0-9]{2}", str)

print(result)

['40']
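The three forms described in the note can be tried on short strings. This is a sketch under the assumption that the made-up inputs below are representative; they are not from the original notebook.

```python
import re

text = "Python Course: Python Programming for Beginners | 40 Hours"

# Exactly two 'l' characters after 'a'; extra 'l's are left unmatched
exactly_two = re.findall("al{2}", "al all alll allll")

# Between two and three 'l' characters; the match is greedy, so it takes up to three
two_to_three = re.findall("al{2,3}", "al all alll allll")

# Runs of two to four digits; a five-digit run yields a four-digit match
digit_runs = re.findall("[0-9]{2,4}", "7 40 123 2024 12345")

# The double 'm' in "Programming"
double_m = re.findall("m{2}", text)

print(exactly_two)
print(two_to_three)
print(digit_runs)
print(double_m)
```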


In [None]:
"""
    | - Matches if any one of the alternatives matches.

        a|b => a or b

            cde =>    No match
            ade =>    1 match
            acdbea => 3 matches
"""

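The alternation examples from the note above can be verified with `re.findall()`. A minimal sketch; the variable names are illustrative.

```python
import re

text = "Python Course: Python Programming for Beginners | 40 Hours"

in_cde = re.findall("a|b", "cde")        # neither 'a' nor 'b' occurs
in_ade = re.findall("a|b", "ade")        # one 'a'
in_acdbea = re.findall("a|b", "acdbea")  # 'a', 'b', 'a'

# Alternatives can be whole words, not just single characters
keywords = re.findall("Python|Hours", text)

print(in_cde, in_ade, in_acdbea)
print(keywords)
```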
In [None]:
"""
    () - Groups part of a pattern.

         (a|b|c)xz => one of the characters a, b, or c must be followed by xz.
"""

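One detail worth knowing about the `(a|b|c)xz` pattern: when a pattern contains a capturing group, `re.findall()` returns the group's contents rather than the whole match. The sketch below shows this alongside two ways to get the full match. The `sample` string is a made-up input for illustration.

```python
import re

sample = "axz bxz dxz cxz"  # hypothetical input; 'dxz' should not match

# With one capturing group, findall returns only what the group captured
captured = re.findall("(a|b|c)xz", sample)

# finditer yields Match objects; group() with no argument is the whole match
full = [m.group() for m in re.finditer("(a|b|c)xz", sample)]

# A non-capturing group (?:...) groups without capturing
non_capturing = re.findall("(?:a|b|c)xz", sample)

first = re.search("(a|b|c)xz", sample)
print(captured, full, non_capturing)
print(first.group(0), first.group(1))
```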
In [None]:
r"""
     \A - Matches only at the beginning of the string
     \b - Matches at a word boundary (the beginning or the end of a word)
     \B - Matches at a position that is NOT a word boundary
     \d - Matches a digit (0-9)
     \D - Matches any character that is NOT a digit
     \s - Matches a whitespace character
     \S - Matches any character that is NOT a whitespace character
     \w - Matches a word character (letters a-z and A-Z, digits 0-9, and the underscore _)
     \W - Matches any character that is NOT a word character
    
"""
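A few of these special sequences applied to the sample string are sketched below. The patterns are written as raw strings (`r"..."`) so that Python does not interpret sequences such as `\b` (backspace in an ordinary string literal) before the `re` module sees them. Variable names are illustrative.

```python
import re

text = "Python Course: Python Programming for Beginners | 40 Hours"

digits = re.findall(r"\d", text)          # individual digits
spaces = re.findall(r"\s", text)          # whitespace characters
words = re.findall(r"\w+", text)          # runs of word characters
p_words = re.findall(r"\bP\w*", text)     # words that begin with 'P'
at_start = re.findall(r"\APython", text)  # 'Python' only at the very start

print(digits)
print(len(spaces))
print(words)
print(p_words)
print(at_start)
```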