# Python

---

# Regular Expressions

Regular expressions are a powerful tool for various kinds of string manipulation.
They are a domain-specific language (DSL) that is available as a library in most modern programming languages, not just Python.
They are useful for two main tasks:
- verifying that strings match a pattern (for instance, that a string has the format of an email address),
- performing substitutions in a string (such as changing all American spellings to British ones).


Regular expressions in Python can be accessed using the re module, which is part of the standard library.
After you've defined a regular expression, the re.match function can be used to determine whether it matches at the beginning of a string.
If it does, match returns an object representing the match; if not, it returns None.
To avoid confusion when working with regular expressions, we write them as raw strings, in the form r"expression".
Raw strings treat a backslash as a literal character rather than as the start of an escape sequence, which makes regular expressions easier to write.
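
The difference between a normal string and a raw string can be seen in a minimal sketch like the one below. The sample strings are made up for illustration and are not part of the original exercises.

```python
normal = "a\tb"   # "\t" is interpreted as a single tab character
raw = r"a\tb"     # the backslash and the "t" are both kept as written

print(len(normal))  # 3
print(len(raw))     # 4

# A raw string is just a shorter way to write doubled backslashes.
print(r"a\tb" == "a\\tb")  # True
```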

In [2]:
import re
word = input()

pattern = r"gl"

if re.match(pattern, word):
    print("Match")
else:
    print("No match")

Match


Other functions to match patterns are re.search and re.findall.

The function re.search finds a match of a pattern anywhere in the string.

The function re.findall returns a list of all substrings that match a pattern.

In [4]:
import re

pattern = r"spam"

if re.match(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No match")

if re.search(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No match")

print(re.findall(pattern, "eggspamsausagespam"))

No match
Match
['spam', 'spam']


In [7]:
import re

quote = "Always do your best. Your best is going to change from moment to moment; it will be different when you are healthy as opposed to sick. Under any circumstance, simply do your best, and you will avoid self-judgment, self-abuse and regret"

pattern = input()

# Count how many times the pattern occurs in the quote.
len(re.findall(pattern, quote))



3

A successful re.search returns a match object with several methods that give details about the match.
These methods include group, which returns the string matched; start and end, which return the start and end positions of the first match; and span, which returns the start and end positions of the first match as a tuple.

Example:

In [13]:
import re

pattern = r"pam"

match = re.search(pattern, "eggspamsausage")
if match:
    print(match.group())
    print(match.start())
    print(match.end())
    print(match.span())

pam
4
7
(4, 7)


### Search and replace

In [8]:
import re

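# re.sub(pattern, repl, string) replaces every match of pattern in string with repl.
# The variable is named text rather than str so that it does not shadow the built-in str type.
text = "My name is David. Hi David."
pattern = r"David"
newstr = re.sub(pattern, "Amy", text)
print(newstr)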

My name is Amy. Hi Amy.


We need to create a number formatting system for a contacts database.
Create a program that takes a phone number as input and, if the number starts with "00", replaces those two digits with "+".
The number should be printed after formatting.

In [12]:
import re

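number = input()
# "^" anchors the pattern to the start of the string, so only a leading "00"
# is replaced; a "00" later in the number is left untouched.
pattern = r"^00"

# If the number does not start with "00", re.sub returns it unchanged.
newstr = re.sub(pattern, "+", number)

print(newstr)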


+161


### Metacharacters

Metacharacters are what make regular expressions more powerful than normal string methods.
They allow you to create regular expressions to represent concepts like "one or more repetitions of a vowel".
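
As a rough illustration of "one or more repetitions of a vowel", the sketch below uses a character class `[aeiou]` followed by `+`. Both pieces of syntax go beyond what this section has introduced so far, and the sample text is invented.

```python
import re

# [aeiou] matches any single lowercase vowel; + means "one or more" of it.
pattern = r"[aeiou]+"

print(re.findall(pattern, "beautiful queue"))  # ['eau', 'i', 'u', 'ueue']
```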

The existence of metacharacters poses a problem if you want to create a regular expression (or regex) that matches a literal metacharacter, such as "$". You can do this by escaping the metacharacters by putting a backslash in front of them.
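However, this can cause problems, since backslashes also have an escaping function in normal Python strings. This can mean putting three or four backslashes in a row to do all the escaping. Writing the pattern as a raw string avoids this, because the backslash then only needs to be written once.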
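
Escaping a literal metacharacter might look like the sketch below. The sample string is made up, and re.escape is a standard-library helper that the text above does not mention.

```python
import re

price = "Total: $5"

# Unescaped, "$" means "end of the string", so this pattern cannot match "$5".
print(re.search(r"$5", price))            # None

# Escaped with a backslash, "$" is matched as a literal character.
print(re.search(r"\$5", price).group())   # $5

# re.escape adds the backslashes for you.
print(re.escape("$5"))                    # \$5
```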


#### Dot

In [16]:
# The first metacharacter we will look at is . (dot).
# It matches any single character other than a newline.

import re

pattern = r"gr.y"

if re.match(pattern, "grey"):
    print("Match 1")

if re.match(pattern, "gray"):
    print("Match 2")

if re.match(pattern, "blue"):
    print("Match 3")

Match 1
Match 2


In [17]:
# Start and end: ^ matches the start of a string and $ matches the end.
import re

pattern = r"^gr.y$"

if re.match(pattern, "grey"):
    print("Match 1")

if re.match(pattern, "gray"):
    print("Match 2")

if re.match(pattern, "stingray"):
    print("Match 3")

Match 1
Match 2
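
Because re.match already starts at the beginning of the string, the effect of the anchors is easier to see with re.search, or with a string that continues past the pattern. A minimal sketch, using "greyhound" as an extra made-up test word:

```python
import re

# Without anchors, re.search finds "gray" inside "stingray".
print(re.search(r"gr.y", "stingray").group())      # gray

# With ^ and $, the whole string must be exactly four characters: g, r, any, y.
print(re.search(r"^gr.y$", "stingray"))            # None

# re.match only checks the start, so trailing text is allowed unless $ is used.
print(re.match(r"gr.y", "greyhound") is not None)  # True
print(re.match(r"gr.y$", "greyhound"))             # None
```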
