### Regular expressions are used to search, match, and manipulate text. They consist of patterns that describe sets of strings. Example:

In [1]:
import re

pattern = r"hello"
text = "hello world"
match = re.search(pattern, text)
print("Match found:", match.group() if match else "No match")

Match found: hello


In [2]:
pattern = r"cat"
text = "The cat sat on the mat."
match = re.search(pattern, text)
print("Match found:", match.group() if match else "No match")

Match found: cat


In [3]:
pattern = r"c.t"
text = "The cgt sat cat."
match = re.search(pattern, text)
print("Match found:", match.group() if match else "No match")

Match found: cgt


### [A-Z]: Matches any uppercase letter

[a-z]: Matches any lowercase letter

[0-9]: Matches any digit

In [4]:
pattern = r"[a-z]"
text = "Hello World 123"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['e', 'l', 'l', 'o', 'o', 'r', 'l', 'd']


In [5]:
pattern = r"[a-z]+"
text = "Hello World 123"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['ello', 'orld']


In [6]:
pattern = r"[A-Z]"
text = "Hello World 123 RAj"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['H', 'W', 'R', 'A']


In [7]:
pattern = r"[A-Z]+"
text = "Hello World 123 RAj"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['H', 'W', 'RA']


In [8]:
pattern = r"[A-Z]+"
text = "Hello World /d RAj"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['H', 'W', 'RA']
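
The cells above use one range per class. As a minimal sketch, ranges can also be combined in a single pair of brackets, and a leading ^ inside the brackets negates the class. The variable names here are only illustrative.

```python
import re

text = "Hello World 123 RAj"

# Several ranges can share one set of brackets: letters of either case plus digits.
letters_and_digits = re.findall(r"[A-Za-z0-9]+", text)
print(letters_and_digits)  # ['Hello', 'World', '123', 'RAj']

# A leading ^ inside the brackets negates the class:
# here it matches runs of anything that is not a digit or a space.
non_digits = re.findall(r"[^0-9 ]+", text)
print(non_digits)  # ['Hello', 'World', 'RAj']
```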


### \d matches a single digit in the text; \d+ matches a run of consecutive digits

In [9]:
pattern = r"\d"
text = "Hello World 123 RAj 4561 2587"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['1', '2', '3', '4', '5', '6', '1', '2', '5', '8', '7']


In [10]:
pattern = r"\d+"
text = "Hello World 123 RAj 4561 2587"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['123', '4561', '2587']


In [11]:
pattern = r"\d{3}"  # {3} means exactly three digits per match
text = "Hello World 123 RAj 4561 258756"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['123', '456', '258', '756']
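
Braces also accept a range of repetitions. The sketch below assumes we want whole numbers only, so it adds \b (a word boundary, not used elsewhere in this notebook) to keep a long number from being cut into pieces.

```python
import re

text = "7 42 123 4561 258756"

# {3}: exactly three digits, bounded on both sides so longer numbers are skipped.
exactly_three = re.findall(r"\b\d{3}\b", text)
print(exactly_three)  # ['123']

# {2,4}: between two and four digits.
two_to_four = re.findall(r"\b\d{2,4}\b", text)
print(two_to_four)  # ['42', '123', '4561']

# {4,}: four or more digits.
at_least_four = re.findall(r"\b\d{4,}\b", text)
print(at_least_four)  # ['4561', '258756']
```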


In [12]:
pattern = r"[\w]"
text = "Hello World 123 RAj 4561 2587"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd', '1', '2', '3', 'R', 'A', 'j', '4', '5', '6', '1', '2', '5', '8', '7']


In [13]:
pattern = r"[\w]+"
text = "Hello World 123 RAj 4561 2587"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['Hello', 'World', '123', 'RAj', '4561', '2587']
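
Each shorthand class has an uppercase counterpart that matches the opposite set. A small sketch, using \s (whitespace), \D (non-digit) and \W (non-word character); the sample strings are made up for illustration.

```python
import re

text = "Hello World 123"

# \s matches a single whitespace character.
whitespace = re.findall(r"\s", text)
print(whitespace)  # [' ', ' ']

# \D is the opposite of \d: any character that is not a digit.
non_digit_runs = re.findall(r"\D+", text)
print(non_digit_runs)  # ['Hello World ']

# \W is the opposite of \w: punctuation, spaces and other non-word characters.
non_word_runs = re.findall(r"\W+", "Hello, World!")
print(non_word_runs)  # [', ', '!']
```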


### The dot character matches any character except a newline. Example:

In [14]:
pattern = r"h.llo"
text = "hello"
match = re.search(pattern, text)
print("Match found:", match.group() if match else "No match")


Match found: hello


In [15]:
pattern = r"h.llo"
text = "h llo"
match = re.search(pattern, text)
print("Match found:", match.group() if match else "No match")


Match found: h llo


In [16]:
pattern = r"h.llo"
text = "h.llo"
match = re.search(pattern, text)
print("Match found:", match.group() if match else "No match")


Match found: h.llo


In [17]:
pattern = r"h.llo"
text = "h\nllo"
match = re.search(pattern, text)
print("Match found:", match.group() if match else "No match")


Match found: No match
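
Because the dot matches almost anything, a pattern meant to find a literal "." should escape it. A minimal sketch, with a made-up file name as the sample text:

```python
import re

text = "file.txt filextxt"

# Unescaped dot: matches any character, so both words are found.
unescaped = re.findall(r"file.txt", text)
print(unescaped)  # ['file.txt', 'filextxt']

# Escaped dot: matches only a literal "." character.
escaped = re.findall(r"file\.txt", text)
print(escaped)  # ['file.txt']

# re.escape() escapes special characters for you when the text comes from a variable.
auto_escaped = re.findall(re.escape("file.txt"), text)
print(auto_escaped)  # ['file.txt']
```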


In [18]:
pattern = r"[\w-]+"
text = "Hell--"
matches = re.findall(pattern, text)
print("Matches found:", matches)

Matches found: ['Hell--']


### Greedy Matches
By default, quantifiers are greedy: they match as many characters as possible. Adding ? after a quantifier (for example .*? or .+?) makes it lazy, so it matches as few characters as possible. Example:

In [19]:
pattern = r".*"
text = "content"
match = re.search(pattern, text)
print("Greedy match:", match.group())


Greedy match: content


In [20]:
pattern_lazy = r".+?"  # +? is the lazy form of +: it stops after one character
text = "content"
match_lazy = re.search(pattern_lazy, text)
print("Lazy match:", match_lazy.group())

Lazy match: c
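
The difference between greedy and lazy matching is easiest to see when the text contains more than one possible end point. A sketch with a made-up sentence holding two quoted parts:

```python
import re

text = 'say "one" and "two"'

# Greedy: .* runs to the last closing quote, so both parts end up in one match.
greedy = re.findall(r'".*"', text)
print(greedy)  # ['"one" and "two"']

# Lazy: .*? stops at the first closing quote it can, giving one match per quoted part.
lazy = re.findall(r'".*?"', text)
print(lazy)  # ['"one"', '"two"']
```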


### Grouping
Parentheses () are used to group parts of a pattern for extraction or backreference. Example:

In [21]:
pattern = r"(\d{3})-(\d{2})"
text = "Phone number: 123-45"
match = re.search(pattern, text)
print("Area code:", match.group(1))
print("Local code:", match.group(2))

Area code: 123
Local code: 45
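
Groups can also be given names, and a group can be referenced again inside the same pattern. The sketch below reuses the phone example with the (?P<name>...) syntax and then uses the backreference \1 to find a doubled word; the group names and the second sample sentence are illustrative choices.

```python
import re

# Named groups: the same pattern as above, but each group gets a label.
match = re.search(r"(?P<area>\d{3})-(?P<local>\d{2})", "Phone number: 123-45")
area = match.group("area")
parts = match.groupdict()
print(area)   # 123
print(parts)  # {'area': '123', 'local': '45'}

# Backreference: \1 must repeat exactly what group 1 captured.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
doubled_word = repeated.group()
print(doubled_word)  # is is
```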


### Matching at Beginning or End

    ^: Match at the start

    $: Match at the end

In [22]:
pattern = r"^Hello"
text = "Hello world"
match = re.search(pattern, text)
print("Match at start:", match.group() if match else "No match")

pattern = r"world$"
match = re.search(pattern, text)
print("Match at end:", match.group() if match else "No match")

Match at start: Hello
Match at end: world
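
Using both anchors together is a common way to check that a whole string fits a pattern. A minimal sketch follows; the helper name is_all_digits is hypothetical. Note that $ also matches just before a trailing newline, which re.fullmatch() does not allow.

```python
import re

def is_all_digits(s):
    """Return True if s consists only of digits (hypothetical helper)."""
    return re.search(r"^\d+$", s) is not None

valid = is_all_digits("12345")
invalid = is_all_digits("123abc")
print(valid, invalid)  # True False

# $ tolerates one trailing newline; re.fullmatch() requires the entire string to match.
with_newline_search = re.search(r"^\d+$", "123\n") is not None
with_newline_full = re.fullmatch(r"\d+", "123\n") is not None
print(with_newline_search, with_newline_full)  # True False
```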


### Match Objects
The match object contains details about the match. Use .group() to get the matched text, and .start() / .end() to get its positions. Example:

In [23]:
pattern = r"world"
text = "Hello world"
match = re.search(pattern, text)
print("Matched text:", match.group())
print("Start position:", match.start())
print("End position:", match.end())

Matched text: world
Start position: 6
End position: 11
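
re.search() returns only the first match, and it returns None when nothing matches. As a sketch, re.finditer() yields a match object for every occurrence, so the same .group(), .start() and .end() calls work on each one. The sample text is made up.

```python
import re

text = "Hello world, wide world"

# Collect the text and position of every occurrence, not just the first.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"world", text)]
print(positions)  # [('world', 6, 11), ('world', 18, 23)]

# When there is no match, re.search() returns None, so check before calling .group().
missing = re.search(r"planet", text)
print(missing is None)  # True
```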


### Substituting
re.sub() replaces the parts of a string that match a pattern. Example:

In [28]:
pattern = r"cat"
text = "The cat sat on the mat."
result = re.sub(pattern, "dog", text)
print("After substitution:", result)

After substitution: The dog sat on the mat.
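
The replacement does not have to be a fixed string. A sketch of three variations: referring to captured groups with \1, \2, ..., limiting the number of replacements with count, and passing a function that builds the replacement from each match. The sample strings are illustrative.

```python
import re

# Group references in the replacement: reorder a date from YYYY-MM-DD to DD/MM/YYYY.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-01-15")
print(swapped)  # 15/01/2024

# count limits how many matches are replaced.
first_only = re.sub(r"cat", "dog", "cat cat cat", count=1)
print(first_only)  # dog cat cat

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```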


### Splitting a String

#### re.split() splits a string at each match of the pattern.

In [30]:
pattern = r"\s+"
text = "Split this    sentence by spaces"
result = re.split(pattern, text)
print("Split result:", result)

Split result: ['Split', 'this', 'sentence', 'by', 'spaces']


In [31]:
pattern = r"\s"
text = "Split this    sentence by spaces"
result = re.split(pattern, text)
print("Split result:", result)

Split result: ['Split', 'this', '', '', '', 'sentence', 'by', 'spaces']
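
A few more re.split() behaviours, sketched with made-up input: splitting on several delimiters at once, limiting the number of splits with maxsplit, and keeping the delimiters by wrapping the pattern in a group.

```python
import re

# A character class lets one pattern cover several delimiters.
fields = re.split(r"[,;]\s*", "a, b;c,  d")
print(fields)  # ['a', 'b', 'c', 'd']

# maxsplit stops after the given number of splits; the rest stays in the last item.
limited = re.split(r"\s+", "Split this    sentence by spaces", maxsplit=2)
print(limited)  # ['Split', 'this', 'sentence by spaces']

# With a capturing group, the matched delimiters are kept in the result.
with_delimiters = re.split(r"(\s+)", "a  b")
print(with_delimiters)  # ['a', '  ', 'b']
```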


### Compiling Regular Expressions
You can compile a regular expression with re.compile() for repeated use. Example:

In [32]:
pattern = re.compile(r"\d+")
text = "123 apples, 456 bananas"
matches = pattern.findall(text)
print("Matches:", matches)

Matches: ['123', '456']
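
Compiling pays off mainly when the same pattern is applied many times. A small sketch that reuses one compiled pattern across several lines; the list of lines is invented for illustration.

```python
import re

number = re.compile(r"\d+")  # compiled once, reused below

lines = ["123 apples", "no fruit here", "456 bananas, 7 kiwis"]

# The compiled object offers the same methods as the re module (findall, search, sub, ...).
numbers_per_line = [number.findall(line) for line in lines]
print(numbers_per_line)  # [['123'], [], ['456', '7']]

# The original pattern string stays available on the compiled object.
print(number.pattern)  # \d+
```

For one-off matches the module-level functions such as re.findall() are usually fine, since the re module keeps a cache of recently compiled patterns.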


### Flags
Flags modify the behavior of regex. Common flags:


re.IGNORECASE or re.I: case-insensitive matching

re.DOTALL or re.S: makes . match newline characters as well

re.MULTILINE or re.M: makes ^ and $ match at the start and end of each line, not only of the whole string

Example:

In [33]:
pattern = r"hello"
text = "Hello"
match = re.search(pattern, text, re.IGNORECASE)
print("Case-insensitive match:", match.group() if match else "No match")

Case-insensitive match: Hello
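
The other two flags can be sketched the same way, using a made-up two-line string. Flags are combined with the | operator.

```python
import re

text = "first line\nsecond line"

# Without re.DOTALL the dot stops at the newline, so the pattern cannot span both lines.
no_dotall = re.search(r"first.*second", text)
with_dotall = re.search(r"first.*second", text, re.DOTALL)
print(no_dotall)  # None
print(repr(with_dotall.group()))  # 'first line\nsecond'

# Without re.MULTILINE, ^ matches only at the very start of the string.
start_default = re.findall(r"^\w+", text)
start_multiline = re.findall(r"^\w+", text, re.MULTILINE)
print(start_default)    # ['first']
print(start_multiline)  # ['first', 'second']

# Combining flags: case-insensitive and multiline at the same time.
combined = re.findall(r"^SECOND", text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['second']
```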
