<a href="https://colab.research.google.com/github/epythonlab/PythonLab/blob/master/Regex_Tutorials.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learn Regular Expressions from Beginner to Advanced

# Description:

In this tutorial, you will learn everything you need to know about regular expressions, from beginner to advanced. I will cover the basics of regular expressions, as well as more advanced topics with practical examples. By the end of this tutorial, you will be able to use regular expressions to solve a variety of problems.

Keywords

regular expressions, regex, beginner, advanced, Python, programming, tutorial, regex tutorial, regular expressions tutorial, regex examples, regex exercises, beginner regex, advanced regex

# I. Introduction to Regular Expressions


A. What are regular expressions?
- Regular expressions (regex) are sequences of characters that define a search pattern.
- They are widely used for pattern matching and text manipulation tasks.

B. Why are regular expressions useful?
- Regular expressions offer a powerful and flexible way to search, validate, and manipulate text data.
- They can be applied in various programming languages, text editors, and command-line tools.

C. How are regular expressions applied in various fields?
- Regular expressions find applications in fields like data validation, web scraping, text mining, log analysis, search and replace operations, and more.

# II. Regex Basics


Importing the `re` Module

In Python, regular expressions are handled by the built-in `re` module. Import it before using any of the examples in this tutorial:

In [None]:
import re

A. Literal Matches
- Matching literal characters directly.

## Example:
- The regex pattern `cat` matches the word "cat" in the input text.

In [None]:
text = "The cat is on the mat."
pattern = "cat"

result = re.findall(pattern, text)
print(result)  # Output: ['cat']

- `re.findall(pattern, string)`: Returns all non-overlapping matches of the pattern in the string as a list.
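
To make "non-overlapping" concrete, here is a small sketch that compares `re.findall` with `re.search` and `re.finditer`. The sample string is made up for illustration.

```python
import re

text = "aaaa cat concat"  # made-up sample string

# Non-overlapping: after matching "aa" at index 0, scanning resumes at index 2.
pairs = re.findall("aa", text)
print(pairs)  # ['aa', 'aa']

# re.search returns only the first match as a Match object (or None).
first = re.search("cat", text)
print(first.group(), first.start())  # cat 5

# re.finditer yields one Match object per match, keeping position information.
positions = [(m.group(), m.start()) for m in re.finditer("cat", text)]
print(positions)  # [('cat', 5), ('cat', 12)]
```

`re.findall` is convenient when only the matched text matters; `re.finditer` is the better fit when you also need to know where each match occurred.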

B. Metacharacters and Escaping
- Metacharacters are characters with a predefined meaning in regex, such as `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, `^`, `$`, `|`, and `\`. Escaping one with a backslash makes it match literally.
## Example:
- To match a literal dot (`.`), escape it as `\.`

In [None]:
text = "I have a dot in my sentence."
pattern = r"\."

result = re.findall(pattern, text)
print(result)  # Output: ['.']

- An unescaped `.` (dot) matches any single character except a newline, which is why the pattern above escapes it.
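
The difference between an escaped and an unescaped metacharacter is easiest to see side by side. The sketch below uses made-up sample strings and also shows `re.escape`, which escapes every metacharacter in a literal string for you.

```python
import re

text = "a.c abc a-c"  # made-up sample string

# Unescaped dot: any single character except a newline.
any_char = re.findall(r"a.c", text)
print(any_char)  # ['a.c', 'abc', 'a-c']

# Escaped dot: only a literal period.
literal_dot = re.findall(r"a\.c", text)
print(literal_dot)  # ['a.c']

# re.escape turns an arbitrary literal string into a safe pattern.
# Without it, the "+" in "1+1=2" would act as a quantifier.
escaped = re.escape("1+1=2")
literal_hits = re.findall(escaped, "1+1=2 and 11=2")
print(literal_hits)  # ['1+1=2']
```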

C. Character Classes and Ranges
- Matching a specific set of characters or ranges.
## Example:
- The regex pattern `[aeiou]` matches any vowel in the input text.

In [None]:
text = "I love apples and oranges."
pattern = "[aeiou]"

result = re.findall(pattern, text)
print(result)  # Output: ['o', 'e', 'a', 'e', 'a', 'o', 'a', 'e']  (the uppercase "I" is not in the class)
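
The example above lists characters one by one. Ranges and negation work inside the same square brackets; the following sketch uses a made-up string to show both.

```python
import re

text = "Room 42, Floor 7B"  # made-up sample string

# A range matches any one character between its endpoints.
digits = re.findall(r"[0-9]", text)
print(digits)  # ['4', '2', '7']

uppercase = re.findall(r"[A-Z]", text)
print(uppercase)  # ['R', 'F', 'B']

# A leading ^ inside the brackets negates the class.
other = re.findall(r"[^A-Za-z0-9 ]", text)
print(other)  # [',']
```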

D. Quantifiers
- Specifying the number of occurrences of a character or group.
## Example:
- The regex pattern `a{2,4}b` matches "aab", "aaab", or "aaaab": two to four "a" characters followed by "b".


In [None]:
text = "aaa ab aab aaab"
pattern = "a{2,4}b"

result = re.findall(pattern, text)
print(result)  # Output: ['aab', 'aaab']

Here's a breakdown of the pattern:

- `a{2,4}`: The curly braces `{2,4}` are a quantifier that requires the preceding character "a" to occur between 2 and 4 times in a row, so it matches "aa", "aaa", or "aaaa". A single "a" is not enough, and in a longer run such as "aaaaa" it consumes at most four characters.
- `b`: Matches the character "b" immediately after the "a" sequence, so every match ends with "b". This is why "aaa" and "ab" in the text produce no match.
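
Besides `{m,n}`, regex has three shorthand quantifiers: `?` (zero or one), `+` (one or more), and `*` (zero or more). They are not covered in the example above, so here is a brief sketch using a made-up string.

```python
import re

text = "color colour colouur clr"  # made-up sample string

optional_u = re.findall(r"colou?r", text)      # ? : zero or one "u"
print(optional_u)  # ['color', 'colour']

one_or_more_u = re.findall(r"colou+r", text)   # + : one or more "u"
print(one_or_more_u)  # ['colour', 'colouur']

zero_or_more_u = re.findall(r"colou*r", text)  # * : zero or more "u"
print(zero_or_more_u)  # ['color', 'colour', 'colouur']

exactly_two_u = re.findall(r"colou{2}r", text) # {2} : exactly two "u"
print(exactly_two_u)  # ['colouur']
```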

E. Anchors
- Matching specific positions in the input text.
## Example:
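- The regex pattern `^start` matches "start" at the beginning of a line. The `re.MULTILINE` flag makes `^` match after every newline, not only at the start of the string. The capitalized "Start" on the first line is not matched because matching is case-sensitive by default.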

In [None]:
text = "Start with a newline\nstart with a word"
pattern = "^start"

result = re.findall(pattern, text, re.MULTILINE)
print(result)  # Output: ['start']
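
The `$` anchor is the counterpart of `^` and matches at the end. The sketch below, using a made-up two-line string, shows how both anchors behave with and without `re.MULTILINE`.

```python
import re

text = "first line\nsecond line"  # made-up sample string

# Without re.MULTILINE, ^ and $ anchor only at the start and end of the string.
start_default = re.findall(r"^\w+", text)
end_default = re.findall(r"\w+$", text)
print(start_default, end_default)  # ['first'] ['line']

# With re.MULTILINE, they also anchor at every line boundary.
start_multi = re.findall(r"^\w+", text, re.MULTILINE)
end_multi = re.findall(r"\w+$", text, re.MULTILINE)
print(start_multi, end_multi)  # ['first', 'second'] ['line', 'line']
```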

# III. Intermediate Regex Concepts


A. Alternation
- Matching multiple alternatives using the OR operator (`|`).
## Example:
- The regex pattern `apple|orange` matches either "apple" or "orange."


In [None]:
text = "I have an apple and an orange."
pattern = "apple|orange"

result = re.findall(pattern, text)
print(result)  # Output: ['apple', 'orange']
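
One thing to watch for: `|` has the lowest precedence, so it splits the whole pattern unless a group limits its scope. A small sketch with a made-up string:

```python
import re

text = "grey gray green"  # made-up sample string

# Without a group, the alternatives are "gre" OR "ay".
ungrouped = re.findall(r"gre|ay", text)
print(ungrouped)  # ['gre', 'ay', 'gre']

# A non-capturing group (?:...) limits the alternation to one position.
grouped = re.findall(r"gr(?:e|a)y", text)
print(grouped)  # ['grey', 'gray']
```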

B. Grouping and Capturing
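- Grouping parts of a regex pattern and capturing matched substrings.
## Example:
- The regex pattern `(ab)+` matches "ab", "abab", or "ababab". Because the pattern contains a capturing group, `re.findall` returns what the group captured (its last repetition, "ab") instead of the full match.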


In [None]:
text = "abab abc ab"
pattern = "(ab)+"

result = re.findall(pattern, text)
print(result)  # Output: ['ab', 'ab', 'ab']
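
If you want the full matches rather than the captured group, two options are a non-capturing group `(?:...)` or a Match object. The sketch below reuses the same text to compare them.

```python
import re

text = "abab abc ab"

# With one capturing group, findall returns the group's last capture per match.
captured = re.findall(r"(ab)+", text)
print(captured)  # ['ab', 'ab', 'ab']

# A non-capturing group makes findall return the full matches.
full = re.findall(r"(?:ab)+", text)
print(full)  # ['abab', 'ab', 'ab']

# A Match object exposes both: group(0) is the full match, group(1) the capture.
m = re.search(r"(ab)+", text)
print(m.group(0), m.group(1))  # abab ab
```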

C. Backreferences
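- Referring to previously captured groups within the same regex pattern.
## Example:
- The regex pattern `(\d{2})-\1` matches "22-22" but not "22-33". `re.findall` returns the captured group, so the output shows "22" rather than the full match "22-22".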


In [None]:
text = "22-22 22-33"
pattern = r"(\d{2})-\1"

result = re.findall(pattern, text)
print(result)  # Output: ['22']
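
A common practical use of backreferences is finding accidentally repeated words. This is a minimal sketch with a made-up sentence; it uses `re.finditer` to get the full matches and `re.sub` to collapse each pair.

```python
import re

text = "This is is a test test sentence."  # made-up sample string

# \b(\w+) captures a word; \s+\1\b requires the same word to follow.
pattern = r"\b(\w+)\s+\1\b"

doubled = [m.group(0) for m in re.finditer(pattern, text)]
print(doubled)  # ['is is', 'test test']

# In the replacement string, \1 inserts the captured word once.
fixed = re.sub(pattern, r"\1", text)
print(fixed)  # This is a test sentence.
```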

D. Lookaheads and Lookbehinds
- Matching based on the presence or absence of certain patterns ahead or behind the current position.
## Example:
- The regex pattern `(?<=prefix)\w+` uses a lookbehind to match word characters that come immediately after the text "prefix", without including "prefix" in the match.


In [None]:
text = "prefixword suffix"
pattern = r"(?<=prefix)\w+"  # the lookbehind must come before \w+

result = re.findall(pattern, text)
print(result)  # Output: ['word']
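
There are four lookaround forms: `(?<=...)` and `(?<!...)` look behind, `(?=...)` and `(?!...)` look ahead. The sketch below tries each on a made-up string.

```python
import re

text = "price: $100, cost: 200, $300"  # made-up sample string

# Positive lookbehind: digits preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)
print(after_dollar)  # ['100', '300']

# Negative lookbehind: digits not preceded by "$" or by another digit.
no_dollar = re.findall(r"(?<![$\d])\d+", text)
print(no_dollar)  # ['200']

# Positive lookahead: digits followed by a comma.
before_comma = re.findall(r"\d+(?=,)", text)
print(before_comma)  # ['100', '200']

# Negative lookahead: whole numbers not followed by a comma.
# The \b stops the engine from backtracking to a shorter run such as "10".
not_before_comma = re.findall(r"\d+\b(?!,)", text)
print(not_before_comma)  # ['300']
```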

E. Greedy vs. Lazy Matching
- Controlling whether a quantifier is greedy (matches as much as possible) or lazy (matches as little as possible).
## Example:
- In "aaabbbbbaabbbb", the greedy pattern `a.+b` matches the entire string, while the lazy pattern `a.+?b` stops at the first "b" it can reach and so finds "aaab" and then "aab".

In [None]:
text = "aaabbbbbaabbbb"
pattern = r"a.+b"  # greedy: .+ runs to the end, then backs up to the last "b"

result = re.findall(pattern, text)
print(result)  # Output: ['aaabbbbbaabbbb']

pattern = r"a.+?b"  # lazy: .+? expands one character at a time

result = re.findall(pattern, text)
print(result)  # Output: ['aaab', 'aab']
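
The difference matters most when a pattern has a clear closing delimiter. As a sketch, here is the classic case of matching tags in a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # made-up sample string

# Greedy: .+ grabs everything up to the last ">".
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the first ">" it can reach.
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']
```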

# IV. Advanced Regex Techniques


A. Advanced Character Classes
- Utilizing advanced character class constructs for matching specific character types.
## Example:
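- Python's built-in `re` module does not support Unicode property escapes such as `\p{L}` (the third-party `regex` package does). With `re`, the class `[^\W\d_]` matches any Unicode letter: a word character that is neither a digit nor an underscore.


In [None]:
text = "Hello, こんにちは, مرحبًا"
pattern = r"[^\W\d_]"  # word characters minus digits and underscore

result = re.findall(pattern, text)  # str patterns are Unicode-aware by default
print(result)  # Output: ['H', 'e', 'l', 'l', 'o', 'こ', 'ん', 'に', 'ち', 'は', 'م', 'ر', 'ح', 'ب', 'ا']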


B. Modifiers and Flags
- Adding modifiers or flags to change regex behavior (e.g., case-insensitive matching).
## Example:
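- The regex pattern `(?i)pattern` matches "pattern" case-insensitively. A global inline flag such as `(?i)` must appear at the start of the pattern; Python 3.11 and later raise an error otherwise. Passing `re.IGNORECASE` as the `flags` argument has the same effect.


In [None]:
text = "Pattern matching is case-insensitive."
pattern = r"(?i)pattern"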

result = re.findall(pattern, text)
print(result)  # Output: ['Pattern']
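
Flags can also be passed as an argument, combined with `|`, or scoped to part of a pattern. The following sketch uses a made-up string to show a few of these options.

```python
import re

text = "Pattern\nPATTERN pattern"  # made-up sample string

# Flag passed as an argument instead of inline.
any_case = re.findall(r"pattern", text, re.IGNORECASE)
print(any_case)  # ['Pattern', 'PATTERN', 'pattern']

# Scoped inline flag: only the first letter ignores case.
first_letter = re.findall(r"(?i:p)attern", text)
print(first_letter)  # ['Pattern', 'pattern']

# Flags combine with |. re.DOTALL lets "." match a newline as well.
across_lines = re.findall(r"pattern.pattern", text, re.IGNORECASE | re.DOTALL)
print(across_lines)  # ['Pattern\nPATTERN']
```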

C. Conditional Matching
-  Matching patterns conditionally based on certain criteria.
## Example:
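- The regex pattern `(?(id)yes|no)` tries to match `yes` if the capturing group with the given number or name took part in the match; otherwise it tries `no`, which may be omitted. In the cell below, a closing parenthesis is required only when an opening one was matched.


In [None]:
text = "(10) 20"
pattern = r"(\()?\d+(?(1)\))"  # group 1 is the optional "("

# finditer is used because findall would return only the contents of group 1
result = [m.group() for m in re.finditer(pattern, text)]
print(result)  # Output: ['(10)', '20']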

D. Recursive Patterns
-  Creating patterns that can match nested or repetitive structures.
## Example:
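- The regex pattern `(?R)` recurses into the entire pattern. Python's built-in `re` module does not support recursion, so this example uses the third-party `regex` package (`pip install regex`).


In [None]:
import regex  # third-party package, not part of the standard library

text = "Nested brackets: (((text)))"
pattern = r"\((?:[^()]+|(?R))*\)"  # "(", then non-parentheses or nested groups, then ")"

result = regex.findall(pattern, text)
print(result)  # Output: ['(((text)))']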

E. Assertions
-  Making assertions about the surrounding text without including it in the actual match.
## Example:
- The regex pattern `word(?=\W|$)` matches "word" only if it's followed by a non-word character or end of line.

In [None]:
text = "word, word1, word2"
pattern = r"word(?=\W|$)"

result = re.findall(pattern, text)
print(result)  # Output: ['word']  ("word1" and "word2" are followed by a digit, which is a word character)

# V. Practical Examples and Demo Exercises


A. Validating Email Addresses
- Using regex to validate the format of an email address.
## Exercise:
- Write a regex pattern to validate email addresses of the form "username@domain.tld", such as "username@epythonlab.com".


In [None]:
email = "example@example.com"
pattern = r"^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$"

result = re.match(pattern, email)
if result:
    print("Valid email address.")
else:
    print("Invalid email address.")
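
The pattern above is intentionally simple and rejects some valid addresses, such as those with dots in the username or with subdomains. Below is a slightly more permissive sketch; `EMAIL_RE` and `is_valid_email` are hypothetical names chosen for this example, and the pattern is still an approximation rather than a full implementation of the email address standard.

```python
import re

# Hypothetical helper: a simplified pattern, not a complete email validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

def is_valid_email(address):
    """Return True if the whole string looks like an email address."""
    # fullmatch requires the pattern to cover the entire string,
    # so no ^ and $ anchors are needed.
    return EMAIL_RE.fullmatch(address) is not None

candidates = [
    "user@epythonlab.com",
    "first.last@mail.example.org",
    "no-at-sign.com",
    "user@domain",
]
results = {c: is_valid_email(c) for c in candidates}
for address, ok in results.items():
    print(address, ok)
```

The first two candidates are accepted and the last two are rejected, because they lack an "@" sign and a top-level domain respectively.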

B. Extracting URLs from Text
- Extracting URLs from a text document using regex.
## Exercise:
- Write a regex pattern to extract URLs from a given text input.


In [None]:
text = "Visit my website at https://www.example.com"
pattern = r"https?://[\w./-]+"

result = re.findall(pattern, text)
print(result)  # Output: ['https://www.example.com']
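
One caveat with this pattern: the character class includes `.`, so a URL at the end of a sentence picks up the trailing period. The sketch below, using a made-up sentence, shows the effect and one simple way to clean it up.

```python
import re

text = "Docs: https://www.example.com/docs, mirror at http://test.org."  # made-up

raw = re.findall(r"https?://[\w./-]+", text)
print(raw)  # ['https://www.example.com/docs', 'http://test.org.']

# Strip sentence punctuation that the character class absorbed.
cleaned = [url.rstrip(".") for url in raw]
print(cleaned)  # ['https://www.example.com/docs', 'http://test.org']
```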

C. Parsing HTML/XML Tags
- Parsing and extracting information from HTML/XML tags using regex.
## Exercise:
- Write a regex pattern to extract the content within HTML `<title>` tags.


In [None]:
html = "<title>Regex Tutorial</title>"
pattern = r"<title>(.*?)<\/title>"

result = re.findall(pattern, html)
print(result)  # Output: ['Regex Tutorial']
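
Real pages often use uppercase tags or spread the content over several lines. As a sketch, the flags `re.IGNORECASE` and `re.DOTALL` handle both cases in the made-up fragment below. For anything beyond simple extraction, a dedicated parser such as the standard library's `html.parser` is usually more robust than regex.

```python
import re

html = "<TITLE>\n  Regex Tutorial\n</TITLE>"  # made-up sample fragment

# IGNORECASE accepts <TITLE>; DOTALL lets "." cross the line breaks.
match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)

# group(1) is the captured content; strip() removes surrounding whitespace.
title = match.group(1).strip() if match else None
print(title)  # Regex Tutorial
```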

D. Formatting and Manipulating Text
- Using regex to format and manipulate text strings.
## Exercise:
- Write a regex pattern to remove all non-alphanumeric characters from a given text.


In [None]:
text = "Remove!@#$non-alphanumeric%^characters"
pattern = r"[\W_]+"  # \W alone would keep underscores, which count as word characters

result = re.sub(pattern, "", text)
print(result)  # Output: Removenonalphanumericcharacters
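
`re.sub` can do more than delete text. The replacement may refer to captured groups, or it may be a function that computes each replacement. A short sketch with made-up strings:

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2023-07-15")
print(reordered)  # 15/07/2023

# A function receives each Match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# Collapse runs of whitespace into a single space.
collapsed = re.sub(r"\s+", " ", "too    many   spaces")
print(collapsed)  # too many spaces
```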

E. Data Extraction and Transformation
-  Applying regex to extract and transform data from structured or semi-structured text.
## Exercise:
- Write a regex pattern to extract phone numbers and email addresses from a text document.

In [None]:
text = "Contact us at: Phone: 123-456-7890 Email: info@example.com"
pattern = r"Phone: (\d{3}-\d{3}-\d{4}) Email: (\w+@\w+\.\w+)"

result = re.findall(pattern, text)  # one tuple per match, one item per group
print(result)  # Output: [('123-456-7890', 'info@example.com')]
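
When a pattern has several groups, named groups make the extracted fields easier to work with than positional tuples. The sketch below reuses the same text; the group names `phone` and `email` are chosen for this example.

```python
import re

text = "Contact us at: Phone: 123-456-7890 Email: info@example.com"

# (?P<name>...) gives each group a name instead of a number.
pattern = re.compile(
    r"Phone: (?P<phone>\d{3}-\d{3}-\d{4}) Email: (?P<email>\w+@\w+\.\w+)"
)

# groupdict() returns the named groups of each match as a dictionary.
records = [m.groupdict() for m in pattern.finditer(text)]
print(records)  # [{'phone': '123-456-7890', 'email': 'info@example.com'}]
```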

# Conclusion:


In conclusion, this comprehensive regex tutorial covered various topics from basics to advanced techniques. You learned how regex empowers you to tackle pattern matching and text manipulation tasks. With practical examples and exercises, you gained hands-on experience. Keep practicing and exploring the vast world of regex for endless possibilities in text-related problem-solving.

## Join [Epythonlab](https://telegram.me/epythonlab/)