# <font color="#418FDE" size="6.5" uppercase>**Regex Basics**</font>

>Last update: 20251224.
    
By the end of this Lecture, you will be able to:
- Explain how literal characters and metacharacters form the core of regex patterns. 
- Construct simple patterns using character classes and predefined shorthand classes in Python. 
- Differentiate between greedy and non-greedy quantifiers in simple matching scenarios. 


## **1. Literals and Metacharacters**

### **1.1. Matching Literal Text**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_01_01.jpg?v=1766629932" width="250">



>* Regex starts with matching simple literal text
>* Engine searches for exact character sequences, as a text editor's find does

>* Literal patterns match exact sequences across the text
>* Engine scans left to right using predictable steps

>* Literal-only patterns are simple and good for basics
>* They are rigid, which motivates flexible metacharacter-based patterns
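
As a minimal sketch of the left-to-right scan summarized above, the snippet below records where each literal match starts and ends. The helper name `literal_spans` and the sample sentence are illustrative assumptions, not part of the lecture's own code.

```python
import re

def literal_spans(pattern, text):
    """Return (start, end) index pairs for every match, in left-to-right order."""
    return [m.span() for m in re.finditer(pattern, text)]

sample = "edit, editor, edited"  # Hypothetical sample text.
spans = literal_spans(r"edit", sample)
print("Spans for 'edit':", spans)

# Literal matching is case-sensitive by default, so "Edit" finds nothing here.
print("Spans for 'Edit':", literal_spans(r"Edit", sample))
```

Each span is a half-open interval, so `sample[start:end]` returns the matched text.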



In [None]:
#@title Python Code - Matching Literal Text

# Demonstrate simple literal regex matching in plain Python text examples.
# Show how pattern letters must appear exactly in correct order.
# Compare matches for refund and cat inside different example sentences.

import re  # Import regular expression module for literal pattern matching.

text = "The customer requested a refund, then another refunded item later."  # Example feedback text.
pattern_refund = r"refund"  # Literal pattern matching letters r e f u n d exactly.

matches_refund = re.findall(pattern_refund, text)  # Find all literal refund occurrences inside text.
print("Literal pattern 'refund' matches:", matches_refund)  # Show found literal refund matches.

text_animals = "The catalog listed a cat, a scatter rug, and a bobcat."  # Example animal text.
pattern_cat = r"cat"  # Literal pattern matching letters c a t exactly in sequence.

matches_cat = re.findall(pattern_cat, text_animals)  # Find all literal cat sequences inside text.
print("Literal pattern 'cat' matches:", matches_cat)  # Show cat matches, including those inside catalog, scatter, and bobcat.

pattern_exact_cat = r"\bcat\b"  # Literal cat with word boundaries for standalone word only.
exact_cat_matches = re.findall(pattern_exact_cat, text_animals)  # Find standalone literal cat word.
print("Standalone word 'cat' matches:", exact_cat_matches)  # Show only exact cat word occurrences.




### **1.2. Core Regex Anchors**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_01_02.jpg?v=1766629948" width="250">



>* Anchors control match position, not matched characters
>* Start and end anchors pin matches to string boundaries (line boundaries with `re.MULTILINE`)

>* Anchors keep matches at the start or end of the input
>* They ensure entire inputs follow one exact pattern

>* Anchors target endings or whole simple responses
>* They act as invisible boundaries, not characters
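
The short sketch below, using a hypothetical three-line log string, shows how `^` and `$` behave with and without the `re.MULTILINE` flag. By default they anchor to the whole string; with the flag they also anchor to each line.

```python
import re

log = "ERROR disk full\nINFO ok\nERROR timeout"  # Hypothetical three-line log.

# Without flags, ^ matches only at the very start of the string.
start_default = re.findall(r"^ERROR", log)
# With re.MULTILINE, ^ also matches right after each newline.
start_multiline = re.findall(r"^ERROR", log, flags=re.MULTILINE)

# Without flags, $ cannot match in the middle of the string, so "ok" is not found.
end_default = re.search(r"ok$", log)
# With re.MULTILINE, $ also matches just before each newline.
end_multiline = re.search(r"ok$", log, flags=re.MULTILINE)

print("Start anchor, default:  ", len(start_default))
print("Start anchor, MULTILINE:", len(start_multiline))
print("End anchor, default:    ", end_default)
print("End anchor, MULTILINE:  ", end_multiline.group())
```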



In [None]:
#@title Python Code - Core Regex Anchors

# Demonstrate core regex anchors for string start and string end.
# Compare anchored patterns with unanchored patterns in simple examples.
# Show how anchors restrict matches to specific string positions.

import re  # Import regular expression module for pattern matching.

lines = ["2024-12-25 Error started.", "Error on 2024-12-25 occurred.", "2024-12-25 Another message."]

pattern_anywhere = re.compile(r"2024-12-25")
pattern_start = re.compile(r"^2024-12-25")
pattern_full = re.compile(r"^2024-12-25 Error started\.$")

print("Pattern without anchors matches these lines:")
for line in lines:
    if pattern_anywhere.search(line):
        print("ANYWHERE match found in:", line)

print("\nPattern anchored at start matches these lines:")
for line in lines:
    if pattern_start.search(line):
        print("START anchored match in:", line)

print("\nPattern anchored at start and end matches:")
for line in lines:
    if pattern_full.search(line):  # search() is enough here because ^ and $ already pin both ends.
        print("FULL anchored match in:", line)



### **1.3. Escaping Special Characters**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_01_03.jpg?v=1766629963" width="250">



>* Special regex symbols need escaping for literals
>* Escaping prevents wrong matches and regex errors

>* Escaping makes special symbols match literally in text
>* Prevents metacharacters from triggering unwanted regex behavior

>* Escaping clarifies literal characters and prevents bugs
>* Consistent escaping aids collaboration and future pattern changes
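
When the literal text comes from a variable, hand-escaping is easy to get wrong. One option is `re.escape`, which backslash-escapes the special characters for you. The sketch below uses a made-up file name and sentence to contrast an unescaped pattern with an escaped one.

```python
import re

literal = "file[name].txt"  # Text we want to find exactly as written.
text = "Backup is filea_txt; original is file[name].txt"  # Hypothetical sample.

# Used as-is, the brackets form a character class and the dot is a wildcard,
# so the pattern matches an unintended substring.
risky = re.search(literal, text)

# re.escape() escapes the special characters so every symbol is literal.
safe_pattern = re.escape(literal)
safe = re.search(safe_pattern, text)

print("Unescaped pattern matched:", risky.group())
print("Escaped pattern:", safe_pattern)
print("Escaped pattern matched:", safe.group())
```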



In [None]:
#@title Python Code - Escaping Special Characters

# Demonstrate escaping special regex characters in simple Python examples.
# Show difference between unescaped and escaped dot characters in patterns.
# Help beginners see why literal symbols require backslash escaping.

import re  # Import regular expression module for pattern matching operations.

text = "Build 2x0 preceded version 2.0? Use file[name].txt + backup.txt"  # Example text containing special characters.

pattern_unescaped_dot = r"2.0"  # Unescaped dot acts as wildcard matching any single character.
pattern_escaped_dot = r"2\.0"  # Escaped dot matches literal period character between digits.

match_unescaped = re.search(pattern_unescaped_dot, text)  # Search using unescaped dot wildcard pattern.
match_escaped = re.search(pattern_escaped_dot, text)  # Search using escaped dot literal period pattern.

print("Unescaped dot pattern:", pattern_unescaped_dot, "->", match_unescaped.group())  # Show wildcard result.
print("Escaped dot pattern:", pattern_escaped_dot, "->", match_escaped.group())  # Show literal period result.

pattern_unescaped_brackets = r"file[name]"  # Unescaped brackets define character class inside pattern.
pattern_escaped_brackets = r"file\[name\]"  # Escaped brackets match literal square bracket characters.

match_unescaped_brackets = re.search(pattern_unescaped_brackets, text)  # Search using character class pattern.
match_escaped_brackets = re.search(pattern_escaped_brackets, text)  # Search using literal brackets pattern.

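unescaped_brackets_result = match_unescaped_brackets.group() if match_unescaped_brackets else None  # search() returns None when nothing matches.
print("Unescaped brackets pattern:", pattern_unescaped_brackets, "->", unescaped_brackets_result)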
print("Escaped brackets pattern:", pattern_escaped_brackets, "->", match_escaped_brackets.group())



## **2. Character Classes Basics**

### **2.1. Bracket Character Sets**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_02_01.jpg?v=1766629987" width="250">



>* Bracket sets match one character from choices
>* They simplify patterns for small, predictable variations

>* Inside brackets, most special symbols lose meaning
>* `^`, `-`, `]`, and `\` stay special, affecting negation, ranges, and where the set ends

>* Use brackets to allow or forbid characters
>* Control text validation while keeping patterns readable
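
The following sketch illustrates which symbols keep or lose their special meaning inside brackets. The sample strings are invented for the demonstration.

```python
import re

# Inside brackets, ., +, * and ? are ordinary characters.
punctuation = re.findall(r"[.+*?]", "a.b+c*d?")

# A caret in first position negates the set: match anything except a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# A dash between two characters forms a range; a dash at the end is literal.
letters_or_dash = re.findall(r"[a-c-]", "a-d-b")

print("Literal punctuation:", punctuation)
print("Non-digits:", non_digits)
print("a to c or dash:", letters_or_dash)
```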



In [None]:
#@title Python Code - Bracket Character Sets

# Demonstrate basic bracket character sets with simple Python regex examples.
# Show how one bracket set matches several similar words at once.
# Compare inclusive and exclusive bracket sets using small sample strings.

import re  # Import regular expression module for pattern matching.

words = ["cat", "cot", "cut", "cit", "coat", "cxt"]  # Sample words list; "cxt" gives the negated set something to match.
pattern_any_vowel = re.compile(r"c[aeiou]t")  # Bracket set allows any vowel.

print("Words matching pattern c[aeiou]t:")  # Describe first matching result.
for word in words:  # Loop through each sample word.
    if pattern_any_vowel.fullmatch(word):  # Check full word match.
        print("  ", word)  # Print word that matches pattern.

pattern_only_a_or_u = re.compile(r"c[au]t")  # Bracket set restricts allowed vowels.
print("\nWords matching pattern c[au]t:")  # Describe second matching result.
for word in words:  # Loop again through sample words.
    if pattern_only_a_or_u.fullmatch(word):  # Check restricted vowel match.
        print("  ", word)  # Print word that matches restricted pattern.

pattern_not_vowel = re.compile(r"c[^aeiou]t")  # Caret excludes listed vowels.
print("\nWords matching pattern c[^aeiou]t:")  # Describe exclusive set result.
for word in words:  # Loop through words for exclusive pattern.
    if pattern_not_vowel.fullmatch(word):  # Match any nonvowel middle character.
        print("  ", word)  # Print word that matches exclusive pattern.




### **2.2. Character Range Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_02_02.jpg?v=1766630012" width="250">



>* Use ranges to represent continuous character spans
>* They make regex shorter, clearer, and less error-prone

>* Use ranges to allow flexible, constrained characters
>* Combine ranges for letters, digits, and identifiers

>* Dash position and range order matter greatly
>* Ranges cover basic ASCII; include others explicitly
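
As a hedged illustration of combining ranges, the sketch below checks a simple identifier shape (a letter or underscore, then letters, digits, or underscores). The candidate names are made up. It also shows that a reversed range is rejected when the pattern is compiled.

```python
import re

# Combined ranges: first character is a letter or underscore, the rest may include digits.
identifier = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

candidates = ["total", "_tmp1", "2fast", "snake_case", "caf\u00e9"]  # Hypothetical names.
valid = [name for name in candidates if identifier.fullmatch(name)]
print("Valid identifiers:", valid)  # The accented name fails: ASCII ranges exclude it.

# Range order matters: the start must not come after the end.
reversed_range_error = None
try:
    re.compile(r"[z-a]")
except re.error as exc:
    reversed_range_error = str(exc)
print("Reversed range error:", reversed_range_error)
```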



In [None]:
#@title Python Code - Character Range Patterns

# Demonstrate character range patterns using simple Python regex examples.
# Show how ranges match letters and digits inside bracket sets.
# Compare patterns using ranges with equivalent longer explicit character lists.

import re  # Import regular expression module for pattern matching.

text_samples = ["A7", "g3", "Z9", "mX"]

pattern_range = re.compile(r"^[A-Z][0-9]$")

pattern_explicit = re.compile(r"^[ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789]$")

print("Using range pattern [A-Z][0-9]:")

for sample in text_samples:
    print(sample, "->", bool(pattern_range.match(sample)))

print("\nUsing explicit characters without ranges:")

for sample in text_samples:
    print(sample, "->", bool(pattern_explicit.match(sample)))




### **2.3. Shorthand Character Classes**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_02_03.jpg?v=1766630028" width="250">



>* Shorthand classes compactly represent common character groups
>* They simplify readable patterns for many text tasks

>* Digit, word, whitespace shorthands match common characters
>* Their opposites match non-digit, non-word, non-whitespace

>* Combine shorthands, quantifiers, anchors for powerful patterns
>* Use them carefully to keep matches precise, flexible
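
The sketch below combines a shorthand class with counted repetition to check a date-like shape, then shows one reason to use shorthands carefully: for `str` patterns in Python 3, `\d` also matches non-ASCII decimal digits unless `re.ASCII` is set. The sample values are illustrative assumptions, and the pattern checks shape only, not calendar validity.

```python
import re

# Four digits, dash, two digits, dash, two digits; fullmatch() requires the whole string to fit.
date_shape = re.compile(r"\d{4}-\d{2}-\d{2}")
samples = ["2025-12-24", "2025-1-5", "2025-12-24 "]  # Hypothetical inputs.
results = {s: bool(date_shape.fullmatch(s)) for s in samples}
print(results)

# "\u0664\u0662" is 42 written with Arabic-Indic digits.
mixed = "42 \u0664\u0662"
unicode_digits = re.findall(r"\d+", mixed)
ascii_digits = re.findall(r"\d+", mixed, flags=re.ASCII)
print("Default \\d+:", unicode_digits)
print("With re.ASCII:", ascii_digits)
```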



In [None]:
#@title Python Code - Shorthand Character Classes

# Demonstrate Python regex shorthand character classes with simple examples.
# Show how digits, word characters, and whitespace are matched concisely.
# Compare shorthand classes with equivalent longer bracket expressions.

import re  # Import regular expression module for pattern matching.

text_example = "Order ABC_123 costs 45 dollars on 12 05 2025."  # Example text string.

pattern_digits = r"\d+"  # Shorthand pattern matching one or more digit characters.
pattern_words = r"\w+"  # Shorthand pattern matching one or more word characters.
pattern_spaces = r"\s+"  # Shorthand pattern matching one or more whitespace characters.

pattern_not_digits = r"\D+"  # Shorthand pattern matching one or more non digit characters.
pattern_not_words = r"\W+"  # Shorthand pattern matching one or more non word characters.
pattern_not_spaces = r"\S+"  # Shorthand pattern matching one or more non whitespace characters.

print("Text example string:", text_example)  # Display the example text string.

print("Digits using \\d+:", re.findall(pattern_digits, text_example))  # Show matched digit sequences.
print("Digits using [0-9]+:", re.findall(r"[0-9]+", text_example))  # Same result on this text; \d also matches non-ASCII digits unless re.ASCII is set.

print("Word chunks using \\w+:", re.findall(pattern_words, text_example))  # Show matched word chunks.
print("Whitespace using \\s+:", re.findall(pattern_spaces, text_example))  # Show matched whitespace chunks.

print("Non digits using \\D+:", re.findall(pattern_not_digits, text_example))  # Show non digit chunks.
print("Non words using \\W+:", re.findall(pattern_not_words, text_example))  # Show non word separators.
print("Non spaces using \\S+:", re.findall(pattern_not_spaces, text_example))  # Show non space segments.



## **3. Regex Quantifiers Essentials**

### **3.1. Core Quantifier Usage**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_03_01.jpg?v=1766630060" width="250">



>* Quantifiers control how many times patterns repeat
>* They separate what to match from repetition rules

>* Quantifiers attach to a target and control repeats
>* They fine-tune repetition for characters, classes, groups

>* Quantifiers interact with neighbors to shape matches
>* Their behavior underlies greedy versus lazy matching
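
A quantifier applies only to the element immediately before it. The sketch below contrasts a quantifier on a single character with one on a group, and shows a bounded `{m,n}` repetition. It uses a non-capturing group `(?:...)`, which this lecture does not cover, purely so that `re.findall` returns whole matches. The sample strings are invented.

```python
import re

text = "abbb abab"  # Hypothetical sample.

# + applies to "b" only: one "a" followed by one or more "b".
char_repeat = re.findall(r"ab+", text)

# + applies to the whole group: one or more repetitions of "ab".
group_repeat = re.findall(r"(?:ab)+", text)

# {2,3} means at least two and at most three digits per match.
bounded = re.findall(r"\d{2,3}", "7 42 123 45678")

print("ab+     :", char_repeat)
print("(?:ab)+ :", group_repeat)
print("\\d{2,3} :", bounded)
```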



In [None]:
#@title Python Code - Core Quantifier Usage

# Demonstrate basic regex quantifiers with simple repeated character patterns.
# Show how quantifiers attach to specific target elements in patterns.
# Print matches for different quantifiers using the same base text.

import re  # Import regular expression module for pattern matching.

text = "Sooo good, sooooo tasty, so good."  # Example feedback text with elongated words.

pattern_one_or_more = r"o+"  # Quantifier plus sign means one or more letter o characters.

pattern_zero_or_one = r"o?"  # Quantifier question mark means zero or one letter o character.

pattern_exact_three = r"o{3}"  # Quantifier braces mean exactly three letter o characters.

matches_one_or_more = re.findall(pattern_one_or_more, text)  # Find repeated o sequences.

matches_zero_or_one = re.findall(pattern_zero_or_one, "so so so")  # Show optional o matches.

matches_exact_three = re.findall(pattern_exact_three, text)  # Find each run of three o characters, including inside longer runs.

print("Text:", text)  # Display the original text for reference.

print("One or more o matches:", matches_one_or_more)  # Show plus quantifier results.

print("Zero or one o matches count:", len(matches_zero_or_one))  # Count includes empty matches where no o is present.

print("Exactly three o matches:", matches_exact_three)  # Show exact repetition quantifier results.



### **3.2. Greedy and Lazy Quantifiers**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_03_02.jpg?v=1766630082" width="250">



>* Greedy quantifiers grab as much text as possible
>* Lazy quantifiers take minimal text needed to match

>* Greedy quantifiers grab text up to last bracket
>* Lazy quantifiers stop at earliest valid closing bracket

>* Greedy versus lazy affects sentence and text extraction
>* Choose quantifier type based on desired match size
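
The same contrast can be sketched with quoted strings. The third pattern is a common alternative to a lazy quantifier: a negated character class that cannot cross the closing delimiter. The sample sentence is an illustrative assumption.

```python
import re

text = 'say "hi" and "bye" now'  # Hypothetical sample with two quoted parts.

# Greedy: .* runs to the last quote in the text.
greedy = re.findall(r'".*"', text)

# Lazy: .*? stops at the earliest quote that completes a match.
lazy = re.findall(r'".*?"', text)

# Negated class: [^"]* can never consume a quote, so it stops at the first one.
negated = re.findall(r'"[^"]*"', text)

print("Greedy :", greedy)
print("Lazy   :", lazy)
print("Negated:", negated)
```

On this input the lazy and negated-class patterns return the same two matches.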



In [None]:
#@title Python Code - Greedy and Lazy Quantifiers

# Demonstrate greedy and lazy quantifiers using simple tag like text.
# Show how different patterns capture different text spans between markers.
# Help beginners visualize regex matching behavior using clear printed output.

import re  # Import regular expression module for pattern matching.

text = "<tag>small</tag> and <tag>very long text</tag> here"  # Example text.

greedy_pattern = re.compile(r"<tag>.*</tag>")  # Greedy quantifier pattern example.

lazy_pattern = re.compile(r"<tag>.*?</tag>")  # Lazy quantifier pattern example.

print("Source text string:", text)  # Show the original example text.

print("\nGreedy pattern matches:")  # Announce greedy matching results clearly.
for match in greedy_pattern.findall(text):  # Loop through greedy matches list.
    print("  ", match)  # Print each greedy match result line.

print("\nLazy pattern matches:")  # Announce lazy matching results clearly.
for match in lazy_pattern.findall(text):  # Loop through lazy matches list.
    print("  ", match)  # Print each lazy match result line.




### **3.3. Repetition Pattern Demos**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_B/image_03_03.jpg?v=1766630131" width="250">



>* Log examples show greedy versus non-greedy behavior
>* Greedy overcaptures; non-greedy stops at earliest match

>* Greedy quantifiers can swallow multiple markup tags
>* Non-greedy quantifiers capture only the first tag

>* Greedy quantifiers grab text across multiple parentheses
>* Non-greedy quantifiers capture only the first aside
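
To make the difference in match size visible, the sketch below prints a marker line under the text covered by each pattern. The helper `underline` is a hypothetical name introduced only for this illustration.

```python
import re

def underline(pattern, text):
    """Return a line with ^ under every character covered by some match."""
    marks = [" "] * len(text)
    for m in re.finditer(pattern, text):
        for i in range(m.start(), m.end()):
            marks[i] = "^"
    return "".join(marks)

log_line = "[2025-12-25 10:30] User clicked [settings] button."

greedy_marks = underline(r"\[.+\]", log_line)   # One match from first [ to last ].
lazy_marks = underline(r"\[.+?\]", log_line)    # Two separate bracketed matches.

print(log_line)
print(greedy_marks, "<- greedy")
print(log_line)
print(lazy_marks, "<- lazy")
```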



In [None]:
#@title Python Code - Repetition Pattern Demos

# Demonstrate greedy and lazy repetition with simple bracketed text examples.
# Show how different quantifiers change matched timestamp and aside segments.
# Help beginners visualize smallest versus largest possible repeated text matches.

import re  # Import regular expression module for pattern matching operations.

log_line = "[2025-12-25 10:30] User clicked [settings] button."  # Example log line.
pattern_greedy = r"\[.+\]"  # Greedy pattern grabs largest possible bracketed segment.
pattern_lazy = r"\[.+?\]"  # Lazy pattern grabs smallest possible bracketed segment.

match_greedy = re.search(pattern_greedy, log_line)  # Search using greedy quantifier pattern.
match_lazy = re.search(pattern_lazy, log_line)  # Search using lazy quantifier pattern.

print("Log line text:", log_line)  # Display original log line for reference understanding.
print("Greedy match result:", match_greedy.group())  # Show greedy matched substring segment.
print("Lazy match result:", match_lazy.group())  # Show lazy matched substring segment.

message = "Please review this (short note) and this (longer aside here)."  # Example message.
paren_greedy = r"\(.+\)"  # Greedy pattern spans from first opening to last closing parenthesis.
paren_lazy = r"\(.+?\)"  # Lazy pattern stops at earliest closing parenthesis encountered.

greedy_aside = re.search(paren_greedy, message)  # Apply greedy pattern to message string.
lazy_aside = re.search(paren_lazy, message)  # Apply lazy pattern to message string.

print("Message text:", message)  # Display original message for context understanding.
print("Greedy aside match:", greedy_aside.group())  # Show combined aside using greedy quantifier.
print("Lazy aside match:", lazy_aside.group())  # Show first aside using lazy quantifier.




# <font color="#418FDE" size="6.5" uppercase>**Regex Basics**</font>


In this lecture, you learned to:
- Explain how literal characters and metacharacters form the core of regex patterns. 
- Construct simple patterns using character classes and predefined shorthand classes in Python. 
- Differentiate between greedy and non-greedy quantifiers in simple matching scenarios. 

In the next Lecture (Lecture C), we will go over 'Using Python `re`'.