### `r''` (Raw String in Python)
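- In Python string literals, a backslash starts an escape sequence such as `\n` (newline) or `\t` (tab).
- Regex patterns use many backslashes of their own (`\d`, `\w`, etc.).
- To keep Python from interpreting those backslashes itself, we prefix the literal with `r` (e.g. `r'\d'`). This makes it a raw string, so the backslashes reach the regex engine exactly as written.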
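A quick check of the difference, using made-up example strings. It also borrows `\b` (a regex word boundary, not otherwise covered in this section), because `\b` is a case where forgetting the `r` silently breaks a pattern:

```python
import re

# len() counts characters: '\n' is ONE newline character,
# while r'\n' keeps TWO characters: a backslash and the letter n.
print(len('\n'), len(r'\n'))  # 1 2

# In a normal string, '\b' becomes a backspace character (\x08),
# so this regex looks for literal backspaces and finds nothing.
plain = re.findall('\bcat\b', 'cat scat cat')

# In a raw string, backslash + b reaches the regex engine: a word boundary.
raw = re.findall(r'\bcat\b', 'cat scat cat')

print(plain)  # []
print(raw)    # ['cat', 'cat']
```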

Before we can use regular expressions, we must import Python's built-in `re` module.

In [20]:
import re

### `\d` (Digit)
- Matches a single digit (0-9). For `str` patterns, Python's `\d` also matches other Unicode decimal digits.

In [21]:
# using re.findall(pattern, string) --> scans the entire string and returns a list of all matches of the pattern
pattern = r'\d'
my_string = 'The number representing "I love you" is 143'
re.findall(pattern, my_string)

['1', '4', '3']
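Because `\d` matches one digit at a time, a multi-digit number comes back split up. A small sketch on an invented string shows the difference when `\d` is combined with the `+` quantifier (introduced later in this notebook), and how `re.search` returns only the first match:

```python
import re

text = 'Room 101 opens at 9'

# \d matches exactly one digit per match
digits = re.findall(r'\d', text)

# \d+ (one or more digits) keeps each number whole
numbers = re.findall(r'\d+', text)

# re.search stops at the FIRST match and returns a Match object (or None)
first = re.search(r'\d', text)

print(digits)         # ['1', '0', '1', '9']
print(numbers)        # ['101', '9']
print(first.group())  # 1
```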

### `\w` (Word Character)
- Matches one word character: a letter (a-z, A-Z), a digit (0-9), or the underscore `_`. For `str` patterns, Unicode letters such as `é` also count.

In [22]:
pattern = r'\w'
my_string = 'Hello_456o_zen'
re.findall(pattern, my_string)

['H', 'e', 'l', 'l', 'o', '_', '4', '5', '6', 'o', '_', 'z', 'e', 'n']
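Matching one character at a time is rarely what you want for text. A minimal sketch (the sentence is invented) that repeats `\w` with `+` to pull out whole words, with punctuation and spaces acting as separators:

```python
import re

sentence = 'Hello, world_1! Regex is fun.'

# \w+ grabs each run of word characters; anything else ends the run
words = re.findall(r'\w+', sentence)

print(words)  # ['Hello', 'world_1', 'Regex', 'is', 'fun']
```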

### `[]` (Character Set)
- Matches any one character inside the brackets.

In [23]:
# match any lowercase vowel (the capital 'A' in 'An' is not matched)
pattern = r'[aeiou]'
my_string = 'An apple a day, makes 7 apples a week.'
re.findall(pattern, my_string)

['a', 'e', 'a', 'a', 'a', 'e', 'a', 'e', 'a', 'e', 'e']

In [24]:
# match a digit
pattern = r'[0-9]'
my_string = 'I have 3 apples. I gave 1 to Anna. How many are left?'
re.findall(pattern, my_string)

['3', '1']

In [25]:
# a ^ right after [ negates the set: match any character that is NOT a digit
pattern = r'[^0-9]'
my_string = '123456W7890'
re.findall(pattern, my_string)

['W']
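A few more character-set variations, sketched on invented strings: a range on its own, listing both cases to catch capitals, and negating a set that mixes letters with a space:

```python
import re

# A range inside the set: any uppercase letter
caps = re.findall(r'[A-Z]', 'Hello World')

# Listing both cases catches the capital 'A' that [aeiou] skips
vowels = re.findall(r'[aeiouAEIOU]', 'An apple')

# ^ as the first character negates the set: neither a vowel nor a space
others = re.findall(r'[^aeiou ]', 'an apple')

print(caps)    # ['H', 'W']
print(vowels)  # ['A', 'a', 'e']
print(others)  # ['n', 'p', 'p', 'l']
```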

### `{}` (Quantifier - how many times)
- Specifies how many times the preceding item must repeat
- {n} - exactly n times
- {n,} - at least n times
- {n,m} - between n and m times (inclusive)

In [26]:
# findall returns non-overlapping runs of exactly 3 digits; the leftover '4' is dropped
pattern = r'\d{3}'
my_string = 'My phone number is 4035551234.'
re.findall(pattern, my_string)

['403', '555', '123']
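The other two quantifier forms from the list above, sketched on an invented string. Quantifiers are greedy, so `{2,4}` takes as many digits as it can. Pairing `{10}` with `re.fullmatch` gives a simple "exactly 10 digits" check, which assumes a phone number is written with no separators:

```python
import re

text = '1 12 123 12345'

# {2,4}: between 2 and 4 digits, as many as possible
two_to_four = re.findall(r'\d{2,4}', text)

# {3,}: at least 3 digits, no upper limit
three_plus = re.findall(r'\d{3,}', text)

# {10} with fullmatch: the WHOLE string must be exactly 10 digits
is_ten_digits = re.fullmatch(r'\d{10}', '4035551234') is not None

print(two_to_four)    # ['12', '123', '1234']
print(three_plus)     # ['123', '12345']
print(is_ten_digits)  # True
```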

### `\s` → whitespace (space, tab, newline)
- Matches a single whitespace character: a space, tab, newline, carriage return, form feed, or vertical tab.

In [27]:
pattern = r'\s'
my_string = 'Hello   World\nof Python!'
re.findall(pattern, my_string)

[' ', ' ', ' ', '\n', ' ']
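A common use of `\s` is splitting text on whitespace of any kind and length. A minimal sketch with `re.split`, reusing the same example text:

```python
import re

text = 'Hello   World\nof Python!'

# \s+ treats any run of spaces, tabs, or newlines as a single separator
parts = re.split(r'\s+', text)

print(parts)  # ['Hello', 'World', 'of', 'Python!']
```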

### `.` → any character except newline
- Matches any single character except a newline (`\n`).

In [28]:
pattern = r'.'
my_string = 'cat, dog'
re.findall(pattern, my_string)

['c', 'a', 't', ',', ' ', 'd', 'o', 'g']
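The newline exception is easy to miss. This sketch (invented strings) shows `.` skipping `\n` by default, the `re.DOTALL` flag lifting that restriction, and `\.` matching a literal period:

```python
import re

text = 'abc a\nc a c'

# By default . does not match '\n', so 'a\nc' is skipped
default = re.findall(r'a.c', text)

# re.DOTALL lets . match '\n' as well
dotall = re.findall(r'a.c', text, flags=re.DOTALL)

# A backslash turns . back into a literal period
periods = re.findall(r'\.', 'v1.2.3')

print(default)  # ['abc', 'a c']
print(dotall)   # ['abc', 'a\nc', 'a c']
print(periods)  # ['.', '.']
```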

### `+` → 1 or more times
- Matches the preceding item one or more times, so it must appear at least once.

In [29]:
pattern = r'o+'
my_string = 'Hellllloooo Woorld, yahoo!'
re.findall(pattern, my_string)

['oooo', 'oo', 'oo']

### `*` → 0 or more times
- Matches the preceding item zero or more times, so it may be missing entirely.

In [30]:
# reuses my_string from the previous cell
pattern = r'lo*'
re.findall(pattern, my_string)

['l', 'l', 'l', 'l', 'loooo', 'l']
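Putting `+` and `*` side by side on the same invented string makes the difference concrete: only `*` accepts an `a` with no `b` after it:

```python
import re

text = 'a ab abbb'

plus = re.findall(r'ab+', text)  # needs at least one b
star = re.findall(r'ab*', text)  # b may appear zero times, so a bare 'a' counts

print(plus)  # ['ab', 'abbb']
print(star)  # ['a', 'ab', 'abbb']
```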

### `?` → 0 or 1 time (optional)
- Means the character or group is optional.

In [31]:
pattern = r'colou?r'
my_string = 'My favorite color is blue. Her favorite colour is green.'
re.findall(pattern, my_string)

['color', 'colour']
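Two more sketches of `?`, on invented data: an optional hyphen inside an otherwise fixed-length number, and `?` after a parenthesized group, which makes the whole group optional:

```python
import re

text = '555-1234 5551234 55-1234'

# -? makes the hyphen optional; the digit counts are still fixed
phones = re.findall(r'\d{3}-?\d{4}', text)

# ? after a group makes the entire group optional
months = [bool(re.fullmatch(r'Jan(uary)?', m)) for m in ['Jan', 'January', 'Janu']]

print(phones)  # ['555-1234', '5551234']
print(months)  # [True, True, False]
```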

## Mini Expression Parser
---
### Rule 1: Variable names
- A variable name may contain only letters and underscores (no digits or other special characters).
- The left-hand side must be exactly one variable.
- The right-hand side must be exactly two variables joined by `+`.

Pattern: `^[A-Za-z_]+$`
1. `^`
- Anchors the match to the start of the string.
- Ensures the match begins at the very first character.

2. `[A-Za-z_]`

- A character set.
- Matches exactly one character, which must be one of:
  - A-Z → uppercase letters
  - a-z → lowercase letters
  - _ → underscore
- So `[A-Za-z_]` means: “one letter or underscore.”

3. `+`
- A quantifier: one or more times.
- So `[A-Za-z_]+` means: “one or more letters/underscores.”

Examples:

- "x" (allowed)
- "variableName" (allowed)
- "_" (allowed)
- "x1" (not allowed: the digit `1` is outside the set)

4. `$`
- Anchors the match to the end of the string.
- Ensures the match runs all the way to the last character, so no extra characters are allowed.

Example (`re.fullmatch` requires the whole string to match, so `^` and `$` can be left out):

`re.fullmatch(r"[A-Za-z_]+", "hello")`  # match

`re.fullmatch(r"[A-Za-z_]+", "hello1")` # no match
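A runnable version of the Rule 1 check, testing the example names above. It also shows why the anchors matter (without them, `re.search` accepts the valid prefix of `"x1"`) and one Python quirk: `$` also matches just before a final `\n`, while `re.fullmatch` does not. The extra test strings `''` and `'abc\n'` are illustrative additions:

```python
import re

VARIABLE = r'^[A-Za-z_]+$'
names = ['x', 'variableName', '_', 'x1', '']
results = {name: bool(re.match(VARIABLE, name)) for name in names}
print(results)
# {'x': True, 'variableName': True, '_': True, 'x1': False, '': False}

# Without the anchors, re.search happily returns the valid prefix of 'x1'
partial = re.search(r'[A-Za-z_]+', 'x1').group()
print(partial)  # x

# Quirk: Python's $ also matches just before a final '\n'; fullmatch does not
dollar_ok = bool(re.match(VARIABLE, 'abc\n'))
fullmatch_ok = bool(re.fullmatch(r'[A-Za-z_]+', 'abc\n'))
print(dollar_ok, fullmatch_ok)  # True False
```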

### Rule 2: Assignment operator (=)
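- There must be exactly one equals sign.

Pattern: `=` (on its own this only confirms that an `=` is present; the full validation regex at the end allows exactly one)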
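One way to enforce "exactly one" with the bare `=` pattern is to count its matches. This is a sketch; `has_one_assignment` is a hypothetical helper, not part of the original notebook:

```python
import re

def has_one_assignment(statement):
    """Hypothetical helper: True if the statement contains exactly one '='."""
    return len(re.findall(r'=', statement)) == 1

checks = {s: has_one_assignment(s) for s in ['y = x + b;', 'y x + b;', 'y == x;']}
print(checks)  # {'y = x + b;': True, 'y x + b;': False, 'y == x;': False}
```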

### Rule 3: Expression (right-hand side, only + allowed)
- Must be `variable + variable`.

Examples:

`"x+b"` → valid

`"x-b"` → invalid (- sign not allowed)

`"x+1"` → invalid (`1` is a digit, not a variable name)


Pattern: `^[A-Za-z_]+\s*\+\s*[A-Za-z_]+$`

1. The `\s*`

- `\s` = any whitespace character (space, tab, newline).

- `*` = zero or more times.

So `\s*` means: “allow any number of whitespace characters (including none).”

Examples of `\s*`:

`print(bool(re.match(r"x\s*=\s*y", "x=y")))`     # True (no spaces)

`print(bool(re.match(r"x\s*=\s*y", "x = y")))`   # True (one space)

`print(bool(re.match(r"x\s*=\s*y", "x     =y")))` # True (many spaces)

2. `[A-Za-z_]+` = one or more letters or underscores

- So this matches a variable name such as `x`, `abc`, or `_var`.

3. `\+`
- Here the + is escaped with a backslash → `\+`
- Means literal plus sign (+) instead of the regex quantifier.
- Ensures the expression contains a plus operator.
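The Rule 3 pattern can be checked against the right-hand sides from the examples above. As a side note, `re.escape` produces the escaped form of a special character, which is handy when a symbol comes from a variable instead of being typed by hand:

```python
import re

EXPRESSION = r'^[A-Za-z_]+\s*\+\s*[A-Za-z_]+$'
right_sides = ['x+b', 'x + b', 'x-b', 'x+1']
verdicts = {rhs: bool(re.match(EXPRESSION, rhs)) for rhs in right_sides}
print(verdicts)  # {'x+b': True, 'x + b': True, 'x-b': False, 'x+1': False}

# re.escape adds the backslash for us: '+' becomes '\+'
escaped = re.escape('+')
print(escaped)  # \+
```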

### Rule 4: Termination by semicolon
- Must end with `;`

Pattern: `;$`
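A pattern anchored at the end needs `re.search`, which scans the whole string. `re.match` only tries position 0, so the same pattern fails there. A small sketch:

```python
import re

TERMINATION = r';$'

# re.search scans the string, so it finds ';' at the very end
ends_ok = bool(re.search(TERMINATION, 'y = x + b;'))
missing = bool(re.search(TERMINATION, 'y = x + b'))

# re.match only tries position 0, where ';' cannot be both first and last here
via_match = bool(re.match(TERMINATION, 'y = x + b;'))

print(ends_ok, missing, via_match)  # True False False
```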

### PUTTING IT ALL TOGETHER
- Full validation regex (the parentheses capture the three variable names): `^([A-Za-z_]+)\s*=\s*([A-Za-z_]+)\s*\+\s*([A-Za-z_]+);$`

In [32]:
rules = {
    "variable": r"^[A-Za-z_]+$",
    "assignment": r"=",
    "expression": r"^[A-Za-z_]+\s*\+\s*[A-Za-z_]+$",
    "termination": r";$",
    "full": r"^([A-Za-z_]+)\s*=\s*([A-Za-z_]+)\s*\+\s*([A-Za-z_]+);$"
}

In [33]:
tests = [
    "y = x + b;",
    "y = x - b;",
    "1y = x + b;",
    "y = x + 1;",
    "y = x + b"
]

In [39]:
for test in tests:
    if re.fullmatch(rules["full"], test):
        print(f"{test} is a valid expression.")
    elif rules["assignment"] not in test:
        # Check for "=" first: every check below splits the statement around it.
        print(f"{test} does not include an assignment operator (=).")
    else:
        lhs = test.split("=")[0].strip()
        # .+ means any character, at least once.
        # \s*=\s*(.+) = "skip optional spaces, match an equal sign, skip optional
        # spaces again, then capture the rest of the expression."
        m = re.search(r"\s*=\s*(.+)", test)
        rhs = m.group(1).strip("; ") if m else ""  # drop trailing spaces and ';'
        if not re.match(rules["variable"], lhs):
            print(f"{lhs} is not a valid variable name.")
        elif not re.match(rules["expression"], rhs):
            print(f"{test} is an invalid expression.")
        elif not re.search(rules["termination"], test):
            print(f"{test} is not properly terminated.")
        else:
            print(f"{test} has an unknown error.")

y = x + b; is a valid expression.
y = x - b; is an invalid expression.
1y is not a valid variable name.
y = x + 1; is an invalid expression.
y = x + b is not properly terminated.
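Since the full regex wraps each variable name in parentheses, a successful match also extracts the three names. A sketch of that idea; `parse_assignment` is a hypothetical helper, not part of the original notebook:

```python
import re

FULL = r'^([A-Za-z_]+)\s*=\s*([A-Za-z_]+)\s*\+\s*([A-Za-z_]+);$'

def parse_assignment(statement):
    """Hypothetical helper: return (target, left, right), or None if invalid."""
    match = re.fullmatch(FULL, statement)
    # groups() returns the text captured by each pair of parentheses, in order
    return match.groups() if match else None

print(parse_assignment('y = x + b;'))  # ('y', 'x', 'b')
print(parse_assignment('y = x - b;'))  # None
```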
