# Regex Quest 🧩🔥
Welcome to a gamified journey through regular expressions. Advance through levels, earn points, build streaks, unlock boss fights, and master pattern-matching skills.

## How To Play
- Run each Level cell in order.
- Write your regex or code in the Attempt cell.
- Check the inline Solution cell (after you try first!).
- Call `complete_level(level_id, used_hint=False, success=True)` to bank points.
- Use `get_hint(level_id)` if stuck (penalty applied).
- Streak grows with consecutive successes (no hints, no failures) → higher multipliers.

## Scoring System
- Base points per level: rise with difficulty, from 50 for Level 1 up to 500 for the Tournament.
- Streak multiplier: `mult = 1 + (streak * 0.10)`.
- Hint penalty: reduces awarded points by tier (early 5%, mid 15%, late/boss 25%).
- A failure, or a success that used a hint, resets the streak to 0.
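
The scoring rules above can be sketched in a few lines. This is only an illustration: `award_points` is a hypothetical helper, not one of the notebook's utilities, and it neither reads nor writes `regex_score.json`.

```python
import math

def award_points(base: int, streak: int, hint_penalty: float = 0.0) -> int:
    """Hypothetical helper mirroring the scoring rules listed above."""
    mult = 1 + (streak * 0.10)                  # streak multiplier
    points = base * mult * (1 - hint_penalty)   # apply the hint penalty, if any
    return math.floor(points)                   # awarded points are rounded down

# Level 1 (base 50), no streak yet, no hint
print(award_points(50, 0))          # 50
# Level 10 (base 140) with a streak of 5, no hint
print(award_points(140, 5))         # 210
# Same level and streak, but with the late-tier 25% hint penalty
print(award_points(140, 5, 0.25))   # 157
```

With a streak of 5 the multiplier is 1.5, so the hint costs 53 points on that one level, and the streak is lost as well.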

## Files
- Persistent score stored in `regex_score.json`.

## Goal
Finish all levels + Boss + Tournament with maximum score and zero hints.

---
Run the next cell to load scoring utilities.

In [2]:
# Scoring & Persistence Utilities
import json, time, math, os, re
from typing import Dict, Any
SCORE_PATH = 'regex_score.json'

BASE_POINTS = {
    # Levels 1-12
    1: 50, 2: 60, 3: 70, 4: 80,
    5: 90, 6: 100, 7: 110, 8: 120,
    9: 130, 10: 140, 11: 160, 12: 180,
    # Special IDs
    100: 250,   # Mini Boss
    200: 350,   # Boss
    300: 500    # Tournament
}

HINT_PENALTY_TIERS = {
    'early': 0.05,  # Levels 1-4
    'mid': 0.15,    # Levels 5-8
    'late': 0.25    # Levels 9-12 + bosses
}

def _hint_tier(level_id:int)->str:
    if level_id <= 4: return 'early'
    if level_id <= 8: return 'mid'
    return 'late'

def load_score()->Dict[str,Any]:
    if not os.path.exists(SCORE_PATH):
        return {'user':'default','score':0,'streak':0,'history':[]}
    with open(SCORE_PATH,'r',encoding='utf-8') as f:
        return json.load(f)

def save_score(data:Dict[str,Any])->None:
    with open(SCORE_PATH,'w',encoding='utf-8') as f:
        json.dump(data,f,indent=2)

def current_multiplier(streak:int)->float:
    return 1 + (streak * 0.10)

def complete_level(level_id:int, used_hint:bool=False, success:bool=True):
    data = load_score()
    if success:
        base = BASE_POINTS.get(level_id, 50)
        mult = current_multiplier(data['streak'])
        points = base * mult
        if used_hint:
            tier = _hint_tier(level_id)
            penalty = HINT_PENALTY_TIERS[tier]
            points *= (1 - penalty)
        awarded = math.floor(points)
        data['score'] += awarded
        data['streak'] = 0 if used_hint else data['streak'] + 1  # using a hint resets the streak
        data['history'].append({
            'ts': time.time(),
            'level': level_id,
            'base': base,
            'multiplier': round(mult,2),
            'used_hint': used_hint,
            'awarded': awarded
        })
        save_score(data)
        print(f"Level {level_id} COMPLETE → +{awarded} points (streak={data['streak']})")
    else:
        data['streak'] = 0
        data['history'].append({'ts': time.time(),'level':level_id,'failure':True})
        save_score(data)
        print(f"Level {level_id} FAILED → streak reset.")

def show_status():
    d = load_score()
    print(f"User: {d['user']} | Score: {d['score']} | Streak: {d['streak']}")
    print(f"Completed: {[h['level'] for h in d['history'] if 'awarded' in h]}")

_HINTS = {}

def register_hint(level_id:int, text:str):
    _HINTS[level_id] = text.strip()

def get_hint(level_id:int):
    if level_id not in _HINTS:
        print('No hint registered for this level yet.')
        return None
    print(f"HINT (penalty applies on completion): {_HINTS[level_id]}")
    return _HINTS[level_id]

show_status()

User: default | Score: 0 | Streak: 0
Completed: []


# Lookarounds Explained (Zero-Width Assertions)
Lookarounds let you assert **context** around a match *without consuming it*.
They match a **position**, not characters.

| Type | Syntax | Meaning | Example |
|------|--------|---------|---------|
| Positive Lookahead | `X(?=Y)` | Match `X` only if followed by `Y` | `\d+(?= USD)` finds numbers before " USD" |
| Negative Lookahead | `X(?!Y)` | Match `X` only if *not* followed by `Y` | `foo(?!bar)` matches `foo` not followed by `bar` |
| Positive Lookbehind | `(?<=Y)X` | Match `X` only if preceded by `Y` | `(?<=ID:)\d+` grabs digits after `ID:` |
| Negative Lookbehind | `(?<!Y)X` | Match `X` only if *not* preceded by `Y` | `(?<!-)\b\d+\b` numbers not after dash |

Use cases:
- Select digits only after a label.
- Exclude words in certain contexts without capturing extra text.
- Prevent over-greedy matches.
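
As a quick check of the table above, the four lookaround types can be tried on small strings. The sample strings here are made up for illustration; only the patterns come from the table.

```python
import re

# Positive lookahead: digits only when followed by " USD"
usd = re.findall(r"\d+(?= USD)", "Paid 30 USD and 45 EUR")
print(usd)        # ['30']

# Negative lookahead: "foo" only when NOT followed by "bar"
foo = re.findall(r"foo(?!bar)", "foobar foobaz")
print(foo)        # ['foo']  (the one inside "foobaz")

# Positive lookbehind: digits only when preceded by "ID:"
ids = re.findall(r"(?<=ID:)\d+", "ID:2025 REF:77")
print(ids)        # ['2025']

# Negative lookbehind: whole numbers NOT preceded by a dash
no_dash = re.findall(r"(?<!-)\b\d+\b", "10 -20 30")
print(no_dash)    # ['10', '30']
```

In every case the asserted context (` USD`, `ID:`, the dash) is absent from the result, because lookarounds check a position without consuming characters.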

Run the next cells for levels. Attempt first, then view solution.


In [None]:
# Level 1: Literal & Raw Strings (ID=1)
import re
level_id = 1
sample_text = "Contact: Mukesh, Python learner."
# Task: Match the exact word Mukesh.
# Write your pattern in variable `pat` then run re.search(pat, sample_text)
pat = r"Mukesh"  # TODO: modify if experimenting
match = re.search(pat, sample_text)
print(match.group() if match else 'No match')
register_hint(level_id, 'Use a simple literal; raw string not required here.')

In [None]:
# Level 1 Solution
# Explanation: Direct literal match.
complete_level(1, used_hint=False, success=True)

In [None]:
# Level 2: Character Classes & Negation (ID=2)
import re
level_id = 2
sample_text = "User42 scored 99 points; ID:2025; Flag:X"
# Task: 1) Find all digit sequences. 2) Find all uppercase letters excluding 'X'.
# Fill patterns pat_digits, pat_upper_no_x.
pat_digits = r"\d+"  # TODO: refine if needed
pat_upper_no_x = r"[A-WY-Z]"  # excludes X via range split
print('Digits:', re.findall(pat_digits, sample_text))
print('Upper (no X):', re.findall(pat_upper_no_x, sample_text))
register_hint(level_id, 'Split the uppercase range around X: [A-WY-Z]. A bare [^X] would also match digits and punctuation.')

In [None]:
# Level 2 Solution
# Explanation: \d+ grabs consecutive digits; ranges avoid X.
complete_level(2, used_hint=False, success=True)

In [None]:
# Level 3: Shorthand Classes (ID=3)
import re
level_id = 3
sample_text = "Name:Mukesh\tEmail:mukesh@example.com\nLangs: Python, Java, C++"
# Task: Extract all 'words' (alphanumeric + underscore) and all whitespace segments.
pat_words = r"\b\w+\b"  # TODO adjust
pat_spaces = r"\s+"       # captures spaces/tabs/newlines
words = re.findall(pat_words, sample_text)
spaces = re.findall(pat_spaces, sample_text)
print('Words count:', len(words), '\nFirst 8:', words[:8])
print('Whitespace segments:', spaces)
register_hint(level_id, r'Use \w for word chars, \s for whitespace; anchor with \b.')

In [None]:
# Level 3 Solution
# Explanation: \b\w+\b ensures word boundaries; \s+ groups whitespace.
complete_level(3, used_hint=False, success=True)

In [None]:
# Level 4: Quantifiers (ID=4)
import re
level_id = 4
sample_text = "Phones: 903-366-9661, 800-123-0000 alt 44-22-11"
# Task: Match US-style phone numbers ###-###-#### only.
pat_phone = r"\b\d{3}-\d{3}-\d{4}\b"  # TODO adjust or extend
phones = re.findall(pat_phone, sample_text)
print('Phones found:', phones)
register_hint(level_id, r'Use {3} and {4} quantifiers with \d; anchor with \b.')

In [None]:
# Level 4 Solution
# Explanation: {n} specifies exact counts; boundaries prevent partial matches.
complete_level(4, used_hint=False, success=True)

In [None]:
# Level 5: Anchors (ID=5)
import re
level_id = 5
sample_text = "START task A\nMiddle line\nEND42"
# Task: 1) Match 'START' only at line start. 2) Capture trailing digits at end of last line.
pat_start = r"^START"  # with MULTILINE
pat_end_digits = r"\d+$"  # end digits
start_match = re.findall(pat_start, sample_text, flags=re.MULTILINE)
end_digits = re.findall(pat_end_digits, sample_text, flags=re.MULTILINE)
print('Line-start:', start_match)
print('End digits:', end_digits)
register_hint(level_id, 'Use ^ and $ with re.MULTILINE to anchor per line.')

In [None]:
# Level 5 Solution
# Explanation: ^ and $ match start/end of line with MULTILINE flag.
complete_level(5, used_hint=False, success=True)

In [None]:
# Level 6: Groups & Alternation (ID=6)
import re
level_id = 6
sample_text = "Skills: Python, Java, C++, C, Rust, Python, Java"
# Task: Capture only (Python|Java|C\+\+|C) occurrences and count frequency.
# A trailing \b cannot match after '+', so a lookahead ends the match; otherwise 'C++' is counted as 'C'.
pat_langs = r"\b(Python|Java|C\+\+|C)(?![\w+])"  # TODO extend
langs = re.findall(pat_langs, sample_text)
from collections import Counter
print('Matches:', langs)
print('Frequency:', Counter(langs))
register_hint(level_id, 'Use grouping ( ... ) with alternation | ; escape + in C++.')

In [None]:
# Level 6 Solution
# Explanation: Group with alternation; \b anchors the start, and (?![\w+]) ends the match because \b fails after '+'.
complete_level(6, used_hint=False, success=True)

In [None]:
# Level 7: Substitution & Backreferences (ID=7)
import re
level_id = 7
sample_text = "Error: file file.txt not found. Path path/config missing."  
# Task: Remove immediate duplicated words (case-sensitive) like 'file file'.
# Pattern idea: (\b\w+\b) \1
pat_dupe = r"\b(\w+) \1\b"  # TODO refine
cleaned = re.sub(pat_dupe, r"\1", sample_text)
print('Original:', sample_text)
print('Cleaned :', cleaned)
register_hint(level_id, 'Capture a word then refer with \\1; use word boundaries.')

In [3]:
# Level 7 Solution
# Explanation: (\w+) captures a word; \1 reuses it; substitute single instance.
complete_level(7, used_hint=False, success=True)

Level 7 COMPLETE → +110 points (streak=1)


In [None]:
# Level 8: Greedy vs Non-Greedy (ID=8)
import re
level_id = 8
sample_text = "<tag>One</tag><tag>Two</tag><tag>Three</tag>"
# Task: Extract each tag content separately using a non-greedy pattern.
pat_greedy = r"<tag>.*</tag>"           # Greedy (will swallow everything)
pat_nongreedy = r"<tag>.*?</tag>"       # Non-greedy
print('Greedy  :', re.findall(pat_greedy, sample_text))
print('NonGreedy:', re.findall(pat_nongreedy, sample_text))
contents = [m[len('<tag>'):-len('</tag>')] for m in re.findall(pat_nongreedy, sample_text)]
print('Contents:', contents)
register_hint(level_id, 'Use .*? to stop at earliest closing tag.')

In [None]:
# Level 8 Solution
# Explanation: .*? minimal expansion prevents spanning multiple tags.
complete_level(8, used_hint=False, success=True)

In [None]:
# Level 9: Compile & Flags (ID=9)
import re
level_id = 9
sample_text = "Emails: Mukesh@Example.com ALICE@test.org bob.DATA@demo.IO"
# Task: Case-insensitively extract all emails and normalize domain to lowercase.
email_pat = re.compile(r"\b[\w.]+@[\w.]+\b", re.IGNORECASE)
emails = email_pat.findall(sample_text)
normalized = [local+'@'+domain.lower() for local,domain in [e.split('@') for e in emails]]
print('Found:', emails)
print('Normalized:', normalized)
register_hint(level_id, 'Use re.compile with IGNORECASE; split at @ to rebuild.')

In [None]:
# Level 9 Solution
# Explanation: A compiled pattern can be reused; \w already matches both cases, so IGNORECASE only matters if literal letters are added to the pattern.
complete_level(9, used_hint=False, success=True)

In [None]:
# Level 10: findall vs finditer (ID=10)
import re
level_id = 10
sample_text = "Phones: 903-366-9661, 800-123-0000, 999-111-2222"
pat_phone = r"\b\d{3}-\d{3}-\d{4}\b"
all_list = re.findall(pat_phone, sample_text)
iter_spans = [(m.group(), m.start(), m.end()) for m in re.finditer(pat_phone, sample_text)]
print('findall list:', all_list)
print('finditer spans:', iter_spans)
register_hint(level_id, 'Use finditer for positions; findall for just matched strings.')

In [None]:
# Level 10 Solution
# Explanation: finditer exposes positional metadata for advanced processing.
complete_level(10, used_hint=False, success=True)

In [None]:
# Level 11: Lookarounds (ID=11)
import re
level_id = 11
sample_text = "Contact: 903-366-9661 TEMP 800-123-0000 Contact: 777-888-9999"
# Task: Extract phone numbers that appear after 'Contact:' but NOT followed by ' TEMP'.
# Use positive lookbehind for 'Contact: ' and negative lookahead for ' TEMP'.
pat_select = r"(?<=Contact: )\d{3}-\d{3}-\d{4}(?! TEMP)"
selected = re.findall(pat_select, sample_text)
print('Selected phones:', selected)
register_hint(level_id, 'Combine (?<=Contact: ) with (?! TEMP) after the number.')

In [None]:
# Level 11 Solution
# Explanation: Lookbehind asserts prefix; negative lookahead excludes TEMP.
complete_level(11, used_hint=False, success=True)

In [None]:
# Level 12: Negative Lookahead / Filtering (ID=12)
import re
level_id = 12
sample_text = "test_user@example.com real.user@demo.org prod.admin@main.io test123@demo.org"
# Task: Extract emails NOT starting with 'test' (exclude those where local begins with 'test').
# Use negative lookahead or lookbehind anchored at start of local part.
pat_emails = r"\b(?!test)[A-Za-z0-9._]+@[A-Za-z0-9._]+\b"
filtered = re.findall(pat_emails, sample_text)
print('Filtered emails:', filtered)
register_hint(level_id, 'Apply (?!test) just after word boundary before local part.')

In [None]:
# Level 12 Solution
# Explanation: (?!test) at start prevents matching locals beginning with 'test'.
complete_level(12, used_hint=False, success=True)

In [None]:
# Mini Boss: Phone Normalizer (ID=100)
import re
level_id = 100
sample_text = "+1 903-366-9661; (903) 366-9661; 903.366.9661; 903-366-9661"
# Task: Extract all variants and normalize to 903-366-9661 format (drop country code).
pat_variants = r"(?:(?:\+1\s)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4})"
raw_numbers = re.findall(pat_variants, sample_text)
normalized = []
for n in raw_numbers:
    digits = re.sub(r"\D", "", n)
    normalized.append(f"{digits[-10:-7]}-{digits[-7:-4]}-{digits[-4:]}")
print('Raw:', raw_numbers)
print('Normalized:', normalized)
register_hint(level_id, 'Capture digits then reformat with slicing; ignore country code.')

In [None]:
# Mini Boss Solution
# Explanation: Flexible pattern variants; digits stripped and sliced into groups.
complete_level(100, used_hint=False, success=True)

In [None]:
# Boss: Contact Directory Parser (ID=200)
import re, json
level_id = 200
sample_text = """Name: Mukesh\nEmail: mukesh@example.com\nPhone: 903-366-9661\n---\nName: Alice A.\nEmail: ALICE@test.org\nPhone: (800) 123-0000\n---\nName: Bob\nEmail: bob.DATA@demo.IO\nPhone: +1 777-888-9999"""
# Task: Produce list of dicts {name,email,phone_normalized}
block_pat = re.compile(r"Name: (?P<name>.+?)\nEmail: (?P<email>.+?)\nPhone: (?P<phone>.+?)(?:\n---|$)", re.DOTALL)
records = []
for m in block_pat.finditer(sample_text):
    d = m.groupdict()
    digits = re.sub(r"\D", "", d['phone'])
    norm_phone = f"{digits[-10:-7]}-{digits[-7:-4]}-{digits[-4:]}"
    local, domain = d['email'].split('@')
    d['email'] = local + '@' + domain.lower()
    d['phone'] = norm_phone
    records.append(d)
print('Parsed Records:', records)
register_hint(level_id, 'Use DOTALL to span lines; named groups for clarity.')

In [None]:
# Boss Solution
# Explanation: Named groups + DOTALL; normalization of phone and email domain.
complete_level(200, used_hint=False, success=True)

In [None]:
# Tournament: Comprehensive Cleanup (ID=300)
import re, json
level_id = 300
sample_text = """### CONTACT DUMP ###\nContact: Mukesh <mukesh@Example.COM> Phone: 903.366.9661 NOTE DUP\nContact: Mukesh <mukesh@Example.COM> Phone: (903) 366-9661\nContact: Alice <ALICE@test.org> Phone: +1 800-123-0000 TEMP\nContact: Bob <bob.DATA@demo.IO> Phone: 777-888-9999\nFooter: Generated 2025"""
# Goals:
# 1) Extract unique contacts (ignore duplicates by email + phone normalized).
# 2) Exclude lines with 'TEMP'.
# 3) Normalize phone to ###-###-####, email domain lowercased.
# 4) Output JSON list.
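# The phone group accepts spaces inside the number, so '(903) 366-9661' is captured whole.
line_pat = re.compile(r"Contact: (?P<name>[A-Za-z]+) <(?P<email>[\w.]+@[\w.]+)> Phone: (?P<phone>(?:\+1\s)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4})", re.IGNORECASE)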
seen = set()
out = []
for line in sample_text.splitlines():
    if 'TEMP' in line:  # exclude
        continue
    m = line_pat.search(line)
    if not m:
        continue
    d = m.groupdict()
    digits = re.sub(r"\D", "", d['phone'])
    norm_phone = f"{digits[-10:-7]}-{digits[-7:-4]}-{digits[-4:]}"
    local, domain = d['email'].split('@')
    norm_email = local + '@' + domain.lower()
    key = (norm_email, norm_phone)
    if key in seen:
        continue
    seen.add(key)
    out.append({'name': d['name'], 'email': norm_email, 'phone': norm_phone})
print('Tournament Output JSON:')
print(json.dumps(out, indent=2))
register_hint(level_id, 'Use a set of tuples (email, phone) to filter duplicates.')

In [None]:
# Tournament Solution
# Explanation: Pattern extraction + normalization + duplicate filtering by key tuple.
complete_level(300, used_hint=False, success=True)
show_status()