
In today's class we will focus on regular expressions (regex).

Next time we will work on data scraping.

## What is Regex?

**Regular Expressions (regex)** are sequences of characters that define a search pattern, mainly used for string matching and manipulation. Think of regex as the grammar rules for the language of strings.

> **Example:** If you want to find all the occurrences of "cat" in a text, regex can help you locate them effortlessly!
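
As a first taste, here is a small sketch that locates every "cat" in a made-up sentence. It uses `re.finditer()`, a function from Python's `re` module that yields one match object per occurrence (it is not in the function list later in this notebook). Notice that the plain pattern `cat` also matches inside "bobcat". Wrapping it in `\b` word boundaries restricts it to the standalone word.

```python
import re

# Made-up sample sentence for illustration
text = "The cat sat on the mat. Another cat, a bobcat, joined."

# finditer yields one Match object per occurrence; .start() is its index
positions = [m.start() for m in re.finditer(r"cat", text)]
print(positions)  # [4, 32, 42]  (42 is the "cat" inside "bobcat")

# \b marks a word boundary, so only the standalone word "cat" matches
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]
print(whole_word)  # [4, 32]
```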

## Why Use Regex?

- **Efficiency:** Perform complex search and replace operations with minimal code.
- **Flexibility:** Handle a wide variety of text processing tasks.
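- **Power:** Describe patterns such as "any sequence of digits" or "anything that looks like an email address", which plain string methods like `str.find()` or `str.replace()` cannot express.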


## Basic Regex Syntax


| Regex | Description | Example |
|-------|-------------|---------|
| `.` | Matches any single character except newline | `c.t` matches `cat`, `cot`, `cut` |
| `^` | Start of the string | `^Hello` matches any string that starts with `Hello` |
| `$` | End of the string | `world$` matches any string that ends with `world` |
| `*` | Matches 0 or more repetitions of the preceding token | `ca*t` matches `ct`, `cat`, `caat`, `caaat` |
| `+` | Matches 1 or more repetitions of the preceding token | `ca+t` matches `cat`, `caat`, `caaat` but not `ct` |
| `?` | Makes the preceding token optional | `ca?t` matches `ct` or `cat` |
| `\d` | Matches any digit | `\d` matches `5` in `a5b` |
| `\w` | Matches any word character (alphanumeric & underscore) | `\w` matches `a`, `5`, `_` |
| `[abc]` | Matches any one character inside the brackets | `[cb]at` matches `cat`, `bat` |
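| `( )` | Groups tokens together and captures the matched text; inside a group, `\|` separates alternatives | `(cat\|dog)` matches `cat` or `dog` |
| `\b` | Word boundary: the position between a word character and a non-word character | `\bcat\b` matches `cat` but not the `cat` inside `concatenate` |
| `{m,n}` | Matches between m and n repetitions of the preceding token; `{m,}` means m or more | `\d{2,4}` matches `12`, `123`, `1234` |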
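
To see some of these rows in action, this sketch tests the repetition and wildcard patterns against a few made-up words. It uses `re.fullmatch()` from Python's `re` module, which succeeds only when the pattern covers the whole string, so partial matches don't muddy the comparison. The last lines show a group with `|` alternatives.

```python
import re

words = ["ct", "cat", "caat", "cot"]
patterns = [r"ca*t", r"ca+t", r"ca?t", r"c.t"]

results = {}
for pattern in patterns:
    # fullmatch returns a Match only if the pattern spans the entire word
    results[pattern] = [w for w in words if re.fullmatch(pattern, w)]
    print(f"{pattern:5} -> {results[pattern]}")

# A group with | matches either alternative; findall returns the group's text
pets = re.findall(r"(cat|dog)s", "I like cats and dogs")
print(pets)  # ['cat', 'dog']
```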

## Using Regex in Python

Python's `re` module provides support for regex operations. Here's a quick overview of commonly used functions:

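- **`re.search(pattern, string)`**: Scans the whole string and returns the first match as a `Match` object, or `None` if there is none.
- **`re.match(pattern, string)`**: Tries the pattern only at the beginning of the string; returns a `Match` object or `None`.
- **`re.findall(pattern, string)`**: Returns a list of all non-overlapping matches. If the pattern contains capturing groups, the list holds the groups' text instead of the whole match.
- **`re.sub(pattern, repl, string)`**: Replaces every match of the pattern with `repl` and returns the new string.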


```python
import re

# Let's find all the vowels in a sentence
sentence = "Why did the chicken join a band?"
vowels = re.findall(r'[aeiou]', sentence, re.IGNORECASE)
print("Vowels found:", vowels)
```

**Output:**
```
Vowels found: ['i', 'e', 'i', 'e', 'o', 'i', 'a', 'a']
```
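
The vowel example only uses `re.findall()`. The sketch below runs all four functions from the list on the same made-up string, so you can compare what each one returns.

```python
import re

text = "Order 66 shipped on day 3"

# search: first match anywhere in the string (a Match object)
first = re.search(r"\d+", text)
print(first.group())   # 66

# match: only tries at the very start; "Order" is not digits, so None
at_start = re.match(r"\d+", text)
print(at_start)        # None

# findall: all non-overlapping matches as a list of strings
numbers = re.findall(r"\d+", text)
print(numbers)         # ['66', '3']

# sub: replace every match and return the new string
masked = re.sub(r"\d+", "#", text)
print(masked)          # Order # shipped on day #
```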

### 1. Matching Silly Emails

Imagine you want to find email addresses.

```python
import re

# Sample text with silly emails
text = """
Here are some emails:
- john.doe@example.com
- funny_bunny@hoppity-hop.org
- pirate!@shipwreck.net
- unicorn@magic.realm
"""

# Regex pattern to match silly emails:
#   \b ... \b         word boundaries at both ends
#   [\w\.-]+          username: letters, digits, underscores, dots, hyphens
#   @                 a literal @
#   [a-zA-Z0-9-]+     domain name
#   \.[a-zA-Z\.]{2,}  a dot, then a top-level domain of at least 2 characters
pattern = r'\b[\w\.-]+@[a-zA-Z0-9-]+\.[a-zA-Z\.]{2,}\b'

emails = re.findall(pattern, text)
print("Emails Found:")
for email in emails:
    print(email)
```

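*Note: `pirate!@shipwreck.net` is not matched because the username part of the pattern, `[\w\.-]+`, does not allow `!`. (The email standard, RFC 5322, technically permits `!` in usernames, but it is rarely used in practice.)*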
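
To check that the `!` really is what excludes the pirate, here is a sketch with a hypothetical looser pattern that only adds `!` to the username character class. It is for illustration, not a complete email validator.

```python
import re

text = """
- john.doe@example.com
- pirate!@shipwreck.net
"""

strict = r'\b[\w\.-]+@[a-zA-Z0-9-]+\.[a-zA-Z\.]{2,}\b'
# Hypothetical variant: the username class now also allows "!"
loose = r'\b[\w\.!-]+@[a-zA-Z0-9-]+\.[a-zA-Z\.]{2,}\b'

strict_matches = re.findall(strict, text)
loose_matches = re.findall(loose, text)
print(strict_matches)  # ['john.doe@example.com']
print(loose_matches)   # ['john.doe@example.com', 'pirate!@shipwreck.net']
```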

### 2. Translating to Pirate Speak

```python
import re

# Sample sentence
sentence = "Hello there! How are you doing today?"

# Define pirate substitutions (regex pattern -> replacement)
substitutions = {
    r'\bHello\b': 'Ahoy',
    r'\bthere\b': 'matey',
    r'\byou\b': 'ye',
    r'\bdoing\b': "doin'",
    r'\btoday\b': "t'day",
    r'\bHow are\b': 'How be',
}

# Apply the substitutions one after another, in dictionary order
for pattern, repl in substitutions.items():
    sentence = re.sub(pattern, repl, sentence)

print("Pirate Speak:")
print(sentence)
```
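
Because the loop calls `re.sub()` once per dictionary entry, each later pattern runs on the output of the earlier replacements, so the order of the entries can matter. One alternative, shown here as a sketch and a different technique from the loop, is to join all phrases into a single alternation and pass `re.sub()` a function that looks up each match. `re.escape()` makes sure any special characters in the phrases are treated literally.

```python
import re

# Phrase -> pirate replacement (same pairs as the dictionary above)
pirate_words = {
    "Hello": "Ahoy",
    "there": "matey",
    "you": "ye",
    "doing": "doin'",
    "today": "t'day",
    "How are": "How be",
}

# Build one pattern: \b(?:Hello|there|you|...)\b
# re.escape treats each phrase literally, even if it contained regex symbols
alternatives = "|".join(re.escape(phrase) for phrase in pirate_words)
pattern = re.compile(r"\b(?:" + alternatives + r")\b")

def to_pirate(match):
    # Called once per match; returns the replacement for the matched phrase
    return pirate_words[match.group(0)]

sentence = "Hello there! How are you doing today?"
pirate = pattern.sub(to_pirate, sentence)
print(pirate)  # Ahoy matey! How be ye doin' t'day?
```

Every phrase is replaced in a single pass over the original sentence, so a replacement can never be re-matched by another entry.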

### 3. Detecting Unicorns

Let's create a regex that finds unicorns in a whimsical text and pulls out their names.

```python
import re

# Sample text
text = """
Once upon a time, a unicorn named Sparkle wandered into the enchanted forest.
Another unicorn, Rainbow, joined Sparkle on her magical journey.
But where is the unicorn now?
"""

# Regex pattern to find 'unicorn named' followed by a name.
# The parentheses capture the name, so findall returns only the captured text.
pattern = r'unicorn named (\w+)'

names = re.findall(pattern, text)
print("Unicorn Names Found:")
for name in names:
    print(name)
```

*Note: Only "Sparkle" is captured because "Rainbow" is introduced differently ("Another unicorn, Rainbow,"), which does not fit the pattern `unicorn named (\w+)`.*
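
One way to catch Rainbow as well is to allow either introduction with an alternation. This sketch uses a non-capturing group `(?: ... )`, so `findall` still returns only the name captured by `(\w+)`. The pattern is a hypothetical extension tuned to this sample text only.

```python
import re

text = """
Once upon a time, a unicorn named Sparkle wandered into the enchanted forest.
Another unicorn, Rainbow, joined Sparkle on her magical journey.
But where is the unicorn now?
"""

# "unicorn" followed by either " named" or ",", then a space and a name.
# (?: ... ) groups without capturing, so only (\w+) ends up in the results.
pattern = r'unicorn(?: named|,) (\w+)'

names = re.findall(pattern, text)
print(names)  # ['Sparkle', 'Rainbow']
```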

## Tips and Tricks

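- **Use Raw Strings:** Prefix your regex patterns with `r` so that Python passes backslashes to the regex engine unchanged. In a normal string, `'\b'` is a backspace character, whereas `r'\b'` is the word boundary `\b`.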
  
  ```python
  pattern = r'\d+'
  ```
  
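- **Compile Your Patterns:** If you use the same pattern many times, compile it once with `re.compile()` and call methods like `.findall()` on the result. This keeps the pattern defined in one place and can be slightly faster (the `re` module also caches recently used patterns, so the speed-up is usually modest).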
  
  ```python
  compiled_pattern = re.compile(r'\bcat\b')
  matches = compiled_pattern.findall("The cat sat on the mat.")
  ```
  
- **Verbose Mode:** Use `re.VERBOSE` to write more readable patterns: whitespace in the pattern is ignored and `#` starts a comment, so you can spread a pattern over several lines.
  
  ```python
  pattern = re.compile(r"""
      \b          # word boundary
      cat         # match 'cat'
      \b          # word boundary
      """, re.VERBOSE)
  ```
  
- **Test Your Regex:** Websites like [regex101.com](https://regex101.com/) are excellent for testing and debugging your regex patterns.
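
The raw-string tip is easy to underestimate. In this sketch, the same pattern written with and without the `r` prefix behaves very differently, because Python turns `"\b"` into a backspace character before `re` ever sees it.

```python
import re

text = "The cat sat on the mat. Concatenate!"

# Without r, Python converts "\b" to a backspace (\x08), so nothing matches
plain = re.findall("\bcat\b", text)
print(plain)  # []

# With r, the backslash survives and \b means "word boundary"
raw = re.findall(r"\bcat\b", text)
print(raw)    # ['cat']  (the "cat" inside "Concatenate" is skipped)
```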

## Homework

Count how many times the word `coffee` appears in the text below.

```python
text = '''
The aroma of freshly brewed coffee filled the air as Amelia stumbled out of bed. She shuffled to the kitchen, her eyelids heavy with sleep, and fumbled for the coffee maker.  "Ah, coffee," she sighed contentedly after the first sip, "the elixir of life."

With a jolt of caffeine coursing through her veins, Amelia started her day.  She glanced at the newspaper while sipping her coffee, the headlines blurring into a jumble of words.  Suddenly, a peculiar story caught her eye: "Coffee Bean Bandit Strikes Again!" Apparently, a mysterious figure was stealing coffee beans from local shops, leaving behind only a single, perfectly roasted coffee bean as a calling card.

Intrigued, Amelia finished her coffee and headed to her favorite coffee shop, "The Daily Grind."  The owner, a jittery man named Bob, was frantically pacing behind the counter. "He took my best Sumatran!" he wailed, "The rarest coffee this side of the equator!"  Amelia, fueled by her morning coffee and a newfound sense of adventure, decided to investigate.

Her first lead came from a barista at another coffee shop who claimed to have seen a shadowy figure carrying a large sack of coffee beans.  The trail led her to a dimly lit alley where the lingering scent of coffee hung heavy in the air.  Following the aroma, she stumbled upon a hidden door.

Inside, she found a secret coffee lair, filled with sacks of coffee beans from all over the world.  And there, in the center of it all, stood a man with a steaming cup of coffee in his hand.  He introduced himself as Bartholomew, a coffee enthusiast driven mad by the inferior quality of coffee in the city.  He was on a quest to create the perfect blend, and these stolen coffee beans were his ingredients.

Amelia, though initially taken aback, understood his passion.  She, too, knew the importance of a good cup of coffee.  They talked for hours, debating the merits of different roasts and brewing methods.  By the end, they had formed an unlikely friendship, bonded by their shared love of coffee.  Bartholomew, realizing the error of his ways, decided to return the stolen coffee beans and open his own coffee shop, where he could share his passion with the world.  Amelia, of course, became his first regular customer.

'''
```