In [1]:
import re

## Basic Syntax

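- `.`: Matches any single character except a newline (unless `re.DOTALL` is set)
- `^`: Matches the start of the string (and the start of each line with `re.MULTILINE`)
- `$`: Matches the end of the string or just before a trailing newline (and the end of each line with `re.MULTILINE`)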
- `*`: Matches 0 or more repetitions of the preceding element
- `+`: Matches 1 or more repetitions of the preceding element
- `?`: Matches 0 or 1 repetition of the preceding element
- `{n}`: Matches exactly n repetitions of the preceding element
- `{n,}`: Matches at least n repetitions of the preceding element
- `{n,m}`: Matches between n and m repetitions of the preceding element
- `|`: Alternation, matches either the pattern before or the pattern after the symbol
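
The quantifiers and anchors listed above can be tried out with `re.findall` and `re.search`. This is a minimal sketch; the sample string `text` and the variable names are made up for illustration.

```python
import re

text = "b ab aab aaab"

# Each quantifier applies to the 'a' immediately before it.
star = re.findall(r'a*b', text)           # zero or more 'a', then 'b'
plus = re.findall(r'a+b', text)           # one or more 'a', then 'b'
optional = re.findall(r'a?b', text)       # at most one 'a', then 'b'
exactly_two = re.findall(r'a{2}b', text)  # exactly two 'a', then 'b'
two_or_more = re.findall(r'a{2,}b', text) # at least two 'a', then 'b'

print(star)         # ['b', 'ab', 'aab', 'aaab']
print(plus)         # ['ab', 'aab', 'aaab']
print(optional)     # ['b', 'ab', 'ab', 'ab']
print(exactly_two)  # ['aab', 'aab']
print(two_or_more)  # ['aab', 'aaab']

# Alternation and anchors
either = re.findall(r'cat|dog', 'cat dog bird')
print(either)                          # ['cat', 'dog']
print(bool(re.search(r'^ab', 'abc')))  # True: "ab" is at the start
print(bool(re.search(r'^ab', 'cab')))  # False: "ab" is not at the start
print(bool(re.search(r'ab$', 'cab')))  # True: "ab" is at the end
```

Note that `a?b` and `a{2}b` find matches inside longer runs of `a`, because `re.findall` scans for the pattern anywhere in the string.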

## Character Classes

- `[abc]`: Matches any one of the characters a, b, or c
- `[^abc]`: Matches any character that is not a, b, or c
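- `[a-z]`: Matches any lowercase letter from a to z
- `[A-Z]`: Matches any uppercase letter from A to Z
- `[0-9]`: Matches any ASCII digit
- `\d`: Matches any digit (equivalent to `[0-9]` with `re.ASCII`; otherwise any Unicode decimal digit)
- `\D`: Matches any non-digit
- `\w`: Matches any word character (equivalent to `[a-zA-Z0-9_]` with `re.ASCII`; otherwise also Unicode letters and digits)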
- `\W`: Matches any non-word character
- `\s`: Matches any whitespace character
- `\S`: Matches any non-whitespace character
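
A short sketch of these classes in action follows. The sample strings and variable names are illustrative assumptions, not part of the original notebook.

```python
import re

sample = "Room 42, Floor_3 at 9am"

digit_runs = re.findall(r'\d+', sample)     # runs of digits
word_runs = re.findall(r'\w+', sample)      # runs of word characters
capitals = re.findall(r'[A-Z]', sample)     # single uppercase letters
tokens = re.split(r'\s+', sample)           # split on whitespace
punctuation = re.findall(r'[^\w\s]', sample)  # neither word char nor whitespace

print(digit_runs)   # ['42', '3', '9']
print(word_runs)    # ['Room', '42', 'Floor_3', 'at', '9am']
print(capitals)     # ['R', 'F']
print(tokens)       # ['Room', '42,', 'Floor_3', 'at', '9am']
print(punctuation)  # [',']

# For str patterns, \w is Unicode-aware unless re.ASCII is passed.
unicode_words = re.findall(r'\w+', 'caf\u00e9')
ascii_words = re.findall(r'\w+', 'caf\u00e9', re.ASCII)
print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']

# Inside a character class, | is a literal pipe, not alternation.
pipe_demo = re.findall(r'[a|b]', 'a|b')
print(pipe_demo)      # ['a', '|', 'b']
```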

## Special Characters

- `\`: Escapes a special character so that it is matched literally (for example, `\.` matches a dot)
- `()`: Defines a capturing group
- `(?:...)`: Non-capturing group
- `(?=...)`: Positive lookahead assertion
- `(?!...)`: Negative lookahead assertion
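
The differences between these constructs are easiest to see side by side. The following is a small sketch with made-up input strings and variable names.

```python
import re

css = '10px 20em 30px'

# Capturing group: findall returns only the captured part.
captured = re.findall(r'(\d+)px', css)
print(captured)        # ['10', '30']

# Non-capturing group: groups the alternation, findall returns the whole match.
whole = re.findall(r'\d+(?:px|em)', '10px 20em 30pt')
print(whole)           # ['10px', '20em']

# Positive lookahead: "px" must follow but is not part of the match.
before_px = re.findall(r'\d+(?=px)', css)
print(before_px)       # ['10', '30']

# Negative lookahead: "foo" must not be followed by "bar".
not_bar = re.findall(r'foo(?!bar)\w*', 'foobar foobaz')
print(not_bar)         # ['foobaz']

# Escaping: \+ is a literal plus sign; unescaped, + is a quantifier.
escaped = re.findall(r'1\+1', '1+1=2 and 11')
unescaped = re.findall(r'1+1', '1+1=2 and 11')
print(escaped)         # ['1+1']
print(unescaped)       # ['11']
```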

## Examples

- `abc`: Matches the string "abc"
- `abc|def`: Matches "abc" or "def"
- `^abc`: Matches "abc" at the start of the string
- `abc$`: Matches "abc" at the end of the string
- `a.b`: Matches "a", then any single character (except newline), then "b"
- `a*`: Matches 0 or more 'a's
- `a+`: Matches 1 or more 'a's
- `a?`: Matches 0 or 1 'a'
- `\d{2,4}`: Matches between 2 and 4 digits
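
Some of the example patterns above can be checked directly. This is a sketch with invented test strings; the variable names are for illustration only.

```python
import re

alternation = re.findall(r'abc|def', 'abcdef xyz')
any_char = re.findall(r'a.b', 'acb a-b ab')
digit_runs = re.findall(r'\d{2,4}', '1 22 333 4444 55555')

print(alternation)  # ['abc', 'def']
print(any_char)     # ['acb', 'a-b']  ("ab" has nothing between a and b)
print(digit_runs)   # ['22', '333', '4444', '5555']

starts_with = bool(re.search(r'^abc', 'abcdef'))
ends_with = bool(re.search(r'abc$', 'xyzabc'))
print(starts_with, ends_with)  # True True
```

In the last `findall`, the lone `1` is too short to match, and `55555` yields `5555` because `{2,4}` takes at most four digits, leaving a single `5` that cannot match on its own.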

In [3]:
s = '''
<a href="https://amazon.com/categories/ski">Ski</a>
<a href="https://amazon.com/p/1234567890/awesome-product-1">Coffee beans</a>
<a href="https://amazon.com/p/6454343333/ok-product-2">Backcountry Ski</a>
<a href="https://amazon.com/p/6543565454/great-product-1">Book</a>
<a href="https://amazon.com/about-us">About Us</a>
'''

In [None]:
# Extract only the product links
# Expected output:
# https://amazon.com/p/1234567890/awesome-product-1
# https://amazon.com/p/6454343333/ok-product-2
# https://amazon.com/p/6543565454/great-product-1

# First attempt: a fully literal pattern finds only the one link it spells out
re.findall(r'<a href="https://amazon.com/p/1234567890/awesome-product-1">', s)

['<a href="https://amazon.com/p/1234567890/awesome-product-1">']

In [4]:
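# Generalize: \d+ matches any product ID and .+ matches the rest of the URL
# (. does not match newlines, so each match stays on one line)
re.findall(r'<a href="https://amazon.com/p/\d+/.+">', s)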

['<a href="https://amazon.com/p/1234567890/awesome-product-1">',
 '<a href="https://amazon.com/p/6454343333/ok-product-2">',
 '<a href="https://amazon.com/p/6543565454/great-product-1">']

In [5]:
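# Wrap the URL in a capturing group so findall returns only the URL.
# The dot in the domain is escaped, and [^"]+ stops at the closing quote.
re.findall(r'<a href="(https://amazon\.com/p/\d+/[^"]+)">', s)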

['https://amazon.com/p/1234567890/awesome-product-1',
 'https://amazon.com/p/6454343333/ok-product-2',
 'https://amazon.com/p/6543565454/great-product-1']
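
If the product ID and link text are needed as well, named groups with `re.finditer` are one option. This is a sketch only: the group names `product_id`, `slug`, and `label` are hypothetical, and the shortened `html` string stands in for `s` above. Regular expressions are fragile on real-world HTML, so it assumes links formatted exactly as in the sample.

```python
import re

html = '''
<a href="https://amazon.com/categories/ski">Ski</a>
<a href="https://amazon.com/p/1234567890/awesome-product-1">Coffee beans</a>
<a href="https://amazon.com/about-us">About Us</a>
'''

# Named groups (?P<name>...) label each captured part of a product link.
link_pattern = re.compile(
    r'<a href="https://amazon\.com/p/(?P<product_id>\d+)/(?P<slug>[^"]+)">'
    r'(?P<label>[^<]*)</a>'
)

# groupdict() turns each match into a {group name: captured text} dict.
products = [m.groupdict() for m in link_pattern.finditer(html)]
print(products)
# [{'product_id': '1234567890', 'slug': 'awesome-product-1', 'label': 'Coffee beans'}]
```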

## Practice Problems

### Problem 1: Email Extraction

**Problem**: Extract emails from a given string.  
**String**: "Contact us at support@example.com or sales@example.org"

In [6]:
string_to_search = "Contact us at support@example.com or sales@example.org"

# Regular expression pattern for matching email addresses
# (inside a character class, | is a literal pipe, so the TLD class is [A-Za-z], not [A-Z|a-z])
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Use re.findall to extract email addresses from the string
extracted_emails = re.findall(email_pattern, string_to_search)

print(extracted_emails)


['support@example.com', 'sales@example.org']


### Problem 2: Phone Number Validation

**Problem**: Validate and extract US phone numbers in the format xxx-xxx-xxxx.  
**String**: "My numbers are 123-456-7890 or 333-333-3333"

In [7]:
string_to_search = "My numbers are 123-456-7890 or 333-333-3333"

# Regular expression pattern for matching US phone numbers
phone_number_pattern = r'\b\d{3}-\d{3}-\d{4}\b'

# Use re.findall to extract US phone numbers from the string
extracted_phone_numbers = re.findall(phone_number_pattern, string_to_search)

print(extracted_phone_numbers)


['123-456-7890', '333-333-3333']
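
The cell above covers extraction. For the validation half of the problem, one possible approach is `re.fullmatch`, which succeeds only if the entire string fits the pattern. The helper name `is_valid_phone` is an assumption made for this sketch.

```python
import re

# Same xxx-xxx-xxxx shape as above, without \b, since fullmatch
# already requires the pattern to cover the whole string.
PHONE_RE = re.compile(r'\d{3}-\d{3}-\d{4}')

def is_valid_phone(candidate):
    """Return True if candidate is exactly of the form xxx-xxx-xxxx."""
    return PHONE_RE.fullmatch(candidate) is not None

print(is_valid_phone("123-456-7890"))       # True
print(is_valid_phone("123-456-789"))        # False: last group too short
print(is_valid_phone("call 123-456-7890"))  # False: extra text around the number
```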


### Problem 3: Password Strength Check

**Problem**: Check if a password is at least 8 characters long, contains a digit, an uppercase, and a lowercase letter.  
**String**: "Password1"

In [8]:


def is_valid_password(password):
    # Check if the password is at least 8 characters long
    if len(password) < 8:
        return False

    # Check if the password contains at least one digit
    if not re.search(r'\d', password):
        return False

    # Check if the password contains at least one uppercase letter
    if not re.search(r'[A-Z]', password):
        return False

    # Check if the password contains at least one lowercase letter
    if not re.search(r'[a-z]', password):
        return False

    # If all checks pass, the password is valid
    return True

# Test the function with the given password
password_to_check = "Password1"
result = is_valid_password(password_to_check)

if result:
    print(f"The password '{password_to_check}' is valid.")
else:
    print(f"The password '{password_to_check}' is not valid.")


The password 'Password1' is valid.
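
The same four checks can also be folded into a single pattern using the lookahead assertions from the Special Characters section. This is an alternative sketch, not a replacement for the step-by-step function; the name `is_valid_password_regex` is hypothetical.

```python
import re

# Each (?=...) lookahead checks one requirement from the start of the string
# without consuming characters; .{8,} then enforces the minimum length.
PASSWORD_RE = re.compile(r'(?=.*\d)(?=.*[A-Z])(?=.*[a-z]).{8,}')

def is_valid_password_regex(password):
    """Return True if password has 8+ chars, a digit, an uppercase and a lowercase letter."""
    return PASSWORD_RE.fullmatch(password) is not None

print(is_valid_password_regex("Password1"))  # True
print(is_valid_password_regex("password1"))  # False: no uppercase letter
print(is_valid_password_regex("PASSWORD1"))  # False: no lowercase letter
print(is_valid_password_regex("Pass1"))      # False: shorter than 8 characters
```

The single pattern is compact, but the separate checks are easier to read and can report which rule failed. Because `.` does not match a newline by default, this version also rejects passwords that contain a newline.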


### Problem 4: Extracting Domain Name

**Problem**: Extract the domain name from an email address.  
**String**: "user@example.com"

In [9]:
email_address = "user@example.com"

# Regular expression pattern for extracting the domain name from an email address
domain_pattern = r'@([A-Za-z0-9.-]+)'

# Use re.search to find the first match of the pattern in the email address
match = re.search(domain_pattern, email_address)

if match:
    # Extract the domain name from the match
    domain_name = match.group(1)
    print(f"The domain name is: {domain_name}")
else:
    print("No domain name found in the email address.")


The domain name is: example.com
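
The same capturing group also works with `re.findall` when a string contains several addresses. As a small sketch, here it is applied to the string from Problem 1; the variable names are illustrative.

```python
import re

text = "Contact us at support@example.com or sales@example.org"

# With one capturing group, findall returns just the captured domains.
domains = re.findall(r'@([A-Za-z0-9.-]+)', text)
print(domains)  # ['example.com', 'example.org']
```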


### Problem 5: Validating an IP Address

**Problem**: Check if a string is a valid IPv4 address.  
**String**: "192.168.1.1"

In [11]:
def is_valid_ipv4(ip_address):
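    # Regular expression pattern for an IPv4 address: four dot-separated octets in the range 0-255.
    # Note that this octet pattern also accepts leading zeros such as "01".
    ipv4_pattern = r'((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

    # Use re.fullmatch so the whole string must match
    # (re.match with a trailing $ would also accept a string ending in a newline)
    match = re.fullmatch(ipv4_pattern, ip_address)

    return bool(match)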

# Test the function with the given string
ip_address_to_check = "192.168.1.1"
result = is_valid_ipv4(ip_address_to_check)

if result:
    print(f"The string '{ip_address_to_check}' is a valid IPv4 address.")
else:
    print(f"The string '{ip_address_to_check}' is not a valid IPv4 address.")


The string '192.168.1.1' is a valid IPv4 address.
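
Outside of regex practice, Python's standard `ipaddress` module offers another way to run the same check. The sketch below assumes string input, and the helper name `is_valid_ipv4_stdlib` is hypothetical. How leading zeros are handled depends on the Python version, so they are not tested here.

```python
import ipaddress

def is_valid_ipv4_stdlib(candidate):
    """Return True if candidate parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True

for candidate in ["192.168.1.1", "256.1.1.1", "192.168.1", "not an ip"]:
    print(candidate, is_valid_ipv4_stdlib(candidate))
```

Only the first candidate is accepted. The second has an octet above 255, the third has three octets, and the last is not numeric.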
