# Practice Problems

### Problem 1: Email Extraction

**Problem**: Extract emails from a given string.  
**String**: "Contact us at support@example.com or sales@example.org"

In [2]:
import re

# Sample text: a variation on the problem string, with a third address added
text = "Contact us at support@gmail.com or sales@example.org or jackieji@123.com"

# Regular expression for matching email addresses
# [a-zA-Z0-9._%+-]+: Matches the local (user name) part of the email (letters, digits, dots, underscores, percent signs, plus signs, and hyphens)
# @: Matches the @ symbol in the email
# [a-zA-Z0-9.-]+: Matches the domain part of the email (letters, numbers, dots, and hyphens)
# \.[a-zA-Z]{2,}: Matches the domain suffix (like .com, .org), which starts with a dot followed by two or more letters
email_regex = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Extracting email addresses using the findall method from the re (regular expression) module
extracted_emails = re.findall(email_regex, text)

# Printing the extracted email addresses
print(extracted_emails)


['support@gmail.com', 'sales@example.org', 'jackieji@123.com']
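When the same address appears more than once in a text, `re.findall` returns every occurrence. A minimal sketch of deduplicating the matches while keeping first-seen order follows. The helper name `unique_emails` and the sample sentence are illustrative, and lowercasing the whole address is a simplifying assumption (strictly, the local part may be case-sensitive).

```python
import re

# Same pattern as in the cell above, compiled once for reuse.
EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def unique_emails(text):
    """Return distinct addresses found in text, lowercased, in first-seen order."""
    found = (match.group(0).lower() for match in EMAIL_RE.finditer(text))
    # dict.fromkeys keeps insertion order and drops repeated keys.
    return list(dict.fromkeys(found))

# Hypothetical sample with a repeated address and trailing punctuation.
sample = "Mail Support@gmail.com, support@gmail.com or sales@example.org."
print(unique_emails(sample))
```

The trailing comma and full stop are left out of the matches because the pattern must end in a dot followed by at least two letters.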


### Problem 2: Phone Number Validation

**Problem**: Validate and extract US phone numbers in the format xxx-xxx-xxxx.  
**String**: "My numbers are 123-456-7890 or 333-333-3333"

In [3]:
import requests
import re

# URL to extract phone numbers from
url = "https://visitseattle.org/contact-us/"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
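    # Regular expression for US phone numbers written as xxx-xxx-xxxx or xxx.xxx.xxxx
    # [-.] limits the separator to a hyphen or a dot; an unescaped . would match any character
    phone_number_regex = r'\b\d{3}[-.]\d{3}[-.]\d{4}\b'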

    # Extracting phone numbers
    extracted_phone_numbers = re.findall(phone_number_regex, response.text)
    print(extracted_phone_numbers)


['206.461.5888', '206.461.5800', '206.461.5855']
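The problem statement's own string can also be handled offline, without a network request. The sketch below separates the two tasks: `re.findall` extracts candidates from running text, and `fullmatch` validates a single candidate. The helper name `is_valid_phone` is illustrative, and only the hyphenated xxx-xxx-xxxx format is assumed.

```python
import re

# Hyphen-separated US format only: xxx-xxx-xxxx.
PHONE_RE = re.compile(r'\d{3}-\d{3}-\d{4}')

def is_valid_phone(candidate):
    """Return True only if the whole string is a number in xxx-xxx-xxxx form."""
    return PHONE_RE.fullmatch(candidate) is not None

text = "My numbers are 123-456-7890 or 333-333-3333"

# Extraction: \b keeps the match from starting or ending inside a longer digit run.
found = re.findall(r'\b\d{3}-\d{3}-\d{4}\b', text)
print(found)

# Validation: the entire candidate has to match.
print(is_valid_phone("123-456-7890"))
print(is_valid_phone("123.456.7890"))
```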


In [5]:
import requests
import re
url = 'https://visitseattle.org/things-to-do/sightseeing/'
response = requests.get(url)
data = response.text
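# Pattern for numbers written as (xxx) xxx-xxxx; the parentheses are escaped so they match literally
new_pattern = r'\(\d{3}\)\s\d{3}-\d{4}'
# Using re.findall to extract all matching phone numbers from the page text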
phone_numbers = re.findall(new_pattern, data)
print(phone_numbers)

['(206) 443-2560', '(253) 288-7700', '(800) 464-1476', '(360) 378-1962']


In [9]:
import requests
import re
url = 'https://www.hcde.washington.edu/contact-us'
response = requests.get(url)
data = response.text
# Same email pattern as in Problem 1, applied to the page's HTML
pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
email = re.findall(pattern, data)
print(email)

['hcdehelp@uw.edu', 'hcdehelp@uw.edu', 'hcdeweb@uw.edu']


### Problem 3: Password Strength Check

**Problem**: Check if a password is at least 8 characters long and contains a digit, an uppercase letter, and a lowercase letter.  
**String**: "Password1"

In [3]:
import re

# The given string to be validated
password = "Password1"

# Regular expression to check if password is at least 8 characters long,
# contains at least one digit, one uppercase, and one lowercase letter
regex = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$"

# Using re.match() to check if the regex pattern matches the given password
if re.match(regex, password):
    print("Valid password")
else:
    print("Invalid password")


Valid password
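A single lookahead pattern answers only yes or no. If it helps to report which requirement failed, the rules can be checked one at a time. This is a minimal sketch, not a replacement for the regex above; the names `CHARACTER_RULES` and `missing_requirements` are illustrative.

```python
import re

# One small pattern per character requirement from the problem statement.
CHARACTER_RULES = {
    "a lowercase letter": r"[a-z]",
    "an uppercase letter": r"[A-Z]",
    "a digit": r"\d",
}

def missing_requirements(password):
    """Return the requirements the password fails; an empty list means valid."""
    missing = []
    if len(password) < 8:
        missing.append("at least 8 characters")
    for label, pattern in CHARACTER_RULES.items():
        # re.search succeeds if the pattern occurs anywhere in the password.
        if not re.search(pattern, password):
            missing.append(label)
    return missing

print(missing_requirements("Password1"))
print(missing_requirements("password"))
```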


### Problem 4: Extracting Domain Name

**Problem**: Extract the domain name from an email address.  
**String**: "user@example.com"

In [11]:
import re

# Function to extract domain name from an email address
def extract_domain(email):
    # Regular expression for extracting the domain name
    # @: Matches the @ character in the email address
    # ([a-zA-Z0-9.-]+): Captures the domain name (alphanumeric characters, dots, and hyphens); being greedy, it extends to the last dot
    # \.: Matches the literal dot before the top-level domain
    domain_name_regex = r'@([a-zA-Z0-9.-]+)\.'

    # Extracting the domain name
    extracted_domain = re.search(domain_name_regex, email)
    return extracted_domain.group(1) if extracted_domain else "Domain not found"

# Email address to extract the domain from
email = "usjackier@foxmail.com"

# Extracting the domain name
domain_name = extract_domain(email)

# Printing the result
print("Domain name:", domain_name)

Domain name: foxmail
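Because the capture group is greedy, an address with subdomains yields everything up to the last dot (for example `mail.example` for `user@mail.example.com`). A sketch of a variant that returns the full domain and the top-level domain separately follows; the function name `split_domain` and the sample addresses are illustrative.

```python
import re

def split_domain(email):
    """Return (full_domain, top_level_domain), or None if no domain is found."""
    # Group 1: everything after @ up to the last dot; group 2: the letters after it.
    match = re.search(r'@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})$', email)
    if match is None:
        return None
    name, tld = match.groups()
    return f"{name}.{tld}", tld

print(split_domain("usjackier@foxmail.com"))
print(split_domain("user@mail.example.co.uk"))
print(split_domain("not-an-email"))
```

Returning `None` instead of a message string lets callers tell a missing domain apart from a real one.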


### Problem 5: Validating an IP Address

**Problem**: Check if a string is a valid IPv4 address.  
**String**: "192.168.1.1"

In [12]:
import re

# Function to validate an IPv4 address
def validate_ipv4(ip_address):
    # Regular expression for validating an IPv4 address
    # ^ and $: Start and end of the string ($ also matches just before a trailing newline)
    # 25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?: Matches one octet (0-255); leading zeros such as 01 are accepted
    ipv4_regex = r'^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'

    # Checking if the IP address is valid
    return bool(re.match(ipv4_regex, ip_address))

# IP address to validate
ip_address = "192.168.1.1"

# Validating the IP address
is_valid_ipv4 = validate_ipv4(ip_address)

# Printing the result
print("Is the IP address valid?", is_valid_ipv4)

Is the IP address valid? True
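The octet pattern is repeated four times in the regex, which makes it hard to read. One possible restructuring builds the pattern from a single octet definition and uses `fullmatch`, which also rejects a trailing newline that a `^...$` pattern with `re.match` would let through. The names `OCTET`, `IPV4_RE`, and `validate_ipv4_strict` are illustrative.

```python
import re

# One octet, 0-255, written once (non-capturing so it adds no groups).
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# One octet followed by exactly three ".octet" repetitions.
IPV4_RE = re.compile(rf'{OCTET}(?:\.{OCTET}){{3}}')

def validate_ipv4_strict(ip_address):
    """Return True only if the entire string is a dotted-quad IPv4 address."""
    return IPV4_RE.fullmatch(ip_address) is not None

for candidate in ["192.168.1.1", "256.1.1.1", "1.2.3", "192.168.1.1\n"]:
    print(repr(candidate), validate_ipv4_strict(candidate))
```

Like the original pattern, this sketch still accepts octets with a leading zero, such as `01.2.3.4`.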


# Reference

### Basic Syntax

- `.`: Matches any single character except newline
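- `^`: Matches the start of the string (and the start of each line with `re.MULTILINE`)
- `$`: Matches the end of the string or just before a trailing newline (and the end of each line with `re.MULTILINE`)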
- `*`: Matches 0 or more repetitions of the preceding element
- `+`: Matches 1 or more repetitions of the preceding element
- `?`: Matches 0 or 1 repetition of the preceding element
- `{n}`: Matches exactly n repetitions of the preceding element
- `{n,}`: Matches at least n repetitions of the preceding element
- `{n,m}`: Matches between n and m repetitions of the preceding element
- `|`: Alternation, matches either the pattern before or the pattern after the symbol
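In Python's `re` module, the behavior of `.`, `^`, and `$` depends on the flags passed. A short sketch on a made-up two-line string:

```python
import re

text = "first line\nsecond line\n"

# By default, ^ anchors only at the very start of the string.
default_starts = re.findall(r'^\w+', text)
# With re.MULTILINE, ^ also matches right after each newline.
multiline_starts = re.findall(r'^\w+', text, re.MULTILINE)

# $ matches at the end of the string and just before a trailing newline.
ends_with_line = bool(re.search(r'line$', text))

# . does not cross a newline unless re.DOTALL is set.
dot_default = re.search(r'first.*second', text)
dot_all = re.search(r'first.*second', text, re.DOTALL)

print(default_starts, multiline_starts)
print(ends_with_line, dot_default is None, dot_all is not None)
```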

### Character Classes

- `[abc]`: Matches any one of the characters a, b, or c
- `[^abc]`: Matches any character that is not a, b, or c
- `[a-z]`: Matches any character from a to z
- `[A-Z]`: Matches any character from A to Z
- `[0-9]`: Matches any digit
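- `\d`: Matches any digit (equivalent to `[0-9]` for ASCII text; in Python 3 `str` patterns it also matches other Unicode digits unless `re.ASCII` is set)
- `\D`: Matches any non-digit
- `\w`: Matches any word character (equivalent to `[a-zA-Z0-9_]` for ASCII text; in Python 3 `str` patterns it also matches Unicode letters and digits unless `re.ASCII` is set)
- `\W`: Matches any non-word character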
- `\s`: Matches any whitespace character
- `\S`: Matches any non-whitespace character
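In Python 3, `\d` and `\w` applied to `str` are Unicode-aware by default, so the `[0-9]` and `[a-zA-Z0-9_]` equivalences hold exactly only under `re.ASCII`. A short check on a made-up sample string (an accented word, three Arabic-Indic digits, then ASCII digits):

```python
import re

# "café", the Arabic-Indic digits for 1 2 3, and "456", written with escapes.
sample = "caf\u00e9 \u0661\u0662\u0663 456"

# Default (Unicode) matching: both digit runs and the accented letter match.
unicode_digits = re.findall(r'\d+', sample)
unicode_words = re.findall(r'\w+', sample)

# re.ASCII restricts \d to [0-9] and \w to [a-zA-Z0-9_].
ascii_digits = re.findall(r'\d+', sample, re.ASCII)
ascii_words = re.findall(r'\w+', sample, re.ASCII)

print(len(unicode_digits), ascii_digits)
print(len(unicode_words), ascii_words)
```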

### Special Characters

- `\`: Escapes a special character
- `(...)`: Defines a capturing group
- `(?:...)`: Non-capturing group
- `(?=...)`: Positive lookahead assertion
- `(?!...)`: Negative lookahead assertion
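The difference between capturing groups, non-capturing groups, and lookaheads is easiest to see in what each one returns. A small sketch on made-up inputs:

```python
import re

date = "2024-05-17"  # illustrative sample

# Capturing groups are numbered from 1 in order of their opening parenthesis.
captured = re.fullmatch(r'(\d{4})-(\d{2})-(\d{2})', date).groups()

# A non-capturing group still groups for repetition but adds no group of its own.
non_captured = re.fullmatch(r'(\d{4})(?:-\d{2}){2}', date).groups()

words = "foobar foobaz"

# Positive lookahead: "foo" only where "bar" follows; "bar" itself is not consumed.
followed_by_bar = re.findall(r'foo(?=bar)', words)

# Negative lookahead: "foo" only where "bar" does NOT follow.
not_followed_by_bar = [hit.start() for hit in re.finditer(r'foo(?!bar)', words)]

print(captured, non_captured)
print(followed_by_bar, not_followed_by_bar)
```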

### Examples

- `abc`: Matches the string "abc"
- `abc|def`: Matches "abc" or "def"
- `^abc`: Matches any string that starts with "abc"
- `abc$`: Matches any string that ends with "abc"
- `a.b`: Matches any string containing "a", any character, then "b"
- `a*`: Matches 0 or more 'a's
- `a+`: Matches 1 or more 'a's
- `a?`: Matches 0 or 1 'a'
- `\d{2,4}`: Matches between 2 and 4 digits
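The examples above can be checked with `re.search`, which looks for the pattern anywhere in the string. The test strings in this sketch are made up for illustration.

```python
import re

# (pattern, test string, whether re.search should find a match)
cases = [
    (r'abc',     "xxabcxx", True),
    (r'abc|def', "..def..", True),
    (r'^abc',    "xabc",    False),  # "abc" is present but not at the start
    (r'abc$',    "abcx",    False),  # "abc" is present but not at the end
    (r'a.b',     "a-b",     True),
    (r'a.b',     "ab",      False),  # . requires exactly one character
    (r'a+',      "bbb",     False),
    (r'a*',      "bbb",     True),   # zero repetitions: matches the empty string
    (r'a?b',     "b",       True),
    (r'\d{2,4}', "7",       False),
]

results = [bool(re.search(pattern, s)) for pattern, s, _ in cases]
for (pattern, s, expected), got in zip(cases, results):
    print(f"{pattern!r:12} {s!r:10} expected={expected} got={got}")

# Quantifiers are greedy: at most four digits are taken from a longer run.
longest = re.search(r'\d{2,4}', "12345").group()
```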