# Regular Expressions

### What are Regular Expressions?

- Regular expressions (regex) are patterns used to match character combinations in strings.
- They are used for searching, matching, and manipulating text based on specific patterns.

In [1]:
import re

### Basic Regular Expression Syntax

In [None]:
# .       - Any character except newline
# ^       - Start of string
# $       - End of string (or just before a trailing newline)
# *       - 0 or more repetitions
# +       - 1 or more repetitions
# ?       - 0 or 1 repetition
# {n}     - Exactly n repetitions
# {n,}    - n or more repetitions
# {n,m}   - Between n and m repetitions
# [abc]   - Any one of a, b, or c
# [^abc]  - Any one character except a, b, or c
# a|b     - Either a or b
# ()      - Grouping
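
As a rough illustration of the symbols above, the sketch below checks a few patterns with `re.fullmatch()`, which succeeds only when the whole string fits the pattern. The sample strings are made up for this example.

```python
import re

# Each entry: (pattern, sample string, whether the whole string should match)
checks = [
    (r'c.t', 'cat', True),         # . matches any single character
    (r'ab*c', 'ac', True),         # * allows zero repetitions of b
    (r'ab+c', 'ac', False),        # + requires at least one b
    (r'colou?r', 'color', True),   # ? makes the u optional
    (r'\d{3}', '12', False),       # {3} needs exactly three digits
    (r'\d{2,4}', '123', True),     # {2,4} accepts two to four digits
    (r'[abc]x', 'bx', True),       # [abc] matches one of a, b, c
    (r'[^abc]x', 'bx', False),     # [^abc] rejects a, b, and c
    (r'cat|dog', 'dog', True),     # | chooses between alternatives
    (r'(ab)+', 'ababab', True),    # () groups ab so + repeats the pair
]

results = [bool(re.fullmatch(p, s)) for p, s, _ in checks]
for (p, s, expected), got in zip(checks, results):
    print(f'{p!r:12} vs {s!r:10} -> {got}')
```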

In [None]:
# Practice Questions:
# 1. What does the '^' symbol represent in regular expressions?
# 2. Write a regex pattern to match any string that starts with 'a' and ends with 'z'.
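
One possible answer to question 2, sketched with made-up test strings. Here `^` anchors the pattern to the start of the string and `$` to the end, with `.*` allowing anything in between.

```python
import re

pattern = r'^a.*z$'   # starts with 'a', ends with 'z'

samples = ['abcz', 'az', 'abc', 'bcz']
results = {s: bool(re.search(pattern, s)) for s in samples}
print(results)
# {'abcz': True, 'az': True, 'abc': False, 'bcz': False}
```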

### Common Functions and Methods in Python

### 1. `re.match()`
- Checks whether the regex matches at the beginning of the string and returns a match object, or `None` if it does not.

In [16]:
pattern = r'h\w*o'
text = 'hello world'
match = re.match(pattern, text)
if match:
    print('Match found:', match)
    print('Match found:', match.group())
else:
    print('No match')

Match found: <re.Match object; span=(0, 5), match='hello'>
Match found: hello
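
Because `re.match()` only tries the pattern at position 0, the same pattern fails when the text does not begin with it. A small sketch, using a made-up second string:

```python
import re

pattern = r'h\w*o'

# The pattern occurs at the very start, so re.match() succeeds.
at_start = re.match(pattern, 'hello world')

# The pattern occurs later in the string, so re.match() returns None.
not_at_start = re.match(pattern, 'say hello')

print(at_start.group())   # hello
print(not_at_start)       # None
```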


### 2. `re.search()`
- Scans the string and returns the first location where the regex matches, or `None` if there is no match.

In [7]:
pattern = r'world'
search = re.search(pattern, text)
if search:
    print('Match found:', search)
    print('Match found:', search.group())
else:
    print('No match')

Match found: <re.Match object; span=(6, 11), match='world'>
Match found: world
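
The match object returned by `re.search()` also records where the match was found. The sketch below assumes the same `'hello world'` string and shows `start()`, `end()`, and the `None` result for a pattern that is absent.

```python
import re

text = 'hello world'

found = re.search(r'world', text)
print(found.start(), found.end())       # 6 11
print(text[found.start():found.end()])  # world

# re.search() returns None when the pattern is absent,
# so test the result before calling .group().
missing = re.search(r'planet', text)
print(missing)                          # None
```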


### 3. `re.findall()`
- Returns all non-overlapping matches of the pattern in the string as a list.

In [10]:
pattern = r'\d+'  # Matches one or more digits
text = 'There are 123 apples and 45 bananas.'
all_matches = re.findall(pattern, text)
print('All matches:', all_matches)

All matches: ['123', '45']
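
When the pattern contains capture groups, `re.findall()` returns the groups rather than the whole match, and `re.finditer()` yields match objects for cases where positions are needed. A short sketch on the same sentence:

```python
import re

text = 'There are 123 apples and 45 bananas.'

# Two capture groups -> a list of (number, word) tuples.
pairs = re.findall(r'(\d+) (\w+)', text)
print(pairs)   # [('123', 'apples'), ('45', 'bananas')]

# finditer() yields match objects, which carry positions as well.
spans = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]
print(spans)   # [('123', (10, 13)), ('45', (25, 27))]
```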


### 4. `re.sub()`
- Replaces every non-overlapping match with the specified string and returns the new string.

In [11]:
pattern = r'apples'
replacement = 'oranges'
new_text = re.sub(pattern, replacement, text)
print('Replaced text:', new_text)

Replaced text: There are 123 oranges and 45 bananas.
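
`re.sub()` also accepts backreferences in the replacement, a `count` limit, and a function as the replacement. The following sketch illustrates all three; the replacement formats and the doubling function are invented for the example.

```python
import re

text = 'There are 123 apples and 45 bananas.'

# \1 and \2 refer to the captured groups, here swapped.
swapped = re.sub(r'(\d+) (\w+)', r'\2: \1', text)
print(swapped)      # There are apples: 123 and bananas: 45.

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', 'N', text, count=1)
print(first_only)   # There are N apples and 45 bananas.

# A function receives each match object and returns the replacement text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)
print(doubled)      # There are 246 apples and 90 bananas.
```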


In [None]:
# Practice Questions:
# 1. Use re.match() to check if a string starts with 'Python'.
# 2. Use re.search() to find the first occurrence of a digit in a string.
# 3. Use re.findall() to find all words in a string.
# 4. Use re.sub() to replace all spaces in a string with hyphens.

In [None]:
# Answer:
# 1. re.match(r'^Python', text)  # the '^' is optional: re.match() already anchors at the start
# 2. re.search(r'\d', text)
# 3. re.findall(r'\b\w+\b', text)
# 4. re.sub(r' ', '-', text)
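
The answers above can be tried on a sample string. The sentence below is an assumption, chosen so that each call has something to find.

```python
import re

sample = 'Python 3 makes text processing easy'

starts_with_python = re.match(r'Python', sample) is not None
first_digit = re.search(r'\d', sample).group()
words = re.findall(r'\b\w+\b', sample)
hyphenated = re.sub(r' ', '-', sample)

print(starts_with_python)  # True
print(first_digit)         # 3
print(words)               # ['Python', '3', 'makes', 'text', 'processing', 'easy']
print(hyphenated)          # Python-3-makes-text-processing-easy
```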

### Practical Examples

In [None]:
# Example 1: Validate an Email Address (a simplified check, not full RFC validation)
pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$'
email = 'example@example.com'
if re.match(pattern, email):
    print('Valid email')
else:
    print('Invalid email')
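
A pattern like this is easier to trust after trying it on several inputs. The sketch below wraps the check in a hypothetical helper, `is_valid_email()`, and uses `fullmatch()` so that `^` and `$` are not needed. One practical difference: `$` also matches just before a trailing newline, whereas `fullmatch()` rejects it. The sample addresses are made up.

```python
import re

# Same character classes as the example; fullmatch() replaces ^ and $.
EMAIL = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+')

def is_valid_email(address):
    """Return True if the whole string fits the simplified pattern."""
    return EMAIL.fullmatch(address) is not None

samples = [
    'example@example.com',
    'first.last+tag@sub.example.org',
    'no-at-sign.com',
    'user@domain',
    'example@example.com\n',
]
for s in samples:
    print(repr(s), is_valid_email(s))
```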

In [13]:
# Example 2: Extract Dates from a Text
pattern = r'\b\d{2}/\d{2}/\d{4}\b'
text = 'The event is scheduled for 12/05/2021 and 15/06/2021.'
dates = re.findall(pattern, text)
print('Dates found:', dates)

Dates found: ['12/05/2021', '15/06/2021']
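
Named groups can label the parts of each date, and `datetime.strptime()` can then confirm that the digits form a real calendar date. This is a sketch that assumes day-first order (DD/MM/YYYY), which the second date in the text suggests.

```python
import re
from datetime import datetime

text = 'The event is scheduled for 12/05/2021 and 15/06/2021.'

# Named groups label each part; day-first order is an assumption here.
date_pattern = re.compile(r'\b(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})\b')

parts = [m.groupdict() for m in date_pattern.finditer(text)]
print(parts[0])   # {'day': '12', 'month': '05', 'year': '2021'}

# strptime() raises ValueError for impossible dates the regex alone would accept.
iso_dates = [datetime.strptime(m.group(), '%d/%m/%Y').date().isoformat()
             for m in date_pattern.finditer(text)]
print(iso_dates)  # ['2021-05-12', '2021-06-15']
```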


In [14]:
# Example 3: Split a String by Multiple Delimiters
pattern = r'[;, \n]+'
text = 'apple;orange,banana\npear'
fruits = re.split(pattern, text)
print('Fruits:', fruits)

Fruits: ['apple', 'orange', 'banana', 'pear']
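
Two details of `re.split()` are worth seeing in action: the `maxsplit` argument, and the empty strings that appear when the text starts or ends with a delimiter. A brief sketch, with the second input invented for the example:

```python
import re

text = 'apple;orange,banana\npear'

# maxsplit=1 splits only at the first delimiter.
head_tail = re.split(r'[;, \n]+', text, maxsplit=1)
print(head_tail)   # ['apple', 'orange,banana\npear']

# A leading or trailing delimiter produces empty strings in the result.
with_edges = re.split(r'[;, \n]+', ';apple,pear;')
print(with_edges)  # ['', 'apple', 'pear', '']

# Filtering removes those empty strings.
cleaned = [part for part in with_edges if part]
print(cleaned)     # ['apple', 'pear']
```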


In [None]:
# Practice Questions:
# 1. Write a regex pattern to validate a phone number (e.g., (123) 456-7890).
# 2. Extract all hashtags from a text (e.g., "This is a #hashtag example.").
# 3. Split a string by commas, semicolons, and spaces.

In [None]:
# Answer:
# 1. r'^\(\d{3}\) \d{3}-\d{4}$'
# 2. re.findall(r'#\w+', text)
# 3. re.split(r'[;, ]+', text)
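
These answers can be checked quickly on sample inputs. The strings below are assumptions made for the sketch, and `fullmatch()` stands in for the `^` and `$` anchors of the phone pattern.

```python
import re

phone_pattern = r'\(\d{3}\) \d{3}-\d{4}'
valid_phone = bool(re.fullmatch(phone_pattern, '(123) 456-7890'))
invalid_phone = bool(re.fullmatch(phone_pattern, '123-456-7890'))
print(valid_phone, invalid_phone)   # True False

hashtags = re.findall(r'#\w+', 'This is a #hashtag example with #regex.')
print(hashtags)   # ['#hashtag', '#regex']

parts = re.split(r'[;, ]+', 'red, green;blue yellow')
print(parts)      # ['red', 'green', 'blue', 'yellow']
```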

In [None]:
# Complete Example: Parsing a Log File

# Step-by-step process:

In [17]:
# 1. Define the sample log data (a real log would be read from a file)
log_data = """
127.0.0.1 - - [28/Jul/2021:10:22:04] "GET /index.html HTTP/1.1" 200 1043
192.168.1.1 - - [28/Jul/2021:10:22:05] "POST /login HTTP/1.1" 200 2205
"""

In [24]:
# 2. Define a regex pattern to extract IP addresses, timestamps, and request lines
pattern = r'(\d+\.\d+\.\d+\.\d+) - - \[(.*?)\] "(.*?)"'

In [25]:
# 3. Use re.findall() to extract the data
log_entries = re.findall(pattern, log_data)

In [26]:
log_entries

[('127.0.0.1', '28/Jul/2021:10:22:04', 'GET /index.html HTTP/1.1'),
 ('192.168.1.1', '28/Jul/2021:10:22:05', 'POST /login HTTP/1.1')]

In [27]:
# 4. Print the extracted data
for entry in log_entries:
    print('IP Address:', entry[0])
    print('Timestamp:', entry[1])
    print('Request:', entry[2])
    print('-' * 40)

IP Address: 127.0.0.1
Timestamp: 28/Jul/2021:10:22:04
Request: GET /index.html HTTP/1.1
----------------------------------------
IP Address: 192.168.1.1
Timestamp: 28/Jul/2021:10:22:05
Request: POST /login HTTP/1.1
----------------------------------------
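
Indexing tuples by position works, but named groups can make the same parsing easier to read and can split the request line into its parts. The variant below is one possible sketch; the group names (`ip`, `timestamp`, `method`, `path`, `protocol`) are choices made for this example.

```python
import re

log_data = """
127.0.0.1 - - [28/Jul/2021:10:22:04] "GET /index.html HTTP/1.1" 200 1043
192.168.1.1 - - [28/Jul/2021:10:22:05] "POST /login HTTP/1.1" 200 2205
"""

# Adjacent raw strings are concatenated into a single pattern.
log_pattern = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<timestamp>.*?)\] '
    r'"(?P<method>\w+) (?P<path>\S+) (?P<protocol>[^"]+)"'
)

# groupdict() turns each match into a dictionary keyed by group name.
entries = [m.groupdict() for m in log_pattern.finditer(log_data)]
for entry in entries:
    print(entry['ip'], entry['method'], entry['path'])
```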


In [None]:
# Practice Questions:
# 1. Write a regex pattern to extract the status codes and response sizes from the log entries.
# 2. Modify the log parsing example to save the extracted data into a CSV file.

# Answer:
# 1. r'(\d+\.\d+\.\d+\.\d+) - - \[(.*?)\] "(.*?)" (\d{3}) (\d+)'
# 2. Use the csv module to write the extracted data to a CSV file.
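
Both answers can be combined into a short sketch. It writes to an in-memory buffer so that it runs without touching the disk. The column names and the file name mentioned in the comment are assumptions for the example.

```python
import csv
import io
import re

log_data = """
127.0.0.1 - - [28/Jul/2021:10:22:04] "GET /index.html HTTP/1.1" 200 1043
192.168.1.1 - - [28/Jul/2021:10:22:05] "POST /login HTTP/1.1" 200 2205
"""

# Five groups: IP, timestamp, request line, status code, response size.
pattern = r'(\d+\.\d+\.\d+\.\d+) - - \[(.*?)\] "(.*?)" (\d{3}) (\d+)'
rows = re.findall(pattern, log_data)

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open('log_entries.csv', 'w', newline='') instead (hypothetical file name).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['ip', 'timestamp', 'request', 'status', 'size'])
writer.writerows(rows)

print(buffer.getvalue())
```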

---
_**Your Dataness**_,  
`Obinna Oliseneku` (_**Hybraid**_)  
**[LinkedIn](https://www.linkedin.com/in/obinnao/)** | **[GitHub](https://github.com/hybraid6)**  