
-----

# **`Regular Expressions`**

-----



## **Regular Expressions in Python**

Regular expressions describe patterns in text. With a single compact notation you can validate input, search for substrings, and replace or split strings.

#### 1. **Introduction to Regular Expressions**

- **Definition**: A regular expression is a sequence of characters that defines a search pattern. This pattern can be used for string matching and manipulation.
- **Use Cases**: Validating input, searching text, replacing substrings, splitting strings, etc.

#### 2. **Python `re` Module**

Python provides the `re` module, which contains functions and classes for working with regular expressions.

```python
import re
```

#### 3. **Basic Functions in the `re` Module**

1. **`re.match(pattern, string)`**
   - Checks for a match only at the beginning of the string.
   - Returns a match object if found; `None` otherwise.
   ```python
   result = re.match(r'Hello', 'Hello, World!')
   print(result)  # Output: <re.Match object; span=(0, 5), match='Hello'>
   ```

2. **`re.search(pattern, string)`**
   - Searches the entire string for the first match of the pattern.
   - Returns a match object if found; `None` otherwise.
   ```python
   result = re.search(r'World', 'Hello, World!')
   print(result)  # Output: <re.Match object; span=(7, 12), match='World'>
   ```

3. **`re.findall(pattern, string)`**
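   - Returns a list of all non-overlapping matches of the pattern in the string. If the pattern contains capturing groups, the list holds the groups instead of the whole matches.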
   ```python
   result = re.findall(r'o', 'Hello, World!')
   print(result)  # Output: ['o', 'o']
   ```

4. **`re.finditer(pattern, string)`**
   - Returns an iterator yielding match objects for all matches.
   ```python
   for match in re.finditer(r'o', 'Hello, World!'):
       print(match)  # Outputs each match object
   ```

5. **`re.sub(pattern, repl, string)`**
   - Replaces occurrences of the pattern with a replacement string.
   ```python
   result = re.sub(r'World', 'Python', 'Hello, World!')
   print(result)  # Output: Hello, Python!
   ```

6. **`re.split(pattern, string)`**
   - Splits the string by the occurrences of the pattern.
   ```python
   result = re.split(r'\s+', 'Hello World! This is Python.')
   print(result)  # Output: ['Hello', 'World!', 'This', 'is', 'Python.']
   ```
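The difference between `re.match` and `re.search` is easiest to see on a single input. The sketch below also uses `re.fullmatch`, which is not covered in the list above but is part of the standard `re` module; the sample string is made up for illustration.

```python
import re

text = "abc123"  # illustrative input, not from the examples above

# re.match only looks at the start of the string, so the digits are not found.
at_start = re.match(r"\d+", text)

# re.search scans forward and finds the first run of digits.
anywhere = re.search(r"\d+", text)

# re.fullmatch succeeds only if the whole string matches the pattern.
whole = re.fullmatch(r"[a-z]+\d+", text)

print(at_start)          # None
print(anywhere.group())  # 123
print(whole.group())     # abc123
```

Because all three functions return `None` on failure, it is usually worth checking the result before calling `.group()`.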

#### 4. **Regular Expression Syntax**

- **Literal characters**: Match themselves (e.g., `a`, `1`, `@`).
  
- **Metacharacters**: Special characters that have a particular meaning:
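  - `.` : Matches any character except a newline (unless `re.DOTALL` is set).
  - `^` : Matches the start of the string (and the start of each line with `re.MULTILINE`).
  - `$` : Matches the end of the string or just before a trailing newline (and the end of each line with `re.MULTILINE`).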
  - `*` : Matches 0 or more repetitions of the preceding element.
  - `+` : Matches 1 or more repetitions of the preceding element.
  - `?` : Matches 0 or 1 repetition of the preceding element.
  - `{n}` : Matches exactly `n` occurrences of the preceding element.
  - `{n,}` : Matches `n` or more occurrences.
  - `{n,m}` : Matches between `n` and `m` occurrences.

- **Character classes**: Enclosed in square brackets `[]`, matches any single character within the brackets.
  - `[abc]` : Matches `a`, `b`, or `c`.
  - `[a-z]` : Matches any lowercase letter.
  - `[^abc]` : Matches any character except `a`, `b`, or `c`.

- **Predefined character classes**:
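  - `\d` : Matches any digit (`[0-9]` for ASCII text; with `str` patterns it also matches other Unicode decimal digits unless `re.ASCII` is set).
  - `\D` : Matches any non-digit.
  - `\w` : Matches any word character: letters, digits, and the underscore (`[a-zA-Z0-9_]` for ASCII text; with `str` patterns it also matches Unicode letters and digits unless `re.ASCII` is set).
  - `\W` : Matches any character that is not a word character.
  - `\s` : Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab).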
  - `\S` : Matches any non-whitespace character.

- **Grouping and capturing**: Use parentheses `()` to group parts of a pattern.
  - `(abc)` : Matches the exact string `abc` and captures it as a group.
  - `(a|b)` : Matches either `a` or `b` and captures whichever one matched.
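The syntax elements above can be tried out directly. This is a small sketch with made-up input strings, showing one quantifier, one character class, and one capturing group with alternation.

```python
import re

# Quantifier: {2,3} matches two or three digits, taking three when it can.
numbers = re.findall(r"\d{2,3}", "7 42 123 12345")

# Character class: [aeiou] matches exactly one lowercase vowel.
vowels = re.findall(r"[aeiou]", "regular expression")

# Grouping: the parentheses capture "cat" or "dog"; s? makes the plural optional.
pet = re.search(r"(cat|dog)s?", "I like dogs")

print(numbers)       # ['42', '123', '123', '45']
print(vowels)        # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
print(pet.group())   # dogs
print(pet.group(1))  # dog
```

`group()` returns the whole match, while `group(1)` returns only what the first pair of parentheses captured.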

#### 5. **Flags**

Flags allow you to modify the behavior of the regex engine. Common flags include:

- `re.IGNORECASE` or `re.I`: Ignore case when matching.
- `re.MULTILINE` or `re.M`: Changes the behavior of `^` and `$` to match the start and end of each line.
- `re.DOTALL` or `re.S`: Makes `.` match any character, including newline.

Example of using a flag:
```python
result = re.search(r'hello', 'Hello, World!', re.IGNORECASE)
print(result)  # Output: <re.Match object; ...>
```
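The other two flags are easier to understand side by side with the default behavior. The following sketch uses an invented two-line string and also shows that flags can be combined with `|`.

```python
import re

text = "first line\nsecond line"  # illustrative two-line input

# Without re.MULTILINE, ^ only matches at the very start of the string.
default_starts = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Without re.DOTALL, . stops at the newline, so this pattern cannot match.
default_dot = re.search(r"first.*second", text)
# With re.DOTALL, . crosses the line break.
dotall_dot = re.search(r"first.*second", text, re.DOTALL)

# Flags are combined with the | operator.
combined = re.findall(r"^\w+ LINE$", text, re.IGNORECASE | re.MULTILINE)

print(default_starts)            # ['first']
print(multiline_starts)          # ['first', 'second']
print(default_dot)               # None
print(repr(dotall_dot.group()))  # 'first line\nsecond'
print(combined)                  # ['first line', 'second line']
```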

#### 6. **Examples**

1. **Validating Email Addresses**:
   ```python
   email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
   email = 'example@example.com'
   print(re.match(email_pattern, email) is not None)  # Output: True
   ```

2. **Extracting Dates**:
   ```python
   text = "Today is 2023-10-26."
   date_pattern = r'\d{4}-\d{2}-\d{2}'
   date = re.search(date_pattern, text)
   print(date.group())  # Output: 2023-10-26
   ```

3. **Replacing Patterns**:
   ```python
   text = "My phone number is 123-456-7890."
   new_text = re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', text)
   print(new_text)  # Output: My phone number is XXX-XXX-XXXX.
   ```
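The date example can be taken one step further by capturing the individual parts. The sketch below assumes the same `YYYY-MM-DD` layout and uses named groups (`(?P<name>...)`) and the `\g<name>` replacement syntax, both of which are standard `re` features not introduced above.

```python
import re

text = "Today is 2023-10-26."

# Named groups label each part of the date.
date_pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(date_pattern, text)

# re.search returns None when nothing matches, so check before using the result.
if match:
    print(match.groupdict())  # {'year': '2023', 'month': '10', 'day': '26'}

# In the replacement string, \g<name> refers back to a named group.
reordered = re.sub(date_pattern, r"\g<day>/\g<month>/\g<year>", text)
print(reordered)  # Today is 26/10/2023.
```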

### **Conclusion**

Regular expressions in Python are a powerful tool for text processing and manipulation. Understanding the syntax and functions provided by the `re` module allows you to efficiently search, validate, and manipulate strings based on complex patterns.


------


#### **Let's Practice** 

----

#### **re.match()**

   - Checks for a match only at the beginning of the string.
   - Returns a match object if found; `None` otherwise.

```python
import re

pattern = r"\d+"   # matches 1 or more digits
string = "1234abc456"
result = re.match(pattern, string)
print(result.group())  # Output: 1234
```
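Since `re.match` returns `None` when the string does not start with the pattern, calling `.group()` directly can raise an `AttributeError`. One possible guard is sketched here; the helper name `leading_digits` is hypothetical.

```python
import re

def leading_digits(s):
    """Return the digits at the start of s, or None if s does not start with a digit."""
    match = re.match(r"\d+", s)
    return match.group() if match else None

print(leading_digits("1234abc456"))  # 1234
print(leading_digits("abc1234"))     # None
```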

#### **re.search(pattern, string)**
   - Searches the entire string for the first match of the pattern.
   - Returns a match object if found; `None` otherwise.
```python
import re

pattern = r"\d+"   # matches 1 or more digits
string = "abc1234abc456"
result = re.search(pattern, string)
print(result.group())  # Output: 1234
```
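Besides the matched text, the match object also records where the match was found. A short sketch on the same string:

```python
import re

string = "abc1234abc456"
result = re.search(r"\d+", string)

# start() and end() give the slice boundaries of the first match.
print(result.start())  # 3
print(result.end())    # 7
print(result.span())   # (3, 7)

# Slicing the original string with those indices gives the matched text back.
print(string[result.start():result.end()])  # 1234
```

Only the first run of digits is reported; the later `456` is ignored by `re.search`.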

#### **re.findall(pattern, string)**
   - Returns a list of all matches of the pattern in the string.

```python
import re

pattern = r"\d+"   # matches 1 or more digits
string = "abc1234abc456"
result = re.findall(pattern, string)
print(result)  # Output: ['1234', '456']
```
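What `re.findall` returns depends on how many capturing groups the pattern has. The sketch below uses a made-up `key=value` string to show the three cases.

```python
import re

string = "width=100, height=250"  # illustrative input

# No groups: each item is the whole match.
whole = re.findall(r"\w+=\d+", string)

# One group: each item is the text captured by that group.
values = re.findall(r"\w+=(\d+)", string)

# Two or more groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", string)

print(whole)   # ['width=100', 'height=250']
print(values)  # ['100', '250']
print(pairs)   # [('width', '100'), ('height', '250')]
```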

#### **re.finditer(pattern, string)**
   - Returns an iterator yielding match objects for all matches.

```python
import re

pattern = r"\d+"   # matches 1 or more digits
string = "abc123def456ghi789"
result = re.finditer(pattern, string)

for match in result:
    print(match.group(), match.span())

# Output:
# 123 (3, 6)
# 456 (9, 12)
# 789 (15, 18)
```


#### **re.sub(pattern, repl, string)**
   - Replaces occurrences of the pattern with a replacement string.

```python
import re

pattern = r"\d+"   # matches 1 or more digits
string = "abc1234abc456"
result = re.sub(pattern, "#", string)
print(result)  # Output: abc#abc#
```
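`re.sub` also accepts a `count` argument and a function as the replacement. Both are shown in this sketch; the helper `double` is a hypothetical name chosen for illustration.

```python
import re

string = "abc1234abc456"

# count=1 limits the replacement to the first match.
first_only = re.sub(r"\d+", "#", string, count=1)

def double(match):
    """Receive a match object and return the replacement text."""
    return str(int(match.group()) * 2)

# The function is called once per match.
doubled = re.sub(r"\d+", double, string)

print(first_only)  # abc#abc456
print(doubled)     # abc2468abc912
```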

#### **re.split(pattern, string)**
   - Splits the string by the occurrences of the pattern.

```python
import re

pattern = r"\d+"   # matches 1 or more digits
string = "abc1234def456ijk"
result = re.split(pattern, string)
print(result)  # Output: ['abc', 'def', 'ijk']
```
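Three details of `re.split` are worth knowing: `maxsplit` limits the number of splits, a capturing group keeps the delimiters in the result, and a delimiter at the edge of the string produces an empty string. A brief sketch:

```python
import re

string = "abc1234def456ijk"

# maxsplit=1 splits only at the first run of digits.
limited = re.split(r"\d+", string, maxsplit=1)

# Wrapping the pattern in parentheses keeps the delimiters in the list.
with_delims = re.split(r"(\d+)", string)

# Delimiters at the start or end leave empty strings in the result.
edges = re.split(r"\d+", "12abc34")

print(limited)      # ['abc', 'def456ijk']
print(with_delims)  # ['abc', '1234', 'def', '456', 'ijk']
print(edges)        # ['', 'abc', '']
```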


### **Common Regex Patterns**
 
- **Email Validation**: `r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"`

- **URL Validation**: `r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"`
  
- **IP Address Validation**: `r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"`
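The IP address pattern above checks only the shape of the address, so it also accepts values such as `999.1.1.1`. One way to tighten it, sketched below, is to keep the regex for finding candidates and check the 0 to 255 range in plain Python. The helper `find_valid_ipv4` and the sample text are hypothetical, and `re.compile` is used here only to name the pattern once.

```python
import re

# Same pattern as above, compiled so it can be reused.
IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def find_valid_ipv4(text):
    """Return dotted quads found in text whose four parts are all in 0..255."""
    valid = []
    for candidate in IP_PATTERN.findall(text):
        if all(0 <= int(part) <= 255 for part in candidate.split(".")):
            valid.append(candidate)
    return valid

sample = "Hosts: 192.168.1.10, 999.1.1.1 and 10.0.0.256"
print(IP_PATTERN.findall(sample))  # ['192.168.1.10', '999.1.1.1', '10.0.0.256']
print(find_valid_ipv4(sample))     # ['192.168.1.10']
```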


#### **Extracting Email from Text**

```python
import re

text = "Please contact us at support@example.com or sales@example.co.uk. or madlnaeem0@gmail.com"
email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
emails = re.findall(email_pattern, text)
print(emails)
# Output: ['support@example.com', 'sales@example.co.uk.', 'madlnaeem0@gmail.com']
# Note: the period after "co.uk" is captured too, because the last character class allows "."
```
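The second result ends with the sentence's period, because the final character class in the pattern allows `.`. A possible adjustment, sketched here with an invented sample text, is to require every dot in the domain to be followed by at least one label character. It is still a heuristic and not a full email validator.

```python
import re

text = "Write to support@example.com or sales@example.co.uk. Thanks."  # illustrative

# Pattern from the section above: the last class includes ".", so a trailing dot is kept.
loose = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

# Assumed alternative: each "." must be followed by one or more label characters.
strict = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+"

loose_result = re.findall(loose, text)
strict_result = re.findall(strict, text)

print(loose_result)   # ['support@example.com', 'sales@example.co.uk.']
print(strict_result)  # ['support@example.com', 'sales@example.co.uk']
```

The group is written as `(?:...)` so that `re.findall` keeps returning whole matches instead of the group contents.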

#### **Replacing Digits With Characters**

```python
import re

number = "+923044181428 this is my phone number."
masked = re.sub(r"\d", "*", number)
print(masked)  # Output: +************ this is my phone number.
```
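A common variation is to mask everything except the last four digits. The sketch below does this with a lookahead, `(?=...)`, a standard `re` feature that is not covered above; the phone number is made up.

```python
import re

number = "+10000001234 is a made-up number."

# \d(?=\d{4}) matches a digit only when four more digits follow it directly,
# so the last four digits of the run are left untouched.
masked = re.sub(r"\d(?=\d{4})", "*", number)

print(masked)  # +*******1234 is a made-up number.
```

This relies on the digits being contiguous; a number written with spaces or dashes would need a different pattern.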

#### **Splitting a String by Multiple Delimiters**

```python
import re

text = "apple, orange; banana|grapes"
split_pattern = r"[,;|]"
fruits = re.split(split_pattern, text)
print(fruits)  # Output: ['apple', ' orange', ' banana', 'grapes']
```
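The results keep the space that followed each delimiter. One way to avoid that, sketched here, is to let the pattern consume optional whitespace around the delimiter.

```python
import re

text = "apple, orange; banana|grapes"

# \s* on both sides absorbs any whitespace around the delimiter.
fruits = re.split(r"\s*[,;|]\s*", text)

print(fruits)  # ['apple', 'orange', 'banana', 'grapes']
```

Calling `str.strip()` on each item after splitting would work as well; the regex version just does it in one step.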

--------