# Regular Expressions Module

## 📘 Python `re` Module – Notes

### 🔹 What is `re`?

- `re` stands for **regular expression**.
- It's a built-in Python module used to **search**, **match**, **replace**, or **split** text based on patterns.

---

### ✅ How to use it

```python
import re
```

---

### 🔸 Common `re` Functions

| Function            | Description |
|---------------------|-------------|
| `re.findall()`      | Returns all matches in a list |
| `re.search()`       | Returns the first match (as a Match object) |
| `re.match()`        | Checks for a match **only at the beginning** of the string |
| `re.sub()`          | Replaces parts of the string |
| `re.split()`        | Splits string by pattern |

---

### 🔹 What is a **pattern**?

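A pattern is a sequence of characters that describes what you're looking for in a string. Some characters match themselves literally, while others (such as `\d` or `+`) have a special meaning.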

---

### 🔸 Common Regex Patterns

| Pattern   | Meaning                     | Example Match       |
|-----------|-----------------------------|----------------------|
| `\d`      | Any digit (0–9)              | `2` in `I have 2 cats` |
| `\w`      | Any word character (letters, digits, `_`) | Each character of `hello_123` |
| `\s`      | Any whitespace               | Space, tab, newline  |
| `\b`      | Word boundary                | Start/end of words   |
| `.`       | Any character **except newline** | `a`, `1`, `@` etc. |
| `+`       | One or more                  | `\d+` matches `123` |
| `*`       | Zero or more                 | `a*` matches `aaa`, `a`, or `""` |
| `[]`      | Set of characters            | `[aeiou]` matches any vowel |
| `^`       | Start of string              | `^Hello` matches if string starts with Hello |
| `$`       | End of string                | `world$` matches if string ends with world |
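A quick way to see several of these patterns in action is to run them on short strings. The sketch below is illustrative only; the sample strings and variable names are made up for this example.

```python
import re

# \d matches one digit at a time; \d+ groups consecutive digits together.
single_digits = re.findall(r'\d', "Room 101")    # ['1', '0', '1']
digit_runs = re.findall(r'\d+', "Room 101")      # ['101']

# [] matches any single character from the set.
vowels = re.findall(r'[aeiou]', "regex")         # ['e', 'e']

# ^ and $ anchor the pattern to the start and end of the string.
starts_with_hello = bool(re.search(r'^Hello', "Hello world"))  # True
ends_with_world = bool(re.search(r'world$', "Hello world"))    # True

# . matches any character except a newline, so "c\nt" is skipped.
dot_matches = re.findall(r'c.t', "cat cut c\nt")  # ['cat', 'cut']
```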

---

### 🔹 Raw String (`r''`)

- Write regex patterns using **raw strings**: `r'\d+'`
- In a raw string, Python keeps backslashes as they are instead of treating them as the start of an escape sequence (such as `\n` or `\b`)
- Without `r`, you’d have to write `'\\d+'` — which is harder to read
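The difference between raw and normal strings is easiest to see by comparing them directly. This is a small sketch with made-up sample strings; `\b` is a good test case because in a normal string it means a backspace character rather than a word boundary.

```python
import re

# In a normal string, '\n' is one character (a newline);
# in a raw string, r'\n' is two characters (a backslash and 'n').
normal_length = len('\n')   # 1
raw_length = len(r'\n')     # 2

# The raw string and the doubled-backslash string are the same pattern.
same_pattern = r'\d+' == '\\d+'   # True

# In a normal string '\b' is a backspace character, so the
# word-boundary pattern only works when written as r'\b'.
raw_result = re.findall(r'\bcat\b', "cat concat")    # ['cat']
plain_result = re.findall('\bcat\b', "cat concat")   # []
```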

---

### 🔸 Examples

```python
import re

# Find all words
re.findall(r'\b\w+\b', "Hello world!")  # ['Hello', 'world']

# Find all numbers
re.findall(r'\d+', "I have 2 cats and 10 dogs")  # ['2', '10']

# Replace digits
re.sub(r'\d+', '#', "Room 101")  # 'Room #'

# Split sentence by spaces
re.split(r'\s+', "This is a test")  # ['This', 'is', 'a', 'test']
```
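`re.split()` is most useful when the separator varies. The sketch below, using made-up sample strings, splits on either of two delimiters and also shows the `maxsplit` argument, which limits the number of splits.

```python
import re

# Split on a comma or semicolon, plus any whitespace that follows it.
parts = re.split(r'[,;]\s*', "red, green;blue,  yellow")
print(parts)  # ['red', 'green', 'blue', 'yellow']

# maxsplit limits how many splits are made; the rest stays in one piece.
first_two = re.split(r'\s+', "This is a test", maxsplit=2)
print(first_two)  # ['This', 'is', 'a test']
```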

---

### ✅ Tips

- Always use `r''` for regex strings
- Use `re.findall()` if you want all matches
- Use `re.search()` if you only need the first match
- Test your patterns at [regex101.com](https://regex101.com) (select Python flavor)

# re.findall(pattern, string)

The examples below show how `re.findall()` extracts matches for different tasks, each based on a specific pattern.

### 1. **Extract All Words in a Sentence**
   Pattern: `\b\w+\b`

```python
import re

text = "Hello world! This is a test."
words = re.findall(r'\b\w+\b', text)

print(words)  # ['Hello', 'world', 'This', 'is', 'a', 'test']
```

**Explanation**:
- `\b\w+\b` matches any word (letters/numbers/underscores), and ignores punctuation marks.
  
---

### 2. **Extract All Numbers**
   Pattern: `\d+`

```python
import re

text = "There are 3 apples, 15 bananas, and 42 oranges."
numbers = re.findall(r'\d+', text)

print(numbers)  # ['3', '15', '42']
```

**Explanation**:
- `\d+` matches **one or more digits**. It will extract all numbers in the text.
  
---

### 3. **Extract Email Addresses**
   Pattern: `\w+@\w+\.\w+`

```python
import re

text = "Contact us at support@example.com or sales@company.com."
emails = re.findall(r'\w+@\w+\.\w+', text)

print(emails)  # ['support@example.com', 'sales@company.com']
```

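**Explanation**:
- `\w+` matches one or more word characters (letters, digits, underscores).
- `@` is a literal character, and `\.` is an escaped dot that matches a literal `.`.
- This pattern extracts simple addresses like the ones above. It does not handle dots or hyphens in the name part, or domains with more than one dot, so treat it as a starting point rather than a full email validator.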
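The `\w+@\w+\.\w+` pattern cuts off addresses that contain dots before the `@` or several dots in the domain. The sketch below compares it with a more permissive pattern. The sample addresses are made up, and the permissive pattern is only one possible variant, not a complete email validator.

```python
import re

text = "Write to first.last@mail.example.co.uk or sales@company.com."

# The simple pattern cuts the first address short.
simple = re.findall(r'\w+@\w+\.\w+', text)
print(simple)  # ['last@mail.example', 'sales@company.com']

# A more permissive sketch: dots, plus signs, and hyphens before the @,
# and one or more dot-separated labels in the domain.
permissive = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text)
print(permissive)  # ['first.last@mail.example.co.uk', 'sales@company.com']
```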

---

### 4. **Extract Words Starting with a Specific Letter**
   Pattern: `\b[Aa]\w+\b` (words starting with 'A' or 'a')

```python
import re

text = "Alice went to the art gallery."
words_with_a = re.findall(r'\b[Aa]\w+\b', text)

print(words_with_a)  # ['Alice', 'art']
```

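**Explanation**:
- `[Aa]` matches either an uppercase `A` or a lowercase `a`.
- `\w+` requires at least one more word character, so this pattern extracts words of two or more letters starting with 'A' or 'a'. A standalone `a` is not matched; use `\w*` instead of `\w+` to include it.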

---

### 5. **Extract Dates (DD-MM-YYYY format)**
   Pattern: `\b\d{2}-\d{2}-\d{4}\b`

```python
import re

text = "The event will be held on 25-12-2022 and 01-01-2023."
dates = re.findall(r'\b\d{2}-\d{2}-\d{4}\b', text)

print(dates)  # ['25-12-2022', '01-01-2023']
```

**Explanation**:
- `\d{2}` matches exactly two digits.
- `-` is the literal separator between day, month, and year.
- `\d{4}` matches exactly four digits for the year.
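When a pattern contains capture groups, `re.findall()` returns the groups instead of the whole match, which is handy for splitting a date into its parts. The sketch below also shows one possible way to reject impossible dates using the standard `datetime` module; the helper name `is_real_date` is made up for this example.

```python
import re
from datetime import datetime

text = "The event will be held on 25-12-2022 and 01-01-2023."

# With capture groups, findall returns a tuple of groups for each match.
parts = re.findall(r'\b(\d{2})-(\d{2})-(\d{4})\b', text)
print(parts)  # [('25', '12', '2022'), ('01', '01', '2023')]

# The regex checks only the shape, so 99-99-2022 would also match.
# datetime.strptime raises ValueError for dates that do not exist.
def is_real_date(candidate):
    try:
        datetime.strptime(candidate, "%d-%m-%Y")
        return True
    except ValueError:
        return False

print(is_real_date("25-12-2022"))  # True
print(is_real_date("99-99-2022"))  # False
```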

---

### 6. **Extract Hashtags (e.g., #Python, #DataScience)**
   Pattern: `#\w+`

```python
import re

text = "I love #Python and #DataScience!"
hashtags = re.findall(r'#\w+', text)

print(hashtags)  # ['#Python', '#DataScience']
```

**Explanation**:
- `#` is the literal character used in hashtags.
- `\w+` matches one or more word characters after `#`.

---

### 7. **Extract All Words with a Specific Length**
   Pattern: `\b\w{5}\b` (words with exactly 5 characters)

```python
import re

text = "I have a dream of creating great code."
words_with_5 = re.findall(r'\b\w{5}\b', text)

print(words_with_5)  # ['dream', 'great']
```

**Explanation**:
- `\w{5}` matches exactly **5 word characters**, and the `\b` on each side ensures they form a whole word rather than part of a longer one.
  
---

### 8. **Extract URLs**
   Pattern: `https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+`

```python
import re

text = "Visit us at https://www.example.com or http://example.org"
urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', text)

print(urls)  # ['https://www.example.com', 'http://example.org']
```

**Explanation**:
- `https?://` matches the **http** or **https** protocol in URLs.
- `(?:...)` is a **non-capturing group**, which groups characters without capturing them as separate matches.
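Note that the character set in this pattern does not include `/`, so it matches only the scheme and host, not a path or query string. The sketch below shows the difference using a made-up sample string and a looser alternative pattern, `https?://\S+`, which is an assumption for illustration rather than a complete URL matcher.

```python
import re

text = "Docs: https://example.com/docs?page=2 and http://example.org"

# The host-only pattern stops at the first '/', so the path is dropped.
hosts = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', text)
print(hosts)  # ['https://example.com', 'http://example.org']

# A looser sketch: take everything up to the next whitespace.
full_urls = re.findall(r'https?://\S+', text)
print(full_urls)  # ['https://example.com/docs?page=2', 'http://example.org']
```

The looser pattern also picks up trailing punctuation (for example, a full stop right after the URL), so the result may need trimming.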
  
---

### 9. **Extract All Words Without Vowels**
   Pattern: `\b[^aeiouAEIOU\s\d\W]+\b`

```python
import re

text = "The shy lynx and the sly fox fly by."
words_without_vowels = re.findall(r'\b[^aeiouAEIOU\s\d\W]+\b', text)

print(words_without_vowels)  # ['shy', 'lynx', 'sly', 'fly', 'by']
```

**Explanation**:
- `[^aeiouAEIOU\s\d\W]` matches any word character that is not a vowel, a digit, or whitespace, which leaves consonants and `_`.
- The `\b` on each side forces the match to cover a whole word, so the pattern returns only words that **don’t contain any vowels** (`y` is not in the excluded set). Words such as `The` or `fox` are skipped entirely; they are not returned with their vowels removed.

---

### 10. **Extract All Words with Numbers in Them**
   Pattern: `\b\w*\d\w*\b`

```python
import re

text = "My code is v1.0, and I have 100 items."
words_with_numbers = re.findall(r'\b\w*\d\w*\b', text)

print(words_with_numbers)  # ['v1', '0', '100']
```

**Explanation**:
- `\w*\d\w*` matches words that contain **at least one digit**.
- `.` is not a word character, so `v1.0` is returned as two separate matches, `v1` and `0`.

---

### Final Summary of `re.findall()`:

- **`re.findall(pattern, string)`** returns all non-overlapping matches of the pattern in the string as a list.
- It's great for **searching through strings** and extracting parts that match a given pattern.


### words

```python
import re

text = 'Jai Sree Ram.....'
result = re.findall(r'\b\w+\b', text)
print(result)  # ['Jai', 'Sree', 'Ram']
```


### numbers

```python
import re

text = '1 apple, 2 bananas, 5 oranges, 344 grapes'
result = re.findall(r'\d+', text)
print(result)  # ['1', '2', '5', '344']
```


### finding email

```python
import re

t = 'My email is konduriakhil1998@gamil.com'
result = re.findall(r'\w+@\w+\.\w+', t)
print(result)  # ['konduriakhil1998@gamil.com']
```


### Extract Words Starting with a Specific Letter

```python
import re

t = 'Akhil is a good allmighty all powerfull Admin'
print(re.findall(r'\b[Aa]\w+\b', t))  # ['Akhil', 'allmighty', 'all', 'Admin']
```


### Printing dates

```python
import re

t = 'my dob is 21-06-1998 and by my brother is 15-07-1999'
print(re.findall(r'\b\d{2}-\d{2}-\d{4}\b', t))  # ['21-06-1998', '15-07-1999']
```


### Extract hashtags

```python
import re

t = 'I Love #apple #bananas #grapes'
print(re.findall(r'#\w+', t))  # ['#apple', '#bananas', '#grapes']
```


### Extract All Words with a Specific Length

```python
import re

t = 'This is akhil konduri from martur and good looking, honest'
print(re.findall(r'\b\w{6}\b', t))  # ['martur', 'honest']
```


### Extract All Words Without Vowels

```python
import re

t = 'this is english course, please pay attention and enjoy the class ss l ssh'
print(re.findall(r'\b[^aeiouAEIOU\s\d\W]+\b', t))  # ['ss', 'l', 'ssh']
```


# re.search(pattern, string)

In the Python `re` module, both `findall()` and `search()` look for a pattern in a string, but they return **different things**.

---

## ✅ Difference Between `re.findall()` and `re.search()`

| Feature              | `re.findall()`                                | `re.search()`                             |
|----------------------|-----------------------------------------------|-------------------------------------------|
| **What it returns**  | A **list** of all matches                     | A **Match object** (first match only)     |
| **Returns if no match** | Empty list `[]`                             | Returns `None`                            |
| **Use case**         | When you want **all matches**                 | When you just want to **check for one match** (and get info) |
| **Searches**         | The **entire string**                         | Also searches the **entire string**, but stops at the first match |

---

## 🔍 Examples

### 1. `re.findall()` – All Matches

```python
import re

text = "I have 2 apples and 5 bananas."

numbers = re.findall(r'\d+', text)
print(numbers)  # ['2', '5']
```

> `findall()` gives **all the numbers** found in the string.

---

### 2. `re.search()` – First Match Only

```python
import re

text = "I have 2 apples and 5 bananas."

match = re.search(r'\d+', text)
if match:
    print(match.group())  # 2
```

> `search()` returns a **Match object**, and `match.group()` gives you the first match.

---

### 🚨 Important Notes

- `re.search()` is useful when you want to know **if a match exists**, or want to get **start and end positions** using:
  ```python
  match.start(), match.end()
  ```
- `re.findall()` gives **just the matched strings** as a list.
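If you need positions for every match rather than only the first, `re.finditer()` combines the two behaviours: it yields one Match object per match. The sketch below reuses the sample sentence from the examples above; the variable name `positions` is made up for illustration.

```python
import re

text = "I have 2 apples and 5 bananas."

# re.finditer yields one Match object per match, from left to right.
for match in re.finditer(r'\d+', text):
    print(match.group(), match.span())
# 2 (7, 8)
# 5 (20, 21)

# Collect the matched text together with its start and end index.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]
```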

---

### ✅ Summary

| You want...                         | Use...         |
|-------------------------------------|----------------|
| All matches                         | `re.findall()` |
| Only the first match or existence   | `re.search()`  |
| Match position information          | `re.search()`  |

---



```python
import re

t = 'Candidate name is Konduri Akhil, Akhil is good and Konduri is Akhil sir name'
match = re.search(r'Akhil', t)
if match:
    print('Match found')  # Match found
else:
    print('Match not found')
```


```python
import re

t = 'Candidate name is Konduri Akhil, Akhil is good and Konduri is Akhil sir name'
match = re.search(r'Akhil', t)
print('The match is: ', match.group())                # The match is:  Akhil
print('Starting index of match is: ', match.start())  # Starting index of match is:  26
print('Ending index of the match is: ', match.end())  # Ending index of the match is:  31
print('Span of the match is: ', match.span())         # Span of the match is:  (26, 31)
```


# re.match(pattern, string)
`re.match()` attempts to match a pattern **only at the beginning** of a string. If the pattern is found at the start, it returns a Match object; otherwise, it returns `None`.

Use `re.search()` if you need to find a pattern anywhere in the string.

```python
import re

t = 'Aham brahma Asmi'
result = re.match(r'Aham', t)
if result:
    print('Match found: ', result.group())  # Match found:  Aham
else:
    print('Match not found')
```
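The difference shows up when the pattern occurs later in the string. The sketch below uses the same sample text and compares `re.match()` with `re.search()`, and adds `re.fullmatch()`, which succeeds only when the entire string matches. The variable names are made up for illustration.

```python
import re

text = 'Aham brahma Asmi'

# 'brahma' is in the string but not at the start, so match() returns None.
at_start = re.match(r'brahma', text)
print(at_start)  # None

# search() scans the whole string and finds it.
anywhere = re.search(r'brahma', text)
print(anywhere.group(), anywhere.start())  # brahma 5

# fullmatch() succeeds only if the whole string matches the pattern.
whole = re.fullmatch(r'Aham', text)
print(whole)  # None
```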


# re.sub(pattern, repl, string, count=0, flags=0)

The `re.sub()` function replaces **occurrences of a pattern** in a string with a specified replacement.

---

### 🔹 Syntax of `re.sub()`

```python
re.sub(pattern, repl, string, count=0, flags=0)
```

- **`pattern`**: The regular expression pattern to search for.
- **`repl`**: The replacement string, or a function that takes a Match object and returns the replacement string.
- **`string`**: The input string where replacements will occur.
- **`count`** *(optional)*: The maximum number of pattern occurrences to replace. Defaults to `0`, which means replace all occurrences.
- **`flags`** *(optional)*: Modify the behavior of the pattern matching (e.g., `re.IGNORECASE` for case-insensitive matching).
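The `flags` argument is easiest to understand with a case-insensitive replacement. In this sketch the sample sentence is made up; `flags` is passed by keyword so it cannot be mistaken for `count`.

```python
import re

text = "python is fun. PYTHON is popular. Python is readable."

# Without flags, only the exact lowercase spelling is replaced.
case_sensitive = re.sub(r'python', 'Python', text)
print(case_sensitive)  # Python is fun. PYTHON is popular. Python is readable.

# re.IGNORECASE makes the pattern match any capitalization.
case_insensitive = re.sub(r'python', 'Python', text, flags=re.IGNORECASE)
print(case_insensitive)  # Python is fun. Python is popular. Python is readable.
```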

---

### 🔹 Basic Examples

#### 1. **Replace All Whitespace with a Dash**

```python
import re

text = "The rain in Spain"
result = re.sub(r"\s", "-", text)
print(result)  # Output: The-rain-in-Spain
```

**Explanation**: `\s` matches any whitespace character. All whitespace characters are replaced with `-`.

---

#### 2. **Replace Only the First Two Whitespace Occurrences**

```python
import re

text = "The rain in Spain"
result = re.sub(r"\s", "-", text, count=2)
print(result)  # Output: The-rain-in Spain
```

**Explanation**: Only the first two whitespace characters are replaced with `-` due to `count=2`.

---

#### 3. **Remove All Non-Digit Characters from a Phone Number**

```python
import re

phone = "(212)-456-7890"
cleaned = re.sub(r"\D", "", phone)
print(cleaned)  # Output: 2124567890
```

**Explanation**: `\D` matches any non-digit character. All such characters are removed, leaving only digits.

---

### 🔹 Advanced Usage: Using a Function as Replacement

You can pass a function to `re.sub()` to determine the replacement string dynamically.

#### Example: Uppercase All Words

```python
import re

def to_upper(match):
    # match.group(0) is the full text of the current match
    return match.group(0).upper()

text = "hello world"
result = re.sub(r"\b\w+\b", to_upper, text)
print(result)  # Output: HELLO WORLD
```

**Explanation**: `re.sub()` calls `to_upper` once per match and uses its return value as the replacement, so each matched word is converted to uppercase.
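A replacement function is also useful when the new text has to be computed from the match. As a small sketch with a made-up sample string and a hypothetical helper named `double_number`, the code below doubles every number in a sentence.

```python
import re

def double_number(match):
    # match.group() is the matched text, e.g. '3'; convert, double, convert back.
    return str(int(match.group()) * 2)

text = "3 apples and 10 pears"
result = re.sub(r'\d+', double_number, text)
print(result)  # 6 apples and 20 pears
```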

---

### 🔹 Using Capture Groups in Replacement

You can use capture groups in your pattern and refer to them in the replacement string.

#### Example: Swap First and Last Names

```python
import re

text = "Doe, John"
result = re.sub(r"(\w+), (\w+)", r"\2 \1", text)
print(result)  # Output: John Doe
```

**Explanation**: `(\w+), (\w+)` captures two words separated by a comma. `\2 \1` rearranges them.
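Numbered references get hard to read as patterns grow. One alternative is named groups: `(?P<name>...)` names a group in the pattern and `\g<name>` refers to it in the replacement. The sketch below rewrites the same swap this way; the group names `last` and `first` are chosen for illustration.

```python
import re

text = "Doe, John"

# (?P<name>...) gives a group a name; \g<name> refers to it in the replacement.
result = re.sub(r'(?P<last>\w+), (?P<first>\w+)', r'\g<first> \g<last>', text)
print(result)  # John Doe
```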

---

### 🔹 Common Use Cases

- **Data Cleaning**: Removing unwanted characters or formatting strings.
- **Text Normalization**: Standardizing text for analysis.
- **Pattern-Based Replacements**: Modifying strings based on specific patterns.

---


### simple example

```python
import re

t = 'Konduri Akhil'
res = re.sub(r'\s', '.', t)
print(res)  # Konduri.Akhil
```
