Regular expressions (regex) are a compact language for matching patterns in text. Python's built-in `re` module provides the functions for working with them. Here's a basic guide to get you started:

### 1. Importing the `re` Module
First, you need to import the `re` module:

```python
import re
```

### 2. Basic Patterns
- **Literal Characters**: Match exact characters.
  ```python
  pattern = r"hello"
  text = "hello world"
  match = re.search(pattern, text)
  if match:
      print("Match found:", match.group())
  else:
      print("No match")
  ```

- **Special Characters**:
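  - `.` : Matches any character except a newline (any character at all when the `re.DOTALL` flag is set).
  - `^` : Matches the start of the string (and the start of each line when the `re.MULTILINE` flag is set).
  - `$` : Matches the end of the string or just before a trailing newline (and the end of each line when the `re.MULTILINE` flag is set).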
  - `*` : Matches 0 or more repetitions of the preceding element.
  - `+` : Matches 1 or more repetitions of the preceding element.
  - `?` : Matches 0 or 1 repetition of the preceding element.
  - `{n}` : Matches exactly n repetitions of the preceding element.
  - `{n,}` : Matches n or more repetitions of the preceding element.
  - `{n,m}` : Matches between n and m repetitions of the preceding element.
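
To make the quantifiers above concrete, here is a small sketch that tests a few made-up strings against each one. The sample strings and the `patterns` dictionary are illustrative assumptions, not part of the guide; each pattern is wrapped in `^` and `$` so that it has to match the whole string.

```python
import re

# Hypothetical sample strings: "c", then 0 to 3 "a" characters, then "t".
samples = ["ct", "cat", "caat", "caaat"]

# Each pattern is anchored so the quantifier must account for every "a".
patterns = {
    "a*": r"^ca*t$",        # zero or more
    "a+": r"^ca+t$",        # one or more
    "a?": r"^ca?t$",        # zero or one
    "a{2}": r"^ca{2}t$",    # exactly two
    "a{2,}": r"^ca{2,}t$",  # two or more
    "a{1,2}": r"^ca{1,2}t$",  # between one and two
}

# For every quantifier, collect the samples that match.
results = {
    label: [s for s in samples if re.search(pattern, s)]
    for label, pattern in patterns.items()
}

for label, matched in results.items():
    print(label, "->", matched)
```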

### 3. Character Classes
- `[abc]` : Matches any one of the characters a, b, or c.
- `[^abc]` : Matches any character except a, b, or c.
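- `[a-z]` : Matches any lowercase ASCII letter.
- `[A-Z]` : Matches any uppercase ASCII letter.
- `[0-9]` : Matches any ASCII digit.
- `\d` : Matches any digit. For `str` patterns this includes Unicode digits; it is equivalent to `[0-9]` only when the `re.ASCII` flag is set.
- `\D` : Matches any non-digit.
- `\w` : Matches any word character (letters, digits, and the underscore). It is equivalent to `[a-zA-Z0-9_]` only when the `re.ASCII` flag is set.
- `\W` : Matches any non-word character.
- `\s` : Matches any whitespace character (space, tab, newline, and so on).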
- `\S` : Matches any non-whitespace character.
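
The classes above can be tried out on a short string. This is a minimal sketch with an invented sample sentence; the last two lines show how `\d` behaves on a non-ASCII digit with and without `re.ASCII`.

```python
import re

# Hypothetical sample text for illustration.
text = "Order 42 shipped to unit_7 on 2021-09-30"

numbers = re.findall(r"\d+", text)            # runs of digits
words = re.findall(r"\w+", text)              # runs of word characters
no_vowels = re.sub(r"[aeiou]", "", "regex")   # remove lowercase vowels

# "\u0663" is ARABIC-INDIC DIGIT THREE, a digit outside the ASCII range.
arabic_three = "\u0663"
unicode_hit = re.findall(r"\d", arabic_three)                # Unicode-aware by default
ascii_hit = re.findall(r"\d", arabic_three, flags=re.ASCII)  # restricted to [0-9]

print(numbers)
print(words)
print(no_vowels)
print(len(unicode_hit), len(ascii_hit))
```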

### 4. Grouping and Capturing
- `( )` : Groups patterns and captures the text matched by the pattern inside the parentheses.
- `(?: )` : Groups patterns but does not capture the text.
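
As a rough illustration of the difference between the two kinds of group, the sketch below pulls the parts of a date out with capturing groups, then uses a non-capturing group purely to repeat a sub-pattern. The sample strings are made up for the example.

```python
import re

# Capturing groups: each ( ) is available via groups() or group(n).
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Filed on 2021-09-30")
date_parts = date_match.groups()
year = date_match.group(1)

# Non-capturing group: (?:ab)+ repeats "ab" but stores nothing,
# so the only captured group is the final "c".
repeat_match = re.search(r"(?:ab)+(c)", "ababc")
whole = repeat_match.group(0)
captured = repeat_match.groups()

print(date_parts, year)
print(whole, captured)
```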

### 5. Alternation
- `|` : Matches either the pattern before or the pattern after the pipe.
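
One thing worth seeing in practice is that `|` has the lowest precedence of any operator, so it splits the whole pattern unless you wrap the alternatives in a group. A small sketch with invented sample strings:

```python
import re

# Plain alternation between two literal words.
pets = re.findall(r"cat|dog", "I have a cat, a dog, and a bird.")

# Grouped: only the middle letter alternates, so whole words match.
grouped = re.findall(r"gr(?:a|e)y", "gray grey")

# Ungrouped: the pattern means "gra" OR "ey", which is rarely what you want.
ungrouped = re.findall(r"gra|ey", "gray grey")

print(pets)
print(grouped)
print(ungrouped)
```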

### 6. Escaping Special Characters
- Use a backslash `\` to escape special characters if you want to match them literally.
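
The sketch below shows what escaping changes, using made-up strings. It also uses `re.escape`, a standard-library helper that escapes every special character in a string for you, which is handy when the literal text comes from a variable.

```python
import re

# Unescaped "." matches any character, so "4x85" also matches.
loose = re.findall(r"4.85", "4x85 4.85")

# Escaped "\." matches only a literal dot.
exact = re.findall(r"4\.85", "4x85 4.85")

# re.escape builds a pattern that matches the given text literally.
literal = "(approx.)"
found = re.search(re.escape(literal), "Total: $4.85 (approx.)")

print(loose)
print(exact)
print(found.group())
```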

### 7. Common Functions
- **`re.search(pattern, string)`**: Scans the string and returns a match object for the first place the pattern matches, or `None` if there is no match.
- **`re.match(pattern, string)`**: Matches the pattern only at the beginning of the string; returns `None` otherwise.
- **`re.findall(pattern, string)`**: Returns all non-overlapping matches as a list. If the pattern contains capturing groups, the list holds the group contents instead (tuples when there is more than one group).
- **`re.finditer(pattern, string)`**: Returns an iterator of match objects, one for each non-overlapping match.
- **`re.sub(pattern, repl, string)`**: Returns a new string with every non-overlapping occurrence of the pattern replaced by `repl`.
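
A short side-by-side sketch of these functions may help; the sample sentence is reused from the example further down, and the variable names are only illustrative. Note how `re.match` fails where `re.search` succeeds, and how `re.finditer` gives access to positions that `re.findall` does not.

```python
import re

text = "The rain in Spain"

# match only looks at the start of the string; search scans all of it.
match_result = re.match(r"rain", text)   # None: "rain" is not at position 0
search_span = re.search(r"rain", text).span()
start_word = re.match(r"The", text).group()

# finditer yields match objects, so the position of each hit is available.
hits = [(m.group(), m.start()) for m in re.finditer(r"ain", text)]

# sub replaces every occurrence unless count limits it.
first_only = re.sub(r"ain", "AIN", text, count=1)

print(match_result, search_span, start_word)
print(hits)
print(first_only)
```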

### Example Usage

```python
import re

# Example text
text = "The rain in Spain falls mainly in the plain."

# Search for a pattern
pattern = r"Spain"
match = re.search(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match")

# Find all occurrences of a pattern
pattern = r"in"
matches = re.findall(pattern, text)
print("All matches:", matches)

# Replace a pattern
new_text = re.sub(r"in", "on", text)
print("Replaced text:", new_text)

# Using groups
pattern = r"(\b\w+)\s(\b\w+)"
matches = re.findall(pattern, text)
print("Grouped matches:", matches)
```

### Practice
Try creating your own patterns and test them with different strings to get comfortable with regex in Python. Regular expressions can be complex, but with practice, you'll get the hang of it!

In [7]:
# Practice starts here 🧐
import re

<h3>Extract phone numbers</h3>

In [8]:
text='''
Elon Musk's phone number is 9991116666, call him if you have any questions on Dogecoin. Tesla's revenue is 40 billion
Tesla's CFO number (999)-333-7777 , Modi's phone : (747)-375-4636
'''
# Raw string so the backslashes reach the regex engine unchanged.
pattern = r'\(\d{3}\)-\d{3}-\d{4}|\d{10}'

matches = re.findall(pattern, text)
matches

['9991116666', '(999)-333-7777', '(747)-375-4636']
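
One caveat with the `\d{10}` alternative is that it also matches the first ten digits of a longer number. A possible refinement, sketched below with an invented sample string, is to add lookarounds so that the ten digits may not be preceded or followed by another digit. This is just one way to tighten the pattern, not the notebook's own approach.

```python
import re

# Hypothetical text: two valid numbers and one 12-digit number.
sample = "Valid: 9991116666, (999)-333-7777. Too long: 123456789012"

# Original pattern: happily takes the first 10 digits of the long number.
loose = re.findall(r"\(\d{3}\)-\d{3}-\d{4}|\d{10}", sample)

# (?<!\d) and (?!\d) require that no digit touches the 10-digit run.
strict = re.findall(r"\(\d{3}\)-\d{3}-\d{4}|(?<!\d)\d{10}(?!\d)", sample)

print(loose)
print(strict)
```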

<h3>Extract Note Titles</h3>

<img src='https://github.com/codebasics/py/blob/master/Advanced/regex/tesla_report_notes.jpg?raw=1' />

In [9]:
text = '''
Note 1 - Overview
Tesla, Inc. (“Tesla”, the “Company”, “we”, “us” or “our”) was incorporated in the State of Delaware on July 1, 2003. We design, develop, manufacture and sell high-performance fully electric vehicles and design, manufacture, install and sell solar energy generation and energy storage
products. Our Chief Executive Officer, as the chief operating decision maker (“CODM”), organizes our company, manages resource allocations and measures performance among two operating and reportable segments: (i) automotive and (ii) energy generation and storage.
Beginning in the first quarter of 2021, there has been a trend in many parts of the world of increasing availability and administration of vaccines
against COVID-19, as well as an easing of restrictions on social, business, travel and government activities and functions. On the other hand, infection
rates and regulations continue to fluctuate in various regions and there are ongoing global impacts resulting from the pandemic, including challenges
and increases in costs for logistics and supply chains, such as increased port congestion, intermittent supplier delays and a shortfall of semiconductor
supply. We have also previously been affected by temporary manufacturing closures, employment and compensation adjustments and impediments to
administrative activities supporting our product deliveries and deployments.
Note 2 - Summary of Significant Accounting Policies
Unaudited Interim Financial Statements
The consolidated balance sheet as of September 30, 2021, the consolidated statements of operations, the consolidated statements of
comprehensive income, the consolidated statements of redeemable noncontrolling interests and equity for the three and nine months ended September
30, 2021 and 2020 and the consolidated statements of cash flows for the nine months ended September 30, 2021 and 2020, as well as other information
disclosed in the accompanying notes, are unaudited. The consolidated balance sheet as of December 31, 2020 was derived from the audited
consolidated financial statements as of that date. The interim consolidated financial statements and the accompanying notes should be read in
conjunction with the annual consolidated financial statements and the accompanying notes contained in our Annual Report on Form 10-K for the year
ended December 31, 2020.
'''

In [10]:
# Raw string; the group captures everything after "Note N - " up to the end of the line.
pattern = r'Note \d - ([^\n]*)'
matches = re.findall(pattern, text)
matches

['Overview', 'Summary of Significant Accounting Policies']
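
The pattern above works for this excerpt, but it only handles single-digit note numbers and would also match "Note 2 - ..." in the middle of a sentence. A hedged variant, assuming headings always sit on their own line, anchors the pattern with `^` and `$` under `re.MULTILINE` and allows multi-digit numbers. The sample text here is made up.

```python
import re

# Hypothetical excerpt: two real headings and one mention inside a sentence.
sample = """Note 1 - Overview
Some body text that mentions Note 2 - not a heading.
Note 10 - Leases
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line.
heading_re = re.compile(r"^Note (\d+) - (.+)$", flags=re.MULTILINE)
headings = heading_re.findall(sample)

print(headings)
```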

<h3>Extract financial periods from a company's financial reporting</h3>

In [11]:
text = '''
The gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e. FY2020 Q4 it was $3 billion. FY2025 Q3  FY2025 Q3
'''

pattern = r'FY\d{4} Q[1-4]'

matches = re.findall(pattern, text)
matches

['FY2021 Q1', 'FY2020 Q4', 'FY2025 Q3', 'FY2025 Q3']

<h3>Case-insensitive pattern match using flags</h3>

In [12]:
text = '''
The gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e. fy2020 Q4 it was $3 billion.
'''

pattern = r'FY\d{4} Q[1-4]'

matches = re.findall(pattern, text, flags=re.IGNORECASE)
matches

['FY2021 Q1', 'fy2020 Q4']
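
Two other ways of applying the same flag are sketched below: compiling the pattern once with `re.compile`, which is convenient when it is reused, and putting the inline flag `(?i)` at the start of the pattern. The sample string is invented, and the lowercase `q4` is there to show that the flag affects every letter in the pattern.

```python
import re

# Hypothetical text with mixed-case period labels.
sample = "FY2021 Q1 and fy2020 q4"

# Compile once with the flag, then reuse the compiled pattern.
period_re = re.compile(r"FY\d{4} Q[1-4]", flags=re.IGNORECASE)
compiled_hits = period_re.findall(sample)

# Inline flag: (?i) at the start of the pattern has the same effect.
inline_hits = re.findall(r"(?i)FY\d{4} Q[1-4]", sample)

print(compiled_hits)
print(inline_hits)
```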

<h3>Extract only financial numbers</h3>

In [13]:
text = '''
Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e. FY2020 Q4 it was $3 billion.
'''

# Raw string; "." needs no escape inside a character class.
pattern = r'\$([0-9.]+)'
matches = re.findall(pattern, text)
matches

['4.85', '3']
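
Because `[0-9.]+` accepts any mix of digits and dots, it can pick up a sentence-ending period, as in "$3." below. If the values are going to be converted to numbers, a stricter pattern is one option. This is a sketch with a made-up sentence, assuming amounts look like `123` or `123.45`.

```python
import re

# Hypothetical sentence where an amount is followed directly by a period.
sample = "It was $4.85 billion, up from $3."

# Character-class version: the trailing "." is captured along with the 3.
loose = re.findall(r"\$([0-9.]+)", sample)

# Stricter version: digits, optionally followed by a dot and more digits.
strict = re.findall(r"\$(\d+(?:\.\d+)?)", sample)
amounts = [float(value) for value in strict]

print(loose)
print(strict)
print(amounts)
```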

<h3>Extract periods and financial numbers both</h3>

In [14]:
text = '''
Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e. FY2020 Q4 it was $3 billion.
'''
# Group 1: period, then any run of non-"$" characters, then group 2: the amount.
pattern = r'FY(\d{4} Q[1-4])[^$]+\$([0-9.]+)'

matches = re.findall(pattern, text)
matches

[('2021 Q1', '4.85'), ('2020 Q4', '3')]

<h3>re.search</h3>

In [15]:
text = '''
Tesla's gross cost of operating lease vehicles in FY2021 Q1 ljh lsj a 123 was $4.85 billion. Same number for FY2020 Q4 was $8 billion
'''
pattern = r'FY(\d{4} Q[1-4])[^$]+\$([0-9.]+)'

matches = re.search(pattern, text)
matches

<re.Match object; span=(51, 84), match='FY2021 Q1 ljh lsj a 123 was $4.85'>

In [16]:
matches.groups()

('2021 Q1', '4.85')
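
`re.search` stops at the first match, so the second period in the text is never reported. One way to get every match with readable field names is to combine `re.finditer` with named groups, `(?P<name>...)`. The group names `period` and `amount` below are my own choice for illustration, and the text is a shortened version of the cell above.

```python
import re

sample = (
    "Cost in FY2021 Q1 was $4.85 billion. "
    "Same number for FY2020 Q4 was $8 billion"
)

# Same structure as the pattern above, with names instead of numbered groups.
pattern = re.compile(r"FY(?P<period>\d{4} Q[1-4])[^$]+\$(?P<amount>[0-9.]+)")

# groupdict() turns each match into a {name: value} dictionary.
records = [m.groupdict() for m in pattern.finditer(sample)]

for record in records:
    print(record["period"], record["amount"])
```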

In [17]:
#---------------------Regex-----------------------------------
import re

pattern = r"hello"
text = "hello world"
match = re.search(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match")




Match found: hello


In [18]:
# Find all occurrences of a pattern

text = "The rain in Spain falls mainly in the plain."
pattern = r"\bin\b"
matches = re.findall(pattern, text)
print("All matches:", matches)


All matches: ['in', 'in']


In [19]:
text =  "Student Vishal Student Bhushan Student Chaitanya"
pattern = r'Student (\w+)'
m = re.findall(pattern, text)
m

['Vishal', 'Bhushan', 'Chaitanya']