## Regular Expressions

Regular expressions (regexes) describe patterns of characters. They are used to search, replace, and parse text that follows such patterns.

Regexes are used for four main purposes:
- To validate whether a text meets some criteria; Ex. a zip code made of exactly 6 digits
- To search for substrings; Ex. finding words that end with abc and do not contain any digits
- To search and replace every match within a string; Ex. search for "fixed deposit" and replace it with "term deposit"
- To split a string at each place the regex matches; Ex. split everywhere an @ is encountered

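#### Raw Python strings

It is recommended that you write regex patterns as raw strings instead of regular Python strings. A raw string begins with the prefix `r` placed before the opening quote. Inside a raw string a backslash is kept as a literal character instead of starting an escape sequence, so patterns such as `\d` or `\b` reach the `re` module unchanged.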

In [None]:
print("ABC \n PQR")  # "\n" is an escape sequence, so a line break is printed

In [None]:
print(r"ABC \n PQR")  # raw string: the backslash and "n" are printed literally

In [None]:
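# Windows paths are a common use: without r, "\u" in "C:\users" is a SyntaxError
# and "\n" in "\newfolder" would become a newline. Raises FileNotFoundError if the file is missing.
open(r"C:\users\newfolder\file.txt")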
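The most common regex pitfall with normal strings is `\b`: Python turns it into a backspace character before `re` ever sees it. A minimal sketch (the sample text is made up for illustration):

```python
import re

text = "Independence day"

# In a normal string, "\b" is the backspace character (\x08), not a word boundary
plain_pattern = "\bday\b"
# In a raw string the backslash survives, so re sees the \b word-boundary token
raw_pattern = r"\bday\b"

print(len(plain_pattern), len(raw_pattern))  # 5 7

plain_result = re.search(plain_pattern, text)  # None: looks for literal backspaces
raw_result = re.search(raw_pattern, text)      # matches the whole word "day"
print(plain_result, raw_result.group())

# Doubling every backslash in a normal string gives the same pattern as the raw string
print("\\bday\\b" == raw_pattern)  # True
```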

#### Importing re module

In [None]:
import re

#### Functions in re Module
The `re` module offers functions to match, search, split, and replace strings:

- `re.match()` - Returns a match object only if the pattern matches at the beginning of the string, otherwise `None`
- `re.search()` - Returns a match object for the first location anywhere in the string where the pattern matches, otherwise `None`
- `re.findall()` - Returns a list containing all non-overlapping matches in the string (the captured groups instead, if the pattern contains groups)
- `re.split()` - Returns a list where the string has been split at each match
- `re.sub()` - Replaces one or many matches with a replacement string and returns the new string
- `re.finditer()` - Returns an iterator yielding a match object for each non-overlapping match
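A quick sketch of each function on the sample sentence used in this notebook. The patterns are illustrative choices, not the only way to solve each task:

```python
import re

text = "India celebrates its independence day on 15th August and Republic day on 26th January"

# match() only checks the beginning of the string, and the text starts with "India"
at_start = re.match(r"\d+", text)              # None
first = re.search(r"\d+", text)                # match object for '15'
numbers = re.findall(r"\d+", text)             # ['15', '26']
parts = re.split(r"\s+on\s+", text)            # split wherever " on " appears
updated = re.sub(r"\bday\b", "Day", text)      # capitalise every whole word "day"
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(at_start, first.group(), numbers)
print(parts)    # ['India celebrates its independence day', '15th August and Republic day', '26th January']
print(updated)
print(spans)    # [('15', (41, 43)), ('26', (73, 75))]
```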

In [None]:
text = "India celebrates its independence day on 15th August"
re.match(r"[A-Za-z]+\b", text)  # returns a match object; [A-z] would also match [ \ ] ^ _ ` between Z and a

In [None]:
text = "India celebrates its independence day on 15th August"
re.search(r"[0-9]+", text)  # returns a match object

The **Match object** in Python's `re` module provides several useful methods and attributes to extract information about the match, such as the matched string, its position, and the captured groups.

<table style="width: 60%; border-collapse: collapse; border: 1px solid #ccc; text-align: left; margin-left: 0;">
  <thead>
    <tr style="background-color: #050A30; color: white;">
    <th>Method / Attribute</th>
    <th>Description</th>
  </tr>
  </thead>
  <tr>
    <td><b>.group([group])</b></td>
    <td>Returns the matched string or a specific group if specified.</td>
  </tr>
  <tr>
    <td><b>.groups()</b></td>
    <td>Returns a tuple of all captured groups.</td>
  </tr>
  <tr>
    <td><b>.start([group])</b></td>
    <td>Returns the start index of the match or a group.</td>
  </tr>
  <tr>
    <td><b>.end([group])</b></td>
    <td>Returns the end index of the match or a group.</td>
  </tr>
  <tr>
    <td><b>.span([group])</b></td>
    <td>Returns a tuple <code>(start, end)</code> of the match or a group.</td>
  </tr>
  <tr>
    <td><b>.re</b></td>
    <td>Attribute: the compiled regular expression object that produced the match.</td>
  </tr>
  <tr>
    <td><b>.string</b></td>
    <td>Attribute: the original string that was searched.</td>
  </tr>
  <tr>
    <td><b>.lastindex</b></td>
    <td>Attribute: the integer index of the last matched capturing group, or <code>None</code> if no group matched.</td>
  </tr>
  <tr>
    <td><b>.lastgroup</b></td>
    <td>Attribute: the name of the last matched capturing group, or <code>None</code> if that group has no name or no group matched.</td>
  </tr>
</table>

In [11]:
# Example - 

import re

text = "India celebrates its independence day on 15th August and Republic day on 26th January"
match = re.search(r"[0-9]+", text)  # returns a match object

if match:
    print("Matched Text:", match.group())        
    print("Start Position:", match.start())     
    print("End Position:", match.end())         
    print("Span:", match.span())                
else:
    print("No match found.")

Matched Text: 15
Start Position: 41
End Position: 43
Span: (41, 43)
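The example above does not use capturing groups, so `.groups()`, `.lastindex` and `.lastgroup` have nothing to report. A sketch with two named groups (the pattern is an illustrative choice):

```python
import re

text = "India celebrates its independence day on 15th August and Republic day on 26th January"

# Two named capturing groups: the day number and the month name
match = re.search(r"(?P<day>\d+)th (?P<month>[A-Z][a-z]+)", text)

print(match.group())          # 15th August
print(match.group("day"))     # 15 (same as match.group(1))
print(match.groups())         # ('15', 'August')
print(match.span("month"))    # (46, 52)
print(match.lastindex)        # 2
print(match.lastgroup)        # month
print(match.re.pattern)       # the pattern string used for the search
print(match.string == text)   # True
```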


**Basic Characters**
- `^` - Matches the expression to its right at the start of a string. With the `re.MULTILINE` flag, it also matches at the start of each line
- `$` - Matches the expression to its left at the end of a string (or just before a final newline). With the `re.MULTILINE` flag, it also matches before each line break
- `p|q` - Matches expression p or q 
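A sketch of how `re.MULTILINE` changes `^` and `$`, plus a simple alternation (the three-line sample is made up):

```python
import re

text = "apple pie\nbanana split\ncherry tart"

first_word = re.findall(r"^\w+", text)                        # ['apple']
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)   # ['apple', 'banana', 'cherry']
last_word = re.findall(r"\w+$", text)                         # ['tart']
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)     # ['pie', 'split', 'tart']
desserts = re.findall(r"pie|tart", text)                      # ['pie', 'tart']

print(first_word, line_starts, last_word, line_ends, desserts)
```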

**Character Classes**

- `\w` - Matches word characters: a-z, A-Z, 0-9 and _ (for `str` patterns, other Unicode letters and digits as well)
- `\W` - Matches any character that is not a word character
- `\d` - Matches digits: 0-9 (for `str` patterns, other Unicode decimal digits as well)
- `\D` - Matches any non-digit
- `\s` - Matches whitespace characters, including space, \t, \n, \r, \f and \v
- `\S` - Matches non-whitespace characters
- `\t` - Matches a tab character
- `\n` - Matches a newline character
- `\A` - Matches the expression to its right at the absolute start of a string (in single- or multi-line mode)
- `\Z` - Matches the expression to its left at the absolute end of a string (in single- or multi-line mode)
- `\b` - Matches the empty string at a word boundary, i.e. at the start or end of a word
- `\B` - Matches where `\b` does not, that is, at a non-word boundary
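A sketch exercising several of these classes and anchors on one made-up string:

```python
import re

text = "Room 42 on floor_3\tis open"

digits = re.findall(r"\d", text)       # ['4', '2', '3']
words = re.findall(r"\w+", text)       # ['Room', '42', 'on', 'floor_3', 'is', 'open']
spaces = re.findall(r"\s", text)       # [' ', ' ', ' ', '\t', ' ']
tabs = re.findall(r"\t", text)         # ['\t']
o_words = re.findall(r"\bo\w*", text)  # words starting with "o": ['on', 'open']
inner_o = re.findall(r"\Bo", text)     # 4 matches: the o's inside "Room" and "floor_3"
first = re.findall(r"\A\w+", text)     # ['Room']
last = re.findall(r"\w+\Z", text)      # ['open']

print(digits, words, o_words, len(inner_o), first, last)
```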

**Groups and Sets**

- `[abc]` - Matches a single character: a, b, or c. It does not match the sequence abc
- `[a\-z]` - Matches a, -, or z. It matches - because \ escapes it
- `[^abc]` - Adding ^ at the start negates the set. Here, it matches any character that is NOT a, b or c
- `()` - Groups the expression inside the parentheses and captures the text it matched
- `[a-z]` - Matches any lowercase letter from a to z
- `[a-z0-9]` - Matches any lowercase letter from a to z or any digit from 0 to 9
- `[(+*)]` - Special characters become literal inside a set, so this matches (, +, * and )
- `(?P=name)` - Matches the same text that was captured by an earlier group named "name", defined with `(?P<name>...)`
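A sketch of sets, groups and a named backreference; all sample strings are made up for illustration:

```python
import re

in_set = re.findall(r"[abc]", "abc xyz")             # ['a', 'b', 'c']
not_in_set = re.findall(r"[^abc ]", "abc xyz")       # ['x', 'y', 'z'] (space excluded too)
escaped = re.findall(r"[a\-z]", "a-b-z")             # ['a', '-', '-', 'z']
literal = re.findall(r"[(+*)]", "(1+2)*3")           # ['(', '+', ')', '*']
pairs = re.findall(r"(\d+)-(\d+)", "10-20, 30-40")   # groups returned as tuples

# (?P<word>...) names a group; (?P=word) must match the same text again
repeat = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it is is a typo")

print(in_set, not_in_set, escaped, literal, pairs)
print(repeat.group())                                # is is
```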

**Quantifiers**

- `.` - Matches any character except newline 
- `?` - Matches the expression to its left 0 or 1 times
- `{n}` - Matches the expression to its left exactly n times
- `{,m}` - Matches the expression to its left up to m times (0 to m)
- `*` - Matches the expression to its left 0 or more times
- `+` - Matches the expression to its left 1 or more times
- `{n,m}` - Matches the expression to its left n to m times
- `{n,}` - Matches the expression to its left n or more times (no space after the comma)
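A sketch comparing the quantifiers on a few made-up strings. Note that quantifiers are greedy by default, which is why `\d{2,3}` splits 98765 into 987 and 65:

```python
import re

text = "color colour colouur"
optional = re.findall(r"colou?r", text)        # ['color', 'colour']
zero_or_more = re.findall(r"colou*r", text)    # ['color', 'colour', 'colouur']
one_or_more = re.findall(r"colou+r", text)     # ['colour', 'colouur']
exactly_two = re.findall(r"colou{2}r", text)   # ['colouur']

numbers = "7 42 123 98765"
two_to_three = re.findall(r"\d{2,3}", numbers)         # ['42', '123', '987', '65']
three_or_more = re.findall(r"\d{3,}", numbers)         # ['123', '98765']
up_to_two = re.findall(r"\bx{,2}y", "y xy xxy xxxy")   # ['y', 'xy', 'xxy']
any_char = re.findall(r"c.t", "cat cot c\nt")          # ['cat', 'cot']: "." skips the newline

print(optional, zero_or_more, one_or_more, exactly_two)
print(two_to_three, three_or_more, up_to_two, any_char)
```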

#### Examples - 

###### Ex. Extract all digits from the text

In [None]:
text = "The stock price was 456 yesterday. Today, it rose to 564"
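One possible solution (a sketch): `\d` matches a single digit, so `findall` returns every digit separately.

```python
import re

text = "The stock price was 456 yesterday. Today, it rose to 564"
digits = re.findall(r"\d", text)
print(digits)  # ['4', '5', '6', '5', '6', '4']
```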


###### Ex. Extract all numbers from the text

In [None]:
text = "The stock price was 456 yesterday. Today, it rose to 564"
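One possible solution (a sketch): adding `+` makes each match a run of consecutive digits, i.e. a whole number.

```python
import re

text = "The stock price was 456 yesterday. Today, it rose to 564"
numbers = re.findall(r"\d+", text)       # strings: ['456', '564']
as_ints = [int(n) for n in numbers]      # converted for arithmetic: [456, 564]
print(numbers, as_ints)
```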


###### Ex. Retrieve the dividend from the text

In [None]:
text = "On 25th March, the company declared 17% dividend."
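One possible solution (a sketch): anchoring the digits to the `%` sign and the word "dividend" keeps the date "25th" out of the result, and the capturing group returns just the number.

```python
import re

text = "On 25th March, the company declared 17% dividend."
match = re.search(r"(\d+)%\s*dividend", text)
dividend = match.group(1)
print(dividend)  # 17
```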


###### Ex. Retrieve all uppercase characters

In [None]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
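One possible solution (a sketch): the set `[A-Z]` matches one uppercase letter at a time.

```python
import re

text = "Stocks like AAPL GOOGL BMW are the preferred ones"
upper = re.findall(r"[A-Z]", text)
print(upper)  # ['S', 'A', 'A', 'P', 'L', 'G', 'O', 'O', 'G', 'L', 'B', 'M', 'W']
```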


###### Ex. Retrieve all stock names

In [None]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
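One possible solution (a sketch), assuming a stock name is a whole word made of two or more uppercase letters. The word boundaries stop "Stocks" from matching, because its second letter is lowercase.

```python
import re

text = "Stocks like AAPL GOOGL BMW are the preferred ones"
stocks = re.findall(r"\b[A-Z]{2,}\b", text)
print(stocks)  # ['AAPL', 'GOOGL', 'BMW']
```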


###### Ex. Retrieve the phone numbers with country code only 

In [None]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
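One possible solution (a sketch), assuming the format seen in the sample: a 2-digit country code, a hyphen, then an 8-digit number.

```python
import re

text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
with_code = re.findall(r"\b\d{2}-\d{8}\b", text)
print(with_code)  # ['65-11223344', '65-91919191']
```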


###### Ex. Retrieve the phone numbers with or without country code

In [None]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
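One possible solution (a sketch), with the same format assumption as the previous exercise. The country-code part is made optional with `?`, and `(?:...)` is a non-capturing group: with a plain `(...)` group, `findall` would return only the group contents instead of the whole numbers.

```python
import re

text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
all_numbers = re.findall(r"\b(?:\d{2}-)?\d{8}\b", text)
print(all_numbers)  # ['65-11223344', '65-91919191', '44332211']
```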


###### Ex. Retrieve the phone numbers without country code

In [None]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
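One possible solution (a sketch). The task can be read two ways, so both are shown: the negative lookbehind `(?<!-)` keeps only numbers that have no country code in front of them, while a plain `\d{8}` extracts the 8-digit part of every number, dropping any code.

```python
import re

text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"

no_code_given = re.findall(r"(?<!-)\b\d{8}\b", text)   # numbers written without a code
code_stripped = re.findall(r"\d{8}", text)             # every number, code removed
print(no_code_given)  # ['44332211']
print(code_stripped)  # ['11223344', '91919191', '44332211']
```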


###### Ex. Retrieve the zip codes that start with 2 letters

In [None]:
text = "The zipcodes are AB4567, TX23A3, 310120, NY1210, 734001 "
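One possible solution (a sketch), assuming every zip code has six characters. If the last four characters may be letters or digits, TX23A3 qualifies; if they must be digits, use `\d{4}` instead, which drops it.

```python
import re

text = "The zipcodes are AB4567, TX23A3, 310120, NY1210, 734001 "

letter_prefixed = re.findall(r"\b[A-Z]{2}[A-Z0-9]{4}\b", text)
strict = re.findall(r"\b[A-Z]{2}\d{4}\b", text)
print(letter_prefixed)  # ['AB4567', 'TX23A3', 'NY1210']
print(strict)           # ['AB4567', 'NY1210']
```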


###### Ex. Replace values as given in the dict

In [None]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
repl_dict = {"AAPL": "APPLE", "GOOGL": "GOOGLE"}
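One possible solution (a sketch): build a single alternation from the dictionary keys and pass a function to `re.sub`. `re.sub` calls the function with each match object and uses its return value as the replacement; `re.escape` protects keys that might contain special characters.

```python
import re

text = "Stocks like AAPL GOOGL BMW are the preferred ones"
repl_dict = {"AAPL": "APPLE", "GOOGL": "GOOGLE"}

# Word boundaries keep the keys from matching inside longer words
pattern = r"\b(" + "|".join(map(re.escape, repl_dict)) + r")\b"
result = re.sub(pattern, lambda m: repl_dict[m.group()], text)
print(result)  # Stocks like APPLE GOOGLE BMW are the preferred ones
```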


<hr><hr>

## Handling data from external sources

#### Introduction to OS module

In [None]:
import os
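A short sketch of commonly used `os` functions. It works inside a temporary directory so no real files are touched; the file names are hypothetical and only echo the ones used later in this notebook.

```python
import os
import tempfile

print("Current working directory:", os.getcwd())

# A throwaway directory, deleted automatically when the with block ends
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "customers.txt")  # joins with the OS-specific separator
    existed_before = os.path.exists(path)          # False: nothing created yet
    with open(path, "w") as f:
        f.write("demo\n")
    existed_after = os.path.isfile(path)           # True
    listing = os.listdir(folder)                   # ['customers.txt']

name, ext = os.path.splitext("emp_details.xlsx")   # ('emp_details', '.xlsx')
print(existed_before, existed_after, listing, name, ext)
```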


### Database Source

In [None]:
!pip install SQLAlchemy
!pip install pymysql
!pip install cx_oracle

- Syntax - `dialect+driver://username:password@host:port/database`
- MySQL - `"mysql+pymysql://root:1234@localhost:3306/onlineshopping"`
- Oracle - `"oracle+cx_oracle://s:t@dsn"`

#### Data Connection

In [None]:
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:///employee.sqlite3") # Creates a new file if not present
conn = engine.connect()

#### Select Clause

In [None]:
emp = conn.execute(text("select * from Employee")).fetchall()  # Fetch all rows as a list
emp = conn.execute(text("select * from Employee")).fetchone()  # Fetch only the first row

#### Insert - Update - Delete

In [None]:
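curr = conn.execute(text("Insert into Employee values (30, 'Jane', 90000, 'Manager', 45)"))  # Insert Query
conn.commit()  # SQLAlchemy 2.x: uncommitted changes are rolled back when the connection closes
curr.rowcount  # Number of rows inserted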

In [None]:
curr = conn.execute(text("delete from Employee where Name = 'Jane'"))  # Delete Query
conn.commit()  # Persist the deletion

In [None]:
curr = conn.execute(text("select * from Employee"))
curr.keys()  # Return the names of all columns

#### Working with database using pandas library

In [None]:
import pandas as pd
df = pd.read_sql_table("Employee", conn) # Extract by table name
df.head()

In [None]:
df = pd.read_sql_query(text("Select * from Employee where Designation = 'Manager'"), conn) # Extract by query
df

#### Data Manipulation using pandas

### HTTPS Requests

In [None]:
!pip install requests

In [None]:
import requests


### File Source 

- The key function for working with files in Python is the `open()` function.

- The `open()` function takes two main parameters: the file name and the mode.

- There are four different modes for opening a file:

    - "r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist

    - "a" - Append - Opens a file for appending; creates the file if it does not exist

    - "w" - Write - Opens a file for writing; creates the file if it does not exist and overwrites its contents if it does

    - "x" - Create - Creates the specified file; raises an error if the file already exists
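A sketch that exercises the read, write, append and exclusive-create modes on a throwaway file; the file name `notes.txt` is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")

    with open(path, "w") as f:     # "w": create the file, or overwrite it
        f.write("first\n")
    with open(path, "a") as f:     # "a": add to the end of the existing content
        f.write("second\n")
    with open(path) as f:          # "r" is the default mode
        lines = f.read().splitlines()

    try:
        open(path, "x")            # "x": exclusive creation
        x_error = None
    except FileExistsError as exc:  # raised because the file already exists
        x_error = type(exc).__name__

    try:
        open(os.path.join(folder, "missing.txt"))  # "r" on a file that does not exist
        r_error = None
    except FileNotFoundError as exc:
        r_error = type(exc).__name__

print(lines)    # ['first', 'second']
print(x_error)  # FileExistsError
print(r_error)  # FileNotFoundError
```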

###### Ex. Read file `customers.txt`

In [None]:
file = open("customers.txt")


###### Ex. Print the number of lines in the file

###### Ex. Clean data read from the file and extract information about all `Pilots`.

###### Ex. Create a list of Pilots (HINT - use filter())

###### Ex. Create a list of senior citizens from customers

###### Ex. Display the list of pilots in ascending order of their age (HINT - use sorted() with a function object as the key)
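The layout of `customers.txt` is not shown here, so this sketch assumes a hypothetical format of one `name, age, profession` record per line, with stray spaces, inconsistent capitalisation and blank lines to clean. `io.StringIO` stands in for `open("customers.txt")`, `Customer` is a hypothetical record type, and the senior-citizen threshold of 60 is an assumption.

```python
import io
from collections import namedtuple

# Hypothetical record type; adjust the fields to the real file layout
Customer = namedtuple("Customer", ["name", "age", "profession"])

# Stand-in for open("customers.txt") with an assumed "name, age, profession" layout
sample = io.StringIO(
    "Ravi, 34, Pilot\n"
    "  Meena , 67, Teacher \n"
    "\n"
    "Arjun, 45, pilot\n"
    "Sara, 29, Pilot\n"
)

lines = sample.readlines()
print("Number of lines:", len(lines))  # 5 (the blank line is counted too)

# Clean each line: strip whitespace, skip blanks, normalise the profession's case
customers = []
for line in lines:
    line = line.strip()
    if not line:
        continue
    name, age, profession = (part.strip() for part in line.split(","))
    customers.append(Customer(name, int(age), profession.title()))

pilots = list(filter(lambda c: c.profession == "Pilot", customers))
seniors = [c for c in customers if c.age >= 60]      # assumed threshold: 60 years
pilots_by_age = sorted(pilots, key=lambda c: c.age)  # ascending by age

print([c.name for c in pilots_by_age])  # ['Sara', 'Ravi', 'Arjun']
print([c.name for c in seniors])        # ['Meena']
```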

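#### Using the `with` statement to read and write data

A `with` block closes the file automatically when the block ends, even if an error is raised inside it.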

In [None]:
# `pilots` is the list of pilot customers built in the exercises above
with open("pilots_details.txt", "w") as file:
    for cust in pilots:
        file.write(f"{cust.name} - {cust.age}yrs\n")

In [None]:
df.to_excel("emp_details.xlsx", sheet_name="Employee", index=False)  # Needs an Excel engine such as openpyxl installed

<hr><hr>