# Regular Expressions

Regular expressions (regexes) describe patterns of characters and are used to search, replace, 
and parse text that follows those patterns.

Regexes are used for four main purposes:
- Validate whether a text meets some criteria. Ex. a zip code with 6 numeric digits 
- Search for substrings. Ex. find text that ends with abc and does not contain any digits 
- Search and replace every match within a string. Ex. search "fixed deposit" and replace with "term deposit" 
- Split a string at each place the regex matches. Ex. split everywhere an @ is encountered
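
The four uses listed above can be sketched with the standard `re` module as follows. This is a minimal illustration; the sample strings and variable names are made up for the example.

```python
import re

# 1. Validate: the whole string must be exactly 6 digits
is_valid = re.fullmatch(r"\d{6}", "400076") is not None

# 2. Search: keep strings that end with "abc" and contain no digits
candidates = ["xyzabc", "12abc", "abcx"]
ends_with_abc = [s for s in candidates if re.fullmatch(r"\D*abc", s)]

# 3. Search and replace every occurrence
replaced = re.sub(r"fixed deposit", "term deposit",
                  "Open a fixed deposit or renew a fixed deposit")

# 4. Split wherever an @ is found
pieces = re.split(r"@", "user@example.com")

print(is_valid, ends_with_abc, replaced, pieces)
```

`re.fullmatch()` is used for validation because it succeeds only when the entire string fits the pattern.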

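#### Raw Python string

It is recommended that you write regex patterns as raw strings instead of regular Python strings. Raw strings begin with the prefix r placed before the quotes. Python leaves backslashes in a raw string untouched, so sequences such as \n or \b reach the regex engine exactly as written.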

In [None]:
print("ABC \n PQR")

In [None]:
print(r"ABC \n PQR")

In [None]:
open(r"C:\users\newfolder\file.txt")
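
The difference matters for regex patterns because several regex tokens start with a backslash. The short sketch below (sample text made up for illustration) shows how `\b` behaves with and without the `r` prefix.

```python
import re

# In a regular string "\n" is ONE character (a newline); in a raw string it is TWO
regular_len = len("\n")
raw_len = len(r"\n")

# "\b" in a regular string is a backspace character, so the regex engine
# never sees the word-boundary token unless a raw string is used
without_raw = re.findall("\bcat\b", "cat concat cat")
with_raw = re.findall(r"\bcat\b", "cat concat cat")

print(regular_len, raw_len, without_raw, with_raw)
```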

### Importing re module

In [None]:
import re

### Functions in re Module
The "re" module offers functionalities that allow us to match/search/replace a string 

- `re.match()` - Returns a match object only if the pattern matches at the beginning of the string, otherwise `None` 
- `re.search()` - Returns a match object for the first occurrence of the pattern anywhere in the string, otherwise `None` 
- `re.findall()` - Returns a list containing all non-overlapping matches in the string 
- `re.split()` - Returns a list where the string has been split at each match 
- `re.sub()` - Replaces one or many matches with a string 
- `re.finditer()` - Returns an iterator yielding a match object for each non-overlapping match 

In [None]:
text = "Jack and Jill went up the hill"

re.match(r"Jack", text)  # returns a match object

In [None]:
text = "Jack and Jill went up the hill"

re.search(r"Jill", text)

In [None]:
text = "She sells sea shells on the sea shore"

re.findall(r"se", text)

In [None]:
text = "She sells sea shells on the sea shore"

re.split(r" ", text)

In [None]:
strg = "1, 2, 3, 4, 5"
re.split(r"[, ]+", strg)  # + treats ", " as one separator, so no empty strings appear in the result

In [None]:
text = "She sells sea shells on the sea shore"

re.sub(r"[aeiou]", "*", text)
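
`re.finditer()` is the one function in the list above without an example. A minimal sketch, reusing the same sentence, shows that each yielded item is a match object that carries the position of the match as well as its text.

```python
import re

text = "She sells sea shells on the sea shore"

# Each item yielded by finditer() is a match object, so the start index
# of every match is available in addition to the matched text
positions = [(m.group(), m.start()) for m in re.finditer(r"sea", text)]

print(positions)
```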

### Basic Characters


- `^` - Matches the expression to its right at the start of the string. With the `re.MULTILINE` flag, it also 
matches right after each line break in the string 
- `$` - Matches the expression to its left at the end of the string (or just before a trailing newline). With the 
`re.MULTILINE` flag, it also matches just before each line break in the string 
- `p|q` - Matches expression p or q 
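
The effect of the anchors, with and without `re.MULTILINE`, can be seen in a small sketch. The three-line sample text is made up for illustration.

```python
import re

text = "apple pie\nbanana split\napple tart"

# Without flags, ^ anchors only at the very start of the string
first_only = re.findall(r"^apple", text)

# With re.MULTILINE, ^ also anchors right after each line break
every_line = re.findall(r"^apple", text, flags=re.MULTILINE)

# With re.MULTILINE, $ anchors just before each line break and at the end
line_endings = re.findall(r"\w+$", text, flags=re.MULTILINE)

# | matches either alternative
either = re.findall(r"pie|tart", text)

print(first_only, every_line, line_endings, either)
```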

### Character Classes

- `\w` - Matches word characters: a-z, A-Z, 0-9 and _ (for `str` patterns, letters and digits from other scripts as well, unless the `re.ASCII` flag is set)
- `\W` - Matches non-word characters, that is, anything `\w` does not match
- `\d` - Matches digits: 0-9
- `\D` - Matches any non-digits 
- `\s` - Matches whitespace characters, which include the space, \t, \n, \r, \f and \v characters 
- `\S` - Matches non-whitespace characters 
- `\A` - Matches the expression to its right at the absolute start of a string (in single or multi-line mode) 
- `\t` - Matches tab character
- `\Z` - Matches the expression to its left at the absolute end of a string (in single or multi-line mode) 
- `\n` - Matches a newline character 
- `\b` - Matches the word boundary at the start and end of a word 
- `\B` - Matches where \b does not, that is, non-word boundary
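
A few of these classes in action, as a minimal sketch. The sample sentence is made up for illustration and contains a tab character before "to".

```python
import re

text = "Order #42 sent on Monday 2021-06-25\tto Mr. Smith"

digits = re.findall(r"\d+", text)          # runs of digits
parts = re.split(r"\s+", text)             # split on any whitespace, including the tab
anywhere = re.findall(r"on", text)         # also finds the "on" inside "Monday"
whole_word = re.findall(r"\bon\b", text)   # only "on" as a separate word

print(digits, parts, anywhere, whole_word)
```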

### Groups and Sets

- `[abc]` - Matches either a, b, or c. It does not match abc
- `[a\-z]` - Matches a, -, or z. It matches - because \ escapes it 
- `[^abc]` - Adding ^ excludes any character in the set. Here, it matches characters that are  NOT a, b or c 
- `()` - Matches the expression inside the parentheses and groups it
- `[a-z]` - Matches any lowercase letter from a to z 
- `[a-z0-9]` - Matches characters from a to z and 0 to 9 
- `[(+*)]` - Special characters become literal inside a set, so this matches ( + * and ) 
- `(?P=name)` - Matches the expression matched by an earlier group named "name"
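
The sketch below illustrates sets, capturing groups, and a named backreference. The sample strings are made up, and the group name `word` is chosen only for this example.

```python
import re

# A set matches ONE character out of those listed
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# Parentheses capture parts of the match; here day, month and year
m = re.search(r"(\d+)/(\d+)/(\d+)", "founded on 25/06/1974")
day, month, year = m.groups()

# (?P<word>...) names a group, and (?P=word) must repeat exactly what it matched
doubled = re.search(r"(?P<word>\b\w+\b) (?P=word)", "it is is fixed")

print(vowels, non_vowels, (day, month, year), doubled.group("word"))
```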

### Quantifiers

- `.` - Matches any character except newline 
- `?` - Matches the expression to its left 0 or 1 times 
- `{n}` - Matches the expression to its left exactly n times 
- `{,m}` - Matches the expression to its left up to m times
- `*` - Matches the expression to its left 0 or more times 
- `+` - Matches the expression to its left 1 or more times 
- `{n,m}` - Matches the expression to its left n to m times 
- `{n,}` - Matches the expression to its left n or more times 
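
To compare the quantifiers side by side, the sketch below applies each one to the letter b in a made-up string of words. The `\b` anchors force each pattern to cover a whole word.

```python
import re

text = "a ab abb abbb abbbb"

zero_or_one = re.findall(r"\bab?\b", text)
zero_or_more = re.findall(r"\bab*\b", text)
one_or_more = re.findall(r"\bab+\b", text)
exactly_two = re.findall(r"\bab{2}\b", text)
two_to_three = re.findall(r"\bab{2,3}\b", text)
at_least_three = re.findall(r"\bab{3,}\b", text)
at_most_one = re.findall(r"\bab{,1}\b", text)

print(zero_or_one, exactly_two, two_to_three, at_least_three)
```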

### Examples - 

###### Ex. Extract all digits from the text

In [None]:
text = "The stock price was 456 yesterday. Today, it rose to 564"
re.findall(r"\d", text)

###### Ex. Extract all numbers from the text

In [None]:
text = "The stock price was 456 yesterday. Today, it rose to 564"
re.findall(r"\d+", text)

###### Ex. Retrieve the dividend from the text

In [None]:
text = "On 25th March, the company declared 17% dividend."
re.findall(r"\d+%", text)

###### Ex. Retrieve all uppercase characters

In [None]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
re.findall(r"[A-Z]", text)

###### Ex. Retrieve all stock names

In [None]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
re.findall(r"[A-Z]+\b", text)

###### Ex. Retrieve the phone numbers with country code only 

In [None]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
re.findall(r"\d+-\d+", text)

###### Ex. Retrieve the phone numbers with or without country code

In [None]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
re.findall(r"\d+-\d+|\d+", text)

###### Ex. Retrieve the phone numbers without country code

In [None]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
re.findall(r"\d{3,}", text)

###### Ex. Retrieve the zip codes with 2 alphabets in the beginning 

In [None]:
text = "The zipcodes are AB4567, TX23A3, 310120, NY1210, 734001 "
re.findall(r"[A-Z]{2}\w+", text)

In [None]:
text = "The zipcodes are AB4567, TX23A3, 310120, NY1210, 734001 "
re.findall(r"[A-Z]{2}\d+", text)

###### Ex. Retrieve the dates

In [None]:
text = "Temasek Holdings was founded on 25/06/1974. It turns 47 on 25/6/2021" 
re.findall(r"\d+/\d+/\d+", text)

###### Ex. Retrieve the email IDs 

In [None]:
text = "Email us at contact@gobledy.com or info@info.net or tryuspython.az "
re.findall(r"\w+@\w+\.\w+", text)  # \. matches a literal dot; an unescaped . would match any character

###### Ex. Replace values as given in the dict

In [None]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
repl_dict = {"AAPL": "APPLE", "GOOGL": "GOOGLE"}
func = lambda match_obj : repl_dict.get(match_obj.group(), match_obj.group())
re.sub(r"[A-Z]+\b", func, text)

In [None]:
help(re.sub)

In [None]:
# Demo of a match object and of the lookup done inside the lambda (not related to re.sub())
# Extract the sub-string matching with the re pattern from the match_obj
match_obj = re.search(r"[A-Z]+\b", text)  # using search to get the sample of match obj
repl_dict.get(match_obj.group())

<hr><hr>

# Handling data from external sources

### Introduction to OS module

In [21]:
import os
os.getcwd()  # Returns path of current working directory

'C:\\Users\\vaide\\OneDrive - knowledgecorner.in\\Course Material\\Clients\\Oracle\\Oracle_Mar_25\\Oracle_10_Mar_25\\Classwork'

In [22]:
os.chdir(r"C:\Users\vaide\OneDrive - knowledgecorner.in\Course Material\Clients\Oracle\Oracle_Mar_25\Oracle_10_Mar_25\Classwork\dataset")

In [23]:
os.getcwd()  

'C:\\Users\\vaide\\OneDrive - knowledgecorner.in\\Course Material\\Clients\\Oracle\\Oracle_Mar_25\\Oracle_10_Mar_25\\Classwork\\dataset'

In [None]:
os.system("")

## File Source 

- The key function for working with files in Python is the `open()` function.

- The `open()` function takes two main parameters: filename and mode.

- There are four different methods (modes) for opening a file:

    - "r" - Read - Default value. Opens a file for reading, error if the file does not exist

    - "a" - Append - Opens a file for appending, creates the file if it does not exist

    - "w" - Write - Opens a file for writing, creates the file if it does not exist and empties it if it does

    - "x" - Create - Creates the specified file, error if the file already exists
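
The modes can be tried out safely in a temporary directory. The sketch below uses a hypothetical file name, `notes.txt`, and also shows the exclusive-creation mode "x", which refuses to overwrite an existing file.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")   # hypothetical file name

    with open(path, "w") as f:      # "w" creates the file (or empties an existing one)
        f.write("first line\n")

    with open(path, "a") as f:      # "a" keeps existing content and adds to the end
        f.write("second line\n")

    with open(path, "r") as f:      # "r" reads; raises FileNotFoundError if the file is missing
        lines = f.readlines()

    try:
        open(path, "x")             # "x" raises FileExistsError because the file exists
        created = True
    except FileExistsError:
        created = False

print(lines, created)
```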

###### Ex. Read file `customers.txt`

In [26]:
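file = open("customers.txt")
data = file.readlines()  # list with one string per line of the file
file.close()  # release the file handle once the data is read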

###### Ex. Print numbers of lines in the file

In [27]:
len(data)

9999

###### Ex. Clean data read from the file and extract information about all `Pilots`.

In [70]:
class Customer :
    def __init__(self, c_id, fname, lname, age, prof):
        self.c_id = c_id
        self.name = fname + " " + lname
        self.age = int(age)
        self.profession = prof

    def __str__(self):
        return self.name
        
    def __repr__(self) :
        return f"{self.name} | {self.age}"

    def __lt__(self, obj) :
        return self.age < obj.age
    
cust_lst = data[0].strip().split(",")
cust = Customer(*cust_lst)
cust  # uses repr 

Kristina Chung | 55

In [71]:
print(cust)  # print() uses __str__ if it is defined, otherwise it falls back to __repr__

Kristina Chung


In [72]:
# convert all the rows to a list of customers
def clean_data(strg) :
    cust_lst = strg.strip().split(",")
    cust = Customer(*cust_lst)
    return cust

customers = list(map(clean_data, data))
customers[0:3]

[Kristina Chung | 55, Paige Chen | 74, Sherri Melton | 34]

In [85]:
customers = [clean_data(i) for i in data]

In [None]:
[Customer(*strg.strip().split(",")) for strg in data]

###### Ex. Create a list of Pilots (HINT - use filter())

In [73]:
pilots = list(filter(lambda cust : cust.profession == "Pilot", customers))
len(pilots)

209

###### Ex. Create a list of senior citizens from customers

In [74]:
seniors = list(filter(lambda cust : cust.age > 60, customers))
len(seniors)

2840

###### Ex. Display the list of pilots in ascending order of their name (HINT- use sorted() with key as func object)

In [75]:
sorted(pilots, key = lambda cust : cust.name)

[Alan O'Neal | 59,
 Alexander Britt | 52,
 Alice Nance | 59,
 Alice Norton | 38,
 Allan Nguyen | 51,
 Amy Pappas | 42,
 Anna Gunter | 30,
 Anna Whitfield | 59,
 Anne Price | 32,
 Annie Buck | 55,
 Anthony Perkins | 63,
 Arlene Blanton | 39,
 Arthur Coble | 66,
 Ashley McGee | 61,
 Ben Patton | 40,
 Benjamin Hensley | 37,
 Betty Norman | 49,
 Bobby Hines | 26,
 Brad Sanford | 55,
 Brian Durham | 58,
 Brian Reid | 49,
 Calvin Peele | 53,
 Calvin Schultz | 46,
 Cameron Allred | 45,
 Cameron Khan | 69,
 Carlos Block | 27,
 Carole Curtis | 29,
 Charles Eason | 43,
 Charlie Becker | 50,
 Charlotte Ray | 34,
 Christina Heath | 59,
 Christine Barrett | 22,
 Claire Meyers | 51,
 Colleen Griffith | 31,
 Connie Pappas | 39,
 Craig Simon | 35,
 Danny Bowers | 72,
 David English | 30,
 David Strauss | 29,
 Dean Lutz | 26,
 Deborah Britt | 64,
 Debra Stephenson | 32,
 Diana Hawley | 68,
 Don Rose | 56,
 Dorothy Stone | 52,
 Douglas Buckley | 43,
 Douglas Weeks | 59,
 Dwight Dickens | 61,
 Dwight Jai

In [76]:
customers[0] > customers[1]

False
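
The comparison with `>` works even though `Customer` defines only `__lt__`: when the left operand has no `__gt__`, Python tries the reflected operation, the right operand's `__lt__`. A minimal sketch with a hypothetical `Person` class (a stand-in for `Customer`, keeping only name and age):

```python
class Person:
    """Hypothetical stand-in for the Customer class, keeping only name and age."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __lt__(self, other):
        # Defines what "self < other" means: compare by age
        return self.age < other.age


people = [Person("A", 55), Person("B", 74), Person("C", 34)]

# people[0] > people[1] falls back to people[1] < people[0]
is_greater = people[0] > people[1]

# sorted() and min() rely on < only, so __lt__ is enough for them
by_age = [p.name for p in sorted(people)]
youngest = min(people).name

print(is_greater, by_age, youngest)
```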

In [78]:
customers[0], customers[1]

(Kristina Chung | 55, Paige Chen | 74)

In [79]:
sorted(pilots)

[Paula Jiang | 21,
 Gordon House | 21,
 Tracy Gillespie | 22,
 Christine Barrett | 22,
 Kevin Snow | 23,
 Wayne Marsh | 23,
 Jonathan Pearson | 23,
 Marsha Cash | 24,
 Hannah Langston | 25,
 Patrick Padgett | 25,
 Vincent Jernigan | 25,
 Lorraine Fischer | 25,
 Dean Lutz | 26,
 Bobby Hines | 26,
 Vernon Gibbs | 26,
 Kay Fletcher | 26,
 Julia Dillon | 26,
 Hilda Sherrill | 26,
 Ruth Bass | 27,
 Louis Sharp | 27,
 Emma Kearney | 27,
 Elisabeth Feldman | 27,
 Carlos Block | 27,
 Kathy Burch | 28,
 Edwin Aldridge | 28,
 Timothy Welch | 28,
 Michelle Moser | 28,
 Ronald Willis | 28,
 Veronica Briggs | 29,
 Randall Copeland | 29,
 David Strauss | 29,
 Carole Curtis | 29,
 Kerry Jennings | 29,
 Robyn Cox | 30,
 Joanna Block | 30,
 Kelly Xu | 30,
 Todd Horn | 30,
 Anna Gunter | 30,
 Heidi Chung | 30,
 Harry Adler | 30,
 David English | 30,
 Vicki Jackson | 30,
 Eddie Goodwin | 30,
 Samantha Berry | 31,
 Colleen Griffith | 31,
 Suzanne Strauss | 31,
 Anne Price | 32,
 Irene Langley | 32,
 Jeff 

#### Using `with` keyword to read data and write data

In [82]:
with open("pilots.txt", "w") as file :
    for cust in pilots :
        file.write(str(cust)+"\n")

In [84]:
with open("pilots_details.txt", "w") as file :
    for cust in pilots :
        file.write(f"{cust.name} - {cust.age}yrs \n")
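
The same `with` pattern works for reading. The sketch below writes two names to a file in a temporary directory and reads them back; the file name `sample.txt` is hypothetical. It also checks that the file is closed automatically when the block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")   # hypothetical file name

    with open(path, "w") as file:
        file.write("Kristina Chung\nPaige Chen\n")

    # The file object is closed as soon as the with block ends
    closed_after_write = file.closed

    # Iterating over the file object yields one line at a time
    with open(path) as file:
        names = [line.strip() for line in file]

print(closed_after_write, names)
```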

<hr><hr>

## Database Source

In [None]:
!pip install SQLAlchemy
!pip install pymysql
!pip install cx_oracle

- Syntax - dialect+driver://username:password@host:port/database
            
- Mysql - "mysql+pymysql://root:1234@localhost:3306/onlineshopping"
- Oracle - "oracle+cx_oracle://s:t@dsn"
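
To see how the parts of the syntax line up, the MySQL URL above can be taken apart with the standard library's `urllib.parse`. This is only an illustration of the URL layout, not how SQLAlchemy itself parses the string.

```python
from urllib.parse import urlsplit

url = "mysql+pymysql://root:1234@localhost:3306/onlineshopping"
parts = urlsplit(url)

# dialect+driver://username:password@host:port/database
dialect, driver = parts.scheme.split("+")
username = parts.username
password = parts.password
host = parts.hostname
port = parts.port
database = parts.path.lstrip("/")

print(dialect, driver, username, password, host, port, database)
```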

#### Data Connection

In [None]:
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:///employee.sqlite3") # Creates a new file if not present
conn = engine.connect()

#### Select Clause

In [None]:
emp = conn.execute(text("select * from Employee")).fetchall() # Extract all the data
emp = conn.execute(text("select * from Employee")).fetchone() # Extract data for 1st emp

#### Insert - Update - Delete

In [None]:
curr = conn.execute(text("Insert into Employee values (30, 'Jane', 90000, 'Manager', 45)")) # Insert Query
# With SQLAlchemy 2.x, the change is persisted only after conn.commit() is called
curr

In [None]:
curr = conn.execute(text("delete from Employee where Name = 'Jane'"))  # Delete Query

In [None]:
curr = conn.execute(text("select * from Employee"))
curr.keys()  # Return the names of all columns
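
The cells above cover insert and delete but not update. The sketch below shows an update query. To stay self-contained it uses the standard library's `sqlite3` module with an in-memory database instead of SQLAlchemy, and the `Employee` schema (including the `Id` column name) is an assumption made for the example.

```python
import sqlite3

# In-memory database with a made-up Employee table (assumed schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, Designation TEXT, Age INTEGER)"
)
conn.execute("INSERT INTO Employee VALUES (30, 'Jane', 90000, 'Manager', 45)")

# Update query with bound parameters instead of values pasted into the SQL text
cur = conn.execute(
    "UPDATE Employee SET Salary = ? WHERE Name = ?",
    (95000, "Jane"),
)
rows_changed = cur.rowcount
conn.commit()  # make the change permanent

salary = conn.execute("SELECT Salary FROM Employee WHERE Name = 'Jane'").fetchone()[0]
conn.close()

print(rows_changed, salary)
```

Bound parameters (`?`) keep user-supplied values out of the SQL string, which avoids quoting mistakes and SQL injection.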

#### Working with database using pandas library

In [None]:
import pandas as pd
df = pd.read_sql_table("Employee", conn) # Extract by table name
df.head()

In [None]:
df = pd.read_sql_query(text("Select * from Employee where Designation = 'Manager'"), conn) # Extract by query
df

In [None]:
df = pd.read_sql_table("Employee", conn)
df.drop(columns=["index"], inplace=True)
df.loc[30] = ["Jane", 80000, "Manager", 40]
df.to_sql("Employee", conn, if_exists="replace", index=False)

In [16]:
df = pd.read_sql_query(text("Select * from Employee where Designation = 'Manager'"), conn) # Extract by query
df

| | Name | Salary | Designation | Age |
|---|---|---|---|---|
| 0 | Claire | 88962 | Manager | 35 |
| 1 | Sean | 117501 | Manager | 36 |
| 2 | Sandra | 115116 | Manager | 41 |
| 3 | Tracy | 109132 | Manager | 34 |
| 4 | Matt | 83327 | Manager | 43 |


In [17]:
df = pd.read_sql_table("Employee", conn)
df.loc[df.Name == "Vaidehi", "Designation"] = "Senior Manager"

In [None]:
df.to_sql("Employee", conn, if_exists="replace", index=False)

In [None]:
df = pd.read_sql_table("Employee", conn)
df

In [None]:
pd.read_excel("filename.xlsx")

In [19]:
df.to_excel("emp_details.xlsx", sheet_name="Employee", index=False)

<hr><hr>

## HTTP Requests

In [None]:
!pip install requests

In [100]:
import requests
response = requests.get(r"http://127.0.0.1:5000/tasks")

In [101]:
response.json()  # dict object

{'TaskNo': [2, 3, 4],
 'Task': ['Meeting at 3', 'Python Session ', 'Meeting at 5pm'],
 'Created_date': ['2023-09-15 13:49:46.580811',
  '2024-11-28 14:45:35.368632',
  '2025-03-13 14:40:36.208116'],
 'Due_date': ['2023-09-17 00:00:00',
  '2024-11-28 00:00:00',
  '2025-03-13 00:00:00'],
 'Status': ['In-Progress', 'In-Progress', 'Complete']}

In [92]:
df = pd.DataFrame(response.json())
df

| | TaskNo | Task | Created_date | Due_date | Status |
|---|---|---|---|---|---|
| 0 | 2 | Meeting at 3 | 2023-09-15 13:49:46.580811 | 2023-09-17 00:00:00 | In-Progress |
| 1 | 3 | Python Session | 2024-11-28 14:45:35.368632 | 2024-11-28 00:00:00 | In-Progress |
| 2 | 4 | Meeting at 5pm | 2025-03-13 14:40:36.208116 | 2025-03-13 00:00:00 | Complete |


In [125]:
import requests
response = requests.get(r"https://raw.github.com/knowledge-corner/Oracle_10_Mar_25/main/Classwork/customers.txt")

In [126]:
response

<Response [200]>

In [None]:
response.text

In [None]:
data = response.text.split("\n")
data
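
Splitting on "\n" can leave an empty string at the end of the list and, if the text uses Windows line endings, a stray "\r" on every line. `str.splitlines()` handles both cases. A small sketch on a made-up string standing in for `response.text`:

```python
# Made-up sample standing in for response.text, with Windows line endings
sample_text = "a,b\r\nc,d\r\n"

with_split = sample_text.split("\n")    # keeps "\r" and a trailing empty string
with_splitlines = sample_text.splitlines()

print(with_split)
print(with_splitlines)
```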

#### Revision - 

1. Decorators
2. Rest API