# Regular Expressions

Regular expressions (regexes) describe text patterns. They are used to search, replace and parse text whose structure is too complex for plain string methods.

Regexes are used for four main purposes:
- Validating whether a text meets some criteria, e.g. a zip code with exactly 6 digits
- Searching for substrings, e.g. finding text that ends with abc and does not contain any digits
- Searching and replacing every match within a string, e.g. replacing "fixed deposit" with "term deposit"
- Splitting a string at each place the regex matches, e.g. splitting at every @
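
The four purposes above can be sketched in a few lines. This is only an illustration: the sample strings are made up, and `re.fullmatch()` (which requires the whole string to match) is a standard `re` function that this notebook does not otherwise cover.

```python
import re

# 1. Validate: a zip code made of exactly 6 digits
is_valid_zip = re.fullmatch(r"\d{6}", "310120") is not None

# 2. Search: text that ends with "abc" and contains no digits
ends_with_abc = re.fullmatch(r"\D*abc", "xyzabc") is not None

# 3. Search and replace every occurrence
replaced = re.sub(r"fixed deposit", "term deposit",
                  "A fixed deposit beats another fixed deposit")

# 4. Split at every @
parts = re.split(r"@", "user@example.com")

print(is_valid_zip, ends_with_abc)
print(replaced)
print(parts)
```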

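#### Raw Python string

It is recommended that you write regex patterns as raw strings instead of regular Python strings. Raw strings begin with the prefix r, placed before the opening quote. In a raw string a backslash is kept as a literal character, so sequences such as \n or \b reach the regex engine unchanged instead of being interpreted by Python first.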

In [1]:
print("ABC \n PQR")

ABC 
 PQR


In [2]:
print(r"ABC \n PQR")

ABC \n PQR


In [None]:
open(r"C:\users\newfolder\file.txt")
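
The difference matters for regexes because many regex tokens start with a backslash. A small sketch, using a made-up sample sentence, of how `\b` behaves with and without the r prefix:

```python
import re

plain = "\bcat\b"   # Python converts each \b into a backspace character
raw = r"\bcat\b"    # backslashes are kept, so the regex engine sees \b

print(len(plain), len(raw))             # 5 7
plain_result = re.findall(plain, "a cat sat")
raw_result = re.findall(raw, "a cat sat")
print(plain_result)                     # []
print(raw_result)                       # ['cat']
```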

### Importing re module

In [3]:
import re

### Functions in re Module
The `re` module provides functions to match, search, split and replace text:

- `re.match()` - Returns a match only if the pattern matches at the beginning of the string, otherwise `None`
- `re.search()` - Returns the first match found anywhere in the string, otherwise `None`
- `re.findall()` - Returns a list containing all non-overlapping matches in the string
- `re.split()` - Returns a list where the string has been split at each match
- `re.sub()` - Replaces one or many matches with a string
- `re.finditer()` - Returns an iterator yielding a Match object for each non-overlapping match

In [5]:
text = "Jack and Jill went up the hill"

re.match(r"Jack", text)  # returns a match object

<re.Match object; span=(0, 4), match='Jack'>

In [6]:
text = "Jack and Jill went up the hill"

re.search(r"Jill", text)

<re.Match object; span=(9, 13), match='Jill'>

In [8]:
text = "She sells sea shells on the sea shore"

re.findall(r"se", text)

['se', 'se', 'se']

In [9]:
text = "She sells sea shells on the sea shore"

re.split(r" ", text)

['She', 'sells', 'sea', 'shells', 'on', 'the', 'sea', 'shore']

In [15]:
strg = "1, 2, 3, 4, 5"
re.split(r"[, ]", strg)

['1', '', '2', '', '3', '', '4', '', '5']

In [16]:
text = "She sells sea shells on the sea shore"

re.sub(r"[aeiou]", "*", text)

'Sh* s*lls s** sh*lls *n th* s** sh*r*'
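
`re.finditer()` is the one function from the list that has no cell above. A minimal sketch, reusing the same sentence, of how it yields Match objects that carry both the matched text and its position:

```python
import re

text = "She sells sea shells on the sea shore"

# Each item produced by finditer is a Match object
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"sea", text)]
print(spans)  # [('sea', 10, 13), ('sea', 28, 31)]
```

Compared with `re.findall()`, this is useful when the location of each match is needed, not just the text.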

### Basic Characters


- `^` - Matches the expression to its right at the start of the string. In multi-line mode (`re.MULTILINE`), it also matches immediately after each line break
- `$` - Matches the expression to its left at the end of the string (or just before a trailing newline). In multi-line mode, it also matches just before each line break
- `p|q` - Matches expression p or q 
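
A short sketch of these three tokens on a made-up three-line string, showing how `re.MULTILINE` changes what `^` and `$` match:

```python
import re

text = "apple pie\nbanana split\napple tart"

start_default = re.findall(r"^apple", text)                  # ['apple']
start_multiline = re.findall(r"^apple", text, re.MULTILINE)  # ['apple', 'apple']
end_default = re.findall(r"pie$", text)                      # []
end_multiline = re.findall(r"pie$", text, re.MULTILINE)      # ['pie']
either = re.findall(r"pie|tart", text)                       # ['pie', 'tart']

print(start_default, start_multiline, end_default, end_multiline, either)
```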

### Character Classes

- `\w` - Matches word characters: a-z, A-Z, 0-9 and _ (for `str` patterns also other Unicode letters and digits, unless `re.ASCII` is set)
- `\W` - Matches any character that is not a word character, i.e. anything `\w` does not match
- `\d` - Matches digits: 0-9
- `\D` - Matches any non-digit character
- `\s` - Matches whitespace characters, including \t, \n, \r and the space character
- `\S` - Matches non-whitespace characters 
- `\A` - Matches the expression to its right at the absolute start of a string (in single or multi-line mode) 
- `\t` - Matches tab character
- `\Z` - Matches the expression to its left at the absolute end of a string (in single or multi-line mode) 
- `\n` - Matches a newline character 
- `\b` - Matches the word boundary at the start and end of a word 
- `\B` - Matches where \b does not, that is, non-word boundary
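
A few of these classes applied to one made-up string (it contains a tab between "shipped" and "on"), as a quick sketch rather than an exhaustive tour:

```python
import re

text = "Order 66 shipped\ton 2021-06-25"

digits = re.findall(r"\d+", text)         # ['66', '2021', '06', '25']
words = re.findall(r"\w+", text)          # ['Order', '66', 'shipped', 'on', '2021', '06', '25']
fields = re.split(r"\s+", text)           # ['Order', '66', 'shipped', 'on', '2021-06-25']
whole_word = re.findall(r"\bon\b", text)  # ['on']
word_ending = re.findall(r"\Bed\b", text) # ['ed'], the end of "shipped"
at_start = re.findall(r"\AOrder", text)   # ['Order']

print(digits, words, fields, whole_word, word_ending, at_start)
```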

### Groups and Sets

- `[abc]` - Matches a single character: either a, b, or c. It does not match the sequence abc
- `[a\-z]` - Matches a, -, or z. It matches - because \ escapes it 
- `[^abc]` - A leading ^ negates the set. Here, it matches any character that is NOT a, b or c 
- `()` - Matches the expression inside the parentheses and groups it
- `[a-z]` - Matches any lowercase letter from a to z 
- `[a-z0-9]` - Matches characters from a to z and 0 to 9 
- `[(+*)]` - Special characters become literal inside a set, so this matches ( + * and ) 
- `(?P=name)` - Matches the text matched by an earlier group named "name"
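
The sketch below tries sets, numbered groups and a named backreference on made-up inputs. The group name `word` is chosen only for illustration; `(?P<word>...)` is the standard syntax for defining the named group that `(?P=word)` refers back to.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")      # ['e', 'e']
non_digits = re.findall(r"[^0-9]", "a1b2")    # ['a', 'b']
literal = re.findall(r"[(+*)]", "(1+2)*3")    # ['(', '+', ')', '*']

# Parentheses capture parts of the match as numbered groups
m = re.search(r"(\d+)-(\d+)", "My number is 65-11223344")
country, number = m.group(1), m.group(2)      # '65', '11223344'

# (?P=word) must repeat exactly what the group named "word" matched
dup = re.search(r"(?P<word>\b\w+)\s+(?P=word)\b", "it is is fine")
repeated = dup.group("word")                  # 'is'

print(vowels, non_digits, literal, country, number, repeated)
```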

### Quantifiers

- `.` - Matches any character except newline 
- `?` - Matches the expression to its left 0 or 1 times 
- `{n}` - Matches the expression to its left exactly n times 
- `{,m}` - Matches the expression to its left up to m times
- `*` - Matches the expression to its left 0 or more times 
- `+` - Matches the expression to its left 1 or more times 
- `{n,m}` - Matches the expression to its left n to m times 
- `{n,}` - Matches the expression to its left n or more times 
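
Each quantifier from the list, tried on a small made-up input. The `\b` anchors in two of the patterns are there so that longer digit runs are not partially matched.

```python
import re

any_char = re.findall(r"a.c", "abc a-c a\nc")             # ['abc', 'a-c']
optional = re.findall(r"colou?r", "color colour")         # ['color', 'colour']
zero_or_more = re.findall(r"ab*", "a ab abbb")            # ['a', 'ab', 'abbb']
one_or_more = re.findall(r"ab+", "a ab abbb")             # ['ab', 'abbb']
exactly = re.findall(r"\b\d{4}\b", "in 2021, not 19999")  # ['2021']
ranged = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")      # ['22', '333']
at_least = re.findall(r"\d{3,}", "12 123 12345")          # ['123', '12345']
too_long = re.fullmatch(r"\d{,2}", "123") is None         # True: more than 2 digits

print(any_char, optional, zero_or_more, one_or_more)
print(exactly, ranged, at_least, too_long)
```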

### Examples - 

###### Ex. Extract all digits from the text

In [17]:
text = "The stock price was 456 yesterday. Today, it rose to 564"
re.findall(r"\d", text)

['4', '5', '6', '5', '6', '4']

###### Ex. Extract all numbers from the text

In [18]:
text = "The stock price was 456 yesterday. Today, it rose to 564"
re.findall(r"\d+", text)

['456', '564']

###### Ex. Retrieve the dividend from the text

In [19]:
text = "On 25th March, the company declared 17% dividend."
re.findall(r"\d+%", text)

['17%']

###### Ex. Retrieve all uppercase characters

In [21]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
re.findall(r"[A-Z]", text)

['S', 'A', 'A', 'P', 'L', 'G', 'O', 'O', 'G', 'L', 'B', 'M', 'W']

###### Ex. Retrieve all stock names

In [23]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
re.findall(r"[A-Z]+\b", text)

['AAPL', 'GOOGL', 'BMW']

###### Ex. Retrieve the phone numbers with country code only 

In [27]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
re.findall(r"\d+-\d+", text)

['65-11223344', '65-91919191']

###### Ex. Retrieve the phone numbers with or without country code

In [29]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
re.findall(r"\d+-\d+|\d+", text)

['65-11223344', '65-91919191', '44332211']

###### Ex. Retrieve the phone numbers without country code

In [34]:
text = "My number is 65-11223344 and 65-91919191. My other number is 44332211"
re.findall(r"\d{3,}", text)

['11223344', '91919191', '44332211']

###### Ex. Retrieve the zip codes with two letters at the beginning 

In [37]:
text = "The zipcodes are AB4567, TX23A3, 310120, NY1210, 734001 "
re.findall(r"[A-Z]{2}\w+", text)

['AB4567', 'TX23A3', 'NY1210']

In [38]:
text = "The zipcodes are AB4567, TX23A3, 310120, NY1210, 734001 "
re.findall(r"[A-Z]{2}\d+", text)

['AB4567', 'TX23', 'NY1210']

###### Ex. Retrieve the dates

In [39]:
text = "Temasek Holdings was founded on 25/06/1974. It turns 47 on 25/6/2021" 
re.findall(r"\d+/\d+/\d+", text)

['25/06/1974', '25/6/2021']

###### Ex. Retrieve the email IDs 

In [41]:
text = "Email us at contact@gobledy.com or info@info.net or tryuspython.az "
re.findall(r"\w+@\w+\.\w+", text)  # \. matches a literal dot; an unescaped . would match any character

['contact@gobledy.com', 'info@info.net']

###### Ex. Replace values as given in the dict

In [57]:
text = "Stocks like AAPL GOOGL BMW are the preferred ones"
repl_dict = {"AAPL": "APPLE", "GOOGL": "GOOGLE"}
func = lambda match_obj : repl_dict.get(match_obj.group(), match_obj.group())
re.sub(r"[A-Z]+\b", func, text)

'Stocks like APPLE GOOGLE BMW are the preferred ones'

In [42]:
help(re.sub)

Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used.
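
The help text mentions two features not used so far: backslash escapes in a string `repl`, and the `count` parameter. A brief sketch on made-up inputs:

```python
import re

# Backreferences in repl: \1, \2, \3 refer to the captured groups
iso_date = re.sub(r"(\d+)/(\d+)/(\d+)", r"\3-\2-\1", "Founded on 25/06/1974")

# count limits how many matches are replaced (0, the default, means all)
first_two = re.sub(r"[aeiou]", "*", "sea shore", count=2)

print(iso_date)   # Founded on 1974-06-25
print(first_two)  # s** shore
```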



In [59]:
# Demo of a match object and of the lookup done by the lambda (not related to re.sub() itself)
# Extract the substring matched by the pattern from match_obj
match_obj = re.search(r"[A-Z]+\b", text)  # using search to get the sample of match obj
repl_dict.get(match_obj.group())

'APPLE'

<hr><hr>

# Handling data from external sources

### Introduction to OS module

In [None]:
import os

## File Source 

- The key function for working with files in Python is the `open()` function.

- The `open()` function takes two main parameters: the filename and the mode.

- There are four different modes for opening a file:

    - "r" - Read - Default value. Opens a file for reading, raises an error if the file does not exist

    - "a" - Append - Opens a file for appending, creates the file if it does not exist

    - "w" - Write - Opens a file for writing, creates the file if it does not exist and overwrites its contents if it does

    - "x" - Create - Creates the file, raises an error if the file already exists
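
A minimal sketch of how the modes behave. The file name `demo.txt` is hypothetical and lives in a temporary directory so that nothing real is touched. The last step uses "x", the exclusive-creation mode, which raises `FileExistsError` when the file is already there.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")      # "w": creates the file (or empties an existing one)
f.write("first line\n")
f.close()

f = open(path, "a")      # "a": keeps existing content and adds to the end
f.write("second line\n")
f.close()

f = open(path, "r")      # "r": read-only, the file must already exist
content = f.read()
f.close()

try:
    open(path, "x")      # "x": refuses to overwrite an existing file
    x_failed = False
except FileExistsError:
    x_failed = True

print(content)
print(x_failed)
```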

###### Ex. Read file `customers.txt`

###### Ex. Print numbers of lines in the file

###### Ex. Clean data read from the file and extract information about all `Pilots`.

###### Ex. Write names of the pilots to `pilots.txt` file

#### Using `with` keyword to read data and write data
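
One possible way to approach the exercises above with the `with` keyword. The layout of `customers.txt` is not shown in this notebook, so the sketch assumes a hypothetical "name, profession" line format and writes its own sample file into a temporary directory.

```python
import os
import tempfile

# Hypothetical stand-in for customers.txt (assumed "name, profession" lines)
folder = tempfile.mkdtemp()
customers_path = os.path.join(folder, "customers.txt")
pilots_path = os.path.join(folder, "pilots.txt")

with open(customers_path, "w") as f:
    f.write("Asha, Pilot\n  Ben , Doctor\nChen,Pilot \n")

# The file is closed automatically when the with block ends
with open(customers_path) as f:
    lines = f.readlines()
print(len(lines))  # number of lines in the file

# Clean each line: split on the comma and strip stray whitespace
records = [[field.strip() for field in line.split(",")] for line in lines]
pilots = [name for name, profession in records if profession == "Pilot"]

with open(pilots_path, "w") as f:
    f.write("\n".join(pilots))
```

Using `with` avoids having to call `close()` explicitly, even if an error occurs inside the block.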

<hr><hr>

## DataBase Source

In [None]:
!pip install SQLAlchemy
!pip install pymysql
!pip install cx_oracle

- Syntax - `dialect+driver://username:password@host:port/database`

- MySQL - "mysql+pymysql://root:1234@localhost:3306/onlineshopping"
- Oracle - "oracle+cx_oracle://s:t@dsn"
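
To see how the pieces of the syntax line up with the MySQL example, the URL can be taken apart with the standard-library `urllib.parse`. This is only an illustration of the format; SQLAlchemy does its own parsing inside `create_engine()`.

```python
from urllib.parse import urlsplit

url = "mysql+pymysql://root:1234@localhost:3306/onlineshopping"
parts = urlsplit(url)

dialect, driver = parts.scheme.split("+")
database = parts.path.lstrip("/")

print(dialect, driver)                 # mysql pymysql
print(parts.username, parts.password)  # root 1234
print(parts.hostname, parts.port)      # localhost 3306
print(database)                        # onlineshopping
```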

#### Data Connection

In [72]:
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:///employee.sqlite3") # Creates a new file if not present
conn = engine.connect()

#### Select Clause

In [74]:
conn.execute(text("select * from Employee")).fetchall()

[(0, 'Claire', 88962, 'Manager', 35),
 (1, 'Darrin', 67659, 'Team Lead', 26),
 (2, 'Sean', 117501, 'Manager', 36),
 (3, 'Brosina', 149957, 'Senior Manager', 44),
 (4, 'Andrew', 32212, 'Team Lead', 33),
 (5, 'Irene', 63391, 'Team Lead', 33),
 (6, 'Harold', 14438, 'Developer', 23),
 (7, 'Pete', 22445, 'Developer', 22),
 (8, 'Alejandro', 72287, 'Team Lead', 35),
 (9, 'Zuschuss', 195588, 'Managing Director', 53),
 (10, 'Ken', 17240, 'Developer', 25),
 (11, 'Sandra', 115116, 'Manager', 41),
 (12, 'Emily', 18027, 'Developer', 24),
 (13, 'Eric', 55891, 'Team Lead', 31),
 (14, 'Tracy', 109132, 'Manager', 34),
 (15, 'Matt', 83327, 'Manager', 43),
 (16, 'Gene', 22125, 'Developer', 22),
 (17, 'Steve', 29324, 'Team Lead', 29),
 (18, 'Linda', 54003, 'Team Lead', 35),
 (19, 'Ruben', 18390, 'Developer', 25),
 (20, 'Erin', 141401, 'Senior Manager', 47),
 (21, 'Odella', 19593, 'Developer', 22),
 (22, 'Patrick', 57093, 'Team Lead', 26),
 (23, 'Lena', 130556, 'Senior Manager', 52),
 (24, 'Darren', 22093,

#### Insert - Update - Delete

In [75]:
import pandas as pd
df = pd.read_sql_table("Employee", engine)
df.head()

| | index | Name | Salary | Designation | Age |
|---|---|---|---|---|---|
| 0 | 0 | Claire | 88962 | Manager | 35 |
| 1 | 1 | Darrin | 67659 | Team Lead | 26 |
| 2 | 2 | Sean | 117501 | Manager | 36 |
| 3 | 3 | Brosina | 149957 | Senior Manager | 44 |
| 4 | 4 | Andrew | 32212 | Team Lead | 33 |
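
The heading mentions insert, update and delete, which the cell above does not show. A minimal sketch follows, with two stated assumptions: it uses the standard-library `sqlite3` module instead of SQLAlchemy so that it runs without extra packages, and it works on an in-memory database with a simplified, hypothetical version of the Employee table rather than on `employee.sqlite3`.

```python
import sqlite3

# In-memory database with a hypothetical, simplified Employee table
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Employee (Name text, Salary integer, Designation text, Age integer)"
)

# Insert: ? placeholders keep values separate from the SQL text
conn.executemany(
    "insert into Employee values (?, ?, ?, ?)",
    [("Claire", 88962, "Manager", 35), ("Darrin", 67659, "Team Lead", 26)],
)

# Update one row, then delete another
conn.execute("update Employee set Salary = ? where Name = ?", (90000, "Claire"))
conn.execute("delete from Employee where Name = ?", ("Darrin",))
conn.commit()

rows = conn.execute("select * from Employee").fetchall()
print(rows)  # [('Claire', 90000, 'Manager', 35)]
conn.close()
```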


<hr><hr>

## HTTPS Requests

In [None]:
!pip install requests

#### Revision - 

1. Decorators
2. REST API