# Python Regex 

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the re module:

In [31]:
import re

# Simple example
# Find every 'e' character in the given string:
string = "expression"
pattern = r"e"
re.findall(pattern, string)

['e', 'e']

**Important note**:

When assigning a regex pattern to a variable, write it as a **raw string**. A raw string is denoted like this:

```python
variable = r"this is a raw string. special characters are not escaped"
```
A Python raw string is created by prefixing a string literal with ‘r’ or ‘R’. In a raw string, a backslash (\) is kept as a literal character instead of starting an escape sequence. This is useful for regex, where patterns contain many backslashes that should reach the regex engine unchanged.

For example:

In [169]:
## String assigned as normal text
s = 'Hi\nHello'
print(s)

print("-"*10)
## String assigned as raw string
s = r'Hi\nHello'
print(s)

Hi
Hello
----------
Hi\nHello
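The difference matters as soon as a pattern contains a regex sequence such as `\b`. The sketch below uses a made-up sample string to compare the same pattern written with and without the `r` prefix:

```python
import re

text = "cat concatenate cat"  # sample text invented for this illustration

# Without the r prefix, Python turns "\b" into a backspace character,
# so the regex engine never sees a word-boundary sequence.
plain_matches = re.findall("\bcat\b", text)

# With the r prefix, the backslash reaches the regex engine intact
# and \b is interpreted as a word boundary.
raw_matches = re.findall(r"\bcat\b", text)

print(plain_matches)  # []
print(raw_matches)    # ['cat', 'cat']
```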


In [32]:
txt = "The rain in Spain. Spain is near France."
x = re.findall(r"Spain", txt)
print(x)

['Spain', 'Spain']


In short, we give regex a pattern, and that pattern is searched for in a given string. The next part explores the patterns and functions available in regex:

## Patterns
1. **Metacharacters**

List of special pattern characters, also known as metacharacters:
![](https://www.engineeringbigdata.com/wp-content/uploads/python-regular-expression-regex-metacharacters-meanings.jpg)

2. **Special sequences**

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:
![](https://www.engineeringbigdata.com/wp-content/uploads/python-regular-expression-regex-special-sequences.jpg)

3. **Sets**

A set is a set of characters inside a pair of square brackets [] with a special meaning:
![](https://149695847.v2.pressablecdn.com/wp-content/uploads/2021/06/image-8.png)
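Since the tables above are images, here is a small text-only sketch with a few items from each group. The sample sentence is an assumption made up for this illustration:

```python
import re

txt = "The rain in Spain costs 25 euros"

# Metacharacters
starts_with_the = re.findall(r"^The", txt)    # ^ matches at the start of the string
ai_plus_one = re.findall(r"ai.", txt)         # . matches any single character except a newline
ends_with_euros = re.findall(r"euros$", txt)  # $ matches at the end of the string

# Special sequences
digits = re.findall(r"\d+", txt)              # \d matches a digit
spaces = re.findall(r"\s", txt)               # \s matches a whitespace character

# Sets
vowels = re.findall(r"[aeiou]", "Spain")          # any one of the listed characters
not_lower_or_space = re.findall(r"[^a-z ]", txt)  # ^ inside a set negates it

print(starts_with_the)     # ['The']
print(ai_plus_one)         # ['ain', 'ain']
print(ends_with_euros)     # ['euros']
print(digits)              # ['25']
print(len(spaces))         # 6
print(vowels)              # ['a', 'i']
print(not_lower_or_space)  # ['T', 'S', '2', '5']
```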

## Functions in regex

Commonly used functions in the `re` module:

1. `findall`: Returns a list containing all matches
2. `search`: Returns a Match object if there is a match anywhere in the string
3. `split`: Returns a list where the string has been split at each match
4. `sub`: Replaces one or many matches with a string
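As a quick orientation, the four functions listed above can be tried side by side on one string. This is a minimal sketch; the variable names are chosen only for this illustration:

```python
import re

txt = "The rain in Spain"

# findall: every match as a list of strings
all_matches = re.findall(r"ai", txt)

# search: a Match object for the first match, or None if there is no match
first = re.search(r"ai", txt)
missing = re.search(r"Portugal", txt)

# split: cut the string at every whitespace character
parts = re.split(r"\s", txt)

# sub: replace every whitespace character with a hyphen
replaced = re.sub(r"\s", "-", txt)

print(all_matches)                   # ['ai', 'ai']
print(first.start(), first.group())  # 5 ai
print(missing)                       # None
print(parts)                         # ['The', 'rain', 'in', 'Spain']
print(replaced)                      # The-rain-in-Spain
```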



## `findall()` with different patterns

`findall()` returns every non-overlapping match of a pattern. Here we will also explore different patterns:

In [156]:
txt = "The rain in Spain"
x = re.findall(r"ai", txt)
print(x)

['ai', 'ai']


In [157]:
## returns an empty list if the pattern is not found:
txt = "The rain in Spain"
x = re.findall(r"Portugal", txt)
print(x)

[]


The pattern below matches only word characters (letters, digits, and underscore) in the string, one character at a time:

In [35]:
string = "expression@gmail.com The search proceeds through the string from start to end, stopping at the first match found"
pattern = r"\w"
print(re.findall(pattern, string))

['e', 'x', 'p', 'r', 'e', 's', 's', 'i', 'o', 'n', 'g', 'm', 'a', 'i', 'l', 'c', 'o', 'm', 'T', 'h', 'e', 's', 'e', 'a', 'r', 'c', 'h', 'p', 'r', 'o', 'c', 'e', 'e', 'd', 's', 't', 'h', 'r', 'o', 'u', 'g', 'h', 't', 'h', 'e', 's', 't', 'r', 'i', 'n', 'g', 'f', 'r', 'o', 'm', 's', 't', 'a', 'r', 't', 't', 'o', 'e', 'n', 'd', 's', 't', 'o', 'p', 'p', 'i', 'n', 'g', 'a', 't', 't', 'h', 'e', 'f', 'i', 'r', 's', 't', 'm', 'a', 't', 'c', 'h', 'f', 'o', 'u', 'n', 'd']


To find all words in a given string, add a `+` after `\w`. The `+` means one or more of the preceding item, so `\w+` matches a run of one or more word characters:

Refer to table 1 in the [patterns section](#Patterns).

In [36]:
string = "expression@gmail.com The search proceeds through the string from start to end, stopping at the first match found"
pattern = r"\w+"
print(re.findall(pattern, string))

['expression', 'gmail', 'com', 'The', 'search', 'proceeds', 'through', 'the', 'string', 'from', 'start', 'to', 'end', 'stopping', 'at', 'the', 'first', 'match', 'found']


Similarly, `\d` matches a single digit:

In [37]:
string = "9999 The search proceeds through 999 the string from start to end, 999 stopping at the first match found"
pattern = r"\d"
re.findall(pattern, string)

['9', '9', '9', '9', '9', '9', '9', '9', '9', '9']

To find all the numbers in a given string:

In [38]:
string = "9999 The search proceeds through 999 the string from start to end, 999 stopping at the first match found"
pattern = r"\d+"
re.findall(pattern, string)

['9999', '999', '999']

To find all the numbers in a longer block of text (a section number such as 1.10.32 yields three separate matches, because `.` is not a digit):

In [155]:
string = '''
Contrary to popular belief, Lorem Ipsum is not simply random text. 
It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. 
Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure 
Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, 
discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et 
Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, 
very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", 
comes from a line in section 1.10.32
'''

pattern = r"\d+"
re.findall(pattern, string)

['45', '2000', '1', '10', '32', '1', '10', '33', '45', '1', '10', '32']

To look for runs of exactly 3 digits:

`\d` in regular expressions matches any digit, and the `{n}` construct repeats the previous item exactly `n` times.

In [170]:
string = "9999676 8 The search proceeds through 999 the string from start to end, 999 stopping at the first match found"
pattern = r"\d{3}"
re.findall(pattern, string)

['999', '967', '999', '999']

Without boundaries, `\d{3}` matches consecutive groups of 3 digits inside longer numbers, so `9999676` yields `999` and `967`. To match only numbers that are exactly 3 digits long, put the `\b` word-boundary sequence at both ends of the pattern, like this:

In [172]:
string = "9999676 8 The search proceeds through 999 the string from start to end, 999 stopping at the first match found"
pattern = r"\b\d{3}\b"
re.findall(pattern, string)

['999', '999']

You can refer to this [post](https://stackoverflow.com/a/62117048/6819442) explaining how the `\b` boundary works in regex.

Now find runs of 1 to 3 digits. The `{1,3}` quantifier is greedy, so `9999` is split into `999` and `9`:

In [174]:
string = "9999 The 22 search proceeds through 999 the string from start to end, 999 stopping at the first match found"
pattern = r"\d{1,3}"
re.findall(pattern, string)

['999', '9', '22', '999', '999']
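The `\b` boundary and the `{m,n}` quantifier can be combined. As a minimal sketch on a made-up string, the second pattern below keeps only whole numbers that are 1 to 3 digits long:

```python
import re

string = "9999 and 22 and 999 and 7"  # sample text invented for this illustration

# No boundaries: long numbers are cut into pieces of up to 3 digits
loose = re.findall(r"\d{1,3}", string)

# With boundaries: a number only matches if it is 1 to 3 digits long as a whole
whole = re.findall(r"\b\d{1,3}\b", string)

print(loose)  # ['999', '9', '22', '999', '7']
print(whole)  # ['22', '999', '7']
```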

To list runs of lowercase letters:

Use the `[a-z]+` pattern. Note that the letters in the set are lowercase, so a capitalized word such as `Now` contributes only `ow`:

In [173]:
string = '''
Now is the winter of our discontent\nMade glorious summer by this sun of York;\nAnd all the clouds that lour'd upon our house\nIn the deep bosom of the ocean buried.\nNow are our brows bound with victorious wreaths;\nOur bruised arms hung up for monuments;\nOur stern alarums changed to merry meetings,\nOur dreadful marches to delightful measures.\nGrim-visaged war hath smooth'd his wrinkled front;\nAnd now, instead of mounting barded steeds\nTo fright the souls of fearful adversaries,\nHe capers nimbly in a lady's chamber\nTo the lascivious pleasing of a lute.\nBut I, that am not shaped for sportive tricks,\nNor made to court an amorous looking-glass;\nI, that am rudely stamp'd, and want love's majesty\nTo strut before a wanton ambling nymph;\nI, that am curtail'd of this fair proportion,
'''
pattern = r"[a-z]+"
print(re.findall(pattern, string))

['ow', 'is', 'the', 'winter', 'of', 'our', 'discontent', 'ade', 'glorious', 'summer', 'by', 'this', 'sun', 'of', 'ork', 'nd', 'all', 'the', 'clouds', 'that', 'lour', 'd', 'upon', 'our', 'house', 'n', 'the', 'deep', 'bosom', 'of', 'the', 'ocean', 'buried', 'ow', 'are', 'our', 'brows', 'bound', 'with', 'victorious', 'wreaths', 'ur', 'bruised', 'arms', 'hung', 'up', 'for', 'monuments', 'ur', 'stern', 'alarums', 'changed', 'to', 'merry', 'meetings', 'ur', 'dreadful', 'marches', 'to', 'delightful', 'measures', 'rim', 'visaged', 'war', 'hath', 'smooth', 'd', 'his', 'wrinkled', 'front', 'nd', 'now', 'instead', 'of', 'mounting', 'barded', 'steeds', 'o', 'fright', 'the', 'souls', 'of', 'fearful', 'adversaries', 'e', 'capers', 'nimbly', 'in', 'a', 'lady', 's', 'chamber', 'o', 'the', 'lascivious', 'pleasing', 'of', 'a', 'lute', 'ut', 'that', 'am', 'not', 'shaped', 'for', 'sportive', 'tricks', 'or', 'made', 'to', 'court', 'an', 'amorous', 'looking', 'glass', 'that', 'am', 'rudely', 'stamp', 'd', '
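If only words written entirely in lowercase are wanted, one option is to wrap the set in `\b` boundaries. The sketch below assumes a single line of the same text as input:

```python
import re

line = "Now is the winter of our discontent"

# Runs of lowercase letters: the capital N is skipped, leaving 'ow'
runs = re.findall(r"[a-z]+", line)

# Whole words made only of lowercase letters: 'Now' is dropped completely
words = re.findall(r"\b[a-z]+\b", line)

print(runs)   # ['ow', 'is', 'the', 'winter', 'of', 'our', 'discontent']
print(words)  # ['is', 'the', 'winter', 'of', 'our', 'discontent']
```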

To match capital letters, use `[A-Z]`:

In [175]:
string = "Expression@gmail.com The search Proceeds Through the String From start to end, stopping at the first match found"
pattern = r"[A-Z]+"
re.findall(pattern, string)

['E', 'T', 'P', 'T', 'S', 'F']

To search for the capital letters "E" **or** "P", list them in a set:

In [176]:
string = "Expression@gmail.com The search Proceeds Through the String From start to end, stopping at the first match found"
pattern = r"[EP]"
re.findall(pattern, string)

['E', 'P']

To extract email ids from a string:

In [165]:
string = '''
Expressio123n@gmail.com The search Proceeds Through the String From start to end, stopping at the first match found.

Expression@yahoo.com
Expression@hotmail.com
Expression@ramboll.com

'''
# \. matches a literal dot; an unescaped . would match any character
pattern = r"[\w]+@[\w]+\.com"
re.findall(pattern, string)

['Expressio123n@gmail.com',
 'Expression@yahoo.com',
 'Expression@hotmail.com',
 'Expression@ramboll.com']
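A dot needs care in patterns like this one: unescaped, `.` matches any character, while `\.` matches only a literal dot. A minimal comparison on two made-up addresses:

```python
import re

text = "alice@example.com bob@examplexcom"  # second address is deliberately malformed

# Unescaped dot: the "x" before "com" is accepted in place of a dot
loose = re.findall(r"\w+@\w+.com", text)

# Escaped dot: only a real "." before "com" is accepted
strict = re.findall(r"\w+@\w+\.com", text)

print(loose)   # ['alice@example.com', 'bob@examplexcom']
print(strict)  # ['alice@example.com']
```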

To extract valid mobile numbers (a two-digit country code, a hyphen, and exactly ten digits) from the given string:

In [185]:
pattern = r"\b[\d]{2}-\d{10}\b"

string = '''
Expressio123n@gmail.com The search Proceeds Through the String From start to end, stopping at the first match found.

Expression@yahoo.com
Expression@hotmail.com
Expression@ramboll.com

+91-1234567890 - valid number

+911214-1234567890 - invalid number
+91-1234567890 - this one is valid
+91-1234567890 - this one is valid
+91-1234569999231204812 - invalid number
'''
re.findall(pattern, string)

['91-1234567890', '91-1234567890', '91-1234567890']
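The matches above do not include the leading `+`, because the pattern starts at the word boundary between `+` and `9`. If the sign should be part of the result, one option is to match it explicitly with `\+`. This is a sketch under the assumption that every valid number starts with `+`:

```python
import re

string = """
+91-1234567890 - valid number
+911214-1234567890 - invalid number
+91-1234569999231204812 - invalid number
"""

# \+ matches a literal plus sign; the trailing \b rejects numbers with extra digits
pattern = r"\+\d{2}-\d{10}\b"
numbers = re.findall(pattern, string)

print(numbers)  # ['+91-1234567890']
```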

To practice more, go to this [site](https://regex101.com/), where you can test your pattern against a given string in real time:
![](https://regex101.com/static/assets/card.png)

# File handling with Python

File handling is an important part of any web application.

Python has several functions for creating, reading, updating, and deleting files.

[More on this site](https://www.w3schools.com/python/python_file_handling.asp)

## Reading a file

The key function for working with files in Python is the `open()` function.

The `open()` function takes two main parameters: `filename` and `mode`.

There are four different modes for opening a file:
- `r` - Read - Default value. Opens a file for reading, error if the file does not exist
- `a` - Append - Opens a file for appending, creates the file if it does not exist
- `w` - Write - Opens a file for writing, creates the file if it does not exist, and overwrites any existing content
- `x` - Create - Creates the specified file, returns an error if the file exists

In addition, you can specify whether the file should be handled in text or binary mode:
- `t` - Text - Default value. Text mode
- `b` - Binary - Binary mode (e.g. images)
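The text and binary flags are combined with one of the four modes, for example `rb`. The sketch below writes a scratch file in a temporary directory (the file name is hypothetical) and reads it back in both ways:

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so no real file is touched
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")   # same as "wt": write in text mode
f.write("Hello")
f.close()

f = open(path, "r")   # same as "rt": read in text mode, returns str
as_text = f.read()
f.close()

f = open(path, "rb")  # read in binary mode, returns bytes
as_bytes = f.read()
f.close()

print(as_text)   # Hello
print(as_bytes)  # b'Hello'
```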

In [62]:
## List the files in the current directory:
import os
os.listdir()

['.ipynb_checkpoints',
 '01_python_basics.ipynb',
 '02_python_basics.ipynb',
 '03_python_basics.ipynb',
 '04_python_basics.ipynb',
 '05-regular_expression_and_files.ipynb',
 '06-numpy.ipynb',
 'attendance',
 'attendance.xlsx',
 'calc.py',
 'example_module.py',
 'functions.py',
 'google-python-exercises',
 'untitled.txt',
 '__pycache__']

### Syntax

```python
file = open("untitled.txt", "r")
```

To open the file, use the built-in `open()` function.

The `open()` function returns a file object, which has a `read()` method that returns the content of the file as a single string:

In [71]:
f = open("demo_file.txt", "r")
print(f.read())
f.close()

Hello! Welcome to demofile.txt
This file is for testing purposes.
Good Luck!


It is a good practice to always close the file when you are done with it.

If the file is located in a different location, you will have to specify the file path, like this:

```python
f = open("D:\\myfiles\\welcome.txt", "r")
print(f.read())
f.close()
```
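Windows paths contain backslashes, so each one has to be doubled in a normal string or the path has to be written as a raw string. The sketch below only builds the path strings and does not open anything; the `D:\myfiles` location is the same illustrative path as above:

```python
from pathlib import PureWindowsPath

escaped = "D:\\myfiles\\welcome.txt"  # each backslash doubled
raw = r"D:\myfiles\welcome.txt"       # raw string, backslashes kept as typed
joined = str(PureWindowsPath("D:/myfiles") / "welcome.txt")  # built from parts

print(escaped == raw == joined)  # True
```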

### Read only parts of the file

By default, the `read()` method returns the whole text, but you can also specify how many characters you want to return.

Return the first 5 characters of the file `demo_file.txt`:

```
Hello! Welcome to demofile.txt
This file is for testing purposes.
Good Luck!
```

In [73]:
f = open("demo_file.txt", "r")
print(f.read(5))
f.close()

Hello


In [77]:
file = open("untitled.txt", "r")
txt = file.read()
print(txt)
file.close()

Now is the winter of our discontent
Made glorious summer by this sun of York;
And all the clouds that lour'd upon our house
In the deep bosom of the ocean buried.
Now are our brows bound with victorious wreaths;
Our bruised arms hung up for monuments;
Our stern alarums changed to merry meetings,
Our dreadful marches to delightful measures.
Grim-visaged war hath smooth'd his wrinkled front;
And now, instead of mounting barded steeds
To fright the souls of fearful adversaries,
He capers nimbly in a lady's chamber
To the lascivious pleasing of a lute.
But I, that am not shaped for sportive tricks,
Nor made to court an amorous looking-glass;
I, that am rudely stamp'd, and want love's majesty
To strut before a wanton ambling nymph;
I, that am curtail'd of this fair proportion,

This is a new line


### Read a line

You can return one line by using the `readline()` method:

In [79]:
f = open("demo_file.txt", "r")
print(f.readline())
f.close()

Hello! Welcome to demofile.txt



By calling `readline()` twice, you can read the first two lines:

In [80]:
f = open("demo_file.txt", "r")
print(f.readline())
print(f.readline())
f.close()

Hello! Welcome to demofile.txt

This file is for testing purposes.



### Looping over the lines in a file

Using the `readlines()` method, we can get all the lines as a list. Each line keeps its trailing newline, so `print()` shows a blank line after each one:

In [95]:
f = open("demo_file.txt", "r")
lines = f.readlines()
for line in lines:
    print(line)
f.close()

Hello! Welcome to demofile.txt

This file is for testing purposes.

Good Luck!



This is a new line
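A file object can also be looped over directly, one line at a time, without building a list first. The sketch below creates its own scratch file (a hypothetical name in a temporary directory) and strips the trailing newline from each line so the output is not double-spaced:

```python
import os
import tempfile

# Hypothetical scratch file so the example does not depend on demo_file.txt
path = os.path.join(tempfile.mkdtemp(), "loop_demo.txt")
f = open(path, "w")
f.write("Hello!\nThis file is for testing purposes.\nGood Luck!\n")
f.close()

cleaned = []
f = open(path, "r")
for line in f:                         # the file object yields one line per iteration
    cleaned.append(line.rstrip("\n"))  # drop the trailing newline
f.close()

for line in cleaned:
    print(line)
```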


### Opening file using `with`

The `with` keyword can be used to open a file; the file is closed automatically when the `with` block ends.

So you don't have to type `file.close()` every time you open a file:

In [92]:
with open("demo_file.txt", "r") as file:
    all_lines = file.readlines()
    for line in all_lines:
        print(line)

Hello! Welcome to demofile.txt

This file is for testing purposes.

Good Luck!
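The automatic closing can be checked through the file object's `closed` attribute. A minimal sketch, using a hypothetical scratch file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical scratch file

with open(path, "w") as file:
    file.write("Good Luck!")
    inside = file.closed   # still open inside the with block

after = file.closed        # closed as soon as the block ends

print(inside, after)  # False True
```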


## Write/Create a file

To write to an existing file, you must add a parameter to the `open()` function:

`a` - Append - will append to the end of the file

`w` - Write - will overwrite any existing content

### Appending lines to an existing file

Open the file "demo_file.txt" and append content to the file:

In [93]:
with open("demo_file.txt", "a") as file:
    file.write("\n\nThis is a new line")

Open the file and see the changes:

In [94]:
with open("demo_file.txt", "r") as file:
    for line in file.readlines():
        print(line)

Hello! Welcome to demofile.txt

This file is for testing purposes.

Good Luck!



This is a new line


But be careful when choosing the mode. For example, if you use `w` mode, the existing file's content will be overwritten:

In [97]:
with open("demo_file.txt", "w") as file:
    file.write("\n\nThis is a new line")
    
with open("demo_file.txt", "r") as file:
    for line in file.readlines():
        print(line)    





This is a new line
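The difference between `a` and `w` can be seen side by side on a scratch file. This is a minimal sketch; the file name and contents are made up for the illustration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "append_vs_write.txt")  # hypothetical scratch file

with open(path, "w") as f:
    f.write("first line\n")

with open(path, "a") as f:   # "a" keeps the existing content and adds to the end
    f.write("appended line\n")

with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f:   # "w" empties the file before writing
    f.write("only line\n")

with open(path, "r") as f:
    after_write = f.read()

print(repr(after_append))  # 'first line\nappended line\n'
print(repr(after_write))   # 'only line\n'
```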


### Creating a new file

To create a new file in Python, use the `open()` function with one of the following modes:

`x` - Create - will create a file, returns an error if the file exists

`a` - Append - will create a file if the specified file does not exist

`w` - Write - will create a file if the specified file does not exist

In [99]:
f = open("demofile4.txt", "x")
f.close()

The `w` mode also creates the file if it does not exist, and lets you write to it right away:

In [100]:
f = open("myfile.txt", "w")
f.write("This is a new text")
f.close()
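Because `x` refuses to touch an existing file, it is a safe way to avoid overwriting data by accident. The sketch below, on a hypothetical scratch file, creates the file once and then catches the `FileExistsError` raised by a second attempt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "new_file.txt")  # hypothetical scratch file

f = open(path, "x")   # succeeds: the file does not exist yet
f.close()

try:
    f = open(path, "x")   # fails: the file now exists
    f.close()
    created_twice = True
except FileExistsError:
    created_twice = False
    print("File already exists, not overwriting it")
```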