# Regular expression or pattern matching

A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).

https://docs.python.org/3/library/re.html

https://www.w3schools.com/python/python_regex.asp

https://regex101.com/

## Metacharacters

| Character | Description | Example |
|-----------|-------------|---------|
| [ ] | A set of characters | [2-5] |
| { } | Exactly the specified number of occurrences | Sa.{4}p |
| \ | Signals a special sequence | \d |
| . | Any character (except the newline character) | "Sa....p" |
| ^ | Starts with | ^Hi |
| \| | Either or | "A\|B" |
| + | One or more occurrences | "he.+o" |
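
The short sketch below tries a few of the metacharacters from the table against one string. The `sample` text and the variable names are made up for illustration; they are not part of the original notebook.

```python
import re

# Hypothetical sample string, chosen only to exercise the patterns below
sample = "Hi, Sa1234p scored 42"

# [2-5] matches any single character from '2' to '5'
digits_2_to_5 = re.findall(r"[2-5]", sample)

# Sa.{4}p matches 'Sa', exactly four characters of any kind, then 'p'
exact_four = re.findall(r"Sa.{4}p", sample)

# ^Hi matches only if the string starts with 'Hi'
starts_with_hi = re.search(r"^Hi", sample) is not None

# Hi|42 matches either alternative
either = re.findall(r"Hi|42", sample)

print(digits_2_to_5)   # ['2', '3', '4', '4', '2']
print(exact_four)      # ['Sa1234p']
print(starts_with_hi)  # True
print(either)          # ['Hi', '42']
```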



## **compile**

re.compile() turns a pattern string into a regular expression object. Compiling once is useful and efficient when the same expression will be used several times in a single program.
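
As a minimal sketch of that reuse, the example below compiles one pattern and applies it to several strings. The `lines` list and the pattern `DS\d+` are assumptions made for illustration.

```python
import re

# Compile the pattern once ...
batch_pattern = re.compile(r"DS\d+")  # 'DS' followed by one or more digits

# ... then reuse the compiled object on many strings (hypothetical data)
lines = [
    "DS15 is in the Third Week",
    "No batch here",
    "DS17 is in the Fourth Week",
]

found = []
for line in lines:
    m = batch_pattern.search(line)
    # search() returns None when nothing matches, so guard before .group()
    found.append(m.group() if m else None)

print(found)  # ['DS15', None, 'DS17']
```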

## **match**

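The match() function of re checks for a match only at the beginning of the string. It returns a Match object if the pattern matches there, and None otherwise. To find the first occurrence anywhere in the string, use search() instead.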

In [None]:
import re
DS17 = "DS17 is in the Fourth Week of Python"


compiled_pat = re.compile(r"[\w]*")  # raw string, so the backslash reaches re unchanged
print(compiled_pat)
result = compiled_pat.match(DS17)
print(result)


re.compile('[\\w]*')
<re.Match object; span=(0, 4), match='DS17'>
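
The difference between match() and search() shows up as soon as the pattern does not sit at the start of the string. The following sketch uses a made-up `text` value to illustrate this.

```python
import re

text = "Week 4 of DS17"  # hypothetical string where 'DS17' is not at the start
pattern = re.compile(r"DS\d+")

at_start = pattern.match(text)    # only tries position 0, so no match here
anywhere = pattern.search(text)   # scans the string for the first match

print(at_start)          # None
print(anywhere.group())  # DS17
print(anywhere.span())   # (10, 14)
```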


### **findall**

In [None]:
DS17 = "DS17, is in the Fourth Week of Python"

# The findall() function returns a list containing all matches.
x = re.findall(",", DS17) # here the pattern is a literal comma
print(x)


[',']


In [None]:
print(re.findall("[Z]", DS17))

In [None]:
print(re.findall("[^A-Z]", DS17)) # inside [ ], a leading ^ negates the set: any character except A-Z

['1', '7', ',', ' ', 'i', 's', ' ', 'i', 'n', ' ', 't', 'h', 'e', ' ', 'o', 'u', 'r', 't', 'h', ' ', 'e', 'e', 'k', ' ', 'o', 'f', ' ', 'y', 't', 'h', 'o', 'n']


In [None]:
DS15 = "DS15 is in the Third Week of Python"
# Check whether the string starts with the batch prefix 'DS':

x = re.findall("^DS", DS15) # ^ anchors the match at the start of the string
if x:
  print("Yes, the string starts with 'batch number'")
else:
  print("No match")


Yes, the string starts with 'batch number'


In [None]:
DS15 = "DS15 is in the Third Week of Python."
# Check whether the string ends with a period:

x = re.findall(r"\.$", DS15) # \. is a literal period; $ anchors the match at the end of the string
if x:
  print("Yes, the string ends with '.'")
else:
  print("No match")


Yes, the string ends with '.'


### Search

In [None]:
Gene_Seq = "ATGCGCCTACAATCGGTACGTCATCGCGCGCGCTTAC"

# The search() function searches the string for a match, and returns a Match object if there is a match.
print(re.search("GCGCGCGC", Gene_Seq))

<re.Match object; span=(25, 33), match='GCGCGCGC'>


In [None]:
Gene_Seq = "ATGCGCCTACAATCGGTACGTCATCGCGCGCGCTTAC"

# The findall() function returns a list containing every non-overlapping match.
print(re.findall("GCGCG.+GCT", Gene_Seq)) # a substring that starts with GCGCG and ends with GCT

['GCGCGCGCT']


### Span

In [None]:
Text = "This course is Python"

# \b marks a word boundary, so the pattern finds a word that starts with 'c'
x = re.search(r"\bc\w+", Text)

print(x)
print(x.span()) # returns a tuple containing the start and end positions of the match

<re.Match object; span=(5, 11), match='course'>
(5, 11)


In [None]:
Text = "This course is Python"

# The search() function searches the string for a match, and returns a Match object if there is a match.
x = re.search(r"\bP\w+", Text)
print(x.string) #  returns the string passed into the function

This course is Python
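
The positions returned by span() can be used to slice the original string. The sketch below marks the matched word with brackets; the `highlighted` variable is only an illustration.

```python
import re

text = "This course is Python"
m = re.search(r"\bc\w+", text)

start, end = m.span()    # same values as m.start() and m.end()
matched = text[start:end]

# Rebuild the string with the match wrapped in brackets
highlighted = text[:start] + "[" + matched + "]" + text[end:]

print(matched)      # course
print(highlighted)  # This [course] is Python
```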


### group

In [None]:
Text = "This course is Python programming"

# The search() function searches the string for a match, and returns a Match object if there is a match.
x = re.search(r"\bp\w+", Text) # a word starting with lowercase 'p', followed by one or more word characters
print(x)
print(x.group()) # returns the part of the string where there was a match

<re.Match object; span=(22, 33), match='programming'>
programming


In [None]:
print(re.search("((a|b)c)","bcaac").groups())

('bc', 'b')
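
Groups are numbered by the position of their opening parenthesis, which is why the outer group ('bc') comes before the inner one ('b') above. The sketch below shows group(n), groups(), and named groups on a made-up date string. The group names day, month, and year are chosen for illustration.

```python
import re

# Hypothetical input; (?P<name>...) gives a group a name as well as a number
m = re.search(
    r"(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})",
    "Submitted on 10-04-2022",
)

whole = m.group(0)        # the entire match
parts = m.groups()        # all captured groups as a tuple
year = m.group("year")    # look up a group by name
by_name = m.groupdict()   # all named groups as a dict

print(whole)    # 10-04-2022
print(parts)    # ('10', '04', '2022')
print(year)     # 2022
print(by_name)  # {'day': '10', 'month': '04', 'year': '2022'}
```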


### split

In [None]:
Information = "There are 60 students in DS15 batch."

print(Information.split('60')) # the built-in str.split() splits only at the literal text '60'

['There are ', ' students in DS15 batch.']


In [None]:
# The re.split() function returns a list where the string has been split at each match:
print(re.split(r"\d", Information)) # splits at every single digit, so adjacent digits leave empty strings

['There are ', '', ' students in DS', '', ' batch.']
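
The empty strings appear because \d matches one digit at a time, so "60" and "15" each cause two splits. A possible way to avoid them, sketched below, is to split on runs of digits with \d+.

```python
import re

information = "There are 60 students in DS15 batch."

# \d+ treats each run of digits as a single separator
parts = re.split(r"\d+", information)

print(parts)  # ['There are ', ' students in DS', ' batch.']
```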


In [None]:
Date = "10-04-2022"
result = re.split(r"\D", Date, maxsplit=2) # maxsplit is the maximum number of splits to perform
# the r prefix makes a raw string, so the backslash is passed to re unchanged
# \D matches any non-digit character
print(result)

['10', '04', '2022']


### sub

In [None]:
# The sub() function replaces the matches with the text of your choice:
Gene_Seq = "ATGCGCGTACAATCGGTACGTCATCGCGCGCGCTTAC"

print(re.sub("GCGCGCG", "ATTACHED", Gene_Seq))

ATGCGCGTACAATCGGTACGTCATCATTACHEDCTTAC
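
By default sub() replaces every match. Two further options are sketched below: the count argument limits how many matches are replaced, and a backreference such as \1 reuses a captured group in the replacement text. The lowercase replacement and the date string are illustrative assumptions.

```python
import re

gene_seq = "ATGCGCGTACAATCGGTACGTCATCGCGCGCGCTTAC"

# Replace only the first two occurrences of 'GC'
first_two = re.sub("GC", "gc", gene_seq, count=2)

# Reorder a date from DD-MM-YYYY to YYYY-MM-DD using backreferences
iso_date = re.sub(r"(\d{2})-(\d{2})-(\d{4})", r"\3-\2-\1", "10-04-2022")

print(first_two)  # ATgcgcGTACAATCGGTACGTCATCGCGCGCGCTTAC
print(iso_date)   # 2022-04-10
```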
