In [None]:
# Cheatsheet

# Related work: Regex

Regular expressions, also known as “regex” or “regexp”, are patterns used to match strings of text such as particular characters, words, or patterns of characters. With their help, we can match and extract string patterns from text.


The two terms used here, match and extract, have slightly different meanings. There may be cases when we want to match a specific pattern but extract only a subset of it. For example, suppose we want to extract the names of PhD scholars from a list of names of people in an organization.

In this case, we will match the “Dr XYZ” keyword and extract only the name, i.e. “XYZ”, not the prefix “Dr.”, from the list. Regex is very useful for searching through large texts, emails, and documents. Regex is also called a “programming language for string matching”. Before diving into regular expressions and their implementation in Python, it is important to know their applications in the real world.
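The difference between matching and extracting can be sketched in a few lines. The list of people below is invented for illustration and is not part of the original notebook. The pattern matches the whole “Dr XYZ” text, while the parentheses extract only the name.

```python
import re

# Hypothetical list of people; the names are made up for this sketch.
people = ["Dr. Alice", "Bob", "Dr Carol", "Dave"]

# Match the "Dr" prefix (with an optional dot), but capture only the name.
pattern = r"^Dr\.?\s+(\w+)"

scholars = []
for person in people:
    m = re.match(pattern, person)
    if m:
        # group(1) is the captured name, without the "Dr" prefix.
        scholars.append(m.group(1))

print(scholars)
```

Entries that do not start with “Dr” produce no match and are skipped.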

### Wild card patterns

Wild-card patterns are the smallest individual units from which regular expressions are formed. Commonly used patterns are:

^

This wild card matches the characters at the beginning of a line.

$

This wild card matches the characters at the end of the line.

.

This wild card matches any single character in the line (by default, any character except a newline).

\s

This wild card matches a whitespace character (such as a space) in a string.

\d

This wild card matches one digit.

*

This wild card repeats any preceding character zero or more times. It matches the longest possible string.

*?

This wild card also repeats any preceding character/characters zero or more times. However, it matches the shortest string following the pattern.

+

This wild card repeats any preceding character one or more times. It matches the longest possible string following the pattern.

+?

This wild card repeats any preceding character one or more times. However, it matches the shortest possible string following the pattern.

[aeiou]

It matches any character from a set of given characters.

[^XYZ]

It matches any character not given in the set.

[a-z0-9]

It matches any character in the range a-z or 0-9.

(

This wild card represents the beginning of the string extraction.

)

This wild card represents the end of the string extraction.
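The patterns listed above can be tried out in a short sketch. The sample strings are assumptions chosen for illustration, and `re.search` and `re.findall` are standard functions of Python's `re` module.

```python
import re

text = "cat 42 dogs"

# ^ and $ anchor a pattern to the start and the end of the string.
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_dogs = re.search(r"dogs$", text) is not None

# \d matches one digit; \s matches one whitespace character.
digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)

# [aeiou] matches one vowel; [^a-z0-9] matches one character outside a-z and 0-9.
vowels = re.findall(r"[aeiou]", text)
others = re.findall(r"[^a-z0-9]", text)

# Greedy + takes the longest run; lazy +? takes the shortest.
greedy = re.search(r"a+", "aaa").group()
lazy = re.search(r"a+?", "aaa").group()

# Parentheses mark the part of the match to extract.
number = re.search(r"cat (\d+) dogs", text).group(1)

print(starts_with_cat, ends_with_dogs, digits, spaces, vowels, others)
print(greedy, lazy, number)
```

Raw strings (`r"..."`) keep the backslashes in `\d` and `\s` from being interpreted by Python before the regex engine sees them.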


### match method



In [None]:
# This function searches for the RE pattern at the beginning of the string and returns a match object, or None if the start of the string does not match. The matched text can be read from the object with the group() method. The syntax of the match function is

# re.match(pattern, string, flags=0)




import re
string = """Hi I am Eliza. I am a chatbot. Welcome to the study group. I was born on October 13, 2023. I am a computer program. I am a therapist. I will also run again on October 23, 2023. I am so excited to meet you all."""

# Match the shortest string that starts with H and ends with b.
# . matches any character, + repeats it one or more times, and the trailing ? makes + lazy, so the shortest possible string is matched.
pattern = r'^H.+?b'

print(re.match(pattern,string))      # Returns the match object. An object is a Python data type which contains both data and functions. In this case, the object holds the span and the matched text.

print(re.match(pattern,string).group()) #Extracting value from the object

<re.Match object; span=(0, 27), match='Hi I am Eliza. I am a chatb'>
Hi I am Eliza. I am a chatb
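Because `re.match` only looks at the beginning of the string, a pattern that occurs later is not found. The sketch below contrasts it with `re.search`, a standard `re` function that the example above does not use, which scans the whole string. The sample text is an assumption made for illustration.

```python
import re

text = "I am a chatbot. Hi there."

# re.match anchors at position 0, so this returns None.
at_start = re.match(r"Hi", text)

# re.search scans forward and finds the first occurrence anywhere.
anywhere = re.search(r"Hi", text)

print(at_start)
if anywhere:
    print(anywhere.group(), anywhere.span())
```

Checking the result before calling `group()` avoids an `AttributeError` when there is no match.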


### groups

In [None]:
# The previous example extracted a single value with match.

# We can also extract multiple values from the string by joining alternative patterns with |. The group() method still returns the text of a single match. For example:

# match the string starting with H and ending with b, or the string between Welcome and group, or any string from O up to a four-digit number

pattern=r'^H.+?b|Welcome.+?group|O.+?\d{4}'
print("First match: ", re.match(pattern,string).group()) # group() extracts the matched text from the match object

# match returns only the first match, at the start of the string. To get all the matches, we use the findall function

print("All matches: ", re.findall(pattern,string)) # findall returns a list of all the matches

First match:  Hi I am Eliza. I am a chatb
All matches:  ['Hi I am Eliza. I am a chatb', 'Welcome to the study group', 'October 13, 2023', 'October 23, 2023']
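Parentheses can also split one match into several extracted parts. The following is a minimal sketch, assuming a date format like the one in the example text; the pattern and variable names are illustrative and not from the original notebook.

```python
import re

text = "I was born on October 13, 2023. I will also run again on October 23, 2023."

# Three capture groups: month name, day, and four-digit year.
date_pattern = r"([A-Z][a-z]+) (\d{1,2}), (\d{4})"

first = re.search(date_pattern, text)
whole = first.group()      # the entire matched text
month = first.group(1)     # the first capture group only
parts = first.groups()     # all capture groups as a tuple

# With several groups in the pattern, findall returns one tuple per match.
all_dates = re.findall(date_pattern, text)

print(whole, month, parts)
print(all_dates)
```

Note that `findall` returns tuples of the captured groups here, not the full matched strings, because the pattern contains more than one group.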


## Python demo: 🖥️ Regex makes a chatbot 🖥️

In [None]:
user_input='hi . nice to meet you'
pattern_greet = r'hi|hello|hey' # | is used to match any one of the alternatives
response_name = "Hi, how can I help you?" # fixed reply to a greeting

if re.match(pattern_greet,user_input):
    print(response_name) # reply only when the input starts with a greeting

Hi, how can I help you?
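The greeting pattern above is case-sensitive and would also accept input such as "history", since that starts with "hi". One possible refinement, sketched here with a hypothetical helper `is_greeting`, adds word boundaries (`\b`) and the `re.IGNORECASE` flag.

```python
import re

# \b word boundaries keep "hi" from matching the start of "history";
# re.IGNORECASE lets "Hello" and "HEY" match as well.
pattern_greet = re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE)

def is_greeting(user_input):
    """Return True if the input starts with a greeting word (hypothetical helper)."""
    return pattern_greet.match(user_input) is not None

print(is_greeting("hi . nice to meet you"))
print(is_greeting("Hello there"))
print(is_greeting("history is fun"))
```

Compiling the pattern once with `re.compile` is a convenience when the same pattern is reused for every user message.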


In [None]:
sentence = "hi my name is John. Nice to meet you."
name = re.findall(r'hi my name is (\w+)', sentence)

# Print the name
print('Hi', name[0], 'nice to meet you too')

Hi John nice to meet you too
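Indexing `name[0]` raises an `IndexError` when the sentence contains no name, because `findall` then returns an empty list. A minimal sketch of a safer variant follows; the function `greet_by_name` and its fallback reply are assumptions added for illustration.

```python
import re

def greet_by_name(sentence):
    """Reply with the captured name, or ask for it if none is found (hypothetical helper)."""
    m = re.search(r"my name is (\w+)", sentence, re.IGNORECASE)
    if m is None:
        # No name in the input: fall back to a generic reply.
        return "Hi, what is your name?"
    return "Hi " + m.group(1) + ", nice to meet you too"

print(greet_by_name("hi my name is John. Nice to meet you."))
print(greet_by_name("hello"))
```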
