# Use Regular Expressions (RegEx) in Python

## RegEx notes and cheatsheet

* use a single `|` to say 'or'
* `[ ]` - matches any one of the characters inside the brackets
* `?` - makes the preceding token optional
* `.+` - matches one or more of any character except a newline; because `+` is greedy, it runs to the end of the line
* use parentheses to group alternatives; a bare `|` splits the whole regex into just a left side and a right side
* `^` matches only at the beginning of a string
* `$` matches only at the end of a string
* `+` is greedy - it matches as many repetitions of the preceding token as it can while still letting the rest of the pattern match
* `+?` is not greedy (lazy) - it matches as few repetitions of the preceding token as it can
* `{3}` - matches exactly 3 occurrences of the preceding token
* `{3,4}` - matches 3 to 4 occurrences of the preceding token
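
A short sketch of two points from the list above: how greedy `+` differs from lazy `+?`, and why `|` usually needs parentheses. The sample strings are made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # made-up sample text

# Greedy: `.+` takes as much as it can, so the match runs to the last '>'.
greedy = re.findall(r"<.+>", html)
# Lazy: `.+?` stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Without parentheses, `|` splits the whole pattern into "gra" OR "ey".
ungrouped = [m.group() for m in re.finditer(r"gra|ey", "gray grey")]
# With parentheses, only the middle letter is alternated.
grouped = [m.group() for m in re.finditer(r"gr(a|e)y", "gray grey")]

print(ungrouped)  # ['gra', 'ey']
print(grouped)    # ['gray', 'grey']
```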

### Special Sequences
* `\r` = carriage return
* `\n` = newline
* `\s` = whitespace character (space, tab, newline, and so on)
* `\w` = word character (letter, digit, or underscore)
* `\d` = digits 0-9

### Lookahead and Lookbehind
* `(?<!`  = negative look behind
* `(?<=`  = positive look behind
* `(?=`  = positive look ahead 
* `(?!`  = negative look ahead
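
The four lookaround forms can be compared on one string. This is a minimal sketch; the sample text and the variable names are made up. Lookarounds check the surrounding text without including it in the match.

```python
import re

text = "100 USD, 200 EUR, $300, 400"  # made-up sample text

# Positive lookahead: a number that IS followed by " USD".
followed_by_usd = re.findall(r"\d+(?= USD)", text)
# Negative lookahead: a whole number that is NOT followed by " USD".
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", text)
# Positive lookbehind: a number that IS preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)
# Negative lookbehind: a whole number that is NOT preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

print(followed_by_usd)      # ['100']
print(not_followed_by_usd)  # ['200', '300', '400']
print(after_dollar)         # ['300']
print(not_after_dollar)     # ['100', '200', '400']
```

The `\b` anchors in the negative forms stop the engine from backtracking into part of a number (for example matching `10` out of `100`) just to satisfy the lookaround.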

### Verify Email
The first part of an email address (before the `@`) can contain uppercase letters, lowercase letters, and numbers.

In [1]:
import re # import the regex module

# define the regex pattern
# a-z = any lowercase letter
# A-Z = any uppercase letter
# 0-9 = any digit
# + = one or more of the preceding character
# @ = the @ symbol
pattern = r"[a-zA-Z0-9]+@[a-zA-Z0-9]+\.(com|edu|net)" 
# you can interpret the above as 
# "one or more of any lowercase letter, 
# uppercase letter, or digit, followed by the @ symbol
# followed by one or more of any lowercase letter,
# uppercase letter, or digit, followed by a period,
# followed by com, edu, or net"




In [2]:
user_input = input() # get user input

if re.search(pattern, user_input): # if the pattern is found in the user input
    print("Valid email address") # print valid email address
else: # otherwise
    print("Invalid email address") # print invalid email address

Invalid email address
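
One caveat with the check above: `re.search()` succeeds if the pattern appears anywhere in the input, so extra text around an address still counts as valid. A possible variation, assuming the whole input should be the address, is `re.fullmatch()`. The helper name `is_valid_email` and the sample inputs are hypothetical.

```python
import re

pattern = r"[a-zA-Z0-9]+@[a-zA-Z0-9]+\.(com|edu|net)"

def is_valid_email(address):
    """Hypothetical helper: True only if the ENTIRE string matches the pattern."""
    return re.fullmatch(pattern, address) is not None

# made-up sample inputs
candidates = ["jane42@example.com", "see jane42@example.com!", "jane@example.org"]

for candidate in candidates:
    found_anywhere = re.search(pattern, candidate) is not None
    print(f"{candidate!r}: search={found_anywhere}, fullmatch={is_valid_email(candidate)}")
```

The second candidate passes `search()` but fails `fullmatch()`; the third fails both because `org` is not in the list of allowed endings.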


### Replacing Substrings

In [3]:
import re

# define the regex pattern for a phone number
pattern = r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)' # raw string so the backslashes reach the regex engine unchanged

# new pattern for the groups of numbers in the phone number
# this will be used to remove the dashes
new_pattern = r"\1\2\3"

In [4]:
user_input = input() # get user input

new_user_input = re.sub(pattern, new_pattern, user_input) # replace the pattern with the new pattern
print(new_user_input) # print the new user input

l
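
The same substitution can be tried without `input()`. This is a small sketch with a made-up phone number; it also shows that the replacement string can reorder or decorate the captured groups.

```python
import re

pattern = r"(\d{3})-(\d{3})-(\d{4})"
text = "Call 555-867-5309 today"  # made-up sample input

# \1, \2, \3 refer to the first, second, and third parenthesized groups.
digits_only = re.sub(pattern, r"\1\2\3", text)
reformatted = re.sub(pattern, r"(\1) \2-\3", text)
# \g<1> is the unambiguous spelling of \1; it is needed when a literal digit
# follows the reference (r"\10" would be read as group 10).
padded = re.sub(pattern, r"\g<1>0", text)

print(digits_only)  # Call 5558675309 today
print(reformatted)  # Call (555) 867-5309 today
print(padded)       # Call 5550 today
```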


### Methods to Search for Matches

In [5]:
import re

test_string = "123abc456789abc123ABC"

pattern = re.compile(r"abc") # create a pattern object
matches = pattern.finditer(test_string) # create an iterator of match objects

# to do this in one line, you can use the following:
# matches = re.finditer(r'abc', test_string) # create an iterator of match objects

print('Match Objects using finditer():')
for match in matches: # for each match object
    print(match) # print the match object

# to just return the substrings that match the pattern, use the following:
targets = re.findall(pattern, test_string) # create a list of substrings that match the pattern

print('\nTargets using findall():')
for target in targets: # for each substring
    print(target) # print the substring

# to find a single match at the beginning of the string, use the following:
match = re.match(pattern, test_string) # create a match object

print('\nMatch Object (beginning of string) using match():')
print(match) # print the match object, if it exists, otherwise None

# to find a single match anywhere in the string, use the following:
match = re.search(pattern, test_string) # create a match object

print('\nMatch Object (whole string) using search():')
print(match) # print the match object, if it exists, otherwise None


Match Objects using finditer():
<re.Match object; span=(3, 6), match='abc'>
<re.Match object; span=(12, 15), match='abc'>

Targets using findall():
abc
abc

Match Object (beginning of string) using match():
None

Match Object (whole string) using search():
<re.Match object; span=(3, 6), match='abc'>


### Methods on a Match Object

In [6]:
import re

test_string = '123abc456789abc123ABC'

pattern = re.compile(r'abc') # create a pattern object
matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object
    print(match.span()) # print the start and end positions of the match
    print(match.start()) # print the start position of the match
    print(match.end()) # print the end position of the match
    print(match.group()) # print the substring that matches the pattern

<re.Match object; span=(3, 6), match='abc'>
(3, 6)
3
6
abc
<re.Match object; span=(12, 15), match='abc'>
(12, 15)
12
15
abc
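
These methods only exist on a match object. `re.search()` and `re.match()` return `None` when nothing matches, so calling `.group()` on the result without a check raises `AttributeError`. A minimal sketch of the guard, using a hypothetical helper named `describe`:

```python
import re

test_string = "123abc456789abc123ABC"

def describe(pattern, text):
    """Hypothetical helper: summarize the first match, or report that there is none."""
    match = re.search(pattern, text)
    if match is None:  # no match: there is no object to call methods on
        return "no match"
    return f"{match.group()!r} at {match.span()}"

print(describe(r"abc", test_string))  # 'abc' at (3, 6)
print(describe(r"xyz", test_string))  # no match
```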


### Meta Characters

All meta characters: . ^ $ * + ? { } [ ] \ | ( )

* `.` - any character except newline
* `^` - beginning of a string "^Hello"
* `$` - end of a string "World$"
* `*` - 0 or more occurrences "aix*"
* `+` - 1 or more occurrences "aix+"
* `?` - 0 or 1 occurrence "aix?"
* `{n}` - exactly n occurrences "al{2}"
* `{n,}` - n or more occurrences "al{2,}"
* `{,n}` - 0 to n occurrences "al{,2}"
* `{m,n}` - m to n occurrences "al{2,3}"
* `[abc]` - matches a, b, or c 
* `[a-z]` - matches any lowercase letter
* `[A-Z]` - matches any uppercase letter
* `[0-9]` - matches any digit
* `[^abc]` - matches anything except a, b, or c
* `[^0-9]` - matches anything except a digit
* `\` - escape character
* `|` - matches either expression
* `()` - groups subexpressions

#### More Meta Characters
* `\d` - any digit [0-9]
* `\D` - any non-digit
* `\s` - any whitespace character [space " " tab "\t" newline "\n" carriage return "\r"]
* `\S` - any non-whitespace character
* `\w` - any alphanumeric character [a-zA-Z0-9_]
* `\W` - any non-alphanumeric character
* `\b` - word boundary: matches at the start or end of a word
* `\B` - non-boundary: matches at any position that is not the start or end of a word



In [7]:
import re

test_string = '123abc456789abc123ABC.'

pattern = re.compile(r'.') # matches any character except newline
# pattern = re.compile(r'\.') # matches the period character
# pattern = re.compile(r'^123') # does the string start with 123?
# pattern = re.compile(r'^abc') # does the string start with abc?
# pattern = re.compile(r'123$') # does the string end with 123?
# pattern = re.compile(r'ABC\.$') # does the string end with ABC.?


matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object


<re.Match object; span=(0, 1), match='1'>
<re.Match object; span=(1, 2), match='2'>
<re.Match object; span=(2, 3), match='3'>
<re.Match object; span=(3, 4), match='a'>
<re.Match object; span=(4, 5), match='b'>
<re.Match object; span=(5, 6), match='c'>
<re.Match object; span=(6, 7), match='4'>
<re.Match object; span=(7, 8), match='5'>
<re.Match object; span=(8, 9), match='6'>
<re.Match object; span=(9, 10), match='7'>
<re.Match object; span=(10, 11), match='8'>
<re.Match object; span=(11, 12), match='9'>
<re.Match object; span=(12, 13), match='a'>
<re.Match object; span=(13, 14), match='b'>
<re.Match object; span=(14, 15), match='c'>
<re.Match object; span=(15, 16), match='1'>
<re.Match object; span=(16, 17), match='2'>
<re.Match object; span=(17, 18), match='3'>
<re.Match object; span=(18, 19), match='A'>
<re.Match object; span=(19, 20), match='B'>
<re.Match object; span=(20, 21), match='C'>
<re.Match object; span=(21, 22), match='.'>


In [8]:
import re

test_string = r'hello 123_ heyho $% honey'

# pattern = re.compile(r'\d') # matches any digit
# pattern = re.compile(r'\D') # match any non-digit
# pattern = re.compile(r'\s') # matches any whitespace character
# pattern = re.compile(r'\S') # matches any non-whitespace character
# pattern = re.compile(r'\w') # matches any alphanumeric character
# pattern = re.compile(r'\W') # matches any non-alphanumeric character
# pattern = re.compile(r'\bhello') # matches hello at the start of a word
# pattern = re.compile(r'\bhoney') # matches honey at the start of a word
# pattern = re.compile(r'\b23') # matches 23 at the start of a word (no match here: 23 follows the 1)
# pattern = re.compile(r'\Bhello') # matches hello when it is not at the start of a word
# pattern = re.compile(r'\Bhoney') # matches honey when it is not at the start of a word
pattern = re.compile(r'\B23') # matches 23 when it is not at the start of a word

matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object


<re.Match object; span=(7, 9), match='23'>
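
To make the difference between `\b` and `\B` easier to see, here is a small sketch that lists where each variant matches in a made-up string. The helper `spans` is hypothetical.

```python
import re

text = "cat concat cats bobcat"  # made-up sample text

def spans(pattern):
    """Hypothetical helper: return the span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]

print(spans(r"\bcat\b"))  # whole word only:             [(0, 3)]
print(spans(r"\bcat"))    # cat at the start of a word:  [(0, 3), (11, 14)]
print(spans(r"cat\b"))    # cat at the end of a word:    [(0, 3), (7, 10), (19, 22)]
print(spans(r"\Bcat"))    # cat NOT at a word start:     [(7, 10), (19, 22)]
```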


### Character Sets
A pattern between square brackets `[]` matches any single character in the set.

In [14]:
import re

test_string = r'hello 123_'

# pattern = re.compile(r'[lo]') # does the string contain l or o?
# pattern = re.compile(r'[a-z]') # does the string contain any lowercase letter?
# pattern = re.compile(r'[A-Z]') # does the string contain any uppercase letter?
# pattern = re.compile(r'[0-9]') # does the string contain any digit?
# pattern = re.compile(r'[a-zA-Z]') # does the string contain any letter?
pattern = re.compile(r'[a-zA-Z0-9]') # does the string contain any alphanumeric character?

matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object


<re.Match object; span=(0, 1), match='h'>
<re.Match object; span=(1, 2), match='e'>
<re.Match object; span=(2, 3), match='l'>
<re.Match object; span=(3, 4), match='l'>
<re.Match object; span=(4, 5), match='o'>
<re.Match object; span=(6, 7), match='1'>
<re.Match object; span=(7, 8), match='2'>
<re.Match object; span=(8, 9), match='3'>
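
A set can also be negated with a leading `^`, and most meta characters lose their special meaning inside the brackets. A brief sketch on the same kind of input (the second and third sample strings are made up):

```python
import re

test_string = "hello 123_"

# A leading ^ negates the set: anything that is NOT a lowercase letter.
not_lowercase = re.findall(r"[^a-z]", test_string)
# Shorthand classes such as \s can be used inside a set.
not_lowercase_or_space = re.findall(r"[^a-z\s]", test_string)
# Inside a set, '.', '$', and a non-leading '^' are ordinary characters.
literal_symbols = re.findall(r"[.^$]", "a.b^c$d")
# A '-' between two characters makes a range; escape it to match a literal hyphen.
literal_hyphen = re.findall(r"[a\-z]", "a-z b")

print(not_lowercase)           # [' ', '1', '2', '3', '_']
print(not_lowercase_or_space)  # ['1', '2', '3', '_']
print(literal_symbols)         # ['.', '^', '$']
print(literal_hyphen)          # ['a', '-', 'z']
```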


### Quantifiers
* `*` - 0 or more
* `+` - 1 or more
* `?` - 0 or 1
* `{n}` - exactly n
* `{n,}` - n or more
* `{,n}` - 0 to n
* `{m,n}` - m to n


In [23]:
import re

test_string = r'hello 123_456_7890'

# pattern = re.compile(r'\d*') # matches any number of digits
# pattern = re.compile(r'\d+') # matches one or more digits
# pattern = re.compile(r'\d?') # matches zero or one digits
# pattern = re.compile(r'\d{3}') # matches exactly three digits
# pattern = re.compile(r'\d{3,}') # matches three or more digits
# pattern = re.compile(r'\d{,3}') # matches zero to three digits
# pattern = re.compile(r'\d{3,5}') # matches three, four, or five digits
pattern = re.compile(r'_?\d') # matches an optional underscore followed by one digit


matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object

<re.Match object; span=(6, 7), match='1'>
<re.Match object; span=(7, 8), match='2'>
<re.Match object; span=(8, 9), match='3'>
<re.Match object; span=(9, 11), match='_4'>
<re.Match object; span=(11, 12), match='5'>
<re.Match object; span=(12, 13), match='6'>
<re.Match object; span=(13, 15), match='_7'>
<re.Match object; span=(15, 16), match='8'>
<re.Match object; span=(16, 17), match='9'>
<re.Match object; span=(17, 18), match='0'>


In [27]:
import re

test_string = """
2022-01-01
2022-01-02
2022-01-03
2022-01-04
2022-01-05
"""

# pattern = re.compile(r'\d{4}[-/]\d{2}[-/]\d{2}') # matches a date in the format yyyy-mm-dd or yyyy/mm/dd
pattern = re.compile(r'\d{4}[-/]\d{2}[-/]0[23]') # matches a date in the format yyyy-mm-dd or yyyy/mm/dd where the day is either 02 or 03

matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object


<re.Match object; span=(12, 22), match='2022-01-02'>
<re.Match object; span=(23, 33), match='2022-01-03'>


### Conditions

* `|` - or 

In [32]:
import re

test_string = """
hello world
123
2022-01-01
Mr Simpson
Mrs Simpson
Mr. Brown
Ms Smith
Mr. T
"""

pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s\w+') # matches a title followed by a space followed by a name

matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object

<re.Match object; span=(28, 38), match='Mr Simpson'>
<re.Match object; span=(39, 50), match='Mrs Simpson'>
<re.Match object; span=(51, 60), match='Mr. Brown'>
<re.Match object; span=(61, 69), match='Ms Smith'>
<re.Match object; span=(70, 75), match='Mr. T'>
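
Alternatives are tried from left to right, and the first one that lets the overall pattern succeed wins. The sketch below shows why `Mrs Simpson` is still matched in full above even though `Mr` is listed before `Mrs`: the `\s` that follows forces the engine to back up and try the longer alternative.

```python
import re

name = "Mrs Simpson"

# On its own, the first alternative that matches wins, even if a longer one exists.
short_first = re.search(r"Mr|Mrs", name).group()
long_first = re.search(r"Mrs|Mr", name).group()
# With \s after the group, "Mr" fails (an "s" follows, not whitespace),
# so the engine backtracks and tries "Mrs".
with_context = re.search(r"(Mr|Mrs)\s", name).group()

print(repr(short_first))   # 'Mr'
print(repr(long_first))    # 'Mrs'
print(repr(with_context))  # 'Mrs '
```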


### Grouping
* `()` - group
* `[ ]` - character set
* `[^ ]` - negated character set

In [34]:
import re

test_string = """
hello world
123
2022-01-01
Mr Simpson
Mrs Simpson
Mr. Brown
Ms Smith
Mr. T
pythonengineer@gmail.com
Python-Engineer@gmx.de
PythonEndgineer123@my-domain.org
"""

pattern = re.compile(r'([a-zA-Z0-9-]+)@([a-zA-Z-]+)\.(com|de|org|net|gov|edu)') # matches an email address

matches = pattern.finditer(test_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object
    print(match.group(0)) # print the entire match
    print(match.group(1)) # print the first group
    print(match.group(2)) # print the second group
    print(match.group(3)) # print the third group

<re.Match object; span=(76, 100), match='pythonengineer@gmail.com'>
pythonengineer@gmail.com
pythonengineer
gmail
com
<re.Match object; span=(101, 123), match='Python-Engineer@gmx.de'>
Python-Engineer@gmx.de
Python-Engineer
gmx
de
<re.Match object; span=(124, 156), match='PythonEndgineer123@my-domain.org'>
PythonEndgineer123@my-domain.org
PythonEndgineer123
my-domain
org
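
Numbered groups get harder to follow as a pattern grows. One option is to name the groups with `(?P<name>...)`. The sketch below is a variation on the email pattern above; the group names `user`, `domain`, and `tld` and the sample text are made up for illustration.

```python
import re

pattern = re.compile(
    r"(?P<user>[a-zA-Z0-9-]+)@(?P<domain>[a-zA-Z-]+)\.(?P<tld>com|de|org)"
)

match = pattern.search("contact: Python-Engineer@gmx.de")  # made-up sample text

all_groups = match.groups()   # every group as a tuple
user = match.group("user")    # look a group up by name
by_name = match.groupdict()   # all named groups as a dict

print(all_groups)  # ('Python-Engineer', 'gmx', 'de')
print(user)        # Python-Engineer
print(by_name)     # {'user': 'Python-Engineer', 'domain': 'gmx', 'tld': 'de'}
```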


### Modification

* `split()` - split a string into a list at each match of the pattern
* `sub()` - replace each match of the pattern with another string

#### Splitting

In [37]:
import re

test_string = r"123abc456789abc123ABC"

pattern = re.compile(r'abc') # matches the string abc

splitted = pattern.split(test_string) # split the string at each match of the pattern

print(splitted) # print the list of strings


['123', '456789', '123ABC']
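
`re.split()` has a few options worth knowing. A short sketch on the same test string:

```python
import re

test_string = "123abc456789abc123ABC"

# maxsplit limits how many splits are made.
first_only = re.split(r"abc", test_string, maxsplit=1)
# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split(r"(abc)", test_string)
# With IGNORECASE the trailing "ABC" is a separator too, which leaves an
# empty string at the end of the list.
ignore_case = re.split(r"abc", test_string, flags=re.IGNORECASE)

print(first_only)       # ['123', '456789abc123ABC']
print(with_separators)  # ['123', 'abc', '456789', 'abc', '123ABC']
print(ignore_case)      # ['123', '456789', '123', '']
```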


#### Substituting

In [39]:
import re

test_string = r"hello world, you are the best world"

pattern = re.compile(r'world') # matches the string world

subbed = pattern.sub('planet', test_string) # replace each match of the pattern with the string planet

print(subbed) # print the modified string

hello planet, you are the best planet
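
`sub()` can also limit the number of replacements, report how many it made, or compute each replacement with a function. A minimal sketch on the same string:

```python
import re

test_string = "hello world, you are the best world"

# count limits the number of replacements.
first_only = re.sub(r"world", "planet", test_string, count=1)
# subn() returns the new string together with the number of replacements made.
new_string, n_replaced = re.subn(r"world", "planet", test_string)
# The replacement can be a function that receives each match object.
shouted = re.sub(r"world", lambda m: m.group().upper(), test_string)

print(first_only)              # hello planet, you are the best world
print(new_string, n_replaced)  # hello planet, you are the best planet 2
print(shouted)                 # hello WORLD, you are the best WORLD
```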


#### Splitting and Substituting

In [57]:
import re

urls = """
hello world
2022-01-01
(123) 456-7890
https://www.google.com
http://www.python-engineer.com
http://python-engineer.com
https://www.pyeng.net
"""

pattern = re.compile(r'https?://(www\.)?([\w-]+)\.(\w+)') # matches a url

matches = pattern.finditer(urls) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object
    
    subbed_url = pattern.sub(r'\2.\3', match.group(0)) # replace the entire match with the second and third groups
    print(subbed_url) # print the modified string

    print('\n')



<re.Match object; span=(39, 61), match='https://www.google.com'>
google.com


<re.Match object; span=(62, 92), match='http://www.python-engineer.com'>
python-engineer.com


<re.Match object; span=(93, 119), match='http://python-engineer.com'>
python-engineer.com


<re.Match object; span=(120, 141), match='https://www.pyeng.net'>
pyeng.net




### Compilation Flags

* `ASCII, A` - make `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s` and `\S` perform ASCII-only matching
* `DOTALL, S` - make `.` match any character, including a newline
* `IGNORECASE, I` - perform case-insensitive matching
* `LOCALE, L` - make `\w`, `\W`, `\b`, `\B` and case-insensitive matching depend on the current locale (bytes patterns only)
* `MULTILINE, M` - make `^` and `$` match at the beginning and end of each line, not just of the whole string
* `VERBOSE, X` - ignore whitespace and allow comments inside the pattern for more readable REs


In [58]:
import re

text_string = "Hello World"

pattern = re.compile(r'hello world', re.IGNORECASE) # matches the string hello world, ignoring case

matches = pattern.finditer(text_string) # create an iterator of match objects

for match in matches: # for each match object
    print(match) # print the match object
    

<re.Match object; span=(0, 11), match='Hello World'>
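
The other flags are passed the same way. Below is a small sketch of `MULTILINE`, `DOTALL`, and `VERBOSE` on made-up text; the variable names are illustrative only.

```python
import re

text = "first line\nsecond line"  # made-up two-line sample

# Without MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, . stops at the newline, so this finds nothing.
across_default = re.search(r"first.*second", text)
# With DOTALL, . can cross the newline.
across_dotall = re.search(r"first.*second", text, re.DOTALL).group()

# VERBOSE lets a pattern be spread over several lines with comments.
date = re.compile(r"""
    \d{4}   # year
    [-/]    # separator
    \d{2}   # month
    [-/]    # separator
    \d{2}   # day
""", re.VERBOSE)
dates = date.findall("2022-01-02 and 2022/01/03")

print(starts_default)       # ['first']
print(starts_multiline)     # ['first', 'second']
print(across_default)       # None
print(repr(across_dotall))  # 'first line\nsecond'
print(dates)                # ['2022-01-02', '2022/01/03']
```

Several flags can be combined with `|`, for example `re.MULTILINE | re.IGNORECASE`.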


## Practical Example
File name: `test_regex.txt`

In [68]:
import re
from pathlib import Path

file_path = Path(r'test_regex.txt') # create a Path object for the file

with open(file_path, 'r') as file: # open the file
    text_string = file.read() # read the entire file into a string

    for line in re.findall(r"From:.*", text_string): # for each line that matches the pattern
        print(line) # print the line

    matches = re.findall(r"From:.*", text_string) # create a list of all matches

    print('\n')

    for match in matches:
        print(re.findall(r'\".*\"', match)) # print the name in quotes
        print(re.findall(r'\w\S*@.*\w', match)) # print the email address
        
        for email in re.findall(r'\w\S*@.*\w', match):
            username, domain_name = re.split(r"@", email) # split the email address into a username and domain name
            print(f"{username}, {domain_name}") # print the username and domain name

        print('\n')

From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>
From: "PRINCE OBONG ELEME" <obong_715@epatra.com>


['"Mr. Ben Suleman"']
['bensul2004nng@spinfinder.com']
bensul2004nng, spinfinder.com


['"PRINCE OBONG ELEME"']
['obong_715@epatra.com']
obong_715, epatra.com
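
The pattern `\w\S*@.*\w` works on these lines, but its greedy `.*` can run past the address when more text follows it. As an alternative to regex (a different technique from the one used above), the standard library's `email.utils.parseaddr()` can split a header value into name and address. The second sample line below is made up to show the over-match.

```python
import re
from email.utils import parseaddr

line = 'From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>'

# Drop the "From:" label, then let parseaddr() separate name and address.
header_value = line.split(":", 1)[1]
name, address = parseaddr(header_value)
username, _, domain_name = address.partition("@")

print(name)                   # Mr. Ben Suleman
print(address)                # bensul2004nng@spinfinder.com
print(username, domain_name)  # bensul2004nng spinfinder.com

# The greedy regex over-matches when text follows the address (made-up line):
over_match = re.findall(r'\w\S*@.*\w', 'From: a@b.com (sent by c)')
print(over_match)             # ['a@b.com (sent by c']
```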




### Regex with Pandas

In [101]:
import re
import pandas as pd
from pathlib import Path
import email

emails = []

file_path = Path(r'test_regex.txt') # create a Path object for the file

with open(file_path, 'r') as file: # open the file
    text_string = file.read() # read the entire file into a string
    
    contents = re.split(r"From r", text_string) # split the string at each match of the pattern
    contents.pop(0) # remove the first element of the list, which is empty

    for item in contents:
        emails_dict = {}

        # find the sender's email address and name

        # step 1: find the whole line beginning with From:
        sender = re.search(r"From:.*", item)

        # step 2: find the email address and name
        if sender:
            sender_email = re.search(r'\w\S*@.*\w', sender.group(0)) # find the email address
            sender_name = re.search(r':.*<', sender.group(0)) # find the name, which is between the colon and the opening angle bracket

            # print("sender type:" + str(type(sender)))
            # print("sender.group() type:" + str(type(sender.group())))
            # print("sender:" + str(sender))
            # print("sender.group():" + str(sender.group()))
            # print("\n")
        else:
            sender_email = None
            sender_name = None

        # step 3A: assign email address as string to dictionary
        if sender_email:
            sender_email = sender_email.group(0) # assign the email address as a string
        else:
            sender_email = None
        
        emails_dict['sender_email'] = sender_email

        # step 3B: assign name as string to dictionary after removing unwanted substrings
        if sender_name:
            sender_name = re.sub(r"\s*<", "", re.sub(r":\s*", "", sender_name.group(0))) # remove the colon, the opening angle bracket, and any whitespace
        else:
            sender_name = None

        emails_dict['sender_name'] = sender_name # assign the name as a string

        # print(sender_email)
        # print(sender_name)

        # find the recipient's email address and name
        recipient = re.search(r"To:.*", item) # find the whole line beginning with To:

        if recipient:
            recipient_email = re.search(r'\w\S*@.*\w', recipient.group(0))
            recipient_name = re.search(r':.*<', recipient.group(0))
        else:
            recipient_email = None
            recipient_name = None

        if recipient_email:
            recipient_email = recipient_email.group(0)
        else:
            recipient_email = None

        emails_dict['recipient_email'] = recipient_email

        if recipient_name:
            recipient_name = re.sub(r"\s*<", "", re.sub(r":\s*", "", recipient_name.group(0)))
        else:
            recipient_name = None

        emails_dict['recipient_name'] = recipient_name

        # print(recipient_email)
        # print(recipient_name)

        # Get the date
        date_field = re.search(r"Date:.*", item) # find the whole line beginning with Date:

        if date_field:
            date = re.search(r"\d+\s\w+\s\d+", date_field.group(0))
            # print(date_field.group(0))
        else:
            date = None

        if date:
            date_sent = date.group(0) # assign the date as a string
        else:
            date_sent = None

        emails_dict['date_sent'] = date_sent

        # Get the subject

        subject_field = re.search(r"Subject:.*", item) # find the whole line beginning with Subject:

        if subject_field:
            subject = re.search(r"\s.*", subject_field.group(0)).group(0) # find the subject, which is everything after the colon
            # print(subject)
        else:
            subject = None

        emails_dict['subject'] = subject
        
        # Get the body
        full_email = email.message_from_string(item) # create an email object from the string
        body = full_email.get_payload() # get the body of the email
        # print(body)

        emails_dict['body'] = body

        # add the emails dict to the list of emails
        emails.append(emails_dict)

        # print('\n')

# print the number of dictionaries (and emails) in the list
print(f"Number of emails: {len(emails)}")
print('\n')

# # print the first item in the list
# for key, value in emails[0].items():
#     print(f"{key}: {value}")

df_emails = pd.DataFrame(emails) # create a DataFrame from the list of dictionaries
df_emails.head() # print the first five rows of the DataFrame


Number of emails: 30




| | sender_email | sender_name | recipient_email | recipient_name | date_sent | subject | body |
|---|---|---|---|---|---|---|---|
| 0 | james_ngola2002@maktoob.com | "MR. JAMES NGOLA." | james_ngola2002@maktoob.com | | 31 Oct 2002 | URGENT BUSINESS ASSISTANCE AND PARTNERSHIP | FROM:MR. JAMES NGOLA.\nCONFIDENTIAL TEL: 233-2... |
| 1 | bensul2004nng@spinfinder.com | "Mr. Ben Suleman" | R@M | | 31 Oct 2002 | URGENT ASSISTANCE /RELATIONSHIP (P) | Dear Friend,\n\nI am Mr. Ben Suleman a custom ... |
| 2 | obong_715@epatra.com | "PRINCE OBONG ELEME" | obong_715@epatra.com | | 31 Oct 2002 | GOOD DAY TO YOU | FROM HIS ROYAL MAJESTY (HRM) CROWN RULER OF EL... |
| 3 | obong_715@epatra.com | "PRINCE OBONG ELEME" | webmaster@aclweb.org | | 31 Oct 2002 | GOOD DAY TO YOU | FROM HIS ROYAL MAJESTY (HRM) CROWN RULER OF EL... |
| 4 | m_abacha03@www.com | "Maryam Abacha" | m_abacha03@www.com | | 1 Nov 2002 | I Need Your Assistance. | Dear sir, \n \nIt is with a heart full of hope... |


In [102]:
df_emails.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   sender_email     29 non-null     object
 1   sender_name      26 non-null     object
 2   recipient_email  30 non-null     object
 3   recipient_name   0 non-null      object
 4   date_sent        25 non-null     object
 5   subject          30 non-null     object
 6   body             30 non-null     object
dtypes: object(7)
memory usage: 1.8+ KB


In [103]:
# find records where the sender's email contains 'epatra' or 'spinfinder'
df_emails.loc[df_emails["sender_email"].str.contains("epatra|spinfinder", na=False)]

| | sender_email | sender_name | recipient_email | recipient_name | date_sent | subject | body |
|---|---|---|---|---|---|---|---|
| 1 | bensul2004nng@spinfinder.com | "Mr. Ben Suleman" | R@M | | 31 Oct 2002 | URGENT ASSISTANCE /RELATIONSHIP (P) | Dear Friend,\n\nI am Mr. Ben Suleman a custom ... |
| 2 | obong_715@epatra.com | "PRINCE OBONG ELEME" | obong_715@epatra.com | | 31 Oct 2002 | GOOD DAY TO YOU | FROM HIS ROYAL MAJESTY (HRM) CROWN RULER OF EL... |
| 3 | obong_715@epatra.com | "PRINCE OBONG ELEME" | webmaster@aclweb.org | | 31 Oct 2002 | GOOD DAY TO YOU | FROM HIS ROYAL MAJESTY (HRM) CROWN RULER OF EL... |
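
The boolean mask produced by `str.contains()` can also select a column directly in one `.loc` call, and `str.extract()` can pull out part of each value with a capture group. The sketch below uses a small made-up DataFrame as a stand-in for `df_emails`; the addresses and bodies are placeholders, not data from the corpus.

```python
import pandas as pd

# Made-up stand-in for df_emails; every value here is a placeholder.
df = pd.DataFrame({
    "sender_email": ["ann@spinfinder.com", None, "bob@epatra.com", "cy@example.org"],
    "body": ["first body", "second body", "third body", "fourth body"],
})

# str.contains() treats the pattern as a regex by default; na=False makes
# missing addresses count as "no match" instead of producing a missing value.
mask = df["sender_email"].str.contains(r"@spinfinder\.com$", na=False)

# Use the boolean mask to pick rows and a column in a single .loc call.
bodies = df.loc[mask, "body"].tolist()
print(bodies)  # ['first body']

# str.extract() returns the first capture group; expand=False gives a Series.
domains = df["sender_email"].str.extract(r"@([\w.-]+)$", expand=False)
print(domains.dropna().tolist())  # ['spinfinder.com', 'epatra.com', 'example.org']
```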


In [116]:
# Step 1: find the index of the rows where the "sender_email" column contains @spinfinder.com
index = df_emails.loc[df_emails["sender_email"].str.contains(r"\w\S*@spinfinder\.com", na=False)].index.values # the dot is escaped so it matches a literal period

# Step 2: use the index to find the value of the cell in the "sender_email" column.
# the result is returned as a pandas series object
srs_address = df_emails.loc[index]["sender_email"]
print(srs_address)

# Step 3: extract the email address from the series object as a string
str_address = srs_address.to_string(index=False)
print(str_address)

# Step 4: find the value of the "body" column where the "sender_email" column = str_address
str_email = df_emails.loc[df_emails["sender_email"] == str_address].body.values
print(str_email)

1    bensul2004nng@spinfinder.com
Name: sender_email, dtype: object
bensul2004nng@spinfinder.com
['Dear Friend,\n\nI am Mr. Ben Suleman a custom officer and work as Assistant controller of the Customs and Excise department Of the Federal Ministry of Internal Affairs stationed at the Murtala Mohammed International Airport, Ikeja, Lagos-Nigeria.\n\nAfter the sudden death of the former Head of state of Nigeria General Sanni Abacha on June 8th 1998 his aides and immediate members of his family were arrested while trying to escape from Nigeria in a Chartered jet to Saudi Arabia with 6 trunk boxes Marked "Diplomatic Baggage". Acting on a tip-off as they attempted to board the Air Craft,my officials carried out a thorough search on the air craft and discovered that the 6 trunk boxes contained foreign currencies amounting to US$197,570,000.00(One Hundred and  Ninety-Seven Million Five Hundred Seventy Thousand United States Dollars).\n\nI declared only (5) five boxes to the government and withh