# Replacing <i>For</i> Loops With List Comprehensions
This notebook is based on this [tutorial](https://www.dataquest.io/blog/regular-expressions-data-scientists/?imm_mid=0f9342&cmp=em-data-na-na-newsltr_20171206). Here, the tutorial's <i>for</i> loops are replaced by list comprehensions.

The dataset used in this notebook can be found in the original tutorial [here](https://www.dataquest.io/blog/images/regex/test_emails.txt). It contains only a small sample of emails from the original Kaggle fraudulent email corpus, which can be found [here](https://www.kaggle.com/rtatman/fraudulent-email-corpus).

First, open the test emails file, and read it into a variable.

In [1]:
with open(r"./datasets/test_emails.txt", encoding="utf-8") as f:
    fh = f.read()  # fh holds the whole file as one string; the file is closed on exit

In the tutorial, to start with, a for loop is used to find lines with the text "From:" without using regular expressions:

In [3]:
# code from the tutorial
for line in fh.split("\n"):
    if "From:" in line:
        print(line)

From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>
From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>
From: "PRINCE OBONG ELEME" <obong_715@epatra.com>
From: "PRINCE OBONG ELEME" <obong_715@epatra.com>
From: "Maryam Abacha" <m_abacha03@www.com>


The above code can be rewritten as a list comprehension, which returns the matching lines as a list instead of printing them:

In [4]:
[line for line in fh.split("\n") if "From:" in line]

['From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>',
 'From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>',
 'From: "PRINCE OBONG ELEME" <obong_715@epatra.com>',
 'From: "PRINCE OBONG ELEME" <obong_715@epatra.com>',
 'From: "Maryam Abacha" <m_abacha03@www.com>']
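
The correspondence between the two forms can be checked directly. The following is a minimal sketch, assuming a short inline `sample` string as a hypothetical stand-in for `fh`; its "From:" lines are copied from the output above and the other line is made-up filler:

```python
# Hypothetical stand-in for the contents of test_emails.txt
sample = (
    "Subject: (hypothetical filler line)\n"
    'From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>\n'
    'From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>\n'
)

# Loop form: collect matching lines instead of printing them
loop_result = []
for line in sample.split("\n"):
    if "From:" in line:
        loop_result.append(line)

# Comprehension form: same filter, same order
comp_result = [line for line in sample.split("\n") if "From:" in line]

print(loop_result == comp_result)
```

The general pattern is that an `append` inside a filtered loop maps onto `[item for item in iterable if condition]`.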

Next, the tutorial finds lines with the text "From:" using a regular expression:

In [5]:
import re

for line in re.findall("From:.*", fh):
    print(line)

From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>
From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>
From: "PRINCE OBONG ELEME" <obong_715@epatra.com>
From: "PRINCE OBONG ELEME" <obong_715@epatra.com>
From: "Maryam Abacha" <m_abacha03@www.com>


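With a list comprehension, this becomes the following. Note that the comprehension returns each whole matching line, whereas `re.findall` on the full text returns only the part of the line from "From:" onward; for this dataset the results are identical.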

In [6]:
[line for line in fh.split("\n") if re.findall("From:.*", line)]

['From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>',
 'From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>',
 'From: "PRINCE OBONG ELEME" <obong_715@epatra.com>',
 'From: "PRINCE OBONG ELEME" <obong_715@epatra.com>',
 'From: "Maryam Abacha" <m_abacha03@www.com>']
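
Because the condition only needs to know whether a line matches, `re.search` states the intent a little more directly than `re.findall`. A minimal sketch, again assuming a hypothetical inline `sample` string in place of `fh` (the "From:" lines come from the output above, the filler line is made up):

```python
import re

# Hypothetical stand-in for the contents of test_emails.txt
sample = (
    "Subject: (hypothetical filler line)\n"
    'From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>\n'
    'From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>\n'
)

# re.search returns a match object (truthy) or None (falsy)
from_lines = [line for line in sample.split("\n") if re.search("From:", line)]

# re.findall over the whole text already returns a list, so no loop is needed
from_matches = re.findall("From:.*", sample)

print(from_lines == from_matches)
```

The two lists agree here only because every "From:" in the sample sits at the start of its line.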

Instead of printing every whole line that contains the string "From:", the tutorial shows how to extract just the names of the senders:

In [7]:
match = re.findall("From:.*", fh)
for line in match:
    print(re.findall("\".*\"", line))

['"MR. JAMES NGOLA."']
['"Mr. Ben Suleman"']
['"PRINCE OBONG ELEME"']
['"PRINCE OBONG ELEME"']
['"Maryam Abacha"']


This can be re-written as a list comprehension:

In [8]:
# extract the quoted sender name from each "From:" line
[re.findall("\".*\"", name) for name in [line for line in fh.split("\n") if re.findall("From:.*", line)]]

[['"MR. JAMES NGOLA."'],
 ['"Mr. Ben Suleman"'],
 ['"PRINCE OBONG ELEME"'],
 ['"PRINCE OBONG ELEME"'],
 ['"Maryam Abacha"']]
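
The result above is a list of one-element lists, with the quotation marks still attached. As a variation not taken from the tutorial, a comprehension with two `for` clauses flattens the result, and a capture group drops the quotes. This sketch assumes a hypothetical inline `sample` string in place of `fh`:

```python
import re

# Hypothetical stand-in for the contents of test_emails.txt
sample = (
    'From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>\n'
    'From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>\n'
)

# With one capture group, re.findall returns only the text inside the group.
# The second for clause iterates over each line's matches, flattening the result.
names = [
    name
    for line in re.findall("From:.*", sample)
    for name in re.findall(r'"(.*)"', line)
]

print(names)
```

For this sample, `names` is a flat list of two strings without the surrounding quotes.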

Email addresses are extracted next in the tutorial:

In [9]:
match = re.findall("From:.*", fh)

for line in match:
    print(re.findall(r"\w\S*@.*\w", line))  # raw string avoids invalid-escape warnings

['james_ngola2002@maktoob.com']
['bensul2004nng@spinfinder.com']
['obong_715@epatra.com']
['obong_715@epatra.com']
['m_abacha03@www.com']


With a list comprehension, this becomes:

In [11]:
[re.findall(r"\w\S*@.*\w", email) for email in [line for line in fh.split("\n") if re.findall("From:.*", line)]]

[['james_ngola2002@maktoob.com'],
 ['bensul2004nng@spinfinder.com'],
 ['obong_715@epatra.com'],
 ['obong_715@epatra.com'],
 ['m_abacha03@www.com']]

In the tutorial, a nested for loop extracts the username and domain name from each email address:

In [13]:
address = re.findall("From:.*", fh)
for item in address:
    for line in re.findall(r"\w\S*@.*\w", item):
        username, domain_name = re.split("@", line)
        print("{}, {}".format(username, domain_name))

james_ngola2002, maktoob.com
bensul2004nng, spinfinder.com
obong_715, epatra.com
obong_715, epatra.com
m_abacha03, www.com


A list comprehension one-liner can accomplish the same thing, but it comes at a price in readability:

In [22]:
[re.split("@", domain[0]) for domain in [re.findall(r"\w\S*@.*\w", email) for email in [line for line in fh.split("\n") if re.findall("From:.*", line)]]]

[['james_ngola2002', 'maktoob.com'],
 ['bensul2004nng', 'spinfinder.com'],
 ['obong_715', 'epatra.com'],
 ['obong_715', 'epatra.com'],
 ['m_abacha03', 'www.com']]
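
One way to win back some readability, sketched here as a suggestion and not something the tutorial does, is to give each stage its own name and keep every comprehension to a single job. The `sample` string is a hypothetical stand-in for `fh`:

```python
import re

# Hypothetical stand-in for the contents of test_emails.txt
sample = (
    'From: "MR. JAMES NGOLA." <james_ngola2002@maktoob.com>\n'
    'From: "Mr. Ben Suleman" <bensul2004nng@spinfinder.com>\n'
)

# Stage 1: the "From:" lines
from_lines = re.findall("From:.*", sample)

# Stage 2: every address found on those lines, flattened into one list
addresses = [
    address
    for line in from_lines
    for address in re.findall(r"\w\S*@.*\w", line)
]

# Stage 3: split each address into [username, domain_name]
pairs = [address.split("@") for address in addresses]

print(pairs)
```

Each intermediate list can be inspected on its own, which makes a faulty pattern easier to track down than in the nested one-liner.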

Comprehensions are one of the features I love most about Python, but as the last example shows, there can be too much of a good thing.