# Identify Fraud from Enron Email
## Part 2: Predict POIs using email messages (in progress)

In this project, I will use the actual email messages to predict persons of interest (POIs), the people who were involved in the fraud.

In [159]:
import numpy as np
import pandas as pd
import email
import string

## Datasets

I will use two files. Here are the file names and their short descriptions.

- emails.csv

I obtained the dataset from https://www.kaggle.com/wcukierski/enron-email-dataset, but there are other sources in different formats (e.g., https://www.cs.cmu.edu/~./enron/). The data was originally obtained by the Federal Energy Regulatory Commission during its investigation of Enron's collapse. It contains emails generated by Enron employees.

- poi_names_modified.txt

This file contains the names of 35 POIs, people who were involved in the fraud. I obtained it from a Udacity course.

In [3]:
emails = pd.read_csv('emails.csv')

In [4]:
emails.describe()

| | file | message |
|---|---|---|
| count | 517401 | 517401 |
| unique | 517401 | 517401 |
| top | crandell-s/sent_items/146. | Message-ID: <15876203.1075858020099.JavaMail.e... |
| freq | 1 | 1 |


The data has 517,401 rows, each with two columns, file and message. Every value in both columns is unique.

In [198]:
emails.head()

| | file | message |
|---|---|---|
| 0 | allen-p/_sent_mail/1. | Message-ID: <18782981.1075855378110.JavaMail.e... |
| 1 | allen-p/_sent_mail/10. | Message-ID: <15464986.1075855378456.JavaMail.e... |
| 2 | allen-p/_sent_mail/100. | Message-ID: <24216240.1075855687451.JavaMail.e... |
| 3 | allen-p/_sent_mail/1000. | Message-ID: <13505866.1075863688222.JavaMail.e... |
| 4 | allen-p/_sent_mail/1001. | Message-ID: <30922949.1075863688243.JavaMail.e... |


In [200]:
print(emails['file'][0])

allen-p/_sent_mail/1.


The 'file' column holds a file path for each email message. The path starts with what appears to be the sender's last name and first initial (e.g., allen-p), followed by a folder name and a number. This last-name/first-initial combination could be useful for identifying senders.
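The path can be split into its three components. This is a minimal sketch: parse_file_path is a hypothetical helper, and it assumes every value has the form `<id>/<folder>/<number>.` like the rows shown above (folders are joined back together in case some are nested).

```python
def parse_file_path(path):
    """Split a value such as 'allen-p/_sent_mail/1.' into (mailbox id, folder, number)."""
    parts = path.split('/')
    mailbox_id = parts[0]
    folder = '/'.join(parts[1:-1])  # keep any nested folder names together
    number = parts[-1].rstrip('.')  # file names end with a trailing dot
    return mailbox_id, folder, number

print(parse_file_path('allen-p/_sent_mail/1.'))       # ('allen-p', '_sent_mail', '1')
print(parse_file_path('crandell-s/sent_items/146.'))  # ('crandell-s', 'sent_items', '146')
```

On the full column, `emails['file'].map(parse_file_path)` would return one tuple per row.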

In [209]:
print(emails['message'][5000])

Message-ID: <32259460.1075852688586.JavaMail.evans@thyme>
Date: Fri, 5 Oct 2001 01:31:17 -0700 (PDT)
From: jennifer.fraser@enron.com
To: john.arnold@enron.com
Subject: RE: right about now dont u think u otta sell some calls against yr
 36.88s
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Fraser, Jennifer </O=ENRON/OU=NA/CN=RECIPIENTS/CN=JFRASER>
X-To: Arnold, John </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Jarnold>
X-cc: 
X-bcc: 
X-Folder: \JARNOLD (Non-Privileged)\Arnold, John\Deleted Items
X-Origin: Arnold-J
X-FileName: JARNOLD (Non-Privileged).pst

becuase we are overvalued .... jan01 37.50 2.90 bid

 -----Original Message-----
From: 	Arnold, John  
Sent:	Thursday, October 04, 2001 10:25 PM
To:	Fraser, Jennifer
Subject:	RE: right about now dont u think u otta sell some calls against yr 36.88s

because we're $10 off the lows or because you think we're overvalued?

 -----Original Message-----
From: 	Fraser, Jennifer  
Sent:	Thursday, Octobe

Each email message consists of a header made of key-value pairs, followed by the body text. The body can also contain quoted original or forwarded messages.
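A minimal sketch of separating the header from the body with the standard-library email module, and of cutting the body at the first quoted message. The sample message and its example.com addresses are made up, and splitting on '-----Original Message-----' is only a heuristic: forwarded messages use other markers, such as '---------------------- Forwarded by'.

```python
import email

RAW = """Message-ID: <1.test>
From: a@example.com
To: b@example.com
Subject: RE: test

Latest reply text.

 -----Original Message-----
From: b@example.com
Sent: Thursday

Earlier message text.
"""

msg = email.message_from_string(RAW)
print(msg['From'], '|', msg['Subject'])  # header fields are accessed like dictionary keys

body = msg.get_payload()  # a plain string for a non-multipart message
# Keep only the text written before the first quoted message (heuristic, not a full parser)
latest = body.split('-----Original Message-----')[0].strip()
print(latest)  # Latest reply text.
```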

In [231]:
# Get POI's full names
poi_names = pd.read_csv('poi_names_modified.txt', sep=" ", header=None)
poi_names.columns = ['YesNo','Last_Name','First_Name']
poi_names

| | YesNo | Last_Name | First_Name |
|---|---|---|---|
| 0 | (y) | Lay, | Kenneth |
| 1 | (y) | Skilling, | Jeffrey |
| 2 | (n) | Howard, | Kevin |
| 3 | (n) | Krautz, | Michael |
| 4 | (n) | Yeager, | Scott |
| 5 | (n) | Hirko, | Joseph |
| 6 | (n) | Shelby, | Rex |
| 7 | (n) | Bermingham, | David |
| 8 | (n) | Darby, | Giles |
| 9 | (n) | Mulgrew, | Gary |


In [218]:
poi_names.describe()

| | YesNo | Last_Name | First_Name |
|---|---|---|---|
| count | 35 | 35 | 35 |
| unique | 2 | 34 | 27 |
| top | (n) | Fastow, | David |
| freq | 31 | 2 | 3 |


The file lists 35 POIs (Last_Name has only 34 unique values because two POIs share the last name Fastow). According to a lecture in the Udacity course, some of them were not Enron employees. I will find out which POIs also appear in the email data. I do not yet know what the first column (YesNo) means, but I will try to find out. I will clean up this table in the cleaning section.

## Cleaning and Exploring

emails.csv needs several cleaning steps before it is ready for POI prediction.

In [227]:
# Check message_from_string() (It returns a message object structure from a string)
print(type(emails['message'][0]))
print(type(email.message_from_string(emails['message'][0])))

<class 'str'>
<class 'email.message.Message'>


In [221]:
#test_msg = email.message_from_string(emails['message'][1])
#test_msg.keys()                  

In [229]:
# Check walk()
email.message_from_string(emails['message'][1]).walk()

<generator object walk at 0x000001C19FFCE678>

In [223]:
# https://docs.python.org/2/library/email.message.html
#for part in test_msg.walk():
#     print(part.get_content_type())  

In [224]:
#for part in test_msg.walk():
#    if part.get_content_type()=='text/plain':
#        print("Part START \n", part.get_payload(), "\nPart END")
#test1= part.get_payload()

In [225]:
#parts = []
#for part in test_msg.walk():
#    if part.get_content_type()=='text/plain':
#        parts.append(part.get_payload()) 
#print(''.join(parts))
#test2 = ''.join(parts)

In [226]:
#test1==test2

In [232]:
counts=[]
for i in range(emails.shape[0]):
    test_msg = email.message_from_string(emails['message'][i])
    count = 0
    for part in test_msg.walk():
        if part.get_content_type()=='text/plain':
            count +=1
    counts.append(count)
sum(counts)

517401

In [233]:
sum(counts)==emails.shape[0]

True

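The total number of 'text/plain' parts equals the number of messages, which suggests that each message has exactly one 'text/plain' part. (Strictly, equal totals alone do not rule out some messages having none and others having two.)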
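Equal totals are consistent with one 'text/plain' part per message, but a message with no plain-text part and another with two would also balance out. The sketch below builds two toy messages with the standard-library email.mime classes to show this case, and uses collections.Counter for a stricter check; count_plain_parts is a hypothetical helper. On the real data, Counter(counts) on the counts list from the cell above would show directly whether every message has exactly one part.

```python
from collections import Counter
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def count_plain_parts(message):
    """Number of 'text/plain' parts in a message, including parts nested in a multipart."""
    return sum(1 for part in message.walk() if part.get_content_type() == 'text/plain')

# Toy messages: one HTML-only, one multipart with two plain-text parts
html_only = MIMEText('<p>hello</p>', 'html')
two_plain = MIMEMultipart()
two_plain.attach(MIMEText('first part'))
two_plain.attach(MIMEText('second part'))

counts = [count_plain_parts(m) for m in (html_only, two_plain)]
print(sum(counts) == len(counts))  # True, although neither message has exactly one part
print(Counter(counts))             # Counter({0: 1, 2: 1})
print(set(counts) == {1})          # False: the stricter check catches the mismatch
```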

In [236]:
# how to get content from one email message
#test_msg = email.message_from_string(emails['message'][1])
#
#for part in test_msg.walk():
#    if part.get_content_type()=='text/plain':
#        print(part.get_payload()) 

In [235]:
# function extracting content from email message
def get_content_from_message(message):
    for part in message.walk():
        if part.get_content_type()== 'text/plain':
            return part.get_payload()

print(get_content_from_message(email.message_from_string(emails['message'][1])))

Traveling to have a business meeting takes the fun out of the trip.  Especially if you have to prepare a presentation.  I would suggest holding the business plan meetings here then take a trip without any formal business meetings.  I would even try and get some honest opinions on whether a trip is even desired or necessary.

As far as the business meetings, I think it would be more productive to try and stimulate discussions across the different groups about what is working and what is not.  Too often the presenter speaks and the others are quiet just waiting for their turn.   The meetings might be better if held in a round table discussion format.  

My suggestion for where to go is Austin.  Play golf and rent a ski boat and jet ski's.  Flying somewhere takes too much time.



In [238]:
# Get a list of message objects from strings
messages = list(map(email.message_from_string, emails['message']))
#messages[0].keys()

In [239]:
# Make a new data frame
enron_df = pd.DataFrame()
# make a content column from message objects
enron_df['content'] = list(map(get_content_from_message, messages))
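# Make one column per header field.
# The header keys are taken from the first message; message[key] returns None when a field is missing.
keys = messages[0].keys()
for key in keys:
    enron_df[key] = [message[key] for message in messages]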

In [240]:
# Get sender id from 'file' column
enron_df['sender_id'] = emails['file'].map(lambda x:x.split('/')[0])

In [394]:
enron_df.head(10)

Unnamed: 0,content,Message-ID,Date,From,To,Subject,Mime-Version,Content-Type,Content-Transfer-Encoding,X-From,X-To,X-cc,X-bcc,X-Folder,X-Origin,X-FileName,sender_id
0,Here is our forecast\n\n,<18782981.1075855378110.JavaMail.evans@thyme>,"Mon, 14 May 2001 16:39:00 -0700 (PDT)",phillip.allen@enron.com,tim.belden@enron.com,,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,Tim Belden <Tim Belden/Enron@EnronXGate>,,,"\Phillip_Allen_Jan2002_1\Allen, Phillip K.\'Se...",Allen-P,pallen (Non-Privileged).pst,allen-p
1,Traveling to have a business meeting takes the...,<15464986.1075855378456.JavaMail.evans@thyme>,"Fri, 4 May 2001 13:51:00 -0700 (PDT)",phillip.allen@enron.com,john.lavorato@enron.com,Re:,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,John J Lavorato <John J Lavorato/ENRON@enronXg...,,,"\Phillip_Allen_Jan2002_1\Allen, Phillip K.\'Se...",Allen-P,pallen (Non-Privileged).pst,allen-p
2,test successful. way to go!!!,<24216240.1075855687451.JavaMail.evans@thyme>,"Wed, 18 Oct 2000 03:00:00 -0700 (PDT)",phillip.allen@enron.com,leah.arsdall@enron.com,Re: test,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,Leah Van Arsdall,,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p
3,"Randy,\n\n Can you send me a schedule of the s...",<13505866.1075863688222.JavaMail.evans@thyme>,"Mon, 23 Oct 2000 06:13:00 -0700 (PDT)",phillip.allen@enron.com,randall.gay@enron.com,,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,Randall L Gay,,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p
4,Let's shoot for Tuesday at 11:45.,<30922949.1075863688243.JavaMail.evans@thyme>,"Thu, 31 Aug 2000 05:07:00 -0700 (PDT)",phillip.allen@enron.com,greg.piper@enron.com,Re: Hello,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,Greg Piper,,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p
5,"Greg,\n\n How about either next Tuesday or Thu...",<30965995.1075863688265.JavaMail.evans@thyme>,"Thu, 31 Aug 2000 04:17:00 -0700 (PDT)",phillip.allen@enron.com,greg.piper@enron.com,Re: Hello,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,Greg Piper,,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p
6,Please cc the following distribution list with...,<16254169.1075863688286.JavaMail.evans@thyme>,"Tue, 22 Aug 2000 07:44:00 -0700 (PDT)",phillip.allen@enron.com,"david.l.johnson@enron.com, john.shafer@enron.com",,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,"david.l.johnson@enron.com, John Shafer",,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p
7,any morning between 10 and 11:30,<17189699.1075863688308.JavaMail.evans@thyme>,"Fri, 14 Jul 2000 06:59:00 -0700 (PDT)",phillip.allen@enron.com,joyce.teixeira@enron.com,Re: PRC review - phone calls,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,Joyce Teixeira,,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p
8,1. login: pallen pw: ke9davis\n\n I don't thi...,<20641191.1075855687472.JavaMail.evans@thyme>,"Tue, 17 Oct 2000 02:26:00 -0700 (PDT)",phillip.allen@enron.com,mark.scott@enron.com,Re: High Speed Internet Access,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,Mark Scott,,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p
9,---------------------- Forwarded by Phillip K ...,<30795301.1075855687494.JavaMail.evans@thyme>,"Mon, 16 Oct 2000 06:44:00 -0700 (PDT)",phillip.allen@enron.com,zimam@enron.com,FW: fixed forward or other Collar floor gas pr...,1.0,text/plain; charset=us-ascii,7bit,Phillip K Allen,zimam@enron.com,,,\Phillip_Allen_Dec2000\Notes Folders\'sent mail,Allen-P,pallen.nsf,allen-p


In [146]:
enron_df.tail()

Unnamed: 0,content,Message-ID,Date,From,To,Subject,Mime-Version,Content-Type,Content-Transfer-Encoding,X-From,X-To,X-cc,X-bcc,X-Folder,X-Origin,X-FileName,sender_id
517396,This is a trade with OIL-SPEC-HEDGE-NG (John L...,<26807948.1075842029936.JavaMail.evans@thyme>,"Wed, 28 Nov 2001 13:30:11 -0800 (PST)",john.zufferli@enron.com,kori.loibl@enron.com,Trade with John Lavorato,1.0,text/plain; charset=us-ascii,7bit,"Zufferli, John </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Loibl, Kori </O=ENRON/OU=NA/CN=RECIPIENTS/CN=K...",,,"\ExMerge - Zufferli, John\Sent Items",ZUFFERLI-J,john zufferli 6-26-02.PST,zufferli-j
517397,Some of my position is with the Alberta Term b...,<25835861.1075842029959.JavaMail.evans@thyme>,"Wed, 28 Nov 2001 12:47:48 -0800 (PST)",john.zufferli@enron.com,john.lavorato@enron.com,Gas Hedges,1.0,text/plain; charset=us-ascii,7bit,"Zufferli, John </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Lavorato, John </O=ENRON/OU=NA/CN=RECIPIENTS/C...",,,"\ExMerge - Zufferli, John\Sent Items",ZUFFERLI-J,john zufferli 6-26-02.PST,zufferli-j
517398,2\n\n -----Original Message-----\nFrom: \tDouc...,<28979867.1075842029988.JavaMail.evans@thyme>,"Wed, 28 Nov 2001 07:20:00 -0800 (PST)",john.zufferli@enron.com,dawn.doucet@enron.com,RE: CONFIDENTIAL,1.0,text/plain; charset=us-ascii,7bit,"Zufferli, John </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Doucet, Dawn </O=ENRON/OU=NA/CN=RECIPIENTS/CN=...",,,"\ExMerge - Zufferli, John\Sent Items",ZUFFERLI-J,john zufferli 6-26-02.PST,zufferli-j
517399,Analyst\t\t\t\t\tRank\n\nStephane Brodeur\t\t\...,<22052556.1075842030013.JavaMail.evans@thyme>,"Tue, 27 Nov 2001 11:52:45 -0800 (PST)",john.zufferli@enron.com,jeanie.slone@enron.com,Calgary Analyst/Associate,1.0,text/plain; charset=us-ascii,7bit,"Zufferli, John </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Slone, Jeanie </O=ENRON/OU=NA/CN=RECIPIENTS/CN...",,,"\ExMerge - Zufferli, John\Sent Items",ZUFFERLI-J,john zufferli 6-26-02.PST,zufferli-j
517400,i think the YMCA has a class that is for peopl...,<28618979.1075842030037.JavaMail.evans@thyme>,"Mon, 26 Nov 2001 10:48:43 -0800 (PST)",john.zufferli@enron.com,livia_zufferli@monitor.com,RE: ali's essays,1.0,text/plain; charset=us-ascii,7bit,"Zufferli, John </O=ENRON/OU=NA/CN=RECIPIENTS/C...",'Livia_Zufferli@Monitor.com@ENRON',,,"\ExMerge - Zufferli, John\Sent Items",ZUFFERLI-J,john zufferli 6-26-02.PST,zufferli-j


In [151]:
# number of unique values in each column
[(col, enron_df[col].nunique()) for col in enron_df.columns]

[('content', 249025),
 ('Message-ID', 517401),
 ('Date', 224128),
 ('From', 20328),
 ('To', 58563),
 ('Subject', 159290),
 ('Mime-Version', 1),
 ('Content-Type', 2),
 ('Content-Transfer-Encoding', 3),
 ('X-From', 27980),
 ('X-To', 73552),
 ('X-cc', 33701),
 ('X-bcc', 132),
 ('X-Folder', 5335),
 ('X-Origin', 259),
 ('X-FileName', 429),
 ('sender_id', 150)]

'Mime-Version', 'Content-Type', and 'Content-Transfer-Encoding' have very few distinct values, so I will check what those values are.

In [426]:
[(col, set(enron_df[col])) for col in enron_df[['Mime-Version','Content-Type','Content-Transfer-Encoding']]]

[('Mime-Version', {'1.0', None}),
 ('Content-Type',
  {None,
   'text/plain; charset=ANSI_X3.4-1968',
   'text/plain; charset=us-ascii'}),
 ('Content-Transfer-Encoding', {'7bit', None, 'base64', 'quoted-printable'})]

These columns do not seem to be useful. I want to remove them, but I will keep them for now.

In [396]:
#print(set(enron_df['sender_id'])) #150 of them
set(enron_df['sender_id'])  # displayed as an alphabetized list

{'allen-p',
 'arnold-j',
 'arora-h',
 'badeer-r',
 'bailey-s',
 'bass-e',
 'baughman-d',
 'beck-s',
 'benson-r',
 'blair-l',
 'brawner-s',
 'buy-r',
 'campbell-l',
 'carson-m',
 'cash-m',
 'causholli-m',
 'corman-s',
 'crandell-s',
 'cuilla-m',
 'dasovich-j',
 'davis-d',
 'dean-c',
 'delainey-d',
 'derrick-j',
 'dickson-s',
 'donoho-l',
 'donohoe-t',
 'dorland-c',
 'ermis-f',
 'farmer-d',
 'fischer-m',
 'forney-j',
 'fossum-d',
 'gang-l',
 'gay-r',
 'geaccone-t',
 'germany-c',
 'gilbertsmith-d',
 'giron-d',
 'griffith-j',
 'grigsby-m',
 'guzman-m',
 'haedicke-m',
 'hain-m',
 'harris-s',
 'hayslett-r',
 'heard-m',
 'hendrickson-s',
 'hernandez-j',
 'hodge-j',
 'holst-k',
 'horton-s',
 'hyatt-k',
 'hyvl-d',
 'jones-t',
 'kaminski-v',
 'kean-s',
 'keavey-p',
 'keiser-k',
 'king-j',
 'kitchen-l',
 'kuykendall-t',
 'lavorato-j',
 'lay-k',
 'lenhart-m',
 'lewis-a',
 'linder-e',
 'lokay-m',
 'lokey-t',
 'love-p',
 'lucci-p',
 'maggi-m',
 'mann-k',
 'martin-t',
 'may-l',
 'mccarty-d',
 'mcconn

In [242]:
#print(set(enron_df['X-Origin'])) #259 of them

In [308]:
#print(set([str(item).lower() for item in enron_df['X-Origin']]))
set([str(item).lower() for item in enron_df['X-Origin']])

{'allen-p',
 'arnold-j',
 'arora-h',
 'badeer-r',
 'bailey-s',
 'bass-e',
 'baughman-d',
 'baughman-e',
 'beck-s',
 'benson-r',
 'blair-l',
 'brawner-s',
 'buy-r',
 'campbell-l',
 'carson-m',
 'cash-m',
 'causholli-m',
 'corman-s',
 'crandell-s',
 'cuilla-m',
 'dasovich-j',
 'davis-d',
 'dean-c',
 'delainey-d',
 'derrick-j',
 'dickson-s',
 'donoho-l',
 'donohoe-t',
 'dorland-c',
 'ermis-f',
 'farmer-d',
 'fischer-m',
 'forney-j',
 'fossum-d',
 'gang-l',
 'gay-r',
 'geaccone-t',
 'germany-c',
 'gilbertsmith-d',
 'giron-d',
 'griffith-j',
 'grigsby-m',
 'guzman-m',
 'haedicke-m',
 'hain-m',
 'harris-s',
 'hayslett-r',
 'heard-m',
 'hendrickson-s',
 'hernandez-j',
 'hodge-j',
 'holst-k',
 'horton-s',
 'hyatt-k',
 'hyvl-d',
 'jones-t',
 'kaminski-v',
 'kean-s',
 'keavey-p',
 'keiser-k',
 'king-j',
 'kitchen-l',
 'kuykendall-t',
 'lavorado-j',
 'lavorato-j',
 'lay-k',
 'lenhart-m',
 'lewis-a',
 'linder-e',
 'lokay-m',
 'lokey-t',
 'love-p',
 'lucci-p',
 'luchi-p',
 'maggi-m',
 'mann-k',
 'm

In [171]:
len(set([str(item).lower() for item in enron_df['X-Origin']]))

159

'sender_id' has fewer unique values (150) than the lowercased 'X-Origin' column (159), so it looks like the most promising column for identifying employees. 'X-Origin' mixes uppercase and lowercase ids and even seems to contain typos (e.g., 'zufferlie-j' for 'zufferli-j').

In [272]:
sender_id_set = set(enron_df['sender_id'])

In [311]:
X_Origin_set = set([str(item).lower() for item in enron_df['X-Origin']])

In [312]:
for person in X_Origin_set:
    if person not in sender_id_set:
        print(person)

williams-b
lavorado-j
none
baughman-e
weldon-v
mims-p
zufferlie-j
luchi-p
wheldon-c


Apart from 'none' (which comes from missing values), each of the ids that appear only in 'X-Origin' closely resembles an id found in 'sender_id' (e.g., 'lavorado-j' and 'lavorato-j', 'zufferlie-j' and 'zufferli-j'). These pairs most likely represent the same people.
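The mapping from X-Origin-only ids to their look-alikes in sender_id could be built with difflib from the standard library. This is a sketch under assumptions: map_to_known_ids is a hypothetical helper, the 0.8 similarity cutoff is an arbitrary choice, and every fuzzy match should still be checked by hand. The known list below is a small excerpt, not the full sender_id set.

```python
import difflib

def map_to_known_ids(unknown_ids, known_ids, cutoff=0.8):
    """Map each unknown id to its most similar known id, or None if nothing is close enough."""
    mapping = {}
    for uid in unknown_ids:
        if uid == 'none':  # produced by missing X-Origin values, not a real id
            mapping[uid] = None
            continue
        matches = difflib.get_close_matches(uid, known_ids, n=1, cutoff=cutoff)
        mapping[uid] = matches[0] if matches else None
    return mapping

known = ['lavorato-j', 'lucci-p', 'zufferli-j', 'lay-k', 'weldon-c']
unknown = ['lavorado-j', 'luchi-p', 'zufferlie-j', 'none']
result = map_to_known_ids(unknown, known)
print(result)
# {'lavorado-j': 'lavorato-j', 'luchi-p': 'lucci-p', 'zufferlie-j': 'zufferli-j', 'none': None}
```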

Now I will improve the poi_names data frame. There are three things to do:

- Make a Full_Name column
- Remove ',' from Last_Name
- Make an id column in the same format as sender_id in enron_df

In [250]:
poi_names.head()

| | YesNo | Last_Name | First_Name |
|---|---|---|---|
| 0 | (y) | Lay, | Kenneth |
| 1 | (y) | Skilling, | Jeffrey |
| 2 | (n) | Howard, | Kevin |
| 3 | (n) | Krautz, | Michael |
| 4 | (n) | Yeager, | Scott |


In [253]:
poi_names['Full_Name'] = poi_names[['Last_Name', 'First_Name']].apply(lambda x: ' '.join(x), axis=1)
poi_names.head()

| | YesNo | Last_Name | First_Name | Full_Name |
|---|---|---|---|---|
| 0 | (y) | Lay, | Kenneth | Lay, Kenneth |
| 1 | (y) | Skilling, | Jeffrey | Skilling, Jeffrey |
| 2 | (n) | Howard, | Kevin | Howard, Kevin |
| 3 | (n) | Krautz, | Michael | Krautz, Michael |
| 4 | (n) | Yeager, | Scott | Yeager, Scott |


In [264]:
# Make a function that removes the last character from a string
def remove_last(s):
    return s[:-1]

remove_last('abc')

'ab'

In [265]:
poi_names['Last_Name'] = poi_names['Last_Name'].map(remove_last)
poi_names.head()

| | YesNo | Last_Name | First_Name | Full_Name |
|---|---|---|---|---|
| 0 | (y) | Lay | Kenneth | Lay, Kenneth |
| 1 | (y) | Skilling | Jeffrey | Skilling, Jeffrey |
| 2 | (n) | Howard | Kevin | Howard, Kevin |
| 3 | (n) | Krautz | Michael | Krautz, Michael |
| 4 | (n) | Yeager | Scott | Yeager, Scott |


In [266]:
ids=[]
for i in range(len(poi_names)):
    ids.append(poi_names['Last_Name'][i].lower()+'-'+poi_names['First_Name'][i][0].lower())
print(ids)

['lay-k', 'skilling-j', 'howard-k', 'krautz-m', 'yeager-s', 'hirko-j', 'shelby-r', 'bermingham-d', 'darby-g', 'mulgrew-g', 'bayley-d', 'brown-j', 'furst-r', 'fuhs-w', 'causey-r', 'calger-c', 'despain-t', 'hannon-k', 'koenig-m', 'forney-j', 'rice-k', 'rieker-p', 'fastow-l', 'fastow-a', 'delainey-d', 'glisan-b', 'richter-j', 'lawyer-l', 'belden-t', 'kopper-m', 'duncan-d', 'bowen-r', 'colwell-w', 'boyle-d', 'loehr-c']
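The same cleaning steps can also be written with pandas string methods instead of a loop. This sketch uses a toy copy of the raw columns; str.rstrip(',') removes the comma only when it is present, whereas slicing off the last character would also cut a letter from any name that lacks the trailing comma.

```python
import pandas as pd

# Toy copy of the raw columns, with the trailing comma still on Last_Name
poi = pd.DataFrame({'Last_Name': ['Lay,', 'Skilling,', 'DeSpain,'],
                    'First_Name': ['Kenneth', 'Jeffrey', 'Timothy']})

last = poi['Last_Name'].str.rstrip(',')  # strip the comma only if it is there
poi['Full_Name'] = last + ', ' + poi['First_Name']
poi['id'] = last.str.lower() + '-' + poi['First_Name'].str[0].str.lower()
print(poi['id'].tolist())  # ['lay-k', 'skilling-j', 'despain-t']
```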


In [268]:
poi_names['id'] = ids
poi_names

| | YesNo | Last_Name | First_Name | Full_Name | id |
|---|---|---|---|---|---|
| 0 | (y) | Lay | Kenneth | Lay, Kenneth | lay-k |
| 1 | (y) | Skilling | Jeffrey | Skilling, Jeffrey | skilling-j |
| 2 | (n) | Howard | Kevin | Howard, Kevin | howard-k |
| 3 | (n) | Krautz | Michael | Krautz, Michael | krautz-m |
| 4 | (n) | Yeager | Scott | Yeager, Scott | yeager-s |
| 5 | (n) | Hirko | Joseph | Hirko, Joseph | hirko-j |
| 6 | (n) | Shelby | Rex | Shelby, Rex | shelby-r |
| 7 | (n) | Bermingham | David | Bermingham, David | bermingham-d |
| 8 | (n) | Darby | Giles | Darby, Giles | darby-g |
| 9 | (n) | Mulgrew | Gary | Mulgrew, Gary | mulgrew-g |


poi_names is now clean. Next, I will check which ids in poi_names appear in the set of sender_id values.

In [271]:
print(sender_id_set)

{'germany-c', 'richey-c', 'watson-k', 'horton-s', 'fossum-d', 'donohoe-t', 'townsend-j', 'holst-k', 'grigsby-m', 'mcconnell-m', 'blair-l', 'delainey-d', 'ermis-f', 'bass-e', 'cuilla-m', 'motley-m', 'rapp-b', 'whalley-l', 'dickson-s', 'sturm-f', 'griffith-j', 'rodrique-r', 'brawner-s', 'ruscitti-k', 'linder-e', 'baughman-d', 'gilbertsmith-d', 'ring-r', 'guzman-m', 'sanchez-m', 'tholt-j', 'benson-r', 'dean-c', 'quenet-j', 'skilling-j', 'ring-a', 'haedicke-m', 'davis-d', 'swerzbin-m', 'bailey-s', 'badeer-r', 'whitt-m', 'campbell-l', 'zipper-a', 'jones-t', 'semperger-c', 'smith-m', 'neal-s', 'derrick-j', 'heard-m', 'hain-m', 'kaminski-v', 'mann-k', 'panus-s', 'ybarbo-p', 'stokley-c', 'beck-s', 'hendrickson-s', 'sager-e', 'meyers-a', 'mckay-b', 'stepenovitch-j', 'storey-g', 'kuykendall-t', 'hyvl-d', 'taylor-m', 'lewis-a', 'mims-thurston-p', 'ward-k', 'maggi-m', 'weldon-c', 'king-j', 'lay-k', 'thomas-p', 'nemec-g', 'cash-m', 'presto-k', 'stclair-c', 'williams-j', 'symes-k', 'lokay-m', 'kitch

In [277]:
poi_with_emails=[]
for poi_id in poi_names['id']:
    if poi_id in sender_id_set:
        #print(poi_id, "is POI in the email data")
        poi_with_emails.append(poi_id)
print("POIs in the email dataset:\n", poi_with_emails)

POIs in the email dataset:
 ['lay-k', 'skilling-j', 'forney-j', 'delainey-d']


In [407]:
# Check how many emails are in enron_df for each of the 4 POIs
#count=0
#for poi in poi_with_emails:
#    print(poi, sum(enron_df['sender_id']==poi))
#    count += sum(enron_df['sender_id']==poi)
#print()    
#print(count, "Total emails (%.02f%% of all emails)" %(count/len(enron_df)*100))

Only 4 POIs appear in sender_id. This suggests what the 'YesNo' column in poi_names is for: (y) likely marks POIs who appear among the senders and (n) those who do not. The earlier describe() output supports this guess, since (n) appears 31 times out of 35, leaving exactly 4 (y) entries. This is somewhat disappointing; I expected the email dataset to contain emails from most of the POIs.
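The commented-out cell above counts emails for each of the 4 POIs with a loop. As a sketch, value_counts() followed by reindex() gives the same per-POI counts in one pass, with 0 for POIs that have no mailbox. The sender_id values below are made up for illustration.

```python
import pandas as pd

# Toy stand-in for enron_df['sender_id'] (hypothetical values)
sender_id = pd.Series(['lay-k', 'lay-k', 'skilling-j', 'allen-p', 'delainey-d'])
poi_ids = ['lay-k', 'skilling-j', 'forney-j', 'delainey-d', 'fastow-a']

# Count every id once, then keep only the POI ids (missing ones become 0)
per_poi = sender_id.value_counts().reindex(poi_ids, fill_value=0)
print(per_poi)

total = per_poi.sum()
print(f"{total} of {len(sender_id)} emails ({total / len(sender_id):.2%})")
```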

It is possible that not all sender ids were built this way, or that sender ids do not represent the real senders. I will investigate other columns to see whether any of them identifies senders better than sender_id:

- X-From
- From

These columns look complicated and mix several kinds of information, but I will now look at them more closely.

In [393]:
list(set(enron_df['X-From']))[:10]

['"Happ, Susan" <SHapp@caiso.com>',
 'Sue Haynes',
 'American Express <travel201_021+457924.86351074.2@1.americanexpress.com>',
 'Melissa Cortez <mcortez@govadv.com>',
 'rep@haysmcconn.com',
 '"Franson, Rob" <RFranson@czn.com>@ENRON',
 'Jennaro, Jason </O=ENRON/OU=NA/CN=RECIPIENTS/CN=NOTESADDR/CN=FEC32F48-E7F67158-86256AA9-71D75A>',
 'Allison Easton',
 'Herrera, Katherine </O=ENRON/OU=NA/CN=RECIPIENTS/CN=NOTESADDR/CN=C8BCC5E7-D1F7D1E5-86256866-70B621>',
 'Investinme <Investinme@ENRON>']

This column indeed mixes information with no systematic format: email addresses, full names, what look like internal Exchange directory paths (/O=ENRON/OU=NA/CN=...), and other strings I cannot identify. Looking through the column, I noticed 'Fastow, Andrew', one of the POIs who are not in the sender_id column, so I will check those rows.
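One way to tame this column is to split each value into a display name and an address. This is a minimal sketch using a regular expression of my own; split_x_from and ANGLE_RE are hypothetical names, and the pattern only covers the shapes seen in the sample above (an optional, possibly quoted name followed by `<...>`, a bare address, or a bare name).

```python
import re

# Hypothetical pattern: optional (possibly quoted) display name, then an address in angle brackets
ANGLE_RE = re.compile(r'^\s*"?(?P<name>[^"<]*?)"?\s*<(?P<addr>[^>]*)>')

def split_x_from(value):
    """Return (name, address); either part may be an empty string."""
    m = ANGLE_RE.match(value)
    if m:
        return m.group('name').strip(), m.group('addr').strip()
    if '@' in value and ' ' not in value.strip():
        return '', value.strip()   # bare email address
    return value.strip(), ''       # bare display name

print(split_x_from('"Happ, Susan" <SHapp@caiso.com>'))  # ('Happ, Susan', 'SHapp@caiso.com')
print(split_x_from('rep@haysmcconn.com'))               # ('', 'rep@haysmcconn.com')
print(split_x_from('Sue Haynes'))                       # ('Sue Haynes', '')
```

Anything after the closing `>` (such as a trailing '@ENRON') is ignored by this sketch.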

In [321]:
enron_df[enron_df['X-From']=='Fastow, Andrew </O=ENRON/OU=NA/CN=RECIPIENTS/CN=AFASTOW>']

Unnamed: 0,content,Message-ID,Date,From,To,Subject,Mime-Version,Content-Type,Content-Transfer-Encoding,X-From,X-To,X-cc,X-bcc,X-Folder,X-Origin,X-FileName,sender_id
271885,\n\n -----Original Message-----\nFrom: \tStabl...,<9603832.1075852804708.JavaMail.evans@thyme>,"Wed, 24 Oct 2001 13:02:54 -0700 (PDT)",andrew.fastow@enron.com,greg.whalley@enron.com,FW: Status CEG/CEG-Rio Transaction,1.0,text/plain; charset=us-ascii,7bit,"Fastow, Andrew </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Whalley, Greg </O=ENRON/OU=NA/CN=RECIPIENTS/CN...","Lay, Kenneth </O=ENRON/OU=NA/CN=RECIPIENTS/CN=...",,\KLAY (Non-Privileged)\Inbox,Lay-K,KLAY (Non-Privileged).pst,lay-k
443624,Jeff:\n\nI have a major problem with Ben Glisa...,<13977532.1075852653604.JavaMail.evans@thyme>,"Tue, 26 Jun 2001 15:37:11 -0700 (PDT)",andrew.fastow@enron.com,jeff.skilling@enron.com,FW: MD PRC Committee,1.0,text/plain; charset=us-ascii,7bit,"Fastow, Andrew </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Skilling, Jeff </O=ENRON/OU=NA/CN=RECIPIENTS/C...",,,\JSKILLIN (Non-Privileged)\Deleted Items,Skilling-J,JSKILLIN (Non-Privileged).pst,skilling-j
443741,Nothing to do except things like this when you...,<30020448.1075852656682.JavaMail.evans@thyme>,"Sun, 10 Jun 2001 18:01:24 -0700 (PDT)",andrew.fastow@enron.com,jeff.skilling@enron.com,Op Ed satire,1.0,text/plain; charset=us-ascii,7bit,"Fastow, Andrew </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Skilling, Jeff </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Kean, Steven </O=ENRON/OU=NA/CN=RECIPIENTS/CN=...",,\JSKILLIN (Non-Privileged)\Deleted Items,Skilling-J,JSKILLIN (Non-Privileged).pst,skilling-j
499763,\n\n -----Original Message-----\nFrom: \tStabl...,<2145624.1075852346226.JavaMail.evans@thyme>,"Wed, 24 Oct 2001 13:02:54 -0700 (PDT)",andrew.fastow@enron.com,greg.whalley@enron.com,FW: Status CEG/CEG-Rio Transaction,1.0,text/plain; charset=us-ascii,7bit,"Fastow, Andrew </O=ENRON/OU=NA/CN=RECIPIENTS/C...","Whalley, Greg </O=ENRON/OU=NA/CN=RECIPIENTS/CN...","Lay, Kenneth </O=ENRON/OU=NA/CN=RECIPIENTS/CN=...",,\GWHALLE (Non-Privileged)\Inbox,WHALLEY-G,GWHALLE (Non-Privileged).pst,whalley-g


These rows show that Andrew Fastow sent these emails: the 'From' column contains his email address. The sender ids, however, belong to the recipients or cc'd people (lay-k, skilling-j, whalley-g), whose mailboxes contained these messages. So the 'file' column in the original dataset does not show the actual senders; it shows the mailbox in which each email was found. Next, I will check whether the 'From' column, which contains sender email addresses, is reliable or useful for identifying senders.
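To see how often the mailbox owner actually wrote an email, the 'From' address could be converted into the same last-name/first-initial format and compared with sender_id. This is a minimal sketch: from_to_id is a hypothetical helper, it assumes Enron addresses follow the first.last@enron.com pattern seen above, and different people can collapse to the same id. The data frame is a toy example.

```python
import pandas as pd

def from_to_id(addr):
    """Turn 'first.last@enron.com' into the 'last-f' format used by sender_id; None otherwise."""
    if not isinstance(addr, str) or not addr.endswith('@enron.com'):
        return None
    # Drop empty pieces, e.g. from 's..olinger@enron.com'
    parts = [p for p in addr.split('@')[0].split('.') if p]
    if len(parts) < 2:
        return None
    return parts[-1] + '-' + parts[0][0]

df = pd.DataFrame({
    'From': ['andrew.fastow@enron.com', 'phillip.allen@enron.com', 'pgarner@ufl.edu'],
    'sender_id': ['skilling-j', 'allen-p', 'allen-p'],
})
df['from_id'] = df['From'].map(from_to_id)
df['sent_by_mailbox_owner'] = df['from_id'] == df['sender_id']
print(df[['from_id', 'sender_id', 'sent_by_mailbox_owner']])
```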

In [402]:
list(set(enron_df['From']))[:10]

['loughrid@hydrant.ruf.rice.edu',
 's..olinger@enron.com',
 'ravik@attglobal.net',
 'pgarner@ufl.edu',
 'jbass@stdauto.com',
 's29040@sunpoint.net',
 'accountinfo@datek.m0.net',
 'barry.vanderhorst@enron.com',
 'eulan121@yahoo.com',
 'kirit.purbhoo@enron.com']

This column looks cleaner: in this sample, every value is an email address.

In [403]:
#enron_df[enron_df['X-From']=='Rex Shelby']

In [404]:
#enron_df[enron_df['X-From']=='Christopher F Calger <Christopher F Calger/PDX/ECT@ECT>']

In [405]:
#enron_df[enron_df['From']=='christopher.calger@enron.com']

In [362]:
#for msg in emails['message']:
#    if 'Fastow, Andrew' in msg:
#        print(msg)

Fortunately, many more than 4 POI names appear in the emails, but they are scattered across different columns of enron_df. Therefore, I will go back to the original data frame, emails, and search the raw message text for more POIs.

In [406]:
POI_email_count =dict(zip(poi_names['Last_Name'],np.zeros(len(poi_names),dtype=int))) #First name can be abbreviated
POI_email_index = {key: [] for key in poi_names['Last_Name']}

for name in poi_names['Last_Name']:
    name_part = name+','
    email_part = '.'+name.lower()+'@'
    for i, msg in enumerate(emails['message']):
        if (name_part in msg) or (email_part in msg):
            POI_email_count[name] +=1
            POI_email_index[name].append(i)
            
POI_email_count

{'Lay': 8537,
 'Skilling': 5883,
 'Howard': 1777,
 'Krautz': 57,
 'Yeager': 126,
 'Hirko': 166,
 'Shelby': 419,
 'Bermingham': 25,
 'Darby': 30,
 'Mulgrew': 0,
 'Bayley': 141,
 'Brown': 12041,
 'Furst': 23,
 'Fuhs': 0,
 'Causey': 3024,
 'Calger': 5557,
 'DeSpain': 398,
 'Hannon': 1539,
 'Koenig': 3278,
 'Forney': 3168,
 'Rice': 3326,
 'Rieker': 1677,
 'Fastow': 3814,
 'Delainey': 7782,
 'Glisan': 1302,
 'Richter': 4754,
 'Lawyer': 695,
 'Belden': 10803,
 'Kopper': 334,
 'Duncan': 502,
 'Bowen': 5976,
 'Colwell': 2598,
 'Boyle': 570,
 'Loehr': 85}

These counts can include other people with the same last name (Brown, for example, has 12,041 matches), so next I will search by full name. Mulgrew and Fuhs have zero matches even with this loose search, so they are unlikely to be in the email dataset.
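The same matching rule ('Last,' or '.last@' anywhere in a message) can be expressed with pandas string methods instead of a Python loop over every message. This is a sketch on made-up messages; count_last_name is a hypothetical helper. regex=False makes the patterns match literally; otherwise the '.' in '.lay@' would act as a regex wildcard. It inherits the same false positives as the loop.

```python
import pandas as pd

# Toy stand-in for emails['message'] (hypothetical text)
messages = pd.Series([
    'To: kenneth.lay@enron.com\n\nHi Ken',
    'X-cc: Lay, Kenneth',
    'We should lay out the plan.',
])

def count_last_name(messages, last_name):
    """Count messages containing 'Last,' or '.last@', matching the strings literally."""
    by_name = messages.str.contains(last_name + ',', regex=False)
    by_addr = messages.str.contains('.' + last_name.lower() + '@', regex=False)
    return int((by_name | by_addr).sum())

print(count_last_name(messages, 'Lay'))  # 2
```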

In [363]:
#print(POI_email_index['Darby'])

In [383]:
# Use full names and possible email addresses
POI_email_count3 =dict(zip(poi_names['Full_Name'],np.zeros(len(poi_names),dtype=int))) #First name can be abbreviated

for i in range(len(poi_names)):
    email_addr= poi_names['First_Name'][i].lower()+'.'+ poi_names['Last_Name'][i].lower()+'@enron.com'
    name1 = poi_names['Full_Name'][i]
    name2 = poi_names['First_Name'][i]+' '+ poi_names['Last_Name'][i]
    for msg in emails['message']:
        if name1 in msg:
            POI_email_count3[name1] +=1
        elif name2 in msg:
            POI_email_count3[name1] +=1
        elif email_addr in msg:
            POI_email_count3[name1] +=1
            
POI_email_count3

{'Lay, Kenneth': 8598,
 'Skilling, Jeffrey': 686,
 'Howard, Kevin': 982,
 'Krautz, Michael': 76,
 'Yeager, Scott': 245,
 'Hirko, Joseph': 4,
 'Shelby, Rex': 567,
 'Bermingham, David': 9,
 'Darby, Giles': 17,
 'Mulgrew, Gary': 0,
 'Bayley, Daniel': 0,
 'Brown, James': 82,
 'Furst, Robert': 4,
 'Fuhs, William': 0,
 'Causey, Richard': 3231,
 'Calger, Christopher': 5251,
 'DeSpain, Timothy': 0,
 'Hannon, Kevin': 2328,
 'Koenig, Mark': 3816,
 'Forney, John': 3221,
 'Rice, Kenneth': 117,
 'Rieker, Paula': 2048,
 'Fastow, Lea': 52,
 'Fastow, Andrew': 1844,
 'Delainey, David': 7373,
 'Glisan, Ben': 1525,
 'Richter, Jeffrey': 38,
 'Lawyer, Larry': 953,
 'Belden, Timothy': 28,
 'Kopper, Michael': 475,
 'Duncan, David': 57,
 'Bowen, Raymond': 2675,
 'Colwell, Wesley': 3,
 'Boyle, Dan': 212,
 'Loehr, Christopher': 0}

In [384]:
POIs_not_in_emails =[name for (name,count) in POI_email_count3.items() if count==0 ]
POIs_not_in_emails

['Mulgrew, Gary',
 'Bayley, Daniel',
 'Fuhs, William',
 'DeSpain, Timothy',
 'Loehr, Christopher']

These are the POIs whose full names and email addresses do not appear anywhere in the email data. I already found that 'Mulgrew, Gary' and 'Fuhs, William' are absent, while the last names of the other 3 POIs do appear. I will check whether those 3 appear with abbreviated first names.

In [385]:
for name in POIs_not_in_emails:
    for msg in emails['message']:
        if name.split(',')[0] in msg:
            print("Message including", name, "?:\n\n", msg, '\n----------------------------------')
            break

Message including Bayley, Daniel ?:

 Message-ID: <22318288.1075848343607.JavaMail.evans@thyme>
Date: Fri, 12 Jan 2001 11:31:00 -0800 (PST)
From: office.chairman@enron.com
To: all.worldwide@enron.com
Subject: Managing Director and Vice President Elections
Mime-Version: 1.0
Content-Type: text/plain; charset=ANSI_X3.4-1968
Content-Transfer-Encoding: quoted-printable
X-From: Office of the Chairman
X-To: All Enron Worldwide
X-cc: 
X-bcc: 
X-Folder: \Harpreet_Arora_Nov2001\Notes Folders\All documents
X-Origin: ARORA-H
X-FileName: harora.nsf

The Managing Director PRC Committee met this week to elect individuals to=
=20
Managing Director and Vice President positions.  These employees are=20
recognized as outstanding contributors to the organization, whose individua=
l=20
efforts have been instrumental in the continued success and growth of the=
=20
company.  We are pleased to announce the election of the following new=20
Managing Directors and Vice Presidents.  Please join us in congratulati

- Bayley, Daniel --> I checked several other messages containing Bayley, but they all refer to people with very different first names.
- DeSpain, Timothy --> found as Tim DeSpain
- Loehr, Christopher --> found as Chris Loehr

In the end, 3 POIs do not appear in the emails:

- Bayley, Daniel
- Mulgrew, Gary
- Fuhs, William

Of course, it is possible that some names or emails I found actually belong to other people with the same names, not the POIs I was looking for.

In [392]:
#name = 'bayley'
#for msg in emails['message']:
#    if name in msg.lower():
#        print("Message including", name, "?:\n\n", msg, '\n----------------------------------')
        

POI names and email addresses are scattered across different parts of the email messages, which raises several questions. How should I use them? What about non-POIs, and how should I define them? If I use every part of an email, should I also include the original and forwarded messages quoted in the body?

First, I will try to find the email address of every POI to see whether email addresses are reliable data to use.

In [427]:
POI_email_address = {key: None for key in poi_names['Full_Name']}

for i, name in enumerate(poi_names['Full_Name']):
    email_addr= poi_names['First_Name'][i].lower()+'.'+ poi_names['Last_Name'][i].lower()+'@enron.com'
    for msg in emails['message']:
        if email_addr in msg:
            POI_email_address[name] = email_addr
            break

POI_email_address

{'Lay, Kenneth': 'kenneth.lay@enron.com',
 'Skilling, Jeffrey': 'jeffrey.skilling@enron.com',
 'Howard, Kevin': 'kevin.howard@enron.com',
 'Krautz, Michael': 'michael.krautz@enron.com',
 'Yeager, Scott': 'scott.yeager@enron.com',
 'Hirko, Joseph': None,
 'Shelby, Rex': 'rex.shelby@enron.com',
 'Bermingham, David': None,
 'Darby, Giles': None,
 'Mulgrew, Gary': None,
 'Bayley, Daniel': None,
 'Brown, James': 'james.brown@enron.com',
 'Furst, Robert': None,
 'Fuhs, William': None,
 'Causey, Richard': 'richard.causey@enron.com',
 'Calger, Christopher': 'christopher.calger@enron.com',
 'DeSpain, Timothy': None,
 'Hannon, Kevin': 'kevin.hannon@enron.com',
 'Koenig, Mark': 'mark.koenig@enron.com',
 'Forney, John': 'john.forney@enron.com',
 'Rice, Kenneth': 'kenneth.rice@enron.com',
 'Rieker, Paula': 'paula.rieker@enron.com',
 'Fastow, Lea': None,
 'Fastow, Andrew': 'andrew.fastow@enron.com',
 'Delainey, David': 'david.delainey@enron.com',
 'Glisan, Ben': 'ben.glisan@enron.com',
 'Richter, Je

In [428]:
[name for (name, addr) in POI_email_address.items() if addr is None]

['Hirko, Joseph',
 'Bermingham, David',
 'Darby, Giles',
 'Mulgrew, Gary',
 'Bayley, Daniel',
 'Furst, Robert',
 'Fuhs, William',
 'DeSpain, Timothy',
 'Fastow, Lea',
 'Richter, Jeffrey',
 'Belden, Timothy',
 'Duncan, David',
 'Colwell, Wesley',
 'Loehr, Christopher']

These POIs have no address in the first.last@enron.com format, so I will try alternative addresses they may have used, such as ones with shortened first names.

In [431]:
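# Shortened first names the missing POIs may have used. The addresses found
# so far are all lowercase, so the prefixes are lowercase too.
alternate_addr = ['joe.hirko@', 'dave.bermingham@', 'dan.bayley@', 'rob.furst@', 'bob.furst@', 'will.fuhs@',
                  'tim.despain@', 'jeff.richter@', 'tim.belden@', 'dave.duncan@', 'chris.loehr@']
for prefix in alternate_addr:
    addr = prefix + 'enron.com'
    for msg in emails['message']:
        if addr in msg:
            print(addr)
            break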

joe.hirko@enron.com
tim.despain@enron.com
jeff.richter@enron.com
tim.belden@enron.com
dave.duncan@enron.com
chris.loehr@enron.com


I got six more addresses! I also tried these addresses without 'enron.com' (same results) and other possible variants such as 'bob.furst@' instead of 'rob.furst@' (no luck).
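
Each search so far rescans all ~500,000 messages once per candidate address. A sketch of an alternative, assuming the addresses of interest all end in @enron.com: collect every such address with a regular expression in a single pass, then check candidate spellings against that set. ADDR_RE, NICKNAMES, collect_addresses and find_address are hypothetical names; the nickname table only mirrors the alternates tried here, and the pattern is deliberately loose rather than a full email-address parser.

```python
import re

# Loose pattern for @enron.com addresses (assumption: good enough for
# addresses such as joe.hirko@enron.com, not an RFC 5322 parser)
ADDR_RE = re.compile(r"[\w.-]+@enron\.com", re.IGNORECASE)

# Hypothetical nickname table, mirroring the alternates tried in the notebook
NICKNAMES = {
    "joseph": ["joe"], "david": ["dave"], "daniel": ["dan"],
    "robert": ["rob", "bob"], "william": ["will"], "timothy": ["tim"],
    "jeffrey": ["jeff"], "christopher": ["chris"],
}


def collect_addresses(messages):
    """Return every @enron.com address in the messages, lowercased (one pass)."""
    found = set()
    for msg in messages:
        found.update(addr.lower() for addr in ADDR_RE.findall(msg))
    return found


def find_address(first, last, known):
    """Return the first candidate first.last address present in known, or None."""
    first, last = first.lower(), last.lower()
    for name in [first] + NICKNAMES.get(first, []):
        candidate = f"{name}.{last}@enron.com"
        if candidate in known:
            return candidate
    return None
```

With known = collect_addresses(emails['message']), calling find_address(first, last, known) for each row of poi_names becomes a set lookup per candidate, and lowercasing everything sidesteps capitalization mismatches.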

In [432]:
POI_email_address['Hirko, Joseph']='joe.hirko@enron.com'
POI_email_address['DeSpain, Timothy']='tim.despain@enron.com'
POI_email_address['Richter, Jeffrey']='jeff.richter@enron.com'
POI_email_address['Belden, Timothy']='tim.belden@enron.com'
POI_email_address['Duncan, David']='dave.duncan@enron.com'
POI_email_address['Loehr, Christopher']='chris.loehr@enron.com'

In [434]:
# POIs without email addresses found
[name for (name, addr) in POI_email_address.items() if addr is None]

['Bermingham, David',
 'Darby, Giles',
 'Mulgrew, Gary',
 'Bayley, Daniel',
 'Furst, Robert',
 'Fuhs, William',
 'Fastow, Lea',
 'Colwell, Wesley']

Eight POIs are still missing Enron email addresses, but only three of these are real gaps, since about five of the eight were not Enron employees and would not have had Enron addresses.

I will now add columns to the poi_names DataFrame, starting with:
- Email: each POI's email address

I might want to add these columns later as well:
- Full_Name_2 for full names with first names in front of last names
- Full_Name_3 for alternative full names with last names in front
- Full_Name_4 for alternative full names with first names in front
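
A minimal sketch of how those extra name spellings could be generated, reading "alternative full names" as nickname-based ones (an assumption). name_variants and the NICKNAMES table are hypothetical; the table would need extending for other POIs.

```python
# Hypothetical nickname table (assumption); extend as needed
NICKNAMES = {"timothy": ["tim"], "christopher": ["chris"], "joseph": ["joe"],
             "jeffrey": ["jeff"], "david": ["dave"], "daniel": ["dan"],
             "robert": ["rob", "bob"], "william": ["will"]}


def name_variants(first, last):
    """Return the spellings of one POI's name to search for, sorted."""
    variants = {f"{last}, {first}",   # Full_Name (last name first)
                f"{first} {last}"}    # Full_Name_2 (first name first)
    for nick in NICKNAMES.get(first.lower(), []):
        nick = nick.capitalize()
        variants.add(f"{last}, {nick}")  # Full_Name_3 (alternative, last name first)
        variants.add(f"{nick} {last}")   # Full_Name_4 (alternative, first name first)
    return sorted(variants)


print(name_variants("Timothy", "DeSpain"))
# ['DeSpain, Tim', 'DeSpain, Timothy', 'Tim DeSpain', 'Timothy DeSpain']
```

Returning a list per POI, rather than fixed Full_Name_2..4 columns, also handles first names with more than one common nickname, such as Robert.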

In [436]:
poi_names.drop('id_2', axis=1, inplace=True) # I forgot to remove this earlier

In [437]:
addr_list=[POI_email_address[name] for name in poi_names["Full_Name"]] 
poi_names['Email'] = addr_list

poi_names

Unnamed: 0,YesNo,Last_Name,First_Name,Full_Name,id,Email
0,(y),Lay,Kenneth,"Lay, Kenneth",lay-k,kenneth.lay@enron.com
1,(y),Skilling,Jeffrey,"Skilling, Jeffrey",skilling-j,jeffrey.skilling@enron.com
2,(n),Howard,Kevin,"Howard, Kevin",howard-k,kevin.howard@enron.com
3,(n),Krautz,Michael,"Krautz, Michael",krautz-m,michael.krautz@enron.com
4,(n),Yeager,Scott,"Yeager, Scott",yeager-s,scott.yeager@enron.com
5,(n),Hirko,Joseph,"Hirko, Joseph",hirko-j,joe.hirko@enron.com
6,(n),Shelby,Rex,"Shelby, Rex",shelby-r,rex.shelby@enron.com
7,(n),Bermingham,David,"Bermingham, David",bermingham-d,
8,(n),Darby,Giles,"Darby, Giles",darby-g,
9,(n),Mulgrew,Gary,"Mulgrew, Gary",mulgrew-g,


In [None]:
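# Count the messages that mention each POI as "Last, First", "First Last", or by email address.
# Note: first names can be abbreviated in messages (e.g. Tim for Timothy), which this check misses.
POI_email_count3 = dict.fromkeys(poi_names['Full_Name'], 0)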

for i in range(len(poi_names)):
    email_addr= poi_names['First_Name'][i].lower()+'.'+ poi_names['Last_Name'][i].lower()+'@enron.com'
    name1 = poi_names['Full_Name'][i]
    name2 = poi_names['First_Name'][i]+' '+ poi_names['Last_Name'][i]
    for msg in emails['message']:
        if name1 in msg:
            POI_email_count3[name1] +=1
        elif name2 in msg:
            POI_email_count3[name1] +=1
        elif email_addr in msg:
            POI_email_count3[name1] +=1
            
POI_email_count3
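
The cell above uses plain, case-sensitive substring tests. A sketch of a regex-based variant (a swapped-in technique, not the notebook's method): combine all spellings for one POI into a single case-insensitive pattern with word boundaries, so that for example "KENNETH LAY" is counted while "McLay, Kenneth" is not. count_mentions is a hypothetical helper; like the elif chain, it counts each message at most once.

```python
import re


def count_mentions(messages, patterns):
    """Count messages that mention any of the literal strings in patterns.

    Case-insensitive, with word boundaries on both sides of each pattern.
    """
    if not patterns:
        return 0
    regex = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in patterns) + r")\b",
        re.IGNORECASE,
    )
    return sum(1 for msg in messages if regex.search(msg))


msgs = ["Memo from KENNETH LAY today", "To: kenneth.lay@enron.com",
        "Lay, Kenneth wrote:", "McLay, Kenneth Jr.", "nothing here"]
print(count_mentions(msgs, ["Lay, Kenneth", "Kenneth Lay", "kenneth.lay@enron.com"]))  # 3
```

In the loop above this would be called as count_mentions(emails['message'], [name1, name2, email_addr]). Expect counts to differ from the substring version in both directions: ignoring case can raise them, while word boundaries can lower them.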

Let's find out how many times their addresses appear in the 'To' and 'From' columns of enron_df.

In [453]:
def count_messages(what_to_count, col_to_count):
    # what_to_count is a list to be counted
    # col_to_count is a column name to look into in enron_df
    count_list =[]
    for addr in what_to_count:
        count=0
        if addr: # not None
            for addresses in enron_df[col_to_count]:
                if addr in str(addresses): 
                    count +=1
        count_list.append(count)
    return count_list

In [456]:
poi_names['From_count']=count_messages(poi_names['Email'], 'From')
poi_names['To_count'] = count_messages(poi_names['Email'], 'To')
poi_names

Unnamed: 0,YesNo,Last_Name,First_Name,Full_Name,id,Email,From_count,To_count
0,(y),Lay,Kenneth,"Lay, Kenneth",lay-k,kenneth.lay@enron.com,36,4261
1,(y),Skilling,Jeffrey,"Skilling, Jeffrey",skilling-j,jeffrey.skilling@enron.com,0,7
2,(n),Howard,Kevin,"Howard, Kevin",howard-k,kevin.howard@enron.com,0,110
3,(n),Krautz,Michael,"Krautz, Michael",krautz-m,michael.krautz@enron.com,1,44
4,(n),Yeager,Scott,"Yeager, Scott",yeager-s,scott.yeager@enron.com,0,63
5,(n),Hirko,Joseph,"Hirko, Joseph",hirko-j,joe.hirko@enron.com,0,105
6,(n),Shelby,Rex,"Shelby, Rex",shelby-r,rex.shelby@enron.com,39,225
7,(n),Bermingham,David,"Bermingham, David",bermingham-d,,0,0
8,(n),Darby,Giles,"Darby, Giles",darby-g,,0,0
9,(n),Mulgrew,Gary,"Mulgrew, Gary",mulgrew-g,,0,0
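
count_messages scans the whole column once per address and matches substrings. A sketch of a single-pass alternative using collections.Counter: split each header value on commas, strip surrounding whitespace (raw To fields contain newline-tab continuations), lowercase, and count exact addresses once per message. address_counts is a hypothetical helper, and because it counts exact matches rather than substrings its numbers may differ from the From_count and To_count columns.

```python
from collections import Counter


def address_counts(header_values):
    """Count, in one pass, how many messages list each address in a header.

    header_values: iterable of raw 'From' or 'To' strings; None or NaN
    entries are skipped. Each address is counted once per message.
    """
    counts = Counter()
    for value in header_values:
        if not isinstance(value, str):  # skip None / NaN
            continue
        addrs = {a.strip().lower() for a in value.split(",") if a.strip()}
        counts.update(addrs)
    return counts
```

For example, to_counts = address_counts(enron_df['To']) could then fill a column with [to_counts[a] if a else 0 for a in poi_names['Email']], with one pass over enron_df per header instead of one per POI.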


Who are the recipients of Kenneth Lay's emails?

In [465]:
email_list =[]
count =0
for i, addr in enumerate(enron_df['From']):
    if addr == 'kenneth.lay@enron.com':
        count +=1
        # keep at most the first 10 recipients of each message
        email_list.append(enron_df['To'][i].split(',')[:10])
print(count)
email_list

36


[['erica.adams@enron.com',
  ' john.addison@enron.com',
  ' matthew.almy@enron.com',
  ' \n\thector.alviar@enron.com',
  ' chuck.ames@enron.com',
  ' \n\tmatt.anderson@enron.com',
  ' james.bakondy@enron.com',
  ' \n\thicham.benjelloun@enron.com',
  ' shelia.benke@enron.com',
  ' \n\tchristina.benkert@enron.com'],
 ['erica.adams@enron.com',
  ' john.addison@enron.com',
  ' matthew.almy@enron.com',
  ' \n\thector.alviar@enron.com',
  ' chuck.ames@enron.com',
  ' \n\tmatt.anderson@enron.com',
  ' james.bakondy@enron.com',
  ' \n\thicham.benjelloun@enron.com',
  ' shelia.benke@enron.com',
  ' \n\tchristina.benkert@enron.com'],
 ['erica.adams@enron.com',
  ' john.addison@enron.com',
  ' matthew.almy@enron.com',
  ' \n\thector.alviar@enron.com',
  ' chuck.ames@enron.com',
  ' \n\tmatt.anderson@enron.com',
  ' james.bakondy@enron.com',
  ' \n\thicham.benjelloun@enron.com',
  ' shelia.benke@enron.com',
  ' \n\tchristina.benkert@enron.com'],
 ['k..allen@enron.com',
  ' sally.beck@enron.com',
 