* * * 

<div align="Right">
  ©    Josefin Axberg 2017<br>
 </div>

# Final Project: Discovering the Enron Fraud

## Introduction

In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. In the resulting federal investigation, a significant amount of typically confidential information entered the public record, including tens of thousands of emails and detailed financial data for top executives. In this project, you will play detective and put your new skills to use by building a person of interest (POI) identifier based on the financial and email data made public as a result of the Enron scandal. To assist you in your detective work, we've combined this data with a hand-generated list of persons of interest in the fraud case: individuals who were indicted, reached a settlement or plea deal with the government, or testified in exchange for immunity from prosecution.

![](enron.jpg)

In [104]:
```python
import time
print("Today is %s" % time.strftime("%Y-%m-%d"))
```

Today is 2017-09-14


In [105]:
```python
#!/usr/bin/python

import os
import pickle
import re
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

sys.path.append("../tools/")

from feature_format import featureFormat, targetFeatureSplit
from tester import dump_classifier_and_data
```

## Explore Data

In [106]:
```python
### Task 1: Select what features you'll use.
### features_list is a list of strings, each of which is a feature name.
### The first feature must be "poi".
features_list = ['poi', 'salary', 'bonus', 'total_stock_value', 'from_poi_to_this_person', 'from_this_person_to_poi']
```
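`features_list` is later passed to the `featureFormat` and `targetFeatureSplit` helpers from `../tools/`. Their source is not shown in this notebook. To illustrate why `poi` has to come first, here is a minimal standard-library sketch. It assumes the helpers convert the string `'NaN'` to `0.0` and treat the first column of each row as the label. The functions `toy_feature_format` and `toy_target_split` and the two people in `toy_data` are hypothetical. They are neither the course code nor real Enron records.

```python
# Hypothetical stand-ins for featureFormat / targetFeatureSplit; NOT the ../tools/ code.
def toy_feature_format(data, features):
    """Turn {name: {feature: value}} into rows ordered by `features`.

    Assumption: missing values are the string 'NaN' and are replaced by 0.0.
    """
    rows = []
    for name in sorted(data):
        row = []
        for feature in features:
            value = data[name].get(feature, "NaN")
            row.append(0.0 if value == "NaN" else float(value))
        rows.append(row)
    return rows


def toy_target_split(rows):
    """Split each row into its label (first column) and its features (the rest)."""
    labels = [row[0] for row in rows]
    features = [row[1:] for row in rows]
    return labels, features


# Two made-up people that use the same feature names as features_list.
toy_data = {
    "PERSON A": {"poi": True, "salary": 250000, "bonus": "NaN",
                 "total_stock_value": 1000000,
                 "from_poi_to_this_person": 10, "from_this_person_to_poi": 5},
    "PERSON B": {"poi": False, "salary": "NaN", "bonus": 50000,
                 "total_stock_value": "NaN",
                 "from_poi_to_this_person": "NaN", "from_this_person_to_poi": 0},
}
features_list = ['poi', 'salary', 'bonus', 'total_stock_value',
                 'from_poi_to_this_person', 'from_this_person_to_poi']

labels, features = toy_target_split(toy_feature_format(toy_data, features_list))
print(labels)    # [1.0, 0.0]
print(features)  # [[250000.0, 0.0, 1000000.0, 10.0, 5.0], [0.0, 50000.0, 0.0, 0.0, 0.0]]
```

Suppose the real helpers follow the same convention. Then listing any feature other than `poi` first would silently make that feature the target. The tools version may also offer options this sketch leaves out, such as dropping rows that are entirely zero.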

In [107]:
```python
# Load the dictionary containing the dataset.
# Pickle files are binary, so open in "rb" mode (required on Python 3).
with open("final_project_dataset.pkl", "rb") as data_file:
    data_dict = pickle.load(data_file)

my_dataset = data_dict
enron_df = pd.DataFrame(my_dataset)  # Load the pickled dict into a DataFrame for exploration and feature engineering
enron_df = enron_df.T                # Transpose: person names become the index, features become the columns

enron_df.head()
```

| | bonus | deferral_payments | deferred_income | director_fees | email_address | exercised_stock_options | expenses | from_messages | from_poi_to_this_person | from_this_person_to_poi | ... | long_term_incentive | other | poi | restricted_stock | restricted_stock_deferred | salary | shared_receipt_with_poi | to_messages | total_payments | total_stock_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALLEN PHILLIP K | 4175000.0 | 2869717.0 | -3081055.0 | | phillip.allen@enron.com | 1729541.0 | 13868 | 2195.0 | 47.0 | 65.0 | ... | 304805.0 | 152.0 | False | 126027.0 | -126027.0 | 201955.0 | 1407.0 | 2902.0 | 4484442 | 1729541 |
| BADUM JAMES P | | 178980.0 | | | | 257817.0 | 3486 | | | | ... | | | False | | | | | | 182466 | 257817 |
| BANNANTINE JAMES M | | | -5104.0 | | james.bannantine@enron.com | 4046157.0 | 56301 | 29.0 | 39.0 | 0.0 | ... | | 864523.0 | False | 1757552.0 | -560222.0 | 477.0 | 465.0 | 566.0 | 916197 | 5243487 |
| BAXTER JOHN C | 1200000.0 | 1295738.0 | -1386055.0 | | | 6680544.0 | 11200 | | | | ... | 1586055.0 | 2660303.0 | False | 3942714.0 | | 267102.0 | | | 5634343 | 10623258 |
| BAY FRANKLIN R | 400000.0 | 260455.0 | -201641.0 | | frank.bay@enron.com | | 129142 | | | | ... | | 69.0 | False | 145796.0 | -82782.0 | 239671.0 | | | 827696 | 63014 |
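The pickle holds a dictionary keyed by person name, and each value is a dictionary of that person's features. `pd.DataFrame` turns the outer keys of such a nested dictionary into columns, which is why the loading cell transposes the result. The sketch below uses only the standard library and a made-up two-person `toy_dict`. It round-trips the data through `pickle` in binary mode, with an in-memory `io.BytesIO` standing in for the file. It then regroups the data by feature, the same orientation `enron_df` has after `.T`.

```python
import io
import pickle

# Hypothetical miniature of final_project_dataset.pkl: {person name: {feature: value}}.
toy_dict = {
    "PERSON A": {"salary": 250000, "bonus": "NaN", "poi": True},
    "PERSON B": {"salary": "NaN", "bonus": 50000, "poi": False},
}

# Pickles are bytes, so real files need "wb"/"rb"; BytesIO is a binary stream too.
buffer = io.BytesIO()
pickle.dump(toy_dict, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

# Outer keys are people and inner keys are features. Regrouping by feature
# (one list per feature, one entry per person) matches the columns of enron_df
# after the transpose.
names = sorted(loaded)
features = sorted(loaded[names[0]])
columns = {feature: [loaded[name][feature] for name in names] for feature in features}

print(names)              # ['PERSON A', 'PERSON B']
print(columns["salary"])  # [250000, 'NaN']
```

In pandas, `pd.DataFrame.from_dict(data_dict, orient="index")` should build the same person-by-feature layout in one step, without the explicit transpose.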


In [108]:
```python
enron_df.describe()
```

| | bonus | deferral_payments | deferred_income | director_fees | email_address | exercised_stock_options | expenses | from_messages | from_poi_to_this_person | from_this_person_to_poi | ... | long_term_incentive | other | poi | restricted_stock | restricted_stock_deferred | salary | shared_receipt_with_poi | to_messages | total_payments | total_stock_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | ... | 146.0 | 146.0 | 146 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 | 146.0 |
| unique | 42.0 | 40.0 | 45.0 | 18.0 | 112.0 | 102.0 | 95.0 | 65.0 | 58.0 | 42.0 | ... | 53.0 | 93.0 | 2 | 98.0 | 19.0 | 95.0 | 84.0 | 87.0 | 126.0 | 125.0 |
| top | | | | | | | | | | | ... | | | False | | | | | | | |
| freq | 64.0 | 107.0 | 97.0 | 129.0 | 35.0 | 44.0 | 51.0 | 60.0 | 60.0 | 60.0 | ... | 80.0 | 53.0 | 128 | 36.0 | 128.0 | 51.0 | 60.0 | 60.0 | 21.0 | 20.0 |
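Every column reports `count` = 146 together with the `unique`/`top`/`freq` rows that `describe()` produces for non-numeric (object) columns. It does not report a mean and quartiles. This is consistent with the dataset storing missing values as the string `'NaN'` instead of a true missing value. The blank `top` entries are most likely that placeholder, in which case `freq` gives the number of people missing each feature (for example 64 of 146 for `bonus`).

The standard-library sketch below maps `'NaN'` to `float('nan')` before summarizing a feature. It runs on a hypothetical three-person `toy_dict`, and `to_float` and `summarize` are illustrative names, not part of the project code.

```python
import math
import statistics

# Hypothetical miniature of the dataset; 'NaN' is a string placeholder, not a float.
toy_dict = {
    "PERSON A": {"bonus": "NaN", "salary": 250000},
    "PERSON B": {"bonus": 50000, "salary": "NaN"},
    "PERSON C": {"bonus": 150000, "salary": 300000},
}


def to_float(value):
    """Convert the 'NaN' placeholder to a real float NaN and other values to float."""
    return math.nan if value == "NaN" else float(value)


def summarize(data, feature):
    """Count present and missing values of one feature and average the present ones."""
    values = [to_float(person[feature]) for person in data.values()]
    present = [v for v in values if not math.isnan(v)]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else math.nan,
    }


print(summarize(toy_dict, "bonus"))   # {'count': 2, 'missing': 1, 'mean': 100000.0}
print(summarize(toy_dict, "salary"))  # {'count': 2, 'missing': 1, 'mean': 275000.0}
```

In pandas, a comparable step would be `enron_df.replace("NaN", np.nan)` followed by `pd.to_numeric` on the numeric columns. After that, `describe()` should report counts, means and quartiles rather than `unique`/`top`/`freq`.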


In [109]:
```python
enron_df.shape
```

(146, 21)
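`shape` confirms 146 people and 21 columns. The `poi` column of `describe()` has two unique values, with `False` occurring 128 times. Together these fix the class balance: 146 - 128 = 18 persons of interest, about 12% of the rows. The small helper below, a hypothetical name introduced only for this illustration, does that arithmetic. It also shows the accuracy a classifier would get by always predicting "not a POI", which suggests accuracy alone is a weak score on this dataset.

```python
def class_balance(n_rows, n_majority):
    """Return (minority count, minority share, accuracy of always predicting the majority class)."""
    n_minority = n_rows - n_majority
    return n_minority, n_minority / n_rows, n_majority / n_rows


# 146 rows from enron_df.shape; 128 = freq of poi == False in enron_df.describe().
n_poi, poi_share, baseline_accuracy = class_balance(146, 128)
print(n_poi, round(poi_share, 3), round(baseline_accuracy, 3))  # 18 0.123 0.877
```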