# Part 3: Final Conclusion

## 5 W's
<br>
**Who:** Cedric Cyrus Harrison
<br>
**What:** Committed Corporate Espionage by Emailing Confidential Information to
> Lockheed Martin

**When:** During the Month of August on the Following Dates:
<br>
> 08/16/2017, 08/19/2017, 08/24/2017, 08/27/2017, 08/30/2017

**Where:** DTAA Headquarters
<br>
**Why:** TBD
<br>

## Initial Assumptions
<br>
1. The malicious actor intended to relay confidential information to another **defense technology company**.
<br>
> **Reasoning**: DTAA is a defense technology company. Thus, any confidential information DTAA has would be most valuable to another defense technology company. 
<br>
2. The malicious actor sent the information through **email**.
<br>
> **Reasoning**: With the data available, the only direct way to connect the employees of DTAA with those of another company is through email.
<br>
3. The malicious actor sent the information to a **single recipient** rather than to a group.
<br>
> **Reasoning**: Sending confidential information to multiple recipients would be too risky. The chance of being caught rises sharply as more people are involved.

## Course of Action
###### Filter emails to those sent to other defense contracting companies' email addresses, not including group emails.
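
As a minimal sketch of the filtering idea described above, the three conditions (external recipient, single recipient, not a personal email provider) can be combined into one boolean mask. The sample rows, the helper name `filter_external_single_recipient`, and the abbreviated `PERSONAL_DOMAINS` set are hypothetical and only illustrate the approach; the notebook itself works on `email_info.csv` with a longer exclusion list.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from email_info.csv.
sample_emails = pd.DataFrame({
    "from": ["a@dtaa.com", "b@dtaa.com", "c@dtaa.com", "d@dtaa.com"],
    "to": [
        "x@dtaa.com",                   # internal recipient
        "y@boeing.com;z@raytheon.com",  # group email (';'-separated)
        "friend@gmail.com",             # personal email provider
        "contact@lockheed.com",         # single external recipient
    ],
})

# Abbreviated, assumed list of personal email providers.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "aol.com"}


def filter_external_single_recipient(emails, internal_domain="dtaa.com",
                                     excluded=PERSONAL_DOMAINS):
    """Keep emails with exactly one recipient whose domain is
    neither the internal domain nor an excluded provider."""
    to = emails["to"].astype(str)
    is_single = ~to.str.contains(";", regex=False)
    domain = to.str.rsplit("@", n=1).str[-1].str.lower()
    keep = is_single & (domain != internal_domain) & ~domain.isin(excluded)
    return emails[keep]


filtered = filter_external_single_recipient(sample_emails)
print(filtered)
```

Building a single mask avoids the chain of intermediate DataFrames, at the cost of hiding the row counts after each individual step.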

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
device_info = pd.read_csv('device_info.csv')
email_info = pd.read_csv('email_info.csv')
employee_info = pd.read_csv('employee_info.csv')
http_info = pd.read_csv('http_info.csv')
logon_info = pd.read_csv('logon_info.csv')

In [None]:
# Filter Emails Sent Within the Company
fil_dtaa = email_info['to'].apply(lambda t: t.split('@')[1] != 'dtaa.com')
df_fil_dtaa = email_info[fil_dtaa]
# Filter Group Emails
fil_group = df_fil_dtaa['to'].apply(lambda t: ';' not in t)
df_fil_group = df_fil_dtaa[fil_group]
# Filter Non-Defense Contract Company Emails
email_to_rm = ['comcast.net','aol.com','gmail.com','yahoo.com','cox.net','hotmail.com','verizon.net',
               'juno.com','netzero.com','msn.com','charter.net','earthlink.net','sbcglobal.net','bellsouth.net',
               'optonline.net','hp.com']
fil_common_email = df_fil_group['to'].apply(lambda t: t.split('@')[1] not in email_to_rm)
df_fil_common_email = df_fil_group[fil_common_email]

In [None]:
# Create DataFrames With Only Raytheon, Boeing, Harris, Northrop Grumman, and Lockheed Martin Emails
df_ray_all = df_fil_common_email[df_fil_common_email['to'].apply(lambda t: t.split('@')[1] == 'raytheon.com')]
df_boeing_all = df_fil_common_email[df_fil_common_email['to'].apply(lambda t: t.split('@')[1] == 'boeing.com')]
df_harris_all = df_fil_common_email[df_fil_common_email['to'].apply(lambda t: t.split('@')[1] == 'harris.com')]
df_north_all = df_fil_common_email[df_fil_common_email['to'].apply(lambda t: t.split('@')[1] == 'northropgrumman.com')]
df_lock_all = df_fil_common_email[df_fil_common_email['to'].apply(lambda t: t.split('@')[1] == 'lockheed.com')]

In [None]:
# Get Number of Emails Sent to Each Company
num_to_ray = len(df_ray_all)
num_to_boeing = len(df_boeing_all)
num_to_harris = len(df_harris_all)
num_to_north = len(df_north_all)
num_to_lock = len(df_lock_all)

In [None]:
x = ['Raytheon','Boeing','Harris','Northrop Grumman','Lockheed Martin']
y = [num_to_ray,num_to_boeing,num_to_harris,num_to_north,num_to_lock]
fig = plt.figure(figsize=(22,13))
fig.subplots_adjust(bottom=0.3)
plt.bar(x,y,color=['red','blue','green','orange','purple'])
plt.grid(axis='y',zorder=0.0)
plt.xticks(size=20)
plt.yticks(size=20)
plt.title('Number of Emails Sent From DTAA to Defense Contract Companies',size=25)
plt.xlabel('Defense Contract Companies',size=20,labelpad=20)
plt.ylabel('Number of Emails Sent\n *excluding group emails',size=20,labelpad=20)

## Conclude: All of these Defense Contract Companies were clients of DTAA, except for Lockheed Martin.  
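
The per-company totals behind the bar chart can also be obtained in one pass by counting recipient domains. The snippet below is a sketch on hypothetical recipient addresses; only the five domain names are taken from the notebook.

```python
import pandas as pd

# Hypothetical recipient addresses for illustration only.
recipients = pd.Series([
    "a@raytheon.com", "b@boeing.com", "c@boeing.com",
    "d@lockheed.com", "e@harris.com", "f@example.org",
])

# Domains examined in this notebook.
DEFENSE_DOMAINS = ["raytheon.com", "boeing.com", "harris.com",
                   "northropgrumman.com", "lockheed.com"]

# Extract the domain of each address, count occurrences, and
# report the counts in a fixed order (0 where a domain never appears).
domains = recipients.str.split("@").str[1]
counts = domains.value_counts().reindex(DEFENSE_DOMAINS, fill_value=0)
print(counts)
```

Reindexing against a fixed list keeps the order of the bars stable and makes a company with no emails show up as zero rather than disappear.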

## A Closer Look
###### Examine How Many Emails Each DTAA Email Address Sent to These Companies

### Additional Assumption
1. At some point, the malicious actor sent an attachment containing confidential information.
<br>
> **Reasoning:** It would be impractical to copy everything into an email, especially if the confidential information contained any sort of image or complex diagram.
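
A short sketch of how this assumption translates into code: drop emails without attachments, then count the remaining emails per sender. The rows below are hypothetical, and the sketch assumes `attachments` holds a numeric count, which is what the notebook's `!= 0` comparison suggests.

```python
import pandas as pd

# Hypothetical rows; column names mirror those used from email_info.
sample = pd.DataFrame({
    "from": ["a@dtaa.com", "a@dtaa.com", "b@dtaa.com", "a@dtaa.com"],
    "to": ["p@lockheed.com", "p@lockheed.com", "q@boeing.com", "p@lockheed.com"],
    "attachments": [2, 0, 1, 1],  # assumed to be a count of attached files
})

# Keep only emails that carry at least one attachment.
with_attachments = sample[sample["attachments"] > 0]

# Count emails per sender, most active sender first.
per_sender = (
    with_attachments.groupby("from").size()
    .sort_values(ascending=False)
    .rename("counts_from")
    .reset_index()
    .rename(columns={"from": "email"})
)
print(per_sender)
```

The resulting `email` and `counts_from` columns match the layout of the per-company count frames built in the cells that follow.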

In [None]:
# Keep Only Emails With at Least One Attachment
fil_no_attach = df_fil_common_email['attachments'].apply(lambda t: t != 0)
df_pure = df_fil_common_email[fil_no_attach]

# Create DataFrames With Only Raytheon, Boeing, Harris, Northrop Grumman, and Lockheed Martin Emails
df_ray = df_pure[df_pure['to'].apply(lambda t: t.split('@')[1] == 'raytheon.com')]
df_boeing = df_pure[df_pure['to'].apply(lambda t: t.split('@')[1] == 'boeing.com')]
df_harris = df_pure[df_pure['to'].apply(lambda t: t.split('@')[1] == 'harris.com')]
df_north = df_pure[df_pure['to'].apply(lambda t: t.split('@')[1] == 'northropgrumman.com')]
df_lock = df_pure[df_pure['to'].apply(lambda t: t.split('@')[1] == 'lockheed.com')]

In [None]:
# Plotting Emails Sent to Boeing by DTAA User Email Address
df_boeing_cnts = pd.DataFrame(columns=['email','counts_from'])
df_boeing_cnts['email'] = list(df_boeing.groupby('from').size().index)
df_boeing_cnts['counts_from'] = list(df_boeing.groupby('from').size())
df_boeing_cnts.sort_values('counts_from',ascending=False).head(10)

x_boe = list(df_boeing_cnts.email.unique())
y_boe = list(df_boeing_cnts.counts_from)
fig, ax = plt.subplots(1,1,figsize=(40,15))
plt.plot(x_boe,y_boe)
plt.xticks(rotation=90,size=15)
plt.yticks(list(range(1,17,1)),size=30)
plt.grid(axis='y',zorder=0.0)
plt.title('Number of Emails Sent From DTAA to Boeing',size=40)
plt.ylabel('Number of Emails Sent\n *excluding group emails\n *excluding 0 attachments',size=30,labelpad=30)
plt.xlabel('DTAA Email Addresses',size=30,labelpad=20)
plt.margins(0.005)
plt.show()

In [None]:
# Plotting Emails Sent to Harris by DTAA User Email Address
df_harris_cnts = pd.DataFrame(columns=['email','counts_from'])
df_harris_cnts['email'] = list(df_harris.groupby('from').size().index)
df_harris_cnts['counts_from'] = list(df_harris.groupby('from').size())
df_harris_cnts.sort_values('counts_from',ascending=False).head(10)

x_harris = list(df_harris_cnts.email.unique())
y_harris = list(df_harris_cnts.counts_from)
fig, ax = plt.subplots(1,1,figsize=(40,15))
plt.plot(x_harris,y_harris)
plt.xticks(rotation=90,size=15)
plt.yticks(list(range(1,23,1)),size=30)
plt.grid(axis='y',zorder=0.0)
plt.title('Number of Emails Sent From DTAA to Harris',size=40)
plt.ylabel('Number of Emails Sent\n *excluding group emails\n *excluding 0 attachments',size=30,labelpad=30)
plt.xlabel('DTAA Email Addresses',size=30,labelpad=20)
plt.margins(0.005)
plt.show()

In [None]:
# Plotting Emails Sent to Northrop Grumman by DTAA User Email Address
df_north_cnts = pd.DataFrame(columns=['email','counts_from'])
df_north_cnts['email'] = list(df_north.groupby('from').size().index)
df_north_cnts['counts_from'] = list(df_north.groupby('from').size())
df_north_cnts.sort_values('counts_from',ascending=False).head(10)

x_north = list(df_north_cnts.email.unique())
y_north = list(df_north_cnts.counts_from)
fig, ax = plt.subplots(1,1,figsize=(40,15))
plt.plot(x_north,y_north)
plt.xticks(rotation=90,size=15)
plt.yticks(list(range(1,24,1)),size=30)
plt.grid(axis='y',zorder=0.0)
plt.title('Number of Emails Sent From DTAA to Northrop Grumman',size=40)
plt.ylabel('Number of Emails Sent\n *excluding group emails\n *excluding 0 attachments',size=30,labelpad=30)
plt.xlabel('DTAA Email Addresses',size=30,labelpad=20)
plt.margins(0.005)
plt.show()

In [None]:
# Plotting Emails Sent to Raytheon by DTAA User Email Address
df_ray_cnts = pd.DataFrame(columns=['email','counts_from'])
df_ray_cnts['email'] = list(df_ray.groupby('from').size().index)
df_ray_cnts['counts_from'] = list(df_ray.groupby('from').size())
df_ray_cnts.sort_values('counts_from',ascending=False).head(10)

x_ray = list(df_ray_cnts.email.unique())
y_ray = list(df_ray_cnts.counts_from)
fig, ax = plt.subplots(1,1,figsize=(40,15))
plt.plot(x_ray,y_ray)
plt.xticks(rotation=90,size=15)
plt.yticks(list(range(1,35,1)),size=30)
plt.grid(axis='y',zorder=0.0)
plt.title('Number of Emails Sent From DTAA to Raytheon',size=40)
plt.ylabel('Number of Emails Sent\n *excluding group emails\n *excluding 0 attachments',size=30,labelpad=30)
plt.xlabel('DTAA Email Addresses',size=30,labelpad=20)
plt.margins(0.005)
plt.show()

## All Emails Sent From DTAA to Lockheed Martin

**\*not including group emails**

In [None]:
df_lock_all

In [None]:
employee_info[employee_info.email == 'Cedric.Cyrus.Harrison@dtaa.com']

## DTAA Employees Who Left Early
**based on logon activity**
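
One way to sketch the "left early" check is to parse the timestamps, take each user's last recorded activity, and flag users whose last activity falls before the final month in the data. The rows below are hypothetical, and the `MM/DD/YYYY HH:MM:SS` timestamp format is an assumption inferred from how the notebook splits the `date` column.

```python
import pandas as pd

# Hypothetical activity log; 'user' and 'date' mirror the columns used below.
activity = pd.DataFrame({
    "user": ["U1", "U1", "U2", "U2", "U3"],
    "date": [
        "01/04/2017 08:00:00", "10/27/2017 17:00:00",
        "01/04/2017 09:00:00", "08/30/2017 18:00:00",
        "10/27/2017 16:30:00",
    ],
})

# Assumed timestamp format: month/day/year followed by a 24-hour time.
activity["timestamp"] = pd.to_datetime(activity["date"],
                                       format="%m/%d/%Y %H:%M:%S")

# Last recorded activity per user.
last_seen = activity.groupby("user")["timestamp"].max()

# Users whose last activity is not in the final month covered by the data.
final_month = last_seen.max().to_period("M")
left_early = last_seen[last_seen.dt.to_period("M") != final_month]
print(left_early)
```

Parsing to real timestamps removes the dependence on row order and on a hard-coded month number, which string splitting requires.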

In [None]:
# Split the timestamp into separate date and time columns
device_info['date_only'] = device_info.date.apply(lambda t: t.split(' ')[0])
device_info['time_only'] = device_info.date.apply(lambda t: t.split(' ')[1])
# Collect the unique dates on which each user has recorded activity
pd_person_timeseries = pd.DataFrame(list(device_info.user.unique()),columns=['user'])
pd_person_timeseries['dates'] = pd_person_timeseries.user.apply(lambda t: device_info[device_info.user == t].date_only.unique())
# First recorded date for each user (assumes rows are in chronological order)
pd_person_timeseries['start_date'] = pd_person_timeseries.dates.apply(lambda t: t[0])
# Last recorded date for each user
pd_person_timeseries['end_date'] = pd_person_timeseries.dates.apply(lambda t: t[-1])
# Keep only users whose last recorded date does not fall in October
fil_october = pd_person_timeseries.end_date.apply(lambda t: int(t.split('/')[0]) != 10)
fil_person = pd_person_timeseries[fil_october]
fil_person