# Claim to support / refute:

#### - “The financial markets do not punish security breaches.”

Data from: https://www.gracefulsecurity.com/data-breaches-and-stock-prices/  
(using table only)

## Organizing & Cleaning the Data

In [1]:
# Imports

import numpy as np
import pandas as pd
import pdfquery as pdf

In [2]:
# List of breached companies, their stock tickers, and dates of breach disclosure
Link = pdf.PDFQuery("Data Breaches and Stock Prices — GracefulSecurity.pdf")
Link.load()

### Company names:

In [92]:
Link_Data = Link.extract([
        # ('with_parent','LTPage[page_index="0"]'), # PAGE Specifier if multiple pages in PDF
        ('with_formatter', 'text'),
        ('Company', ':in_bbox("30, 447, 90, 760")')])
DataDump = Link_Data['Company'].split(' ')
DataDump[1::2]

['Sony',
 'Sony',
 'Terracom',
 'Adobe',
 'Target',
 'Ebay',
 'Home',
 'Home',
 'Google',
 'Sony',
 'Sony',
 'Premera',
 'Anthem',
 'Experian',
 'TalkTalk',
 'Juniper']

In [93]:
DataDump = DataDump[1::2].copy()

del DataDump[1] # Duplicate Sony
del DataDump[5] # Duplicate Home Depot
del DataDump[7] # Duplicate Sony

In [94]:
DataDump

['Sony',
 'Terracom',
 'Adobe',
 'Target',
 'Ebay',
 'Home',
 'Google',
 'Sony',
 'Premera',
 'Anthem',
 'Experian',
 'TalkTalk',
 'Juniper']
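The three `del` calls above depend on exact positions, which shift after each deletion. As an alternative sketch (not what the notebook ran), back-to-back repeats can be collapsed with `itertools.groupby`. This assumes every unwanted duplicate sits directly next to its twin, which holds for the extracted list shown earlier; the name `raw_names` is made up for the example.

```python
from itertools import groupby

# Tokens as extracted from the PDF column (copied from the output of In [92]).
raw_names = ['Sony', 'Sony', 'Terracom', 'Adobe', 'Target', 'Ebay', 'Home', 'Home',
             'Google', 'Sony', 'Sony', 'Premera', 'Anthem', 'Experian', 'TalkTalk',
             'Juniper']

# groupby() groups runs of equal adjacent items, so keeping one key per run
# drops the back-to-back repeats but keeps the second, separate Sony breach.
deduped = [name for name, _ in groupby(raw_names)]
print(deduped)
```

The result has the same 13 entries as the manual deletions, with 'Home' still needing the rename to 'Home Depot'.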

Name fixes:

In [95]:
DataDump[5] = 'Home Depot'

In [96]:
DataDump

['Sony',
 'Terracom',
 'Adobe',
 'Target',
 'Ebay',
 'Home Depot',
 'Google',
 'Sony',
 'Premera',
 'Anthem',
 'Experian',
 'TalkTalk',
 'Juniper']

In [97]:
Companies = pd.Series(DataDump)  # To put into dataframe later

### Stock Tickers:

In [98]:
Link_Data = Link.extract([
        # ('with_parent','LTPage[page_index="0"]'), # PAGE Specifier if multiple pages in PDF
        ('with_formatter', 'text'),
        ('Company', ':in_bbox("92, 447, 140, 760")')])
DataDump = Link_Data['Company'].split(' ')
DataDump[1::2]

['SNE',
 'TER.AX',
 'ADBE',
 'TGT',
 'EBAY',
 'HD',
 'GOOG',
 'SNE',
 'ESRX',
 'ANTM',
 'EXPN.L',
 'TALK.L',
 'JNPR']

In [99]:
Tickers = pd.Series(DataDump[1::2])

### Disclosure Dates:

In [106]:
Link_Data = Link.extract([
        # ('with_parent','LTPage[page_index="0"]'), # PAGE Specifier if multiple pages in PDF
        ('with_formatter', 'text'),
        ('Company', ':in_bbox("147, 447, 200, 760")')])
DataDump = Link_Data['Company'].split(' ')
DataDump[1::2]

['31-May-11',
 '23-May-13',
 '03-Oct-13',
 '01-Jan-14',
 '21-Mar-14',
 '02-Sep-14',
 '10-Sep-14',
 '08-Dec-14',
 '29-Jan-15',
 '04-Feb-15',
 '01-Oct-15',
 '06-Nov-15',
 '17-Dec-15']

In [107]:
Dates = pd.Series(DataDump[1::2])

### Creating DataFrame:

In [115]:
df = pd.DataFrame(data=[Companies,Tickers,Dates],\
                  copy=True)

In [119]:
df = df.T

In [121]:
df.columns = ['Company Name', 'Stock Code', 'Disclosure Date']

In [122]:
df

Unnamed: 0,Company Name,Stock Code,Disclosure Date
0,Sony,SNE,31-May-11
1,Terracom,TER.AX,23-May-13
2,Adobe,ADBE,03-Oct-13
3,Target,TGT,01-Jan-14
4,Ebay,EBAY,21-Mar-14
5,Home Depot,HD,02-Sep-14
6,Google,GOOG,10-Sep-14
7,Sony,SNE,08-Dec-14
8,Premera,ESRX,29-Jan-15
9,Anthem,ANTM,04-Feb-15
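The frame above was built from a list of Series, then transposed and renamed. A shorter route, sketched here on the first three rows only, is to pass a dict of column name to values. The variable names `companies`, `tickers`, `dates`, and `sketch_df` are placeholders for this example.

```python
import pandas as pd

# First three rows only, copied from the table above, to keep the example short.
companies = ['Sony', 'Terracom', 'Adobe']
tickers = ['SNE', 'TER.AX', 'ADBE']
dates = ['31-May-11', '23-May-13', '03-Oct-13']

# A dict maps column name -> column values, so no transpose or rename is needed.
sketch_df = pd.DataFrame({
    'Company Name': companies,
    'Stock Code': tickers,
    'Disclosure Date': dates,
})
print(sketch_df)
```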


#### Splitting Date elements into their own columns

In [137]:
# Month dictionary:
Month_Nums = {'Jan': 1, 'Feb': 2, 'Mar':3, 'Apr':4, 'May':5, 'Jun':6, 
              'Jul': 7, 'Aug':8, 'Sep':9, 'Oct':10, 'Nov':11, 'Dec':12}

In [282]:
df['Day'] = 0
df['Month'] = 0
df['Year'] = 2000

In [283]:
for i in range(len(df)):
    # '31-May-11' -> ['31', 'May', '11']
    DashRM = df.loc[i, 'Disclosure Date'].split('-')
    df.loc[i,'Day'] = int(DashRM[0])
    df.loc[i,'Month'] = Month_Nums[DashRM[1]]
    df.loc[i,'Year'] = 2000 + int(DashRM[2])   # two-digit year; all dates are 20xx
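The same split can be done without a month dictionary or a loop by letting pandas parse the strings. This is a sketch on three sample dates from the table, assuming an English locale so that `%b` matches abbreviations such as `May`; `parsed` and `parts` are names chosen for the example.

```python
import pandas as pd

dates = pd.Series(['31-May-11', '01-Jan-14', '17-Dec-15'])

# %d = day of month, %b = abbreviated month name, %y = two-digit year
parsed = pd.to_datetime(dates, format='%d-%b-%y')

# The .dt accessor exposes each date component as an integer column.
parts = pd.DataFrame({
    'Day': parsed.dt.day,
    'Month': parsed.dt.month,
    'Year': parsed.dt.year,
})
print(parts)
```

Keeping the parsed datetime column around also makes the later `start` date available directly, instead of rebuilding it from three integers.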

## Gathering stock prices for each company during year of data breach:

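We will find the change in closing price from the disclosure date (or the first trading day after it, if the market was closed) to three trading days later. The value is an absolute difference in the stock's own currency, not a percentage.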
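The measurement itself is small: the loop further down stores `f.loc[3,'Close'] - f.loc[0,'Close']`. The sketch below shows that calculation on synthetic prices (not real market data). The helper `close_change` and the percent figure are additions for illustration; the notebook records only the absolute difference.

```python
import pandas as pd

# Synthetic closing prices for illustration only; not real market data.
prices = pd.DataFrame({
    'Date': pd.to_datetime(['2015-02-04', '2015-02-05', '2015-02-06',
                            '2015-02-09', '2015-02-10']),
    'Close': [100.0, 99.0, 98.5, 97.0, 97.5],
})

def close_change(frame, offset=3):
    """Return (absolute, percent) change in Close between row 0 and row `offset`."""
    first = frame['Close'].iloc[0]
    later = frame['Close'].iloc[offset]
    absolute = later - first
    return absolute, 100.0 * absolute / first

abs_change, pct_change = close_change(prices)
print(abs_change, pct_change)
```

A percent change would make stocks with very different price levels or currencies easier to compare on one chart, at the cost of departing from the numbers recorded here.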

In [5]:
import datetime
from datetime import timedelta

import pandas_datareader.data as web

<B>MANUALLY</B> step through the rows with the code below to get the price change for each row, from 0 to the length of the DataFrame.  
A fully automated loop is not reliable because the web data reader does not always respond.  
If an error occurs on a row, re-run from that row; a second attempt may pull the data. If it keeps failing, there is probably no stock data for that ticker.  
A printed price change means the value was written to the DataFrame.
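The manual re-run could be partly automated with a small retry wrapper. The sketch below is one possible shape, not part of the original workflow: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and `flaky_fetch` is a stand-in that fails once so the example runs without network access.

```python
import time

def fetch_with_retry(fetch, ticker, attempts=3, delay=1.0):
    """Call fetch(ticker) up to `attempts` times; return None if every try fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(ticker)
        except Exception as exc:  # broad on purpose: the failure type is not specified
            print(f"{ticker}: attempt {attempt} failed ({exc})")
            if attempt < attempts:
                time.sleep(delay)
    return None

# Stand-in for the real download call: fails on the first call, then succeeds.
calls = {'count': 0}

def flaky_fetch(ticker):
    calls['count'] += 1
    if calls['count'] < 2:
        raise IOError("unable to read")
    return f"prices for {ticker}"

result = fetch_with_retry(flaky_fetch, 'ANTM', delay=0.0)
print(result)
```

In the notebook, `fetch` could be a lambda wrapping the existing `web.DataReader(ticker, 'yahoo', start, end)` call, and a `None` result would leave that row's price change as `NaN`.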

In [291]:
df['3-day Price Change'] = np.nan

In [294]:
n = 9  # Row to start from. After an "unable to read" error, set this to the
       # row that failed and re-run the cell.

for i in range(n, len(df)):

    entry = df.values[i]

    entity = entry[0]
    ticker = entry[1]
    day = entry[3]
    month = entry[4]
    year = entry[5]

    start = datetime.datetime(year, month, day)
    end = start + timedelta(6)
    f = web.DataReader(ticker, 'yahoo', start, end)

    f = f.reset_index()      # move dates out of the index so rows are numbered 0, 1, 2, ...

    # Row 0 is the first trading day in the window; row 3 is three trading days later.
    df.loc[i,'3-day Price Change'] = f.loc[3,'Close'] - f.loc[0,'Close']

    # print(i, f.loc[3,'Close'] - f.loc[0,'Close'])
    print(i, df.loc[i,'3-day Price Change'])

9 -2.769989
10 1.0
11 19.899994
12 -1.600001


In [295]:
df

Unnamed: 0,Company Name,Stock Code,Disclosure Date,Day,Month,Year,3-day Price Change
0,Sony,SNE,31-May-11,31,5,2011,-0.360001
1,Terracom,TER.AX,23-May-13,23,5,2013,0.0
2,Adobe,ADBE,03-Oct-13,3,10,2013,-1.299999
3,Target,TGT,01-Jan-14,1,1,2014,-0.27
4,Ebay,EBAY,21-Mar-14,21,3,2014,-0.572391
5,Home Depot,HD,02-Sep-14,2,9,2014,0.459999
6,Google,GOOG,10-Sep-14,10,9,2014,-9.945313
7,Sony,SNE,08-Dec-14,8,12,2014,-0.41
8,Premera,ESRX,29-Jan-15,29,1,2015,-0.409996
9,Anthem,ANTM,04-Feb-15,4,2,2015,-2.769989


In [296]:
# Export to CSV

df.to_csv("DataBreach&Stocks_v2.csv", header=True, encoding='utf-8')

### Create date fields

In [83]:
df2 = pd.read_csv('DataBreach&Stocks_v2.csv')

In [84]:
df2 = df2.drop('Unnamed: 0', axis=1)

In [85]:
df2

Unnamed: 0,Company Name,Stock Code,Disclosure Date,Day,Month,Year,3-day Price Change
0,Sony,SNE,31-May-11,31,5,2011,-0.360001
1,Terracom,TER.AX,23-May-13,23,5,2013,0.0
2,Adobe,ADBE,03-Oct-13,3,10,2013,-1.299999
3,Target,TGT,01-Jan-14,1,1,2014,-0.27
4,Ebay,EBAY,21-Mar-14,21,3,2014,-0.572391
5,Home Depot,HD,02-Sep-14,2,9,2014,0.459999
6,Google,GOOG,10-Sep-14,10,9,2014,-9.945313
7,Sony,SNE,08-Dec-14,8,12,2014,-0.41
8,Premera,ESRX,29-Jan-15,29,1,2015,-0.409996
9,Anthem,ANTM,04-Feb-15,4,2,2015,-2.769989


In [86]:
df2['Day_Token'] = 1

In [87]:
df2 = pd.concat([df2, df2], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

In [88]:
len(df2)

26

#### Each row is duplicated so every company has two points to compare in Tableau: a zero baseline (Day_Token 0) and the 3-day price change (Day_Token 1)

In [89]:
df2.loc[13:,'3-day Price Change'] = 0
df2.loc[13:,'Day_Token'] = 0

In [92]:
df2.head()

Unnamed: 0,Company Name,Stock Code,Disclosure Date,Day,Month,Year,3-day Price Change,Day_Token
0,Sony,SNE,31-May-11,31,5,2011,-0.360001,1
1,Terracom,TER.AX,23-May-13,23,5,2013,0.0,1
2,Adobe,ADBE,03-Oct-13,3,10,2013,-1.299999,1
3,Target,TGT,01-Jan-14,1,1,2014,-0.27,1
4,Ebay,EBAY,21-Mar-14,21,3,2014,-0.572391,1


In [93]:
df2.tail()

Unnamed: 0,Company Name,Stock Code,Disclosure Date,Day,Month,Year,3-day Price Change,Day_Token
21,Premera,ESRX,29-Jan-15,29,1,2015,0.0,0
22,Anthem,ANTM,04-Feb-15,4,2,2015,0.0,0
23,Experian,EXPN.L,01-Oct-15,1,10,2015,0.0,0
24,TalkTalk,TALK.L,06-Nov-15,6,11,2015,0.0,0
25,Juniper,JNPR,17-Dec-15,17,12,2015,0.0,0
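The duplicate-then-overwrite steps can also be written without hard-coded row positions such as `13:`. This is a sketch on two rows, reading Day_Token 0 as the disclosure-day baseline and Day_Token 1 as the point three trading days later (an interpretation of the columns, not something the data states); `after`, `baseline`, and `long_form` are names chosen for the example.

```python
import pandas as pd

changes = pd.DataFrame({
    'Company Name': ['Sony', 'Terracom'],
    '3-day Price Change': [-0.360001, 0.0],
})

# Day_Token 1 = three trading days after disclosure (the measured change).
after = changes.assign(Day_Token=1)

# Day_Token 0 = disclosure day, which serves as the zero baseline.
baseline = changes.assign(Day_Token=0)
baseline['3-day Price Change'] = 0.0

# Stack the two copies; each company now has one row per Day_Token value.
long_form = pd.concat([after, baseline], ignore_index=True)
print(long_form)
```

With two rows per company, a line drawn from Day_Token 0 to Day_Token 1 has a slope equal to the recorded price change.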


In [94]:
df2.to_csv('DataBreach&Stocks_v3.csv')

# Conclusion

In the previous attempt, the data contained only the year of each breach, not the date. This time I found another dataset that lists the date each company announced its breach, so I could track price changes immediately after the announcement. The price change is measured from the closing price on the announcement day to the closing price three trading days later.

I visualized the rise and fall in stock prices as line slopes. More of the lines slope downward than upward: of the 13 announcements, 9 were followed by a lower closing price, 3 by a higher one, and 1 by no change.

Stock prices can either rise or fall after a data breach announcement, but in this sample they more often dropped slightly. This visual does not account for the many other factors that move stock prices besides data breaches.
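The direction of each change can also be tallied directly rather than read off the chart. The sketch below uses the 13 values recorded earlier in this notebook (rows 0 to 9 from the table, rows 10 to 12 from the loop printout); the variable names are chosen for the example.

```python
# 3-day price changes for rows 0..12, copied from the notebook output.
changes = [-0.360001, 0.0, -1.299999, -0.27, -0.572391, 0.459999, -9.945313,
           -0.41, -0.409996, -2.769989, 1.0, 19.899994, -1.600001]

# Count how many announcements were followed by a drop, a rise, or no change.
drops = sum(1 for c in changes if c < 0)
rises = sum(1 for c in changes if c > 0)
flat = sum(1 for c in changes if c == 0)

print(drops, rises, flat)
```

A count treats a small move the same as a large one, so it supports the "more often down" reading but says nothing about how large the drops were.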