# CKO Revision

Compustat firm-years between 2000 and 2017 that meet the following criteria:
- All controls available, including RavenPack
- Exclude finance (6000 <= SIC <= 6999) and utility (4900 <= SIC <= 4999) firms
- Must be in the Hoberg-Phillips (HP) dataset

Merge with SDC
- Must be in the linking table
- Merge with SDC: (a) more than 50% of shares purchased in the transaction and (b) deal completed
- Must have (a) target LTM sales, (b) target market value 4 weeks prior to announcement, (c) deal value, and (d) target total assets

### Import modules

In [1]:
import pandas as pd
import numpy as np
import rpy2.rinterface #ggplot tool
from pandas_profiling import ProfileReport
import dask.dataframe as dd
import wrds
import pandasql as ps
import sqlite3

In [2]:
# ## Review TNIC-3 data

# ### Import TNIC3 data from the Hoberg and Phillips data library

# # !wget -P ../2_pipeline/ http://hobergphillips.tuck.dartmouth.edu/idata/tnic3_data.zip
# # !unzip -q ../2_pipeline/tnic3_data.zip -d ../2_pipeline/ && rm ../2_pipeline/tnic3_data.zip

# """
# Hoberg and Phillips TNIC3 database
# """
# tnic = pd.read_csv('/Users/ohn0000/Dropbox/Project/cko/0_data/external/tnic3_data.txt', 
#                    delimiter='\t', header=0, index_col=['gvkey1', 'year', 'gvkey2'])
# tnic.dropna(inplace=True)

# ### Subset to 20-closest competitors

# # tnic_industry = tnic.groupby(level=['gvkey1', 'year'])["score"].nlargest(20).reset_index(level=[0,1], drop=True)
# # tnic_industry = tnic_industry.to_frame(name='score')
# # tnic_industry.to_pickle('../2_pipeline/tnic_industry.pkl')
# tnic_industry = pd.read_pickle('../2_pipeline/tnic_industry.pkl')

# ```tnic_industry``` still has firm-years with less than 20 competitors.

# # """
# # Require at least 20 closest competitors
# # """
# # tnicind_sub = tnic.groupby(level=['gvkey1', 'year'])["score"].filter(lambda x: x.size == 20)
# # tnicind_sub = tnicind_sub.to_frame(name='score')

# """
# tnic_industry['gvkey1'] = tnic_industry['gvkey1'].apply(lambda x: str(x).zfill(6))
# tnic_industry['gvkey2'] = tnic_industry['gvkey2'].apply(lambda x: str(x).zfill(6))
# """

# Remember that _year_ in __tnic_industry__ is the base year for identifying close competitors. Accordingly, _lead1_ is the M&A year and _lead2_ is the year following the M&A.

# Readme_tnic3.txt explains that _year_ equals the first four digits of the __compustat__ _datadate_.

# ### Shift years in __tnic_industry__ to get _lead1_ and _lead2_ similarity scores

# tnic_industry.rename(columns={'score':'score_0'}, inplace=True)

# for i in range(1,3):
#     colname = 'score' + '_' + str(i)
#     tnic_industry['score'] = np.NaN
#     tnic_industry.index = tnic_industry.index.set_levels(tnic_industry.index.levels[1] + 1, level=1)
#     tnic_industry.update(tnic)
#     tnic_industry.rename(columns={'score':colname}, inplace=True)

# tnic_industry.reset_index(inplace=True)
# tnic_industry["year"] -= 2
# tnic_industry.set_index(["gvkey1", "year", "gvkey2"], inplace=True)

# tnic_industry.to_pickle('../2_pipeline/tnic_industry.pkl')

# ### Run __*tnic_industry.py*__ on _WRDS_ to update lead scores

# Many of the _lead1_ and _lead2_ values are missing. Grab these values from __TNIC_Advanced__ uploaded on _WRDS_. 

# # !scp ../2_pipeline/tnic_industry.pkl tnic_industry.py $WRDS:~

# ### Download updated __*tnic_industry*__ file from WRDS

# # !scp $WRDS:/scratch/ou/hohn/tnic_ind_update.pkl ../2_pipeline/ 

In [3]:
tnic_industry = pd.read_pickle('../2_pipeline/tnic_ind_update.pkl')

In [4]:
tnic_10 = tnic_industry.groupby(level=['gvkey1', 'year'])["score_0"].nlargest(10).reset_index(level=[0,1], drop=True).to_frame()

Redefine _tnic_industry_ as below to subset to 10-closest competitors.

In [5]:
tnic_industry = tnic_10.join(tnic_industry[['score_1','score_2']])
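
The top-k selection used above can be checked on a toy frame. This is a minimal sketch with made-up scores and `k = 2` instead of 10; the `toy` data and the name `top_k` are illustrative and not part of the pipeline.

```python
import pandas as pd

# Hypothetical pairwise scores: firm 1 has three rivals in 2001, firm 2 has one.
toy = pd.DataFrame(
    {
        "gvkey1": [1, 1, 1, 2],
        "year": [2001, 2001, 2001, 2001],
        "gvkey2": [10, 11, 12, 10],
        "score_0": [0.30, 0.10, 0.20, 0.05],
    }
).set_index(["gvkey1", "year", "gvkey2"])

k = 2
top_k = (
    toy.groupby(level=["gvkey1", "year"])["score_0"]
    .nlargest(k)
    # nlargest prepends the group keys, so drop the two duplicated levels
    .reset_index(level=[0, 1], drop=True)
    .to_frame()
)
print(top_k)
```

Firm-years with fewer than `k` rivals keep all of their rows, which is why the `n_` counts computed later can fall below 10.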

Average TNIC similarity score across closest competitors.  
Remember that in __TNIC_ALL__ most of the scores equal zero, so the zero-filled _z\__ averages might be more suitable.
- A close pair in t0 that does not appear in t1 or t2 is informative.
- __BE CAREFUL__ with years 2016 and 2017. __TNIC is available only up to 2017__.

In [6]:
avg_sim = tnic_industry.groupby(level=['gvkey1','year']).mean()
avg_sim = avg_sim.join(tnic_industry.groupby(level=['gvkey1','year']).count().add_prefix("n_"))
avg_sim = avg_sim.join(tnic_industry.fillna(0).groupby(level=['gvkey1','year']).mean().add_prefix("z_"))
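
To see how the plain mean, the count, and the zero-filled `z_` mean differ, here is a small sketch on invented data (the `scores` frame is hypothetical). A missing lead score is skipped by `mean()` but counts as zero similarity once `fillna(0)` is applied.

```python
import numpy as np
import pandas as pd

# Hypothetical lead-year scores for one firm-year; two of four pairs drop out.
scores = pd.DataFrame(
    {
        "gvkey1": [1, 1, 1, 1],
        "year": [2001, 2001, 2001, 2001],
        "gvkey2": [10, 11, 12, 13],
        "score_1": [0.2, np.nan, 0.4, np.nan],
    }
).set_index(["gvkey1", "year", "gvkey2"])

keys = ["gvkey1", "year"]
plain_mean = scores.groupby(level=keys)["score_1"].mean()           # NaN skipped
n_pairs = scores.groupby(level=keys)["score_1"].count()             # non-missing pairs
z_mean = scores.fillna(0).groupby(level=keys)["score_1"].mean()     # NaN treated as 0
print(plain_mean.iloc[0], n_pairs.iloc[0], z_mean.iloc[0])
```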

Calculate _avg_sim_ of competitors

In [7]:
ind_avg = tnic_industry.join(avg_sim.reset_index().rename(columns={'gvkey1':'gvkey2'}).set_index(['gvkey2', 'year']), 
                   on=['gvkey2', 'year'], how='left', rsuffix='_a').groupby(level=['gvkey1','year']).mean()
ind_avg = ind_avg[['score_0_a', 'score_1_a', 'score_2_a', 'z_score_0', 'z_score_1', 'z_score_2']]
ind_avg.rename(columns={'score_0_a':'s_0_a', 
                        'score_1_a':'s_1_a',
                        'score_2_a':'s_2_a',
                        'z_score_0':'z_0_a', 
                        'z_score_1':'z_1_a',
                        'z_score_2':'z_2_a'}, inplace=True)

In [8]:
avg_sim = avg_sim.join(ind_avg)

### Grab COMPUSTAT _datadate_

Revise here if _avg\_sim_ needs additional COMPUSTAT variables

In [9]:
db = wrds.Connection(wrds_username='hohn')

Loading library list...
Done


In [10]:
comp_query = """
select distinct gvkey, datadate, fyear, indfmt, sale, sich
from comp.funda
where consol = %(consol)s and indfmt in %(indfmt)s 
    and datafmt = %(datafmt)s and popsrc = %(popsrc)s
    and curcd in %(curcd)s
order by gvkey, fyear
"""

parm = {'consol':('C'), 'indfmt' : ('INDL', 'FS'), 'datafmt': ('STD'), 'popsrc' : ('D'), 'curcd' : ('USD', 'CAD')}
comp = db.raw_sql(comp_query, date_cols=['datadate'], params=parm)

Exclude finance and utility firms based on SIC codes

In [11]:
from itertools import chain
sic = list(chain(range(6000,7000), range(4900,5000)))
comp = comp.loc[lambda x : ~ x['sich'].isin(sic)]

In [12]:
comp['fyear'] = comp['fyear'].astype('int16')
comp['year'] = comp['datadate'].dt.year.astype('int16')

comp['gvkey1'] = pd.to_numeric(comp['gvkey']).astype('int64')
comp.drop(columns='gvkey', inplace=True)

comp.set_index(['gvkey1', 'fyear'], inplace=True)

In [13]:
comp.head()

gvkey1,fyear,datadate,indfmt,sale,sich,year
1000,1961,1961-12-31,INDL,0.9,,1961
1000,1962,1962-12-31,INDL,1.6,,1962
1000,1963,1963-12-31,INDL,1.457,,1963
1000,1964,1964-12-31,INDL,2.032,,1964
1000,1965,1965-12-31,INDL,1.688,,1965


It is unclear how TNIC handles cases where two fiscal years (_fyear_) fall in the same calendar _year_.
Assume the latest _datadate_ within each _year_.

In [19]:
datadate = comp[~comp.index.duplicated(keep='last')][['datadate', 'year']].reset_index()

In [20]:
avg_sim = pd.merge(avg_sim, datadate, 
                 left_index=True, 
                 right_on=['gvkey1', 'year'], how='left').drop_duplicates(['gvkey1', 'year'], keep='last')
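
The "latest _datadate_ within _year_" assumption can be illustrated with a toy firm that changes its fiscal year-end, so that two fiscal years close in the same calendar year. This is only a sketch on invented data; the `fy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical firm that moves its fiscal year-end from March to December in 2005.
fy = pd.DataFrame(
    {
        "gvkey1": [1, 1, 1],
        "fyear": [2004, 2005, 2006],
        "datadate": pd.to_datetime(["2005-03-31", "2005-12-31", "2006-12-31"]),
    }
)
fy["year"] = fy["datadate"].dt.year

# Sort so that the latest datadate is last, then keep it for each calendar year.
latest = fy.sort_values(["gvkey1", "datadate"]).drop_duplicates(
    ["gvkey1", "year"], keep="last"
)
print(latest)
```

Sorting explicitly before `drop_duplicates(..., keep="last")` makes the choice independent of whatever row order the upstream merge happens to produce.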

In [21]:
avg_sim = avg_sim[avg_sim['datadate'].notnull()]

In [22]:
col = list(avg_sim)
col = col[-4:] + col[:-4]
avg_sim = avg_sim[col].sort_values(['gvkey1', 'year', 'datadate'])

### Grab _permno_

Revise here if _avg\_sim_ needs additional CRSP variables

In [24]:
sql_query= """
select gvkey, liid as iid, lpermno as permno, linkdt, linkenddt
from crsp.ccmxpf_linktable
where linktype in %(type)s and linkprim in %(prim)s and usedflag = 1
"""

parm = {'type':('LU', 'LC'), 'prim':('P', 'C')}
linktable = db.raw_sql(sql_query, date_cols=['linkdt', 'linkenddt'], params=parm)

linktable['gvkey'] = pd.to_numeric(linktable['gvkey'])
linktable['permno'] = pd.to_numeric(linktable['permno']).astype('int64')
linktable['iid'] = linktable['iid'].astype('category')

In [25]:
enddt = pd.to_datetime('2020-01-07 00:00:00')

In [26]:
linktable['linkenddt'] = linktable['linkenddt'].fillna(value = enddt)
linktable['linkenddt'] = linktable['linkenddt'].dt.date

In [27]:
conn = sqlite3.connect(':memory:')
avg_sim.to_sql('avg_sim', conn, index=False)
linktable.to_sql('linktable', conn, index=False)

qry = '''
    select  
        avg_sim.*, linktable.permno
    from
        avg_sim left join linktable on
        avg_sim.datadate between linkdt and linkenddt and avg_sim.gvkey1 = linktable.gvkey
    '''
df = pd.read_sql_query(qry, conn)
df['permno'] = df['permno'].astype('Int64')
df['datadate'] = df['datadate'].astype('datetime64[ns]')

df[df['permno'].isna()].to_sql('df', conn, index=False)

qry = '''
    select
        a.*, b.gvkey1, b.datadate
        from linktable a join df b
        on 
            a.gvkey = b.gvkey1
'''
aug = pd.read_sql_query(qry, conn)

aug.to_sql('aug', conn, index=False)
qry = '''
    select gvkey1, permno, iid, min(linkdt) as linkdt, max(linkenddt) as linkenddt
    from aug
    group by gvkey1, permno, iid
    order by gvkey1, linkdt
'''

df = df.merge(aug[aug['iid'].isin(['01','02'])].rename(columns={'permno':'permno1'})[['gvkey1', 'permno1']], 
         left_on = ['gvkey1'], right_on=['gvkey1'], how='left')
df['permno'] = np.where(df['permno'].isna(), df['permno1'], df['permno'])

In [28]:
df = df.drop_duplicates(['gvkey1', 'year']).drop(columns='permno1')
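
The date-range join against the link table can be sketched with the standard-library `sqlite3` module alone. The tables `obs` and `link` and their contents are invented for illustration. Dates are stored as ISO `YYYY-MM-DD` text; SQLite compares text lexicographically, so both sides of `between` need the same format.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("create table obs (gvkey integer, datadate text)")
demo_conn.execute(
    "create table link (gvkey integer, permno integer, linkdt text, linkenddt text)"
)
demo_conn.executemany(
    "insert into obs values (?, ?)",
    [(1, "2001-12-31"), (1, "2010-12-31"), (2, "2005-06-30")],
)
demo_conn.executemany(
    "insert into link values (?, ?, ?, ?)",
    [(1, 111, "1990-01-01", "2005-12-31"), (1, 222, "2006-01-01", "2020-01-07")],
)

# Keep every observation; attach the permno whose link window covers datadate.
rows = demo_conn.execute(
    """
    select o.gvkey, o.datadate, l.permno
    from obs o left join link l
      on o.gvkey = l.gvkey and o.datadate between l.linkdt and l.linkenddt
    order by o.gvkey, o.datadate
    """
).fetchall()
print(rows)
```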

In [29]:
col = list(df)
col.insert(2, col.pop(col.index('permno')))
df = df[col]

In [30]:
df.head()

,gvkey1,fyear,permno,datadate,year,score_0,score_1,score_2,n_score_0,n_score_1,n_score_2,z_score_0,z_score_1,z_score_2,s_0_a,s_1_a,s_2_a,z_0_a,z_1_a,z_2_a
0,1004,1994.0,54594,1995-05-31,1995,0.0058,0.1231,0.0696,1,1,1,0.0058,0.1231,0.0696,,,,,,
1,1004,1995.0,54594,1996-05-31,1996,0.02083,0.043288,0.048657,10,8,7,0.02083,0.03463,0.03406,0.046411,0.050805,0.04707,0.046411,0.036585,0.027966
2,1004,1996.0,54594,1997-05-31,1997,0.04576,0.027556,0.037771,10,9,7,0.04576,0.0248,0.02644,0.050885,0.062084,0.057949,0.050885,0.056042,0.035902
3,1004,1997.0,54594,1998-05-31,1998,0.02231,0.0261,0.04256,10,6,5,0.02231,0.01566,0.02128,0.034312,0.043136,0.050472,0.034312,0.020987,0.012926
4,1004,1998.0,54594,1999-05-31,1999,0.03026,0.0325,0.04791,10,10,10,0.03026,0.0325,0.04791,0.041202,0.053419,0.051189,0.041202,0.041825,0.036488


In [31]:
df['dt_s1'] = np.where(df['year'] == 2017, np.NaN, df['score_1'] - df['score_0'])
df['dt_s2'] = np.where(df['year'] >= 2016, np.NaN, df['score_2'] - df['score_1'])
df['dt_z1'] = np.where(df['year'] == 2017, np.NaN, df['z_score_1'] - df['z_score_0'])
df['dt_z2'] = np.where(df['year'] >= 2016, np.NaN, df['z_score_2'] - df['z_score_1'])
df['dt_s1a'] = np.where(df['year'] == 2017, np.NaN, (df['score_1'] - df['score_0'])
                        - (df['s_1_a'] - df['s_0_a']))
df['dt_s2a'] = np.where(df['year'] >= 2016, np.NaN, (df['score_2'] - df['score_1'])
                        - (df['s_2_a'] - df['s_1_a']))
df['dt_z1a'] = np.where(df['year'] == 2017, np.NaN, (df['z_score_1'] - df['z_score_0'])
                        - (df['z_1_a'] - df['z_0_a']))
df['dt_z2a'] = np.where(df['year'] >= 2016, np.NaN, (df['z_score_2'] - df['z_score_1'])
                        - (df['z_2_a'] - df['z_1_a']))
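
The end-of-sample masking can also be written with `Series.where`, which keeps a value only where the condition holds. This is a sketch on invented scores; `last_year` is a hypothetical parameter standing in for the last TNIC year.

```python
import pandas as pd

# Hypothetical base, lead-1 and lead-2 scores for three base years.
toy = pd.DataFrame(
    {
        "year": [2015, 2016, 2017],
        "score_0": [0.10, 0.10, 0.10],
        "score_1": [0.12, 0.13, 0.14],
        "score_2": [0.15, 0.16, 0.17],
    }
)
last_year = 2017  # assumption: last year covered by TNIC

# A one-year-ahead change needs data through year + 1, a two-year change through year + 2.
toy["dt_s1"] = (toy["score_1"] - toy["score_0"]).where(toy["year"] < last_year)
toy["dt_s2"] = (toy["score_2"] - toy["score_1"]).where(toy["year"] < last_year - 1)
print(toy[["year", "dt_s1", "dt_s2"]])
```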

In [32]:
df.set_index(['gvkey1', 'year'], inplace=True)

In [33]:
dt_sim = df.rename(columns={'n_score_0':'n0',
                            'n_score_1':'n1',
                            'n_score_2':'n2'})[['fyear','permno','datadate',
                                                'n0','n1','n2',
                                                'dt_s1','dt_s2','dt_z1','dt_z2',
                                                'dt_s1a','dt_s2a','dt_z1a','dt_z2a']]

## Control

_IBES_
- Change in number of analysts

_CRSP_
- Ret. volatility
- MTB

_Compustat_
- Net asset growth
- Income
- Assets
- Age
- R&D
- Loss

_Ravenpack DJ, PR_
- Number of articles

Cross-section
- M&A result
- Differentiation

Run _1\_code/cko.sas_ on WRDS using the following file

In [48]:
# dt_sim.to_csv('../2_pipeline/get_these.csv')

# !scp ../2_pipeline/get_these.csv hohn@wrds-cloud.wharton.upenn.edu:/scratch/ou/hohn/

# !scp hohn@wrds-cloud.wharton.upenn.edu:/scratch/ou/hohn/compcontrol.csv ../0_data/external/compcontrol.csv

In [49]:
controls=[
    'cusip',
    'atg',
    'mb',
    'income',
    'r_d',
    'age',
    'neg_ni'
         ]

In [50]:
compcont = pd.read_csv('../0_data/external/compcontrol.csv', 
                       index_col=['gvkey1', 'year'])[controls]

In [51]:
compcont

gvkey1,year,cusip,atg,mb,income,r_d,age,neg_ni
1004,1995,00036110,,,,,,
1004,1996,00036110,,,,,,
1004,1997,00036110,,,,,,
1004,1998,00036110,,,,,,
1004,1999,00036110,,1.75682,0.0569,0.0,34.0,0.0
...,...,...,...,...,...,...,...,...
317264,2015,Y2106R11,,,,,,
317264,2016,Y2106R11,,,,,,
317264,2017,Y2106R11,,,,,,
318728,2016,G0684D10,,,,,,


In [52]:
list(dt_sim.join(compcont))

['fyear',
 'permno',
 'datadate',
 'n0',
 'n1',
 'n2',
 'dt_s1',
 'dt_s2',
 'dt_z1',
 'dt_z2',
 'dt_s1a',
 'dt_s2a',
 'dt_z1a',
 'dt_z2a',
 'cusip',
 'atg',
 'mb',
 'income',
 'r_d',
 'age',
 'neg_ni']

In [53]:
begdt = '2000-12-31'
enddt = '2017-12-31'

dt_sim = dt_sim.join(compcont).loc[lambda x : 
                                   (x['datadate'] >= begdt) & 
                                   (x['datadate'] <= enddt)]

In [55]:
rpna = pd.read_sas('../0_data/external/wrds_company_mapping.sas7bdat')
rpna = rpna.loc[lambda x : x['CUSIP'].notnull()]
rpna.RP_ENTITY_ID = rpna.RP_ENTITY_ID.str.decode(encoding='UTF8')
rpna.CUSIP = rpna.CUSIP.str.decode(encoding='UTF8')

rpna.CUSIP = rpna.CUSIP.str.slice(stop = -1)
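
A small sketch of the CUSIP clean-up: `read_sas` returns byte strings, and dropping the last character turns a 9-character CUSIP into the 8-character form used in `compcont`. The two CUSIPs below are the national identifiers embedded in the ISINs shown in the output that follows; the `raw` frame itself is hypothetical.

```python
import pandas as pd

# Byte strings, as they would come out of pd.read_sas.
raw = pd.DataFrame({"CUSIP": [b"444859102", b"001093103"]})

cusip9 = raw["CUSIP"].str.decode("utf-8")   # bytes -> str
cusip8 = cusip9.str.slice(stop=-1)          # drop the trailing check digit
print(cusip8.tolist())
```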

In [56]:
rpna.head()

,RP_ENTITY_ID,ENTITY_TYPE,ENTITY_NAME,TICKER,CUSIP,ISIN
0,0005DF,b'COMP',b'Predator Oil & Gas Holding PLC',b'PRD.LN',G7209010,b'JE00BFZ1D698'
1,0005DF,b'COMP',b'Predator Oil & Gas Holding PLC',b'PRD',G7209010,b'JE00BFZ1D698'
2,00067A,b'COMP',b'Humana Inc.',b'HUM',44485910,b'US4448591028'
3,000713,b'COMP',b'Af1 Capital Corp.',b'AFC.P.V',00109310,b'CA0010931032'
4,000713,b'COMP',b'Af1 Capital Corp.',b'AFC.P',00109310,b'CA0010931032'


## Link SDC to COMPUSTAT

### SDC and Compustat Link File

The link file is from [Michael Ewens](https://github.com/michaelewens/SDC-to-Compustat-Mapping.git). Cite papers below.

 ```
@article{phillips2013r,
  title={R\&D and the Incentives from Merger and Acquisition Activity},
  author={Phillips, Gordon M and Zhdanov, Alexei},
  journal={The Review of Financial Studies},
  volume={26},
  number={1},
  pages={34--78},
  year={2013},
  publisher={Society for Financial Studies}
  }
 ```

```
@article{ewensPetersWang2018,
 title={Acquisition prices and the measurement of intangible capital},
 author={Ewens, Michael and Peters, Ryan and Wang, Sean},
 journal={Working Paper},
 year={2018}
 }
```

### SDC processing is prohibitively slow; work on the WRDS cloud using _sdc_link.sas_

In [61]:
col = ["DealNumber","agvkey","tgvkey","AMANAMES","ACUSIP",
       "APUBLIC","ATTITUDE","FORM","STATUSCODE", "EBITLTM",
       "AMV","ENTVAL","BOOKVALUE",
       "EQVAL","MV","NETASS","NILTM","PCT_CASH","PCT_STK",
       "PCT_OTHER","PCT_UNKNOWN","PR","RANKVAL","SALESLTM",
       "TMANAMES","TNATIONCODE","TPUBLIC","MASTER_CUSIP","TTICKER"]

In [62]:
# Named dtypes rather than type to avoid shadowing the built-in type()
dtypes = {}
for var in col:
    if var in ["DealNumber","agvkey","tgvkey"]:
        dtypes[var] = 'Int64'
    elif var in ["ACUSIP","APUBLIC","ATTITUDE","FORM","STATUSCODE",
                 "TNATIONCODE","TPUBLIC","MASTER_CUSIP","TTICKER"]:
        dtypes[var] = 'category'


In [63]:
sdc_link = pd.read_csv('../0_data/external/sdc_gvkey.csv', 
                       header=0,
                       parse_dates=['DATEANN','DATEEFF','DATEFIN'],
                       dtype=dtypes, low_memory=False)

In [64]:
for var in ["EBITLTM", "AMV","ENTVAL","BOOKVALUE",
            "EQVAL","MV","NETASS","NILTM","PR","RANKVAL","SALESLTM"]:
    sdc_link[var] = np.where(sdc_link[var].isin(['nan', 'None', 'P', 'M']), np.NaN,
                             sdc_link[var].str.replace(',',''))
    sdc_link[var] = pd.to_numeric(sdc_link[var]).astype('float')
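
For comparison, a different technique for the same clean-up is `pd.to_numeric(..., errors="coerce")`. This is a sketch on invented strings, not the notebook's method: coercion turns every unparseable token into NaN, whereas the loop above only whitelists the known codes and would raise on anything unexpected.

```python
import pandas as pd

# Hypothetical raw SDC values: thousands separators, placeholder codes, a missing cell.
raw = pd.Series(["1,234.5", "P", "M", None, "12"])

cleaned = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")
print(cleaned.tolist())
```

The stricter whitelist is the safer choice when an unknown code should stop the pipeline rather than silently become missing.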

In [65]:
sdc_link.set_index('DealNumber', inplace=True)

In [66]:
compmna = sdc_link[lambda x : ~x['FORM'].isin(['Acq. Part. Int.','Acq. Rem. Int.'])]

In [67]:
compmna[['FORM','agvkey']].groupby('FORM').count()

FORM,agvkey
Acq. Cert. Asts.,337
Acq. Maj. Int.,3244
Acq. Part. Int.,0
Acq. Rem. Int.,0
Acq. of Assets,54803
Acquisition,3
Merger,15771


Drop _AP_ and _AR_.

Form of the Transaction: 10 codes describing the specific form of the transaction:
- M (MERGER): A combination of business takes place or 100% of the stock of a public or private company is acquired.
- A (ACQUISITION): deals in which 100% of a company is spun off or split off are classified as acquisitions by shareholders.
- AM (ACQ OF MAJORITY INTEREST): the acquiror must have held less than 50% and be seeking to acquire 50% or more, but less than 100% of the target company’s stock.
- AP (ACQ OF PARTIAL INTEREST): deals in which the acquiror holds less than 50% and is seeking to acquire less than 50%, or the acquiror holds over 50% and is seeking less than 100% of the target company’s stock. 
- AR (ACQ OF REMAINING INTEREST): deals in which the acquiror holds over 50% and is seeking to acquire 100% of the target company’s stock.
- AA (ACQ OF ASSETS): deals in which the assets of a company, subsidiary, division, or branch are acquired. This code is used in all transactions when a company is being acquired and the consideration sought is not given.
- AC (ACQ OF CERTAIN ASSETS): deals in which sources state that “certain assets” of a company, subsidiary, or division are acquired.
- R (RECAPITALIZATION): deals in which a company undergoes a shareholders’ Leveraged recapitalization in which the company issues a special one-time dividend (in the form of cash, debt securities, preferred stock, or assets) allowing shareholders to retain an equity interest in the company.
- B (BUYBACK): deals in which the company buys back its equity securities or securities convertible into equity, either on the open market, through privately negotiated transactions, or through a tender offer. Board authorized repurchases are included.
- EO (EXCHANGE OFFER): deals in which a company offers to exchange new securities for its equity securities outstanding or its securities convertible into equity.

In [68]:
datadate['lagdate'] = datadate.groupby('gvkey1')['datadate'].shift(1) + pd.DateOffset(days=1)

In [69]:
datadate['lagdate'] = np.where(datadate['lagdate'].isna(),
                              datadate['datadate'] - pd.DateOffset(years=1) + pd.DateOffset(days=1),
                              datadate['lagdate'])
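
The fiscal-year window start (_lagdate_) can be sketched on toy data as follows: the day after the previous fiscal year-end, or, for a firm's first observation, one year before its own year-end plus one day. The `fy` frame is invented for illustration.

```python
import pandas as pd

# Hypothetical fiscal year-ends: two for firm 1, one for firm 2.
fy = pd.DataFrame(
    {
        "gvkey1": [1, 1, 2],
        "datadate": pd.to_datetime(["2001-12-31", "2002-12-31", "2002-06-30"]),
    }
)

prev_end = fy.groupby("gvkey1")["datadate"].shift(1)
fallback = fy["datadate"] - pd.DateOffset(years=1) + pd.DateOffset(days=1)
# Use the previous year-end + 1 day when it exists, else the fallback.
fy["lagdate"] = (prev_end + pd.DateOffset(days=1)).fillna(fallback)
print(fy)
```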

In [70]:
compmna.to_sql('compmna', conn, index=True, if_exists='replace')
datadate.to_sql('datadate', conn, index = True, if_exists='replace')
qry = '''
    select *
    from 
        (select
                a.*, b.datadate, b.fyear
            from
                compmna a left join datadate b on
                a.agvkey == b.gvkey1 and b.datadate >= a.dateeff
            group by 
                a.DealNumber
        )
        '''
temp1 = pd.read_sql_query(qry, conn)

In [71]:
for col in ['tgvkey', 'fyear']:
    temp1[col] = temp1[col].astype('Int64')
for col in ['agvkey', 'tgvkey', 'ACUSIP', 'APUBLIC', 'ATTITUDE', 'FORM', 'STATUSCODE',
             'TNATIONCODE', 'TPUBLIC', 'MASTER_CUSIP', 'TTICKER']:
    temp1[col] = temp1[col].astype('category')
for col in ['datadate', 'DATEANN', 'DATEEFF', 'DATEFIN']:
    temp1[col] = temp1[col].astype('datetime64[ns]')

In [72]:
compmna = temp1[(temp1['datadate'] - temp1['DATEEFF']).dt.days <= 370]

col = list(compmna)
col.insert(5, col.pop(col.index('datadate')))
col.insert(6, col.pop(col.index('fyear')))
compmna = compmna[col]

In [73]:
compmna.set_index('DealNumber', inplace=True)
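
As a cross-check on the SQL match, the same idea (attach the first fiscal year-end on or after the effective date, and discard matches more than 370 days away) can be sketched with `pandas.merge_asof`. This is an alternative technique on invented data, not the code used above; `deals` and `fy_ends` are hypothetical.

```python
import pandas as pd

deals = pd.DataFrame(
    {
        "DealNumber": [1, 2, 3],
        "agvkey": [1, 1, 2],
        "DATEEFF": pd.to_datetime(["2001-03-15", "2002-12-31", "2001-05-01"]),
    }
).sort_values("DATEEFF")  # merge_asof requires sorted keys

fy_ends = pd.DataFrame(
    {
        "gvkey1": [1, 1, 2],
        "datadate": pd.to_datetime(["2001-12-31", "2002-12-31", "2003-06-30"]),
        "fyear": [2001, 2002, 2003],
    }
).sort_values("datadate")

matched = pd.merge_asof(
    deals,
    fy_ends,
    left_on="DATEEFF",
    right_on="datadate",
    left_by="agvkey",
    right_by="gvkey1",
    direction="forward",               # first datadate >= DATEEFF
    tolerance=pd.Timedelta(days=370),  # otherwise leave unmatched
).set_index("DealNumber")
print(matched[["agvkey", "DATEEFF", "datadate", "fyear"]])
```

Deal 3 stays unmatched because the only later year-end for its acquirer is more than 370 days after the effective date.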

SDC observations with a plausible Compustat _datadate_ (no more than 370 days after the effective date)

In [74]:
compmna

DealNumber,agvkey,tgvkey,AMANAMES,ACUSIP,datadate,fyear,APUBLIC,ATTITUDE,FORM,STATUSCODE,...,PCT_OTHER,PCT_UNKNOWN,PR,RANKVAL,SALESLTM,TMANAMES,TNATIONCODE,TPUBLIC,MASTER_CUSIP,TTICKER
336679020,9236,23979,Rhone-Poulenc Rorer Inc,76242T,1995-12-31,1995,Public,Friendly,Acq. Maj. Int.,C,...,,100.0,25.00,150.000,1.540,Applied Immune Sciences Inc,US,Public,03820F,AISX
368462040,8543,,Philip Morris Cos Inc,718154,1998-12-31,1998,Public,Friendly,Acq. Maj. Int.,C,...,,100.0,,211.895,,Tabaqueira Empresa Industrial,PO,Govt.,87327A,
378271020,25341,,Kronos Inc,501052,1999-09-30,1999,Public,Friendly,Acq. of Assets,C,...,,,,,,Interboro Systems Corp,US,Priv.,45843L,
396640020,28633,13901,Multi-Market Radio Inc,625432,1995-12-31,1995,Public,Friendly,Merger,C,...,,,13.61,26.594,8.489,Southern Starr Broadcasting,US,Public,843835,SSBG
400205020,13623,2252,CONMED Corp,207410,1995-12-31,1995,Public,Friendly,Merger,C,...,,,1.77,29.113,28.418,Birtcher Medical Systems Inc,US,Public,091311,BIRT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3097373020,64768,24893,Amazon.Com Inc,023135,2017-12-31,2017,Public,Friendly,Merger,C,...,,,42.00,13598.398,15856.000,Whole Foods Market Inc,US,Public,966837,WFM
3113578020,22037,,Sientra Inc,82621J,2017-12-31,2017,Public,Friendly,Merger,C,...,69.15,,0.32,9.529,6.615,Miramar Labs Inc,US,Public,60503C,MRLB
3128101020,28742,14170,BorgWarner Inc,099724,2017-12-31,2017,Public,Friendly,Merger,C,...,,,22.00,150.487,60.363,Sevcon Inc,US,Public,81783K,SEV
3136151020,24720,4156,Wabash National Corp,929566,2017-12-31,2017,Public,Friendly,Merger,C,...,,100.0,21.00,675.407,300.791,Supreme Industries Inc,US,Public,868607,STS


Deals with non-missing target sales, plus deals with missing target sales but an available target _gvkey_

## Construct IV by acquirer's _gvkey_ and _datadate_

### Grab Compustat Sales

In [75]:
# Parentheses are required: & binds more tightly than !=
sale = comp[comp['sale'].notnull() & (comp['sale'] != 0)][['sale']]

In [76]:
compmna.to_sql('compmna', conn, index=True, if_exists='replace')

sale.to_sql('sale', conn, index=True, if_exists='replace')
qry = '''
        select a.*, b.sale as a_sale, b.fyear as amatch
        from compmna a left join sale b
        on a.agvkey = b.gvkey1 and b.fyear =  
        (select fy.fyear 
            from sale as fy 
            where a.agvkey = fy.gvkey1 and fy.fyear < a.fyear
            order by fy.fyear desc
            limit 1
        )
        '''
temp = pd.read_sql_query(qry, conn)
temp.to_sql('temp', conn, index=True, if_exists='replace')

qry = '''
        select a.*, b.sale as t_sale, b.fyear as tmatch
        from temp a left join sale b
        on a.tgvkey = b.gvkey1 and b.fyear =  
        (select fy.fyear 
            from sale as fy 
            where a.tgvkey = fy.gvkey1 and fy.fyear < a.fyear
            order by fy.fyear desc
            limit 1
        )
        '''
compmna = pd.read_sql_query(qry, conn)
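
The "most recent fiscal year strictly before the deal's fiscal year" lookup can be tried out with `sqlite3` alone. This sketch uses invented tables and writes the subquery with `max()` instead of `order by ... desc limit 1`; both forms pick the same year.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("create table deals (DealNumber integer, agvkey integer, fyear integer)")
demo.execute("create table sale (gvkey1 integer, fyear integer, sale real)")
demo.executemany("insert into deals values (?, ?, ?)", [(1, 100, 2005), (2, 200, 2005)])
demo.executemany(
    "insert into sale values (?, ?, ?)",
    [(100, 2002, 50.0), (100, 2003, 60.0), (100, 2005, 80.0), (200, 2005, 10.0)],
)

rows = demo.execute(
    """
    select a.DealNumber, b.sale as a_sale, b.fyear as amatch
    from deals a left join sale b
      on a.agvkey = b.gvkey1
     and b.fyear = (select max(fy.fyear)
                    from sale fy
                    where fy.gvkey1 = a.agvkey and fy.fyear < a.fyear)
    order by a.DealNumber
    """
).fetchall()
print(rows)
```

Acquirer 100 has no 2004 record, so the 2003 sales are used; acquirer 200 has no earlier year at all and is left unmatched.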

In [77]:
for col in ['agvkey', 'tgvkey']:
    compmna[col] = pd.to_numeric(compmna[col]).astype('Int64')
for col in ['fyear', 'amatch', 'tmatch']:
    compmna[col] = compmna[col].astype('Int64')
for col in ['agvkey', 'tgvkey', 'ACUSIP', 'APUBLIC', 'ATTITUDE', 'FORM', 'STATUSCODE',
             'TNATIONCODE', 'TPUBLIC', 'MASTER_CUSIP', 'TTICKER']:
    compmna[col] = compmna[col].astype('category')
for col in ['datadate', 'DATEANN', 'DATEEFF', 'DATEFIN']:
    compmna[col] = compmna[col].astype('datetime64[ns]')

In [78]:
compmna.drop(columns=['index'], inplace=True)

In [79]:
compmna.set_index(['agvkey','DealNumber'], inplace=True, verify_integrity=True)

### Calculate IV based on effective date

#### Number of days from effective date to fiscal year-end

In [80]:
compmna['days'] = (compmna['datadate'] - compmna['DATEEFF']).dt.days

Very few observations have negative target or acquirer sales; drop them.

In [81]:
compmna = compmna[~(compmna['a_sale']<0) & ~(compmna['t_sale']<0) & compmna['a_sale'].notnull()]

In [82]:
compmna['iv1'] = compmna['days']/365 * (abs(compmna['SALESLTM']/compmna['a_sale']))

In [83]:
compmna['iv4'] = abs(compmna['SALESLTM']/compmna['a_sale'])

In [84]:
compmna['iv2'] = np.where(compmna['iv1'].notnull(), compmna['iv1'], 
                         compmna['days']/365 * (abs(compmna['MV']/compmna['AMV'])))

In [85]:
compmna['iv5'] = np.where(compmna['iv4'].notnull(), compmna['iv4'],(abs(compmna['MV']/compmna['AMV'])))

In [86]:
compmna['iv3'] = np.where(compmna['iv2'].notnull(), compmna['iv2'], 
                         compmna['days']/365 * (abs(compmna['t_sale']/compmna['a_sale'])))

In [87]:
compmna['iv6'] = np.where(compmna['iv5'].notnull(), compmna['iv5'], (abs(compmna['t_sale']/compmna['a_sale'])))
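
The fallback chain behind _iv4_ to _iv6_ (SDC target sales ratio, then the market value ratio, then the Compustat sales ratio) and the time weighting behind _iv1_ to _iv3_ can be sketched with `fillna`. This uses invented numbers and assumes `days` is never missing, in which case the weighted measure equals `days / 365` times the unweighted one.

```python
import numpy as np
import pandas as pd

# Hypothetical deals, each with exactly one usable size measure.
deals = pd.DataFrame(
    {
        "days": [73, 146, 365],
        "SALESLTM": [20.0, np.nan, np.nan],
        "a_sale": [100.0, 200.0, 50.0],
        "MV": [np.nan, 30.0, np.nan],
        "AMV": [np.nan, 300.0, np.nan],
        "t_sale": [np.nan, np.nan, 5.0],
    }
)

r_sdc = (deals["SALESLTM"] / deals["a_sale"]).abs()   # SDC target sales / acquirer sales
r_mv = (deals["MV"] / deals["AMV"]).abs()             # target MV / acquirer MV
r_comp = (deals["t_sale"] / deals["a_sale"]).abs()    # Compustat target / acquirer sales

deals["iv4"] = r_sdc
deals["iv5"] = deals["iv4"].fillna(r_mv)
deals["iv6"] = deals["iv5"].fillna(r_comp)
deals["iv3"] = deals["days"] / 365 * deals["iv6"]     # weight by share of year remaining
print(deals[["iv4", "iv5", "iv6", "iv3"]])
```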

In [88]:
iv = compmna.reset_index().groupby(['agvkey','datadate']).sum()[['iv1','iv2','iv3','iv4','iv5','iv6']]

In [89]:
iv['n'] = compmna.reset_index()[['agvkey','datadate','DealNumber']].groupby(['agvkey','datadate']).count()

In [90]:
iv = iv[iv['n'].notnull()]

#### Change IV to NaN based on values

- Set _iv4_ to NaN if _iv4_ is 0 and _iv5_ is not
- Set _iv4_ and _iv5_ to NaN if both are 0 and _iv6_ is not
- Set _iv4_, _iv5_, and _iv6_ to NaN if all three are 0

In [91]:
iv['iv4'] = np.where((iv['iv4']==0)&(iv['iv5']>0),np.NaN, iv['iv4'])

In [92]:
iv['iv4'] = np.where((iv['iv4']==0)&(iv['iv5']==0)&(iv['iv6']>0),
                     np.NaN, iv['iv4'])
iv['iv5'] = np.where((iv['iv4'].isna())&(iv['iv5']==0)&(iv['iv6']>0),
                     np.NaN, iv['iv5'])

In [93]:
iv['iv4'] = np.where((iv['iv4']==0)&(iv['iv5']==0)&(iv['iv6']==0),
                     np.NaN, iv['iv4'])
iv['iv5'] = np.where((iv['iv4'].isna())&(iv['iv5']==0)&(iv['iv6']==0),
                     np.NaN, iv['iv5'])
iv['iv6'] = np.where((iv['iv4'].isna())&(iv['iv5'].isna())&(iv['iv6']==0),
                     np.NaN, iv['iv6'])

In [94]:
iv['iv1'] = np.where(iv['iv4'].isna(), np.NaN, iv['iv1'])
iv['iv2'] = np.where(iv['iv5'].isna(), np.NaN, iv['iv2'])
iv['iv3'] = np.where(iv['iv6'].isna(), np.NaN, iv['iv3'])
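
The zero-to-NaN recoding is presumably needed because `groupby(...).sum()` skips missing values and returns 0 for a group whose values are all missing. A minimal sketch on invented data shows this, together with pandas' `min_count` argument, which keeps such groups as NaN. It is shown for comparison only and is not a drop-in replacement for the rules above.

```python
import numpy as np
import pandas as pd

# Hypothetical deals: acquirer 1 has one usable ratio, acquirer 2 has none.
g = pd.DataFrame({"agvkey": [1, 1, 2], "iv4": [0.2, np.nan, np.nan]})

default_sum = g.groupby("agvkey")["iv4"].sum()             # all-NaN group becomes 0.0
kept_nan = g.groupby("agvkey")["iv4"].sum(min_count=1)     # all-NaN group stays NaN
print(default_sum.tolist(), kept_nan.tolist())
```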

## Combine _dt_sim_ and _iv_

In [96]:
sim_iv = dt_sim[dt_sim['dt_s1'].notnull()].reset_index().rename(columns={'gvkey1':'agvkey'}).merge(iv.reset_index(),
                                                                        on=['agvkey','datadate'],
                                                                        indicator=True, how='left').set_index(['agvkey','datadate'])
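
The `_merge` column used in the filters below comes from `indicator=True`. A toy left merge (invented data) shows the two values that can occur here: `both` for firm-years with an M&A record and `left_only` for those without.

```python
import pandas as pd

left = pd.DataFrame({"agvkey": [1, 2], "dt_s1": [0.01, -0.02]})
right = pd.DataFrame({"agvkey": [1], "iv1": [0.5]})

merged = left.merge(right, on="agvkey", how="left", indicator=True)
status = merged["_merge"].astype(str).tolist()
print(status)
```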

In [145]:
sim_iv

agvkey,datadate,year,fyear,permno,n0,n1,n2,dt_s1,dt_s2,dt_z1,dt_z2,...,age,neg_ni,iv1,iv2,iv3,iv4,iv5,iv6,n,_merge
1004,2001-05-31,2001,2000.0,54594,10,9,9,-0.005536,0.016811,-0.008620,0.015130,...,36.0,0.0,0.013052,0.013052,0.013052,0.019525,0.019525,0.019525,1.0,both
1004,2002-05-31,2002,2001.0,54594,10,9,10,0.024633,-0.019463,0.018530,-0.013360,...,37.0,1.0,,,,,,,,left_only
1004,2003-05-31,2003,2002.0,54594,7,6,6,-0.000119,-0.009033,-0.005571,-0.007743,...,38.0,1.0,,,,,,,,left_only
1004,2004-05-31,2004,2003.0,54594,9,8,7,0.013556,0.005100,0.010000,0.000411,...,39.0,0.0,,,,,,,,left_only
1004,2005-05-31,2005,2004.0,54594,10,8,7,0.011350,0.041850,0.005200,0.026220,...,40.0,0.0,,,,,,,,left_only
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
316056,2014-12-31,2014,2014.0,14297,2,1,1,-0.007250,0.023500,-0.009450,0.011750,...,,,,,,,,,,left_only
316056,2015-12-31,2015,2015.0,14297,2,1,1,0.014650,-0.012400,0.000700,-0.006200,...,,,,,,,,,,left_only
316056,2016-12-31,2016,2016.0,14297,2,1,0,0.001250,,-0.006500,,...,,,,,,,,,,left_only
317264,2015-03-31,2015,2014.0,14642,10,7,8,-0.004083,-0.016020,-0.052400,0.003290,...,,,,,,,,,,left_only


In [153]:
len(sim_iv[lambda x : 
                  (x['dt_s1'].notnull()) & 
                  (x['permno'].notnull()) & 
                  (x['cusip'].notnull())
                 ])

50795

In [99]:
len(sim_iv[lambda x : (x['fyear']>=1997)&(x['fyear']<=2012)&(x['n0']==10)&(x['_merge']=='both')])

8647

In [100]:
len(sim_iv[lambda x : (x['fyear']>=1997)&
           (x['fyear']<=2012)&
           (x['n0']==10)&
           (x['_merge']=='both')&
           (x['iv2'].notnull())])

3008

In [101]:
np.mean(sim_iv[(sim_iv['_merge']=='left_only')&(sim_iv['n0']==10)])

year       2007.084703
fyear      2006.963350
permno    70932.370947
n0           10.000000
n1            8.551088
n2            6.431142
dt_s1        -0.007179
dt_s2        -0.003920
dt_z1        -0.017696
dt_z2        -0.018515
dt_s1a        0.004639
dt_s2a        0.002998
dt_z1a        0.014682
dt_z2a        0.003715
atg           7.741768
mb            5.275408
income       -0.212009
r_d           7.508611
age          15.821728
neg_ni        0.490305
iv1                NaN
iv2                NaN
iv3                NaN
iv4                NaN
iv5                NaN
iv6                NaN
n                  NaN
dtype: float64

In [102]:
np.mean(sim_iv[(sim_iv['_merge']=='both')&(sim_iv['n0']==10)])

year       2007.274022
fyear      2007.165394
permno    69119.720917
n0           10.000000
n1            8.492975
n2            6.682791
dt_s1        -0.004673
dt_s2        -0.002375
dt_z1        -0.014121
dt_z2        -0.013299
dt_s1a        0.004816
dt_s2a        0.002721
dt_z1a        0.013058
dt_z2a        0.004372
atg           0.523801
mb            3.673853
income       -0.053865
r_d           0.924613
age          19.702934
neg_ni        0.296678
iv1           0.504296
iv2           0.499703
iv3          86.799923
iv4           1.249472
iv5           1.238394
iv6         158.141450
n             1.884771
dtype: float64

### M&A Disclosure

In [172]:
disc = pd.read_csv('~/Dropbox/Project/cko/0_data/manual/disc.csv', parse_dates=['DATADATE'])
disc['CIK'] = disc['CIK'].apply(lambda x: str(int(x)).zfill(10) if pd.notnull(x) else None)
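
The CIK padding step can be written as a small helper. This is a sketch; `pad_cik` is a hypothetical name. A CSV column with missing values is read as float, so the value is cast to `int` before zero-padding to the 10-digit form.

```python
import math

def pad_cik(x, width=10):
    """Return the CIK as a zero-padded string, or None if it is missing."""
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return None
    return str(int(x)).zfill(width)

print(pad_cik(2601.0), pad_cik(float("nan")))
```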

In [171]:
temp = sim_iv[lambda x : (x['fyear']>=1997)&
           (x['fyear']<=2012)&
           (x['n0']==10)&
           (x['_merge']=='both')&
           (x['iv6'].notnull())]

In [173]:
disc = disc.rename(columns={'GVKEY':'agvkey','DATADATE':'datadate'}).set_index(['agvkey','datadate'])

In [175]:
temp.drop(columns=['_merge']).merge(disc,left_index=True,right_index=True,how='inner')

agvkey,datadate,year,fyear,permno,n0,n1,n2,dt_s1,dt_s2,dt_z1,dt_z2,...,iv2,iv3,iv4,iv5,iv6,n,FYEAR,CIK,MD_A,PROFORMA
1056,2002-06-30,2002,2002.0,62500,10,10,9,0.008890,0.007063,0.00889,0.00019,...,0.016165,0.016165,0.536382,0.536382,0.536382,1.0,2002,0000002601,2,C
1072,2008-03-31,2008,2007.0,81912,10,6,6,-0.029860,-0.003017,-0.03480,-0.00181,...,0.031562,0.031562,0.061277,0.061277,0.061277,1.0,2007,0000859163,3,N
1078,2001-12-31,2001,2001.0,20482,10,10,10,0.014280,0.028030,0.01428,0.02803,...,0.146678,2.215521,0.177959,0.177959,2.661932,4.0,2001,0000001800,3,I
1078,2004-12-31,2004,2004.0,20482,10,9,10,0.015122,-0.004732,0.01186,-0.00147,...,0.010365,0.010365,0.013248,0.013248,0.013248,4.0,2004,0000001800,3,N
1078,2006-12-31,2006,2006.0,20482,10,10,9,0.024410,-0.019516,0.02441,-0.02212,...,0.000853,0.197027,0.019458,0.019458,0.301361,2.0,2006,0000001800,1,N
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
264386,2006-12-31,2006,2006.0,90220,10,8,6,-0.014332,0.011163,-0.01650,0.00453,...,0.292291,0.292291,0.348648,0.348648,0.348648,1.0,2006,0001106942,5,C
264386,2008-12-31,2008,2008.0,90220,10,8,7,0.003007,0.002727,-0.00337,-0.00128,...,0.116777,0.116777,0.127234,0.127234,0.127234,1.0,2008,0001106942,2,C
264386,2009-12-31,2009,2009.0,90220,10,8,0,0.004815,,-0.00288,-0.03078,...,0.043475,0.043475,0.066955,0.066955,0.066955,2.0,2009,0001106942,2,I
264387,2007-05-31,2007,2006.0,90179,10,9,8,-0.004423,-0.000592,-0.01561,-0.01166,...,0.136713,0.136713,0.409020,0.409020,0.409020,1.0,2006,0001275187,2,I


In [177]:
crit1 = "Compustat firm-years between 2000 and 2017 excluding the financing and utility sectors"
crit2 = "TNIC data insufficient to calculate change in product market similarity"
crit3 = "Missing _PERMNO_ or _CUSIP_"
crit4 = "Compustat firm-years with change in similarity"
crit5 = "Non-M&A firm years including acquisition of partial or remaining interest"
crit6 = "M&A firm-years"
crit7 = "Acquirers with less than 10 close competitors"
crit8 = "Target information insufficient to calculate instrumental variable"
crit9 = "M&A firm-years with IV"
crit10 = "M&A disclosure already collected"

obs1 = len(comp[lambda x : x['year'] >= 2000])
obs2 = len(avg_sim[lambda x : x['year'] >= 2000])
obs3 = len(sim_iv[lambda x : 
                  (x['dt_s1'].notnull()) & 
                  (x['permno'].notnull()) & 
                  (x['cusip'].notnull())
                 ])
obs4 = len(sim_iv[lambda x : 
                  (x['_merge'] == 'both') & 
                  (x['dt_s1'].notnull()) & 
                  (x['permno'].notnull()) & 
                  (x['cusip'].notnull())
                 ])
obs5 = len(sim_iv[lambda x : 
                  (x['_merge'] == 'both') & 
                  (x['dt_s1'].notnull()) & 
                  (x['permno'].notnull()) & 
                  (x['cusip'].notnull()) &
                  (x['n0']==10)
                 ])
obs6 = len(sim_iv[lambda x : 
                  (x['_merge'] == 'both') & 
                  (x['dt_s1'].notnull()) & 
                  (x['permno'].notnull()) & 
                  (x['cusip'].notnull()) &
                  (x['n0'] == 10) &
                  (x['iv6'].notnull())
                 ])
obs7 = len(temp.drop(columns=['_merge']).merge(disc,left_index=True,right_index=True,how='inner'))
sample_selection = pd.DataFrame(data={"Criteria": 
                                      [
                                          crit1,
                                          crit2,
                                          crit3,
                                          crit4,
                                          crit5,
                                          crit6,
                                          crit7,
                                          crit8,
                                          crit9,
                                          crit10
                                      ],
                                      "Observations":
                                      [
                                          obs1,
                                          obs2 - obs1,
                                          obs3 - obs2,
                                          obs3,
                                          obs4 - obs3,
                                          obs4,
                                          obs5 - obs4,
                                          obs6 - obs5,
                                          obs6,
                                          obs7
                                      ]
                                     }).set_index("Criteria")

print(sample_selection.to_markdown())

| Criteria                                                                               |   Observations |
|:---------------------------------------------------------------------------------------|---------------:|
| Compustat firm-years between 2000 and 2017 excluding the financing and utility sectors |         192392 |
| TNIC data insufficient to calculate change in product market similarity                |        -130283 |
| Missing _PERMNO_ or _CUSIP_                                                            |         -11314 |
| Compustat firm-years with change in similarity                                         |          50795 |
| Non-M&A firm years including acquisition of partial or remaining interest              |         -36245 |
| M&A firm-years                                                                         |          14550 |
| Acquirers with less than 10 close competitors                                          |          -3957 |
| Target information insuffi
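
The attrition table above follows one pattern: apply the filters cumulatively, and after each one report the number dropped and the number remaining. A minimal sketch of that pattern on toy data; `attrition_table`, the toy columns, and the step labels are hypothetical and only illustrate the bookkeeping, not the actual sample.

```python
import pandas as pd


def attrition_table(df, steps):
    """Build a sample-selection table from cumulative filters.

    steps: list of (label_for_dropped, label_for_remaining, condition),
    where condition maps the frame to a boolean Series.
    """
    mask = pd.Series(True, index=df.index)
    remaining = len(df)
    rows = [("Starting sample", remaining)]
    for label_drop, label_keep, cond in steps:
        mask = mask & cond(df)          # filters accumulate
        kept = int(mask.sum())
        rows.append((label_drop, kept - remaining))  # negative = dropped
        rows.append((label_keep, kept))
        remaining = kept
    out = pd.DataFrame(rows, columns=["Criteria", "Observations"])
    return out.set_index("Criteria")


# Toy firm-years (hypothetical values)
toy = pd.DataFrame({
    "dt_s1": [0.1, None, 0.2, 0.3, 0.4],
    "permno": [1, 2, None, 4, 5],
    "_merge": ["both", "both", "both", "left_only", "both"],
})
steps = [
    ("Missing change in similarity", "With change in similarity",
     lambda x: x["dt_s1"].notnull()),
    ("Missing PERMNO", "With PERMNO", lambda x: x["permno"].notnull()),
    ("Non-M&A firm-years", "M&A firm-years", lambda x: x["_merge"] == "both"),
]
table = attrition_table(toy, steps)
print(table)
```

Keeping each filter in one list avoids repeating the growing chain of conditions for every `obs` count.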

In [180]:
14550/50795

0.28644551629097353

In [179]:
sim_iv[lambda x : 
                  (x['_merge'] == 'both') & 
                  (x['dt_s1'].notnull()) & 
                  (x['permno'].notnull()) & 
                  (x['cusip'].notnull()) &
                  (x['n0'] == 10) &
                  (x['iv6'].notnull())
                 ].dropna(subset=controls)

agvkey,datadate,year,fyear,permno,n0,n1,n2,dt_s1,dt_s2,dt_z1,dt_z2,...,age,neg_ni,iv1,iv2,iv3,iv4,iv5,iv6,n,_merge
1004,2001-05-31,2001,2000.0,54594,10,9,9,-0.005536,0.016811,-0.00862,0.01513,...,36.0,0.0,0.013052,0.013052,0.013052,0.019525,0.019525,0.019525,1.0,both
1004,2012-05-31,2012,2011.0,54594,10,9,9,0.024151,-0.025556,0.02008,-0.02300,...,47.0,0.0,0.023760,0.023760,0.526889,0.041083,0.041083,1.055681,3.0,both
1013,2004-10-31,2004,2004.0,50906,10,10,10,-0.003890,0.001050,-0.00389,0.00105,...,31.0,0.0,0.185870,0.185870,0.185870,0.408691,0.408691,0.408691,2.0,both
1013,2005-10-31,2005,2005.0,50906,10,10,8,0.001800,0.008462,0.00180,-0.00327,...,32.0,0.0,,,0.369150,,,0.769941,2.0,both
1013,2008-10-31,2008,2008.0,50906,10,10,6,0.002400,-0.020507,0.00240,-0.03472,...,35.0,1.0,0.057271,0.057271,0.057271,0.062774,0.062774,0.062774,2.0,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
264509,2016-12-31,2016,2016.0,90253,10,7,0,0.006604,,-0.03906,,...,18.0,0.0,,,80.496015,,,141.533737,3.0,both
271794,2013-12-31,2013,2013.0,92894,10,10,8,-0.004720,-0.027607,-0.00472,-0.04562,...,11.0,1.0,0.002269,0.002269,0.002269,0.027605,0.027605,0.027605,1.0,both
271987,2016-12-31,2016,2016.0,13279,10,9,0,-0.003768,,-0.01368,,...,8.0,1.0,,,67116.786626,,,168949.152542,1.0,both
284041,2012-02-29,2012,2011.0,93355,10,9,0,-0.008357,,-0.01727,-0.08022,...,5.0,1.0,,,0.215005,,,0.613099,3.0,both


In [None]:
# iv = iv.join(disc.rename(columns={'GVKEY':'agvkey', 'DATADATE':'datadate'}).set_index(['agvkey', 'datadate']),\
#         on=['agvkey', 'datadate'], how='left')

# iv[(iv['MD_A']==5)&(iv['disc']==0)]

# iv['disc'] = np.where((iv['MD_A']>=3)&(iv['PROFORMA'].isin(['I','C'])), 1, 0)

Disclosure might also need additional data collection.

## Private Target Data (Chen 2019)

The _encoding_ option decodes character columns into Python strings on import; without it, `pd.read_sas` returns them as raw bytes.
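
A minimal sketch of what the _encoding_ option does, using a toy frame with hypothetical values rather than the actual file: character columns that would otherwise arrive as `bytes` are decoded to `str`. `decode_bytes` is an illustrative helper, not part of pandas.

```python
import pandas as pd

# Toy stand-in for a SAS import without `encoding` (hypothetical values)
raw = pd.DataFrame({
    "ACQ_NAME": [b"Acme Corp", b"Beta Inc"],
    "gvkey": [b"001004", b"001013"],
})


def decode_bytes(df, encoding="utf-8"):
    """Decode every bytes value in the frame to str; leave other values alone."""
    out = df.copy()
    for col in out.columns:
        out[col] = out[col].map(
            lambda v: v.decode(encoding) if isinstance(v, bytes) else v
        )
    return out


decoded = decode_bytes(raw)
# Once decoded, identifiers can be converted to integers as in the cells below
gvkey_int = pd.to_numeric(decoded["gvkey"]).astype("int64")
print(decoded)
```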

In [None]:
cw = pd.read_sas('../0_data/manual/CW2019.sas7bdat', format = 'sas7bdat', encoding="utf-8")

In [None]:
cw

In [None]:
cw[['gvkey','ATTITUDE']].groupby('ATTITUDE').count()

In [None]:
cw['gvkey1'] = pd.to_numeric(cw['gvkey']).astype('int64')

The _priv_ dataset is a subset of _compmna_ that will be matched to Ciao-Wei's data.

In [None]:
priv = sdc[(sdc['RANKVAL'].notnull()) & (sdc['TPUBLIC'] == 'Priv.')]
priv = priv.drop_duplicates()  # reassign: dask DataFrames do not support inplace
priv = priv[(priv['DATEEFF'] >= '1997-01-01') & (priv['DATEEFF'] <= '2013-12-31')]
private = priv.compute()

for var in ['DATEANN', 'DATEEFF']:
    private[var] = pd.to_datetime(private[var]).astype('datetime64[ns]')

The code below grabs all but 5 _MASTER_DEAL_NO_ from _private_.

In [None]:
# merge by dates and names
cw = cw.merge(private[['AMANAMES', 'DATEANN', 'DATEEFF', 'TMANAMES', 'MASTER_DEAL_NO']],
         left_on=['ACQ_NAME', 'ANN', 'EFF', 'TRG_NAME'],
         right_on=['AMANAMES', 'DATEANN', 'DATEEFF', 'TMANAMES'], how='left')
col = list(cw)
col.insert(0, col.pop())
cw = cw[col]

cw.drop(columns=['AMANAMES', 'DATEANN', 'DATEEFF', 'TMANAMES'], inplace=True)

cw.drop_duplicates(inplace=True)

# review dates and names of the missing
missing = private[['MASTER_DEAL_NO', 'AMANAMES', 'DATEANN', 'DATEEFF', 'TMANAMES']].merge(cw[cw['MASTER_DEAL_NO'].isna()][['ANN', 'EFF', 'ACQ_NAME', 'TRG_NAME']],
                                                           how='right',left_on=['DATEANN', 'DATEEFF'], right_on=['ANN', 'EFF'])

missing.drop_duplicates(inplace=True)
missing.reset_index(drop=True, inplace=True)

# manual match
missing = missing.iloc[[0, 1, 5, 16, 24, 26, 28, 29, 39, 44, 53]][['MASTER_DEAL_NO', 'ANN', 'EFF', 'ACQ_NAME', 'TRG_NAME']]

# update MASTER_DEAL_NO
cw = cw.merge(missing, 
         left_on=['ACQ_NAME', 'ANN', 'EFF', 'TRG_NAME'],
         right_on=['ACQ_NAME', 'ANN', 'EFF', 'TRG_NAME'], how='left', suffixes=('','_y'))

cw['MASTER_DEAL_NO'] = np.where(cw['MASTER_DEAL_NO'].isna(), cw['MASTER_DEAL_NO_y'], cw['MASTER_DEAL_NO'])

cw.drop(columns=['MASTER_DEAL_NO_y'], inplace=True)
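
The two-pass match above (exact merge first, then manually matched deals for the leftovers) can be sketched on toy data as follows. All values are hypothetical; `fillna` is used here as an equivalent of the `np.where` update.

```python
import pandas as pd

# First pass left two deals without a MASTER_DEAL_NO (hypothetical values)
cw_toy = pd.DataFrame({
    "ACQ_NAME": ["A", "B", "C"],
    "TRG_NAME": ["x", "y", "z"],
    "MASTER_DEAL_NO": [101.0, None, None],
})
# Second pass: deals matched by hand
manual_toy = pd.DataFrame({
    "ACQ_NAME": ["B"],
    "TRG_NAME": ["y"],
    "MASTER_DEAL_NO": [202.0],
})

merged = cw_toy.merge(manual_toy, on=["ACQ_NAME", "TRG_NAME"],
                      how="left", suffixes=("", "_y"))
# Keep the first-pass deal number; fall back to the manual one when missing
merged["MASTER_DEAL_NO"] = merged["MASTER_DEAL_NO"].fillna(merged["MASTER_DEAL_NO_y"])
merged = merged.drop(columns=["MASTER_DEAL_NO_y"])

n_unmatched = int(merged["MASTER_DEAL_NO"].isna().sum())
print(merged, n_unmatched)
```

Counting the remaining missing deal numbers after the update is a quick check on how many deals are still unmatched.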

In [None]:
col = list(cw)
col.insert(1, col.pop())
cw = cw[col]

In [None]:
cw.to_sql('compmna', conn, index=False, if_exists='replace')
datadate.to_sql('datadate', conn, index = True, if_exists='replace')
qry = '''
    select  
        a.*, b.datadate
    from
        compmna a join datadate b on
        a.gvkey1 == b.gvkey1 and a.EFF between b.lagdate and b.datadate 
    '''
cw = pd.read_sql_query(qry, conn)

In [None]:
cw['datadate'] = pd.to_datetime(cw['datadate']).astype('datetime64[ns]')

cw['year'] = cw['datadate'].dt.year

In [None]:
df.to_sql('avg_sim', conn, index=False, if_exists='replace')
cw.to_sql('cw', conn, index = False, if_exists='replace')
qry = '''
    select  
        a.*
    from
        avg_sim a join (select distinct gvkey1, year from cw) b
        on a.gvkey1 = b.gvkey1 and a.year = b.year
    '''
cw_sim = pd.read_sql_query(qry, conn)

In [None]:
cw_sim = cw_sim[['gvkey1', 'year', 'dt_s1', 'dt_z1', 'dt_s2', 'dt_z2']]

In [None]:
tnic_industry.to_sql('tnic', conn, index=True, if_exists='replace')
cw_sim.to_sql('cw_sim', conn, index = False, if_exists='replace')
qry = '''
    select  
        a.gvkey1, a.year, a.gvkey2
    from
        tnic a join (select distinct gvkey1, year from cw_sim) b
        on a.gvkey1 = b.gvkey1 and a.year = b.year
    '''
cw_tnic = pd.read_sql_query(qry, conn)

In [None]:
cw_tnic = cw_tnic.merge(df[['gvkey1', 'year', 'dt_s1', 'dt_s2', 'dt_z1', 'dt_z2']]
                        , left_on=['gvkey2', 'year'], right_on=['gvkey1', 'year'])

In [None]:
cw_tnic.drop(columns=['gvkey1_y'], inplace=True)
cw_tnic.rename(columns={'gvkey1_x':'gvkey1'}, inplace=True)

In [None]:
tnic_avg = cw_tnic.groupby(['gvkey1', 'year']).mean().drop(columns=['gvkey2'])

In [None]:
cw_sim = cw_sim.merge(tnic_avg, left_on=['gvkey1', 'year'], right_on=['gvkey1', 'year'])

In [None]:
cw_sim['dt_s1'] = cw_sim['dt_s1_x'] - cw_sim['dt_s1_y']
cw_sim['dt_z1'] = cw_sim['dt_z1_x'] - cw_sim['dt_z1_y']
cw_sim['dt_s2'] = cw_sim['dt_s2_x'] - cw_sim['dt_s2_y']
cw_sim['dt_z2'] = cw_sim['dt_z2_x'] - cw_sim['dt_z2_y'] 
# cw_sim = cw_sim[['gvkey1', 'year', 'dt_s1', 'dt_z1', 'dt_s2', 'dt_z2']]

In [None]:
cw = cw.merge(cw_sim, left_on=['gvkey1', 'year'], right_on=['gvkey1', 'year'])

In [None]:
cw.to_stata('/Users/ohn0000/Dropbox/Project/cko/2_pipeline/cw.dta')

### Materiality of M&A

In [None]:
material = pd.read_csv('/Users/ohn0000/Project/cko/0_data/external/materiality.csv')
material.set_index(["year", "gvkey1"], inplace=True, verify_integrity=True)

Be careful: the _year_ here refers to the M&A firm-year, whereas the _year_ in __avg_sim__ is the year in which competitors are identified.

In [None]:
manual = disc.join(material)[['DATADATE', 'CIK', 'TGTAT_ACQAT', 'TGTDVAL_ACQAT', 'MD_A', 'PROFORMA']].sort_index()

In [None]:
# import wrds
# db = wrds.Connection(wrds_username = "yaera")

# ma_details_desc = db.describe_table('sdc', 'ma_details').sort_values('name')
# with pd.option_context('display.max_rows', None):
#     print(ma_details_desc)

| Variable    | Description                                                  |
|:-----------:|:-------------------------------------------------------------|
| bookvalue   | Target Book Value (\$mil)                                    |
| compete     | Competing Bidder (Y/N)                                       |
| competecode | Competing Bid Deal Code                                      |
| dateann     | Date Announced                                               |
| dateannest  | _dateann_ is estimated (Y/N)                                 |
| dateeff     | Date Effective                                               |
| ebitltm     | Target EBIT LTM (\$mil)                                      |
| pct_cash    | Percentage of consideration paid in cash                     |
| pct_other   | Percentage of consideration paid in other than cash or stock |
| pct_stk     | Percentage of consideration paid in stock                    |
| pct_unknown | Percentage of consideration which is unknown                 |
| ptincltm    | Target Pre-Tax Income LTM (\$mil)                            |
| salesltm    | Target Sales LTM (\$mil)                                     |
| rankval     | Ranking Value incl Net Debt of Target (\$mil)                |
Run the SQL query below on _WRDS_.

In [None]:
# import wrds
# sdc_query = """
# select master_deal_no as DealNumber, 
#         bookvalue, 
#         compete, 
#         competecode, 
#         dateann, 
#         dateannest, 
#         dateeff, 
#         ebitltm, 
#         pct_cash,
#         pct_other,
#         pct_stk,
#         pct_unknown,
#         ptincltm,
#         salesltm,
#         rankval
# from sdc.ma_details
# where dateeff is not null 
# """
# # and master_deal_no in %(deal_no)s
# sdc = db.raw_sql(sdc_query, date_cols=['dateann', 'dateeff'])
# sdc.to_pickle('/home/upenn/yaera/sdc.pkl')

In [None]:
sdc = pd.read_pickle('/Users/ohn0000/Project/cko/0_data/external/sdc.pkl')
sdc.drop_duplicates('dealnumber', inplace = True)
sdc['dealnumber'] = sdc['dealnumber'].astype('int64')

# clean up values and change dtype to float
def to_float(x):
    """Convert an SDC value to float; placeholders and missing values become NaN."""
    if pd.isna(x) or x == '*********':
        return np.nan
    return float(x.replace(',', '')) if isinstance(x, str) else float(x)

for column in ['bookvalue', 'ebitltm', 'pct_cash', 'pct_other', 'pct_stk', 'pct_unknown', 'ptincltm', 'salesltm', 'rankval']:
    sdc[column] = sdc[column].apply(to_float).astype('float64')
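
A vectorized alternative to the element-wise cleaning above, sketched on a toy series with hypothetical values. It assumes that anything `pd.to_numeric` cannot parse (such as the `'*********'` placeholder) should become missing; `to_float_series` is an illustrative name.

```python
import numpy as np
import pandas as pd


def to_float_series(series):
    """Strip thousands separators and coerce unparseable entries to NaN."""
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


# Mixed strings, placeholder, missing values, and a plain number
raw = pd.Series(["1,234.5", "*********", None, 42, np.nan], dtype=object)
clean = to_float_series(raw)
print(clean)
```

Note that `errors="coerce"` silently turns every unparseable string into NaN, so it is worth comparing the missing counts before and after cleaning.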

In [None]:
sdc_sub = pd.merge(sdc_link, sdc,
                   left_index=True, right_on='dealnumber').drop('dealnumber', axis='columns')
sdc_sub.index.name = 'dealnumber'

In [None]:
sdc_sub.sort_values(['agvkey', 'dateeff'], inplace=True)

In [None]:
sdc_link['agvkey'].count() / sdc['dealnumber'].count()

In [None]:
sdc_sub['agvkey'].count() / sdc_link['agvkey'].count() 

In [None]:
sdc_sub.profile_report(style={'full_width':True})

Use the __compustat__ _datadate_ and _gvkey_ to link the sdc data to the similarity scores.

In [None]:
import wrds
db = wrds.Connection(wrds_username = 'hohn')

# Compustat fiscal-year identifiers used to place each deal in a firm-year
funda_query = """
select gvkey, datadate, fyear, cusip,  cik
from comp.funda
where consol = %(consol)s and indfmt in %(indfmt)s and datafmt = %(datafmt)s and popsrc = %(popsrc)s and curcd in %(curcd)s
"""

parm = {'consol':('C'), 'indfmt' : ('INDL', 'FS'), 'datafmt': ('STD'), 'popsrc' : ('D'), 'curcd' : ('USD', 'CAD')}

funda = db.raw_sql(funda_query, params = parm, date_cols = ['datadate'])

In [None]:
funda['start'] = funda['datadate'] - pd.DateOffset(months = 12) + pd.DateOffset(days = 1)
funda['gvkey'] = funda['gvkey'].astype('int64')
funda.set_index('gvkey', inplace=True)

In [None]:
funda.fyear = funda.fyear.astype('Int16')

In [None]:
import pandasql as ps

sql_query = '''
select a.*, b.datadate, b.fyear, b.cusip, b.cik
from sdc_sub a left join funda b
on a.agvkey = b.gvkey and a.dateeff between b.start and b.datadate
'''

newdf = ps.sqldf(sql_query, locals())
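
The range join above (deal effective date falling inside the acquirer's fiscal year) can also be sketched in plain pandas: merge on the firm key, then keep the rows whose _dateeff_ lies between _start_ and _datadate_. The toy deals and fiscal years below are hypothetical.

```python
import pandas as pd

deals = pd.DataFrame({
    "dealnumber": [1, 2, 3],
    "agvkey": [1004, 1004, 1013],
    "dateeff": pd.to_datetime(["2001-03-15", "2002-01-10", "2004-11-05"]),
})
fy = pd.DataFrame({
    "gvkey": [1004, 1004, 1013],
    "datadate": pd.to_datetime(["2001-05-31", "2002-05-31", "2004-10-31"]),
})
# First day of the fiscal year, as in the funda cell
fy["start"] = fy["datadate"] - pd.DateOffset(months=12) + pd.DateOffset(days=1)

# All deal x fiscal-year candidates for the same firm
cand = deals.merge(fy, left_on="agvkey", right_on="gvkey", how="left")
in_window = cand["dateeff"].between(cand["start"], cand["datadate"])
matched = cand.loc[in_window, ["dealnumber", "datadate"]]

# Left join back so that deals without a matching fiscal year are kept
linked = deals.merge(matched, on="dealnumber", how="left")
print(linked)
```

The intermediate frame has one row per deal and fiscal-year pair of the same firm, so this approach is only practical when firms have a modest number of fiscal years.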

In [None]:
col = list(newdf)
for i in range(2, 6):
    col.insert(i, col.pop(-1))
newdf = newdf.loc[:,col]

In [None]:
for i in ['datadate', 'dateann', 'dateeff']:
    newdf[i] = newdf[i].astype('datetime64[ns]')
    
newdf['year'] = newdf['datadate'].dt.year.astype('Int16')
for i in ['fyear', 'agvkey', 'tgvkey']:
    newdf[i] = newdf[i].astype('Int64')

In [None]:
col = list(newdf)
col.insert(col.index('datadate'), col.pop(col.index('year')))
newdf = newdf.loc[:,col]

In [None]:
newdf = newdf.drop_duplicates(subset='dealnumber')

In [None]:
newdf[newdf['agvkey'].notnull()]

In [None]:
newdf['rankval'].count()

18994 observations with non-missing _rankval_

In [None]:
newdf['salesltm'].count()

8055 observations with non-missing _salesltm_

In [None]:
np.sum(newdf['rankval'].notnull() & newdf['salesltm'].notnull())

6445 observations with both _rankval_ and _salesltm_ available

## Append similarity score between acquirer and target

In [None]:
upload = newdf[newdf['agvkey'].notnull() & newdf['tgvkey'].notnull() & newdf['year'].notnull()][['agvkey', 'tgvkey', 'year']].rename(columns={'agvkey':'gvkey1', 'tgvkey':'gvkey2'})
upload.to_csv('/Users/ohn0000/Project/cko/2_pipeline/upload.csv', index=False)
!scp /Users/ohn0000/Project/cko/2_pipeline/upload.csv $WRDS:/scratch/ou/hohn

Run this on the WRDS server. The __TNIC_All__ files should be uploaded to scratch beforehand.

In [None]:
"""
The server killed the previous code, which combined all files before joining. The current code instead loops over the files.
"""
# !cd /scratch/ou/hohn/TNIC_AllPairsDistrib
# !cat tnicall1996.txt > tnicall_combined.txt
# !for file in tnicall{1997..2017}.txt; do sed '1d' $file >> tnicall_combined.txt; done
# !cd ~


"""
atsim.py
"""


In [None]:
!scp atsim.py $WRDS:~

In [None]:
!scp $WRDS:/scratch/ou/hohn/atsim.csv /Users/ohn0000/Project/cko/2_pipeline/

In [None]:
col = list(newdf)
col.insert(col.index('bookvalue'), col.pop(col.index('atsim')))
newdf = newdf.loc[:,col]

## IV candidates

The materiality measure based on deal value will be the last resort for the IV.   
Alternatively, 2SLS using multiple IVs is feasible.

Candidates
* Max deal value
* Sum deal value
* Date difference between _dateeff_ and _datadate_
    * _dateeff_ of the first M&A
    * _dateeff_ of the largest M&A
    * weighted average of _dateeff_ 
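
A minimal sketch of how these candidates could be computed per acquirer firm-year, on toy deals with hypothetical values. It assumes _rankval_ stands in for deal value and that the weighted average uses deal value as the weight; the output column names (`max_dval`, `days_first`, and so on) are illustrative.

```python
import pandas as pd

deals = pd.DataFrame({
    "agvkey": [1004, 1004, 1013],
    "datadate": pd.to_datetime(["2001-05-31", "2001-05-31", "2004-10-31"]),
    "dateeff": pd.to_datetime(["2000-07-01", "2001-03-02", "2004-10-01"]),
    "rankval": [100.0, 300.0, 50.0],  # assumed proxy for deal value
})
# Days from deal completion to fiscal year-end
deals["days_to_fye"] = (deals["datadate"] - deals["dateeff"]).dt.days
deals["w_days"] = deals["days_to_fye"] * deals["rankval"]

keys = ["agvkey", "datadate"]
g = deals.groupby(keys)

iv = pd.DataFrame({
    "max_dval": g["rankval"].max(),
    "sum_dval": g["rankval"].sum(),
    # The first M&A of the year has the longest gap to fiscal year-end
    "days_first": g["days_to_fye"].max(),
})
# Gap for the largest deal in each firm-year
largest = deals.loc[g["rankval"].idxmax(), keys + ["days_to_fye"]].set_index(keys)
iv["days_largest"] = largest["days_to_fye"]
# Deal-value-weighted average gap (weighting scheme is an assumption)
iv["days_wavg"] = g["w_days"].sum() / g["rankval"].sum()

print(iv)
```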

## Cross-sections
* Similarity between acquirer and target 
    - Relation stronger in diversifying
    - Could be more of a U-shaped relation, i.e., competitors don't follow when you move far enough
* Average value of pre-similarities between acquirer and close competitors 
    - Prediction not clear
* M&A performance during the completed firm-year
    - Relation stronger when M&A was more successful <-> how do we define success of an M&A?
* Number of close competitors of the target
    - Potential targets are candidates of future mergers
* How many competitors were there initially?
    - The size of the TNIC industry