# Intro

A special purpose acquisition company (SPAC) is a company with no commercial operations that is formed strictly to raise capital through an initial public offering (IPO) for the purpose of acquiring an existing company. SPACs have been gaining attention as an alternative to the traditional IPO.
For more background, see [this article](https://www.investopedia.com/terms/s/spac.asp) and [this article](https://www.specialsituationinvestments.com/spacs/).

Ideally, when a SPAC announces its acquisition target, the stock price should rise above $10 because it is no longer a blank check stock.
In this script, we check whether this holds and how much money we could make from it.
Finally, we also explore other features/indicators associated with a significant price change around the target announcement date.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
# for dirname, _, filenames in os.walk('/kaggle/input'):
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session



# Data preprocessing

Assumptions:
* I use the Close price as the price of the day
* SPAC information is gathered from here: https://www.spactrax.com/companies.html

In [None]:
import time
from datetime import datetime
import plotly.express as px  # plotly_express is the legacy standalone package
from plotly.offline import init_notebook_mode
from collections import defaultdict
import seaborn as sns
from matplotlib.pyplot import figure
init_notebook_mode(connected=True)

def plot_stock_data(data,title):
    '''function for plotting stock data'''
    plot = px.line(data, 
                        x="Date", 
                        y=["Close"], 
                        hover_name="Date",
                        line_shape="linear",
                        title=title) 
    return plot

In [None]:
# get csv encoding type
import chardet
spac_csv = '../input/stock-spac/SPAC Spreadsheet - Sort filter and export the SPACTRAX list.csv'
with open(spac_csv, 'rb') as f:
    rawdata = f.read()
result = chardet.detect(rawdata)
charenc = result['encoding']
print(charenc)

# load in SPAC data
# dateparse = lambda c: pd.to_datetime(c, format='%m/%d/%Y', errors='coerce')
dateparse = lambda c: pd.to_datetime(c, format='%Y-%m-%d', errors='coerce')

# parse the date columns after loading (read_csv's date_parser argument is deprecated in recent pandas);
# values that do not match the format become NaT
spac = pd.read_csv(spac_csv, encoding=charenc)
for col in ['Initial S-1 Date', 'IPO Date', 'Definitive Agreement']:
    spac[col] = dateparse(spac[col])


In [None]:
spac.head()

In [None]:
print("number of all stocks in spacs info csv file  : %d " % len(spac))
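spac_target_announced = spac[spac['Stage'].isin(['3. Target Announced', '4. Deal Approved', '5. Merger Complete']) ]
# assign the result instead of using inplace=True on a slice (avoids SettingWithCopyWarning)
spac_target_announced = spac_target_announced.dropna(subset=['Definitive Agreement'])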
print("number of all stocks where target is announced: %d " % len(spac_target_announced)) 
sym_date = spac_target_announced[["Shares Symbol", "Definitive Agreement", "IPO Date", "Initial Size (in millions)", "Tags", "Sponsors", "Underwriters"]]

In [None]:
#show some data
sym_date = sym_date.reset_index()
sym_date.head()

# Let's see if spac prices increase after 'Definitive Agreement date'

In [None]:
#get prices_diffs, prices_pmaxs

found_count = 0
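prices_diffs_avg= [] # 7-day average after minus 7-day average before the acquisition target is announced ('Definitive Agreement' date)
prices_diffs_max= [] # 7-day maximum after minus 7-day average before the acquisition target is announced ('Definitive Agreement' date)
prices_pmaxs= [] # maximum Close price of each stock up to the 'Definitive Agreement' date
prices_premn= [] # mean Close over the 7 days before the date
prices_prepreratio1= [] # mean Close of days 7-1 before the date / mean Close of days 14-7 before
prices_prepreratio2= [] # last Close before the date / mean Close of days 7-1 before
prices_prepreratio3= [] # last Close before the date / mean Close of days 31-1 before
volume_prepreratio1= [] # same as prices_prepreratio1 but for Volume, capped at 4.0
volume_prepreratio2= [] # same as prices_prepreratio2 but for Volume, capped at 4.0
volume_prepreratio3= [] # not filled in the loop below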

for i in range(len(sym_date)):
    symbol = sym_date['Shares Symbol'][i]
    date = sym_date['Definitive Agreement'][i]
    path = "../input/stock-spac/spac_dailyprice/%s.csv" % (symbol)
    if os.path.exists(path):
        # pd.datetime was removed from pandas, so parse the YYYY-MM-DD index after loading
        stock = pd.read_csv(path, index_col='Date')
        stock.index = pd.to_datetime(stock.index, format='%Y-%m-%d')
        # calc 7 day stock prices right BEFORE Definitive Agreement date
        pre_price = stock[date-pd.to_timedelta(7, unit='d') : date-pd.to_timedelta(1, unit='d')]['Close'] 
        # calc 7 day stock prices right AFTER Definitive Agreement date
        aft_price = stock[date : date+pd.to_timedelta(7, unit='d')]['Close'] 
        if len(pre_price) < 2:
            print("possible error: not enough pre_price data, merged right after ipo")
            print(symbol)
            print(date)
            print(i)
            sym_date = sym_date.drop([i])
        elif len(aft_price) < 2:
            print("possible error: not enough aft_price data, just announced target")
            print(symbol)
            print(date)
            print(i)
            sym_date = sym_date.drop([i])
        else:
            # 7-day average after minus 7-day average before
            avg_price_diff = np.mean(aft_price) - np.mean(pre_price) 
            prices_diffs_avg.append(avg_price_diff)
            assert not pd.isna(avg_price_diff), "possible error: avg_price_diff is nan. {} {}".format(symbol, date)
            # 7-day max after minus 7-day average before
            max_price_diff = np.max(aft_price) - np.mean(pre_price) 
            prices_diffs_max.append(max_price_diff)
            assert not pd.isna(max_price_diff), "possible error: max_price_diff is nan. {} {}".format(symbol, date)
            # price ratios: last week vs. the week before, last close vs. last week, last close vs. last month
            tmp = stock[date-pd.to_timedelta(14, unit='d') : date-pd.to_timedelta(7, unit='d')]['Close'] 
            prices_prepreratio1.append(np.mean(pre_price)/np.mean(tmp) )
            prices_prepreratio2.append(pre_price.iloc[-1]/np.mean(pre_price) )
            tmp = stock[date-pd.to_timedelta(31, unit='d') : date-pd.to_timedelta(1, unit='d')]['Close'] 
            prices_prepreratio3.append(pre_price.iloc[-1]/np.mean(tmp) )
            # volume ratios, capped at 4.0
            pre_vol = stock[date-pd.to_timedelta(7, unit='d') : date-pd.to_timedelta(1, unit='d')]['Volume'] 
            tmp = stock[date-pd.to_timedelta(14, unit='d') : date-pd.to_timedelta(7, unit='d')]['Volume'] 
            if np.mean(tmp) > 0:
                volume_prepreratio1.append(min(np.mean(pre_vol)/np.mean(tmp), 4.0 ))
            else:
                volume_prepreratio1.append(0)
            if np.mean(pre_vol) > 0:
                volume_prepreratio2.append(min(pre_vol.iloc[-1]/np.mean(pre_vol), 4.0 ))
            else:
                # the fallback must go to volume_prepreratio2, otherwise the list lengths diverge
                volume_prepreratio2.append(0)
            #
            prices_premn.append(np.mean(pre_price))
            found_count += 1
            prices_pmaxs.append(stock[:date]['Close'].max())
#           if avg_price_diff < 1:
#               print(pre_price)
#               print(aft_price)
#               print(symbol)
    else:
        sym_date = sym_date.drop([i])
print(found_count)
print(len(prices_diffs_avg))
print(len(prices_diffs_max))
print(len(sym_date))
prices_diffs_avg = np.array(prices_diffs_avg)  
prices_diffs_max = np.array(prices_diffs_max)  
prices_pmaxs = np.array(prices_pmaxs) 
prices_premn = np.array(prices_premn)
prices_prepreratio1 = np.array(prices_prepreratio1)
prices_prepreratio2 = np.array(prices_prepreratio2)
prices_prepreratio3 = np.array(prices_prepreratio3)
volume_prepreratio1 = np.array(volume_prepreratio1)
volume_prepreratio2 = np.array(volume_prepreratio2)
volume_prepreratio3 = np.array(volume_prepreratio3)

sym_date = sym_date.reset_index() 
sym_date["Price diff"] = prices_diffs_max
sym_date["Price premn"] = prices_premn
sym_date["Price prepriceratio7days"] = prices_prepreratio1 # average of days 7-1 before the agreement date divided by the average of days 14-7 before
sym_date["Price 7dayaverage"] = prices_prepreratio2        # last close before the agreement date divided by the average of days 7-1 before
sym_date["Volume prepriceratio7days"] = volume_prepreratio1
sym_date["Volume 7dayaverage"] = volume_prepreratio2

Preprocessing is complete!

But before we see whether we can earn some money, let's define some notions first.

1. "Definitive Agreement date" is the date when the final target is announced.
2. To measure how much money we earn, I introduce two evaluation metrics (prices_diffs_max and prices_diffs_avg). Since prices are volatile around the Definitive Agreement date, we average the Close price over the 7 calendar days (about 5 consecutive trading days) before and after it. Basically, we assume that we buy at the 5-day average before the Definitive Agreement date and sell with one of two techniques.

prices_diffs_max: sell at the maximum of the 5 trading days after

prices_diffs_avg: sell at the average of the 5 trading days after
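
The two metrics can be sketched as a small helper. This is a minimal illustration, not the notebook's exact loop: the function name `event_window_diffs`, its parameters, and the price series are made up for the example, and it assumes a Close series with a sorted `DatetimeIndex`.

```python
import pandas as pd

def event_window_diffs(close, event_date, window_days=7):
    """Return (avg_diff, max_diff) of Close prices around event_date.

    close: pd.Series of Close prices with a sorted DatetimeIndex.
    window_days: calendar days on each side (7 calendar days is about 5 trading days).
    Returns None when either side has fewer than 2 prices.
    """
    event_date = pd.Timestamp(event_date)
    window = pd.Timedelta(days=window_days)
    one_day = pd.Timedelta(days=1)
    before = close[event_date - window : event_date - one_day]
    after = close[event_date : event_date + window]
    if len(before) < 2 or len(after) < 2:
        return None
    buy_price = before.mean()  # assumed buy price: average before the event
    return after.mean() - buy_price, after.max() - buy_price

# Synthetic prices (made up): flat at 10.0, then a jump starting on the event date.
dates = pd.bdate_range("2021-01-04", periods=15)
prices = pd.Series([10.0] * 7 + [11.0, 12.0, 13.0] + [12.0] * 5, index=dates)
avg_diff, max_diff = event_window_diffs(prices, dates[7])
print(avg_diff, max_diff)
```

Here `avg_diff` plays the role of one entry of prices_diffs_avg and `max_diff` one entry of prices_diffs_max.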


In [None]:

import matplotlib.pyplot as plt

n, bins, patches = plt.hist(prices_diffs_max, 24, range=[-3, 20])
plt.title("Histogram: Price difference around the 'Definitive Agreement' date (x axis: difference in dollars, y axis: count)")
plt.xlabel('Price')
plt.ylabel('Counts')
plt.show()

n, bins, patches = plt.hist(prices_pmaxs)
plt.title("Histogram: Maximum price up to the 'Definitive Agreement' date")
plt.xlabel('Price')
plt.ylabel('Counts')
plt.show()

print("prices_premn - mean price before Definitive Agreement Date:")
print(np.mean(prices_premn))
print("prices_diffs_avg - mean earning per share for buying SPACs before Definitive Agreement Date (and selling at the 7-day average after):")
print(np.mean(prices_diffs_avg))
print("prices_diffs_max - mean earning per share for buying SPACs before Definitive Agreement Date (and selling at the 7-day maximum after):")
print(np.mean(prices_diffs_max))
# print(prices_diffs_max)
print("Number of stocks that end up lower: {} out of {}".format(np.sum(prices_diffs_avg<0), len(prices_diffs_avg)))

From the above, we can see that the Close price increases after the "Definitive Agreement Date", by about \\$2 per \\$10 share within a week. That is a decent profit for such a short time. However, the per-stock difference ranges from -2.5 to 17.5, so even with a perfect model that predicts this date, we could still lose money on a particular stock. If we could select which stocks to invest in, we could earn more.

# How many sudden increases exist besides 'Definitive Agreement'?

Before checking what affects the diff value around the 'Definitive Agreement' date, let's see whether there are other potential buying times, i.e. moments when the SPAC price suddenly increases.
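
The detection rule can be sketched on a single price series. This is a simplified illustration under assumptions: the function name `find_sudden_increases`, its parameters, and the synthetic prices are hypothetical, and unlike the cell below it does not skip the first 30 days after the first quote.

```python
import pandas as pd

def find_sudden_increases(close, threshold=0.5, window_days=7):
    """Return [(date, price_diff)] where the mean Close of the following window
    exceeds the mean Close of the preceding window by more than threshold.
    A hit is skipped unless more than window_days have passed since the previous hit."""
    window = pd.Timedelta(days=window_days)
    one_day = pd.Timedelta(days=1)
    hits = []
    last_hit = None
    for day in close.index:
        before = close[day - window : day - one_day]
        after = close[day : day + window]
        if before.empty or after.empty:
            continue
        price_diff = after.mean() - before.mean()
        far_enough = last_hit is None or day - last_hit > window
        if price_diff > threshold and far_enough:
            hits.append((day, price_diff))
            last_hit = day
    return hits

# Synthetic prices (made up): 10.0 for 15 trading days, then 11.0 for 15 more.
dates = pd.bdate_range("2021-01-04", periods=30)
prices = pd.Series([10.0] * 15 + [11.0] * 15, index=dates)
hits = find_sudden_increases(prices)
print(hits)
```

Because the following window looks ahead, the single step in this toy series is flagged a couple of trading days before the jump itself, which is also how the cell below behaves.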

In [None]:
potential_spac_symbol = []
potential_spac_pricediffs = []
potential_spac_diffedates = []
sudden_price_diff_threshold = 0.5

for i in range(len(sym_date)):
    symbol = sym_date['Shares Symbol'][i]
    date = sym_date['Definitive Agreement'][i]
    path = "../input/stock-spac/spac_dailyprice/%s.csv" % (symbol)
    if os.path.exists(path):
        # pd.datetime was removed from pandas, so parse the YYYY-MM-DD index after loading
        stock = pd.read_csv(path, index_col='Date')
        stock.index = pd.to_datetime(stock.index, format='%Y-%m-%d')
        first_date = stock.index.values[0]
        if len(stock[:date]) > 45:
            last_times2 = stock[first_date+pd.to_timedelta(30, unit='d'): date].index.values
            tmpa = []
            tmpb = []
            last_time = last_times2[0]
            for timee in last_times2:
                before = stock[timee-pd.to_timedelta(7, unit='d') : timee-pd.to_timedelta(1, unit='d')]['Close'] 
                after = stock[timee : timee + pd.to_timedelta(7, unit='d')]['Close']
                price_diff = np.mean(after) - np.mean(before)
                if price_diff > sudden_price_diff_threshold and timee - last_time > pd.to_timedelta(7, unit='d'):
                    tmpa.append(price_diff)
                    tmpb.append(date - timee)
                    last_time = timee 
            potential_spac_symbol.append(symbol)
            potential_spac_pricediffs.append(tmpa)
            potential_spac_diffedates.append(tmpb)

counta = 0
for tmp in potential_spac_pricediffs:
    counta += len(tmp)
countb = 0
for tmp in potential_spac_diffedates:
    for tmpp in tmp:
        if tmpp > pd.to_timedelta(7, unit='d'):
            countb+=1
            
# print(count)
# print(np.sum(prices_diffs_avg  > 1))
print("Total number of stocks considered {}".format(len(potential_spac_symbol)))
print("Defined sudden increase limit is > {} between 7-day averages".format(sudden_price_diff_threshold))
print("potential number of sudden increase(including finaldays) {}".format(counta))
print("potential number of sudden increase(excluding finaldays) {}".format(countb))
            

# What else would affect the diff value around the 'Definitive Agreement' date? 

Would it be the number of days between the IPO date and the 'Definitive Agreement' date? Would it be the category of the target (Technology, Ecommerce, Software, Healthcare, Financial Services, Artificial Intelligence)?

## Dates

Most acquisitions must complete within 3 years. Let's first see whether this time constraint affects the stock price movement around the 'Definitive Agreement Date'.

In [None]:
figure(num=None, figsize=(20, 6), dpi=80, facecolor='w', edgecolor='k')
plt.title("The dates when SPACs IPOed")
n, bins, patches = plt.hist(sym_date["IPO Date"], bins=26)
plt.xlabel('Dates')
plt.ylabel('Counts')
plt.show()

figure(num=None, figsize=(20, 6), dpi=80, facecolor='w', edgecolor='k')
plt.title("The date where SPACs announced their final target (Definitive Agreement Date)")
n, bins, patches = plt.hist(sym_date["Definitive Agreement"], bins=26)
plt.xlabel('Dates')
plt.ylabel('Counts')
plt.show()

figure(num=None, figsize=(20, 6), dpi=80, facecolor='w', edgecolor='k')
plt.title("The month of year when Definitive Agreement is announced")
sym_date["Definitive Agreement"].groupby(sym_date["Definitive Agreement"].dt.month).count().plot(kind="bar")
plt.xlabel('Month')
plt.ylabel('Counts')
plt.show()

figure(num=None, figsize=(20, 6), dpi=80, facecolor='w', edgecolor='k')
released_info_after_ipo_days = ((sym_date["Definitive Agreement"] - sym_date["IPO Date"])/ np.timedelta64(1, 'D')).astype(int)
print("Average number of days between IPO Date and Definitive Agreement date: %d" % np.mean(released_info_after_ipo_days))
n, bins, patches = plt.hist(released_info_after_ipo_days, bins=26)
sym_date["Days gapped"] = released_info_after_ipo_days
plt.title("Gap between IPO Date and Definitive Agreement date")
plt.xlabel('Days')
plt.ylabel('Counts')
plt.show()


We can see that SPACs have become more popular since 2020, and the majority of SPACs announce their final target within a year. So, in the worst case scenario, if we purchase a SPAC on its IPO date, we gain about \\$2 per share after 300 days on average. But we definitely want to shorten this holding time to reduce risk.
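
To put the holding time in perspective, a return over an arbitrary number of days can be converted to a compound annual rate. The sketch below only does this arithmetic on the rough figures from the text (\\$2 gained on a \\$10 share over about 300 days); the helper name `annualized_return` is made up, and fees, taxes, and risk are ignored.

```python
def annualized_return(gain, cost, holding_days):
    """Convert a total return earned over holding_days into a compound annual rate."""
    total_return = gain / cost
    return (1.0 + total_return) ** (365.0 / holding_days) - 1.0

# Rough figures from the text: about $2 gained on a $10 share, held ~300 days from IPO.
held_from_ipo = annualized_return(2.0, 10.0, holding_days=300)
print("simple return: {:.1%}".format(2.0 / 10.0))
print("annualized when held ~300 days: {:.1%}".format(held_from_ipo))
```

The same dollar gain earned over a shorter holding period corresponds to a higher annual rate, which is why shortening the holding time matters.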

Would the number of days between the IPO date and the Definitive Agreement date affect the price change (our earning strategy)?


In [None]:
days_money_sum = np.array([0.0]*(sym_date["Days gapped"].max()//91+1))
days_money_cnt = np.array([0]*(sym_date["Days gapped"].max()//91+1))
# print(len(days_money_cnt))
for index, row in sym_date.iterrows():
    pos = row["Days gapped"] // 91
    days_money_sum[pos] += row["Price diff"]
    days_money_cnt[pos] += 1
    
tmp = days_money_sum/days_money_cnt
idx = 1
for price, count in zip(tmp, days_money_cnt):
    print("mean earning per stock if the acquisition target is announced in quarter {} after IPO (out of {} stocks): {}".format(idx, count, price))
    idx += 1

sns.jointplot(x="Days gapped", y="Price diff", data=sym_date, kind='hex', 
              gridsize=40)
plt.show()

It seems that the time between the IPO and the target announcement has only a minor influence on the stock price difference.
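
The per-quarter averages can also be computed with a pandas `groupby` instead of manual sum and count arrays. A minimal sketch, assuming a frame with the "Days gapped" and "Price diff" columns used above; the helper name `mean_diff_by_quarter` and the toy values are made up. Unlike the loop, quarters without any stock are simply absent from the result.

```python
import pandas as pd

def mean_diff_by_quarter(df, days_col="Days gapped", diff_col="Price diff"):
    """Mean price diff and number of stocks per 91-day quarter after IPO (1-based)."""
    quarter = (df[days_col] // 91 + 1).rename("quarter")
    return df.groupby(quarter)[diff_col].agg(["mean", "count"])

# Toy data (made up) standing in for sym_date.
toy = pd.DataFrame({
    "Days gapped": [30, 80, 100, 200, 400],
    "Price diff": [1.0, 3.0, 2.0, 0.5, -1.0],
})
by_quarter = mean_diff_by_quarter(toy)
print(by_quarter)
```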

## Price and Volume

In [None]:

sns.jointplot(x="Price premn", y="Price diff", data=sym_date, kind='hex', 
              gridsize=40)
plt.show()
sns.jointplot(x="Price prepriceratio7days", y="Price diff", data=sym_date, kind='hex', 
              gridsize=40)
plt.show()
sns.jointplot(x="Price 7dayaverage", y="Price diff", data=sym_date, kind='hex', 
              gridsize=40)
plt.show()
sns.jointplot(x="Volume prepriceratio7days", y="Price diff", data=sym_date, kind='hex', 
              gridsize=40)
plt.show()
sns.jointplot(x="Volume 7dayaverage", y="Price diff", data=sym_date, kind='hex', 
              gridsize=40)
plt.show()

## Initial IPO Size

Let's see if the IPO size is more informative:

In [None]:
sns.jointplot(x="Initial Size (in millions)", y="Price diff", data=sym_date, kind='hex', 
              gridsize=40)

fig, ax = plt.subplots(figsize=(15, 15), dpi=80, facecolor='w', edgecolor='k')
tmp_df = sym_date[["Initial Size (in millions)", "Price diff"]]
# draw on the axes created above; otherwise pandas opens a second figure and this one stays empty
tmp_df.plot(x='Initial Size (in millions)', y='Price diff', kind="scatter", marker='.', ax=ax)
plt.show()

## Category
Let's see if the category is more informative:

In [None]:
cate_money_sum = defaultdict(float)
cate_money_cnt = defaultdict(int)
 
for index, row in sym_date.iterrows():
    if not pd.isna(row["Tags"]):
        categories_list = row["Tags"].split(',') 
        categories_list = [i.strip() for i in categories_list]
        for category in categories_list:
            cate_money_sum[category] += row["Price diff"]
            cate_money_cnt[category] += 1

results = []
for k in cate_money_sum.keys():
    avg = cate_money_sum[k] / cate_money_cnt[k]
    results.append((k, avg, cate_money_cnt[k]))
    
results.sort(key = lambda x: x[1], reverse=True) 
print("mean earning per stock if the company specializes in the following category, with its number of support cases:")

for k in results:
    print("%s\t%.3f\t%s"%(k[0],k[1],k[2]))

From the above, we can see that:

Tags like Life Sciences, Sustainability, Artificial Intelligence, Telecommunications, Media, and Cannabis are strong indications that the market believes in a successful acquisition.

Tags like Hospitality and Fashion make for weak acquisitions for now. This somewhat matches the COVID-19 situation: people stay home and travel less, so the Hospitality business is not booming right now, and there is no hurry to buy those SPACs.
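
The same per-tag table can be built with pandas by splitting the comma-separated "Tags" column and exploding it into one row per tag. A minimal sketch under assumptions: the helper name `mean_diff_by_tag` and the toy frame are made up, and the column names follow the ones used above.

```python
import pandas as pd

def mean_diff_by_tag(df, tag_col="Tags", diff_col="Price diff"):
    """Mean price diff and support count per tag, sorted by mean (descending)."""
    tags = df[[tag_col, diff_col]].dropna(subset=[tag_col])
    tags = tags.assign(**{tag_col: tags[tag_col].str.split(",")})
    tags = tags.explode(tag_col).reset_index(drop=True)  # one row per (stock, tag)
    tags[tag_col] = tags[tag_col].str.strip()
    return (tags.groupby(tag_col)[diff_col]
                .agg(["mean", "count"])
                .sort_values("mean", ascending=False))

# Toy data (made up) standing in for sym_date.
toy = pd.DataFrame({
    "Tags": ["Technology, Healthcare", "Technology", None, "Hospitality"],
    "Price diff": [2.0, 4.0, 1.0, -1.0],
})
by_tag = mean_diff_by_tag(toy)
print(by_tag)
```

Keeping the count next to the mean makes it easier to spot tags whose average rests on only one or two stocks.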

## Sponsors and underwriters

Similarly, let's see if we can find some strong sponsors and underwriters.
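
Sponsor names in this dataset can differ only by a trailing sequence number, for example "Altimeter Growth Holdings" and "Altimeter Growth Holdings 2". One possible way to group them, sketched here as an assumption rather than the method used in this notebook, is to strip a trailing number or Roman numeral with a regular expression. The helper `normalize_sponsor` and the name "Example Sponsor III" are made up for illustration.

```python
import re

# Trailing sequence marker: whitespace followed by digits or a Roman numeral at the end.
_TRAILING_SEQUENCE = re.compile(r"\s+(?:\d+|[IVX]+)$")

def normalize_sponsor(name):
    """Strip surrounding whitespace and a trailing sequence number (e.g. ' 2' or ' II')."""
    return _TRAILING_SEQUENCE.sub("", name.strip())

names = ["Altimeter Growth Holdings", " Altimeter Growth Holdings 2", "Example Sponsor III"]
normalized = [normalize_sponsor(n) for n in names]
print(normalized)
```

This rule would also strip a legitimate final word made only of the letters I, V, and X, so the resulting groups are worth checking by eye.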

In [None]:
spns_money_sum = defaultdict(float)
spns_money_cnt = defaultdict(int)
 
for index, row in sym_date.iterrows():
    if not pd.isna(row["Sponsors"]):
        categories_list = row["Sponsors"].split(',') 
        categories_list = [i.strip()[:10] for i in categories_list] 
        # the Sponsors list is messy: some strings are unique but nearly identical, like "Altimeter Growth Holdings" and "Altimeter Growth Holdings 2"
        # here we assume that the first 10 characters identify a sponsor; a regular expression might work better.
        for category in categories_list:
            spns_money_sum[category] += row["Price diff"]
            spns_money_cnt[category] += 1

results = []
for k in spns_money_sum.keys():
    avg = spns_money_sum[k] / spns_money_cnt[k]
    results.append((k, avg, spns_money_cnt[k]))
    
results.sort(key = lambda x: x[1], reverse=True) 
print("mean earning per stock if the company has the following sponsor (first 10 characters), with its number of support cases:")

cnt = 0
for k in results:
    print("%s\t%.3f\t%d"%(k[0],k[1],k[2]))
    cnt += k[2]
print("total number of support cases: %d" % cnt)

In [None]:
undr_money = defaultdict(list)
 
for index, row in sym_date.iterrows():
    if not pd.isna(row["Underwriters"]):
        categories_list = row["Underwriters"].split(',') 
        categories_list = [i.strip() for i in categories_list]
        for category in categories_list:
            undr_money[category].append(row["Price diff"])

results = []
for k in undr_money.keys():
    avg = np.mean(undr_money[k])
    results.append((k, avg, len(undr_money[k])))
    
results.sort(key = lambda x: x[1], reverse=True) 
print("mean earning per stock if the company has the following underwriter, with its number of support cases:")

cnt = 0
for k in results:
    print("%s\t%.3f\t%d"%(k[0],k[1],k[2]))
    cnt += k[2]
print("total number of support cases: %d" % cnt)

While "Robert W. Baird & Co" is the best among all, it has only 1 support case as of 02/2021. But let me take a pause here and buy some SPACs that are underwritten by "Barclays Capital".
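
Since a mean over a single support case says little, the ranking can be restricted to groups with a minimum number of cases. A minimal sketch: the helper `rank_with_min_support`, the `min_support` threshold, and the toy underwriter names and values are all made up for illustration; the input has the same shape as `undr_money` (name mapped to a list of price diffs).

```python
import numpy as np

def rank_with_min_support(values_by_group, min_support=5):
    """Return [(name, mean, count)] sorted by mean, keeping groups with at least min_support values."""
    ranked = [
        (name, float(np.mean(values)), len(values))
        for name, values in values_by_group.items()
        if len(values) >= min_support
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Toy data (made up) with the same shape as undr_money.
toy = {
    "Underwriter A": [9.0],
    "Underwriter B": [2.0, 3.0, 4.0],
    "Underwriter C": [1.0, 1.0, 1.0, 1.0],
}
ranked = rank_with_min_support(toy, min_support=3)
print(ranked)
```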

In [None]:
figure(num=None, figsize=(22, 5), dpi=80, facecolor='w', edgecolor='k')

# underwriters to compare in the violin plot
selected_underwriters = [
    "Barclays Capital",
    "EarlyBirdCapital",
    "UBS Securities",
    "Jefferies",
    "Morgan Stanley & Co",
    "Deutsche Bank Securities",
    "Goldman Sachs & Co",
    "I-Bankers Securities",
    "J.P. Morgan Securities",
    "Credit Suisse Securities (USA)",
]
tmp_df = [[name, k] for name in selected_underwriters for k in undr_money[name]]


tmp_df = pd.DataFrame(tmp_df, 
                   columns=['category', 'price diff'])



ax = sns.violinplot(x="category", y="price diff",data=tmp_df)


plt.show()

print(undr_money["Barclays Capital"])
print(undr_money["Goldman Sachs & Co"])

# Conclusion and Future work:

* If we guess the "Definitive Agreement Date" right and buy before the SPAC releases the information, the average gain per share is about \\$1.628 on a price of about \\$10 (max is 2.672). That is a decent profit for only a week. However, this number ranges from -2.03 to 6.84, so even with a perfect model that predicts this date, we could still end up losing money. How can we solve this?
* Comparing version 16 with version 17, the dataset was updated from Feb 2021 to the end of March 2021. Because the general market (NASDAQ and S&P 500) went down in Feb and March 2021, the average SPAC gain also went down from 1.6 to 1.2.
* Someone on Reddit claimed that we gain more if we buy before the "Definitive Agreement Date" and sell after the "Merger Completion Date". This comes with a higher opportunity (time) cost, but how much do we gain? See this [script](https://www.kaggle.com/overfit/spac-explore-merger) for more detail.
* What are some good indicators to predict the day a SPAC will announce: 
[volume](https://www.reddit.com/r/SPACs/comments/jdib8u/indicators_to_look_for_spac_that_will_announce/), 
[rumor](https://api.stocktwits.com/developers/docs/api), 
[SEC filings](https://www.semanticscholar.org/paper/Predicting-Merger-Targets-and-Acquirers-from-Text-Routledge-Sacchetto/9bc8fb17a0708b2d9d4597c499bbdb385b9f8c56)
* It seems that some companies announce merger plans (Form 425 - Prospectuses and communications, business combinations) multiple times, and sometimes the price effect is larger than that of the final Definitive Agreement Date. Maybe we could buy at those times as well to increase profit. See this [script](https://www.kaggle.com/overfit/spac-price-filling-explore) for more detail.

POST EDIT

For volume, we used LSTMs to predict the buy time: https://www.kaggle.com/overfit/spac-definiteagreementdate-prediction