# TASK:
Okay, had some time to pull data. I've attached a couple of reports on ARQT just so that you can get an idea of how investors are looking at/evaluating biotechs generally, but the most value that you'll be able to create for us will be on the quantitative side, so that is the best place to focus for you to have the highest impact.

So, here's something to take a crack at. It'd be really helpful to try to quantify "what moves the XBI?" So, I've included returns of XBI and other equity indices, bond yields, fund flows (you can calculate these by using the XBI Market Cap tab and calculating week over week changes in total market cap), and a list of historical M&A events. We would love to get an idea of how we can decompose XBI returns to figure out the most predictive variables.
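A minimal sketch of the fund flow calculation described above. The function name `weekly_fund_flows` and the synthetic numbers are hypothetical. The price adjustment is an assumption that is not part of the task: a raw change in market cap also contains the price move, so the sketch additionally removes the part explained by the price return.

```python
import pandas as pd

def weekly_fund_flows(mkt_cap: pd.Series, price: pd.Series) -> pd.DataFrame:
    """Week-over-week market cap change and a price-adjusted flow estimate."""
    out = pd.DataFrame({"mkt_cap": mkt_cap, "price": price}).sort_index()
    out["cap_change"] = out["mkt_cap"].diff()            # raw change, as the task describes
    out["cap_change_pct"] = out["mkt_cap"].pct_change()
    price_return = out["price"].pct_change()
    # Assumption (not in the task): last week's cap grown at the price return is
    # what the cap would be with no share creations or redemptions.
    out["implied_flow"] = out["mkt_cap"] - out["mkt_cap"].shift(1) * (1 + price_return)
    return out

# Synthetic weekly data, for illustration only
weeks = pd.date_range("2024-01-05", periods=4, freq="W-FRI")
mkt_cap = pd.Series([100.0, 110.0, 99.0, 120.0], index=weeks)
price = pd.Series([50.0, 55.0, 49.5, 54.45], index=weeks)
flows = weekly_fund_flows(mkt_cap, price)
print(flows[["cap_change", "implied_flow"]])
```

In the synthetic data the first two market cap moves are fully explained by price, so only the last week shows a non-zero implied flow.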

Some examples of valuable output would be things like:

If there is an acquisition of more than X billion, XBI will generate outperformance of Y% over the next Z days or weeks
XBI is most sensitive to X factor historically (maybe bond yields or microcap returns or frequency of M&A), and Y tends to be a leading indicator of this outperformance
If yields rise more than X% in a Y-month period, future XBI returns underperform by Z%
Mega M&A (>$10bn) moves XBI X% more than smaller M&A events and produces more/less predictable XBI performance
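The first and last examples above are event-study questions. One possible sketch, assuming a price series indexed by date and a deal table with hypothetical columns `date` and `deal_value_b`: take the first price on or after each announcement and measure the return over the next `horizon` observations, then compare it with the average forward return over all dates. The data below is synthetic.

```python
import numpy as np
import pandas as pd

def forward_returns_after_events(prices: pd.Series, event_dates, horizon: int) -> pd.Series:
    """Return over the next `horizon` observations after each event date."""
    prices = prices.sort_index()
    forward = prices.shift(-horizon) / prices - 1
    # Position of the first price observation on or after each event date
    pos = np.searchsorted(prices.index.values, pd.DatetimeIndex(event_dates).values, side="left")
    pos = pos[pos < len(prices)]  # drop events dated after the last price
    return forward.iloc[pos].dropna()

# Synthetic weekly prices and deals, for illustration only
weeks = pd.date_range("2024-01-05", periods=8, freq="W-FRI")
prices = pd.Series([100.0, 100.0, 100.0, 110.0, 110.0, 110.0, 110.0, 110.0], index=weeks)
deals = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-17", "2024-02-05"]),
    "deal_value_b": [12.0, 2.0],
})

min_value_b = 10.0  # the "X billion" threshold
big = deals.loc[deals["deal_value_b"] > min_value_b, "date"]
after_big = forward_returns_after_events(prices, big, horizon=1)
baseline = (prices.shift(-1) / prices - 1).mean()
print("Excess forward return after big deals:", after_big.mean() - baseline)
```

Because the return starts from the first close on or after the announcement, it measures drift after the event rather than the announcement-day move itself. Varying `min_value_b` and `horizon` maps directly onto the X and Z in the examples.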

It would also be helpful to see if M&A events drive positive fund flows. If so, maybe there's a predictable link between M&A -> positive fund flows -> rising XBI price
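One way to probe the M&A -> fund flows -> XBI chain is a lead-lag correlation: correlate this week's value of the follower with the leader's value k weeks earlier. This is only a sketch. `lead_lag_correlations` is a hypothetical helper, and the data is random with a two-week lag built in so the expected peak is known.

```python
import numpy as np
import pandas as pd

def lead_lag_correlations(leader: pd.Series, follower: pd.Series, max_lag: int) -> pd.Series:
    """Correlation of follower[t] with leader[t - k] for k = 0..max_lag."""
    return pd.Series({k: leader.shift(k).corr(follower) for k in range(max_lag + 1)})

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
weeks = pd.date_range("2023-01-06", periods=60, freq="W-FRI")
deal_value = pd.Series(rng.normal(size=60), index=weeks)
fund_flow = deal_value.shift(2)  # flows constructed to trail deals by two weeks

corrs = lead_lag_correlations(deal_value, fund_flow, max_lag=4)
print(corrs.round(2))
print("Lag with the strongest link:", corrs.idxmax())
```

Running the same helper on (deal value, fund flows) and then on (fund flows, XBI returns) would test each link of the chain separately.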

Also, consider using the 10 year or 30 year yields and subtracting the 2 year yield to determine if the yield curve is inverted or not. Maybe XBI returns are different or correlations change if the yield curve is inverted?
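The yield curve check can be sketched as follows, assuming one table holding the 10 year yield, the 2 year yield and the XBI return for the same dates. The column names `US10Y`, `US2Y` and `XBI_Return` and the numbers are hypothetical. The curve is treated as inverted when the 10 year yield minus the 2 year yield is negative.

```python
import numpy as np
import pandas as pd

def returns_by_curve_regime(df: pd.DataFrame, long_col: str = "US10Y",
                            short_col: str = "US2Y", ret_col: str = "XBI_Return") -> pd.DataFrame:
    """Mean return and observation count for normal vs inverted curve periods."""
    out = df.copy()
    out["spread"] = out[long_col] - out[short_col]
    out["regime"] = np.where(out["spread"] < 0, "inverted", "normal")
    return out.groupby("regime")[ret_col].agg(["mean", "count"])

# Synthetic data, for illustration only
data = pd.DataFrame({
    "US10Y": [4.0, 4.1, 3.8, 3.7],
    "US2Y": [3.5, 3.6, 4.2, 4.3],
    "XBI_Return": [0.02, 0.04, -0.01, -0.03],
})
summary = returns_by_curve_regime(data)
print(summary)
```

Passing a 30 year yield column as `long_col` gives the 30 year minus 2 year version of the same split.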

Other factors that you could generate/use would be the political party that is in control, especially of the FTC. You could quantify each year on some metric of leniency to strictness on pursuing FTC action against pharma acquisitions, and then see if this has a correlation with M&A frequency and XBI returns. For example, I'd classify Lina Khan as being fairly strict. Have there been fewer acquisitions this year than normal? Maybe the # has been the same, but the $ value is lower? What do historical correlations tell us about expected XBI returns in an environment like that?
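A rough sketch of the FTC idea: count deals and sum deal value per year, attach a hand-assigned strictness score per year, and look at the correlations. Everything here is a placeholder. The column names `date` and `deal_value_b` are hypothetical, the deals are synthetic, and the strictness scores are made-up numbers that would need to be replaced with a real classification.

```python
import pandas as pd

def yearly_ma_summary(deals: pd.DataFrame, strictness: dict) -> pd.DataFrame:
    """Deal count and total deal value per year, with a strictness score attached."""
    year = deals["date"].dt.year.rename("year")
    yearly = deals.groupby(year)["deal_value_b"].agg(deal_count="count", total_value_b="sum")
    yearly["strictness"] = pd.Series(strictness)  # aligned on the year index
    return yearly

# Synthetic deals and placeholder scores, for illustration only
deals = pd.DataFrame({
    "date": pd.to_datetime(["2021-02-01", "2021-06-01", "2021-09-01",
                            "2022-03-01", "2022-10-01", "2023-05-01"]),
    "deal_value_b": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
placeholder_strictness = {2021: 1, 2022: 2, 2023: 3}  # NOT real scores

yearly = yearly_ma_summary(deals, placeholder_strictness)
print(yearly)
print(yearly.corr()["strictness"])
```

Keeping count and dollar value as separate columns makes it possible to see the case raised above, where the number of deals holds steady but the total value falls.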

Also, consider looking at XBI performance during historical election years. Is there typically less M&A during the 6-9 months preceding an election, and does this typically result in a weaker XBI?
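A sketch of a pre-election flag that could be used to split M&A counts or XBI returns. It assumes US presidential elections, held on the first Tuesday after the first Monday of November in years divisible by four. The helper names and the `months_before` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def election_day(year: int) -> pd.Timestamp:
    """First Tuesday after the first Monday of November (the Tuesday on Nov 2-8)."""
    nov2 = pd.Timestamp(year=year, month=11, day=2)
    return nov2 + pd.Timedelta(days=(1 - nov2.dayofweek) % 7)

def pre_election_flag(dates, months_before: int = 9) -> pd.Series:
    """True for dates within `months_before` months before a presidential election."""
    dates = pd.DatetimeIndex(dates)
    flag = np.zeros(len(dates), dtype=bool)
    for year in range(int(dates.year.min()), int(dates.year.max()) + 1):
        if year % 4 == 0:
            day = election_day(year)
            start = day - pd.DateOffset(months=months_before)
            flag |= (dates >= start) & (dates < day)
    return pd.Series(flag, index=dates)

dates = pd.to_datetime(["2023-06-16", "2024-01-19", "2024-06-14", "2024-11-08"])
flags = pre_election_flag(dates, months_before=9)
print(flags)
```

Grouping weekly deal counts or XBI returns by this flag gives the pre-election versus other-periods comparison; setting `months_before` to 6 or 9 covers the range suggested above.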

One factor that I'd love to include (but can't figure out yet) is if there have been impactful data readouts for the sector. I have anecdotally noticed that XBI tends to perform well if there has been a high $ value or % move catalyst in a SMID cap name recently. Maybe this attracts more investors to XBI, or maybe it causes shorts to cover a little bit? If you can think of some scalable way to identify past big data events, that'd be cool.
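One scalable proxy, offered as an assumption rather than a tested method: scan daily closes of individual biotech names and flag any single-day move beyond a threshold. This needs constituent-level price data, which is not in the workbook, and it cannot tell a data readout apart from other news such as a takeout or a financing. The tickers and prices below are made up.

```python
import pandas as pd

def find_large_moves(closes: pd.DataFrame, threshold: float = 0.30) -> pd.DataFrame:
    """List every (date, ticker) whose one-day return is at least `threshold` in size."""
    returns = closes.sort_index().pct_change()
    moves = (returns.rename_axis(index="date").reset_index()
             .melt(id_vars="date", var_name="ticker", value_name="daily_return"))
    big = moves[moves["daily_return"].abs() >= threshold]
    return big.sort_values("date").reset_index(drop=True)

# Synthetic daily closes for two hypothetical tickers
days = pd.date_range("2024-03-04", periods=4, freq="B")
closes = pd.DataFrame({
    "AAA": [10.0, 10.5, 16.8, 16.0],
    "BBB": [20.0, 19.0, 19.5, 9.75],
}, index=days)

big_moves = find_large_moves(closes, threshold=0.30)
print(big_moves)
```

Counting flagged moves per week, or weighting them by the market cap gained or lost, would turn the list into a factor that can sit next to the M&A and yield variables.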

Let me know if you have any questions. This should be a fairly open project; we want to see how much we can quantify to predict XBI performance. Feel free to add in any additional factors/analyses that you think could be helpful.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

pd.set_option('display.max_columns', None)


xls = pd.ExcelFile('Historical XBI Driver Data.xlsx')

# Load XBI data
df_xbi = pd.read_excel(xls, sheet_name="XBI", skiprows=6)

# Load IBB data
df_ibb = pd.read_excel(xls, sheet_name="IBB", skiprows=6)

# Load IWM data
df_iwm = pd.read_excel(xls, sheet_name="IWM", skiprows=6)

# Load IWC data
df_iwc = pd.read_excel(xls, sheet_name="IWC", skiprows=6)

# Load QQQ data
df_qqq = pd.read_excel(xls, sheet_name="QQQ", skiprows=6)

# Load XLV data
df_xlv = pd.read_excel(xls, sheet_name="XLV", skiprows=6)

# Load US2Y data
df_us2y = pd.read_excel(xls, sheet_name="US2Y", skiprows=5)

# Load US10Y data
df_us10y = pd.read_excel(xls, sheet_name="US10Y", skiprows=5)

# Load US30Y data
df_us30y = pd.read_excel(xls, sheet_name="US30Y", skiprows=5)

# Load XBI Market Cap data
df_mktcap_xbi = pd.read_excel(xls, sheet_name="XBI Market Cap", skiprows=6)

# Load All M&A data
df_all_ma = pd.read_excel(xls, sheet_name="All M&A (>$500M)", skiprows=6)

# Load Mega M&A data
df_mega_ma = pd.read_excel(xls, sheet_name="Mega M&A (>$10B)", skiprows=5)

# Load SMID M&A data
df_smid_ma = pd.read_excel(xls, sheet_name="SMID M&A ($1-10B)", skiprows=5)

# Rename columns for clarity (the XBI, IBB and IWM sheets share the same column layout)
price_volume_names = {
    'PX_LAST': 'Price_Last',
    'Change': 'Price_Change',
    '% Change': 'Price_Percent_Change',
    'PX_VOLUME': 'Volume',
    'Change.1': 'Volume_Change',
    '% Change.1': 'Volume_Percent_Change'
}
for df in (df_xbi, df_ibb, df_iwm):
    df.rename(columns=price_volume_names, inplace=True)


# Assuming the numerical dates in df_ibb and the other sheets are Excel serial dates (days since 30 December 1899)

# If needed, convert the date column in df_xbi to datetime for consistency
df_xbi['Date'] = pd.to_datetime(df_xbi['Date'])

# Convert the numerical dates to datetime
df_ibb['Date'] = pd.to_datetime(df_ibb['Date'], origin='1899-12-30', unit='D')
df_iwm['Date'] = pd.to_datetime(df_iwm['Date'], origin='1899-12-30', unit='D')
df_iwc['Date'] = pd.to_datetime(df_iwc['Date'], origin='1899-12-30', unit='D')
df_qqq['Date'] = pd.to_datetime(df_qqq['Date'], origin='1899-12-30', unit='D')
df_xlv['Date'] = pd.to_datetime(df_xlv['Date'], origin='1899-12-30', unit='D')
df_us2y['Date'] = pd.to_datetime(df_us2y['Date'], origin='1899-12-30', unit='D')
df_us10y['Date'] = pd.to_datetime(df_us10y['Date'], origin='1899-12-30', unit='D')
df_us30y['Date'] = pd.to_datetime(df_us30y['Date'], origin='1899-12-30', unit='D')
df_mktcap_xbi['Date'] = pd.to_datetime(df_mktcap_xbi['Date'], origin='1899-12-30', unit='D')

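# .copy() makes an independent frame, so the in-place rename below does not raise SettingWithCopyWarning
df_all_ma = df_all_ma.loc[:, ~df_all_ma.columns.str.contains('^Unnamed')].copy()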
df_all_ma.rename(columns={
    'Date Announced': 'Date_Announced',
    'Deal Value \n($B)': 'Deal_Value_Billion',
    'LTM sales\n($M)': 'LTM_Sales_Million',
    'Peak Sales** \nEst ($M)': 'Peak_Sales_Est_Million',
    'Deal value/ LTM sales': 'Deal_Value_LTM_Sales',
    'Deal value/ Peak sales': 'Deal_Value_Peak_Sales',
    'Takeout Forward P/E': 'Takeout_Forward_PE',
    '90 Day Premium': '90_Day_Premium',
    'Cash / Stock*': 'Cash_Stock',
    'Termination Fees': 'Termination_Fees',
    'Financial Advisor (Target)': 'Financial_Advisor_Target',
    'Financial Advisor (Acquirer)': 'Financial_Advisor_Acquirer',
    'PR on Target\'s Website': 'PR_Target_Website'
}, inplace=True)


# Remove unnecessary columns
columns_to_keep_mega = ['Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Deal Value \n($B)', 'LTM sales\n($M)', 'Peak Sales** \nEst ($M)', 'Deal value/ LTM sales', 'Deal value/ Peak sales', 'Takeout Forward P/E', 'Cash / Stock*', 'Termination Fees', 'Financial Advisor (Target)', 'Financial Advisor (Acquirer)', 'Unnamed: 18']
df_mega_ma = df_mega_ma[columns_to_keep_mega]

# Rename columns
df_mega_ma.columns = ['Date', 'Acquirer', 'Target', 'Product', 'Indication', 'Therapeutic_Area', 'Stage', 'Deal_Value_B', 'LTM_Sales_M', 'Peak_Sales_Est_M', 'Deal_Value_LTM_Sales_Ratio', 'Deal_Value_Peak_Sales_Ratio', 'Takeout_Forward_PE', 'Cash_Stock', 'Termination_Fees', 'Financial_Advisor_Target', 'Financial_Advisor_Acquirer', 'Is_Completed']

# Remove unnecessary columns
columns_to_keep_smid = ['Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Deal Value \n($B)', 'LTM sales\n($M)', 'Peak Sales** \nEst ($M)', 'Deal value/ LTM sales', 'Deal value/ Peak sales', 'Takeout Forward P/E', '90 Day Premium', 'Cash / Stock*', 'Termination Fees', 'Financial Advisor (Target)', 'Financial Advisor (Acquirer)', 'Unnamed: 19']
df_smid_ma = df_smid_ma[columns_to_keep_smid]

# Rename columns
df_smid_ma.columns = ['Date', 'Acquirer', 'Target', 'Product', 'Indication', 'Therapeutic_Area', 'Stage', 'Deal_Value_B', 'LTM_Sales_M', 'Peak_Sales_Est_M', 'Deal_Value_LTM_Sales_Ratio', 'Deal_Value_Peak_Sales_Ratio', 'Takeout_Forward_PE', 'Premium_90_Day', 'Cash_Stock', 'Termination_Fees', 'Financial_Advisor_Target', 'Financial_Advisor_Acquirer', 'Is_Completed']

def clean_year_dataset(df):
    # 1. Rename columns for clarity
    df.columns = ['Date', 'PX_LAST', 'Change', '% Change', 'PX_BID', 'Change_BID', '% Change_BID']

    # 2. Convert 'Date' to datetime format
    df['Date'] = pd.to_datetime(df['Date'])

    # 3. Convert numerical columns to float; entries that cannot be parsed become NaN
    numeric_cols = ['PX_LAST', 'Change', '% Change', 'PX_BID', 'Change_BID', '% Change_BID']
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')

    return df

df_us2y = clean_year_dataset(df_us2y)
df_us10y = clean_year_dataset(df_us10y)
df_us30y = clean_year_dataset(df_us30y)

df_all_ma = df_all_ma.dropna(subset=['Date_Announced'])
df_mega_ma = df_mega_ma.dropna(subset=['Date'])
df_smid_ma = df_smid_ma.dropna(subset=['Date'])

# Keep only rows whose date renders as a timestamp (its string form contains '00:00:00')
def keep_timestamp_rows(df, date_col):
    is_timestamp = df[date_col].astype(str).str.contains('00:00:00', na=False)
    return df[is_timestamp].copy()

df_all_ma = keep_timestamp_rows(df_all_ma, 'Date_Announced')
df_smid_ma = keep_timestamp_rows(df_smid_ma, 'Date')
df_mega_ma = keep_timestamp_rows(df_mega_ma, 'Date')



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_all_ma.rename(columns={


In [None]:
# Put a series on a common weekly index (week ending Friday) so the factors line up by date
def to_weekly(df, value_col, name, date_col='Date'):
    s = df.dropna(subset=[date_col]).set_index(date_col)[value_col]
    return pd.to_numeric(s, errors='coerce').sort_index().resample('W-FRI').last().rename(name)

# Weekly panel of XBI price, XBI market cap and Treasury yields
panel = pd.concat([
    to_weekly(df_xbi, 'Price_Last', 'XBI_Price'),
    to_weekly(df_mktcap_xbi, 'CUR_MKT_CAP', 'XBI_Market_Cap'),
    to_weekly(df_us2y, 'PX_LAST', 'US2Y'),
    to_weekly(df_us10y, 'PX_LAST', 'US10Y'),
], axis=1)

# Function to calculate XBI returns
def calculate_xbi_returns(df):
    df['XBI_Returns'] = df['XBI_Price'].pct_change()
    return df

# Function to calculate fund flows (week-over-week % change in market cap)
def calculate_fund_flows(df):
    df['Market_Cap_Change'] = df['XBI_Market_Cap'].pct_change()
    return df

# Function to calculate the yield curve spread (10Y minus 2Y); a negative spread means inversion
def calculate_yield_curve_inversion(df):
    df['Yield_Spread'] = df['US10Y'] - df['US2Y']
    df['Yield_Curve_Inverted'] = (df['Yield_Spread'] < 0).astype(int)
    return df

# Function to calculate M&A frequency (deals announced in the trailing `window` weeks) and M&A size
def calculate_ma_frequency(df, deals, window=12):
    deals = deals.copy()
    deals['Date_Announced'] = pd.to_datetime(deals['Date_Announced'], errors='coerce')
    deals['Deal_Value_Billion'] = pd.to_numeric(deals['Deal_Value_Billion'], errors='coerce')
    deals = deals.dropna(subset=['Date_Announced', 'Deal_Value_Billion'])
    deals = deals.set_index('Date_Announced').sort_index()
    # Weeks with no listed deal count as zero, including weeks outside the M&A sheet's coverage
    weekly = deals['Deal_Value_Billion'].resample('W-FRI').count().reindex(df.index, fill_value=0)
    mega = deals.loc[deals['Deal_Value_Billion'] > 10, 'Deal_Value_Billion']
    mega_weekly = mega.resample('W-FRI').count().reindex(df.index, fill_value=0)
    df['M&A_Frequency'] = weekly.rolling(window=window).sum()
    df['Mega_M&A_Frequency'] = mega_weekly.rolling(window=window).sum()
    # Largest deal class announced in each week: 'Large' (>$10B), 'Small', or 'None'
    df['M&A_Size'] = np.select([mega_weekly > 0, weekly > 0], ['Large', 'Small'], default='None')
    return df

# Function to calculate political party control, using the party holding the White House
# by calendar year as a rough proxy (covers 2001-2024; extend the list for later years)
def calculate_political_party_control(df):
    democratic_years = list(range(2009, 2017)) + list(range(2021, 2025))
    df['Democratic_President'] = df.index.year.isin(democratic_years).astype(int)
    return df

panel = calculate_xbi_returns(panel)
panel = calculate_fund_flows(panel)
panel = calculate_yield_curve_inversion(panel)
panel = calculate_ma_frequency(panel, df_all_ma)
panel = calculate_political_party_control(panel)

# Data readouts are left out: the workbook has no readout data to build that factor from
features = ['M&A_Frequency', 'Mega_M&A_Frequency', 'Yield_Spread', 'Democratic_President']

# Function to calculate correlations of each factor with XBI returns
def calculate_correlations(df, features):
    return df[features + ['XBI_Returns']].corr()['XBI_Returns'].drop('XBI_Returns')

print(calculate_correlations(panel, features))

# Function to train linear regression model on the previous week's factors
def train_linear_regression(df, features):
    data = pd.concat([df[features].shift(1), df['XBI_Returns']], axis=1).dropna()
    X, y = data[features], data['XBI_Returns']
    # shuffle=False keeps the test weeks strictly after the training weeks
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, y_test, y_pred

model, y_test, y_pred = train_linear_regression(panel, features)

# Function to evaluate model performance
def evaluate_model_performance(y_test, y_pred):
    return mean_squared_error(y_test, y_pred)

mse = evaluate_model_performance(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(pd.Series(model.coef_, index=features, name='Coefficient'))

# Generate a prediction for next week from the most recent complete week of factors
X_new = panel[features].dropna().tail(1)
predictions = model.predict(X_new)
print(f'Predictions: {predictions}')

# Function to create scatter plot
def create_scatter_plot(df):
    plt.figure(figsize=(10,6))
    sns.scatterplot(x='M&A_Frequency', y='XBI_Returns', data=df)
    plt.title('M&A Frequency vs XBI Returns')
    plt.xlabel('M&A Frequency')
    plt.ylabel('XBI Returns')
    plt.show()

create_scatter_plot(panel)

# Function to create bar chart
def create_bar_chart(df):
    plt.figure(figsize=(10,6))
    sns.barplot(x='M&A_Size', y='XBI_Returns', data=df)
    plt.title('M&A Size vs XBI Returns')
    plt.xlabel('M&A Size')
    plt.ylabel('XBI Returns')
    plt.show()

create_bar_chart(panel)

In [None]:

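# Step 1: Calculate Weekly Changes in XBI Market Cap
# Sort by date first so each change runs from one observation to the next in time
df_mktcap_xbi = df_mktcap_xbi.sort_values('Date')
df_mktcap_xbi['Market_Cap_Change'] = df_mktcap_xbi['CUR_MKT_CAP'].diff()


# Step 4: Calculate Yield Curve (10Y minus 2Y), matching the two series on Date rather than row position
df_yield_curve = df_us10y[['Date', 'PX_LAST']].merge(
    df_us2y[['Date', 'PX_LAST']], on='Date', suffixes=('_10Y', '_2Y'))
df_yield_curve['Yield_Spread'] = df_yield_curve['PX_LAST_10Y'] - df_yield_curve['PX_LAST_2Y']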



In [None]:
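# Step 11: Investigate Impact of Yield Curve Inversion
# Match yields to XBI on Date; comparing the raw columns would pair rows by position, not by date
yields = df_us2y[['Date', 'PX_LAST']].merge(
    df_us10y[['Date', 'PX_LAST']], on='Date', suffixes=('_2Y', '_10Y'))
df_curve = df_xbi[['Date', 'Price_Last']].sort_values('Date').merge(yields, on='Date', how='inner')
# Period-over-period return; this is a weekly return only if the XBI tab holds weekly prices
df_curve['Weekly_Return'] = df_curve['Price_Last'].pct_change()
# The curve is inverted when the 10Y yield is below the 2Y yield
df_curve['Yield_Curve_Inverted'] = df_curve['PX_LAST_10Y'] < df_curve['PX_LAST_2Y']
yield_curve_inversion = df_curve.groupby('Yield_Curve_Inverted')['Weekly_Return'].mean()

print("Average XBI weekly return by yield curve state (True = inverted):\n", yield_curve_inversion)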