3. Hypothesis Testing Section


This notebook performs statistical testing on IPO performance metrics.
It compares 30-day cumulative abnormal returns (CAR_30) between two IPO samples
using a pooled t-test. This test is part of the full analytical pipeline and
will become statistically meaningful once each sample contains multiple IPOs.

3.1 Preparing the Data for Hypothesis Testing

We load the final processed CSVs that contain:
- ipo_return  
- nifty_return  
- abnormal_return  
- car  
- car_30 (added in Notebook_02)

These are required to run hypothesis tests on 30-day performance metrics.

In [16]:
import os
import pandas as pd
import numpy as np

sample1_final = pd.read_csv("../Data/Processed_Data/final_sample1.csv")
sample2_final = pd.read_csv("../Data/Processed_Data/final_sample2.csv")
# Preview the first rows of each sample
print(sample1_final.head())
print(sample2_final.head())

         date  close  ipo_return  nifty_return  abnormal_return       car
0  2024-01-02    104    0.019608      0.000691         0.018917  0.018917
1  2024-01-03    103   -0.009615     -0.000460        -0.009155  0.009762
2  2024-01-04    105    0.019417      0.001520         0.017898  0.027660
3  2024-01-05    108    0.028571      0.001104         0.027468  0.055127
4  2024-01-08    110    0.018519      0.001696         0.016823  0.071950
         date  close  ipo_return  nifty_return  abnormal_return       car
0  2024-01-02    225    0.022727      0.000691         0.022036  0.022036
1  2024-01-03    223   -0.008889     -0.000460        -0.008429  0.013608
2  2024-01-04    228    0.022422      0.001520         0.020902  0.034510
3  2024-01-05    232    0.017544      0.001104         0.016440  0.050950
4  2024-01-08    231   -0.004310      0.001696        -0.006006  0.044944
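
The printed previews above list `car` but no `car_30` column, even though `car_30` is among the required fields. A small check before testing makes such a gap explicit. The sketch below is illustrative only: `REQUIRED_COLUMNS`, `missing_columns`, and the `demo` frame are hypothetical names, and `demo` simply copies two rows from the preview.

```python
import pandas as pd

# Columns the hypothesis-test cells rely on (from the list in section 3.1).
REQUIRED_COLUMNS = ["ipo_return", "nifty_return", "abnormal_return", "car", "car_30"]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Hypothetical frame with the same columns as the head() output above.
demo = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03"],
    "close": [104, 103],
    "ipo_return": [0.019608, -0.009615],
    "nifty_return": [0.000691, -0.000460],
    "abnormal_return": [0.018917, -0.009155],
    "car": [0.018917, 0.009762],
})

print(missing_columns(demo))
```

Running the same function on `sample1_final` and `sample2_final` would show whether `car_30` has to be recomputed before testing.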


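3.2 Preparing the Combined Dataset for Hypothesis Testing

We concatenate the two dataframes row-wise into a single pooled dataset.
Note that pd.concat keeps each dataframe's original index, so index values
repeat in the pooled output (index 0 appears once per sample), and no column
records which sample a row came from.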

In [18]:
# Pool both samples into one dataframe (original row indices are preserved)
pooled = pd.concat([sample1_final, sample2_final])
pooled

Unnamed: 0,date,close,ipo_return,nifty_return,abnormal_return,car
0,2024-01-02,104,0.019608,0.000691,0.018917,0.018917
1,2024-01-03,103,-0.009615,-0.00046,-0.009155,0.009762
2,2024-01-04,105,0.019417,0.00152,0.017898,0.02766
3,2024-01-05,108,0.028571,0.001104,0.027468,0.055127
4,2024-01-08,110,0.018519,0.001696,0.016823,0.07195
5,2024-01-09,113,0.027273,0.001235,0.026037,0.097987
6,2024-01-10,111,-0.017699,-0.001005,-0.016694,0.081294
7,2024-01-11,115,0.036036,0.001464,0.034572,0.115866
8,2024-01-12,118,0.026087,0.001279,0.024808,0.140674
0,2024-01-02,225,0.022727,0.000691,0.022036,0.022036
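
The pooled output above repeats index 0 and carries no sample label. One possible way to keep rows traceable is to tag each frame before concatenating and to reset the index. This is a minimal sketch, not the notebook's method: `s1`, `s2`, and `labeled` are hypothetical names, and the values are copied from the first two rows of each preview.

```python
import pandas as pd

# Hypothetical stand-ins for sample1_final and sample2_final.
s1 = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "car": [0.018917, 0.009762]})
s2 = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "car": [0.022036, 0.013608]})

# Add a group label to each frame, then stack them with a fresh 0..n-1 index.
labeled = pd.concat(
    [s1.assign(group="sample1"), s2.assign(group="sample2")],
    ignore_index=True,
)

print(labeled)
```

With a `group` column in place, later cells can filter by sample without relying on row position.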


In [8]:
#Summary Metrics
sample1_summary = {
    'mean_return': 0.01646627623624749,
    'mean_abnormal_30': 0.015630431752341144,
    'car_30': 0.14067388577107032,
    'volatility_return_30': 0.018059501648673135,
    'volatility_abnormal': 0.017204581295557363,
    'beta': 17.435858126734892,
    'sharpe_like': 0.911779104239972,
    'max_drawdown': -0.01769911504424772
}

sample2_summary = {
    'mean_return': 0.012090668953962522,
    'mean_abnormal_30': 0.011254824470056178,
    'car_30': 0.10129342023050561,
    'volatility_return_30': 0.011643540248923749,
    'volatility_abnormal': 0.011439452894603411,
    'beta': 3.1464102771212685,
    'sharpe_like': 1.0384014393800978,
    'max_drawdown': -0.008888888888888839
}
# Extract the CAR_30 value for each sample
import pandas as pd
car1 = sample1_summary['car_30']
car2 = sample2_summary['car_30']

# Build one row per IPO: its CAR_30 and the group it belongs to
df_pooled = pd.DataFrame({"car_30": [car1, car2], "group": ["sample1", "sample2"]})
df_pooled

Unnamed: 0,car_30,group
0,0.140674,sample1
1,0.101293,sample2


3.3 Hypothesis Test: Difference in 30-Day CAR Between Samples

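We perform a pooled (equal variance) two-sample t-test on CAR_30.
The null hypothesis is that the two IPO groups have the same mean 30-day
cumulative abnormal return; the alternative is that the means differ.
The test treats each IPO's CAR_30 as one observation and assumes both
groups share a common variance.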
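
For reference, the standard pooled t-statistic can be written out as follows. The symbols are notation introduced here, not taken from the notebook: group sizes are n_1 and n_2, the group means of CAR_30 are x̄_1 and x̄_2, and the sample variances are s_1^2 and s_2^2.

```latex
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

With one IPO per group (n_1 = n_2 = 1), the denominator of s_p^2 and the degrees of freedom are both zero, so the statistic is undefined.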


In [23]:
# Two-sample t-test: sample 1 vs. sample 2 on CAR_30

import pandas as pd
import numpy as np
from scipy.stats import ttest_ind

# Extract the CAR_30 values for each group
group1 = df_pooled[df_pooled['group'] == 'sample1']["car_30"]
group2 = df_pooled[df_pooled['group'] == 'sample2']["car_30"]

# Guard: the pooled t-test needs at least two observations per group
if len(group1) > 1 and len(group2) > 1:
    t_stat, p_value = ttest_ind(group1, group2, equal_var=True)
    print("T-statistic:", t_stat)
    print("P-value:", p_value)
else:
    print("Not enough IPOs in each group to run a statistically valid t-test.")
    print("T-test step added for pipeline completeness. Will activate when more IPOs are added.")


Not enough IPOs in each group to run a statistically valid t-test.
T-test step added for pipeline completeness. Will activate when more IPOs are added.
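
To show what the test does once the guard condition passes, the sketch below computes the pooled t-statistic by hand and compares it with `ttest_ind(..., equal_var=True)`. The CAR_30 values in `group_a` and `group_b` are made up for illustration and are not results from this project; `pooled_t_test` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

def pooled_t_test(a, b):
    """Two-sided pooled (equal variance) t-test computed from first principles."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    t_stat = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Two-sided p-value from the t distribution with df degrees of freedom.
    p_value = 2 * t_dist.sf(abs(t_stat), df)
    return t_stat, p_value

# Hypothetical CAR_30 values, four IPOs per group (illustration only).
group_a = [0.14, 0.09, 0.11, 0.16]
group_b = [0.10, 0.07, 0.12, 0.05]

t_manual, p_manual = pooled_t_test(group_a, group_b)
t_scipy, p_scipy = ttest_ind(group_a, group_b, equal_var=True)

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

The two lines should agree up to floating-point error, which confirms that `equal_var=True` selects the pooled-variance form of the test.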


Interpretation of Results (Pooled t-test on 30-Day CAR)

The test is currently inactive because each group contains only one IPO. Once it runs, the result should be read as follows:

If p-value < 0.05:
There is statistically significant evidence, at the 5% level, that the mean 30-day cumulative abnormal returns (CAR) of the two IPO groups differ. This suggests the two groups perform differently over the first 30 days post-listing.

If p-value >= 0.05:
We fail to reject the null hypothesis of equal means. The data do not provide statistically significant evidence of a difference in 30-day CAR between the groups. This is not proof that the groups perform identically: with few IPOs per group, the test may lack the power to detect a real gap.

Current Status:
With only one IPO per group, the pooled t-test has n1 + n2 - 2 = 0 degrees of freedom, so the within-group variance cannot be estimated and the test statistic is undefined. The test is included purely for pipeline completeness; the guard condition will run it automatically once each group contains at least two IPOs, although results from very small groups should still be read with caution.
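
The decision rule above can be captured in a small helper so the pipeline prints a consistent message. This is a sketch under stated assumptions: `interpret_p_value` is a hypothetical function, the 0.05 threshold follows the text, and a missing or NaN p-value is treated as "test not run".

```python
import math

def interpret_p_value(p_value, alpha=0.05):
    """Translate a two-sided p-value into the wording used in this section."""
    # None or NaN means the t-test was skipped or could not be computed.
    if p_value is None or math.isnan(p_value):
        return "Test not run: each group needs at least two IPOs."
    if p_value < alpha:
        return "Reject H0: mean 30-day CAR differs between the groups."
    return "Fail to reject H0: no significant difference in mean 30-day CAR."

# Example calls with made-up p-values.
print(interpret_p_value(0.01))
print(interpret_p_value(0.40))
print(interpret_p_value(None))
```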
