# The Use of Asset Growth in Empirical Asset Pricing Models

Source: https://doi.org/10.1016/j.jfineco.2023.103746

## 1. Introduction

This study investigates the explanatory power and mechanisms of asset growth (AG) in empirical asset pricing models, specifically within the context of Chinese listed companies. 

Building on the Fama-French five-factor model literature, it tests the robustness and economic implications of AG and its subcomponents, such as inventory (INVT) and accounts receivable (AREC).

In [None]:
import pandas as pd
import warnings
import statsmodels.api as sm
from scipy import stats

warnings.filterwarnings("ignore")
data = pd.read_csv("FS_Combas.csv")

## 2. Preprocessing and Feature Engineering

In [None]:
data = data[(data["Typrep"] == "A")]

data['Newdate'] = data['Accper'].apply(lambda x: int(x.replace('-','')))
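# Drop rows dated January 1 of each year
for yy in range(2015,2023):
    time = 10000*yy+101
    data.drop(data[(data["Newdate"]==time)].index,axis=0,inplace=True)
# Lag each balance-sheet item by one report within each stock, so that a stock's
# first row is not paired with the previous stock's last row
data = data.sort_values(["Stkcd", "Newdate"])
data["TotalAsset"]=data.groupby("Stkcd")["A001000000"].shift(1)
data = data.drop(["A001000000"],axis=1)
data["A001111000_1"]=data.groupby("Stkcd")["A001111000"].shift(1)
data["A001123000_1"]=data.groupby("Stkcd")["A001123000"].shift(1)
# Changes in inventory (INVT) and accounts receivable (AREC), scaled by lagged total assets
data["INVT"] = (data["A001123000"] - data["A001123000_1"])/(data["TotalAsset"]*3)
data["AREC"] = (data["A001111000"] - data["A001111000_1"])/(data["TotalAsset"]*3)
# Drop rows with missing or zero balance-sheet inputs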
data.dropna(subset=['A001111000', 'A001123000'], inplace=True)
data.drop(data[(data['A001123000']==0)].index, axis=0, inplace=True)
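
As a minimal sketch of the scaling step, the toy example below computes a change in inventory scaled by lagged total assets, taking the lag within each stock. The column names (`total_assets`, `inventory`, `inv_growth`) and all numbers are made up for illustration, and the notebook's extra factor of 3 in the denominator is left out.

```python
import pandas as pd

# Toy balance-sheet panel; column names are illustrative, not the CSMAR codes.
panel = pd.DataFrame({
    "Stkcd": [1, 1, 1, 2, 2, 2],
    "Accper": ["2015-03-31", "2015-06-30", "2015-09-30"] * 2,
    "total_assets": [100.0, 110.0, 120.0, 200.0, 210.0, 220.0],
    "inventory": [10.0, 12.0, 15.0, 40.0, 38.0, 44.0],
})
panel = panel.sort_values(["Stkcd", "Accper"]).reset_index(drop=True)

# Lag within each stock so the first row of stock 2 never sees stock 1's values.
by_stock = panel.groupby("Stkcd")
panel["lag_assets"] = by_stock["total_assets"].shift(1)
panel["lag_inventory"] = by_stock["inventory"].shift(1)

# Change in inventory scaled by lagged total assets.
panel["inv_growth"] = (panel["inventory"] - panel["lag_inventory"]) / panel["lag_assets"]
print(panel[["Stkcd", "Accper", "inv_growth"]])
```

The first report of each stock has no lag and therefore yields a missing value, which can then be dropped explicitly.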

In [None]:
data['Accper'] = pd.to_datetime(data['Accper'])

data = data[(data['Accper'].dt.year >= 2014) & (data['Accper'].dt.year <= 2024)]

# Sort by Stkcd and Accper to ensure the dates are monotonically increasing
data.sort_values(by=['Stkcd', 'Accper'], ascending=[True, True], inplace=True)

# Generate a complete set of monthly data for each stock from 2015 to 2023
months = pd.date_range('2015-01-01', '2024-01-01', freq='M')
all_combinations = pd.MultiIndex.from_product([data['Stkcd'].unique(), months], names=['Stkcd', 'Accper'])

# Reindex the data by Stkcd and Accper, ensuring dates are monotonically increasing
data.set_index(['Stkcd', 'Accper'], inplace=True)

# When using reindex, ensure the data index matches all_combinations
data = data.reindex(all_combinations, method='ffill')

# Create a new column Newmnt (formatted as YYYYMM)
data['Newmnt'] = data.index.get_level_values('Accper').strftime('%Y%m')

# Reset the index back to regular columns
data.reset_index(inplace=True)

# Obtain INVT and AREC change rates
data = data[["Stkcd", "Newmnt", "INVT", "AREC"]]
data.head(5)
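
Reindexing a `(Stkcd, Accper)` MultiIndex with `method='ffill'` fills along the sorted index as a whole, so a stock with no earlier report may pick up the last value of the preceding stock. One way to avoid this, sketched here under the assumption that report dates fall on month ends, is to reindex exactly and then forward-fill within each stock. The data are invented.

```python
import pandas as pd

# Toy quarterly observations for two stocks (values are made up).
q = pd.DataFrame({
    "Stkcd": [1, 1, 2],
    "Accper": pd.to_datetime(["2015-03-31", "2015-06-30", "2015-06-30"]),
    "INVT": [0.01, 0.02, 0.05],
})

# Month-end grid for every stock.
months = pd.date_range("2015-03-31", periods=5, freq=pd.offsets.MonthEnd())
grid = pd.MultiIndex.from_product(
    [q["Stkcd"].unique(), months], names=["Stkcd", "Accper"]
)

# Reindex exactly first, then forward-fill within each stock only.
monthly = q.set_index(["Stkcd", "Accper"]).reindex(grid)
monthly["INVT"] = monthly.groupby(level="Stkcd")["INVT"].ffill()
monthly = monthly.reset_index()
monthly["Newmnt"] = monthly["Accper"].dt.strftime("%Y%m").astype(int)
print(monthly)
```

In this sketch, stock 2 stays missing until its first report in June instead of inheriting stock 1's values.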


In [None]:
# Import individual stock returns
stkret = pd.read_csv('TRD_Mnth_New.csv')
stkret = stkret.loc[(stkret['Markettype'] == 1) & (stkret['Mretwd'].notnull())]
stkret['Newmnt'] = stkret['Trdmnt'].apply(lambda x: int(x.replace('-', '')))
stkret.head(5)

In [None]:
# Merge the two tables
data["Newmnt"] = data["Newmnt"].astype(int)
factordata = pd.merge(stkret, data, on=['Newmnt', 'Stkcd'], how='inner')
factordata.head(5)

In [None]:
def getgroup(x):
    return pd.qcut(x,2,labels=False)

factordata["INVT_group"] = factordata.groupby("Newmnt")["INVT"].transform(getgroup)
factordata["AREC_group"] = factordata.groupby("Newmnt")["AREC"].transform(getgroup)

In [None]:
f_INVT = factordata.groupby(["Newmnt", "INVT_group"])["INVT"].mean().reset_index()
f_AREC = factordata.groupby(["Newmnt", "AREC_group"])["AREC"].mean().reset_index()

# Pivot the data for easier difference calculation
f_INVT_pivot = f_INVT.pivot(index="Newmnt", columns="INVT_group", values="INVT")
f_AREC_pivot = f_AREC.pivot(index="Newmnt", columns="AREC_group", values="AREC")

# For each Newmnt, calculate the difference: INVT_group == 1 minus INVT_group == 0
f_INVT_pivot["INVT_diff"] = f_INVT_pivot[1] - f_INVT_pivot[0]
f_AREC_pivot["AREC_diff"] = f_AREC_pivot[1] - f_AREC_pivot[0]

# Return the final factors
f_INVT = f_INVT_pivot.reset_index()[["Newmnt", "INVT_diff"]]
f_AREC = f_AREC_pivot.reset_index()[["Newmnt", "AREC_diff"]]
newfactor = pd.merge(f_INVT, f_AREC, on="Newmnt", how="inner")
newfactor.head(5)
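
The cell above averages the characteristic itself (`INVT`, `AREC`) within each group. If a return-based high-minus-low factor is wanted instead, in the style of the Fama-French factors, a minimal sketch could look like the following. The toy data and the name `INVT_ret_spread` are assumptions for illustration only.

```python
import pandas as pd

# Toy stock-month panel; returns and characteristics are made up.
toy = pd.DataFrame({
    "Newmnt": [201501] * 4 + [201502] * 4,
    "INVT": [0.01, 0.04, 0.02, 0.03, -0.02, 0.05, 0.00, 0.01],
    "Mretwd": [0.02, 0.06, 0.00, 0.04, -0.01, 0.03, 0.01, 0.05],
})

# Median split on the characteristic within each month (0 = low, 1 = high).
toy["INVT_group"] = toy.groupby("Newmnt")["INVT"].transform(
    lambda x: pd.qcut(x, 2, labels=False)
)

# Equal-weighted mean return of each group, then high minus low.
group_ret = (
    toy.groupby(["Newmnt", "INVT_group"])["Mretwd"].mean().unstack("INVT_group")
)
hml_invt = (group_ret[1] - group_ret[0]).rename("INVT_ret_spread").reset_index()
print(hml_invt)
```

Only the column being averaged changes relative to the notebook's cell; the grouping and differencing steps are the same.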

In [None]:
# Import FF5F data and clean it
ff5ret = pd.read_csv("STK_MKT_FIVEFACMONTH.csv")
ff5ret = ff5ret.loc[(ff5ret['MarkettypeID'] == 'P9706') & (ff5ret['Portfolios'] == 3)]
ff5ret.dropna(axis=0, inplace=True)
ff5ret['Newmnt'] = ff5ret['TradingMonth'].apply(lambda x: int(x.replace('-', '')))
# ff5ret = ff5ret[(ff5ret["Newmnt"] >= 201501) & (ff5ret["Newmnt"] <= 202012)]
ff5ret.head(5)

In [None]:
# Dataset containing the two new factors
marketfactor = pd.merge(ff5ret, newfactor, on="Newmnt", how="inner")
marketfactor.drop(["MarkettypeID", "Portfolios", "TradingMonth"], axis=1, inplace=True)
marketfactor.head(5)

In [None]:
all = pd.merge(factordata, marketfactor, on="Newmnt", how="inner")
all.head(5)

## 3. Regressions, Portfolio Construction and Results
### (Based on FF5 + INVT + AREC Factor Model)

This section builds on the Fama-French five-factor set (market excess return, SMB, HML, RMW, CMA), replacing the CMA factor with the newly constructed INVT_diff and AREC_diff, and evaluates several regression designs and sorted-portfolio analyses.

#### Pooled Regression  
Replace the CMA factor in the Fama-French five-factor model with the INVT and AREC factors, then regress stock returns on the resulting six-factor model, pooling all stocks and months, to obtain factor loadings and test statistics.

In [None]:
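# Pooled regression: FF5 with CMA replaced by INVT_diff and AREC_diff
factors = ['RiskPremium2', 'SMB2', 'HML2', 'RMW2', 'INVT_diff', 'AREC_diff']
mod = sm.OLS(all.loc[:, "Mretwd"], sm.add_constant(all.loc[:, factors]))
reshac = mod.fit(cov_type='HAC', cov_kwds={'maxlags': 2})  # Newey-West (HAC) standard errors
print(reshac.summary())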

Findings: Both INVT_diff and AREC_diff (the high-minus-low spreads between the inventory and accounts receivable groups) have significantly positive coefficients in the pooled regression, which supports the new factor model over the full sample.

#### Two-pass Regression  
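First, estimate each stock's factor exposures over rolling 24-month windows. The first-pass regression in the cell below includes all five FF5 factors, but only the exposures to the four non-CMA factors (Market, SMB, HML, RMW) are used in the next step.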

In [None]:
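# Two-pass regression
def doreg(data):
    # First pass: time-series OLS of one stock's returns on the FF5 factors;
    # returns the intercept and the factor loadings
    mod = sm.OLS(data.loc[:, 'Mretwd'], sm.add_constant(data.loc[:, ['RiskPremium2','SMB2','HML2','RMW2','CMA2']]))
    res = mod.fit()
    return res.params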

allregres = pd.DataFrame()

for yy in range(2015, 2022):
    for mm in range(1, 13):
        begym = yy * 100 + mm   # Window start, e.g. 201501
        endym = (yy + 2) * 100 + mm   # Window end (exclusive), 24 months later, e.g. 201701
        
        traindata = all[(all['Newmnt'] >= begym) & (all['Newmnt'] < endym)].copy()
        
        # Keep only stocks with at least 15 monthly observations in the window
        traindata['nobs'] = traindata.groupby('Stkcd')['Newmnt'].transform(len)
        traindata = traindata[traindata['nobs'] >= 15]
        
        regres = traindata.groupby('Stkcd').apply(doreg)
        regres.rename({'RiskPremium2': 'beta'}, inplace=True, axis=1)
        regres['Newmnt'] = endym
        
        allregres = pd.concat((allregres, regres), axis=0)

In [None]:
testdata=pd.merge(all,allregres.reset_index(),how='inner',on=['Stkcd','Newmnt'])
allregres.head(5)

Then, compute each stock-month's adjusted return by subtracting the four factor contributions (estimated exposure times factor return) from the stock's return. Finally, regress this adjusted return, that is, the FF5 alpha net of the Market, SMB, HML and RMW factors, on the INVT and AREC factors, pooling all stocks and all months.
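
In equation form, the procedure can be summarized as follows. The notation is introduced here for illustration and is not taken from the source: $r_{i,t}$ is the return of stock $i$ in month $t$, $f_{k,t}$ is the return of factor $k$, and $\hat{b}_{i,k}$ is the exposure estimated over the rolling window that ends just before month $t$.

```latex
\begin{aligned}
r_{i,t} &= a_i + \sum_{k} b_{i,k}\, f_{k,t} + \varepsilon_{i,t}
  && \text{(first pass, per stock, rolling window)} \\
\hat{\alpha}_{i,t} &= r_{i,t} - \sum_{k \in \{\mathrm{MKT},\,\mathrm{SMB},\,\mathrm{HML},\,\mathrm{RMW}\}} \hat{b}_{i,k}\, f_{k,t}
  && \text{(adjusted return)} \\
\hat{\alpha}_{i,t} &= \gamma_0 + \gamma_1\,\mathrm{INVT\_diff}_t + \gamma_2\,\mathrm{AREC\_diff}_t + u_{i,t}
  && \text{(second pass, pooled)}
\end{aligned}
```

Because the first-pass intercept is not subtracted, $\hat{\alpha}_{i,t}$ contains both the intercept and the residual of the first pass.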

In [None]:
testdata["FF5_alpha"] = (testdata["Mretwd"] - testdata["RiskPremium2"] * testdata["beta"] 
    - testdata["SMB2_y"] * testdata["SMB2_x"] - testdata["HML2_y"] * testdata["HML2_x"] 
    - testdata["RMW2_y"] * testdata["RMW2_x"])
mod = sm.OLS(testdata["FF5_alpha"], sm.add_constant(testdata.loc[:, ["INVT_diff","AREC_diff"]]))
reshac = mod.fit(cov_type='HAC', cov_kwds={'maxlags': 2})  # Newey–West estimator
print(reshac.summary())

Findings: AREC_diff is significant, whereas INVT_diff has a significant reversal effect on returns.

#### Quantile Portfolios  
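Stocks are sorted each month into quantile groups (deciles here) by the sorting variable; the cell below sorts on the estimated market beta. The returns of each group and of the high-minus-low spread portfolio are then tracked over time.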

In [None]:
## Quantile portfolios
def getgroup10(x):
    return pd.qcut(x, 10, labels=False)

# Create decile portfolios
testdata['group'] = testdata.groupby('Newmnt')['beta'].transform(getgroup10)
gret = testdata.groupby(['Newmnt', 'group'])['Mretwd'].mean()
gretnew = gret.reset_index().pivot_table(index='Newmnt', columns='group', values='Mretwd')
gretnew.loc[:, 'spread'] = gretnew.loc[:, 9] - gretnew.loc[:, 0]
gretnew

In [None]:
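# One-sample t-test of the mean spread against zero
print(stats.ttest_1samp(gretnew['spread'], 0))
# Regress the spread on a constant to obtain a Newey–West t-statistic for its mean
tmpdata = sm.add_constant(gretnew['spread'])
mod = sm.OLS(tmpdata['spread'], tmpdata['const'])
reshac = mod.fit(cov_type='HAC', cov_kwds={'maxlags': 2})  # Newey–West estimator
print(reshac.summary())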

Findings: The INVT and AREC spreads yield positive excess returns in most periods, but the t-tests return high p-values, so the average spread is not statistically significant.