Run the cells below:

In [1]:
import pandas as pd
import statsmodels.api as sm

In [2]:
comp = pd.read_pickle('../data/comp_clean.zip')
comp.dtypes

permno          float64
datadate         object
ib              float64
at              float64
che             float64
dltt            float64
ppent           float64
sich            float64
year              int64
roa             float64
future_roa      float64
cash            float64
leverage        float64
investment      float64
w_future_roa    float64
w_cash          float64
w_leverage      float64
w_investment    float64
const             int64
dtype: object

Create a new variable called ``future_invest`` that holds each firm's ``investment`` in the following year.

In [3]:
# Sort by firm and year so that shift(-1) picks up the same firm's next observation
comp = comp.sort_values(['permno','year'])
comp['future_invest'] = comp.groupby('permno')['investment'].shift(-1)
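
Note that ``shift(-1)`` returns the firm's next available row, which is the following year only if the firm has no gaps in its yearly observations. The sketch below, on a small made-up panel (the ``toy`` DataFrame and its values are assumptions for illustration, not part of ``comp``), shows one possible way to keep the shifted value only when the next row is exactly one year later.

```python
import pandas as pd

# Hypothetical toy panel: firm 1 has no observation for 2002
toy = pd.DataFrame({
    'permno': [1, 1, 1, 2, 2],
    'year': [2000, 2001, 2003, 2000, 2001],
    'investment': [0.10, 0.20, 0.30, 0.40, 0.50],
})
toy = toy.sort_values(['permno', 'year'])

grouped = toy.groupby('permno')
next_invest = grouped['investment'].shift(-1)  # next row's investment, within firm
next_year = grouped['year'].shift(-1)          # next row's year, within firm

# Keep the shifted value only if the next row is exactly one year ahead;
# otherwise leave it missing
toy['future_invest'] = next_invest.where(next_year == toy['year'] + 1)
print(toy)
```

Whether this extra check matters depends on whether the cleaned data actually contains gaps, which the notebook does not say.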

Produce a table that gives us only the mean and standard deviation of ``future_invest``, ``roa`` and ``cash``.

In [4]:
comp[['future_invest','roa', 'cash']].agg(['mean','std'])

      future_invest       roa      cash
mean      -0.035447 -0.056431  0.174782
std       11.025308  1.494764  0.219955


Winsorize ``future_invest``, ``roa`` and ``cash`` at the 5th and 95th percentiles. Call the winsorized variables ``w5_future_invest``, ``w5_roa`` and ``w5_cash``. Then produce a table that gives us just their mean and standard deviation.

In [5]:
# Clip each variable at its own 5th and 95th percentiles
for v in ['future_invest', 'roa', 'cash']:
    lo, hi = comp[v].quantile([0.05, 0.95])
    comp[f'w5_{v}'] = comp[v].clip(lower=lo, upper=hi)

comp[['w5_future_invest','w5_roa', 'w5_cash']].agg(['mean','std'])

      w5_future_invest    w5_roa   w5_cash
mean          0.016095 -0.027871  0.167850
std           0.055204  0.166513  0.198858
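
The standard deviation of ``future_invest`` falls from about 11.03 to about 0.055 after winsorizing, which suggests that a handful of extreme values drove most of the raw dispersion. As a minimal sketch of the same clipping logic, the helper below (``winsorize`` and the series ``toy_s`` are hypothetical names and data introduced here for illustration) wraps the ``clip``/``quantile`` pattern in a reusable function.

```python
import pandas as pd

def winsorize(s, lower=0.05, upper=0.95):
    """Clip a Series at its own lower and upper quantiles (hypothetical helper)."""
    return s.clip(lower=s.quantile(lower), upper=s.quantile(upper))

# Made-up data: the values 1..19 plus one extreme value in each tail
toy_s = pd.Series([-100.0] + [float(i) for i in range(1, 20)] + [100.0])
w = winsorize(toy_s)

# Winsorizing replaces the extremes instead of dropping them,
# so the number of observations is unchanged
print(toy_s.agg(['mean', 'std', 'count']))
print(w.agg(['mean', 'std', 'count']))
```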


Regress ``w5_future_invest`` on ``w5_roa``, ``w5_cash`` and a constant. Print the results of your regression.

In [6]:
# OLS on the variables winsorized at the 5th and 95th percentiles;
# rows with missing values are dropped
results5 = sm.OLS(endog=comp['w5_future_invest'],
                  exog=comp[['const', 'w5_roa', 'w5_cash']],
                  missing='drop').fit()
print(results5.summary())

                            OLS Regression Results                            
Dep. Variable:       w5_future_invest   R-squared:                       0.043
Model:                            OLS   Adj. R-squared:                  0.043
Method:                 Least Squares   F-statistic:                     4602.
Date:                Thu, 03 Mar 2022   Prob (F-statistic):               0.00
Time:                        14:07:47   Log-Likelihood:             3.1015e+05
No. Observations:              206817   AIC:                        -6.203e+05
Df Residuals:                  206814   BIC:                        -6.203e+05
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0131      0.000     83.496      0.0

Normalize the ``w5_future_invest``, ``w5_roa`` and ``w5_cash`` variables (subtract the mean and divide by the standard deviation). Call these normalized variables ``n_future_invest``, ``n_roa`` and ``n_cash``. Then produce a table that gives us their mean and standard deviation.

In [7]:
main_vars_wins = ['w5_future_invest','w5_roa', 'w5_cash']
normalized_vars = ['n_future_invest','n_roa','n_cash']
comp[normalized_vars] = (comp[main_vars_wins] - comp[main_vars_wins].mean()) / comp[main_vars_wins].std()
comp[normalized_vars].agg(['mean','std'])

      n_future_invest         n_roa        n_cash
mean     1.172636e-14 -4.694453e-14  1.094284e-13
std               1.0           1.0           1.0
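
The means above are on the order of 1e-14 rather than exactly zero because of floating-point rounding. A minimal sketch of how one might check a normalization numerically, using a made-up DataFrame (``toy`` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data, unrelated to comp
toy = pd.DataFrame({'x': [1.0, 2.0, 4.0, 8.0], 'y': [0.1, 0.2, 0.3, 0.9]})

# pandas std() uses the sample standard deviation (ddof=1) by default
z = (toy - toy.mean()) / toy.std()

# Compare with a tolerance instead of testing for exact equality
means_ok = np.allclose(z.mean(), 0.0)
stds_ok = np.allclose(z.std(), 1.0)
print(means_ok, stds_ok)
```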


Regress ``n_future_invest`` on ``n_roa`` and ``n_cash`` (and a constant). Print the results of your regression.

In [8]:
# OLS on the normalized variables; stored under a new name so that
# the earlier results5 object is not overwritten
results_n = sm.OLS(endog=comp['n_future_invest'],
                   exog=comp[['const', 'n_roa', 'n_cash']],
                   missing='drop').fit()
print(results_n.summary())

                            OLS Regression Results                            
Dep. Variable:        n_future_invest   R-squared:                       0.043
Model:                            OLS   Adj. R-squared:                  0.043
Method:                 Least Squares   F-statistic:                     4602.
Date:                Thu, 03 Mar 2022   Prob (F-statistic):               0.00
Time:                        14:07:47   Log-Likelihood:            -2.8894e+05
No. Observations:              206817   AIC:                         5.779e+05
Df Residuals:                  206814   BIC:                         5.779e+05
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0114      0.002     -5.284      0.0
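
The R-squared (0.043) and F-statistic (4602) are the same as in the winsorized regression because normalizing is only a linear rescaling of each variable: the fit is unchanged and each slope is multiplied by sd(x)/sd(y). The sketch below illustrates this on simulated data with plain NumPy least squares. The data, the helper names ``ols`` and ``zscore``, and the seed are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares coefficients, constant first (hypothetical helper)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def zscore(v):
    """Subtract the mean and divide by the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

raw = ols(y, x1, x2)
std = ols(zscore(y), zscore(x1), zscore(x2))

# Slopes implied by rescaling the raw slopes: beta_raw * sd(x) / sd(y)
sd_x = np.array([x1.std(ddof=1), x2.std(ddof=1)])
implied = raw[1:] * sd_x / y.std(ddof=1)
print(std[1:], implied)
```

In this simulation every variable is normalized on the same rows that enter the regression, so the standardized intercept is numerically zero.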

Given the results in the last regression, what is the estimated effect on future investment of increasing ROA by one standard deviation today?

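A: Since the variables are normalized, increasing ROA by one standard deviation amounts to adding 1 to ``n_roa``. The regression above estimates that doing this would increase ``n_future_invest`` by 0.2251, holding ``n_cash`` fixed. Because ``n_future_invest`` is the standardized version of ``w5_future_invest``, this is an increase of 0.2251 standard deviations of winsorized future investment. The standard deviations involved are those of the winsorized variables ``w5_roa`` and ``w5_future_invest``, not of the raw ``roa`` and ``future_invest``.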
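
To express this effect in the units of the winsorized variables, the standardized coefficient can be multiplied by the standard deviation of ``w5_future_invest`` reported in the summary table earlier. The short calculation below is a sketch that takes the 0.2251 coefficient from the answer above and the standard deviations from that table. The variable names are introduced here for illustration.

```python
beta_n_roa = 0.2251              # coefficient on n_roa quoted in the answer
sd_w5_future_invest = 0.055204   # std of w5_future_invest from the summary table
sd_w5_roa = 0.166513             # std of w5_roa from the summary table

# Change in w5_future_invest associated with a one-standard-deviation
# increase in w5_roa
effect_units = beta_n_roa * sd_w5_future_invest

# Slope per unit of w5_roa that this standardized coefficient implies
implied_slope = effect_units / sd_w5_roa

print(round(effect_units, 4), round(implied_slope, 4))
```

The first number is roughly 0.0124, which is the estimated change in ``w5_future_invest`` for a one-standard-deviation (about 0.167) increase in ``w5_roa``.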