In [58]:
import numpy as np
import pandas as pd
import matplotlib as mp
import statsmodels.api as sm

from statsmodels.sandbox.regression.gmm import IV2SLS
# Caution with the IV2SLS class: it is imported here but not used below.
# It expects the exogenous explanatory variables to be entered again as instruments;
# if they are left out of the instrument matrix, it gives wrong answers.
from statsmodels.sandbox.regression.gmm import GMM

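In statsmodels, the IV2SLS class from the statsmodels.sandbox.regression.gmm module implements instrumental variable regression by the Two-Stage Least Squares (2SLS) method. This notebook does not call it; instead, the two stages are run by hand with OLS, and the model is then re-estimated with GMM.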

Two-Stage Least Squares (2SLS)
The Two-Stage Least Squares method is a way to handle endogeneity in a regression model. It involves two main steps:

First Stage: Regress the endogenous variable on all the exogenous variables in the model, including the instrumental variable(s). Save the predicted values from this regression.

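Second Stage: Replace the endogenous variable in the original regression with the predicted values from the first stage, and estimate the model by Ordinary Least Squares (OLS). The standard errors printed by this second OLS are not the proper 2SLS standard errors, because they are based on residuals computed with the predicted values instead of the observed endogenous variable.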
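The two steps above can be sketched with plain NumPy on synthetic data. Everything in this block is an illustrative assumption: the variables `z`, `w`, `x`, `y`, the helper `ols`, and the coefficient values are made up and have nothing to do with the retailer data set. The sketch only shows that naive OLS is biased when the regressor is correlated with the error, while the two-stage procedure recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (hypothetical, for illustration only)
z = rng.normal(size=n)                                  # instrument: drives x, unrelated to u
w = rng.normal(size=n)                                  # exogenous control variable
u = rng.normal(size=n)                                  # structural error
x = 0.8 * z + 0.5 * w + 0.7 * u + rng.normal(size=n)    # endogenous: correlated with u
y = 1.0 + 2.0 * x - 1.0 * w + u                         # true coefficient on x is 2.0

const = np.ones(n)

def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]

# Naive OLS ignores the endogeneity of x
beta_ols = ols(np.column_stack([const, x, w]), y)

# First stage: regress x on ALL exogenous variables (constant, control, instrument)
Z = np.column_stack([const, w, z])
x_hat = Z @ ols(Z, x)

# Second stage: replace x with its first-stage prediction
beta_2sls = ols(np.column_stack([const, x_hat, w]), y)

print("OLS  coefficient on x:", beta_ols[1])
print("2SLS coefficient on x:", beta_2sls[1])
```

In this setup the OLS coefficient on `x` is pushed upward by the correlation between `x` and `u`, while the 2SLS coefficient should land close to 2.0.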

In [59]:
input_table = pd.read_csv('small_retailers_stock_performance.csv')
input_table.head()

Unnamed: 0,Constant,Stock Change,Inventory Turnover,Operating Profit,Interaction Effect,Current Ratio,Quick Ratio,Debt Asset Ratio
0,1,0.870332,1.795946,0.115846,0.208053,1.672527,0.255171,0.473317
1,1,-0.047347,1.395501,0.436967,0.609788,1.637261,0.221763,0.489967
2,1,0.001176,1.664563,0.541016,0.900555,1.640619,0.189141,0.374269
3,1,-0.9012,1.605738,0.539399,0.866133,1.436221,0.131944,0.224399
4,1,-0.176353,1.591451,0.539938,0.859285,1.43314,0.183095,0.213446


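The next cell runs the first stage. In this regression:

Dependent Variable: "Inventory Turnover" (the endogenous variable)
Independent Variables: "Constant", "Current Ratio", "Quick Ratio", "Debt Asset Ratio" (the constant and the instruments)

Note that this first stage leaves out the exogenous regressors "Operating Profit" and "Interaction Effect". Textbook 2SLS includes every exogenous variable of the model in the first stage, so the results here may differ from a standard 2SLS estimate.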

In [60]:
# First stage: OLS of the endogenous regressor on the constant and the instruments
first_stage_cols = ["Constant", "Current Ratio", "Quick Ratio", "Debt Asset Ratio"]
model_iv = sm.OLS(input_table["Inventory Turnover"], input_table[first_stage_cols]).fit()
# Predict from the same regressors that were used to fit the first stage
endog_predict = model_iv.predict(input_table[first_stage_cols])
# Store the fitted values so the second stage can use them
input_table["Endogenous Param"] = endog_predict

The next cell runs the second stage of the Two-Stage Least Squares (2SLS) regression using Ordinary Least Squares (OLS).

In this regression:

Dependent Variable: "Stock Change"
Independent Variables: "Constant", "Endogenous Param" (the predicted values of the endogenous variable from the first stage), "Operating Profit", and "Interaction Effect"


In [61]:
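# Second stage: OLS with the first-stage prediction in place of "Inventory Turnover"
second_stage_cols = ["Constant", "Endogenous Param", "Operating Profit", "Interaction Effect"]
model_2sls = sm.OLS(input_table["Stock Change"], input_table[second_stage_cols]).fit()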
model_2sls.summary()

0,1,2,3
Dep. Variable:,Stock Change,R-squared:,0.015
Model:,OLS,Adj. R-squared:,0.013
Method:,Least Squares,F-statistic:,8.53
Date:,"Sun, 16 Oct 2022",Prob (F-statistic):,1.27e-05
Time:,00:51:06,Log-Likelihood:,-1186.5
No. Observations:,1696,AIC:,2381.0
Df Residuals:,1692,BIC:,2403.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Constant,-0.0176,0.020,-0.896,0.370,-0.056,0.021
Endogenous Param,0.0011,0.001,1.827,0.068,-7.76e-05,0.002
Operating Profit,-0.1201,0.028,-4.319,0.000,-0.175,-0.066
Interaction Effect,0.0014,0.000,3.621,0.000,0.001,0.002

0,1,2,3
Omnibus:,368.832,Durbin-Watson:,2.243
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3433.92
Skew:,0.742,Prob(JB):,0.0
Kurtosis:,9.811,Cond. No.,109.0


Dependent Variable
Dep. Variable: The dependent variable in the model is "Stock Change".

1. Model Fit

R-squared: 0.015. This value is quite low, suggesting that the model explains only 1.5% of the variance in the dependent variable.

Adj. R-squared: 0.013. This is the adjusted R-squared, which adjusts for the number of independent variables in the model. It is very close to the R-squared, indicating that the number of variables is not inflating the explanatory power of the model.

F-statistic: 8.530. This is the overall F-test for the model, which tests whether at least one of the coefficients (excluding the intercept) is significantly different from 0.

Prob (F-statistic): 1.27e-05. This is the p-value associated with the F-statistic. A value close to 0 suggests that at least one of the coefficients in the model is significantly different from 0.
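As a quick arithmetic check, the adjusted R-squared and the F-statistic can be recomputed from the values printed in the summary. The variable names below are illustrative. Because the printed R-squared is rounded to three decimals, the recomputed F-statistic only approximates the reported 8.53.

```python
n = 1696      # No. Observations
k = 3         # Df Model (regressors excluding the constant)
r2 = 0.015    # R-squared as printed (rounded)

# Adjusted R-squared penalizes R-squared for the number of regressors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F-statistic expressed through R-squared
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(adj_r2, 3))
print(round(f_stat, 2))
```

The adjusted value rounds to the reported 0.013, and the F-statistic comes out near 8.6, in line with the summary given the rounding of R-squared.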

2. Model Parameters

Constant: The estimated coefficient for the constant term is -0.0176, but it is not statistically significant (p-value = 0.370).

Endogenous Param: The estimated coefficient is 0.0011, with a p-value of 0.068. This is weak evidence that the variable is related to "Stock Change": it is significant at the 10% level but not at the conventional 5% level. Because the second stage was run by hand, the reported standard error, and therefore the p-value, should be read as approximate.

Operating Profit: The estimated coefficient is -0.1201, and it is statistically significant (p-value < 0.001). This suggests a negative relationship between "Operating Profit" and "Stock Change".

Interaction Effect: The estimated coefficient is 0.0014, and it is statistically significant (p-value < 0.001). This suggests a positive relationship between "Interaction Effect" and "Stock Change".

3. Residuals

Omnibus: The Omnibus test is significant, suggesting that the residuals are not normally distributed.

Prob(Omnibus): Close to 0, confirming the Omnibus test result.

Durbin-Watson: 2.243, which is close to 2, suggesting that there is no autocorrelation in the residuals.

Jarque-Bera (JB): 3433.92. This statistic is built from the skewness and kurtosis of the residuals; a value this large also indicates that the residuals are not normally distributed.

Prob(JB): Close to 0, confirming the Jarque-Bera test result.

4. Other

Skew: 0.742, indicating some skewness in the residuals.
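Kurtosis: 9.811, well above the value of 3 for a normal distribution, indicating that the residuals have heavy tails.
Cond. No.: 109. This is the condition number of the regressor matrix, a measure of multicollinearity. A common rule of thumb flags values over 30, so 109 points to some correlation among the regressors, although it is not extreme.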
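The Jarque-Bera statistic reported in the summary can be reproduced, up to rounding, from the printed skewness and kurtosis using the standard formula JB = n/6 * (S^2 + (K - 3)^2 / 4). The variable names are illustrative, and the inputs are the rounded values from the table.

```python
n = 1696          # No. Observations
skew = 0.742      # Skew as printed
kurtosis = 9.811  # Kurtosis as printed (not excess kurtosis)

# Jarque-Bera combines squared skewness and squared excess kurtosis
jb = n / 6 * (skew**2 + (kurtosis - 3) ** 2 / 4)
print(jb)
```

The result is within a fraction of a unit of the reported 3433.92, and almost all of it comes from the excess kurtosis term.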

5. Overall Interpretation

The model has a low R-squared value, indicating that it does not explain much of the variability in "Stock Change". The variable "Operating Profit" is negatively related to "Stock Change", and "Interaction Effect" is positively related to "Stock Change". "Endogenous Param" also shows a positive relationship, but it is not statistically significant at the 0.05 level.

The tests for normality of the residuals suggest that the residuals are not normally distributed, which might be a concern depending on the assumptions and requirements of your analysis.

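The condition number (109) is above the common rule-of-thumb threshold of 30, so some multicollinearity is present, but it is far from the very large values that signal serious numerical problems, so it is probably not a major concern in this model.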

### GMM
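The Generalized Method of Moments (GMM) estimates the model parameters by choosing the values that bring a set of sample moment conditions as close to zero as possible. Here the moment conditions state that the regression error is uncorrelated with the constant, the exogenous regressors, and the instruments.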

GMM is particularly useful when dealing with endogeneity, as it allows for the use of instrumental variables.

The validity of the GMM estimates heavily relies on the validity of the instrumental variables and the moment conditions.
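For a linear model, the GMM estimator has a closed form, which makes the idea easy to see without a numerical optimizer. The sketch below uses synthetic data and a hypothetical helper `linear_gmm`; none of these names come from the notebook or from statsmodels. It runs the usual two steps: a first estimate with the weighting matrix (Z'Z/n)^-1, which coincides with 2SLS, and a second estimate with the weighting matrix re-estimated from the first-step residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Synthetic data (hypothetical): two instruments, one endogenous regressor
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * z1 + 0.6 * z2 + 0.7 * u + rng.normal(size=n)
y = 0.5 + 1.5 * x + u                       # true slope is 1.5

X = np.column_stack([np.ones(n), x])        # 2 parameters
Z = np.column_stack([np.ones(n), z1, z2])   # 3 moment conditions

def linear_gmm(y, X, Z, W):
    """Minimize g(b)' W g(b) with g(b) = Z'(y - Xb)/n; closed form for linear models."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

# Step 1: weighting matrix (Z'Z/n)^-1, equivalent to 2SLS
beta_step1 = linear_gmm(y, X, Z, np.linalg.inv(Z.T @ Z / n))

# Step 2: weighting matrix from the covariance of the moment contributions
moments = Z * (y - X @ beta_step1)[:, None]
W2 = np.linalg.inv(moments.T @ moments / n)
beta_step2 = linear_gmm(y, X, Z, W2)

# Hansen J statistic: n times the objective at the step-2 estimate
g_bar = (Z * (y - X @ beta_step2)[:, None]).mean(axis=0)
j_stat = n * g_bar @ W2 @ g_bar

print("step 1:", beta_step1)
print("step 2:", beta_step2)
print("J:", j_stat)
```

With 3 moment conditions and 2 parameters there is one overidentifying restriction, so under valid instruments the J statistic in this sketch is approximately chi-squared with 1 degree of freedom.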

In [62]:
#Dependent Variable: y_vals, which represents "Stock Change".
#Independent Variables: x_vals, which includes "Inventory Turnover", "Operating Profit", and "Interaction Effect".
#Instrumental Variables: iv_vals, which includes "Current Ratio", "Quick Ratio", and "Debt Asset Ratio".

y_vals  = np.array(input_table["Stock Change"])
x_vals  = np.array(input_table[["Inventory Turnover","Operating Profit","Interaction Effect"]])
iv_vals = np.array(input_table[["Current Ratio","Quick Ratio","Debt Asset Ratio"]])


# The purpose is to create a specific GMM estimator that can be used to estimate the parameters of a linear regression model
# taking into account potential endogeneity of the independent variables. 
# By using instrumental variables and defining appropriate moment conditions
# GMM estimator aims to provide consistent  parameter estimates even when the standard OLS assumptions are violated.


class gmm(GMM):
    def momcond(self, params):
        p0, p1, p2, p3 = params
        endog = self.endog
        exog = self.exog
        inst = self.instrument

        # Structural residual, shared by all six moment conditions
        resid = endog - p0 - p1 * exog[:, 0] - p2 * exog[:, 1] - p3 * exog[:, 2]

        # Residual times: the constant, the two exogenous regressors, and the three instruments.
        # exog[:, 0] ("Inventory Turnover") is treated as endogenous, so it is not used as a moment.
        g = np.column_stack((
            resid,
            resid * exog[:, 1],
            resid * exog[:, 2],
            resid * inst[:, 0],
            resid * inst[:, 1],
            resid * inst[:, 2],
        ))
        return g

#k_moms=6: The number of moment conditions.
#k_params=4: The number of parameters to be estimated.
#beta0: The initial values for the parameters.

beta0 = np.array([0.1, 0.1, 0.1, 0.1])
res = gmm(endog = y_vals, exog = x_vals, instrument = iv_vals, k_moms=6, k_params=4).fit(beta0)

res.summary()


Optimization terminated successfully.
         Current function value: 0.000046
         Iterations: 8
         Function evaluations: 12
         Gradient evaluations: 12
Optimization terminated successfully.
         Current function value: 0.000373
         Iterations: 7
         Function evaluations: 13
         Gradient evaluations: 13
Optimization terminated successfully.
         Current function value: 0.000372
         Iterations: 5
         Function evaluations: 9
         Gradient evaluations: 9
Optimization terminated successfully.
         Current function value: 0.000372
         Iterations: 5
         Function evaluations: 11
         Gradient evaluations: 11
Optimization terminated successfully.
         Current function value: 0.000372
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1


0,1,2,3
Dep. Variable:,y,Hansen J:,0.6317
Model:,gmm,Prob (Hansen J):,0.729
Method:,GMM,,
Date:,"Sun, 16 Oct 2022",,
Time:,00:51:06,,
No. Observations:,1696,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
p 0,-0.0200,0.021,-0.964,0.335,-0.061,0.021
p 1,0.0011,0.001,1.843,0.065,-6.89e-05,0.002
p 2,-0.1071,0.032,-3.370,0.001,-0.169,-0.045
p 3,0.0011,0.000,2.760,0.006,0.000,0.002


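### Iterations
Each "Optimization terminated successfully" block reports one run of the numerical optimizer. The estimator is fitted five times in a row here, most likely because the statsmodels GMM fit iterates by default: it re-estimates the weighting matrix from the latest parameters and optimizes again until the estimates stop changing. The "Iterations" line in each block is the number of optimizer steps in that run; the last run needs 0 because the parameters had already converged.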

### Function Evaluations
The "Function evaluations" line indicates how many times the GMM objective function was evaluated during the optimization process. This number can be larger than the number of iterations, as the optimization algorithm may evaluate the function multiple times per iteration to determine the direction in which to adjust the parameters.

### Gradient Evaluations
The "Gradient evaluations" line indicates how many times the gradient of the GMM objective function was calculated. The gradient tells the optimizer in which direction to adjust the parameters to reduce the objective function.

### Current Function Value
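The "Current function value" line shows the value of the GMM objective function, a weighted quadratic form in the sample moment conditions, at the parameters found by that optimizer run. For a given weighting matrix, a lower value means the sample moments are closer to zero. Values from different blocks are not directly comparable when the weighting matrix changes between runs, which is probably why the first block (0.000046) is lower than the later ones (0.000372).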

### Hansen J Test

Hansen J: 0.6317. This is the statistic for Hansen's J test of overidentifying restrictions. The model has 6 moment conditions and 4 parameters, so there are 2 overidentifying restrictions. Under the null hypothesis that all moment conditions hold, J is asymptotically chi-squared with 2 degrees of freedom.

Prob (Hansen J): 0.729. This is the p-value associated with the Hansen J test. A high p-value (typically > 0.05) means that we do not reject the null hypothesis that the instruments are valid. With a p-value of 0.729, the data give no evidence against the moment conditions.

Failing to reject is consistent with valid instruments but does not prove it: the test cannot detect every form of invalidity, and it says nothing about whether the instruments are strong.
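The reported p-value can be checked by hand. With 6 moment conditions and 4 parameters the J statistic has 2 degrees of freedom, and for a chi-squared distribution with 2 degrees of freedom the upper tail probability is exactly exp(-J/2). The second check assumes that J is computed as the number of observations times the final objective value, which is the usual definition and matches the printed output up to rounding. The variable names are illustrative.

```python
import math

j_stat = 0.6317         # Hansen J as printed
k_moms, k_params = 6, 4
df = k_moms - k_params  # 2 overidentifying restrictions

# Survival function of chi-squared with 2 degrees of freedom: exp(-x / 2)
p_value = math.exp(-j_stat / 2)
print(df, round(p_value, 3))

# Assumed relation: J = n * objective value at the final estimate
n_obs = 1696
objective = 0.000372    # "Current function value" of the last optimizer run (rounded)
j_from_objective = n_obs * objective
print(j_from_objective)
```

The recomputed p-value rounds to the reported 0.729, and the product of the sample size and the objective value agrees with the printed J statistic to about three decimals.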