# Example of Fixed Effect Regressions (Industry and Country) to Explain Environmental Intensity (=Environmental Costs/Sales) in Python
* Uses the statsmodels Python library
* The code below, as run, estimates Country Fixed Effects (indicator variables for `ctry`). Replacing `ctry` with `ind` gives Industry Fixed Effects, and the two sets of indicators can also be combined in one regression.
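
The same regressions can also be written with the statsmodels formula interface, where `C(...)` turns a code column into indicators automatically. This is a different route from the `pd.get_dummies` approach used in this notebook. A minimal sketch on a small synthetic data set (made-up numbers, not the example CSV), using the column names `ind`, `ctry` and `er` from this notebook:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small synthetic data set (illustrative values only, not Regress8.csv)
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "ind": np.repeat([1, 2, 3], n // 3),    # industry codes
    "ctry": np.tile([1, 2, 3, 4], n // 4),  # country codes
})
df["er"] = -0.01 * df["ind"] - 0.005 * df["ctry"] + rng.normal(0, 0.002, n)

# Industry fixed effects: one indicator per industry, no common constant ("- 1")
ind_fe = smf.ols("er ~ C(ind) - 1", data=df).fit()

# Country fixed effects: same idea with the country codes
ctry_fe = smf.ols("er ~ C(ctry) - 1", data=df).fit()

# Industry and country fixed effects together: keep the constant and let
# patsy drop one baseline category per factor to avoid perfect collinearity
both_fe = smf.ols("er ~ C(ind) + C(ctry)", data=df).fit()

print(ind_fe.params)
print(both_fe.params)
```

With a single factor and no constant, each coefficient is the mean of `er` within that group; in the combined model the coefficients are differences from the baseline industry and country.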

## Upload CSV file with data
* Note that the example CSV file contains only 2019 data
* The industry and country labels have been converted to numeric codes in this example (not strictly necessary, since `pd.get_dummies` also accepts text labels)
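
If the raw file carries text labels instead, a code column can be built with `pandas.factorize`, which assigns consecutive integers in order of first appearance. A minimal sketch; the country names and the `country` column are hypothetical, and this is not necessarily how the example codes were created:

```python
import pandas as pd

# Hypothetical raw labels (not from the example CSV)
raw = pd.DataFrame({"country": ["Japan", "France", "Japan", "Brazil", "France"]})

# factorize returns integer codes (starting at 0) and the unique labels;
# add 1 so the codes start at 1, like the codes in this example
codes, labels = pd.factorize(raw["country"])
raw["ctry"] = codes + 1

# Keep a lookup table to translate codes back to names
lookup = dict(zip(range(1, len(labels) + 1), labels))
print(raw)
print(lookup)
```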

```python
# Upload the CSV file from your local machine to the Colab working directory
from google.colab import files
uploaded = files.upload()
```

Saving Regress8.csv to Regress8 (1).csv
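
`files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes, so the CSV can also be parsed straight from memory. This sidesteps renamed copies such as `Regress8 (1).csv` above. A minimal sketch; because `google.colab` only exists inside Colab, the `uploaded` dictionary is simulated here with a two-row excerpt of the data shown further down:

```python
import io
import pandas as pd

# Stand-in for `uploaded = files.upload()` (Colab-only); a tiny excerpt of the data
uploaded = {"Regress8.csv": b"year,ctry,ind,er\n2019,2,1,-0.006012\n2019,3,1,-0.000944\n"}

# Take the first (here: only) uploaded file and parse its bytes directly
name = next(iter(uploaded))
df = pd.read_csv(io.BytesIO(uploaded[name]))
print(name, df.shape)
```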


## Import Necessary Libraries

```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
```

## Put Data in a pandas DataFrame
Note 1: The data set covers 2019 only and was pre-processed so that every industry has at least 3 companies.

Note 2: Variable definitions (modified for this example):
* year = Year of the data
* ctry = Country code created to represent each unique country
* ind = Industry code (1-50) created to represent each unique industry
* er = Environmental Intensity = Environmental Costs / Sales in 2019
* e = 2019 total Environmental Costs in US$ (stored as negative numbers in this file)
* rev = Sales in US$ in 2019 (derived from `e` and `er`)
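
Because `er = e / rev`, sales can be recovered as `rev = e / er`. A quick check on the first five rows of the data (values copied from the printed output, so agreement holds only up to the printed rounding of `er`):

```python
import numpy as np
import pandas as pd

# First five rows of the data, as printed in the notebook output
sample = pd.DataFrame({
    "er": [-0.006012, -0.000944, -0.000860, -0.006993, -0.003055],
    "e": [-4.620663e+06, -1.143222e+07, -1.263537e+06, -3.997865e+07, -1.703975e+06],
    "rev": [7.686130e+08, 1.210875e+10, 1.469613e+09, 5.717172e+09, 5.577370e+08],
})

# Recompute sales from costs and intensity: rev = e / er (both negative, so rev > 0)
rev_derived = sample["e"] / sample["er"]

# er is printed to 6 decimals, so compare with a relative tolerance
print(np.isclose(rev_derived, sample["rev"], rtol=1e-3))
```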


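```python
# Read the CSV file into a DataFrame and display it.
# Note: Colab saved the upload above as "Regress8 (1).csv" because a Regress8.csv
# already existed, so this line reads the earlier copy; change the name if needed.
df = pd.read_csv('Regress8.csv')
df
```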


| | year | ctry | ind | er | e | rev |
|---|---|---|---|---|---|---|
| 0 | 2019 | 2 | 1 | -0.006012 | -4.620663e+06 | 7.686130e+08 |
| 1 | 2019 | 3 | 1 | -0.000944 | -1.143222e+07 | 1.210875e+10 |
| 2 | 2019 | 7 | 1 | -0.000860 | -1.263537e+06 | 1.469613e+09 |
| 3 | 2019 | 16 | 1 | -0.006993 | -3.997865e+07 | 5.717172e+09 |
| 4 | 2019 | 16 | 1 | -0.003055 | -1.703975e+06 | 5.577370e+08 |
| ... | ... | ... | ... | ... | ... | ... |
| 1712 | 2019 | 51 | 50 | -0.013514 | -1.240473e+08 | 9.179334e+09 |
| 1713 | 2019 | 51 | 50 | -0.003337 | -2.412736e+07 | 7.230962e+09 |
| 1714 | 2019 | 57 | 50 | -0.000596 | -3.501477e+04 | 5.879700e+07 |
| 1715 | 2019 | 63 | 50 | -0.009804 | -2.096821e+08 | 2.138707e+10 |
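
Note 1 says the file was pre-processed so that every industry has at least 3 companies. The author's pre-processing code is not shown; a minimal sketch of how such a filter and check could be written with `groupby`, on a small synthetic data set:

```python
import pandas as pd

# Synthetic firms: industry 3 has only two companies (illustrative data)
firms = pd.DataFrame({
    "ind": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "er": [-0.006, -0.001, -0.002, -0.010, -0.004, -0.003, -0.008, -0.020, -0.015],
})

# Keep only industries with at least 3 companies
kept = firms.groupby("ind").filter(lambda g: len(g) >= 3)

# Check: the smallest remaining industry has at least 3 companies
print(kept.groupby("ind").size())
print(kept.groupby("ind").size().min() >= 3)
```

On the notebook's own data, `df.groupby('ind').size().min() >= 3` performs the same check.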


## Baseline Regression
* Estimate the regression *er = constant*
* The coefficient estimate gives the average Environmental Intensity of the firms in the 2019 sample

```python
# Add a column of ones to serve as the regression constant
df['constant'] = 1

# x: the constant only, as a DataFrame
x = df[['constant']]
# y: Environmental Intensity, as a Series
y = df['er']

# Fit OLS (y as a Series, x as a DataFrame) and display the summary
sm.OLS(y, x).fit().summary()
```

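(statsmodels emits a RuntimeWarning from the line `return self.ess/self.df_model`: a constant-only model has zero model degrees of freedom, so the F-statistic is undefined and is left blank below.)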


| | | | |
|---|---|---|---|
| Dep. Variable: | er | R-squared: | 0.0 |
| Model: | OLS | Adj. R-squared: | 0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Fri, 23 Apr 2021 | Prob (F-statistic): | |
| Time: | 19:10:12 | Log-Likelihood: | -360.9 |
| No. Observations: | 1717 | AIC: | 723.8 |
| Df Residuals: | 1716 | BIC: | 729.3 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| constant | -0.1233 | 0.007 | -17.113 | 0.000 | -0.137 | -0.109 |

| | | | |
|---|---|---|---|
| Omnibus: | 1071.018 | Durbin-Watson: | 1.058 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 15212.781 |
| Skew: | -2.695 | Prob(JB): | 0.0 |
| Kurtosis: | 16.549 | Cond. No. | 1.0 |
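
The confidence interval in the table follows from the coefficient and its standard error: coef ± t(0.975, Df Residuals) × std err. Reproducing the constant's interval from the printed numbers (the standard error is backed out as coef / t, because the table rounds it to 0.007):

```python
from scipy import stats

# Values printed in the baseline summary above
coef = -0.1233
t_stat = -17.113
df_resid = 1716

se = coef / t_stat                      # about 0.0072
t_crit = stats.t.ppf(0.975, df_resid)   # about 1.96 for this many residual df
lower, upper = coef - t_crit * se, coef + t_crit * se
print(round(lower, 3), round(upper, 3))
```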


## Create Country Indicators (Dummy Variables)

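```python
# One 0/1 indicator column per country code (ctry_1 ... ctry_63).
# dtype=int keeps the indicators numeric on pandas versions that default to bool.
# Use columns=['ind'] instead to create industry indicators.
df = pd.get_dummies(df, columns=['ctry'], dtype=int)
df
```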

| |year|ind|er|e|rev|constant|ctry_1|ctry_2|ctry_3|ctry_4|ctry_5|ctry_6|ctry_7|ctry_8|ctry_9|ctry_10|ctry_11|ctry_12|ctry_13|ctry_14|ctry_15|ctry_16|ctry_17|ctry_18|ctry_19|ctry_20|ctry_21|ctry_22|ctry_23|ctry_24|ctry_25|ctry_26|ctry_27|ctry_28|ctry_29|ctry_30|ctry_31|ctry_32|ctry_33|ctry_34|ctry_35|ctry_36|ctry_37|ctry_38|ctry_39|ctry_40|ctry_41|ctry_42|ctry_43|ctry_44|ctry_45|ctry_46|ctry_47|ctry_48|ctry_49|ctry_50|ctry_51|ctry_52|ctry_53|ctry_54|ctry_55|ctry_56|ctry_57|ctry_58|ctry_59|ctry_60|ctry_61|ctry_62|ctry_63|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|2019|1|-0.006012|-4.620663e+06|7.686130e+08|1|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|1|2019|1|-0.000944|-1.143222e+07|1.210875e+10|1|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|2|2019|1|-0.000860|-1.263537e+06|1.469613e+09|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|3|2019|1|-0.006993|-3.997865e+07|5.717172e+09|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|4|2019|1|-0.003055|-1.703975e+06|5.577370e+08|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|1712|2019|50|-0.013514|-1.240473e+08|9.179334e+09|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|
|1713|2019|50|-0.003337|-2.412736e+07|7.230962e+09|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|
|1714|2019|50|-0.000596|-3.501477e+04|5.879700e+07|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|
|1715|2019|50|-0.009804|-2.096821e+08|2.138707e+10|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|
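
Two `pd.get_dummies` options matter for this step: `dtype` (pandas 2.0 and later return boolean indicators by default, while numeric 0/1 columns are the safer input for statsmodels) and `drop_first`, which omits one category and is needed when a constant is kept in the regression. A small sketch with made-up country codes:

```python
import pandas as pd

# Made-up country codes for four firms
small = pd.DataFrame({"ctry": [2, 3, 2, 7], "er": [-0.006, -0.001, -0.004, -0.0009]})

# Full set of indicators: use these WITHOUT a constant
full = pd.get_dummies(small, columns=["ctry"], dtype=int)

# One category dropped (ctry_2 becomes the baseline): use these WITH a constant
reduced = pd.get_dummies(small, columns=["ctry"], drop_first=True, dtype=int)

print(full.columns.tolist())     # ['er', 'ctry_2', 'ctry_3', 'ctry_7']
print(reduced.columns.tolist())  # ['er', 'ctry_3', 'ctry_7']
```

In the full set, the indicators in each row sum to one, which is why the regression that follows omits the constant.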


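## Regress Environmental Intensity on the Country Indicators Only
* The constant is omitted because the full set of country indicators already sums to one in every row (including both would be perfectly collinear)
* Each coefficient is therefore the average Environmental Intensity of the firms in that country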

```python
# x: only the country indicators (drop every other column, including the constant)
x = df.drop(columns=['year', 'ind', 'er', 'e', 'rev', 'constant'])
# y: Environmental Intensity, as a Series
y = df['er']

# Fit OLS (y as a Series, x as a DataFrame) and display the summary
sm.OLS(y, x).fit().summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | er | R-squared: | 0.104 |
| Model: | OLS | Adj. R-squared: | 0.07 |
| Method: | Least Squares | F-statistic: | 3.092 |
| Date: | Fri, 23 Apr 2021 | Prob (F-statistic): | 4.09e-14 |
| Time: | 19:11:05 | Log-Likelihood: | -266.76 |
| No. Observations: | 1717 | AIC: | 659.5 |
| Df Residuals: | 1654 | BIC: | 1003.0 |
| Df Model: | 62 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| ctry_1 | -0.1137 | 0.166 | -0.684 | 0.494 | -0.440 | 0.212 |
| ctry_2 | -0.1927 | 0.035 | -5.558 | 0.000 | -0.261 | -0.125 |
| ctry_3 | -0.0931 | 0.077 | -1.210 | 0.227 | -0.244 | 0.058 |
| ctry_4 | -1.4614 | 0.288 | -5.075 | 0.000 | -2.026 | -0.897 |
| ctry_5 | -0.2049 | 0.109 | -1.882 | 0.060 | -0.418 | 0.009 |
| ctry_6 | 0.9243 | 0.288 | 3.210 | 0.001 | 0.360 | 1.489 |
| ctry_7 | -0.1153 | 0.077 | -1.498 | 0.134 | -0.266 | 0.036 |
| ctry_8 | -0.1772 | 0.040 | -4.437 | 0.000 | -0.256 | -0.099 |
| ctry_9 | -0.6686 | 0.102 | -6.567 | 0.000 | -0.868 | -0.469 |
| ... | ... | ... | ... | ... | ... | ... |

| | | | |
|---|---|---|---|
| Omnibus: | 961.396 | Durbin-Watson: | 1.146 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 13136.639 |
| Skew: | -2.332 | Prob(JB): | 0.0 |
| Kurtosis: | 15.722 | Cond. No. | 15.9 |
