## Environmental Intensity Level Regressions and Dickey–Fuller test

In the notebook 'PredictingTimeSeries_&_PilotStock_CompDescription', we ran several linear regressions combining different features and saw that they yielded high R² scores. In this notebook, we rerun those regressions but focus on their coefficients.

We also run several Dickey–Fuller tests.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno
import warnings
from sklearn import linear_model
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import requests
from bs4 import BeautifulSoup
import os
import statsmodels.api as sm
warnings.filterwarnings('ignore')



In [None]:
df=pd.read_csv('/Users/maralinetorres/Documents/GitHub/Predicting-Environmental-and-Social-Actions/Datasets/Environmental_impact_cleaned.csv')
df.head()

Unnamed: 0,ISIN,Year,CompanyName,Country,Industry(Exiobase),EnvironmentalIntensity(Sales),EnvironmentalIntensity(OpInc),TotalEnvironmentalCost,WorkingCapacity,FishProductionCapacity,...,SDG14.c,SDG15.1,SDG15.2,SDG15.5,%Imputed,Env_intensity,industry_avg,industry_avg_year,Industry_indicator_year,Environmental_Growth
0,DE0005545503,2016,1&1 DRILLISCH AG,Germany,Post and telecommunications (64),-0.07%,-0.82%,-539318,-525027,-169,...,-6,67,67,-22,23%,-0.0007,-0.020506,-0.02074,1,
1,GB00B1YW4409,2010,3I GROUP PLC,United Kingdom,"Financial intermediation, except insurance and...",-0.12%,-0.11%,-1055812,-1032103,-277,...,-4,51,51,-43,10%,-0.0012,-0.028537,-0.006402,1,
2,GB00B1YW4409,2011,3I GROUP PLC,United Kingdom,"Financial intermediation, except insurance and...",-0.16%,-0.16%,-961875,-940402,-246,...,-3,38,38,-39,9%,-0.0016,-0.028537,-0.009838,1,33.333333
3,GB00B1YW4409,2012,3I GROUP PLC,United Kingdom,"Financial intermediation, except insurance and...",-0.15%,,-722999,-706893,-183,...,-2,27,27,-30,8%,-0.0015,-0.028537,-0.024437,1,-6.25
4,US88579Y1010,2010,3M COMPANY,United States,Activities of membership organisation n.e.c. (91),-7.90%,-35.45%,-2105919763,-1924672080,-439506,...,-423,3772,3772,-79722,1%,-0.079,-0.175838,-0.084583,1,


In [None]:
years = [2016, 2017, 2018]
df_industry = df.groupby('Industry(Exiobase)').count()['CompanyName'].reset_index()
industries = df_industry[df_industry['CompanyName'] > 3]['Industry(Exiobase)']
df_industry_count4 = df[df['Industry(Exiobase)'].isin(industries)]
df_c = df_industry_count4.copy()
def predictiveModel(outcomeYear, pastYears, df_c):
    # Use the pastYears argument rather than the global `years` list.
    pastYears = sorted(pastYears)
    data1 = None
    for year in pastYears:
        data = df_c[df_c['Year'] == year]
        data = data.loc[:,['CompanyName','Env_intensity','industry_avg_year']]
        data = data.rename(columns={'Env_intensity': f'Env_intensity_{year}','industry_avg_year':f'industry_avg_year_{year}'})
        # Inner join on CompanyName: only companies present in every past year are kept.
        data1 = data if data1 is None else pd.merge(data1, data, on=["CompanyName"])
    data3 = df_c[df_c['Year'] == outcomeYear]
    data3 = data3[['CompanyName','Env_intensity','industry_avg_year']]
    data3 = data3.rename(columns={'Env_intensity': f'Env_intensity_{outcomeYear}','industry_avg_year':f'industry_avg_year_{outcomeYear}'})
    data3 = pd.merge(data3, data1, on=["CompanyName"])
    
    # Predictors: every per-year column except those of the outcome year.
    filter_col = [col for col in data3 if ((col.startswith('Env_intensity') and not(col.endswith(f'{outcomeYear}')))) or ((col.startswith('industry_avg_year') and not(col.endswith(f'{outcomeYear}'))))]
    outcome_col = [col for col in data3 if (col.startswith('Env_intensity') and col.endswith(f'{outcomeYear}'))]
    X=data3[filter_col]
    y=data3[outcome_col]
    
    x = sm.add_constant(X)
    print(sm.OLS(y, x).fit().summary())
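
The function above builds one column per past year by merging the yearly slices on `CompanyName`. A minimal sketch of the same reshaping with `DataFrame.pivot` follows. The toy data and the names `toy` and `wide` are made up for illustration, and the sketch assumes each (company, year) pair appears at most once.

```python
import pandas as pd

# Made-up long-format data; only the column names mirror the notebook.
toy = pd.DataFrame({
    "CompanyName": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "Year": [2016, 2017, 2018, 2016, 2017, 2018, 2016, 2018],
    "Env_intensity": [-0.10, -0.11, -0.12, -0.20, -0.21, -0.19, -0.05, -0.06],
})

# One row per company, one column per year.
wide = toy.pivot(index="CompanyName", columns="Year", values="Env_intensity")
wide.columns = [f"Env_intensity_{year}" for year in wide.columns]

# Dropping incomplete rows roughly mimics the inner merges:
# company C has no 2017 value, so it is removed.
wide = wide.dropna()
print(wide)
```

This is only an alternative way to get the wide layout; the regressions in this notebook use the merge-based functions.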

In [None]:
predictiveModel(2019, years, df_c)

                            OLS Regression Results                            
Dep. Variable:     Env_intensity_2019   R-squared:                       0.884
Model:                            OLS   Adj. R-squared:                  0.883
Method:                 Least Squares   F-statistic:                     1340.
Date:                Thu, 15 Jul 2021   Prob (F-statistic):               0.00
Time:                        22:30:25   Log-Likelihood:                 1179.1
No. Observations:                1065   AIC:                            -2344.
Df Residuals:                    1058   BIC:                            -2309.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                     -0

In [None]:
def predictiveModel_v2(outcomeYear, pastYears, df_c):
    # Use the pastYears argument rather than the global `years` list.
    pastYears = sorted(pastYears)
    data1 = None
    for year in pastYears:
        data = df_c[df_c['Year'] == year]
        data = data.loc[:,['CompanyName','Env_intensity','industry_avg_year','Industry_indicator_year']]
        data = data.rename(columns={'Env_intensity': f'Env_intensity_{year}','industry_avg_year':f'industry_avg_year_{year}', 'Industry_indicator_year' : f'Industry_indicator_year_{year}'})
        # Inner join on CompanyName: only companies present in every past year are kept.
        data1 = data if data1 is None else pd.merge(data1, data, on=["CompanyName"])
    data3 = df_c[df_c['Year'] == outcomeYear]
    data3 = data3[['CompanyName','Env_intensity']]
    data3 = data3.rename(columns={'Env_intensity': f'Env_intensity_{outcomeYear}'})
    data3 = pd.merge(data3, data1, on=["CompanyName"])
    
    # Predictors: every per-year column except those of the outcome year.
    filter_col = [col for col in data3 if ((col.startswith('Env_intensity') and not(col.endswith(f'{outcomeYear}')))) or ((col.startswith('industry_avg_year') and not(col.endswith(f'{outcomeYear}')))) or ((col.startswith('Industry_indicator_year_') and not(col.endswith(f'{outcomeYear}'))))]
    outcome_col = [col for col in data3 if (col.startswith('Env_intensity') and col.endswith(f'{outcomeYear}'))]
    X=data3[filter_col]
    y=data3[outcome_col]
    
    x = sm.add_constant(X)
    print(sm.OLS(y, x).fit().summary())

In [None]:
predictiveModel_v2(2019, years, df_c)

                            OLS Regression Results                            
Dep. Variable:     Env_intensity_2019   R-squared:                       0.884
Model:                            OLS   Adj. R-squared:                  0.883
Method:                 Least Squares   F-statistic:                     893.7
Date:                Thu, 15 Jul 2021   Prob (F-statistic):               0.00
Time:                        22:31:46   Log-Likelihood:                 1180.6
No. Observations:                1065   AIC:                            -2341.
Df Residuals:                    1055   BIC:                            -2291.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                                   coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------
const           

In [None]:
def predictiveModel_v3(outcomeYear, pastYears, df_c):
    # Use the pastYears argument rather than the global `years` list.
    pastYears = sorted(pastYears)
    data1 = None
    for year in pastYears:
        data = df_c[df_c['Year'] == year]
        data = data.loc[:,['CompanyName','Env_intensity','industry_avg_year','Industry_indicator_year','Environmental_Growth']]
        data = data.rename(columns={'Env_intensity': f'Env_intensity_{year}','industry_avg_year':f'industry_avg_year_{year}', 'Industry_indicator_year' : f'Industry_indicator_year_{year}', 'Environmental_Growth': f'Environmental_Growth_{year}'})
        # Inner join on CompanyName: only companies present in every past year are kept.
        data1 = data if data1 is None else pd.merge(data1, data, on=["CompanyName"])
    # Environmental_Growth has missing values; drop incomplete rows before fitting.
    data1 = data1.dropna()
    data3 = df_c[df_c['Year'] == outcomeYear]
    data3 = data3[['CompanyName','Env_intensity']]
    data3 = data3.rename(columns={'Env_intensity': f'Env_intensity_{outcomeYear}'})
    data3 = pd.merge(data3, data1, on=["CompanyName"])
    
    # Predictors: every per-year column except those of the outcome year.
    filter_col = [col for col in data3 if ((col.startswith('Env_intensity') and not(col.endswith(f'{outcomeYear}')))) or ((col.startswith('industry_avg_year') and not(col.endswith(f'{outcomeYear}')))) or ((col.startswith('Industry_indicator_year_') and not(col.endswith(f'{outcomeYear}')))) or ((col.startswith('Environmental_Growth_') and not(col.endswith(f'{outcomeYear}'))))]
    outcome_col = [col for col in data3 if (col.startswith('Env_intensity') and col.endswith(f'{outcomeYear}'))]
    X=data3[filter_col]
    y=data3[outcome_col]
    
    x = sm.add_constant(X)
    print(sm.OLS(y, x).fit().summary())

In [None]:
predictiveModel_v3(2019, years, df_c)

                            OLS Regression Results                            
Dep. Variable:     Env_intensity_2019   R-squared:                       0.878
Model:                            OLS   Adj. R-squared:                  0.876
Method:                 Least Squares   F-statistic:                     596.2
Date:                Thu, 15 Jul 2021   Prob (F-statistic):               0.00
Time:                        22:36:03   Log-Likelihood:                 1109.2
No. Observations:                1010   AIC:                            -2192.
Df Residuals:                     997   BIC:                            -2128.
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                                   coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------
const           

Summary: The model that uses environmental intensity and the yearly industry average has the highest explanatory power. The coefficients on the 2016 and 2018 industry averages and on 2018 environmental intensity (EI) are statistically significant. 2018 EI has the largest coefficient (1.02) in the regression on 2019 EI.

## Dickey–Fuller test

In [27]:
from statsmodels.tsa.stattools import adfuller
df=pd.read_csv('Environmental_impact_cleaned.csv')
ind = df.copy()
y2018=list(ind[ind['Year'] == 2018]['Env_intensity'])
y2018=pd.Series(y2018)
X = y2018.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))


mean1=-0.129262, mean2=-0.101468
variance1=0.094017, variance2=0.052920
ADF Statistic: -24.980342
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [None]:
y2017=list(ind[ind['Year'] == 2017]['Env_intensity'])
y2017=pd.Series(y2017)
from statsmodels.tsa.stattools import adfuller
X = y2017.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.123985, mean2=-0.108138
variance1=0.089221, variance2=0.063000
ADF Statistic: -26.071015
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [None]:
y2016=list(ind[ind['Year'] == 2016]['Env_intensity'])
y2016=pd.Series(y2016)
from statsmodels.tsa.stattools import adfuller
X = y2016.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.128953, mean2=-0.113759
variance1=0.084427, variance2=0.066553
ADF Statistic: -38.998411
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [None]:
y2015=list(ind[ind['Year'] == 2015]['Env_intensity'])
y2015=pd.Series(y2015)
from statsmodels.tsa.stattools import adfuller
X = y2015.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.128073, mean2=-0.112539
variance1=0.072975, variance2=0.066954
ADF Statistic: -25.147693
p-value: 0.000000
Critical Values:
	1%: -3.435
	5%: -2.863
	10%: -2.568


In [None]:
y2014=list(ind[ind['Year'] == 2014]['Env_intensity'])
y2014=pd.Series(y2014)
from statsmodels.tsa.stattools import adfuller
X = y2014.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.113915, mean2=-0.100781
variance1=0.062142, variance2=0.055520
ADF Statistic: -15.509563
p-value: 0.000000
Critical Values:
	1%: -3.435
	5%: -2.864
	10%: -2.568


Summary: For each year from 2014 to 2018, the ADF statistic is far below the 1% critical value, so the environmental intensity data is stationary.

Next, let's examine the industry average for each year to see whether it is stationary.

In [None]:
ind.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14515 entries, 0 to 14514
Data columns (total 39 columns):
 #   Column                                                  Non-Null Count  Dtype  
---  ------                                                  --------------  -----  
 0   ISIN                                                    14515 non-null  object 
 1   Year                                                    14515 non-null  int64  
 2   CompanyName                                             14515 non-null  object 
 3   Country                                                 14515 non-null  object 
 4   Industry(Exiobase)                                      14515 non-null  object 
 5   EnvironmentalIntensity(Sales)                           14515 non-null  object 
 6   EnvironmentalIntensity(OpInc)                           13700 non-null  object 
 7   TotalEnvironmentalCost                                  14515 non-null  object 
 8   WorkingCapacity                     

Test whether the industry average for each year we used is stationary:


In [None]:
y2018=list(ind[ind['Year'] == 2018]['industry_avg_year'])
y2018=pd.Series(y2018)
from statsmodels.tsa.stattools import adfuller
X = y2018.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.121867, mean2=-0.108862
variance1=0.032617, variance2=0.026903
ADF Statistic: -10.463661
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [None]:
y2017=list(ind[ind['Year'] == 2017]['industry_avg_year'])
y2017=pd.Series(y2017)
from statsmodels.tsa.stattools import adfuller
X = y2017.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.117744, mean2=-0.114378
variance1=0.030029, variance2=0.028428
ADF Statistic: -27.227921
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [None]:
y2016=list(ind[ind['Year'] == 2016]['industry_avg_year'])
y2016=pd.Series(y2016)
from statsmodels.tsa.stattools import adfuller
X = y2016.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.125713, mean2=-0.117000
variance1=0.035538, variance2=0.033986
ADF Statistic: -38.943780
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [None]:
y2015=list(ind[ind['Year'] == 2015]['industry_avg_year'])
y2015=pd.Series(y2015)
from statsmodels.tsa.stattools import adfuller
X = y2015.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.125579, mean2=-0.115032
variance1=0.038191, variance2=0.033880
ADF Statistic: -20.655671
p-value: 0.000000
Critical Values:
	1%: -3.435
	5%: -2.863
	10%: -2.568


Summary: For each year from 2015 to 2018, the industry average is stationary, so the model we used, which includes the industry average, can be used for prediction.

Let's take a look at environmental growth
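
The `Environmental_Growth` values in the data preview at the top of this notebook are consistent with a year-over-year percentage change of `Env_intensity` within each company. The sketch below reproduces the 3I GROUP PLC rows on that assumption; how the column was actually derived is not shown in this notebook, and `growth_pct` is a made-up column name.

```python
import pandas as pd

# Three rows copied from the preview of the cleaned dataset.
toy = pd.DataFrame({
    "CompanyName": ["3I GROUP PLC"] * 3,
    "Year": [2010, 2011, 2012],
    "Env_intensity": [-0.0012, -0.0016, -0.0015],
})

# Percentage change relative to the previous year, computed per company.
toy = toy.sort_values(["CompanyName", "Year"])
toy["growth_pct"] = toy.groupby("CompanyName")["Env_intensity"].pct_change() * 100
print(toy)
```

The first year of each company has no previous value, which would explain the missing entries that are filled with the mean in the next cell.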

In [28]:
ind['Environmental_Growth']=ind['Environmental_Growth'].fillna(ind['Environmental_Growth'].mean())

In [29]:
y2016=list(ind[ind['Year'] == 2016]['Environmental_Growth'])
y2016=pd.Series(y2016)
from statsmodels.tsa.stattools import adfuller
X = y2016.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=5.411575, mean2=4.908355
variance1=1923.017251, variance2=5805.189898
ADF Statistic: -13.302222
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [30]:
y2017=list(ind[ind['Year'] == 2017]['Environmental_Growth'])
y2017=pd.Series(y2017)
from statsmodels.tsa.stattools import adfuller
X = y2017.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=99.633708, mean2=-5.132693
variance1=5728066.542782, variance2=1756.053671
ADF Statistic: -41.477166
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


In [31]:
y2018=list(ind[ind['Year'] == 2018]['Environmental_Growth'])
y2018=pd.Series(y2018)
from statsmodels.tsa.stattools import adfuller
X = y2018.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=2.993927, mean2=0.514992
variance1=5296.795823, variance2=4757.639775
ADF Statistic: -40.731334
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568


Summary: The environmental growth series for all of the years we used (2016 to 2018) are stationary.

Let's take a look at the top five industries

In [None]:
ind.groupby('Industry(Exiobase)')['Env_intensity'].count().sort_values()

Industry(Exiobase)
Cultivation of cereal grains nec                                                                                          1
Forestry, logging and related service activities (02)                                                                     5
Sea and coastal water transport                                                                                           6
Education (80)                                                                                                            6
Production of electricity by petroleum and other oil derivatives                                                         12
Mining of coal and lignite; extraction of peat (10)                                                                      15
Manufacture of tobacco products (16)                                                                                     22
Copper production                                                                                                

Top five industries:

Retail trade, except of motor vehicles and motorcycles; repair of personal and household goods (52)                    
Real estate activities(70)                                                                                    
Construction (45)                                                 
Manufacture of electrical machinery and apparatus n.e.c. (31)                                                 
Financial intermediation, except insurance and pension funding (65) 
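
The five names above were read off the count table. As a sketch, the same selection could be made programmatically with `value_counts` and `isin`. The toy data and the names `toy`, `top_n` and `subset` are made up; `value_counts` counts rows per industry, which matches the `count()` of `Env_intensity` only if that column has no missing values.

```python
import pandas as pd

# Made-up frame; only the column name matches the notebook.
toy = pd.DataFrame({
    "Industry(Exiobase)": ["A", "A", "A", "B", "B", "C", "D", "D", "D", "D"],
})

# Names of the two most frequent industries, largest first.
top_n = toy["Industry(Exiobase)"].value_counts().nlargest(2).index.tolist()

# Keep only the rows that belong to those industries.
subset = toy[toy["Industry(Exiobase)"].isin(top_n)]
print(top_n, len(subset))
```

Replacing `2` with `5` and `toy` with `ind` would give the five-industry subset used below.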

In [None]:
listind=['Retail trade, except of motor vehicles and motorcycles; repair of personal and household goods (52)',
'Real estate activities(70)',
'Construction (45)',
'Manufacture of electrical machinery and apparatus n.e.c. (31)',
'Financial intermediation, except insurance and pension funding (65)']
# Keep only rows whose industry is one of the five listed above.
num_order_new = ind[ind['Industry(Exiobase)'].isin(listind)]
num_order_new



Unnamed: 0,ISIN,Year,CompanyName,Country,Industry(Exiobase),EnvironmentalIntensity(Sales),EnvironmentalIntensity(OpInc),TotalEnvironmentalCost,WorkingCapacity,FishProductionCapacity,CropProductionCapacity,MeatProductionCapacity,Biodiversity,AbioticResources,Waterproductioncapacity(Drinkingwater&IrrigationWater),WoodProductionCapacity,SDG1.5,SDG2.1,SDG2.2,SDG2.3,SDG2.4,SDG3.3,SDG3.4,SDG3.9,SDG6,SDG12.2,SDG14.1,SDG14.2,SDG14.3,SDG14.c,SDG15.1,SDG15.2,SDG15.5,%Imputed,Env_intensity,industry_avg,industry_avg_year,Industry_indicator_year,Environmental_Growth
1,GB00B1YW4409,2010,3I GROUP PLC,United Kingdom,"Financial intermediation, except insurance and...",-0.12%,-0.11%,-1055812,-1032103,-277,-13751,-3221,-47,-562,-5953,102,-463300,-295103,-294949,-3438,-3438,-47957,59044,-74,-5953,-562,-4,0,-133,-4,51,51,-43,10%,-0.0012,-0.028537,-0.006402,1,
2,GB00B1YW4409,2011,3I GROUP PLC,United Kingdom,"Financial intermediation, except insurance and...",-0.16%,-0.16%,-961875,-940402,-246,-12525,-2935,-42,-424,-5378,77,-421928,-264714,-264579,-3131,-3131,-42961,44515,-56,-5378,-424,-3,0,-119,-3,38,38,-39,9%,-0.0016,-0.028537,-0.009838,1,33.333333
3,GB00B1YW4409,2012,3I GROUP PLC,United Kingdom,"Financial intermediation, except insurance and...",-0.15%,,-722999,-706893,-183,-9414,-2206,-32,-295,-4030,54,-317104,-197859,-197760,-2354,-2354,-32095,30960,-39,-4030,-295,-2,0,-89,-2,27,27,-30,8%,-0.0015,-0.028537,-0.024437,1,-6.250000
50,DE0005408116,2012,AAREAL BANK AG,Germany,"Financial intermediation, except insurance and...",-0.10%,-0.69%,-1615657,-1540246,-504,-20570,-4810,-79,-1932,-47899,383,-693698,-498714,-498400,-5143,-5143,-81852,217411,-341,-47899,-1932,-13,-2,-233,-16,192,192,-66,20%,-0.0010,-0.028537,-0.024437,1,
51,DE0005408116,2013,AAREAL BANK AG,Germany,"Financial intermediation, except insurance and...",-0.11%,-0.56%,-1561584,-1483364,-469,-19802,-4632,-74,-1655,-51916,328,-667625,-469884,-469596,-4950,-4950,-76994,186258,-292,-51916,-1655,-11,-2,-219,-13,164,164,-63,19%,-0.0011,-0.028537,-0.025627,1,10.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14485,KYG989221000,2016,ZHEN DING TECHNOLOGY HOLDING LIMITED,Taiwan,Manufacture of electrical machinery and appara...,-8.73%,-155.74%,-222083216,-204825973,-47125,-2713781,-635954,-8532,-8129,-13843967,246,-91295951,-53183570,-53158871,-678445,-678445,-8572045,-630018,-1955,-13843967,-8129,-27,-61,-23440,-32,123,123,-8504,1%,-0.0873,-0.067427,-0.073788,-1,-5.825243
14486,KYG989221000,2017,ZHEN DING TECHNOLOGY HOLDING LIMITED,Taiwan,Manufacture of electrical machinery and appara...,-6.03%,-76.11%,-221429493,-203149289,-46856,-2686271,-629599,-8463,-11727,-14897562,273,-90384665,-52689302,-52665410,-671568,-671568,-8493012,-910285,-2820,-14897562,-11727,-36,-88,-23254,-43,136,136,-8426,1%,-0.0603,-0.067427,-0.068294,1,-30.927835
14487,KYG989221000,2018,ZHEN DING TECHNOLOGY HOLDING LIMITED,Taiwan,Manufacture of electrical machinery and appara...,-6.15%,-48.85%,-236938822,-217968519,-51373,-2886977,-676611,-9212,-12305,-15337583,3756,-97158892,-57405552,-57379583,-721744,-721744,-9265219,1097855,-2959,-15337583,-12305,-155,-92,-25366,-185,1878,1878,-9054,3%,-0.0615,-0.067427,-0.076812,1,1.990050
14488,KYG989221000,2019,ZHEN DING TECHNOLOGY HOLDING LIMITED,Taiwan,Manufacture of electrical machinery and appara...,-6.02%,-49.99%,-241827228,-223793936,-52466,-2966025,-694995,-9454,-12822,-14301445,3914,-99793897,-58835287,-58808848,-741506,-741506,-9494000,936891,-3084,-14301445,-12822,-162,-96,-25898,-193,1957,1957,-9290,3%,-0.0602,-0.067427,-0.075113,1,-2.113821


In [None]:
y2018=list(num_order_new[num_order_new['Year'] == 2018]['Env_intensity'])
y2018=pd.Series(y2018)
from statsmodels.tsa.stattools import adfuller
X = y2018.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.041328, mean2=-0.039877
variance1=0.010222, variance2=0.005468
ADF Statistic: -21.045903
p-value: 0.000000
Critical Values:
	1%: -3.445
	5%: -2.868
	10%: -2.570


In [None]:
y2017=list(num_order_new[num_order_new['Year'] == 2017]['Env_intensity'])
y2017=pd.Series(y2017)
from statsmodels.tsa.stattools import adfuller
X = y2017.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.041998, mean2=-0.035402
variance1=0.010698, variance2=0.002860
ADF Statistic: -5.443552
p-value: 0.000003
Critical Values:
	1%: -3.446
	5%: -2.868
	10%: -2.570


In [None]:
y2016=list(num_order_new[num_order_new['Year'] == 2016]['Env_intensity'])
y2016=pd.Series(y2016)
from statsmodels.tsa.stattools import adfuller
X = y2016.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.044473, mean2=-0.037712
variance1=0.011978, variance2=0.002783
ADF Statistic: -20.528414
p-value: 0.000000
Critical Values:
	1%: -3.446
	5%: -2.868
	10%: -2.570


In [None]:
y2015=list(num_order_new[num_order_new['Year'] == 2015]['Env_intensity'])
y2015=pd.Series(y2015)
from statsmodels.tsa.stattools import adfuller
X = y2015.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.043269, mean2=-0.028244
variance1=0.012533, variance2=0.019730
ADF Statistic: -18.508807
p-value: 0.000000
Critical Values:
	1%: -3.449
	5%: -2.870
	10%: -2.571


Let's check the top 3 industries

In [None]:
top3 = ['Construction (45)',
'Financial intermediation, except insurance and pension funding (65)',
'Manufacture of electrical machinery and apparatus n.e.c. (31)']
num_order_new = ind[ind['Industry(Exiobase)'].isin(top3)]

In [None]:
y2018=list(num_order_new[num_order_new['Year'] == 2018]['Env_intensity'])
y2018=pd.Series(y2018)
from statsmodels.tsa.stattools import adfuller
X = y2018.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.045647, mean2=-0.045448
variance1=0.012042, variance2=0.006497
ADF Statistic: -19.134100
p-value: 0.000000
Critical Values:
	1%: -3.449
	5%: -2.870
	10%: -2.571


In [None]:
y2017=list(num_order_new[num_order_new['Year'] == 2017]['Env_intensity'])
y2017=pd.Series(y2017)
from statsmodels.tsa.stattools import adfuller
X = y2017.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.039995, mean2=-0.040480
variance1=0.003754, variance2=0.003352
ADF Statistic: -19.354295
p-value: 0.000000
Critical Values:
	1%: -3.448
	5%: -2.870
	10%: -2.571


In [None]:
y2016=list(num_order_new[num_order_new['Year'] == 2016]['Env_intensity'])
y2016=pd.Series(y2016)
from statsmodels.tsa.stattools import adfuller
X = y2016.values
result = adfuller(X)
split = round(len(X) / 2)
X1, X2 = X[0:split], X[split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print('mean1=%f, mean2=%f' % (mean1, mean2))
print('variance1=%f, variance2=%f' % (var1, var2))
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
	print('\t%s: %.3f' % (key, value))

mean1=-0.048769, mean2=-0.043516
variance1=0.014048, variance2=0.003285
ADF Statistic: -19.089448
p-value: 0.000000
Critical Values:
	1%: -3.449
	5%: -2.870
	10%: -2.571


## Conclusion

Summary: All of the data that we used is stationary.



Next, we will continue our analysis in the 'DistilBERT_CompaniesDescription' notebook.