# Sprint 6 - Predicting future environmental intensity

In this notebook, we will predict future environmental intensity for all the companies in the 'Excel data'. 

First, we will create the following columns:

1) Industry Indicator
- 1 if above the industry average in 2020
- 0 if at industry average in 2020
- (-1) if below the industry average in 2020
2) Environmental Intensity Growth: ((Environmental Intensity in Current Year) / (Environmental Intensity in Previous Year) - 1) * 100
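The growth definition above can be sketched in pandas with a per-company `shift`. This is a minimal illustration on hypothetical toy data: companies 'A' and 'B', their values, and the output column name `Env_intensity_growth` are made up. It assumes the data are in long format (one row per company and year) with a numeric `Env_intensity` column.

```python
import pandas as pd

# Hypothetical toy data in long format: one row per company and year.
toy = pd.DataFrame({
    'CompanyName': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Year': [2018, 2019, 2020, 2018, 2019, 2020],
    'Env_intensity': [-0.020, -0.030, -0.015, -0.010, -0.010, -0.012],
})

# Sort so that the previous row within each company is its previous observed year.
toy = toy.sort_values(['CompanyName', 'Year'])

# Intensity in the previous observed year for the same company (NaN for its first year).
# Note: shift(1) takes the previous row, so a gap in years would be skipped over silently.
last_year = toy.groupby('CompanyName')['Env_intensity'].shift(1)

# Growth = ((current / previous) - 1) * 100
toy['Env_intensity_growth'] = (toy['Env_intensity'] / last_year - 1) * 100
print(toy)
```

Because environmental intensity is negative in this dataset, a positive growth value means the intensity became more negative (for company 'A', going from -0.020 to -0.030 gives +50%).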

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('/Users/maralinetorres/Documents/GitHub/Predicting-Environmental-and-Social-Actions/Datasets/Final-Sample-External-with-ISINs.csv')
# Remove spaces from column names, e.g. 'Company Name' -> 'CompanyName'
df.columns = df.columns.str.replace(' ', '', regex=False)
print(f'The dataset has {df.shape[0]} rows and {df.shape[1]} columns')
df.head(3)

The dataset has 14515 rows and 34 columns


| | ISIN | Year | CompanyName | Country | Industry(Exiobase) | EnvironmentalIntensity(Sales) | EnvironmentalIntensity(OpInc) | TotalEnvironmentalCost | WorkingCapacity | FishProductionCapacity | ... | SDG6 | SDG12.2 | SDG14.1 | SDG14.2 | SDG14.3 | SDG14.c | SDG15.1 | SDG15.2 | SDG15.5 | %Imputed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GB00BMX64W89 | 2019 | Saga plc | United Kingdom | Activities auxiliary to financial intermediati... | -2.89% | -13.03% | -31842309 | -31150754 | -7184 | ... | -170776 | -1059 | -5 | -1 | -3585 | -6 | 71 | 71 | -1297 | 1% |
| 1 | MYL1818OO003 | 2019 | BURSA MALAYSIA BHD | Malaysia | Activities auxiliary to financial intermediati... | -1.68% | -3.47% | -1968379 | -1924910 | -451 | ... | -11502 | -168 | -1 | -1 | -222 | -2 | 10 | 10 | -79 | 4% |
| 2 | GB0031638363 | 2019 | INTERTEK GROUP PLC | United Kingdom | Activities auxiliary to financial intermediati... | -1.53% | -9.49% | -60599272 | -59281663 | -13774 | ... | -324960 | -3804 | -17 | -4 | -6861 | -20 | 254 | 254 | -2470 | 1% |


In [3]:
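# Keep only the identifying columns and the sales-based environmental intensity
df = df[['Year', 'CompanyName', 'Country', 'Industry(Exiobase)', 'EnvironmentalIntensity(Sales)']]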
df.head()

| | Year | CompanyName | Country | Industry(Exiobase) | EnvironmentalIntensity(Sales) |
|---|---|---|---|---|---|
| 0 | 2019 | Saga plc | United Kingdom | Activities auxiliary to financial intermediati... | -2.89% |
| 1 | 2019 | BURSA MALAYSIA BHD | Malaysia | Activities auxiliary to financial intermediati... | -1.68% |
| 2 | 2019 | INTERTEK GROUP PLC | United Kingdom | Activities auxiliary to financial intermediati... | -1.53% |
| 3 | 2019 | JSE LIMITED | South Africa | Activities auxiliary to financial intermediati... | -1.46% |
| 4 | 2019 | BUREAU VERITAS SA | France | Activities auxiliary to financial intermediati... | -0.70% |


In [4]:
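def percent_to_float(s):
    """Convert a percentage string to a fraction, e.g. '-2.89%' -> -0.0289."""
    return float(s.strip('%')) / 100.0

# Characters removed from accounting-style numbers such as '(1,234)'
replace_dict = {'(': '', ')': '', ' ': '', ',': ''}

def paranthesis_to_minus(value):
    """Convert an accounting-style string to int; parentheses mark a negative, e.g. '(1,234)' -> -1234."""
    is_negative = value.strip().startswith('(') and value.strip().endswith(')')
    for i, j in replace_dict.items():
        value = value.replace(i, j)
    number = int(value)
    return -number if is_negative else number

# Parse the sales-based intensity strings into floats
df['Env_intensity'] = df['EnvironmentalIntensity(Sales)'].apply(percent_to_float)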
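`percent_to_float` raises on missing values, because `NaN` is a float and has no `.strip` method. A vectorized alternative, sketched here on hypothetical strings rather than the real dataset, combines pandas' `Series.str.rstrip` with `pd.to_numeric(..., errors='coerce')`, so unparsable entries become `NaN` instead of stopping the cell.

```python
import numpy as np
import pandas as pd

# Hypothetical raw strings in the same 'x.xx%' format as EnvironmentalIntensity(Sales),
# plus a missing value and a malformed entry to show how they are handled.
raw = pd.Series(['-2.89%', '-1.68%', '0.50%', np.nan, 'n/a'])

# Strip the trailing '%' and convert; errors='coerce' turns anything unparsable into NaN.
parsed = pd.to_numeric(raw.str.rstrip('%'), errors='coerce') / 100.0
print(parsed)
```

On the real data this would read `df['Env_intensity'] = pd.to_numeric(df['EnvironmentalIntensity(Sales)'].str.rstrip('%'), errors='coerce') / 100.0`, assuming that column is stored as strings.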

In [5]:
df.head()

| | Year | CompanyName | Country | Industry(Exiobase) | EnvironmentalIntensity(Sales) | Env_intensity |
|---|---|---|---|---|---|---|
| 0 | 2019 | Saga plc | United Kingdom | Activities auxiliary to financial intermediati... | -2.89% | -0.0289 |
| 1 | 2019 | BURSA MALAYSIA BHD | Malaysia | Activities auxiliary to financial intermediati... | -1.68% | -0.0168 |
| 2 | 2019 | INTERTEK GROUP PLC | United Kingdom | Activities auxiliary to financial intermediati... | -1.53% | -0.0153 |
| 3 | 2019 | JSE LIMITED | South Africa | Activities auxiliary to financial intermediati... | -1.46% | -0.0146 |
| 4 | 2019 | BUREAU VERITAS SA | France | Activities auxiliary to financial intermediati... | -0.70% | -0.007 |


### Creating industry indicator

In [6]:
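# One row per industry with its mean sales-based intensity (pooled over all years)
industry_avg = df.groupby('Industry(Exiobase)')[['Env_intensity']].mean().reset_index()
# Broadcast each industry's mean back onto every company row in that industry
df['industry_avg'] = df.groupby('Industry(Exiobase)')['Env_intensity'].transform('mean')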
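To make the difference between the two lines above concrete: `groupby(...).mean()` returns one row per industry, while `groupby(...).transform('mean')` returns a value for every original row, so it can be assigned straight back as a column. The sketch below uses hypothetical industries 'X' and 'Y'. It also shows a per-year variant that groups on both industry and `Year`; this is an assumption about how the "in 2020" comparison in the definitions could be implemented, whereas the notebook itself pools all years.

```python
import pandas as pd

# Hypothetical toy data: two industries, with industry 'X' observed in two years.
toy = pd.DataFrame({
    'Industry(Exiobase)': ['X', 'X', 'X', 'X', 'Y', 'Y'],
    'Year': [2019, 2019, 2020, 2020, 2020, 2020],
    'Env_intensity': [-0.02, -0.04, -0.01, -0.03, -0.10, -0.20],
})

# transform('mean') gives each row the mean of its group; this pools all years.
toy['industry_avg'] = toy.groupby('Industry(Exiobase)')['Env_intensity'].transform('mean')

# Grouping by industry AND year gives a within-year industry average instead.
toy['industry_year_avg'] = (
    toy.groupby(['Industry(Exiobase)', 'Year'])['Env_intensity'].transform('mean')
)
print(toy)
```

For industry 'X' the pooled average is -0.025, while the within-year averages are -0.03 (2019) and -0.02 (2020), so the choice of grouping can flip a company's indicator.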

In [19]:
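def create_ind(row):
    """1 if the company's intensity is above its industry average, 0 if equal, -1 if below."""
    if row['Env_intensity'] > row['industry_avg']:
        return 1
    elif row['Env_intensity'] == row['industry_avg']:
        return 0
    elif row['Env_intensity'] < row['industry_avg']:
        return -1
    return np.nan  # a value is missing, so none of the comparisons holds

df['Industry_indicator'] = df.apply(create_ind, axis=1)
df.head()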
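Because `df.apply(..., axis=1)` calls a Python function once per row, it can be slow on the roughly 14,500 rows here. A vectorized sketch of the same rule applies `np.sign` to the difference from the industry average; the toy values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: each company's intensity next to its industry average,
# including one missing intensity.
toy = pd.DataFrame({
    'Env_intensity': [-0.01, -0.03, -0.02, np.nan],
    'industry_avg': [-0.02, -0.02, -0.02, -0.02],
})

# np.sign of the difference is 1 above the average, 0 at it, -1 below it,
# and NaN when an input is missing.
toy['Industry_indicator'] = np.sign(toy['Env_intensity'] - toy['industry_avg'])
print(toy)
```

On the real data this would be `df['Industry_indicator'] = np.sign(df['Env_intensity'] - df['industry_avg'])`. The result is a float column (1.0, 0.0, -1.0 or NaN). Since `industry_avg` is a floating-point mean, the 0 case only occurs when a value equals that mean exactly.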

### Creating Environmental Intensity Growth