Link to Part 1: https://www.kaggle.com/kingkarth/esg-countries/edit

In Part 2, we look at ESG metrics as applied to companies. Investors want to put their money into companies that are not only successful but also aware of their ecological and social impact and of how their work affects others. Plenty of economic data and plenty of environmental data exist, but the two are rarely connected, and that gap is what this project set out to address. Here we turn to companies and build a machine learning model that, given general company metrics along with an ESG score, predicts revenue, with the aim of supporting better-informed investment decisions.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/f500esg/f500esg.csv


The data for this project came from two distinct sources. First, for the Fortune 500 data, I used this dataset (link: https://www.kaggle.com/shaz13/fotune500-2017). It provided a lot of financial information for the companies but contained no ESG data. For that, I used Refinitiv (link: https://www.refinitiv.com/en/financial-data/company-data/esg-data): I requested ESG scores for companies through the API and then inserted them manually. I did this for ~150 companies, excluding those for which no ESG score was available.
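The ESG scores were inserted by hand, but the same join could be done in pandas. The following is only a minimal sketch under stated assumptions: the company names and scores are made up for illustration, and it assumes both tables share a `Company Name` column with identical spelling.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the Fortune 500 table.
fortune = pd.DataFrame({
    "Company Name": ["Alpha Corp", "Beta Inc", "Gamma LLC"],
    "Revenues": ["$1,200", "$950", "$700"],
})

# Hypothetical ESG scores collected separately; Gamma LLC has none.
esg = pd.DataFrame({
    "Company Name": ["Alpha Corp", "Beta Inc"],
    "ESG Score": [72.0, 55.0],
})

# A left join keeps every company and leaves NaN where no score exists.
merged = fortune.merge(esg, on="Company Name", how="left")

# Drop companies without an ESG score, mirroring the exclusion described above.
with_scores = merged.dropna(subset=["ESG Score"]).reset_index(drop=True)
print(with_scores)
```

In practice, company names often differ slightly between sources, so some manual matching may still be needed.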

In [2]:
f500_path = '../input/f500esg/f500esg.csv'
f500_data = pd.read_csv(f500_path) 
print(f500_data.columns)

Index(['Rank', 'Company Name', 'Number of Employees', 'Previous Rank',
       'Revenues', 'Revenue Change', 'Profits', 'Profit Change', 'Assets',
       'Market Value', 'ESG Score'],
      dtype='object')


In [3]:
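# Drop rows with any missing value (e.g. companies without an ESG score)
f500_data = f500_data.dropna(axis=0)
# Strip thousands separators and dollar signs from the revenue strings
f500_data['Revenues'] = f500_data['Revenues'].apply(lambda x: x.replace(',', ''))
f500_data['Revenues'] = f500_data['Revenues'].apply(lambda x: x.replace('$', ''))
# Remove rows where profits or assets are recorded as a "-" placeholder
f500_data = f500_data[f500_data.Profits != "-"]
f500_data = f500_data[f500_data.Assets != "-"]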

In [4]:
revenues = f500_data.Revenues

In [5]:
data = [f500_data["Number of Employees"], f500_data["Profits"], f500_data["Assets"], f500_data["Market Value"], f500_data["ESG Score"]]
new_features = ['num_employees', 'profits', 'assets', 'market_value', 'esg']

new_x = pd.concat(data, axis=1, keys=new_features)

new_x['num_employees'] = new_x['num_employees'].apply(lambda x: x.replace(',', ''))
new_x['profits'] = new_x['profits'].apply(lambda x: x.replace(',', ''))
new_x['profits'] = new_x['profits'].apply(lambda x: x.replace('$', ''))
new_x['assets'] = new_x['assets'].apply(lambda x: x.replace(',', ''))
new_x['assets'] = new_x['assets'].apply(lambda x: x.replace('$', ''))
new_x['market_value'] = new_x['market_value'].apply(lambda x: x.replace(',', ''))
new_x['market_value'] = new_x['market_value'].apply(lambda x: x.replace('$', ''))

I removed the commas and dollar signs so that every value can be converted to a float, which the model needs in order to train on these columns.
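The same cleaning can be written once and applied to every column. This is a sketch rather than the code used in this notebook: `to_number` is a hypothetical helper, and the sample values are invented. It also converts the columns to a numeric dtype explicitly, so that `describe()` would summarize them.

```python
import pandas as pd

def to_number(column: pd.Series) -> pd.Series:
    """Strip '$' and ',' from a string column and convert it to numbers."""
    cleaned = column.astype(str).str.replace(r"[$,]", "", regex=True)
    # errors="coerce" turns placeholders such as "-" into NaN
    return pd.to_numeric(cleaned, errors="coerce")

# Invented sample rows in the same string format as the raw data.
sample = pd.DataFrame({
    "Profits": ["$13,643", "-", "$2,258.5"],
    "Assets": ["$198,825", "$56,563", "$620,854"],
})

# Apply the helper column by column.
numeric = sample.apply(to_number)
print(numeric.dtypes)
print(numeric)
```

Rows that end up with NaN can then be removed with `dropna`, which replaces the separate filters on `"-"`.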

In [6]:
new_x.describe()

Unnamed: 0,esg
count,131.0
mean,68.152672
std,15.825721
min,9.0
25%,61.0
50%,71.0
75%,79.0
max,96.0


In [7]:
new_x.head()

Unnamed: 0,num_employees,profits,assets,market_value,esg
0,2300000,13643.0,198825.0,218619.0,85.0
1,367700,24074.0,620854.0,411035.0,26.0
2,116000,45687.0,321686.0,753718.0,73.0
3,72700,7840.0,330314.0,340056.0,71.0
4,68000,2258.0,56563.0,31439.0,62.0


In [8]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(new_x, revenues, test_size=0.25,random_state=0)
print ('Train set:', X_train.shape,  Y_train.shape)
print ('Test set:', X_test.shape,  Y_test.shape)

new_revenue_model = DecisionTreeRegressor(random_state=1)
new_revenue_model.fit(X_train, Y_train)

Train set: (98, 5) (98,)
Test set: (33, 5) (33,)


DecisionTreeRegressor(random_state=1)

This is where I created my regression model. I used the DecisionTreeRegressor from the sklearn library, which fits a decision tree rather than a linear regression. I split the data 75/25 into training and test sets with train_test_split and fitted the DecisionTreeRegressor on the training set, which contains 98 unique companies.
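One property of this model is worth illustrating. With default settings, a DecisionTreeRegressor keeps splitting until each leaf is pure, so every prediction is the target value of a training row (or an average of a few identical ones). That is why the same predicted revenue can show up for more than one test company. The sketch below uses synthetic data, not the Fortune 500 table, with the same number of rows and the same split settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): 131 rows, 5 features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(131, 5))
y = 3.0 * X[:, 0] + rng.normal(0, 5, size=131)

# Same split settings as in the notebook: 75% train, 25% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

tree = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)
preds = tree.predict(X_test)

# Distance from each prediction to the nearest training target.
nearest = np.min(np.abs(preds[:, None] - y_train[None, :]), axis=1)
all_seen = bool(np.all(nearest < 1e-9))

print("Train/test sizes:", X_train.shape, X_test.shape)
print("Every prediction matches a training target:", all_seen)
print("Distinct predicted values:", len(np.unique(preds)), "of", len(preds))
```

Limiting the tree with parameters such as `max_depth` or `min_samples_leaf` makes each leaf average several companies, which usually generalizes better on a dataset this small.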

In [9]:
vals = new_revenue_model.predict(X_test) * 1_000_000 # Revenues in the source data are in millions of dollars, so scale to dollars
for val in vals:
    print("${:,.2f}".format(val))

$223,604,000,000.00
$49,247,000,000.00
$29,003,000,000.00
$70,166,000,000.00
$21,987,000,000.00
$184,840,000,000.00
$41,863,000,000.00
$71,726,000,000.00
$36,556,000,000.00
$49,247,000,000.00
$39,302,000,000.00
$184,840,000,000.00
$115,337,000,000.00
$48,158,000,000.00
$21,987,000,000.00
$60,906,000,000.00
$19,037,000,000.00
$23,825,000,000.00
$205,004,000,000.00
$23,441,000,000.00
$39,302,000,000.00
$19,941,000,000.00
$223,604,000,000.00
$27,519,000,000.00
$38,537,000,000.00
$24,508,000,000.00
$39,668,000,000.00
$39,639,000,000.00
$37,949,000,000.00
$65,017,000,000.00
$19,941,000,000.00
$21,609,000,000.00
$79,919,000,000.00


Finally, the results of my work. The output above lists the predicted revenues for the 33 companies in the test set. In my judgment the predictions are reasonably accurate, which I attribute in part to a correlation between ESG score and revenue, although this notebook does not report an error metric to quantify that. There were some outliers, but I believe the model can help investors judge whether a company is worth their while, especially given the importance of ESG in today's world.
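To put a number on accuracy, the predictions can be compared with the held-out revenues. The sketch below is one possible way to do that, run on synthetic data rather than the real table: it reports mean absolute error and R², compares against a baseline that always predicts the training mean, and prints feature importances. The feature names are reused from this notebook only as labels; the values are random.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Labels borrowed from the notebook; the data itself is synthetic.
names = ["num_employees", "profits", "assets", "market_value", "esg"]
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(131, 5))
y = 3.0 * X[:, 0] + rng.normal(0, 5, size=131)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)
preds = model.predict(X_test)

# Error of the model on the held-out rows.
mae = mean_absolute_error(y_test, preds)
r2 = r2_score(y_test, preds)

# Baseline: always predict the mean of the training targets.
baseline = np.full_like(y_test, y_train.mean())
baseline_mae = mean_absolute_error(y_test, baseline)

# How much each feature contributed to the tree's splits (sums to 1).
importances = dict(zip(names, model.feature_importances_))

print(f"MAE: {mae:.2f} (baseline {baseline_mae:.2f})")
print(f"R^2: {r2:.3f}")
for name, weight in importances.items():
    print(f"{name}: {weight:.3f}")
```

On the real data, a low importance for `esg` would suggest that the financial columns, not the ESG score, drive the predictions.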

I hope you enjoyed looking at my work - it was a fun project to tackle!