# ESG Score Prediction

## Notebook Outline

1. Introduction (ESG score, calculation method, factors: a summary based on the TR PDF)
2. Data Explanation (the features and what they represent)
3. Data Processing: outlier detection, feature transformation
4. EDA: basic insights
5. Feature Selection/Importance
6. Data Modelling

# Introduction

### ESG Score

The environmental, social and governance (ESG) score measures important parameters of a company to evaluate its sustainability. It is available both as a percentage and as a letter grade (D- to A+). The Thomson Reuters ESG scores were designed to measure a company's ESG performance across several themes, such as emissions, environmental product innovation, human rights and shareholders. They use publicly available data from around 6,000 public companies and around 400 ESG metrics.
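The conversion from a percentage to a letter grade can be sketched as below. The band boundaries are an assumption (twelve equal-width bands from D- to A+, upper bound inclusive) and should be checked against the Thomson Reuters methodology document before use; `letter_grade` is a hypothetical helper, not part of any Thomson Reuters tool.

```python
import math

# Assumption: twelve equal-width bands over 0-100, each upper bound inclusive
GRADES = ["D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def letter_grade(score_pct: float) -> str:
    """Map a percentage score (0-100) to a letter grade under the equal-band assumption."""
    if not 0 <= score_pct <= 100:
        raise ValueError("score_pct must be between 0 and 100")
    # ceil(...) - 1 puts a score sitting exactly on a band's upper bound into that band;
    # max(..., 0) keeps a score of exactly 0 in the lowest band
    band = max(math.ceil(score_pct * 12 / 100) - 1, 0)
    return GRADES[band]


print(letter_grade(50), letter_grade(90), letter_grade(100))  # C+ A A+
```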

### Factors 

The ESG score is based on a number of different factors:

- Environmental factors include resource use, emissions, innovation, etc.
- Social factors include workforce, human rights, community, etc.
- Governance factors include management, shareholders, CSR strategy, etc.

### Calculation Method

The overall ESG score is calculated from two kinds of ESG scores:

- Thomson Reuters ESG score: 400 ESG measures are calculated from publicly reported company data. Of these, 178 data points are selected for the scoring process and grouped into 10 categories.
- Thomson Reuters ESG Controversy score: the controversy category score is based on a list of 23 controversy topics. It measures the company's ESG performance against negative media stories captured from global media.

The ESG score calculation methodology:

The percentile rank method is used to calculate the scores. It is based on three questions:

- How many companies have the same value?
- How many companies have a value at all?
- How many companies are worse than the current one?



$$ \text{Score} = \frac{\text{No. of companies with a worse value} + \dfrac{\text{No. of companies with the same value (including the current one)}}{2}}{\text{No. of companies with a value}} $$
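The percentile rank formula can be sketched in Python as follows. This assumes a missing value is represented as `None` and that higher raw values are better unless `higher_is_better=False` is passed (for example, for an emissions metric); `percentile_rank_scores` is a hypothetical helper for illustration, not the Thomson Reuters implementation.

```python
from typing import List, Optional, Sequence


def percentile_rank_scores(
    values: Sequence[Optional[float]], higher_is_better: bool = True
) -> List[Optional[float]]:
    """Score every company with the percentile rank formula.

    Companies without a value (None) are left out of the denominator and get None.
    """
    reported = [v for v in values if v is not None]
    n_with_value = len(reported)
    scores: List[Optional[float]] = []
    for v in values:
        if v is None:
            scores.append(None)
            continue
        # "Worse" depends on the direction of the metric (assumption: the caller knows it)
        if higher_is_better:
            n_worse = sum(1 for x in reported if x < v)
        else:
            n_worse = sum(1 for x in reported if x > v)
        # Companies with the same value, counting the current company itself
        n_same = sum(1 for x in reported if x == v)
        scores.append((n_worse + n_same / 2) / n_with_value)
    return scores


print(percentile_rank_scores([10, 20, 20, None, 40]))  # [0.125, 0.5, 0.5, None, 0.875]
```

Tied companies share the same score, and the company without a value neither receives a score nor counts towards the denominator.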



TODO:
1. Figure out the Return and MC data values
2. P/E (Daily Time Series Ratio): what does this column mean? - Karthik
3. Read about KNN imputation
4. Handle null values and outliers

Column assignments:

'Total Current Assets', 'Total Current Liabilities', 'Total Debt', 'Total Assets, Reported' - Dev

'P/E (Daily Time Series Ratio)', 'CO2 Emissions', 'Total Revenue', 'Total Equity' - Karthik

'Net Income - Actual', 'Revenue Per Share', 'Company Market Capitalization', 'PPE Total' - Sush

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df_firm = pd.read_csv("Firm_Data.csv")


In [None]:
df_firm.head(20)

In [None]:
df_firm.shape

In [None]:
df_firm.columns

In [None]:
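# Rename some columns to shorter names
# ('CO2 Emissions' is CO2-equivalent emissions per USD million of revenue, i.e. an intensity)
df_firm.rename(
    columns={
        "Total CO2 Equivalent Emissions To Revenues USD in million": "CO2 Emissions",
        "Property Plant And Equipment, Total - Gross": "PPE Total",
    },
    inplace=True,
)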

In [None]:
df_ID = pd.read_csv("ID_Data.csv")
df_ID.head()

In [None]:
df_return = pd.read_csv("Return_Data.csv")
df_return.head(20)

### Data Exploration

In [None]:
df_firm.info()

#### Share of Null Values in the data

In [None]:
df_firm.isnull().mean() * 100  # percentage of missing values per column
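The TODO list mentions KNN imputation for the missing values. A minimal sketch with scikit-learn's `KNNImputer` follows, assuming scikit-learn is installed; `knn_impute` is a hypothetical helper. Because the imputer uses Euclidean distances on the raw values, large-scale columns such as market capitalization would dominate unless the features are scaled first, and columns that are entirely missing should be left out of `cols`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def knn_impute(df: pd.DataFrame, cols, n_neighbors: int = 5) -> pd.DataFrame:
    """Return a copy of df where NaNs in cols are filled from the k nearest rows."""
    # Default settings: uniform weights and a NaN-aware Euclidean distance
    imputer = KNNImputer(n_neighbors=n_neighbors)
    out = df.copy()
    out[cols] = imputer.fit_transform(df[cols])
    return out


toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 10.0], "b": [1.0, 2.0, np.nan, 10.0]})
# Row 2 is closest to rows 1 and 0 on column 'a', so 'b' becomes (2.0 + 1.0) / 2
print(knn_impute(toy, ["a", "b"], n_neighbors=2).loc[2, "b"])  # 1.5
```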

#### Outliers 

In [None]:
df_firm.describe([0.1,0.2,0.4,0.5,0.6,0.7,0.8,0.9,0.95,0.97,0.99])

In [None]:
df_firm['Total Current Assets'].quantile(0.8)

In [None]:
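# Cap 'Total Current Assets' at 5000; assign the result back, because
# clip(inplace=True) on a column selection may not update df_firm
df_firm['Total Current Assets'] = df_firm['Total Current Assets'].clip(upper=5000)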
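Instead of a hard-coded threshold such as 5000, the cap can be derived from each column's own quantile, as the loop over `columns` does. A reusable sketch of this upper-tail capping (a one-sided winsorization) is shown below; `cap_at_quantile` is a hypothetical helper, and the quantile is a judgment call: at 0.8, the top 20% of each column collapse onto a single value, so it may be worth comparing against a higher quantile.

```python
import pandas as pd


def cap_at_quantile(df: pd.DataFrame, cols, q: float = 0.8) -> pd.DataFrame:
    """Return a copy of df with each column in cols clipped at its own q-th quantile."""
    capped = df.copy()
    for col in cols:
        # quantile() skips NaN, and clip() leaves NaN untouched
        capped[col] = capped[col].clip(upper=capped[col].quantile(q))
    return capped


toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0]})
print(cap_at_quantile(toy, ["x"], q=0.75)["x"].tolist())  # [1.0, 2.0, 3.0, 4.0, 4.0]
```

Returning a copy keeps the raw data available for comparison, for example `df_capped = cap_at_quantile(df_firm, columns)`.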

In [None]:
# Numeric firm features used for outlier capping and plots (ESG target columns excluded)
columns = [
    'Total Current Assets', 'Total Current Liabilities',
    'Total Debt', 'Total Assets, Reported', 'Net Income - Actual',
    'Revenue Per Share', 'Total Revenue', 'Total Equity', 'CO2 Emissions',
    # 'ESG Score', 'Social Pillar Score', 'Governance Pillar Score', 'Environmental Pillar Score',
    'Company Market Capitalization',
    'PPE Total', 'P/E (Daily Time Series Ratio)',
]

In [None]:
# Cap each feature at its 80th percentile; assign back rather than
# calling clip(inplace=True) on a column selection
for col in columns:
    df_firm[col] = df_firm[col].clip(upper=df_firm[col].quantile(0.8))

#### Visualising the distribution

In [None]:
len(columns)

### Visualizations

In [None]:
import plotly.express as px

# One scatter plot of ESG Score against each feature, with an OLS trendline
# (trendline="ols" requires the statsmodels package)
for col in columns:
    fig = px.scatter(df_firm, x=col, y="ESG Score", color="ESG Score", trendline="ols")
    fig.show()

In [None]:
df_firm[columns].hist(figsize=(15, 15))

plt.show()
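If the histograms show heavy right tails, which is common for size variables such as total assets or market capitalization, a log-type transform is one option for the feature transformation step in the outline. Because some columns, such as 'Net Income - Actual', can be negative, the sketch below uses a signed log; `signed_log1p` is a hypothetical helper, and whether it helps depends on the model chosen later.

```python
import numpy as np
import pandas as pd


def signed_log1p(s: pd.Series) -> pd.Series:
    """sign(x) * log(1 + |x|): compresses heavy tails, keeps the sign, maps 0 to 0."""
    return np.sign(s) * np.log1p(s.abs())


demo = pd.Series([-(np.e - 1), 0.0, np.e - 1])
print(signed_log1p(demo).round(6).tolist())  # [-1.0, 0.0, 1.0]
```

Applied column by column, the transformed distributions can be compared with the originals via `df_firm[columns].apply(signed_log1p).hist(figsize=(15, 15))`.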

In [None]:
cormat = df_firm[columns].corr()
# Create the figure before drawing so the size applies to the heatmap
plt.figure(figsize=(20, 20))
sns.heatmap(cormat, annot=True, fmt=".2f")
plt.show()
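Since 'ESG Score' is excluded from `columns`, the heatmap shows correlations among the features only. For the feature selection step, a minimal sketch ranks each feature by its absolute correlation with the target; `rank_by_target_correlation` is a hypothetical helper. Pearson correlation only captures linear relationships, and `method="spearman"` is a rank-based alternative that is less sensitive to the remaining outliers.

```python
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, features, target: str = "ESG Score",
                               method: str = "pearson") -> pd.Series:
    """Correlation of each feature with the target, ordered by absolute strength."""
    corr = df[features].corrwith(df[target], method=method)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [4, 3, 2, 1],
    "c": [1, 3, 2, 4],
    "ESG Score": [10, 20, 30, 40],
})
print(rank_by_target_correlation(toy, ["a", "b", "c"]))
```

On the notebook data this would be called as `rank_by_target_correlation(df_firm, columns)`.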

In [None]:
display(df_return)

### Adding yearly return data to firm data

In [None]:
df_yearlyreturndata = pd.read_csv("yearly_return.csv")

display(df_yearlyreturndata)

df_mergereturnandfirmdata = df_firm.merge(df_yearlyreturndata, on=["RIC", "Date"])

display(df_mergereturnandfirmdata)
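`DataFrame.merge` performs an inner join by default, so firm rows without a matching yearly return are dropped without any warning. A minimal sketch that counts those rows before discarding them follows. It assumes both frames share `RIC` and `Date` columns of the same dtype and that `yearly_return.csv` holds at most one row per (`RIC`, `Date`) pair; `merge_with_report` is a hypothetical helper.

```python
import pandas as pd


def merge_with_report(firm: pd.DataFrame, returns: pd.DataFrame, keys=("RIC", "Date")):
    """Left-merge returns onto firm data, count unmatched firm rows, keep matched rows."""
    merged = firm.merge(
        returns,
        on=list(keys),
        how="left",
        indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
        validate="many_to_one",  # assumption: at most one return row per key pair
    )
    n_unmatched = int((merged["_merge"] == "left_only").sum())
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    return matched, n_unmatched


firm = pd.DataFrame({"RIC": ["A", "A", "B"], "Date": [2019, 2020, 2019], "ESG Score": [50, 55, 60]})
returns = pd.DataFrame({"RIC": ["A", "B"], "Date": [2019, 2019], "Return": [0.1, 0.2]})
matched, n_unmatched = merge_with_report(firm, returns)
print(len(matched), n_unmatched)  # 2 1
```

The matched rows are the same rows an inner join would keep, while `n_unmatched` shows how much firm data the merge loses.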