```{r setup}
#| include: false
library(tidyverse)
library(skimr)
library(ggthemes)
library(hrbrthemes)

library(reticulate)
use_python("C:\\Users\\holej\\ANACON~1\\python.exe")

theme_set(theme_ipsum()+
          theme(strip.background =element_rect(fill="lightgray"),
                axis.title.x = 
                  element_text(angle = 0,
                               size = rel(1.33),
                               margin = margin(10,0,0,0)),
                axis.title.y = 
                  element_text(angle = 0,
                               size = rel(1.33),
                               margin = margin(0,10,0,0))
                )
          )
```



# Introduction

Considering a company's ESG (environmental, social, and governance) record when making investment decisions can benefit investors, because a strong ESG record is associated with lower investment risk. The NYU Stern Center for Sustainable Business analyzed more than 1,000 studies published since 2015 and found that strong ESG management is linked to improved return on equity, return on assets, stock price, and operational efficiency.

This project addresses the skepticism some investors have toward ESG integration. The analysis aims to provide evidence of the financial benefits of incorporating ESG metrics into investment strategies.


```{python}
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
```

# Data Collection

My Python code for collecting this data was submitted separately through Brightspace. Here I import the resulting CSV files to run the analysis.

```{python}
# Paths to the CSV files produced by the data-collection script
esg_cont_data_path = 'C:/Users/holej/Documents/hco21.github.io/data/esg_ratings_and_cont_level.csv'
balance_sheet_data_path = 'C:/Users/holej/Documents/hco21.github.io/data/balance_sheet_final.csv'
historical_data_path = 'C:/Users/holej/Documents/hco21.github.io/data/history_data_final.csv'
income_statements_data_path = 'C:/Users/holej/Documents/hco21.github.io/data/income_statements_final.csv'

# Load each data set into a DataFrame
esg_cont_data = pd.read_csv(esg_cont_data_path)
balance_sheet_data = pd.read_csv(balance_sheet_data_path)
historical_data = pd.read_csv(historical_data_path)
income_statements_data = pd.read_csv(income_statements_data_path)

# Use the same key name ('Symbol') as the other data sets so they can be merged
esg_cont_data = esg_cont_data.rename(columns={'symbol': 'Symbol'})
```

# Descriptive Statistics


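```{python}
# Summary statistics for the ESG data
# (printed explicitly, otherwise only the last expression in the cell is displayed)
print(esg_cont_data.describe())

# Histogram of total ESG scores with a kernel density estimate overlaid
plt.figure(figsize=(8, 6))
sns.histplot(data=esg_cont_data, x='total', bins=20, kde=True)
plt.title('Distribution of Total ESG Scores')
plt.xlabel('Total ESG Score')
plt.ylabel('Frequency')
plt.show()
```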

This histogram suggests that total ESG scores are approximately normally distributed (roughly bell-shaped) across all companies.
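A histogram can look bell-shaped without the data actually being normal, so a quick numeric check helps support that reading. The sketch below computes the sample skewness and excess kurtosis, which are both close to 0 for normally distributed data. It assumes the scores are in the `total` column used in the plot; `shape_summary` is a hypothetical helper name, and the demo data is simulated, not the project data.

```python
import numpy as np
import pandas as pd


def shape_summary(values: pd.Series) -> dict:
    """Return sample size, skewness and excess kurtosis of a numeric series.

    Both statistics are roughly 0 for normally distributed data; large
    absolute values point to a skewed or heavy/light-tailed distribution.
    """
    clean = values.dropna()
    return {
        "n": int(clean.size),
        "skew": float(clean.skew()),             # bias-adjusted sample skewness
        "excess_kurtosis": float(clean.kurt()),  # Fisher definition: normal -> 0
    }


# Demo on simulated scores (NOT the project data)
rng = np.random.default_rng(0)
demo_scores = pd.Series(rng.normal(loc=1000, scale=100, size=5000))
print(shape_summary(demo_scores))

# On the real data this would be: shape_summary(esg_cont_data["total"])
```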


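```{python}
# Summary statistics for the daily open prices.
# The column is assumed to be named 'Open' (as in the plot below);
# change it to 'open' if the CSV uses lowercase headers.
historical_stats = historical_data['Open'].describe()
print(historical_stats)

# Distribution of open prices, pooled across all companies and days
plt.figure(figsize=(8, 6))
sns.histplot(data=historical_data, x='Open', bins=20, kde=True)
plt.title('Distribution of Open Prices')
plt.xlabel('Open Price')
plt.ylabel('Frequency')
plt.show()
```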

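Most daily open prices fall within a narrow range. Because this histogram pools all companies and all trading days together, it shows how common each price level is overall; it does not show that individual companies have the same open price every day.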
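To check whether individual companies actually keep a stable open price, the prices can be summarized per company instead of pooled. This is a minimal sketch assuming `historical_data` has `Symbol` and `Open` columns (the names used elsewhere in this section); `open_price_spread` is a hypothetical helper and the demo frame is invented.

```python
import pandas as pd


def open_price_spread(history: pd.DataFrame) -> pd.DataFrame:
    """Summarize how much each company's open price varies across days."""
    grouped = history.groupby("Symbol")["Open"]
    summary = grouped.agg(days="count", distinct_prices="nunique",
                          min_open="min", max_open="max", std_open="std")
    # Price range relative to the company's average open price, so that
    # companies trading at very different price levels can be compared
    summary["relative_range"] = (summary["max_open"] - summary["min_open"]) / grouped.mean()
    return summary


# Invented demo data: two companies, three trading days each
demo = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Open":   [10.0, 10.5, 11.0, 200.0, 190.0, 210.0],
})
print(open_price_spread(demo))
```

On the real data, `open_price_spread(historical_data)` would give one row per company; a `relative_range` near 0 would mean that company's open price barely moved over the period.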

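```{python}
# Correlation matrix of the ESG data;
# numeric_only=True skips non-numeric columns such as 'Symbol'
esg_corr = esg_cont_data.corr(numeric_only=True)
plt.figure(figsize=(8, 6))
sns.heatmap(esg_corr, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap for ESG Data')
plt.show()
```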

```{python}
# Correlation matrix of the balance sheet data (numeric columns only)
balance_sheet_corr = balance_sheet_data.corr(numeric_only=True)
plt.figure(figsize=(10, 8))
sns.heatmap(balance_sheet_corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap for Balance Sheet Data')
plt.show()
```

I wrote code to generate the heat maps above, but they rendered blank.
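The report does not show why the heat maps came out blank, so the following is only one possible cause to rule out: the correlation matrix may have had few or no numeric columns to work with (for example, numbers read in as text), or it may have been entirely `NaN`. In pandas 2.x, `DataFrame.corr()` also raises an error on text columns unless `numeric_only=True` is passed. The sketch lists which columns can enter the matrix and warns when the result is empty; `numeric_corr_report` is a hypothetical helper and the demo frame is invented.

```python
import pandas as pd


def numeric_corr_report(df: pd.DataFrame) -> pd.DataFrame:
    """Print which columns can enter a correlation matrix, then return it."""
    numeric = df.select_dtypes(include="number")
    skipped = [c for c in df.columns if c not in numeric.columns]
    print(f"numeric columns: {list(numeric.columns)}")
    print(f"non-numeric columns skipped: {skipped}")
    corr = numeric.corr()
    if corr.empty or corr.isna().all().all():
        print("warning: correlation matrix is empty or all NaN")
    return corr


# Invented demo data: 'soc' holds numbers stored as text, so it is skipped
demo = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD"],
    "total": [1100, 950, 1200, 1000],
    "env": [500, 400, 560, 450],
    "soc": ["300", "310", "320", "330"],
})
print(numeric_corr_report(demo))

# A text column can be converted first with pd.to_numeric(demo["soc"], errors="coerce")
```

Calling `numeric_corr_report(balance_sheet_data)` before plotting would show whether the balance sheet columns were actually read as numbers.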

```{python}
# Select the companies in the top 10% of total ESG scores
# (strictly above the 90th percentile)
ninety_percentile = esg_cont_data['total'].quantile(0.90)
top_ten = esg_cont_data[esg_cont_data['total'] > ninety_percentile].copy()

plt.figure(figsize=(8, 6))
sns.histplot(data=top_ten, x='total', bins=20, kde=True)
plt.title('Distribution of Top 10% ESG Scores')
plt.xlabel('Top 10% ESG Score')
plt.ylabel('Frequency')
plt.show()
```

This histogram shows the distribution of total ESG scores among the companies in the top 10% (scores above the 90th percentile).

```{python}
# Select the companies in the bottom 10% of total ESG scores
# (strictly below the 10th percentile)
ten_percentile = esg_cont_data['total'].quantile(0.10)
bot_ten = esg_cont_data[esg_cont_data['total'] < ten_percentile].copy()

plt.figure(figsize=(7, 6))
sns.histplot(data=bot_ten, x='total', bins=20, kde=True)
plt.title('Distribution of Bottom 10% ESG Scores')
plt.xlabel('Bottom 10% ESG Score')
plt.ylabel('Frequency')
plt.show()
```
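The top and bottom selections follow the same pattern, so they could share one helper. This sketch (the name `split_by_percentile` is hypothetical) uses the same `quantile` cut-offs and strict `>` / `<` comparisons as the cells above, so companies sitting exactly on a cut-off are excluded. It returns explicit copies, which avoids pandas' `SettingWithCopyWarning` if columns are added to the groups later.

```python
import pandas as pd


def split_by_percentile(df: pd.DataFrame, column: str, q: float = 0.10):
    """Return (top, bottom) rows of df ranked by `column`.

    top    = rows strictly above the (1 - q) quantile
    bottom = rows strictly below the q quantile
    """
    upper = df[column].quantile(1 - q)
    lower = df[column].quantile(q)
    top = df[df[column] > upper].copy()
    bottom = df[df[column] < lower].copy()
    return top, bottom


# Invented demo data: 100 companies with scores 1..100
demo = pd.DataFrame({"Symbol": [f"S{i}" for i in range(1, 101)],
                     "total": range(1, 101)})
top, bottom = split_by_percentile(demo, "total")
print(len(top), len(bottom))
```

On the project data, `top_ten, bot_ten = split_by_percentile(esg_cont_data, 'total')` would reproduce both selections.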

```{python}
# Scatter plot of environmental (env) vs. social (soc) scores for both groups
plt.figure(figsize=(10, 6))

plt.scatter(top_ten['env'], top_ten['soc'], color='blue', label='Top 10%')
plt.scatter(bot_ten['env'], bot_ten['soc'], color='red', label='Bottom 10%')

plt.xlabel('Environmental Score')
plt.ylabel('Social Score')
plt.title('Scatter Plot of Environmental vs. Social Scores')
plt.legend()
plt.grid(True)
plt.show()
```


In this plot, the companies in the top 10% have higher environmental and social scores than the companies in the bottom 10%.
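The visual gap can be backed up with a small table of average component scores per group. A sketch assuming `top_ten` and `bot_ten` carry the `env` and `soc` columns used in the plot; `compare_groups` is a hypothetical helper and the demo values are invented.

```python
import pandas as pd


def compare_groups(top: pd.DataFrame, bottom: pd.DataFrame,
                   columns=("env", "soc")) -> pd.DataFrame:
    """Mean and median of the selected score columns for each ESG group."""
    labeled = pd.concat(
        [top.assign(group="Top 10%"), bottom.assign(group="Bottom 10%")],
        ignore_index=True,
    )
    return labeled.groupby("group")[list(columns)].agg(["mean", "median"])


# Invented demo data
top_demo = pd.DataFrame({"env": [600, 650, 700], "soc": [350, 360, 380]})
bot_demo = pd.DataFrame({"env": [200, 250, 210], "soc": [250, 240, 260]})
print(compare_groups(top_demo, bot_demo))
```

On the project data this would be `compare_groups(top_ten, bot_ten)`.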

```{python}
# Keep only the top 10% and bottom 10% ESG companies in each financial data set
# (inner merges on 'Symbol')

# Balance sheet data
top_ten_bs = pd.merge(top_ten, balance_sheet_data, on='Symbol')
bottom_ten_bs = pd.merge(bot_ten, balance_sheet_data, on='Symbol')

# Income statement data
top_ten_is = pd.merge(top_ten, income_statements_data, on='Symbol')
bottom_ten_is = pd.merge(bot_ten, income_statements_data, on='Symbol')

# Historical price data
top_ten_hist = pd.merge(top_ten, historical_data, on='Symbol')
bottom_ten_hist = pd.merge(bot_ten, historical_data, on='Symbol')
```
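An inner merge (the `pd.merge` default) silently drops companies whose `Symbol` does not appear in the financial data, which can shrink the top and bottom groups without any warning. One way to check, sketched below with the hypothetical helper `check_merge`, is a left merge with `indicator=True`, which adds a `_merge` column marking unmatched rows, together with `validate="one_to_many"`, which raises an error if a symbol is repeated on the ESG side. The demo frames are invented.

```python
import pandas as pd


def check_merge(left: pd.DataFrame, right: pd.DataFrame, key: str = "Symbol") -> pd.DataFrame:
    """Left-merge on `key` and report which left-side keys found no match."""
    merged = pd.merge(left, right, on=key, how="left",
                      indicator=True, validate="one_to_many")
    unmatched = merged.loc[merged["_merge"] == "left_only", key].unique()
    print(f"{left[key].nunique()} symbols, {len(unmatched)} without a match: {list(unmatched)}")
    return merged


# Invented demo data: 'CCC' has no balance sheet rows
esg_demo = pd.DataFrame({"Symbol": ["AAA", "BBB", "CCC"], "total": [1200, 1150, 1100]})
bs_demo = pd.DataFrame({"Symbol": ["AAA", "AAA", "BBB"],
                        "Stockholders Equity": [5.0e9, 5.5e9, 2.0e9]})
merged_demo = check_merge(esg_demo, bs_demo)
```

On the project data, `check_merge(top_ten, balance_sheet_data)` would list any top-10% companies missing from the balance sheet file.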

# Exploratory Data Analysis

Next, I look at whether companies with high ESG scores also tend to be financially stronger.

```{python}
# Scatter plot of total ESG score vs. stockholders' equity for both groups
plt.figure(figsize=(10, 6))

plt.scatter(top_ten_bs['total'], top_ten_bs['Stockholders Equity'], color='blue', label='Top 10%')
plt.scatter(bottom_ten_bs['total'], bottom_ten_bs['Stockholders Equity'], color='red', label='Bottom 10%')

plt.xlabel('Total ESG Score')
plt.ylabel('Stockholders Equity')
plt.title('Scatter Plot of ESG Score vs. Stockholders Equity')
plt.legend()
plt.grid(True)
plt.show()
```

This scatter plot shows the relationship between ESG score and stockholders' equity. Although many points from both groups fall in a similar range, the companies with high stockholders' equity all belong to the top 10% group. Higher stockholders' equity generally indicates greater financial stability.
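Stockholders' equity in dollars grows with company size, so a few very large companies can dominate this plot regardless of their ESG score. A size-adjusted alternative is the equity ratio, stockholders' equity divided by total assets. This sketch assumes the balance sheet data also has a `Total Assets` column, which is not shown in this report; `equity_ratio_by_group` is a hypothetical helper and the demo numbers are invented.

```python
import pandas as pd


def equity_ratio_by_group(top_bs: pd.DataFrame, bottom_bs: pd.DataFrame) -> pd.Series:
    """Median equity ratio (Stockholders Equity / Total Assets) per ESG group.

    Assumes both frames have 'Stockholders Equity' and 'Total Assets' columns.
    """
    def ratio(df: pd.DataFrame) -> pd.Series:
        return df["Stockholders Equity"] / df["Total Assets"]

    return pd.Series({
        "Top 10%": ratio(top_bs).median(),
        "Bottom 10%": ratio(bottom_bs).median(),
    })


# Invented demo data
top_demo = pd.DataFrame({"Stockholders Equity": [40.0, 90.0, 30.0],
                         "Total Assets": [100.0, 200.0, 100.0]})
bottom_demo = pd.DataFrame({"Stockholders Equity": [20.0, 10.0],
                            "Total Assets": [100.0, 50.0]})
print(equity_ratio_by_group(top_demo, bottom_demo))
```

On the project data this would be `equity_ratio_by_group(top_ten_bs, bottom_ten_bs)`. Medians are used rather than means so that one outlier company does not drive the comparison.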


In [None]:
plt.figure(figsize=(10, 6))

plt.scatter(top_ten_is['total'], top_ten_is['EBIT'], color='blue', label='Top 10%')
plt.scatter(bottom_ten_is['total'], bottom_ten_is['EBIT'], color='red', label='Bottom 10%')

plt.xlabel('Total ESG Score')
plt.ylabel('Total Operating Income')
plt.title('Scatter Plot of ESG Score vs. Operating Income')
plt.legend()
plt.grid(True)
plt.show()

In this graph, only companies in the top 10% of ESG scores have high EBIT (earnings before interest and taxes), a close proxy for operating income. High EBIT reflects a company's ability to profit from its operations, which suggests it is better positioned to stay in business over the long term.
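The scatter plot compares the two groups by eye. A Spearman rank correlation between total ESG score and EBIT summarizes the relationship in one number and is less sensitive to a few very large companies than a plain Pearson correlation. This sketch computes it by ranking both columns and taking the Pearson correlation of the ranks, which is how Spearman's coefficient is defined. Because the income statement data may contain several periods per company, it first averages each company's rows; that averaging step and the helper name `spearman_esg_vs` are assumptions, and the demo data is invented.

```python
import pandas as pd


def spearman_esg_vs(df: pd.DataFrame, metric: str, score: str = "total") -> float:
    """Spearman rank correlation between an ESG score and a financial metric.

    Averages the two columns per Symbol first so each company counts once.
    """
    per_company = df.groupby("Symbol")[[score, metric]].mean().dropna()
    ranks = per_company.rank()                      # average ranks for ties
    return float(ranks[score].corr(ranks[metric]))  # Pearson on ranks = Spearman


# Invented demo data: AAA has two periods of EBIT
demo = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "CCC", "DDD"],
    "total":  [1200, 1200, 1150, 900, 850],
    "EBIT":   [9.0e9, 11.0e9, 4.0e9, 1.0e9, 2.0e9],
})
print(spearman_esg_vs(demo, "EBIT"))
```

On the project data this could be `spearman_esg_vs(pd.concat([top_ten_is, bottom_ten_is]), 'EBIT')`. With only the top and bottom groups included, the result mostly reflects the difference between the two groups rather than a trend across all companies.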

# Significance of the Project

The kind of data analysis done in this project can be applied to many real-world decisions. For example, someone looking to buy a house could use a similar analysis to examine trends in the housing market, such as crime rates and housing prices in different areas. Data analysis can also be extremely beneficial in athletics: someone who wants to do well in their fantasy football league could run a similar analysis to pick the best players for their team.

These two very different examples show how versatile and useful data analysis can be.

# References 

For this project I used:

- Class notes
- ChatGPT
- Various websites (listed below)

I mainly used ChatGPT to learn how ESG and financial analysis are connected and to find ways to show that connection with the data I had collected. I also used it to debug my code as I wrote it.

I also got help from a classmate to complete my data collection code. 

Links: 

https://concentricsolutions.com/images/os-resources/OS%20White%20Paper%20-%20Align%20ESG-Fin%20Reporting.pdf

https://www.wolterskluwer.com/en/expert-insights/the-importance-of-esg-as-a-key-drive-of-corporate-performance

https://www.sigmacomputing.com/resources/learn/data-analytics-applications
