In [None]:
import numpy as np
import pandas as pd 
import seaborn as sns
import keras
from keras import layers


In [None]:
sns.set(style="darkgrid") #configure seaborn

1. [Introduction](#1.-Introduction)  
    1.1 [Weaknesses in the Analysis](#1.1-Weaknesses-in-the-Analysis)  
2. [Import Data](#2.-Import-Data)  
    2.1 [Inspect Data](#2.1-Inspect-Data)  
3. [Return and ESG](#3.-Return-and-ESG)  
    3.1 [Independent Correlation](#3.1-Independent-Correlation)  
    3.1.1 [Correlations Between ESG Scores](#3.1.1-Correlations-Between-ESG-Scores)  
    3.2 [Regression Model for the Fund Case](#3.2-Regression-Model-for-the-Fund-Case)  
    3.3 [Regression Model for the ETF Case](#3.3-Regression-Model-for-the-ETF-Case)  
    3.4 [Regression Model for the Combined Case](#3.4-Regression-Model-for-the-Combined-Case)  
4. [Conclusion](#4.-Conclusion)  

[Appendix: Data Comparison Between sustainability_score and ESG scores](#Appendix:-Data-Comparison-Between-sustainability_score-and-ESG-scores)


## 1. Introduction 

In this analysis I try to find out whether ESG (environmental, social, governance) scores can be used to predict fund/ETF returns. It comprises:
* Analysis of the ESG data 
* Applying a neural net to see whether it can improve on optimal guessing
    - Separate analysis of funds, ETFs and the combined dataset

_Disclaimer: this is just an exploratory analysis made out of interest and curiosity and should not be used for any investment decisions whatsoever._

### 1.1 Weaknesses in the Analysis
* There are several factors which could be adjusted for, e.g. sectors, regions etc
* No risk-adjustment
* Short time horizon: a more complete analysis would look into the correlation over a longer time, in order to consider:
    - Structural changes in the economy
    - Market trends
    - Changes before and after important events such as the financial crisis and covid-19 etc.

I might expand the analysis to include more features in the future to try to account for some of this.


## 2. Import Data

* Import ESG data
* Remove `NaN` values so that missing data does not contaminate the analysis. I found several instances of `NaN` while inspecting the data.

In [None]:
etf_path = '../input/european-funds-dataset-from-morningstar/Morningstar - European ETFs.csv'
fund_path = '../input/european-funds-dataset-from-morningstar/Morningstar - European Mutual Funds.csv'
cols = ['isin','fund_trailing_return_ytd', 'social_score', 'environmental_score', 'governance_score', 'sustainability_score']
etf_return_esg = pd.read_csv(etf_path, usecols=cols).dropna()
fund_return_esg = pd.read_csv(fund_path, usecols=cols).dropna()
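
Before dropping rows it can be useful to know how much data is lost. The following is a minimal sketch on a small made-up frame: the values are invented for illustration, and only the column names are taken from `cols` above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Morningstar data; all values are invented.
toy = pd.DataFrame({
    "isin": ["A", "B", "C", "D"],
    "fund_trailing_return_ytd": [1.2, np.nan, -0.5, 3.1],
    "social_score": [9.0, 8.5, np.nan, 10.1],
})

missing_per_column = toy.isna().sum()   # number of NaN values in each column
clean = toy.dropna()                    # drops every row with at least one NaN
rows_dropped = len(toy) - len(clean)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

The same two lines (`isna().sum()` and the length difference) could be applied to the frames returned by `pd.read_csv` before calling `dropna()`.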

### 2.1 Inspect Data
We start off by having a quick look at the data to:
* Identify the format
* Spot potential errors
* Spot patterns or considerations we might have missed

In [None]:
fund_return_esg.head()

One of the first things to notice is that `sustainability_score` seems to be the sum of the three ESG scores. We do a simple check below.

In [None]:
esg_cols = ['social_score', 'environmental_score', 'governance_score']
esg_sum = fund_return_esg[esg_cols].sum(axis=1)
df_esg_sum = esg_sum.to_frame('esg_sum')
df = fund_return_esg.join(df_esg_sum)

df[['isin', 'esg_sum','sustainability_score']].head()

This turns out not to hold for all rows, but the values are similar enough that I look into it in the Appendix. In short, it does not seem to be the same data, so I keep `sustainability_score` as a feature.

## 3. Return and ESG
In this section we try to find out whether ESG is useful in predicting the YTD return. I chose YTD return since I don't have historic ESG data, thus comparing returns for e.g. 2017 might not be relevant.

### 3.1 Independent Correlation
First we check the correlation matrix to see whether there is any correlation between the ESG data and the YTD return. This is simply to spot any obvious correlation.

The correlation with the YTD return is close to zero for all scores, for both funds and ETFs. The only exception seems to be the environmental score, which had a weak _negative_ correlation for ETFs. 

However, pairwise correlations only tell the whole story if the features are independent of each other, which they are not, as the matrix clearly shows. Thus we might still discover something if we dig further.

In [None]:
fund_return_esg.drop(columns='isin').corr()  # isin is an identifier, not a numeric column

In [None]:
etf_return_esg.drop(columns='isin').corr()  # isin is an identifier, not a numeric column
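
To illustrate why a near-zero pairwise correlation does not rule out predictive power when the features are correlated with each other, here is a small sketch on synthetic data. The data is entirely made up and is not meant to resemble the fund data; it only shows the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                 # shared component, makes x1 and x2 highly correlated
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
y = x1 - x2                            # target depends only on the small difference

corr_y_x1 = np.corrcoef(y, x1)[0, 1]   # pairwise correlation with the target: weak
corr_x1_x2 = np.corrcoef(x1, x2)[0, 1] # correlation between the features: strong

# Least-squares fit that uses both features at once (plus an intercept)
X = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ coef
r_squared = 1.0 - residual.var() / y.var()

print(f"corr(y, x1)  = {corr_y_x1:.3f}")
print(f"corr(x1, x2) = {corr_x1_x2:.3f}")
print(f"R^2 using both features = {r_squared:.3f}")
```

Each feature alone looks almost unrelated to `y`, yet the two together explain it completely. Whether anything similar holds for the ESG scores is exactly what the models below try to find out.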

#### 3.1.1 Correlations Between ESG Scores

However, we can note that the correlations between the ESG scores are quite high, especially in the fund data ($>0.8$ for all). For ETFs it is more mixed, with correlations of $0.4-0.8$.

We plot one of these correlations below, but first we must add a flag to distinguish the ETF data from the fund data. The plot also suggests different biases in the two datasets: ETFs have lower values.

In [None]:
etf_flag = pd.DataFrame(np.ones(len(etf_return_esg)), columns=['is_etf'], index=etf_return_esg.index)

df = pd.concat([etf_return_esg,etf_flag],axis=1) 
etf_flag_fund = pd.DataFrame(np.zeros(len(fund_return_esg)), columns=['is_etf'], index=fund_return_esg.index)
df2 =  pd.concat([fund_return_esg, etf_flag_fund], axis=1) 

return_esg_merged = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0

sns.relplot(x='social_score', y='governance_score', hue='is_etf', data=return_esg_merged)
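
As a side note, the flag column can also be added with `DataFrame.assign`, which avoids building the helper frames by hand. A minimal sketch on tiny invented frames (`etf_toy` and `fund_toy` are hypothetical stand-ins for `etf_return_esg` and `fund_return_esg`):

```python
import pandas as pd

# Tiny invented frames standing in for etf_return_esg and fund_return_esg.
etf_toy = pd.DataFrame({"social_score": [8.0, 9.0], "governance_score": [7.0, 7.5]})
fund_toy = pd.DataFrame({"social_score": [10.0], "governance_score": [9.0]})

merged_toy = pd.concat(
    [etf_toy.assign(is_etf=1.0), fund_toy.assign(is_etf=0.0)],
    ignore_index=True,  # avoid duplicate index labels from the two sources
)
print(merged_toy)
```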

### 3.2 Regression Model for the Fund Case

We start out by doing a regression model for the fund case.

First convert the data to NumPy arrays: `y` is what we want to predict, i.e. the YTD returns, and `x` holds the features.

In [None]:
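# Column 0 is `isin`. The slicing below assumes the return column comes last in the CSV
# (usecols does not reorder columns) and that the four score columns sit in between.
np_data = np.array(fund_return_esg)
x = np.float32(np_data[:,1:-1])
y = np.float32(np_data[:,-1])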
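
The features are fed to the network unscaled. Neural nets often converge faster on standardized inputs, so one option, not used in this notebook, would be to z-score each feature column first. A minimal sketch on invented numbers (`standardize` is a hypothetical helper, not part of the analysis):

```python
import numpy as np

def standardize(features):
    """Return z-scored features plus the mean and std that were used."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (features - mean) / std, mean, std

# Invented feature matrix: two columns on very different scales.
x_toy = np.array([[1.0, 50.0], [2.0, 60.0], [3.0, 70.0]], dtype=np.float32)
x_scaled, mean, std = standardize(x_toy)
print(x_scaled)
```

In a real run the mean and standard deviation should come from the training portion only and then be reused for the validation rows.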

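Use optimal guessing, i.e. always predicting the mean YTD return, as a benchmark. Its mean absolute error is what the code below calls the optimal guessing mean accuracy.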

In [None]:
optimal_guessing = np.mean(y)
optimal_guessing_mean_accuracy = np.mean(abs(y-optimal_guessing))
print("Optimal guessing mean accuracy: ", optimal_guessing_mean_accuracy)
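
One caveat about the benchmark: the mean is the constant guess that minimizes squared error, whereas for mean absolute error the minimizing constant is the median. The sketch below compares the two on an invented, skewed sample (not the fund data), so the size of the gap says nothing about the real returns.

```python
import numpy as np

rng = np.random.default_rng(1)
y_toy = rng.exponential(scale=5.0, size=1000)  # invented, skewed sample

def mean_absolute_error(values, guess):
    """Mean absolute error of a single constant guess."""
    return np.mean(np.abs(values - guess))

mae_mean = mean_absolute_error(y_toy, np.mean(y_toy))
mae_median = mean_absolute_error(y_toy, np.median(y_toy))

print(f"MAE when guessing the mean:   {mae_mean:.3f}")
print(f"MAE when guessing the median: {mae_median:.3f}")
```

If the YTD returns are roughly symmetric the two benchmarks will be nearly identical; for skewed returns the median gives a slightly tougher benchmark.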

Create the regression model. I choose a fairly standard neural net setup: two densely connected hidden layers with 100 rectified linear units each, followed by a single linear output unit.

In [None]:
model = keras.Sequential(
    [
        layers.Dense(100, activation="relu", name="layer1"),
        layers.Dense(100, activation="relu", name="layer2"),
        layers.Dense(1, name="layer3")
    ]
)
model.compile(loss="mse", optimizer="adam", metrics=["mae"])


Run the model for 200 epochs*, using 10% of the examples for validation. 

\* _I chose the baseline of 100 but increased it to 200 since there was room for improvement in the convergence. There seems to be more room still, but I haven't increased it further, simply since I want to limit the execution time._

In [None]:
hist = model.fit(x,y, validation_split=0.1, epochs=200)
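
Note that Keras takes the `validation_split` fraction from the last rows of `x` and `y`, before any shuffling. If the rows are ordered in some way, the validation set may not be representative. A minimal sketch of shuffling features and targets together beforehand (`shuffle_together` is a hypothetical helper and the arrays are invented):

```python
import numpy as np

def shuffle_together(x, y, seed=0):
    """Apply the same random row permutation to x and y."""
    order = np.random.default_rng(seed).permutation(len(y))
    return x[order], y[order]

x_toy = np.arange(10, dtype=np.float32).reshape(5, 2)  # rows: [0, 1], [2, 3], ...
y_toy = x_toy[:, 0] * 10                               # target tied to its row
x_shuffled, y_shuffled = shuffle_together(x_toy, y_toy)

print(x_shuffled)
print(y_shuffled)
```

The shuffled arrays would then be passed to `model.fit` in place of `x` and `y`.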

Plotting the result and comparing it with optimal guessing shows an improvement of ~27%. We have a nice correspondence between the training and validation curves.

In [None]:
df_hist = pd.DataFrame.from_dict(hist.history)
columns_map = {'mae' : 'training mean absolute error', 'val_mae' : 'validation mean absolute error'}
df_hist.rename(columns = columns_map, inplace=True)
df_optimal_guessing = pd.DataFrame(np.ones(len(df_hist))*optimal_guessing_mean_accuracy, columns=['optimal guessing mean accuracy'])
df_plot = pd.concat([df_hist[list(columns_map.values())], df_optimal_guessing], axis=1)  # one column per plotted line
sns.lineplot( data=df_plot)
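
The notebook does not state how the improvement percentage is computed. Assuming it is the relative reduction in mean absolute error compared with the benchmark, it could be sketched as follows (the two input numbers are invented for illustration, not results from the run):

```python
def relative_improvement(baseline_mae, model_mae):
    """Fraction by which the model's MAE is lower than the baseline's."""
    return 1.0 - model_mae / baseline_mae

# Hypothetical numbers, chosen only to show the arithmetic.
example = relative_improvement(baseline_mae=10.0, model_mae=7.3)
print(f"Improvement over the benchmark: {example:.0%}")
```

With the real run, `baseline_mae` would be `optimal_guessing_mean_accuracy` and `model_mae` the final validation MAE from `hist.history['val_mae']`.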

### 3.3 Regression Model for the ETF Case
Create the same model for the ETF case.

In [None]:
np_data_etf = np.array(etf_return_esg)
x_etf = np.float32(np_data_etf[:,1:-1])
y_etf = np.float32(np_data_etf[:,-1])
optimal_guessing_etf = np.mean(y_etf)
optimal_guessing_mean_accuracy_etf = np.mean(abs(y_etf-optimal_guessing_etf))
print("Optimal guessing mean accuracy: ", optimal_guessing_mean_accuracy_etf)
model_etf = keras.Sequential(
    [
        layers.Dense(100, activation="relu", name="layer1"),
        layers.Dense(100, activation="relu", name="layer2"),
        layers.Dense(1, name="layer3")
    ]
)
model_etf.compile(loss="mse", optimizer="adam", metrics=["mae"])

Run the model

In [None]:
hist_etf = model_etf.fit(x_etf,y_etf, validation_split=0.1, epochs=200)

Plot the result. It shows an improvement of ~30%. However, we don't have the same neat correspondence between the training and validation curves here. It seems like the model is overfitting the training data.

In [None]:
df_hist_etf = pd.DataFrame.from_dict(hist_etf.history)
columns_map = {'mae' : 'training mean absolute error', 'val_mae' : 'validation mean absolute error'}
df_hist_etf.rename(columns = columns_map, inplace=True)
df_optimal_guessing_etf = pd.DataFrame(np.ones(len(df_hist_etf))*optimal_guessing_mean_accuracy_etf, columns=['optimal guessing mean accuracy'])
df_plot_etf = pd.concat([df_hist_etf[list(columns_map.values())], df_optimal_guessing_etf], axis=1)  # one column per plotted line
sns.lineplot( data=df_plot_etf)
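
A common way to limit this kind of overfitting is to stop training once the validation error stops improving. Keras offers this through the `keras.callbacks.EarlyStopping` callback (with `monitor`, `patience` and `restore_best_weights` arguments). The pure-Python sketch below only mimics the idea on an invented validation curve; it is not what the notebook runs.

```python
def best_epoch_with_patience(val_errors, patience=3):
    """Return (stop_epoch, best_epoch) for a patience-based stopping rule."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # no improvement for `patience` epochs
    return len(val_errors) - 1, best_epoch

# Invented validation curve: improves, then drifts upward (overfitting pattern).
val_curve = [5.0, 4.2, 3.9, 3.8, 3.85, 3.9, 4.0, 4.1]
stop_epoch, best_epoch = best_epoch_with_patience(val_curve, patience=3)
print(f"Stop at epoch {stop_epoch}, keep weights from epoch {best_epoch}")
```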

### 3.4 Regression Model for the Combined Case

I combine the data for both funds and ETFs and perform the same analysis.

In [None]:
np_data_combined = np.array(pd.concat([fund_return_esg, etf_return_esg]))  # DataFrame.append was removed in pandas 2.0
x_combined = np.float32(np_data_combined[:,1:-1])
y_combined = np.float32(np_data_combined[:,-1])
optimal_guessing_combined = np.mean(y_combined)
optimal_guessing_mean_accuracy_combined = np.mean(abs(y_combined-optimal_guessing_combined))
print("Optimal guessing mean accuracy: ", optimal_guessing_mean_accuracy_combined)
model_combined = keras.Sequential(
    [
        layers.Dense(100, activation="relu", name="layer1"),
        layers.Dense(100, activation="relu", name="layer2"),
        layers.Dense(1, name="layer3")
    ]
)
model_combined.compile(loss="mse", optimizer="adam", metrics=["mae"])

Run the model

In [None]:
hist_combined = model_combined.fit(x_combined, y_combined, validation_split=0.1, epochs=200)  # train on the combined data, not the ETF-only arrays

Plot the result. As for ETFs alone, we get an improvement of about 30% and the same behavior of the validation and training curves. It seems like the model is overfitting the training data when we combine the data as well.

In [None]:
df_hist_combined = pd.DataFrame.from_dict(hist_combined.history)
columns_map = {'mae' : 'training mean absolute error', 'val_mae' : 'validation mean absolute error'}
df_hist_combined.rename(columns = columns_map, inplace=True)
df_optimal_guessing_combined = pd.DataFrame(np.ones(len(df_hist_combined))*optimal_guessing_mean_accuracy_combined, columns=['optimal guessing mean accuracy'])
df_plot_combined = pd.concat([df_hist_combined[list(columns_map.values())], df_optimal_guessing_combined], axis=1)  # one column per plotted line
sns.lineplot( data=df_plot_combined)

## 4. Conclusion

The major conclusion is that we could find improvements over optimal guessing using ESG data. The results were most consistent for funds.

For ETFs and funds+ETFs there seems to be some overfitting, since the validation error was much larger than the training error. To address this, one could include more features from the dataset. There is likely some hidden factor that would account for this; see the weaknesses discussed in the beginning. 

For future work I might look into using more features and explore how this affects the predictions, and maybe improve the performance in the ETF case. Trying to outperform a more advanced benchmark than optimal guessing would also be interesting.

## Appendix: Data Comparison Between `sustainability_score` and ESG scores


First we make a scatter plot to view the data more completely. We see that the data seem to correlate but fall into two clusters.

In [None]:
df_esg_sum = esg_sum.to_frame('esg_sum')
df = fund_return_esg.join(df_esg_sum)
sns.relplot(x="esg_sum", y="sustainability_score", data=df)

If we split the data into two clusters we can check the correlation for each. We find that the correlation is very close to 1 for the upper cluster but not for the lower one. 

There seems to be some difference between the clusters besides the size of the values. However, this is not enough information to conclude that `esg_sum` is the same as `sustainability_score`; there is probably some weighting or other factor involved.

In [None]:
df2 = fund_return_esg.join(df_esg_sum)

df_above = df2.loc[df2['esg_sum'] >= 100]
df_below = df2.loc[df2['esg_sum'] < 100]
sns.regplot(x='esg_sum', y='sustainability_score', data=df_above)
sns.regplot(x='esg_sum', y='sustainability_score', data=df_below)
print('Correlation for cluster above: {0}'.format(df_above['esg_sum'].corr(df_above['sustainability_score'])))
print('Correlation for cluster below: {0}'.format(df_below['esg_sum'].corr(df_below['sustainability_score'])))

Just for the fun of it I also check what the correlation would be if we use both clusters but divide the upper one by `3` (since it seems to be roughly 3 times larger in value, and there might be some weighting of the 3 factors in ESG that I have not considered). But as seen below, it does not give a good result.

In [None]:
def divide_if_above_threshold(value, threshold = 100, divider = 3.0):
    return value/divider if value > threshold else value 

df_esg_sum_divided = df_esg_sum['esg_sum'].apply(divide_if_above_threshold).to_frame('esg_sum')  # Series.apply instead of the deprecated DataFrame.applymap
df2 = fund_return_esg.join(df_esg_sum_divided)
sns.regplot(x='esg_sum', y='sustainability_score', data=df2)

print('Correlation: {0}'.format(df2['esg_sum'].corr(df2['sustainability_score'])))

If the correlation had been close, e.g. $>0.95$, I would have checked whether rounding errors could explain the remaining difference, using the code below. But the difference is far too large for this to be the reason, as seen below.

In [None]:
esg_cols = ['social_score', 'environmental_score', 'governance_score']

esg_sum = fund_return_esg[esg_cols].sum(axis=1)


comparison = round(fund_return_esg['sustainability_score'],0) == round(esg_sum,0)

print('{0} out of {1} are the same after rounding'.format(comparison.sum(), len(comparison)))
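
Comparing after rounding to integers treats values such as 24.4 and 24.6 as different even though they are only 0.2 apart. A tolerance-based comparison avoids that edge effect. A minimal sketch on invented numbers (the arrays and the tolerance of 0.25 are assumptions for illustration):

```python
import numpy as np

# Invented values standing in for esg_sum and sustainability_score.
esg_sum_toy = np.array([24.9, 30.2, 75.0, 26.4])
sustainability_toy = np.array([25.0, 30.0, 26.1, 26.5])

# True where the two values differ by at most the absolute tolerance.
within_tolerance = np.isclose(esg_sum_toy, sustainability_toy, rtol=0.0, atol=0.25)
share_matching = within_tolerance.mean()

print(f"{within_tolerance.sum()} out of {len(within_tolerance)} match within tolerance")
```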