<h1>Industrial Project -- Environment Words</h1>

### <u>Table of Contents</u>
1. [Introduction](#1)
2. [Import Libraries](#2)
3. [Read CSV Data](#3)
4. [Commonly Used Methods](#4)
5. [Divided into Portfolios Based on "environment_word" ESG Indicator](#5)
6. [Example Views of Portfolio](#6)
7. [Obtain the Return of Each Portfolio](#7)
8. [Result](#8)

<h3><u>Introduction</u></h3> <a id='1'></a>
<p> This notebook will obtain the ESG data of different companies from 2017-2022 and separated the data by year. Subsequently, for every year, the notebook will divide the companies into 5 portfolios based on the companies' performances on "environment_word" ESG factor for that year. Portfolio 1 consists of the companies with best performance on "environment_word" indicator while Portfolio 5 consists of companies with worst performances on "environment_word" indicator. Next, the notebook will test the yearly return of each portfolio in next year, for example, regarding to the portfolios created based on the data in 2017, their yearly return will be calculated in 2018.</p>

<h3><u>Import Libraries</u></h3> <a id='2'></a>

In [1]:
import pandas as pd
import yfinance as yf
import re

<h3><u>Read CSV Data</u></h3> <a id='3'></a>

In [2]:
stock_data_2017 = pd.read_csv("stock_data_2017.csv")

stock_data_2018 = pd.read_csv("stock_data_2018.csv")

stock_data_2019 = pd.read_csv("stock_data_2019.csv")

stock_data_2020 = pd.read_csv("stock_data_2020.csv")

stock_data_2021 = pd.read_csv("stock_data_2021.csv")

stock_data_2022 = pd.read_csv("stock_data_2022.csv")

ESG_data_2017 = pd.read_csv("ESG_data_2017.csv")

ESG_data_2018 = pd.read_csv("ESG_data_2018.csv")

ESG_data_2019 = pd.read_csv("ESG_data_2019.csv")

ESG_data_2020 = pd.read_csv("ESG_data_2020.csv")

ESG_data_2021 = pd.read_csv("ESG_data_2021.csv")

ESG_data_2022 = pd.read_csv("ESG_data_2022.csv")

<h3><u>Commonly Used Methods</u></h3> <a id='4'></a>

In [3]:
def grouping(n: int, df):
    lst = []
    group = n// 5
    end_idx = 0
    for i in range(4):
        lst.append(group + end_idx)
        end_idx += group
    lst.append(n)
    return df[:lst[0]], df[lst[0]:lst[1]], df[lst[1]:lst[2]], df[lst[2]:lst[3]], df[lst[3]:lst[4]]

def getPortfolioDetail(portfolio, stock_df_year, ESG_data_year_group):
    for i in ESG_data_year_group.index:
        row = stock_df_year.loc[stock_df_year['Ticker'] == ESG_data_year_group.loc[i]['file_name']]
        portfolio = pd.concat([portfolio, row], ignore_index=True)
    return portfolio

def obainPorfolioGain(stock_df_year, ESG_data_year_g1, ESG_data_year_g2, ESG_data_year_g3, ESG_data_year_g4, ESG_data_year_g5):
    stock_df_columns = stock_df_year.columns.tolist()
    port_1_empty = pd.DataFrame(columns=stock_df_columns)
    port_2_empty = pd.DataFrame(columns=stock_df_columns)
    port_3_empty = pd.DataFrame(columns=stock_df_columns)
    port_4_empty = pd.DataFrame(columns=stock_df_columns)
    port_5_empty = pd.DataFrame(columns=stock_df_columns)
    port_1 = getPortfolioDetail(port_1_empty, stock_df_year, ESG_data_year_g1)
    port_2 = getPortfolioDetail(port_2_empty, stock_df_year, ESG_data_year_g2)
    port_3 = getPortfolioDetail(port_3_empty, stock_df_year, ESG_data_year_g3)
    port_4 = getPortfolioDetail(port_4_empty, stock_df_year, ESG_data_year_g4)
    port_5 = getPortfolioDetail(port_5_empty, stock_df_year, ESG_data_year_g5)
    return port_1, port_2, port_3, port_4, port_5

def getMean(stock_return_year_g1, stock_return_year_g2, stock_return_year_g3, stock_return_year_g4, stock_return_year_g5):
    mean_profit_g1 = stock_return_year_g1["Yearly Profit"].mean()
    mean_profit_g2 = stock_return_year_g2["Yearly Profit"].mean()
    mean_profit_g3 = stock_return_year_g3["Yearly Profit"].mean()
    mean_profit_g4 = stock_return_year_g4["Yearly Profit"].mean()
    mean_profit_g5 = stock_return_year_g5["Yearly Profit"].mean()
    return mean_profit_g1, mean_profit_g2,mean_profit_g3,mean_profit_g4,mean_profit_g5

<h3><u>Divided into Portfolios Based on "environment_word" ESG Indicator</u></h3> <a id='5'></a>

In [4]:
ESG_data_2017.sort_values('environment_word', ascending=False, inplace=True)
ESG_data_2018.sort_values('environment_word', ascending=False, inplace=True)
ESG_data_2019.sort_values('environment_word', ascending=False, inplace=True)
ESG_data_2020.sort_values('environment_word', ascending=False, inplace=True)
ESG_data_2021.sort_values('environment_word', ascending=False, inplace=True)
ESG_data_2022.sort_values('environment_word', ascending=False, inplace=True)

ESG_data_2017_g1,ESG_data_2017_g2,ESG_data_2017_g3,ESG_data_2017_g4,ESG_data_2017_g5, = grouping(len(ESG_data_2017), ESG_data_2017)
ESG_data_2018_g1,ESG_data_2018_g2,ESG_data_2018_g3,ESG_data_2018_g4,ESG_data_2018_g5, = grouping(len(ESG_data_2018), ESG_data_2018)
ESG_data_2019_g1,ESG_data_2019_g2,ESG_data_2019_g3,ESG_data_2019_g4,ESG_data_2019_g5, = grouping(len(ESG_data_2019), ESG_data_2019)
ESG_data_2020_g1,ESG_data_2020_g2,ESG_data_2020_g3,ESG_data_2020_g4,ESG_data_2020_g5, = grouping(len(ESG_data_2020), ESG_data_2020)
ESG_data_2021_g1,ESG_data_2021_g2,ESG_data_2021_g3,ESG_data_2021_g4,ESG_data_2021_g5, = grouping(len(ESG_data_2021), ESG_data_2021)
ESG_data_2022_g1,ESG_data_2022_g2,ESG_data_2022_g3,ESG_data_2022_g4,ESG_data_2022_g5, = grouping(len(ESG_data_2022), ESG_data_2022)

<h3><u>Example Views of Portfolio</u></h3> <a id='6'></a>

In [5]:
ESG_data_2017_g1

Unnamed: 0,file_name,positive,negative,net_value,total_words,business,cc_words,environment_word,gen_word,philanthrophy,...,health_value,evaluation_value,demo_value,diversity_value,attract_value,net_ratio,posi_ratio,nega_ratio,name,year
466,0992.HK,41,46,56,57,30,24,59,16,11,...,29,16,39,43,30,64,44,40,00992.pdf,44
6,0293.HK,54,32,89,47,21,19,41,24,33,...,18,24,32,42,21,81,71,34,00293.pdf,44
267,2196.HK,58,92,6,94,37,13,41,22,69,...,73,22,91,90,37,54,38,49,02196.pdf,44
327,0857.HK,47,41,69,46,29,41,40,5,0,...,41,5,40,41,29,69,62,44,00857.pdf,44
484,0486.HK,37,41,59,64,30,28,40,7,24,...,47,7,49,53,30,67,35,32,00486.pdf,44
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,1316.HK,8,7,80,8,8,7,9,7,6,...,8,7,8,6,8,67,56,43,01316.pdf,44
175,0641.HK,5,8,76,9,5,2,9,0,0,...,9,0,12,6,5,54,35,47,00641.pdf,44
250,0670.HK,33,27,76,36,12,7,9,12,0,...,17,12,29,32,12,72,57,37,00670.pdf,44
357,2799.HK,51,59,46,64,28,6,9,13,35,...,14,13,70,71,28,61,49,46,02799.pdf,44


In [6]:
ESG_data_2017_g5

Unnamed: 0,file_name,positive,negative,net_value,total_words,business,cc_words,environment_word,gen_word,philanthrophy,...,health_value,evaluation_value,demo_value,diversity_value,attract_value,net_ratio,posi_ratio,nega_ratio,name,year
36,0274.HK,2,4,79,4,1,1,2,0,0,...,7,0,7,5,1,51,31,48,00274.pdf,44
35,0396.HK,1,2,82,2,1,1,2,0,1,...,2,0,2,2,1,64,41,39,00396.pdf,44
388,1348.HK,2,3,81,4,1,1,2,1,1,...,3,1,6,4,1,64,32,33,01348.pdf,44
389,0655.HK,1,3,80,3,1,1,2,2,0,...,2,2,5,3,1,54,31,44,00655.pdf,44
405,0126.HK,3,3,82,4,1,0,2,0,6,...,5,0,7,5,1,63,54,46,00126.pdf,44
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48,8270.HK,0,1,82,2,0,0,0,0,0,...,3,0,2,1,0,55,23,39,08270.pdf,44
478,8057.HK,3,3,83,3,0,0,0,0,3,...,2,0,7,4,0,72,66,42,08057.pdf,44
302,8310.HK,2,3,81,5,1,0,0,0,0,...,3,0,6,4,1,62,34,36,08310.pdf,44
179,1072.HK,0,0,84,0,0,0,0,0,0,...,0,0,0,0,0,78,0,0,01072.pdf,44


<h3><u>Obtain the Return of Each Portfolio</u></h3> <a id='7'></a>

In [7]:
stock_return_2017_g1, stock_return_2017_g2, stock_return_2017_g3, stock_return_2017_g4, stock_return_2017_g5 = obainPorfolioGain(stock_data_2017, ESG_data_2017_g1, ESG_data_2017_g2, ESG_data_2017_g3, ESG_data_2017_g4, ESG_data_2017_g5)
stock_return_2018_g1, stock_return_2018_g2, stock_return_2018_g3, stock_return_2018_g4, stock_return_2018_g5 = obainPorfolioGain(stock_data_2018, ESG_data_2018_g1, ESG_data_2018_g2, ESG_data_2018_g3, ESG_data_2018_g4, ESG_data_2018_g5)
stock_return_2019_g1, stock_return_2019_g2, stock_return_2019_g3, stock_return_2019_g4, stock_return_2019_g5 = obainPorfolioGain(stock_data_2019, ESG_data_2019_g1, ESG_data_2019_g2, ESG_data_2019_g3, ESG_data_2019_g4, ESG_data_2019_g5)
stock_return_2020_g1, stock_return_2020_g2, stock_return_2020_g3, stock_return_2020_g4, stock_return_2020_g5 = obainPorfolioGain(stock_data_2020, ESG_data_2020_g1, ESG_data_2020_g2, ESG_data_2020_g3, ESG_data_2020_g4, ESG_data_2020_g5)
stock_return_2021_g1, stock_return_2021_g2, stock_return_2021_g3, stock_return_2021_g4, stock_return_2021_g5 = obainPorfolioGain(stock_data_2021, ESG_data_2021_g1, ESG_data_2021_g2, ESG_data_2021_g3, ESG_data_2021_g4, ESG_data_2021_g5)
stock_return_2022_g1, stock_return_2022_g2, stock_return_2022_g3, stock_return_2022_g4, stock_return_2022_g5 = obainPorfolioGain(stock_data_2022, ESG_data_2022_g1, ESG_data_2022_g2, ESG_data_2022_g3, ESG_data_2022_g4, ESG_data_2022_g5)


In [8]:
ESG_data_2017_g5

Unnamed: 0,file_name,positive,negative,net_value,total_words,business,cc_words,environment_word,gen_word,philanthrophy,...,health_value,evaluation_value,demo_value,diversity_value,attract_value,net_ratio,posi_ratio,nega_ratio,name,year
36,0274.HK,2,4,79,4,1,1,2,0,0,...,7,0,7,5,1,51,31,48,00274.pdf,44
35,0396.HK,1,2,82,2,1,1,2,0,1,...,2,0,2,2,1,64,41,39,00396.pdf,44
388,1348.HK,2,3,81,4,1,1,2,1,1,...,3,1,6,4,1,64,32,33,01348.pdf,44
389,0655.HK,1,3,80,3,1,1,2,2,0,...,2,2,5,3,1,54,31,44,00655.pdf,44
405,0126.HK,3,3,82,4,1,0,2,0,6,...,5,0,7,5,1,63,54,46,00126.pdf,44
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48,8270.HK,0,1,82,2,0,0,0,0,0,...,3,0,2,1,0,55,23,39,08270.pdf,44
478,8057.HK,3,3,83,3,0,0,0,0,3,...,2,0,7,4,0,72,66,42,08057.pdf,44
302,8310.HK,2,3,81,5,1,0,0,0,0,...,3,0,6,4,1,62,34,36,08310.pdf,44
179,1072.HK,0,0,84,0,0,0,0,0,0,...,0,0,0,0,0,78,0,0,01072.pdf,44


In [9]:
stock_return_2017_g5

Unnamed: 0,Ticker,Price on 2018-01-02,Price on 2018-12-31,Yearly Profit
0,0274.HK,0.400,0.540,35.000003
1,0396.HK,0.670,0.295,-55.970152
2,1348.HK,0.500,0.600,20.000005
3,0655.HK,1.430,0.900,-37.062936
4,0126.HK,0.910,0.870,-4.395607
...,...,...,...,...
129,8270.HK,0.536,0.424,-20.895525
130,8057.HK,16.100,10.300,-36.024845
131,8310.HK,0.750,0.390,-48.000002
132,1072.HK,6.650,4.620,-30.526319


In [10]:
stock_return_2017_g1_mean, stock_return_2017_g2_mean, stock_return_2017_g3_mean, stock_return_2017_g4_mean, stock_return_2017_g5_mean = getMean(stock_return_2017_g1, stock_return_2017_g2, stock_return_2017_g3, stock_return_2017_g4, stock_return_2017_g5)
stock_return_2018_g1_mean, stock_return_2018_g2_mean, stock_return_2018_g3_mean, stock_return_2018_g4_mean, stock_return_2018_g5_mean = getMean(stock_return_2018_g1, stock_return_2018_g2, stock_return_2018_g3, stock_return_2018_g4, stock_return_2018_g5)
stock_return_2019_g1_mean, stock_return_2019_g2_mean, stock_return_2019_g3_mean, stock_return_2019_g4_mean, stock_return_2019_g5_mean = getMean(stock_return_2019_g1, stock_return_2019_g2, stock_return_2019_g3, stock_return_2019_g4, stock_return_2019_g5)
stock_return_2020_g1_mean, stock_return_2020_g2_mean, stock_return_2020_g3_mean, stock_return_2020_g4_mean, stock_return_2020_g5_mean = getMean(stock_return_2020_g1, stock_return_2020_g2, stock_return_2020_g3, stock_return_2020_g4, stock_return_2020_g5)
stock_return_2021_g1_mean, stock_return_2021_g2_mean, stock_return_2021_g3_mean, stock_return_2021_g4_mean, stock_return_2021_g5_mean = getMean(stock_return_2021_g1, stock_return_2021_g2, stock_return_2021_g3, stock_return_2021_g4, stock_return_2021_g5)
stock_return_2022_g1_mean, stock_return_2022_g2_mean, stock_return_2022_g3_mean, stock_return_2022_g4_mean, stock_return_2022_g5_mean = getMean(stock_return_2022_g1, stock_return_2022_g2, stock_return_2022_g3, stock_return_2022_g4, stock_return_2022_g5)

<h3><u>Result</u></h3> <a id='8'></a>

In [11]:
result_columns = ['Year/Portfolio', 'P1', 'P2', 'P3', 'P4', 'P5']

result_df = pd.DataFrame(columns=result_columns)
#2017
row = pd.DataFrame(
            [[
                '2017',
                stock_return_2017_g1_mean,
                stock_return_2017_g2_mean,
                stock_return_2017_g3_mean,
                stock_return_2017_g4_mean,
                stock_return_2017_g5_mean
            ]], columns = result_columns  
        )

result_df = pd.concat([result_df, row], ignore_index=True)
#2018
row = pd.DataFrame(
            [[
                '2018',
                stock_return_2018_g1_mean,
                stock_return_2019_g2_mean,
                stock_return_2020_g3_mean,
                stock_return_2021_g4_mean,
                stock_return_2022_g5_mean
            ]], columns = result_columns  
        )

result_df = pd.concat([result_df, row], ignore_index=True)
#2019
row = pd.DataFrame(
            [[
                '2019',
                stock_return_2019_g1_mean,
                stock_return_2019_g2_mean,
                stock_return_2019_g3_mean,
                stock_return_2019_g4_mean,
                stock_return_2019_g5_mean
            ]], columns = result_columns  
        )

result_df = pd.concat([result_df, row], ignore_index=True)
#2020
row = pd.DataFrame(
            [[
                '2020',
                stock_return_2020_g1_mean,
                stock_return_2020_g2_mean,
                stock_return_2020_g3_mean,
                stock_return_2020_g4_mean,
                stock_return_2020_g5_mean
            ]], columns = result_columns  
        )

result_df = pd.concat([result_df, row], ignore_index=True)
#2021
row = pd.DataFrame(
            [[
                '2021',
                stock_return_2021_g1_mean,
                stock_return_2021_g2_mean,
                stock_return_2021_g3_mean,
                stock_return_2021_g4_mean,
                stock_return_2021_g5_mean
            ]], columns = result_columns  
        )

result_df = pd.concat([result_df, row], ignore_index=True)
#2022
row = pd.DataFrame(
            [[
                '2022',
                stock_return_2022_g1_mean,
                stock_return_2022_g2_mean,
                stock_return_2022_g3_mean,
                stock_return_2022_g4_mean,
                stock_return_2022_g5_mean
            ]], columns = result_columns  
        )

result_df = pd.concat([result_df, row], ignore_index=True)

result_df

Unnamed: 0,Year/Portfolio,P1,P2,P3,P4,P5
0,2017,-16.574013,-21.786412,-17.82558,-10.252348,-11.928204
1,2018,5.372129,2.945443,7.550101,-14.233171,-13.035612
2,2019,3.449321,2.945443,6.612359,39.13762,-0.601552
3,2020,19.312471,6.42691,7.550101,16.798885,27.443999
4,2021,-17.467589,-12.710814,-18.087875,-14.233171,-22.605466
5,2022,-17.611334,-18.817641,-22.515313,-18.683464,-13.035612
