<a href="https://colab.research.google.com/github/julianikulski/director-experience/blob/main/csr-committee/csr_committees.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Identifying CSR board committees
In this notebook, I preprocess the committee data from Refinitiv Eikon and identify which companies have a CSR-related board committee and which do not. Refinitiv has two sources of committee information. First, a separate data field (CSR Sustainability Committee) indicates whether a company has a CSR-related committee, counting both board-level and executive-level committees. Because I am only interested in board-level committees, I cannot use this field on its own. Second, the Refinitiv Eikon director data contains the types of board committees a company has and which directors sit on them (although this dataset lacks the period of committee membership, which therefore still has to be gathered manually from DEF 14As). However, the director data only includes a CSR-related committee if CSR is its sole purpose. Many companies have nominating and/or governance committees that also oversee sustainability and CSR, so many relevant committees are missing from this dataset and I cannot use it alone either.

I therefore use the information from both sources as a pointer for my manual review of the DEF 14As. The final Excel file written in this notebook contains all relevant companies and directors, along with information on potentially relevant CSR board committees. I then completed this list by manually researching the missing biographies and committee information in the DEF 14A filings.

In [1]:
# connecting to Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [2]:
import numpy as np
import pandas as pd
from glob import glob
import os
import tqdm.notebook as tqdm
import re


In [3]:
# change settings to display full width of column content
#pd.set_option('display.max_colwidth', None)
#pd.set_option('display.max_columns', None)


## Read in data

In [4]:
# read in all committee data which was previously preprocessed
committee_df = pd.read_csv('/content/drive/My Drive/director-csr/all_committees.csv')
committee_df.drop(columns=['Unnamed: 0'], inplace=True)
committee_df.head()


,name,committee,age,last_position,director_start,director_end,executive_start,executive_end,comp_name,ticker,missing_start_date,2011,2012,2013,2014,2015,current_position,dir_exec,in_position,isin
0,mr. bradbury (brad) anderson,"['audit', 'compensation', 'governance', 'nomin...",68.0,independent chairman of the board,2017.0,2018.0,2011.0,2018.0,waste management inc,wm,0.0,1,1,1,1,1,,,,us94106l1098
1,mr. w. robert reum,"['audit', 'compensation', 'governance', 'nomin...",,non-executive independent chairman of the board,0.0,0.0,2003.0,2017.0,waste management inc,wm,0.0,1,1,1,1,1,,,,us94106l1098
2,prof. pastora cafferty,"['audit', 'compensation', 'governance', 'nomin...",,independent director,0.0,0.0,1994.0,2013.0,waste management inc,wm,0.0,1,1,1,0,0,,,,us94106l1098
3,"mr. thomas weidemeyer , j.d.","['audit', 'governance', 'nominating']",72.0,,,,,,waste management inc,wm,,1,1,1,1,1,non-executive independent chairman of the board,2005.0,2018.0,us94106l1098
4,"mr. frank clark , jr., j.d.","['audit', 'compensation']",74.0,,,,,,waste management inc,wm,,1,1,1,1,1,independent director,2003.0,2003.0,us94106l1098


In [5]:
# how many entries are in this dataframe
committee_df.shape


(7271, 20)

In [6]:
# read in the csr sustainability committee data field by Reuters
csr_df = pd.read_excel('/content/drive/My Drive/director-csr/committees/CSR_scores_committee.xlsx')
csr_df.head()


,ISIN Code,Name,3M COMPANY - FISCAL PERIOD END DATE,3M COMPANY - Environment Pillar Score,3M COMPANY - Governance Pillar Score,3M COMPANY - Social Pillar Score,3M COMPANY - ESG Combined Score,3M COMPANY - ESG Controversies Score,3M COMPANY - ESG Score,3M COMPANY - CSR Sustainability Committee,ABBOTT LABORATORIES - FISCAL PERIOD END DATE,ABBOTT LABORATORIES - Environment Pillar Score,ABBOTT LABORATORIES - Governance Pillar Score,ABBOTT LABORATORIES - Social Pillar Score,ABBOTT LABORATORIES - ESG Combined Score,ABBOTT LABORATORIES - ESG Controversies Score,ABBOTT LABORATORIES - ESG Score,ABBOTT LABORATORIES - CSR Sustainability Committee,ABBVIE INC - FISCAL PERIOD END DATE,ABBVIE INC - Environment Pillar Score,ABBVIE INC - Governance Pillar Score,ABBVIE INC - Social Pillar Score,ABBVIE - ESG Combined Score,ABBVIE - ESG Controversies Score,ABBVIE - ESG Score,ABBVIE - CSR Sustainability Committee,ACCENTURE PLC - FISCAL PERIOD END DATE,ACCENTURE PLC - Environment Pillar Score,ACCENTURE PLC - Governance Pillar Score,ACCENTURE PLC - Social Pillar Score,ACCENTURE PLC - ESG Combined Score,ACCENTURE PLC - ESG Controversies Score,ACCENTURE PLC - ESG Score,ACCENTURE PLC - CSR Sustainability Committee,ACTIVISION BLIZZARD - FISCAL PERIOD END DATE,ACTIVISION BLIZZARD - Environment Pillar Score,ACTIVISION BLIZZARD - Governance Pillar Score,ACTIVISION BLIZZARD - Social Pillar Score,ACTIVISION BLIZZARD - ESG Combined Score,ACTIVISION BLIZZARD - ESG Controversies Score,...,QLOGIC CORP - FISCAL PERIOD END DATE,QLOGIC CORP - Environment Pillar Score,QLOGIC CORP - Governance Pillar Score,QLOGIC CORP - Social Pillar Score,QLOGIC CORP - ESG Combined Score,QLOGIC CORP - ESG Controversies Score,QLOGIC CORP - ESG Score,QLOGIC CORP - CSR Sustainability Committee,QWEST COMMUNICATIONS - FISCAL PERIOD END DATE,QWEST COMMUNICATIONS - Environment Pillar Score,QWEST COMMUNICATIONS - Governance Pillar Score,QWEST COMMUNICATIONS - Social Pillar Score,QWEST COMMUNICATIONS - ESG Combined Score,QWEST COMMUNICATIONS - ESG Controversies Score,QWEST COMMUNICATIONS - ESG Score,QWEST COMMUNICATIONS - CSR Sustainability Committee,RS LEGACY CORP - FISCAL PERIOD END DATE,RS LEGACY CORP - Environment Pillar Score,RS LEGACY CORP - Governance Pillar Score,RS LEGACY CORP - Social Pillar Score,RS LEGACY - ESG Combined Score,RS LEGACY - ESG Controversies Score,RS LEGACY - ESG Score,RS LEGACY - CSR Sustainability Committee,SUNEDISON INC - FISCAL PERIOD END DATE,SUNEDISON INC - Environment Pillar Score,SUNEDISON INC - Governance Pillar Score,SUNEDISON INC - Social Pillar Score,SUNEDISON INC - ESG Combined Score,SUNEDISON INC - ESG Controversies Score,SUNEDISON INC - ESG Score,SUNEDISON INC - CSR Sustainability Committee,TELLABS INC - FISCAL PERIOD END DATE,TELLABS INC - Environment Pillar Score,TELLABS INC - Governance Pillar Score,TELLABS INC - Social Pillar Score,TELLABS INC - ESG Combined Score,TELLABS INC - ESG Controversies Score,TELLABS INC - ESG Score,TELLABS INC - CSR Sustainability Committee
0,US88579Y1010,2010.0,2010-12-31,84.35,87.48,85.14,77.33,69.23,85.43,Y,2010-12-31,52.74,78.45,94.79,42.32,1900-01-02 18:43:12,1900-03-21 20:52:48,Y,2010-12-31,,,NaT,,,,,2010-08-31,1900-03-20 03:21:36,84.69,72.2,1900-03-19 01:55:12,100.0,79.08,Y,2010-12-31,0.0,34.28,1900-02-02 07:12:00,29.13,1900-02-08 09:21:36,...,2010-03-28,1.44,32.3,28.05,23.09,100.0,23.09,N,2010-12-31,49.14,29.05,68.75,54.24,81.43,54.24,N,2010-12-31,23.08,50.85,33.0,37.94,76.32,37.94,N,2010-12-31,33.3,60.12,69.91,52.43,83.33,52.43,N,2010-12-31,53.8,30.47,29.51,35.27,100.0,35.27,Y
1,US0028241000,2011.0,2011-12-31,84.27,84.25,87.94,53.28,20.83,85.72,Y,2011-12-31,54.22,87.35,91.63,47.27,1900-01-10 17:02:24,1900-03-23 19:40:48,Y,2011-12-31,,,NaT,,,,,2011-08-31,1900-03-17 06:57:36,72.54,80.49,1900-03-16 08:38:24,89.71,76.36,Y,2011-12-31,0.0,37.1,1900-01-16 15:36:00,23.8,1900-03-17 22:33:36,...,2011-04-03,1.13,31.29,38.66,27.97,100.0,27.97,Y,NaT,,,,,,,,2011-12-31,35.22,40.17,48.46,42.75,100.0,42.75,N,2011-12-31,30.94,52.22,69.19,48.63,100.0,48.63,N,2011-12-30,46.9,15.09,27.49,26.94,44.44,26.94,Y
2,US00287Y1091,2012.0,2012-12-31,83.89,77.73,89.77,84.72,85.0,84.72,Y,2012-12-31,52.13,83.35,88.95,80.09,1900-03-19 09:50:24,1900-03-20 18:14:24,Y,2012-12-31,,,NaT,,,,,2012-08-31,1900-03-17 22:04:48,65.0,80.16,1900-03-12 19:55:12,90.91,72.83,Y,2012-12-31,0.0,27.78,1900-01-09 05:31:12,16.54,1900-03-18 18:57:36,...,2012-04-01,0.53,32.01,43.23,30.29,100.0,30.29,Y,NaT,,,,,,,,2012-12-31,32.76,61.02,46.87,42.29,35.0,49.59,N,2012-12-31,35.39,80.82,72.05,60.88,100.0,60.88,Y,2012-12-28,,,,,,,
3,IE00B4BNMY34,2013.0,2013-12-31,83.89,85.68,89.43,86.53,100.0,86.53,Y,2013-12-31,52.98,64.85,87.67,48.08,1900-01-22 12:00:00,1900-03-13 15:50:24,Y,2013-12-31,40.65,35.47,1900-02-22 08:52:48,44.94,1900-04-09 00:00:00,44.94,N,2013-08-31,1900-03-16 00:43:12,64.15,83.06,1900-03-13 08:09:36,100.0,73.34,Y,2013-12-31,9.86,54.78,1900-01-28 12:00:00,38.08,1900-04-09 00:00:00,...,2013-03-31,0.95,28.01,46.18,30.73,100.0,30.73,Y,NaT,,,,,,,,2013-12-31,31.87,61.13,49.3,40.24,30.0,50.49,N,2013-12-31,32.94,41.17,53.6,41.38,100.0,41.38,Y,NaT,,,,,,,
4,US00507V1098,2014.0,2014-12-31,82.83,89.03,95.95,69.78,50.0,89.56,Y,2014-12-31,50.78,56.58,83.29,55.55,1900-02-11 20:38:24,1900-03-08 05:45:36,Y,2014-12-31,54.91,52.61,1900-03-10 09:07:12,61.36,1900-03-28 05:45:36,61.36,Y,2014-08-31,1900-03-16 10:19:12,58.89,83.72,1900-03-11 05:16:48,100.0,71.22,Y,2014-12-31,9.47,65.89,1900-01-25 02:24:00,41.81,1900-03-30 00:00:00,...,2014-03-30,1.03,39.23,45.07,33.31,100.0,33.31,N,NaT,,,,,,,,NaT,,,,,,,,2014-12-31,35.45,32.05,47.73,37.64,100.0,37.64,Y,NaT,,,,,,,


## Check whether these committees include CSR or similar committees

In [7]:
# remove brackets from committee column values and
# turn the strings in the committee column into lists
committee_df['committee'] = committee_df['committee'].apply(lambda x: x.strip('[]').replace('\'', '').replace(' ', '').split(','))
committee_df['committee']


0       [audit, compensation, governance, nominating]
1       [audit, compensation, governance, nominating]
2       [audit, compensation, governance, nominating]
3                     [audit, governance, nominating]
4                               [audit, compensation]
                            ...                      
7266                            [audit, compensation]
7267                                     [governance]
7268                                     [governance]
7269                            [audit, compensation]
7270                                     [governance]
Name: committee, Length: 7271, dtype: object
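The cell above parses each stringified list by stripping brackets, quotes and all spaces, which also glues multi-word names together (the raw names are presumably something like "human resources" and "csr responsibility", which would explain `humanresources` and `csrresponsibility` in the output of the next cell). A minimal alternative sketch, assuming every non-missing value is a valid Python list literal, uses `ast.literal_eval` instead; the helper name `parse_committees` is hypothetical:

```python
import ast

import pandas as pd


def parse_committees(raw) -> list:
    """Turn a stringified list such as "['audit', 'human resources']" into a list."""
    # missing values (NaN) become an empty list instead of raising an error
    if not isinstance(raw, str):
        return []
    # literal_eval only evaluates Python literals, so no arbitrary code is run;
    # names keep their inner spaces ("human resources" stays two words)
    return [name.strip() for name in ast.literal_eval(raw)]


# usage on a toy column in the same format as committee_df['committee']
raw = pd.Series(["['audit', 'compensation']", "['human resources']", float("nan")])
print(raw.apply(parse_committees).tolist())
# [['audit', 'compensation'], ['human resources'], []]
```

Because the spaces are kept, any later comparison would have to use names such as `'human resources'` rather than `'humanresources'`.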

In [8]:
# check what types of committees there are
unique_coms = []

for row in committee_df['committee']:
    for com_type in row:
        if com_type not in unique_coms:
            unique_coms.append(com_type)
        else:
            pass

unique_coms


['audit',
 'compensation',
 'governance',
 'nominating',
 'finance',
 'humanresources',
 'executive',
 'remuneration',
 'environment',
 'csrresponsibility',
 'management',
 'social',
 'shareholdergrievance']
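The loop above collects the distinct committee types in order of first appearance. A shorter equivalent sketch uses `Series.explode`, which puts every list element on its own row; calling `value_counts()` on the same exploded series would additionally show how often each committee type occurs. The helper name is hypothetical:

```python
import pandas as pd


def unique_committees(committees: pd.Series) -> list:
    # one row per (director, committee) pair; empty lists become NaN and are dropped
    exploded = committees.explode().dropna()
    # unique() keeps the order of first appearance, like the loop above
    return exploded.unique().tolist()


toy = pd.Series([["audit", "compensation"], ["governance"], [], ["audit", "environment"]])
print(unique_committees(toy))  # ['audit', 'compensation', 'governance', 'environment']
```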

Of these committee types, the ones relevant to me are environment, csr responsibility, social and possibly human resources (although the latter likely overlaps with the nominating committee). Next, I review the CSR Sustainability Committee data field from Reuters.
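One way to turn this selection into a company-level indicator is sketched below. The set of committee names is an assumption taken from the list above (leaving out `humanresources`), the column names `committee` and `isin` follow `committee_df`, and the toy ISINs are made up. Because the director data misses CSR oversight that sits with nominating or governance committees, `False` here only means "no dedicated CSR committee in this source", not "no CSR oversight".

```python
import pandas as pd

# assumed selection of CSR-related committee types (see the list above)
CSR_COMMITTEES = {"environment", "csrresponsibility", "social"}


def company_has_csr_committee(df: pd.DataFrame, committees=CSR_COMMITTEES) -> pd.Series:
    # True for each director who sits on at least one CSR-related committee
    on_csr_committee = df["committee"].apply(lambda coms: bool(committees.intersection(coms)))
    # a company is flagged if any of its directors sits on such a committee
    return on_csr_committee.groupby(df["isin"]).any()


toy = pd.DataFrame({
    "isin": ["us0000000001", "us0000000001", "us0000000002"],
    "committee": [["audit"], ["environment", "governance"], ["audit", "compensation"]],
})
print(company_has_csr_committee(toy))
```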

## Preprocess the CSR sustainability committee data

In [9]:
# check how many companies are included
print(csr_df.shape)
# check how many variables are included per company
print((csr_df.shape[1]-2)/csr_df.shape[0])


(630, 5042)
8.0
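The value 8.0 comes from (5042 - 2) / 630: apart from `ISIN Code` and `Name`, each of the 630 companies contributes a block of eight variable columns. A slightly more direct sanity check, sketched below under the assumption that every data column is named `<company> - <variable>`, counts how often each variable name occurs. It counts the suffixes rather than the company prefixes because the prefixes are not consistent within a block (the header above shows both `ABBVIE INC - FISCAL PERIOD END DATE` and `ABBVIE - ESG Score`). The helper name is hypothetical.

```python
import pandas as pd


def count_variables(columns, id_cols=("ISIN Code", "Name")) -> pd.Series:
    data_cols = pd.Series([c for c in columns if c not in id_cols])
    # the variable name is whatever follows the last " - " separator
    variables = data_cols.str.rsplit(" - ", n=1).str[1]
    return variables.value_counts()


toy_cols = [
    "ISIN Code", "Name",
    "ABBVIE INC - FISCAL PERIOD END DATE", "ABBVIE - ESG Score",
    "3M COMPANY - FISCAL PERIOD END DATE", "3M COMPANY - ESG Score",
]
print(count_variables(toy_cols))
```

If the layout is as regular as the division suggests, `count_variables(csr_df.columns)` should report 630 for each of the eight variables.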


In [10]:
# repeat each ISIN 8 times (once per variable column of its company)
# so that the list can be added as a row to csr_df
all_isins_list = [item for isin in csr_df['ISIN Code'] for item in [isin]*8]

# drop the isin column
csr_df.drop(columns=['ISIN Code'], inplace=True)
# remove all empty rows
csr_df.dropna(subset=['Name'], inplace=True)

# turn the Name column which contains the years into the index
csr_df['Name'] = csr_df['Name'].astype(int)
csr_df.set_index('Name', inplace=True)

# add all ISINs to the table
csr_df.loc[0] = all_isins_list
csr_df


Name,3M COMPANY - FISCAL PERIOD END DATE,3M COMPANY - Environment Pillar Score,3M COMPANY - Governance Pillar Score,3M COMPANY - Social Pillar Score,3M COMPANY - ESG Combined Score,3M COMPANY - ESG Controversies Score,3M COMPANY - ESG Score,3M COMPANY - CSR Sustainability Committee,ABBOTT LABORATORIES - FISCAL PERIOD END DATE,ABBOTT LABORATORIES - Environment Pillar Score,ABBOTT LABORATORIES - Governance Pillar Score,ABBOTT LABORATORIES - Social Pillar Score,ABBOTT LABORATORIES - ESG Combined Score,ABBOTT LABORATORIES - ESG Controversies Score,ABBOTT LABORATORIES - ESG Score,ABBOTT LABORATORIES - CSR Sustainability Committee,ABBVIE INC - FISCAL PERIOD END DATE,ABBVIE INC - Environment Pillar Score,ABBVIE INC - Governance Pillar Score,ABBVIE INC - Social Pillar Score,ABBVIE - ESG Combined Score,ABBVIE - ESG Controversies Score,ABBVIE - ESG Score,ABBVIE - CSR Sustainability Committee,ACCENTURE PLC - FISCAL PERIOD END DATE,ACCENTURE PLC - Environment Pillar Score,ACCENTURE PLC - Governance Pillar Score,ACCENTURE PLC - Social Pillar Score,ACCENTURE PLC - ESG Combined Score,ACCENTURE PLC - ESG Controversies Score,ACCENTURE PLC - ESG Score,ACCENTURE PLC - CSR Sustainability Committee,ACTIVISION BLIZZARD - FISCAL PERIOD END DATE,ACTIVISION BLIZZARD - Environment Pillar Score,ACTIVISION BLIZZARD - Governance Pillar Score,ACTIVISION BLIZZARD - Social Pillar Score,ACTIVISION BLIZZARD - ESG Combined Score,ACTIVISION BLIZZARD - ESG Controversies Score,ACTIVISION BLIZZARD - ESG Score,ACTIVISION BLIZZARD - CSR Sustainability Committee,...,QLOGIC CORP - FISCAL PERIOD END DATE,QLOGIC CORP - Environment Pillar Score,QLOGIC CORP - Governance Pillar Score,QLOGIC CORP - Social Pillar Score,QLOGIC CORP - ESG Combined Score,QLOGIC CORP - ESG Controversies Score,QLOGIC CORP - ESG Score,QLOGIC CORP - CSR Sustainability Committee,QWEST COMMUNICATIONS - FISCAL PERIOD END DATE,QWEST COMMUNICATIONS - Environment Pillar Score,QWEST COMMUNICATIONS - Governance Pillar Score,QWEST COMMUNICATIONS - Social Pillar Score,QWEST COMMUNICATIONS - ESG Combined Score,QWEST COMMUNICATIONS - ESG Controversies Score,QWEST COMMUNICATIONS - ESG Score,QWEST COMMUNICATIONS - CSR Sustainability Committee,RS LEGACY CORP - FISCAL PERIOD END DATE,RS LEGACY CORP - Environment Pillar Score,RS LEGACY CORP - Governance Pillar Score,RS LEGACY CORP - Social Pillar Score,RS LEGACY - ESG Combined Score,RS LEGACY - ESG Controversies Score,RS LEGACY - ESG Score,RS LEGACY - CSR Sustainability Committee,SUNEDISON INC - FISCAL PERIOD END DATE,SUNEDISON INC - Environment Pillar Score,SUNEDISON INC - Governance Pillar Score,SUNEDISON INC - Social Pillar Score,SUNEDISON INC - ESG Combined Score,SUNEDISON INC - ESG Controversies Score,SUNEDISON INC - ESG Score,SUNEDISON INC - CSR Sustainability Committee,TELLABS INC - FISCAL PERIOD END DATE,TELLABS INC - Environment Pillar Score,TELLABS INC - Governance Pillar Score,TELLABS INC - Social Pillar Score,TELLABS INC - ESG Combined Score,TELLABS INC - ESG Controversies Score,TELLABS INC - ESG Score,TELLABS INC - CSR Sustainability Committee
2010,2010-12-31 00:00:00,84.35,87.48,85.14,77.33,69.23,85.43,Y,2010-12-31 00:00:00,52.74,78.45,94.79,42.32,1900-01-02 18:43:12,1900-03-21 20:52:48,Y,2010-12-31 00:00:00,,,NaT,,,,,2010-08-31 00:00:00,1900-03-20 03:21:36,84.69,72.2,1900-03-19 01:55:12,100,79.08,Y,2010-12-31 00:00:00,0,34.28,1900-02-02 07:12:00,29.13,1900-02-08 09:21:36,29.13,N,...,2010-03-28 00:00:00,1.44,32.3,28.05,23.09,100,23.09,N,2010-12-31 00:00:00,49.14,29.05,68.75,54.24,81.43,54.24,N,2010-12-31 00:00:00,23.08,50.85,33,37.94,76.32,37.94,N,2010-12-31 00:00:00,33.3,60.12,69.91,52.43,83.33,52.43,N,2010-12-31 00:00:00,53.8,30.47,29.51,35.27,100,35.27,Y
2011,2011-12-31 00:00:00,84.27,84.25,87.94,53.28,20.83,85.72,Y,2011-12-31 00:00:00,54.22,87.35,91.63,47.27,1900-01-10 17:02:24,1900-03-23 19:40:48,Y,2011-12-31 00:00:00,,,NaT,,,,,2011-08-31 00:00:00,1900-03-17 06:57:36,72.54,80.49,1900-03-16 08:38:24,89.71,76.36,Y,2011-12-31 00:00:00,0,37.1,1900-01-16 15:36:00,23.8,1900-03-17 22:33:36,23.8,N,...,2011-04-03 00:00:00,1.13,31.29,38.66,27.97,100,27.97,Y,NaT,,,,,,,,2011-12-31 00:00:00,35.22,40.17,48.46,42.75,100,42.75,N,2011-12-31 00:00:00,30.94,52.22,69.19,48.63,100,48.63,N,2011-12-30 00:00:00,46.9,15.09,27.49,26.94,44.44,26.94,Y
2012,2012-12-31 00:00:00,83.89,77.73,89.77,84.72,85,84.72,Y,2012-12-31 00:00:00,52.13,83.35,88.95,80.09,1900-03-19 09:50:24,1900-03-20 18:14:24,Y,2012-12-31 00:00:00,,,NaT,,,,,2012-08-31 00:00:00,1900-03-17 22:04:48,65,80.16,1900-03-12 19:55:12,90.91,72.83,Y,2012-12-31 00:00:00,0,27.78,1900-01-09 05:31:12,16.54,1900-03-18 18:57:36,16.54,N,...,2012-04-01 00:00:00,0.53,32.01,43.23,30.29,100,30.29,Y,NaT,,,,,,,,2012-12-31 00:00:00,32.76,61.02,46.87,42.29,35,49.59,N,2012-12-31 00:00:00,35.39,80.82,72.05,60.88,100,60.88,Y,2012-12-28 00:00:00,,,,,,,
2013,2013-12-31 00:00:00,83.89,85.68,89.43,86.53,100,86.53,Y,2013-12-31 00:00:00,52.98,64.85,87.67,48.08,1900-01-22 12:00:00,1900-03-13 15:50:24,Y,2013-12-31 00:00:00,40.65,35.47,1900-02-22 08:52:48,44.94,1900-04-09 00:00:00,44.94,N,2013-08-31 00:00:00,1900-03-16 00:43:12,64.15,83.06,1900-03-13 08:09:36,100,73.34,Y,2013-12-31 00:00:00,9.86,54.78,1900-01-28 12:00:00,38.08,1900-04-09 00:00:00,38.08,N,...,2013-03-31 00:00:00,0.95,28.01,46.18,30.73,100,30.73,Y,NaT,,,,,,,,2013-12-31 00:00:00,31.87,61.13,49.3,40.24,30,50.49,N,2013-12-31 00:00:00,32.94,41.17,53.6,41.38,100,41.38,Y,NaT,,,,,,,
2014,2014-12-31 00:00:00,82.83,89.03,95.95,69.78,50,89.56,Y,2014-12-31 00:00:00,50.78,56.58,83.29,55.55,1900-02-11 20:38:24,1900-03-08 05:45:36,Y,2014-12-31 00:00:00,54.91,52.61,1900-03-10 09:07:12,61.36,1900-03-28 05:45:36,61.36,Y,2014-08-31 00:00:00,1900-03-16 10:19:12,58.89,83.72,1900-03-11 05:16:48,100,71.22,Y,2014-12-31 00:00:00,9.47,65.89,1900-01-25 02:24:00,41.81,1900-03-30 00:00:00,41.81,N,...,2014-03-30 00:00:00,1.03,39.23,45.07,33.31,100,33.31,N,NaT,,,,,,,,NaT,,,,,,,,2014-12-31 00:00:00,35.45,32.05,47.73,37.64,100,37.64,Y,NaT,,,,,,,
2015,2015-12-31 00:00:00,81.49,76.72,94.54,85.51,100,85.51,Y,2015-12-31 00:00:00,56.25,75.13,83.22,68.45,1900-03-01 02:38:24,75.79,Y,2015-12-31 00:00:00,60.44,55.36,1900-03-18 12:28:48,67.24,100,67.24,Y,2015-08-31 00:00:00,1900-03-19 10:04:48,84.25,90.38,86.02,100,86.02,Y,2015-12-31 00:00:00,11.16,76.68,51.04,57.37,1900-03-26 23:02:24,57.37,N,...,2015-03-29 00:00:00,0.36,24.38,44.4,28.71,100,28.71,N,NaT,,,,,,,,NaT,,,,,,,,NaT,,,,,,,,NaT,,,,,,,
2016,2016-12-31 00:00:00,84.17,80.18,95.28,77.14,66.67,87.62,Y,2016-12-31 00:00:00,58.1,81.52,85.98,64.85,1900-02-19 00:00:00,79.7,Y,2016-12-31 00:00:00,67.49,62.67,1900-03-26 14:24:00,74.85,93.18,74.85,Y,2016-08-31 00:00:00,1900-03-16 01:26:24,69.23,90.09,78.49,100,78.49,Y,2016-12-31 00:00:00,12.22,65.13,52.7,52.84,1900-04-09 00:00:00,52.84,N,...,2016-04-03 00:00:00,0.51,26.47,46.03,30.13,100,30.13,N,NaT,,,,,,,,NaT,,,,,,,,NaT,,,,,,,,NaT,,,,,,,
0,US88579Y1010,US88579Y1010,US88579Y1010,US88579Y1010,US88579Y1010,US88579Y1010,US88579Y1010,US88579Y1010,US0028241000,US0028241000,US0028241000,US0028241000,US0028241000,US0028241000,US0028241000,US0028241000,US00287Y1091,US00287Y1091,US00287Y1091,US00287Y1091,US00287Y1091,US00287Y1091,US00287Y1091,US00287Y1091,IE00B4BNMY34,IE00B4BNMY34,IE00B4BNMY34,IE00B4BNMY34,IE00B4BNMY34,IE00B4BNMY34,IE00B4BNMY34,IE00B4BNMY34,US00507V1098,US00507V1098,US00507V1098,US00507V1098,US00507V1098,US00507V1098,US00507V1098,US00507V1098,...,US7472771010,US7472771010,US7472771010,US7472771010,US7472771010,US7472771010,US7472771010,US7472771010,US7491211097,US7491211097,US7491211097,US7491211097,US7491211097,US7491211097,US7491211097,US7491211097,US74979E1010,US74979E1010,US74979E1010,US74979E1010,US74979E1010,US74979E1010,US74979E1010,US74979E1010,US86732Y1091,US86732Y1091,US86732Y1091,US86732Y1091,US86732Y1091,US86732Y1091,US86732Y1091,US86732Y1091,US8796641004,US8796641004,US8796641004,US8796641004,US8796641004,US8796641004,US8796641004,US8796641004
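The list comprehension in the cell above repeats every ISIN eight times so that each company's ISIN lines up with its eight variable columns; this relies on the company blocks in the columns appearing in the same order as the rows of `ISIN Code`. A sketch of the same step with `numpy.repeat`, deriving the block width from the column count instead of hard-coding 8 (the helper name is hypothetical):

```python
import numpy as np
import pandas as pd


def repeat_isins(isins: pd.Series, n_columns: int) -> list:
    # width of one company block, e.g. 5040 // 630 == 8 for csr_df
    block = n_columns // len(isins)
    if block * len(isins) != n_columns:
        raise ValueError("columns do not split evenly into company blocks")
    # ["A", "B"] with block 3 -> ["A", "A", "A", "B", "B", "B"]
    return np.repeat(isins.to_numpy(), block).tolist()


print(repeat_isins(pd.Series(["US88579Y1010", "US0028241000"]), 4))
# ['US88579Y1010', 'US88579Y1010', 'US0028241000', 'US0028241000']
```

In the notebook this would be called before dropping `ISIN Code`, e.g. as `repeat_isins(csr_df['ISIN Code'], csr_df.shape[1] - 2)`.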


In [11]:
# keep only the CSR Sustainability Committee columns
cols = [col for col in csr_df.columns if 'sustainability' in col.lower()]
# create the new dataframe
csr_committee_df = csr_df[cols]

# change the column headers
changed_cols = [x.lower().split('-')[0].strip() for x in csr_committee_df.columns]
csr_committee_df.rename(columns=dict(zip(csr_committee_df.columns, changed_cols)), inplace=True)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
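Two details of the cell above are worth noting. The warning appears because pandas treats `csr_df[cols]` as a slice of `csr_df`, so the in-place `rename` that follows is flagged; an explicit `.copy()` makes the new frame independent. In addition, splitting the header on a bare `'-'` cuts hyphenated company names short, which is likely where the short names `t` and `coc` in the outputs further down come from. A minimal sketch that addresses both, assuming the columns follow the `<company> - <variable>` pattern (the helper name and the example column names are made up):

```python
import pandas as pd


def committee_columns(df: pd.DataFrame) -> pd.DataFrame:
    # keep only the "<company> - CSR Sustainability Committee" columns and
    # copy them so later in-place edits do not trigger SettingWithCopyWarning
    cols = [c for c in df.columns if "sustainability" in c.lower()]
    out = df[cols].copy()
    # split on the last " - " (with spaces) so hyphens inside names survive
    out.columns = [c.rsplit(" - ", 1)[0].strip().lower() for c in out.columns]
    return out


toy = pd.DataFrame(
    [["Y", "N", 80.1]],
    columns=[
        "COCA-COLA CO - CSR Sustainability Committee",
        "T-MOBILE US - CSR Sustainability Committee",
        "COCA-COLA CO - ESG Score",
    ],
)
print(list(committee_columns(toy).columns))  # ['coca-cola co', 't-mobile us']
```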


In [12]:
# drop the year 2016 from the dataframe
csr_committee_df.drop([2016], inplace=True)
csr_committee_df


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Name,3m company,abbott laboratories,abbvie,accenture plc,activision blizzard,acuity brands inc,adobe inc,automatic data proc.,advance auto parts,american electric,aes corp,aetna inc,affiliated managers,aflac incorporated,agilent technologies,american int'l group,apt investment & mgt,air prds.& chems.,arthur j gallagher,akamai technologies,alaska air group inc,albemarle corp,alexion pharms.,allegion plc,allergan plc,alliance data system,alliant energy corp,allstate ord shs,alphabet inc,alphabet inc,altaba inc,altria group inc,amazon.com,ameren corporation,american airlines,american express co,american tower corp,american water works,ameriprise fin,amerisourcebergen,...,h.j. heinz company,leidos holdings inc,molex incorporated,nyse euronext,sprint nextel corp,t,teradyne inc,viavi solutions,adtalem global education,alpha natural resrc,compuware corp,constellation energy,cooper industries,el paso,sunoco inc,goodrich corporation,hillshire brands co,lexmark internatl,medco health sltn.,novellus systems inc,progress energy inc,rr donnelley & sons,sears holdings,supervalu inc.,titanium metals corp,ak steel holding,"cephalon, inc.",itt inc,janus capital group,meredith corp,"novell, inc.",monster worldwide,national semicon.,nicor inc.,prologis,qlogic corp,qwest communications,rs legacy,sunedison inc,tellabs inc
2010,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,Y,N,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,Y,Y,N,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,N,Y,N,Y,N,N,N,N,Y
2011,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,Y,Y,N,Y,Y,N,Y,Y,,,,,,Y,Y,Y,N,,Y,Y,Y,N,Y,,N,Y,Y,,N,,,,Y,,N,N,Y
2012,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,N,Y,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,Y,Y,N,Y,Y,N,Y,Y,,,,,,Y,Y,,,,Y,Y,Y,,Y,,N,Y,Y,,N,,,,Y,,N,Y,
2013,Y,Y,N,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,N,Y,Y,N,Y,,Y,N,N,N,Y,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,,,N,Y,Y,N,Y,Y,,,,,,Y,Y,,,,Y,Y,Y,,Y,,N,Y,Y,,N,,,,Y,,N,Y,
2014,Y,Y,Y,Y,N,N,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,,Y,N,N,N,Y,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,...,,Y,,,,N,Y,N,N,N,Y,,,,,,,Y,,,,Y,Y,Y,,Y,,Y,N,Y,,N,,,,N,,,Y,
2015,Y,Y,Y,Y,N,N,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,N,Y,N,N,N,Y,Y,Y,Y,Y,N,Y,N,Y,N,Y,N,N,N,N,...,,Y,,,,N,N,N,N,,,,,,,,,Y,,,,Y,Y,N,,Y,,Y,N,Y,,N,,,,N,,,,
0,US88579Y1010,US0028241000,US00287Y1091,IE00B4BNMY34,US00507V1098,US00508Y1029,US00724F1012,US0530151036,US00751Y1064,US0255371017,US00130H1059,US00817Y1082,US0082521081,US0010551028,US00846U1016,US0268747849,US03748R7474,US0091581068,US3635761097,US00971T1016,US0116591092,US0126531013,US0153511094,IE00BFRT3W74,IE00BY9D5467,US0185811082,US0188021085,US0200021014,US02079K1079,US02079K3059,US0213461017,US02209S1033,US0231351067,US0236081024,US02376R1023,US0258161092,US03027X1000,US0304201033,US03076C1062,US03073E1055,...,US4230741039,US5253271028,US6085541018,US6294911010,US8520611000,US8725901040,US8807701029,US9255501051,US00737L1035,US02076X1028,US2056381096,US2103711006,IE00B40K9117,US28336L1098,US86764P1093,US3823881061,US4325891095,US5297711070,US58405U1025,US6700081010,US7432631056,US2578672006,US8123501061,US8685363017,US8883392073,US0015471081,US1567081096,US45073V1089,US47102X1054,US5894331017,US6700061053,US6117421072,US6376401039,US6540861076,US7434101025,US7472771010,US7491211097,US74979E1010,US86732Y1091,US8796641004


In [13]:
# remove any excess whitespace from values
csr_committee_df = csr_committee_df.applymap(lambda x: x.strip() if not pd.isna(x) else x)
csr_committee_df


Name,3m company,abbott laboratories,abbvie,accenture plc,activision blizzard,acuity brands inc,adobe inc,automatic data proc.,advance auto parts,american electric,aes corp,aetna inc,affiliated managers,aflac incorporated,agilent technologies,american int'l group,apt investment & mgt,air prds.& chems.,arthur j gallagher,akamai technologies,alaska air group inc,albemarle corp,alexion pharms.,allegion plc,allergan plc,alliance data system,alliant energy corp,allstate ord shs,alphabet inc,alphabet inc,altaba inc,altria group inc,amazon.com,ameren corporation,american airlines,american express co,american tower corp,american water works,ameriprise fin,amerisourcebergen,...,h.j. heinz company,leidos holdings inc,molex incorporated,nyse euronext,sprint nextel corp,t,teradyne inc,viavi solutions,adtalem global education,alpha natural resrc,compuware corp,constellation energy,cooper industries,el paso,sunoco inc,goodrich corporation,hillshire brands co,lexmark internatl,medco health sltn.,novellus systems inc,progress energy inc,rr donnelley & sons,sears holdings,supervalu inc.,titanium metals corp,ak steel holding,"cephalon, inc.",itt inc,janus capital group,meredith corp,"novell, inc.",monster worldwide,national semicon.,nicor inc.,prologis,qlogic corp,qwest communications,rs legacy,sunedison inc,tellabs inc
2010,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,Y,N,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,Y,Y,N,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,N,Y,N,Y,N,N,N,N,Y
2011,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,Y,Y,N,Y,Y,N,Y,Y,,,,,,Y,Y,Y,N,,Y,Y,Y,N,Y,,N,Y,Y,,N,,,,Y,,N,N,Y
2012,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,N,Y,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,Y,Y,N,Y,Y,N,Y,Y,,,,,,Y,Y,,,,Y,Y,Y,,Y,,N,Y,Y,,N,,,,Y,,N,Y,
2013,Y,Y,N,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,N,Y,Y,N,Y,,Y,N,N,N,Y,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,...,Y,Y,Y,,,N,Y,Y,N,Y,Y,,,,,,Y,Y,,,,Y,Y,Y,,Y,,N,Y,Y,,N,,,,Y,,N,Y,
2014,Y,Y,Y,Y,N,N,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,,Y,N,N,N,Y,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,...,,Y,,,,N,Y,N,N,N,Y,,,,,,,Y,,,,Y,Y,Y,,Y,,Y,N,Y,,N,,,,N,,,Y,
2015,Y,Y,Y,Y,N,N,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,N,Y,N,N,N,Y,Y,Y,Y,Y,N,Y,N,Y,N,Y,N,N,N,N,...,,Y,,,,N,N,N,N,,,,,,,,,Y,,,,Y,Y,N,,Y,,Y,N,Y,,N,,,,N,,,,
0,US88579Y1010,US0028241000,US00287Y1091,IE00B4BNMY34,US00507V1098,US00508Y1029,US00724F1012,US0530151036,US00751Y1064,US0255371017,US00130H1059,US00817Y1082,US0082521081,US0010551028,US00846U1016,US0268747849,US03748R7474,US0091581068,US3635761097,US00971T1016,US0116591092,US0126531013,US0153511094,IE00BFRT3W74,IE00BY9D5467,US0185811082,US0188021085,US0200021014,US02079K1079,US02079K3059,US0213461017,US02209S1033,US0231351067,US0236081024,US02376R1023,US0258161092,US03027X1000,US0304201033,US03076C1062,US03073E1055,...,US4230741039,US5253271028,US6085541018,US6294911010,US8520611000,US8725901040,US8807701029,US9255501051,US00737L1035,US02076X1028,US2056381096,US2103711006,IE00B40K9117,US28336L1098,US86764P1093,US3823881061,US4325891095,US5297711070,US58405U1025,US6700081010,US7432631056,US2578672006,US8123501061,US8685363017,US8883392073,US0015471081,US1567081096,US45073V1089,US47102X1054,US5894331017,US6700061053,US6117421072,US6376401039,US6540861076,US7434101025,US7472771010,US7491211097,US74979E1010,US86732Y1091,US8796641004
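`DataFrame.applymap` works here, but from pandas 2.1 onwards it is deprecated in favour of `DataFrame.map`, which behaves the same way. A column-wise alternative sketch uses the vectorised `.str.strip()`. It assumes that object columns contain only strings or missing values, because `.str` methods turn any non-string value in such a column into NaN; the helper name and toy data are hypothetical.

```python
import pandas as pd


def strip_strings(df: pd.DataFrame) -> pd.DataFrame:
    # strip surrounding whitespace in every object (string) column;
    # missing values stay missing, numeric columns are returned unchanged
    return df.apply(lambda col: col.str.strip() if col.dtype == object else col)


toy = pd.DataFrame({"3m company": [" Y", "N ", None], "score": [84.35, 84.27, 83.89]})
print(strip_strings(toy))
```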


In [14]:
# get the duplicate column names (companies that appear twice with different ISINs)
col_list = list(csr_committee_df.columns)
duplicate_cols = list(set([x for x in col_list if col_list.count(x) > 1]))
print(duplicate_cols)
# copy these duplicate columns out, then remove them from the dataframe
dupes_df = csr_committee_df[duplicate_cols].copy()
csr_committee_df.drop(columns=duplicate_cols, inplace=True)

# split the dupes_df into two different dfs so that I can rename the cols:
# selecting by label returns both occurrences of each name next to each other,
# so the even positions hold the first occurrence and the odd positions the second
new_df = dupes_df.iloc[:, ::2]
remain_dupes_df = dupes_df.iloc[:, 1::2]
# rename the columns of the first occurrences (assigning the result avoids the
# SettingWithCopyWarning that inplace=True raised on this slice)
new_col_names = [x+'_diff isin' for x in new_df.columns]
new_df = new_df.rename(columns=dict(zip(new_df.columns, new_col_names)))

# add those two dataframes back to the overall dataframe
csr_committee_df = csr_committee_df.join([new_df, remain_dupes_df])
csr_committee_df


['discovery inc', 'chubb', 'alphabet inc', 'coc', 'under armour, inc.']


Unnamed: 0_level_0,3m company,abbott laboratories,abbvie,accenture plc,activision blizzard,acuity brands inc,adobe inc,automatic data proc.,advance auto parts,american electric,aes corp,aetna inc,affiliated managers,aflac incorporated,agilent technologies,american int'l group,apt investment & mgt,air prds.& chems.,arthur j gallagher,akamai technologies,alaska air group inc,albemarle corp,alexion pharms.,allegion plc,allergan plc,alliance data system,alliant energy corp,allstate ord shs,altaba inc,altria group inc,amazon.com,ameren corporation,american airlines,american express co,american tower corp,american water works,ameriprise fin,amerisourcebergen,ametek inc,amgen inc,...,compuware corp,constellation energy,cooper industries,el paso,sunoco inc,goodrich corporation,hillshire brands co,lexmark internatl,medco health sltn.,novellus systems inc,progress energy inc,rr donnelley & sons,sears holdings,supervalu inc.,titanium metals corp,ak steel holding,"cephalon, inc.",itt inc,janus capital group,meredith corp,"novell, inc.",monster worldwide,national semicon.,nicor inc.,prologis,qlogic corp,qwest communications,rs legacy,sunedison inc,tellabs inc,discovery inc_diff isin,chubb_diff isin,alphabet inc_diff isin,coc_diff isin,"under armour, inc._diff isin",discovery inc,chubb,alphabet inc,coc,"under armour, inc."
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
2010,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,Y,N,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,N,N,Y,...,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,N,Y,N,Y,N,N,N,N,Y,N,N,Y,Y,N,N,Y,Y,Y,N
2011,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,,Y,N,,N,N,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,N,N,Y,...,Y,,,,,,Y,Y,Y,N,,Y,Y,Y,N,Y,,N,Y,Y,,N,,,,Y,,N,N,Y,N,Y,Y,Y,N,N,Y,Y,Y,N
2012,Y,Y,,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,N,Y,Y,N,Y,,Y,N,,N,N,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,N,Y,...,Y,,,,,,Y,Y,,,,Y,Y,Y,,Y,,N,Y,Y,,N,,,,Y,,N,Y,,N,Y,Y,Y,N,N,Y,Y,Y,N
2013,Y,Y,N,Y,N,,Y,Y,N,Y,Y,Y,N,Y,Y,N,Y,Y,N,Y,,Y,N,N,N,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,N,Y,...,Y,,,,,,Y,Y,,,,Y,Y,Y,,Y,,N,Y,Y,,N,,,,Y,,N,Y,,N,Y,Y,Y,N,N,Y,Y,Y,N
2014,Y,Y,Y,Y,N,N,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,,Y,N,N,N,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,N,N,N,Y,...,Y,,,,,,,Y,,,,Y,Y,Y,,Y,,Y,N,Y,,N,,,,N,,,Y,,N,Y,Y,Y,N,N,N,Y,Y,N
2015,Y,Y,Y,Y,N,N,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,N,Y,N,Y,N,N,N,Y,Y,Y,N,Y,N,Y,N,Y,N,N,N,N,N,Y,...,,,,,,,,Y,,,,Y,Y,N,,Y,,Y,N,Y,,N,,,,N,,,,,N,N,Y,Y,N,N,,Y,Y,N
0,US88579Y1010,US0028241000,US00287Y1091,IE00B4BNMY34,US00507V1098,US00508Y1029,US00724F1012,US0530151036,US00751Y1064,US0255371017,US00130H1059,US00817Y1082,US0082521081,US0010551028,US00846U1016,US0268747849,US03748R7474,US0091581068,US3635761097,US00971T1016,US0116591092,US0126531013,US0153511094,IE00BFRT3W74,IE00BY9D5467,US0185811082,US0188021085,US0200021014,US0213461017,US02209S1033,US0231351067,US0236081024,US02376R1023,US0258161092,US03027X1000,US0304201033,US03076C1062,US03073E1055,US0311001004,US0311621009,...,US2056381096,US2103711006,IE00B40K9117,US28336L1098,US86764P1093,US3823881061,US4325891095,US5297711070,US58405U1025,US6700081010,US7432631056,US2578672006,US8123501061,US8685363017,US8883392073,US0015471081,US1567081096,US45073V1089,US47102X1054,US5894331017,US6700061053,US6117421072,US6376401039,US6540861076,US7434101025,US7472771010,US7491211097,US74979E1010,US86732Y1091,US8796641004,US25470F1049,CH0044328745,US02079K1079,US1912161007,US9043111072,US25470F3029,US1712321017,US02079K3059,GB00BDCPN049,US9043112062
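
The cell above appends `_diff isin` to the first occurrence of each duplicated company name and moves both copies to the end of the dataframe. A minimal sketch of the same renaming that keeps the columns in place, assuming every duplicated name occurs at most twice (as in the list printed above); `suffix_duplicate_columns` is a hypothetical helper, not part of this notebook:

```python
import pandas as pd


def suffix_duplicate_columns(df, suffix='_diff isin'):
    """Append `suffix` to the first occurrence of each duplicated column label.

    Hypothetical helper; assumes each label appears at most twice.
    """
    cols = pd.Series(df.columns)
    # keep=False flags every occurrence of a repeated label
    is_dup = cols.duplicated(keep=False)
    # keep='first' flags all but the first occurrence, so its negation marks the first one
    is_first = is_dup & ~cols.duplicated(keep='first')
    out = df.copy()
    out.columns = [c + suffix if first else c for c, first in zip(cols, is_first)]
    return out


# toy example: one company name that maps to two ISINs
toy = pd.DataFrame([['US1', 'US2', 'US3']], columns=['chubb', 'abbvie', 'chubb'])
print(list(suffix_duplicate_columns(toy).columns))  # ['chubb_diff isin', 'abbvie', 'chubb']
```

Keeping the original positions avoids the split-and-join step; the trade-off is that the renamed columns are no longer grouped at the end of the dataframe as in the output above.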


In [15]:
# split off the companies that never report a 'Y' (only 'N' or missing values);
# the last row holds the ISIN, so it is excluded from the check
csr_committee_no_df = csr_committee_df.copy()

for col in csr_committee_df.columns:
    if csr_committee_df[col].iloc[:-1].eq('Y').any():
        csr_committee_no_df.drop(columns=[col], inplace=True)

csr_committee_yes_df = csr_committee_df.drop(columns=csr_committee_no_df.columns)
csr_committee_yes_df


Unnamed: 0_level_0,3m company,abbott laboratories,abbvie,accenture plc,adobe inc,automatic data proc.,american electric,aes corp,aetna inc,aflac incorporated,agilent technologies,american int'l group,apt investment & mgt,air prds.& chems.,akamai technologies,albemarle corp,alliance data system,alliant energy corp,allstate ord shs,altaba inc,altria group inc,ameren corporation,american airlines,american express co,american water works,amgen inc,anadarko petroleum,andeavor,anthem inc,aon class a,apache corporation,apple inc,applied materials,at&t inc,autodesk inc,avalonbay communitie,avery dennison corp,baker hughes co,ball corporation,baxter intl.,...,advanced micro,big lots,dean foods co.,dell inc.,h.j. heinz company,leidos holdings inc,molex incorporated,nyse euronext,sprint nextel corp,teradyne inc,viavi solutions,alpha natural resrc,compuware corp,constellation energy,cooper industries,el paso,sunoco inc,goodrich corporation,hillshire brands co,lexmark internatl,medco health sltn.,progress energy inc,rr donnelley & sons,sears holdings,supervalu inc.,ak steel holding,itt inc,janus capital group,meredith corp,national semicon.,prologis,qlogic corp,sunedison inc,tellabs inc,chubb_diff isin,alphabet inc_diff isin,coc_diff isin,chubb,alphabet inc,coc
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
2010,Y,Y,,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,...,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,N,Y,N,Y,Y,Y,Y,Y
2011,Y,Y,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,...,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,,Y,Y,Y,,Y,Y,Y,Y,N,Y,Y,,,Y,N,Y,Y,Y,Y,Y,Y,Y
2012,Y,Y,,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,N,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,...,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,,Y,Y,,,Y,Y,Y,Y,N,Y,Y,,,Y,Y,,Y,Y,Y,Y,Y,Y
2013,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,...,Y,Y,Y,Y,Y,Y,Y,,,Y,Y,Y,Y,,,,,,Y,Y,,,Y,Y,Y,Y,N,Y,Y,,,Y,Y,,Y,Y,Y,Y,Y,Y
2014,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,...,Y,Y,Y,,,Y,,,,Y,N,N,Y,,,,,,,Y,,,Y,Y,Y,Y,Y,N,Y,,,N,Y,,Y,Y,Y,N,Y,Y
2015,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,N,Y,N,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,...,Y,Y,Y,,,Y,,,,N,N,,,,,,,,,Y,,,Y,Y,N,Y,Y,N,Y,,,N,,,N,Y,Y,,Y,Y
0,US88579Y1010,US0028241000,US00287Y1091,IE00B4BNMY34,US00724F1012,US0530151036,US0255371017,US00130H1059,US00817Y1082,US0010551028,US00846U1016,US0268747849,US03748R7474,US0091581068,US00971T1016,US0126531013,US0185811082,US0188021085,US0200021014,US0213461017,US02209S1033,US0236081024,US02376R1023,US0258161092,US0304201033,US0311621009,US0325111070,US03349M1053,US0367521038,IE00BLP1HW54,US0374111054,US0378331005,US0382221051,US00206R1023,US0527691069,US0534841012,US0536111091,US05722G1004,US0584981064,US0718131099,...,US0079031078,US0893021032,US2423702032,US24702R1014,US4230741039,US5253271028,US6085541018,US6294911010,US8520611000,US8807701029,US9255501051,US02076X1028,US2056381096,US2103711006,IE00B40K9117,US28336L1098,US86764P1093,US3823881061,US4325891095,US5297711070,US58405U1025,US7432631056,US2578672006,US8123501061,US8685363017,US0015471081,US45073V1089,US47102X1054,US5894331017,US6376401039,US7434101025,US7472771010,US86732Y1091,US8796641004,CH0044328745,US02079K1079,US1912161007,US1712321017,US02079K3059,GB00BDCPN049
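
The column loop above can also be written as a single boolean mask over the columns. A minimal sketch on toy data, assuming (as in this notebook) that every row except the last holds yearly 'Y'/'N' flags and the last row holds the ISIN; `split_by_any_yes` is a hypothetical helper:

```python
import pandas as pd


def split_by_any_yes(df):
    """Split company columns into those with at least one 'Y' and the rest.

    Assumes all rows except the last hold yearly 'Y'/'N' flags and the last row holds the ISIN.
    """
    # eq('Y') is False for 'N' and for missing values
    has_yes = df.iloc[:-1].eq('Y').any(axis=0)
    return df.loc[:, has_yes], df.loc[:, ~has_yes]


toy = pd.DataFrame(
    {'a': ['Y', 'N', 'US1'], 'b': ['N', None, 'US2'], 'c': [None, 'Y', 'US3']},
    index=[2010, 2011, 0],
)
yes_df, no_df = split_by_any_yes(toy)
print(list(yes_df.columns), list(no_df.columns))  # ['a', 'c'] ['b']
```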


In [16]:
# transpose the dataframe
csr_committee_yes_df = csr_committee_yes_df.T
# rename the isin column from 0 to isin
csr_committee_yes_df.rename(columns={0: 'isin'}, inplace=True)
# turn everything in dataframe into lower case
# (pandas 2.1 deprecated applymap in favor of DataFrame.map)
csr_committee_yes_df = csr_committee_yes_df.applymap(lambda x: x.lower() if not pd.isna(x) else x)

csr_committee_yes_df


Name,2010,2011,2012,2013,2014,2015,isin
3m company,y,y,y,y,y,y,us88579y1010
abbott laboratories,y,y,y,y,y,y,us0028241000
abbvie,,,,n,y,y,us00287y1091
accenture plc,y,y,y,y,y,y,ie00b4bnmy34
adobe inc,y,y,y,y,y,y,us00724f1012
...,...,...,...,...,...,...,...
alphabet inc_diff isin,y,y,y,y,y,y,us02079k1079
coc_diff isin,y,y,y,y,y,y,us1912161007
chubb,y,y,y,y,n,,us1712321017
alphabet inc,y,y,y,y,y,y,us02079k3059


## Get relevant persons from committee_df

In [17]:
# show the unique committees included in committee_df
print(unique_coms)
# select only the relevant ones
rel_comms = ['environment', 'csrresponsibility', 'social']
rel_comms

['audit', 'compensation', 'governance', 'nominating', 'finance', 'humanresources', 'executive', 'remuneration', 'environment', 'csrresponsibility', 'management', 'social', 'shareholdergrievance']


['environment', 'csrresponsibility', 'social']

In [18]:
# get the relevant person who sits on a relevant committee
rel_person = []

for index, row in committee_df.iterrows():
    for com_type in row['committee']:
        if com_type in rel_comms:
            # note: a director on several relevant committees is appended once per committee
            rel_person.append(index)

print(len(rel_person), 'people with relevant committee memberships')


510 people with relevant committee memberships


In [19]:
# put all relevant people in a dataframe
# (rel_person holds index labels from iterrows, so select them with .loc)
all_rel_persons_df = committee_df.loc[rel_person, :]
all_rel_persons_df


Unnamed: 0,name,committee,age,last_position,director_start,director_end,executive_start,executive_end,comp_name,ticker,missing_start_date,2011,2012,2013,2014,2015,current_position,dir_exec,in_position,isin
335,mr. william (bill) hernandez,"[audit, environment, finance, governance, nomi...",72.0,independent director,0.0,0.0,2011.0,2020.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013
337,"dr. harriett (tee) taggart , ph.d.","[audit, compensation, environment, executive, ...",72.0,independent director,0.0,0.0,2007.0,2020.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013
338,mr. jimmy (jim) nokes,"[audit, compensation, environment, executive, ...",,lead independent director,0.0,0.0,2009.0,2018.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013
339,mr. barry perry,"[audit, compensation, environment, executive]",73.0,independent director,0.0,0.0,2010.0,2018.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013
341,mr. joseph mahady,"[compensation, environment]",66.0,independent director,0.0,0.0,2012.0,2015.0,albemarle corp,alb,0.0,0,1,1,1,1,,,,us0126531013
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7128,richard o'brien,"[audit, environment]",66.0,,,,,,xcel energy inc,xel,,0,1,1,1,1,independent director,2012.0,2012.0,us98389b1008
7129,mr. david owens,"[environment, finance]",71.0,,,,,,xcel energy inc,xel,,0,0,0,0,0,independent director,2018.0,2018.0,us98389b1008
7130,mr. james (jim) prokopanko,"[compensation, environment, governance, nomina...",66.0,,,,,,xcel energy inc,xel,,0,0,0,0,0,independent director,2016.0,2016.0,us98389b1008
7131,mr. james (joe) sheppard,"[environment, finance]",71.0,,,,,,xcel energy inc,xel,,1,1,1,1,1,independent director,2011.0,2011.0,us98389b1008
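
The loop in In [18] appends a director's index once for every relevant committee, so someone sitting on, say, both an environment and a social committee is listed twice. A vectorized sketch that lists each director once, assuming the `committee` column holds Python lists as shown in the output above; `people_on_committees` is a hypothetical helper:

```python
import pandas as pd


def people_on_committees(committee_df, rel_comms):
    """Return each director who sits on at least one committee in rel_comms, once.

    Assumes the 'committee' column holds a list of committee names per director.
    """
    # explode gives every list element its own row and repeats the original index label
    exploded = committee_df['committee'].explode()
    rel_labels = exploded[exploded.isin(rel_comms)].index.unique()
    return committee_df.loc[rel_labels]


toy = pd.DataFrame(
    {'name': ['a', 'b', 'c'],
     'committee': [['audit', 'environment', 'social'], ['audit'], ['csrresponsibility']]},
    index=[335, 336, 337],
)
print(people_on_committees(toy, ['environment', 'csrresponsibility', 'social']).index.tolist())
# [335, 337]
```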


In [20]:
# write this dataframe to an excel file
all_rel_persons_df.to_excel('/content/drive/My Drive/director-csr/committees/people_rel_comms.xlsx',
                            sheet_name='rel_people')


In [21]:
# now I also need to add a column to the csr_committee_yes_df if there is definitely
# a board committee based on the info from all_rel_persons_df
comps_with_comm = all_rel_persons_df['isin'].unique()
csr_committee_yes_df['board_committee'] = csr_committee_yes_df['isin'].apply(lambda x: 'Yes' if x in comps_with_comm else 'No')
csr_committee_yes_df


Name,2010,2011,2012,2013,2014,2015,isin,board_committee
3m company,y,y,y,y,y,y,us88579y1010,Yes
abbott laboratories,y,y,y,y,y,y,us0028241000,No
abbvie,,,,n,y,y,us00287y1091,No
accenture plc,y,y,y,y,y,y,ie00b4bnmy34,No
adobe inc,y,y,y,y,y,y,us00724f1012,No
...,...,...,...,...,...,...,...,...
alphabet inc_diff isin,y,y,y,y,y,y,us02079k1079,No
coc_diff isin,y,y,y,y,y,y,us1912161007,Yes
chubb,y,y,y,y,n,,us1712321017,No
alphabet inc,y,y,y,y,y,y,us02079k3059,No


Categorizing whether a company has a CSR-related board committee is not perfect because it relies on director-level information; directors who sit on multiple boards can therefore cause false positives. These will be resolved during the manual review.

In [22]:
# write this dataframe to an excel file
csr_committee_yes_df.to_excel('/content/drive/My Drive/director-csr/committees/comps_rel_comms.xlsx',
                                sheet_name='rel_comps')


## Add committee information to the director biographies

In [23]:
# read in the biography director file
dir_bio_df = pd.read_excel('/content/drive/My Drive/director-csr/director_bios_all.xlsx')
dir_bio_df.drop(columns=['Unnamed: 0'], inplace=True)
# rename the isin_x column to just isin to prepare for merging
dir_bio_df.rename(columns={'isin_x': 'isin'}, inplace=True)
dir_bio_df.head()


Unnamed: 0,name,age_x,last_position,director_start,director_end,executive_start,executive_end,comp_name_x,ticker_x,missing_start_date,2011,2012,2013,2014,2015,current_position,dir_exec,in_position,isin,org_name_x,qualification,last_name_x,unique_dir_id,all_years,first_name,middle_name,nick_name,last_name_y,comp_name_y,ticker_y,education,prim_comp,biographies,age_y,isin_y,all_tickers,org_name_y
0,christina gold,72.0,independent director,0.0,0.0,1997.0,2020.0,itt inc,itt,0.0,1,1,1,1,1,,,,us45073v1089,ms. christina gold,,gold,7917,no,Christina,A.,-,Gold,The Western Union Company (NYSE:WU),NYSE:WU,Ecole des Hautes Etudes Commerciales de Montre...,First Data Merchant Services Corporation,Mrs. Christina A. Gold has been the Chief Exec...,67.0,US9598021098,NYSE:WU; BMV:WU *; BOVESPA:WUNI34; DB:W3U,"Gold, Christina A. (Prior Board)"
1,frank macinnis,72.0,independent chairman of the board,2011.0,2020.0,2001.0,2020.0,itt inc,itt,0.0,1,1,1,1,1,,,,us45073v1089,mr. frank macinnis,,macinnis,3325,no,Frank,T.,-,MacInnis,EMCOR Group Inc. (NYSE:EME),NYSE:EME,University of Alberta; University Of Alberta L...,MES Holdings Corporation,Mr. Frank T. MacInnis serves as the President ...,68.0,US29084Q1004,NYSE:EME; DB:EM4,"MacInnis, Frank T. (Prior Board)"
2,denise ramos,63.0,"president, chief executive officer, director",2011.0,2019.0,2011.0,2019.0,itt inc,itt,0.0,1,1,1,1,1,,,,us45073v1089,ms. denise ramos,,ramos,7996,no,Denise,L.,-,Ramos,ITT Corporation (NYSE:ITT),NYSE:ITT,The University of Chicago; Purdue University,ITT Corporation (NYSE:ITT) (Board),Ms. Denise L. Ramos serves as the Chief Execut...,58.0,US4509112011,NYSE:ITT; DB:ITTA,"Ramos, Denise L. (Board)"
3,orlando ashford,51.0,,,,,,itt inc,itt,,0,1,1,1,1,independent director,2012.0,2012.0,us45073v1089,mr. orlando ashford,,ashford,5733,no,,,,,,,,,,,,,
4,donald defosset,72.0,,,,,,itt inc,itt,,0,1,1,1,1,independent director,2012.0,2012.0,us45073v1089,"mr. donald (don) defosset , jr.",jr.,defosset,2984,no,Donald,-,Don,DeFosset,"Walter Energy, Inc.",-,Harvard Business School; Purdue University,ATL Partners,"Mr. Donald DeFosset, Jr., also known as Don, B...",66.0,US93317Q1058,OTCPK:WLTG.Q,"DeFosset, Donald (Prior Board)"


In [24]:
# shape of the dataframe
dir_bio_df.shape


(7409, 37)

In [25]:
# merge both dataframes
dir_bio_comm_df = pd.merge(csr_committee_yes_df, dir_bio_df, how='right', on='isin')
dir_bio_comm_df.head()


Unnamed: 0,2010,2011,2012,2013,2014,2015,isin,board_committee,name,age_x,last_position,director_start,director_end,executive_start,executive_end,comp_name_x,ticker_x,missing_start_date,2011.1,2012.1,2013.1,2014.1,2015.1,current_position,dir_exec,in_position,org_name_x,qualification,last_name_x,unique_dir_id,all_years,first_name,middle_name,nick_name,last_name_y,comp_name_y,ticker_y,education,prim_comp,biographies,age_y,isin_y,all_tickers,org_name_y
0,y,n,n,n,y,y,us45073v1089,No,christina gold,72.0,independent director,0.0,0.0,1997.0,2020.0,itt inc,itt,0.0,1,1,1,1,1,,,,ms. christina gold,,gold,7917,no,Christina,A.,-,Gold,The Western Union Company (NYSE:WU),NYSE:WU,Ecole des Hautes Etudes Commerciales de Montre...,First Data Merchant Services Corporation,Mrs. Christina A. Gold has been the Chief Exec...,67.0,US9598021098,NYSE:WU; BMV:WU *; BOVESPA:WUNI34; DB:W3U,"Gold, Christina A. (Prior Board)"
1,y,n,n,n,y,y,us45073v1089,No,frank macinnis,72.0,independent chairman of the board,2011.0,2020.0,2001.0,2020.0,itt inc,itt,0.0,1,1,1,1,1,,,,mr. frank macinnis,,macinnis,3325,no,Frank,T.,-,MacInnis,EMCOR Group Inc. (NYSE:EME),NYSE:EME,University of Alberta; University Of Alberta L...,MES Holdings Corporation,Mr. Frank T. MacInnis serves as the President ...,68.0,US29084Q1004,NYSE:EME; DB:EM4,"MacInnis, Frank T. (Prior Board)"
2,y,n,n,n,y,y,us45073v1089,No,denise ramos,63.0,"president, chief executive officer, director",2011.0,2019.0,2011.0,2019.0,itt inc,itt,0.0,1,1,1,1,1,,,,ms. denise ramos,,ramos,7996,no,Denise,L.,-,Ramos,ITT Corporation (NYSE:ITT),NYSE:ITT,The University of Chicago; Purdue University,ITT Corporation (NYSE:ITT) (Board),Ms. Denise L. Ramos serves as the Chief Execut...,58.0,US4509112011,NYSE:ITT; DB:ITTA,"Ramos, Denise L. (Board)"
3,y,n,n,n,y,y,us45073v1089,No,orlando ashford,51.0,,,,,,itt inc,itt,,0,1,1,1,1,independent director,2012.0,2012.0,mr. orlando ashford,,ashford,5733,no,,,,,,,,,,,,,
4,y,n,n,n,y,y,us45073v1089,No,donald defosset,72.0,,,,,,itt inc,itt,,0,1,1,1,1,independent director,2012.0,2012.0,"mr. donald (don) defosset , jr.",jr.,defosset,2984,no,Donald,-,Don,DeFosset,"Walter Energy, Inc.",-,Harvard Business School; Purdue University,ATL Partners,"Mr. Donald DeFosset, Jr., also known as Don, B...",66.0,US93317Q1058,OTCPK:WLTG.Q,"DeFosset, Donald (Prior Board)"
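
Because the merge is a right join on `isin`, every director is kept, and directors whose company is not in `csr_committee_yes_df` get empty year and `board_committee` values (such as the Archer Daniels Midland row for george buckley shown later in this notebook). A small sketch on toy data for counting such rows with the `indicator=True` option of `pd.merge`; `merge_and_count_unmatched` is a hypothetical helper:

```python
import pandas as pd


def merge_and_count_unmatched(comp_df, dir_df, key='isin'):
    """Right-merge company flags onto directors and count directors without a matching company."""
    # indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'
    merged = pd.merge(comp_df, dir_df, how='right', on=key, indicator=True)
    n_unmatched = int((merged['_merge'] == 'right_only').sum())
    return merged.drop(columns='_merge'), n_unmatched


comps = pd.DataFrame({'isin': ['us88579y1010'], 'board_committee': ['Yes']})
dirs = pd.DataFrame({'isin': ['us88579y1010', 'us0394831020'],
                     'name': ['george buckley', 'george buckley']})
merged, n_unmatched = merge_and_count_unmatched(comps, dirs)
print(n_unmatched)  # 1
```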


In [26]:
import re

# this function is taken from biography_matching.ipynb
def clean_names(df, bio=True):
    '''
    Function to clean up the director names so that they can be matched
    Args: df = dataframe; containing director names
          bio = bool; True if the biographies dataframe is added, False otherwise 
    Returns: df = dataframe
    '''

    # change the strings to lower case
    df.iloc[:,0] = df.iloc[:,0].apply(lambda x: x.lower())

    # check if the names contain anything in parentheses and if so remove them and their content
    df.iloc[:,0] = df.iloc[:,0].apply(lambda x: re.sub(r'\([^()]*\)', '', x))

    # check if the names contain a title like ms. and mr. and if so remove them
    df.iloc[:,0] = df.iloc[:,0].apply(lambda x: re.sub(r'^\w{2,3}\. ?', '', x))

    # do two different things with the commas for the different dataframes
    if bio:
        # move the last name in front of the comma to the back of the string and remove the comma
        # (names without a comma are left unchanged instead of raising an IndexError)
        df.iloc[:,0] = df.iloc[:,0].apply(lambda x: ' '.join([x.split(',')[1], x.split(',')[0]]) if ',' in x else x)
    else:
        # create a new column that contains all the words after a comma at the end
        df['qualification'] = df.iloc[:,0].apply(lambda x: x.split(',')[-1] if len(x.split(',')) > 1 else None)
        df.iloc[:,0] = df.iloc[:,0].apply(lambda x: x.split(',')[0])

    # remove any initials or titles because they might be distracting when matching names
    df.iloc[:,0] = df.iloc[:,0].apply(lambda x: ' '.join([name if '.' not in name else '' for name in x.split()]))

    # remove 'the' substring from names
    df.iloc[:,0] = df.iloc[:,0].apply(lambda x: re.sub(r'^the\s', '', x))

    # ensure that all white space is stripped
    df.iloc[:,0] = df.iloc[:,0].apply(lambda x: re.sub(' +', ' ', x).strip())

    return df
    

In [27]:
# clean the names in the all_rel_persons_df
# (pass a copy, since all_rel_persons_df is a slice of committee_df and writing to it
# would raise a SettingWithCopyWarning)
all_rel_persons_df = clean_names(all_rel_persons_df.copy(), bio=False)
all_rel_persons_df.head()


Unnamed: 0,name,committee,age,last_position,director_start,director_end,executive_start,executive_end,comp_name,ticker,missing_start_date,2011,2012,2013,2014,2015,current_position,dir_exec,in_position,isin,qualification
335,william hernandez,"[audit, environment, finance, governance, nomi...",72.0,independent director,0.0,0.0,2011.0,2020.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013,
337,harriett taggart,"[audit, compensation, environment, executive, ...",72.0,independent director,0.0,0.0,2007.0,2020.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013,ph.d.
338,jimmy nokes,"[audit, compensation, environment, executive, ...",,lead independent director,0.0,0.0,2009.0,2018.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013,
339,barry perry,"[audit, compensation, environment, executive]",73.0,independent director,0.0,0.0,2010.0,2018.0,albemarle corp,alb,0.0,1,1,1,1,1,,,,us0126531013,
341,joseph mahady,"[compensation, environment]",66.0,independent director,0.0,0.0,2012.0,2015.0,albemarle corp,alb,0.0,0,1,1,1,1,,,,us0126531013,
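
To make the steps of `clean_names(..., bio=False)` easier to follow, here is a single-string sketch that applies the same regular expressions in the same order; `clean_name` is a hypothetical helper written for illustration, not part of `biography_matching.ipynb`:

```python
import re


def clean_name(raw):
    """Clean one director name the way clean_names(..., bio=False) does.

    Hypothetical helper for illustration; returns (cleaned_name, qualification).
    """
    x = raw.lower()
    # drop anything in parentheses, e.g. nicknames such as '(tee)'
    x = re.sub(r'\([^()]*\)', '', x)
    # drop a leading title such as 'mr.', 'ms.' or 'dr.'
    x = re.sub(r'^\w{2,3}\. ?', '', x)
    # everything after the last comma is kept as the qualification, e.g. ' ph.d.'
    parts = x.split(',')
    qualification = parts[-1] if len(parts) > 1 else None
    x = parts[0]
    # drop initials and other tokens containing a period
    x = ' '.join(token for token in x.split() if '.' not in token)
    # drop a leading 'the' and normalise whitespace
    x = re.sub(r'^the\s', '', x)
    x = re.sub(' +', ' ', x).strip()
    return x, qualification


print(clean_name('dr. harriett (tee) taggart , ph.d.'))  # ('harriett taggart', ' ph.d.')
```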


In [28]:
# manual sanity check: list every director name that contains 'fuller'
for name in dir_bio_comm_df.sort_values(by='name', ascending=True)['name']:
    if 'fuller' in name:
        print(name)


joseph fuller
laurance fuller


During my manual review of the DEF 14A reports for biographies and committee memberships, I noticed several inconsistencies in the director names across the files; therefore, I will be doing some name replacements.
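
A minimal sketch of how such replacements could be applied with `Series.replace`, assuming a hand-maintained mapping from the spelling in one file to the spelling in the other; `apply_name_fixes` is a hypothetical helper and the names in `name_fixes` are placeholders, not the actual corrections:

```python
import pandas as pd


def apply_name_fixes(df, fixes, col='name'):
    """Return a copy of df with exact-match name replacements applied to `col`."""
    out = df.copy()
    # with a dict, Series.replace swaps whole-value matches only; other names are untouched
    out[col] = out[col].replace(fixes)
    return out


# placeholder mapping: spelling to fix -> spelling used in the other file
name_fixes = {'jon doe': 'john doe'}
toy = pd.DataFrame({'name': ['jon doe', 'jane roe']})
print(apply_name_fixes(toy, name_fixes)['name'].tolist())  # ['john doe', 'jane roe']
```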

In [29]:
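# merge the cleaned committee memberships into dir_bio_comm_df; the merge matches on the name only,
# so a committee list is attached to every row with that name, including the director's seats
# at other companies (these false positives are resolved during the manual review)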
dir_bio_comm_all = pd.merge(all_rel_persons_df[['name','committee']], dir_bio_comm_df, how='right', on='name')
dir_bio_comm_all.shape


(7457, 45)

In [30]:
# example of the merged dataframe: the same director appears once for every company board he sits on
dir_bio_comm_df[dir_bio_comm_df['name'] == 'george buckley']


Unnamed: 0,2010,2011,2012,2013,2014,2015,isin,board_committee,name,age_x,last_position,director_start,director_end,executive_start,executive_end,comp_name_x,ticker_x,missing_start_date,2011.1,2012.1,2013.1,2014.1,2015.1,current_position,dir_exec,in_position,org_name_x,qualification,last_name_x,unique_dir_id,all_years,first_name,middle_name,nick_name,last_name_y,comp_name_y,ticker_y,education,prim_comp,biographies,age_y,isin_y,all_tickers,org_name_y
539,,,,,,,us0394831020,,george buckley,73.0,independent director,0.0,0.0,2008.0,2013.0,archer daniels midland co,adm,0.0,1,1,1,0,0,,,,"dr. george buckley , ph.d.",ph.d.,buckley,583,no,George,W.,-,Buckley,Brunswick Corporation (NYSE:BC),NYSE:BC,Kirklees College; University of Southampton; U...,Arle Capital Partners Limited (Board),"Sir George W. Buckley, Ph.D. has been a Partne...",68,US1170431092,NYSE:BC; DB:BWI,"Buckley, George W. (Prior Board)"
4646,y,y,y,y,y,y,us88579y1010,Yes,george buckley,73.0,executive chairman of the board,2005.0,2012.0,2005.0,2012.0,3m co,mmm,0.0,1,1,0,0,0,,,,"mr. george buckley , ph.d.",ph.d.,buckley,3493,no,George,W.,-,Buckley,Brunswick Corporation (NYSE:BC),NYSE:BC,Kirklees College; University of Southampton; U...,Arle Capital Partners Limited (Board),"Sir George W. Buckley, Ph.D. has been a Partne...",68,US1170431092,NYSE:BC; DB:BWI,"Buckley, George W. (Prior Board)"
5394,y,y,y,y,y,y,us7134481081,Yes,george buckley,73.0,independent director,0.0,0.0,2012.0,2019.0,pepsico inc,pep,0.0,0,1,1,1,1,,,,"mr. george buckley , ph.d.",ph.d.,buckley,3493,no,George,W.,-,Buckley,Brunswick Corporation (NYSE:BC),NYSE:BC,Kirklees College; University of Southampton; U...,Arle Capital Partners Limited (Board),"Sir George W. Buckley, Ph.D. has been a Partne...",68,US1170431092,NYSE:BC; DB:BWI,"Buckley, George W. (Prior Board)"
6299,y,y,y,y,y,n,us8545021011,No,george buckley,73.0,,,,,,stanley black & decker inc,swk,,1,1,1,1,1,independent chairman of the board,2010.0,2017.0,"dr. george buckley , ph.d.",ph.d.,buckley,583,no,George,W.,-,Buckley,Brunswick Corporation (NYSE:BC),NYSE:BC,Kirklees College; University of Southampton; U...,Arle Capital Partners Limited (Board),"Sir George W. Buckley, Ph.D. has been a Partne...",68,US1170431092,NYSE:BC; DB:BWI,"Buckley, George W. (Prior Board)"
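
This example shows why matching on the name alone can produce false positives: george buckley appears once per board, so if he were listed in `all_rel_persons_df` for one of these boards, the name-only merge in In [29] would attach that committee list to all four rows. A sketch of a stricter alternative that merges on both `name` and `isin`, so a committee list stays on the board it was reported for. This is not what the notebook does (it keeps the name-only merge and resolves false positives manually), and it assumes both dataframes carry the same lower-case ISINs; `attach_committees` is a hypothetical helper:

```python
import pandas as pd


def attach_committees(rel_people, directors, keys=('name', 'isin')):
    """Right-merge committee lists onto directors using several key columns."""
    keys = list(keys)
    return pd.merge(rel_people[keys + ['committee']], directors, how='right', on=keys)


rel_people = pd.DataFrame({'name': ['george buckley'], 'isin': ['us88579y1010'],
                           'committee': [['environment']]})
directors = pd.DataFrame({'name': ['george buckley', 'george buckley'],
                          'isin': ['us88579y1010', 'us7134481081']})
out = attach_committees(rel_people, directors)
print(out['committee'].tolist())  # [['environment'], nan]
```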


In [31]:
# sort this dataframe by company name and not by director name
dir_bio_comm_all.sort_values(by='comp_name_x', ascending=True, inplace=True)
# rearrange the order of the dataframe
cols = list(dir_bio_comm_all.columns)
new_cols = ['name', 'org_name_x', 'org_name_y', 'board_committee', 'committee', 'comp_name_x', 'isin', 'biographies']
remove_cols = [col for col in cols if col not in new_cols]
new_cols.extend(remove_cols)
dir_bio_comm_all = dir_bio_comm_all[new_cols]

dir_bio_comm_all.head()


Unnamed: 0,name,org_name_x,org_name_y,board_committee,committee,comp_name_x,isin,biographies,2010,2011,2012,2013,2014,2015,age_x,last_position,director_start,director_end,executive_start,executive_end,ticker_x,missing_start_date,2011.1,2012.1,2013.1,2014.1,2015.1,current_position,dir_exec,in_position,qualification,last_name_x,unique_dir_id,all_years,first_name,middle_name,nick_name,last_name_y,comp_name_y,ticker_y,education,prim_comp,age_y,isin_y,all_tickers
2130,thomas brown,mr. thomas (tony) brown,"Brown, Thomas C. (Prior Board)",Yes,,3m co,us88579y1010,Thomas C. Brown served as Chairman and Chief E...,y,y,y,y,y,y,64.0,,,,,,mmm,,0,0,0,1,1,independent director,2014.0,2014.0,,brown,7163,no,Thomas,C.,-,Brown,TMBR/Sharp Drilling Inc.,-,-,TMBR/Sharp Drilling Inc. (Prior Board),88,US87257P1012,-
2131,thomas brown,mr. thomas (tony) brown,"Brown, Thomas R. (Board)",Yes,,3m co,us88579y1010,"Mr. Thomas R. Brown, also known as Tom, has be...",y,y,y,y,y,y,64.0,,,,,,mmm,,0,0,0,1,1,independent director,2014.0,2014.0,,brown,7163,no,Thomas,R.,Tom,Brown,LRAD Corporation (NasdaqCM:LRAD),NasdaqCM:LRAD,"Rutgers, The State University of New Jersey",LRAD Corporation (NasdaqCM:LRAD) (Board),65,US50213V1098,NasdaqCM:LRAD
2903,inge thulin,mr. inge thulin,"Thulin, Inge G. (Board)",Yes,,3m co,us88579y1010,Mr. Inge G. Thulin has been the Chief Executiv...,y,y,y,y,y,y,66.0,"executive chairman of the board,",2003.0,2019.0,2012.0,2019.0,mmm,0.0,0,1,1,1,1,,,,,thulin,3836,no,Inge,G.,-,Thulin,3M Company (NYSE:MMM),NYSE:MMM,Göteborgs universitet; IHM Business School,3M Company (NYSE:MMM) (Board),62,US88579Y1010,NYSE:MMM; XTRA:MMM; BMV:MMM *; SWX:MMM; LSE:0Q...
1284,vance coffman,"dr. vance coffman , ph.d.","Coffman, Vance D. (Prior Board)",Yes,,3m co,us88579y1010,Dr. Vance D. Coffman served as the Chief Execu...,y,y,y,y,y,y,74.0,independent director,0.0,0.0,2002.0,2018.0,mmm,0.0,1,1,1,1,1,,,,ph.d.,coffman,1013,no,Vance,D.,-,Coffman,Lockheed Martin Corporation (NYSE:LMT),NYSE:LMT,Stanford University; Iowa State University; Em...,Amgen Inc. (NasdaqGS:AMGN) (Board),71,US5398301094,NYSE:LMT; XTRA:LOM; SWX:LMT; BASE:LMT; BMV:LMT...
641,edward liddy,mr. edward (ed) liddy,"Liddy, Edward M. (Prior Board)",Yes,,3m co,us88579y1010,"Mr. Edward M. Liddy, also known as Ed, serves ...",y,y,y,y,y,y,73.0,independent director,0.0,0.0,2000.0,2020.0,mmm,0.0,1,1,1,1,1,,,,,liddy,3134,no,Edward,M.,Ed,Liddy,The Allstate Corporation (NYSE:ALL),NYSE:ALL,The George Washington University; Catholic Uni...,"Clayton, Dubilier & Rice, Inc.",69,US0200021014,NYSE:ALL; NYSE:ALL.PRB; NYSE:ALL.PRA; NYSE:ALL...


In [32]:
# what is the size of this final dataframe
dir_bio_comm_all.shape


(7457, 45)

In [33]:
# write this dataframe to an excel file
dir_bio_comm_all.to_excel('/content/drive/My Drive/director-csr/dir_bio_comm_all.xlsx',
                                sheet_name='dir_bio_comm')


## Please note the following
I had already started the manual review of the `dir_bio_comm_all` Excel file when I tried some additional techniques to match directors to biographies in the `biography_matching` notebook. This additional code has resulted in a different review list, which is generated at the end of this notebook and saved as `dir_bio_comm_all.xlsx`. The manual review was completed with the previous version of this Excel list. However, under the current time constraints, it is not possible for me to revert those changes in the biography matching notebook and regenerate the original Excel list. Therefore, we will all have to live with this slight discrepancy between the lists generated for manual review.

**This will not change anything in relation to the overall results or calculations of this thesis.** My manually reviewed list will be compared to this newly generated Excel file to ensure that no director was missed. So this can just be considered an FYI and an apology for the inconsistency.
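
The comparison described above could be sketched as an anti-join, assuming both lists are loaded as dataframes and that `name` and `isin` together identify a director seat; the helper name and key columns are assumptions, not the procedure actually used:

```python
import pandas as pd


def missing_from_review(new_list, reviewed, keys=('name', 'isin')):
    """Return key combinations present in new_list but absent from reviewed."""
    keys = list(keys)
    merged = new_list[keys].drop_duplicates().merge(
        reviewed[keys].drop_duplicates(), on=keys, how='left', indicator=True)
    # 'left_only' rows exist only in the newly generated list
    return merged.loc[merged['_merge'] == 'left_only', keys].reset_index(drop=True)


new_list = pd.DataFrame({'name': ['a', 'b', 'c'], 'isin': ['x', 'x', 'y']})
reviewed = pd.DataFrame({'name': ['a', 'c'], 'isin': ['x', 'y']})
print(missing_from_review(new_list, reviewed))
```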