# Centrality of C-Suite Executives

*Abstract: In this Jupyter notebook, we used network analysis and regression techniques to explore the relationship between executive centrality and the financial performance of companies in the Dow Jones Industrial Average (DJIA) index. We began by calculating three centrality measures for each executive in the dataset: Betweenness, Eigenvector, and Closeness Centrality. We then averaged these measures by company and year and merged the resulting dataframe with financial data for each company. Finally, we performed Ordinary Least Squares (OLS) regressions to test whether betweenness and eigenvector centrality are significantly related to financial performance, as measured by Return on Assets (ROA) and Return on Equity (ROE); adding closeness centrality did not improve the models.*

In [1]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

In [2]:
stock_df = pd.read_csv('fin_ratios.csv')


In [3]:
# Cleanup and format stock_df
stock_df.columns = map(str.lower, stock_df.columns)
stock_df = stock_df.rename(columns={'tic': 'ticker'})
stock_df['public_date'] = pd.to_datetime(stock_df['public_date'], format='%m/%d/%Y')
stock_df['year'] = stock_df['public_date'].dt.year

# Load the executive data into exec_df
exec_df = pd.read_csv('Dow30Bio.csv')
exec_df.columns = map(str.lower, exec_df.columns)


In [4]:
# Keep only data from 2016 onwards
stock_df = stock_df[stock_df['year'] >= 2016]


In [5]:
stock_df_copy = stock_df.copy()

In [6]:
stock_df.head()

Unnamed: 0,gvkey,permno,adate,qdate,public_date,capei,bm,evm,pe_op_basic,pe_op_dil,...,rd_sale,adv_sale,staff_sale,accrual,ptb,peg_trailing,divyield,ticker,cusip,year
715,1004,54594,05/31/2015,11/30/2015,2016-01-31,16.46,1.123,26.688,-26.936,-26.936,...,0.0,0.0,0.0,0.015,0.762,,1.43%,AIR,36110,2016
716,1004,54594,05/31/2015,11/30/2015,2016-02-29,16.68,1.123,26.688,-27.295,-27.295,...,0.0,0.0,0.0,0.015,0.772,,1.41%,AIR,36110,2016
717,1004,54594,05/31/2015,11/30/2015,2016-03-31,18.077,1.123,26.688,-29.833,-29.833,...,0.0,0.0,0.0,0.015,0.837,,1.29%,AIR,36110,2016
718,1004,54594,05/31/2015,02/29/2016,2016-04-30,21.459,1.268,19.001,-38.774,-38.159,...,0.0,0.0,0.0,0.006,0.89,,1.25%,AIR,36110,2016
719,1004,54594,05/31/2015,02/29/2016,2016-05-31,21.789,1.268,19.001,-39.371,-38.746,...,0.0,0.0,0.0,0.006,0.904,,1.23%,AIR,36110,2016


In [7]:
exec_df.head()

Unnamed: 0,year,rt_id,company_id,director_detail_id,ticker,cusip,name,meetingdate,first_name,last_name,...,former_employee_yn,designated,business_transaction,relative_yn,interlocking,charity,otherlink,financial_expert,succ_comm,non_ceo_leader
0,2022,,641517,81058,AA,13872106,ALCOA CORPORATION,05/05/2022,MARY,CITRINO,...,,,,,,,,,Yes,
1,2022,,641517,383221,AA,13872106,ALCOA CORPORATION,05/05/2022,PASQUALE,FIORE,...,,,,,,,,Yes,Yes,
2,2022,,641517,151881,AA,13872106,ALCOA CORPORATION,05/05/2022,THOMAS,GORMAN,...,,,,,,,,,Yes,
3,2022,,641517,250676,AA,13872106,ALCOA CORPORATION,05/05/2022,ROY,HARVEY,...,,,,,,,,,Yes,
4,2022,,641517,62197,AA,13872106,ALCOA CORPORATION,05/05/2022,JAMES,HUGHES,...,,,,,,,,Yes,Yes,


### Freeman Betweenness Centrality

Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes in the network.

Given a graph with N nodes, represented by an N x N adjacency matrix A, and a vector b holding the betweenness centrality value of each node:

First, find the shortest paths between all pairs of nodes. In an unweighted graph such as this director network, breadth-first search is enough; Dijkstra's algorithm is only needed when edges carry weights.

Then, for each node i, sum over all ordered pairs of other nodes (s, t) the fraction of shortest paths from s to t that pass through i, and normalize by the number of such pairs, (N-1)(N-2):

$$b_i = \frac{1}{(N-1)(N-2)}\sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$$

where $\sigma_{st}$ is the total number of shortest paths between nodes s and t, and $\sigma_{st}(i)$ is the number of shortest paths between s and t that pass through node i.
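To make the formula concrete, here is a brute-force pure-Python sketch that evaluates $b_i$ directly. A breadth-first search from every node records hop distances and shortest-path counts; node i lies on a shortest s-t path exactly when $d(s,i) + d(i,t) = d(s,t)$, and in that case $\sigma_{st}(i) = \sigma_{si}\,\sigma_{it}$. It assumes an unweighted, undirected graph given as a dict of neighbor lists (a hypothetical input format, not the notebook's `G`), and the function names are our own. networkx uses the much faster Brandes algorithm; this sketch is only meant for checking small examples.

```python
from collections import deque
from itertools import permutations


def bfs_path_counts(adj, source):
    """Hop distance and number of shortest paths from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def brute_force_betweenness(adj):
    """b_i = 1/((N-1)(N-2)) * sum over ordered pairs (s, t) with s, t != i of sigma_st(i)/sigma_st."""
    nodes = list(adj)
    n = len(nodes)
    paths = {v: bfs_path_counts(adj, v) for v in nodes}
    scores = {}
    for i in nodes:
        dist_i, sigma_i = paths[i]
        total = 0.0
        for s, t in permutations(nodes, 2):
            if i in (s, t):
                continue
            dist_s, sigma_s = paths[s]
            if t not in dist_s or i not in dist_s:
                continue  # t or i is unreachable from s
            # i is on a shortest s-t path iff d(s, i) + d(i, t) == d(s, t)
            if dist_s[i] + dist_i[t] == dist_s[t]:
                total += sigma_s[i] * sigma_i[t] / sigma_s[t]
        scores[i] = total / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return scores


# Path a - b - c: every shortest a-c path passes through b
path = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(brute_force_betweenness(path))
```

On small undirected graphs the values should agree with `nx.betweenness_centrality(G)` under its default `normalized=True`.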

In [8]:
from itertools import combinations

# Build an undirected director network: one node per director, and an edge
# between every pair of directors who appear on the same company's board
# (board memberships are pooled across all years in exec_df).
G = nx.Graph()
G.add_nodes_from(exec_df['director_detail_id'].unique())
for _, board in exec_df.groupby('company_id'):
    G.add_edges_from(combinations(board['director_detail_id'].unique(), 2))

# Calculate the betweenness centrality measure for each executive
betweenness_centrality = nx.betweenness_centrality(G)
exec_df['betweenness_centrality'] = exec_df['director_detail_id'].map(betweenness_centrality)


In [9]:
exec_df = exec_df.sort_values(by=['betweenness_centrality'], ascending = False)

In [10]:
exec_df.head()

Unnamed: 0,year,rt_id,company_id,director_detail_id,ticker,cusip,name,meetingdate,first_name,last_name,...,designated,business_transaction,relative_yn,interlocking,charity,otherlink,financial_expert,succ_comm,non_ceo_leader,betweenness_centrality
235,2022,,6437,63762,AMGN,031162100,AMGEN INC.,05/17/2022,ELLEN,KULLMAN,...,,,,,Yes,,Yes,Yes,,0.086625
1502,2017,,515391,63762,GS,38141G104,"THE GOLDMAN SACHS GROUP, INC.",04/28/2017,ELLEN,KULLMAN,...,,Yes,,,,,,Yes,,0.086625
3590,2017,,157996,63762,UTX,913017109,UNITED TECHNOLOGIES CORPORATION,04/24/2017,ELLEN,KULLMAN,...,,Yes,,,,,,Yes,,0.086625
1513,2018,,515391,63762,GS,38141G104,"THE GOLDMAN SACHS GROUP, INC.",05/02/2018,ELLEN,KULLMAN,...,,,,,,,,Yes,,0.086625
3601,2018,,157996,63762,UTX,913017109,UNITED TECHNOLOGIES CORP.,04/30/2018,ELLEN,KULLMAN,...,,Yes,,,,,,Yes,Lead Dir,0.086625


### Eigenvector Centrality

Eigenvector centrality scores a node highly when it is connected to other highly central nodes. Formally, the scores are the entries of the eigenvector associated with the largest eigenvalue of the adjacency matrix.

Given an undirected graph with N nodes, represented by an N x N adjacency matrix A, the vector c of eigenvector centrality values satisfies

$$A c = \lambda c$$

where $\lambda$ is the largest eigenvalue of A. For a connected graph, the Perron-Frobenius theorem guarantees that this eigenvector can be chosen with all entries positive, so the centralities are well defined up to a scale factor.

Written element by element, a node's eigenvector centrality is proportional to the sum of its neighbors' centralities:

$$c_i = \frac{1}{\lambda} \sum_j A_{ij} c_j$$

The vector c is found iteratively with the power method: start from a positive vector, repeatedly multiply by A, and rescale until the values stop changing. `nx.eigenvector_centrality` uses power iteration and scales the result to unit Euclidean norm.

Note that row-normalizing A into the random-walk matrix $P_{ij} = A_{ij} / \sum_k A_{ik}$ and taking its leading eigenvector gives a different measure: for an undirected graph, the solution of $c_i = \sum_j A_{ij} c_j / \sum_k A_{jk}$ is simply proportional to each node's degree.
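As a rough illustration of what `nx.eigenvector_centrality` computes (the leading eigenvector of A, found by power iteration and scaled to unit Euclidean norm), here is a minimal pure-Python sketch. It is not the networkx implementation; the function name, the dict-of-neighbor-lists input, and the tolerance are our own assumptions. It iterates with A + I instead of A: the shift keeps the same eigenvectors but avoids the oscillation plain power iteration shows on bipartite graphs such as stars, whose spectrum contains both $\lambda$ and $-\lambda$.

```python
import math


def power_iteration_eigenvector(adj, max_iter=1000, tol=1e-10):
    """Eigenvector centrality by power iteration on (A + I), unit Euclidean norm."""
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}  # positive starting vector
    for _ in range(max_iter):
        # y = (A + I) x: each node keeps its own score and adds its neighbors' scores
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        if sum(abs(y[v] - x[v]) for v in nodes) < tol:
            return y
        x = y
    raise RuntimeError("power iteration did not converge")


# Star graph: hub 'h' connected to leaves 'a', 'b', 'c'
star = {'h': ['a', 'b', 'c'], 'a': ['h'], 'b': ['h'], 'c': ['h']}
c = power_iteration_eigenvector(star)
# Check c_i = (1/lambda) * sum_j A_ij c_j at the hub: lambda = sqrt(3) for a 3-leaf star
lam = sum(c[w] for w in star['h']) / c['h']
print(c, lam)
```

For the star, the hub ends up at $1/\sqrt{2}$ and each leaf at $1/\sqrt{6}$, and the recovered $\lambda$ is $\sqrt{3}$, the largest eigenvalue of the star's adjacency matrix.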

In [12]:
# G is the same director network built for betweenness centrality, so it is reused here.

# Calculate the eigenvector centrality measure for each executive
eigenvector_centrality = nx.eigenvector_centrality(G)
exec_df['eigenvector_centrality'] = exec_df['director_detail_id'].map(eigenvector_centrality)

In [36]:
exec_df.columns

Index(['year', 'rt_id', 'company_id', 'director_detail_id', 'ticker', 'cusip',
       'name', 'meetingdate', 'first_name', 'last_name', 'fullname', 'age',
       'ethnicity', 'indexname', 'primary_employer', 'prititle',
       'employment_subsidiary', 'othertitle', 'country_of_empl', 'dirsince',
       'classification', 'grandfath', 'priorserv', 'type_of_services',
       'relation', 'mtgmonth', 'year_term_ends', 'ownless1', 'exchange_type',
       'nominee', 'num_of_shares', 'pcnt_ctrl_votingpower',
       'outside_public_boards', 'female', 'nom_membership', 'cg_membership',
       'comp_membership', 'audit_membership', 'year_of_termination',
       'employment_ceo', 'employment_chairman', 'employment_president',
       'employment_vicechairman', 'employment_treasurer', 'employment_cfo',
       'employment_coo', 'employment_secretary', 'employment_evp',
       'employment_svp', 'employment_vp', 'attend_less75_pct',
       'prof_services_yn', 'former_employee_yn', 'designated',
       

### Closeness Centrality

Closeness centrality measures how easily a node can reach every other node in the network: it is the reciprocal of the sum of the shortest-path distances from that node to all other nodes in the graph.

$$C_i = \frac{1}{\sum_{j\neq i} d_{ij}}$$

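where $C_i$ is the closeness centrality of node $i$, $d_{ij}$ is the length (number of edges) of the shortest path between nodes $i$ and $j$, and the sum is taken over all nodes $j$ other than $i$. `nx.closeness_centrality` reports the normalized version, $(N-1)/\sum_{j\neq i} d_{ij}$, which is the reciprocal of the average distance (with an additional correction when the graph is disconnected).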

Closeness centrality did not noticeably improve our regression models.
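For reference, the closeness formula can be sketched in pure Python with one breadth-first search per node. This is a minimal illustration, assuming a connected, unweighted graph stored as a dict of neighbor lists (hypothetical input format and function name); it does not reproduce networkx's correction for disconnected graphs. With `normalized=True` it multiplies by $(N-1)$, which is the scale networkx reports for a connected graph.

```python
from collections import deque


def bfs_closeness(adj, normalized=True):
    """Closeness C_i = 1 / sum_j d_ij, optionally scaled by (N - 1)."""
    n = len(adj)
    scores = {}
    for i in adj:
        dist = {i: 0}            # BFS hop distances from i
        queue = deque([i])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())  # d_ii = 0, so this is the sum over j != i
        if total == 0:
            scores[i] = 0.0         # isolated node
        else:
            scores[i] = (n - 1) / total if normalized else 1 / total
    return scores


# Path a - b - c: the middle node is closest to everyone
path = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(bfs_closeness(path))
```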

In [33]:
# G is the same director network built for betweenness centrality, so it is reused here.

# Calculate the closeness centrality measure
closeness_centrality = nx.closeness_centrality(G)
exec_df['closeness_centrality'] = exec_df['director_detail_id'].map(closeness_centrality)

In [15]:
# Average the directors' centrality values for each company and year
comp_centrality = exec_df.groupby(['year', 'ticker'])[['betweenness_centrality', 'eigenvector_centrality', 'closeness_centrality']].mean().reset_index()

In [16]:
comp_centrality.head()

Unnamed: 0,year,ticker,betweenness_centrality,eigenvector_centrality,closeness_centrality
0,2016,AAPL,0.017648,0.034512,0.324897
1,2016,AIG,0.005334,0.026385,0.252304
2,2016,AMGN,0.005564,0.004469,0.288048
3,2016,AXP,0.006697,0.007273,0.298905
4,2016,BA,0.00898,0.005654,0.298271


#### Merge with stock_df data

In [17]:
merged_df = stock_df.merge(comp_centrality, on = ['year', 'ticker'], how = 'left')

In [18]:
merged_df = merged_df.dropna(subset=['betweenness_centrality', 'eigenvector_centrality', 'closeness_centrality'])


In [19]:
merged_df.head()

Unnamed: 0,gvkey,permno,adate,qdate,public_date,capei,bm,evm,pe_op_basic,pe_op_dil,...,accrual,ptb,peg_trailing,divyield,ticker,cusip,year,betweenness_centrality,eigenvector_centrality,closeness_centrality
1549,1300,10145,12/31/2014,09/30/2015,2016-01-31,25.062,0.25,11.342,17.086,17.258,...,-0.016,4.356,0.727,2.31%,HON,43851610,2016,0.004561,0.004019,0.249157
1550,1300,10145,12/31/2015,12/31/2015,2016-02-29,22.007,0.236,11.16,15.811,16.011,...,-0.014,4.138,0.915,2.35%,HON,43851610,2016,0.004561,0.004019,0.249157
1551,1300,10145,12/31/2015,12/31/2015,2016-03-31,24.078,0.236,11.16,17.48,17.701,...,-0.014,4.528,1.012,2.12%,HON,43851610,2016,0.004561,0.004019,0.249157
1552,1300,10145,12/31/2015,12/31/2015,2016-04-30,24.58,0.236,11.16,17.827,18.052,...,-0.014,4.622,1.032,2.08%,HON,43851610,2016,0.004561,0.004019,0.249157
1553,1300,10145,12/31/2015,03/31/2016,2016-05-31,25.144,0.22,10.913,17.352,17.566,...,-0.009,4.609,1.061,2.09%,HON,43851610,2016,0.004561,0.004019,0.249157


In [20]:
merged_df.shape

(4136, 70)

In [21]:
stock_df.columns

Index(['gvkey', 'permno', 'adate', 'qdate', 'public_date', 'capei', 'bm',
       'evm', 'pe_op_basic', 'pe_op_dil', 'pe_exi', 'pe_inc', 'ps', 'pcf',
       'dpr', 'npm', 'opmbd', 'opmad', 'gpm', 'ptpm', 'cfm', 'roa', 'roe',
       'roce', 'efftax', 'aftret_eq', 'aftret_invcapx', 'aftret_equity',
       'pretret_noa', 'pretret_earnat', 'gprof', 'equity_invcap',
       'debt_invcap', 'totdebt_invcap', 'capital_ratio', 'int_debt',
       'int_totdebt', 'cash_lt', 'invt_act', 'rect_act', 'debt_at',
       'debt_ebitda', 'short_debt', 'curr_debt', 'lt_debt', 'profit_lct',
       'ocf_lct', 'cash_debt', 'fcf_ocf', 'lt_ppent', 'dltt_be', 'debt_assets',
       'debt_capital', 'de_ratio', 'sale_invcap', 'sale_equity', 'sale_nwc',
       'rd_sale', 'adv_sale', 'staff_sale', 'accrual', 'ptb', 'peg_trailing',
       'divyield', 'ticker', 'cusip', 'year'],
      dtype='object')

In [22]:
merged_df.to_csv('dow_centrality.csv', index=False)

### OLS of Return on Assets ('roa') and Return on Equity ('roe') on Centralities

In [23]:
import statsmodels.api as sm

In [24]:
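# Fill missing 'roe' values with the company's mean 'roe' for that year.
# transform() returns a Series aligned with merged_df's index, so the assignment is safe.
merged_df['roe'] = merged_df['roe'].fillna(merged_df.groupby(['year', 'ticker'])['roe'].transform('mean'))

# Tickers that still have missing 'roe' (no observed value in that company-year)
nan_tickers = merged_df.loc[merged_df['roe'].isna(), 'ticker'].unique()
print(nan_tickers)

# Drop those companies entirely (in the original run these were BA, HPQ and MCD)
merged_df = merged_df[~merged_df['ticker'].isin(nan_tickers)]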

In [25]:
# Create variables for OLS
X = merged_df[['betweenness_centrality', 'eigenvector_centrality']]
X = sm.add_constant(X)
y_roa = merged_df['roa']
y_roe = merged_df['roe']





In [26]:
# Regress return on assets on the centrality measures
model_roa = sm.OLS(y_roa, X).fit()
print(model_roa.summary())

                            OLS Regression Results                            
Dep. Variable:                    roa   R-squared:                       0.048
Model:                            OLS   Adj. R-squared:                  0.047
Method:                 Least Squares   F-statistic:                     96.93
Date:                Thu, 02 Mar 2023   Prob (F-statistic):           8.38e-42
Time:                        12:03:58   Log-Likelihood:                 4468.2
No. Observations:                3884   AIC:                            -8930.
Df Residuals:                    3881   BIC:                            -8912.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                      0

In [27]:
# Regress return on equity on the centrality measures
model_roe = sm.OLS(y_roe, X).fit()
print(model_roe.summary())

                            OLS Regression Results                            
Dep. Variable:                    roe   R-squared:                       0.003
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     5.832
Date:                Thu, 02 Mar 2023   Prob (F-statistic):            0.00296
Time:                        12:03:58   Log-Likelihood:                -5410.6
No. Observations:                3884   AIC:                         1.083e+04
Df Residuals:                    3881   BIC:                         1.085e+04
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                      0

We also tried an OLS model that included 'closeness_centrality', but it did not improve the fit.

Overall, these two OLS regressions provide some evidence of a relationship between executive centrality and financial performance, but the explanatory power of both models is limited, as their low R² values show.

The regression of roa on betweenness_centrality and eigenvector_centrality found that both independent variables are statistically significant predictors of roa, with betweenness_centrality having a positive coefficient and eigenvector_centrality a negative one. However, the model's R-squared is low (0.048): the two centrality measures explain only about 4.8% of the variance in roa.

The regression of roe on the same two variables shows that only eigenvector_centrality is a statistically significant predictor of roe, again with a negative coefficient. The R-squared for this model is extremely low (0.003), so it explains almost none of the variation in roe.
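R² is the share of the variance in the dependent variable that the fitted values account for, $R^2 = 1 - \mathrm{SSR}/\mathrm{SST}$. The pure-Python sketch below computes it for a one-regressor least-squares fit on made-up numbers (the function name and data are hypothetical); the models here use two regressors plus a constant, but R² is defined the same way. By this definition, an R² of 0.048 means the centrality measures account for about 4.8% of the variance in roa, and 0.003 for about 0.3% of the variance in roe.

```python
def simple_ols_r2(x, y):
    """Fit y = a + b*x by least squares and return (a, b, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    return intercept, slope, 1 - ssr / sst


# Toy data: the line explains 64% of the variance in y
print(simple_ols_r2([0, 1, 2, 3], [1, 3, 2, 4]))
```

For a fitted statsmodels model, the same quantity is available as `model_roa.rsquared`.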

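In short, executive centrality is associated with financial performance, but the association is weak, and other factors need to be considered. The significance levels should also be read cautiously: each company contributes several monthly rows that share the same yearly centrality values, so the nonrobust standard errors likely overstate how precisely the coefficients are estimated.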