This is the code example for my article on Medium.
The full text is available here:

https://medium.com/exness-blog/from-employee-voices-to-expert-stars-uncovering-the-most-valuable-assets-in-organizations-7ae0654df092


# Experts Rank project

### General idea: 

We have several levels of employees: 
* experts, 
* managers, 
* top managers (C-levels). 

We want to know which experts are the most respected across all the other categories of employees. To find out, we ask every employee which experts they respect the most. 

### Approach

Each employee's answer looks like this: 
```python
'user_1': ['user_2', 'user_3', 'user_19']
```

where 
* user_1 - the employee who filled in the questionnaire. 
* user_2, user_3, user_19 - the experts mentioned by user_1.
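A graph library needs these answers as (source, target) pairs. The sketch below shows one way to flatten them; the `answers` sample and the `to_edges` helper are illustrative names and are not part of the original notebook.

```python
# Hypothetical sample in the same shape as the questionnaire answers above.
answers = {
    'user_1': ['user_2', 'user_3', 'user_19'],
    'user_2': ['user_1', 'user_3'],
}


def to_edges(answers):
    """Flatten {respondent: [experts]} into a list of (source, target) pairs."""
    return [(source, target)
            for source, targets in answers.items()
            for target in targets
            if target is not None]  # skip unfilled answer slots


edges = to_edges(answers)
for source, target in edges:
    print(source, '->', target)
```

Each pair reads as "source respects target", which is the edge direction PageRank expects.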

### Suggested algorithm

We can use the PageRank algorithm to analyze the full set of answers.

PageRank works by counting the number and quality of links to an expert to produce a rough estimate of how important that expert is. The underlying assumption is that more important experts are likely to receive more links from other employees, and that they in turn pass part of their value on to the experts they choose.

<div>
<img src="img/pagerank.png" width="700"/>
</div>

Experts are nodes and mentions are edges, taking 'authority hubs' into account. The rank value indicates the importance of a particular expert. A link to another employee counts as a vote of support. The PageRank of an employee is defined recursively and depends on the number and PageRank of all employees who link to them. An employee who is linked to by an employee with a high PageRank receives a higher rank in turn (see 'C' in the figure).
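To make the recursive definition concrete, here is a minimal power-iteration sketch in NumPy. It is a simplified illustration, not the implementation used by networkx: it assumes every node has at least one outgoing link and that every mentioned node also appears as a key. The `links` sample and the function name `pagerank_power_iteration` are hypothetical.

```python
import numpy as np

# Hypothetical four-node sample: each key links to (votes for) the listed nodes.
links = {
    'user_1': ['user_2', 'user_3', 'user_4'],
    'user_2': ['user_1', 'user_3'],
    'user_3': ['user_2'],
    'user_4': ['user_2', 'user_3'],
}


def pagerank_power_iteration(links, alpha=0.85, tol=1e-10, max_iter=200):
    """Simplified PageRank; assumes no node is without outgoing links."""
    nodes = sorted(links)
    idx = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)

    # M[j, i] is the share of node i's vote that goes to node j.
    M = np.zeros((n, n))
    for src, targets in links.items():
        for dst in targets:
            M[idx[dst], idx[src]] = 1.0 / len(targets)

    rank = np.full(n, 1.0 / n)  # start from a uniform distribution
    for _ in range(max_iter):
        # Each node keeps a small base score and collects damped votes.
        new_rank = (1 - alpha) / n + alpha * (M @ rank)
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return dict(zip(nodes, rank))


scores = pagerank_power_iteration(links)
for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(node, round(score, 3))
```

The damping factor `alpha` controls how much of a node's score comes from incoming votes rather than from the uniform base share.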

In [1]:
import pandas as pd
import numpy as np
import networkx as nx

np.set_printoptions(suppress=True,
   formatter={'float_kind':'{:0.3f}'.format})  # print floats with 3 decimal places

import scipy
scipy.__version__  # must be higher than 1.8.0

'1.10.0'

In [2]:
# Nodes can be integers
edges = pd.DataFrame({
        "source": [1, 1, 1, 2, 2, 3, 4, 4],
        "target": [2, 3, 4, 1, 3, 2, 2, 3],
        "weight": [1, 1, 1, 1, 1, 1, 1, 1],})

# Or text
edges = pd.DataFrame({
        "source": ['user_1', 'user_1', 'user_1', 'user_2', 
                   'user_2', 'user_3', 'user_4', 'user_4'],
        "target": ['user_2', 'user_3', 'user_4', 'user_1', 
                   'user_3', 'user_2', 'user_2', 'user_3'],
        "weight": [1, 1, 1, 1, 1, 1, 1, 1],})

edges

| | source | target | weight |
|---|---|---|---|
| 0 | user_1 | user_2 | 1 |
| 1 | user_1 | user_3 | 1 |
| 2 | user_1 | user_4 | 1 |
| 3 | user_2 | user_1 | 1 |
| 4 | user_2 | user_3 | 1 |
| 5 | user_3 | user_2 | 1 |
| 6 | user_4 | user_2 | 1 |
| 7 | user_4 | user_3 | 1 |


In [3]:
# Build a directed graph (nx.DiGraph also allows self-loops):

G = nx.from_pandas_edgelist(edges, 
                            edge_attr=True, 
                            create_using=nx.DiGraph)

In [4]:
for i in G: 
    print(G[i])

{'user_2': {'weight': 1}, 'user_3': {'weight': 1}, 'user_4': {'weight': 1}}
{'user_1': {'weight': 1}, 'user_3': {'weight': 1}}
{'user_2': {'weight': 1}}
{'user_2': {'weight': 1}, 'user_3': {'weight': 1}}


In [5]:
pr = nx.pagerank(G, 
                 alpha=0.85, 
                 max_iter=100)

for i in pr:
    print(i, round(pr[i], 3))

user_1 0.205
user_2 0.395
user_3 0.304
user_4 0.096


Let's verify the result with the PageRank simulator: 
https://computerscience.chemeketa.edu/cs160Reader/_static/pageRankApp/index.html


<div>
<img src="img/toy_ex.png" width="700"/>
</div>


In [6]:
# Let's check several cases:

d = {'user_1': ['user_2', 'user_3', 'user_19',  None], # Imagine this is the CEO:
                                                       # he mentions (links to) 2 of his C-levels (user_2, user_3) and 1 expert

      # C-levels
     'user_2': ['user_1', 'user_5', 'user_7',   None], # C-level 1, links to the CEO, his manager and 1 of his experts
     'user_3': ['user_1', 'user_14','user_13',  None], # C-level 2, links to the CEO and 2 of his experts
     'user_4': ['user_1', 'user_19','user_15',  None], # C-level 3, links to the CEO, 1 expert and his manager


      # Department of C-level 1
     'user_5': ['user_2', 'user_6',  'user_7',  None], # Manager of C-level 1, links to his C-level and 2 experts
     'user_6': ['user_1', 'user_9',  'user_2',  None], # Expert  of C-level 1, links to the CEO, 1 expert and his C-level
     'user_7': ['user_1', 'user_9',  None,      None], # Expert  of C-level 1, links to the CEO and 1 expert
     'user_8': ['user_5', None,      None,      None], # Expert  of C-level 1, links to his manager only
                                                       # === Never-mentioned user
     'user_9': ['user_5', 'user_6',  None,      None], # Expert  of C-level 1, links to his manager and 1 expert
                                                       # === Hidden expert, C-levels never mention him

      # Department of C-level 2
     'user_10': ['user_3', 'user_11',  None,    None], # Manager of C-level 2, links to his C-level and 1 expert
     'user_11': ['user_10','user_12', 'user_9', None], # Expert  of C-level 2, links to his manager and 2 experts
     'user_12': ['user_10','user_14', 'user_3', None], # Expert  of C-level 2, links to his manager, 1 expert and his C-level
     'user_13': ['user_10','user_14',  None,    None], # Expert  of C-level 2, links to his manager and 1 expert
     'user_14': ['user_10', 'user_9',  None,    None], # Expert  of C-level 2, links to his manager and (!) an expert of another department

      # Department of C-level 3
     'user_15': ['user_4','user_19',   None,    None], # Manager of C-level 3, links to his C-level and 1 expert
     'user_16': ['user_15','user_17',  None,    None], # Casual employee of C-level 3, links to his manager and 1 colleague
     'user_17': ['user_15','user_16',  None,    None], # Casual employee of C-level 3, links to his manager and 1 colleague
     'user_18': ['user_15', None ,     None,    None], # Casual employee of C-level 3, links to his manager only
     'user_19': ['user_18', 'user_17',
                                   'user_9','user_1']  # Expert  of C-level 3, links to the CEO and is linked by the CEO
                                                       # === Well-known expert case
    }

df = pd.DataFrame(data=d)
df.head()

| | user_1 | user_2 | user_3 | user_4 | user_5 | user_6 | user_7 | user_8 | user_9 | user_10 | user_11 | user_12 | user_13 | user_14 | user_15 | user_16 | user_17 | user_18 | user_19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_2 | user_1 | user_1 | user_1 | user_2 | user_1 | user_1 | user_5 | user_5 | user_3 | user_10 | user_10 | user_10 | user_10 | user_4 | user_15 | user_15 | user_15 | user_18 |
| 1 | user_3 | user_5 | user_14 | user_19 | user_6 | user_9 | user_9 | | user_6 | user_11 | user_12 | user_14 | user_14 | user_9 | user_19 | user_17 | user_16 | | user_17 |
| 2 | user_19 | user_7 | user_13 | user_15 | user_7 | user_2 | | | | | user_9 | user_3 | | | | | | | user_9 |
| 3 | | | | | | | | | | | | | | | | | | | user_1 |


In [7]:
# Transpose so that each row is one respondent and columns 0..3 hold the answers
df_source = df.T.sort_index().reset_index()

# Stack the four answer columns into one long (index, target) table.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
df_unioned = pd.concat(
    [df_source[['index', col]].rename(columns={col: 'target'})
     for col in range(4)],
    ignore_index=True)

# Drop empty answers, give every edge the same weight, name the source column
df_unioned = (df_unioned[df_unioned['target'].notna()]
              .assign(weight=1)
              .rename(columns={'index': 'source'}))

print('amount of edges: ', len(df_unioned))

df_unioned.head()

amount of edges:  46


| | source | target | weight |
|---|---|---|---|
| 0 | user_1 | user_2 | 1 |
| 1 | user_10 | user_3 | 1 |
| 2 | user_11 | user_10 | 1 |
| 3 | user_12 | user_10 | 1 |
| 4 | user_13 | user_10 | 1 |
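The same wide-to-long reshaping can also be written with `DataFrame.melt`. This is an alternative sketch on a shortened, hypothetical `answers` table, not the code the notebook uses; the row order differs from the cell above, which does not affect the graph.

```python
import pandas as pd

# Hypothetical, shortened answers table: one column per respondent,
# None marks an unused answer slot.
answers = pd.DataFrame({
    'user_1': ['user_2', 'user_3', 'user_19'],
    'user_2': ['user_1', 'user_5', None],
    'user_8': ['user_5', None, None],
})

edges = (answers
         .melt(var_name='source', value_name='target')  # wide -> long
         .dropna(subset=['target'])                     # drop empty slots
         .assign(weight=1)                              # equal weight per mention
         .reset_index(drop=True))

print(edges)
```

The resulting frame has the `source`, `target` and `weight` columns that `nx.from_pandas_edgelist` reads.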


In [8]:
G = nx.from_pandas_edgelist(df_unioned, source='source', 
                            target='target', 
                            edge_attr=True, create_using=nx.DiGraph)

pr = nx.pagerank(G, alpha=0.85, max_iter=100) 

df_pr = pd.DataFrame(data=pr, index=[0]).T
df_pr = df_pr.rename(columns={0:'weight'})
df_pr = df_pr.sort_values(by=['weight'], ascending=False)

df_pr.head(40)

| | weight |
|---|---|
| user_1 | 0.12017 |
| user_9 | 0.094091 |
| user_2 | 0.084014 |
| user_5 | 0.078397 |
| user_19 | 0.078248 |
| user_6 | 0.070095 |
| user_3 | 0.068164 |
| user_15 | 0.06246 |
| user_7 | 0.05391 |
| user_10 | 0.050861 |
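Since the goal is to rank experts, the managers and C-levels can be filtered out of the result. The sketch below does this on the scores shown above. The `experts` set is an assumption taken from the role comments in the test data, and only the ten printed scores are included.

```python
# Scores copied from the output above (only the ten printed rows).
pr = {
    'user_1': 0.12017, 'user_9': 0.094091, 'user_2': 0.084014,
    'user_5': 0.078397, 'user_19': 0.078248, 'user_6': 0.070095,
    'user_3': 0.068164, 'user_15': 0.06246, 'user_7': 0.05391,
    'user_10': 0.050861,
}

# Assumed role mapping, based on the comments in the test data.
experts = {'user_6', 'user_7', 'user_8', 'user_9',
           'user_11', 'user_12', 'user_13', 'user_14', 'user_19'}

# Keep experts only and sort by score, highest first.
expert_ranking = sorted(
    ((user, score) for user, score in pr.items() if user in experts),
    key=lambda item: item[1],
    reverse=True)

for user, score in expert_ranking:
    print(user, score)
```

In this sample the "hidden expert" user_9 ends up ahead of the "well-known expert" user_19, even though no C-level mentions user_9 directly.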


# Next steps: 

We can load the result of the algorithm into an external tool such as Gephi to draw a simple map in two-dimensional space: 

<div>
<img src="img/gephi_viz.png" width="850"/>
</div>

Line thickness and dot size are based on the expert's rank.
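One possible way to hand the graph over to Gephi is a GEXF file, which Gephi can open. The sketch below stores the score as a node attribute before exporting. It reuses the toy graph and its rounded scores from the first example; the attribute name `pagerank` and the output path are assumptions.

```python
import os
import tempfile

import networkx as nx

# Toy graph and rounded scores taken from the first example above.
G = nx.DiGraph()
G.add_edges_from([
    ('user_1', 'user_2'), ('user_1', 'user_3'), ('user_1', 'user_4'),
    ('user_2', 'user_1'), ('user_2', 'user_3'), ('user_3', 'user_2'),
    ('user_4', 'user_2'), ('user_4', 'user_3'),
], weight=1)
pr = {'user_1': 0.205, 'user_2': 0.395, 'user_3': 0.304, 'user_4': 0.096}

# Attach the score to every node so it can be mapped to node size in Gephi.
nx.set_node_attributes(G, pr, name='pagerank')

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), 'experts_rank.gexf')
nx.write_gexf(G, out_path)
print('written to', out_path)
```

After opening the file in Gephi, the `pagerank` attribute should be available for sizing or colouring nodes.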

### Useful links

* PageRank on wiki: 
https://en.wikipedia.org/wiki/PageRank

* PageRank simulator: 
https://computerscience.chemeketa.edu/cs160Reader/_static/pageRankApp/index.html

* Identifying Potential Managerial Personnel Using PageRank and Social Network Analysis: The Case Study of a European IT Company:
https://www.mdpi.com/2076-3417/11/15/6985

* Professional Evaluation. Application of the PageRank Algorithm in Employee Rating: http://archive.sciendo.com/EAM/eam.2014.15.issue-2/eam-2014-0015/eam-2014-0015.pdf

* Description of project idea: 
https://medium.com/exness-blog/from-employee-voices-to-expert-stars-uncovering-the-most-valuable-assets-in-organizations-7ae0654df092
