# Employee Network Analysis

## 💾 The data

The company has six months of data on inter-employee communication. For privacy reasons, only the sender, receiver, timestamp, and message length are available [(source)](https://snap.stanford.edu/data/CollegeMsg.html).

**Messages has information on the sender, receiver, and time.**
- "sender" - represents the employee id of the employee sending the message.
- "receiver" - represents the employee id of the employee receiving the message.
- "timestamp" - the date of the message.
- "message_length" - the length in words of the message.

**Employees has information on each employee:**
- "id" - represents the employee id of the employee.
- "department" - is the department within the company. 
- "location" - is the country where the employee lives.
- "age" - is the age of the employee.

_**Acknowledgments:** Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. "Patterns and dynamics of users' behavior and interaction: Network analysis of an online community." Journal of the American Society for Information Science and Technology 60.5 (2009): 911-932._

## 💪 Competition challenge

Create a report that covers the following:  
  1. Which departments are the most/least active?
  2. Which employee has the most connections? 
  3. Identify the most influential departments and employees.
  4. Using the network analysis, in which departments would you recommend the HR team focus to boost collaboration?

# Data Exploration

**Importing modules**

In [97]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
plt.style.use('ggplot')

**Reading Employee Network Data:**

In [98]:
import pandas as pd

messages = pd.read_csv('data/messages.csv', parse_dates= ['timestamp'])
messages

Unnamed: 0,sender,receiver,timestamp,message_length
0,79,48,2021-06-02 05:41:34,88
1,79,63,2021-06-02 05:42:15,72
2,79,58,2021-06-02 05:44:24,86
3,79,70,2021-06-02 05:49:07,26
4,79,109,2021-06-02 19:51:47,73
...,...,...,...,...
3507,469,1629,2021-11-24 05:04:57,75
3508,1487,1543,2021-11-26 00:39:43,25
3509,144,1713,2021-11-28 18:30:47,51
3510,1879,1520,2021-11-29 07:27:52,58


In [99]:
employees = pd.read_csv('data/employees.csv')
employees

Unnamed: 0,id,department,location,age
0,3,Operations,US,33
1,6,Sales,UK,50
2,8,IT,Brasil,54
3,9,Admin,UK,32
4,12,Operations,Brasil,51
...,...,...,...,...
659,1830,Admin,UK,42
660,1839,Admin,France,28
661,1879,Engineering,US,40
662,1881,Sales,Germany,57


## Data Understanding

1. Check the shape of the data.
2. Generate data overview.
3. Check for null values.
4. Check for duplicate rows.

In [100]:
#checking 'employee' shape
employees.head()
employees.shape

(664, 4)

In [101]:
#checking 'messages' shape
messages.head()
messages.shape

(3512, 4)

In [102]:
#'employees' and 'messages' information

In [103]:
messages.info()
employees.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3512 entries, 0 to 3511
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   sender          3512 non-null   int64         
 1   receiver        3512 non-null   int64         
 2   timestamp       3512 non-null   datetime64[ns]
 3   message_length  3512 non-null   int64         
dtypes: datetime64[ns](1), int64(3)
memory usage: 109.9 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 664 entries, 0 to 663
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          664 non-null    int64 
 1   department  664 non-null    object
 2   location    664 non-null    object
 3   age         664 non-null    int64 
dtypes: int64(2), object(2)
memory usage: 20.9+ KB


In [104]:
#getting the number of null values per column - 'messages'
messages.isnull().sum()

sender            0
receiver          0
timestamp         0
message_length    0
dtype: int64

In [105]:
#getting the number of null values per column - 'employees'
employees.isnull().sum()

id            0
department    0
location      0
age           0
dtype: int64

Observation: there are no null values in either of the two datasets.
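
The checklist above also mentions duplicate rows. A minimal sketch of how that check could look, assuming a small made-up frame with the same columns as `messages` (the `toy_messages` name is hypothetical):

```python
import pandas as pd

# Hypothetical toy frame standing in for `messages`; only four illustrative rows.
toy_messages = pd.DataFrame({
    "sender": [79, 79, 1879, 1879],
    "receiver": [48, 63, 1543, 1543],
    "timestamp": pd.to_datetime([
        "2021-06-02 05:41:34", "2021-06-02 05:42:15",
        "2021-11-29 07:37:49", "2021-11-29 07:37:49",
    ]),
    "message_length": [88, 72, 17, 56],
})

# Rows that repeat an earlier row in every column
full_duplicates = toy_messages.duplicated().sum()

# Rows that repeat an earlier sender/receiver/timestamp, whatever the length
key_duplicates = toy_messages.duplicated(
    subset=["sender", "receiver", "timestamp"]
).sum()

print(full_duplicates, key_duplicates)
```

Two messages between the same pair at the same second are not necessarily an error, so whether to drop such rows is a judgment call rather than something this check decides.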

# **Question 1:** Which department was the most active?


In [106]:
# attach the receiver's attributes to every message
employee_df = employees.rename(columns={'id': 'receiver'})
all_rows_receiver = pd.merge(messages, employee_df, on='receiver')
all_rows_receiver = all_rows_receiver.rename(columns={'department': 'receiver_department', 'location': 'receiver_location', 'age': 'receiver_age'})
# attach the sender's attributes in the same way
employee_df = employees.rename(columns={'id': 'sender'})
all_rows = pd.merge(all_rows_receiver, employee_df, on='sender')
all_rows = all_rows.rename(columns={'department': 'sender_department', 'location': 'sender_location', 'age': 'sender_age'})

all_rows

Unnamed: 0,sender,receiver,timestamp,message_length,receiver_department,receiver_location,receiver_age,sender_department,sender_location,sender_age
0,79,48,2021-06-02 05:41:34,88,IT,France,34,Sales,France,33
1,79,63,2021-06-02 05:42:15,72,Sales,France,38,Sales,France,33
2,79,58,2021-06-02 05:44:24,86,Sales,Germany,40,Sales,France,33
3,79,58,2021-06-03 01:12:11,37,Sales,Germany,40,Sales,France,33
4,79,70,2021-06-02 05:49:07,26,Operations,France,47,Sales,France,33
...,...,...,...,...,...,...,...,...,...,...
3507,1879,1520,2021-11-29 07:27:52,58,Admin,US,45,Engineering,US,40
3508,1879,1543,2021-11-29 07:37:49,17,Operations,US,48,Engineering,US,40
3509,1879,1543,2021-11-29 07:37:49,56,Operations,US,48,Engineering,US,40
3510,1802,1801,2021-10-06 05:43:06,56,Admin,Germany,26,Admin,Germany,46


There are 2 ways to measure activity: counting the messages sent in a given time period, or counting the total number of messages sent by each department. I shall start with the total number of messages sent by each department.
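
To make the two measures concrete, here is a small sketch on made-up rows (the `toy` frame and its values are assumptions for illustration; only the column names follow the merged table):

```python
import pandas as pd

# Hypothetical toy data: department names match the report, the rows are invented.
toy = pd.DataFrame({
    "sender_department": ["Sales", "Sales", "Admin", "Sales", "Admin"],
    "timestamp": pd.to_datetime([
        "2021-06-02", "2021-06-15", "2021-06-20", "2021-07-01", "2021-07-03",
    ]),
})

# Measure 1: total messages sent by each department over the whole period
total_sent = toy.groupby("sender_department").size().sort_values(ascending=False)

# Measure 2: messages sent by each department in each calendar month
month_label = toy["timestamp"].dt.strftime("%Y-%m")
per_month = (
    toy.groupby(["sender_department", month_label])
       .size()
       .unstack(fill_value=0)
)

print(total_sent)
print(per_month)
```

The first measure gives one number per department, while the second keeps the time dimension, so a department that was busy early and quiet later is visible.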

Total number of messages received:

In [107]:
all_rows['receiver'].count()

3512

## Ranking by total messages sent:

Total messages sent by each department:

In [108]:
messages_per_dept = all_rows.groupby('sender_department').count()
total_messages_per_dept = messages_per_dept.sort_values(by=['sender'], ascending = False)
total_messages_per_dept

Unnamed: 0_level_0,sender,receiver,timestamp,message_length,receiver_department,receiver_location,receiver_age,sender_location,sender_age
sender_department,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Sales,1551,1551,1551,1551,1551,1551,1551,1551,1551
Operations,1013,1013,1013,1013,1013,1013,1013,1013,1013
Admin,857,857,857,857,857,857,857,857,857
IT,49,49,49,49,49,49,49,49,49
Engineering,26,26,26,26,26,26,26,26,26
Marketing,16,16,16,16,16,16,16,16,16


## The department ranking (1 being most active):
1. 💸 Sales
2. 🏎 Operations
3. 👸🏽 Admin
4. 💻 IT 
5. 🛠 Engineering
6. 🔮 Marketing

## Ranking by total messages sent per user:

Total messages sent per user in each department:

In [109]:
senders_df = all_rows.groupby(['sender_department'], as_index = False)['timestamp'].count()
senders_df = senders_df.rename(columns={'timestamp': 'num_of_messages'})
employees_per_dept_df = employees.groupby(['department'], as_index = False)['id'].count()
employees_per_dept_df = employees_per_dept_df.rename(columns={'id': 'count_of_employees','department': 'sender_department'})

sent_per_user = pd.merge(senders_df, employees_per_dept_df, on='sender_department')
sent_per_user['sent_per_user'] = sent_per_user['num_of_messages'] /sent_per_user['count_of_employees']
sent_per_user = sent_per_user.sort_values(by=['sent_per_user'], ascending = False)

sent_per_user

Unnamed: 0,sender_department,num_of_messages,count_of_employees,sent_per_user
5,Sales,1551,161,9.63354
4,Operations,1013,134,7.559701
0,Admin,857,140,6.121429
2,IT,49,77,0.636364
3,Marketing,16,52,0.307692
1,Engineering,26,100,0.26


## Timeline of messages sent

Now let's see which department was most active in every given time period.

In [110]:
# finding the total time range of the dataset
all_rows.max(axis=0)['timestamp'] 

Timestamp('2021-11-29 07:37:49')

In [111]:
# finding the total time range of the dataset
all_rows.min(axis=0)['timestamp']

Timestamp('2021-06-02 05:41:34')

So the data was recorded between the 2nd of June and the 29th of November 2021. Next, I will look at the number of messages sent by each department per month, week and day, starting with month:

### Monthly trends

Let's see the monthly trend for each department:

In [112]:
# Drop rows with a missing timestamp, then derive month, ISO week and day of year.
# .copy() avoids pandas' SettingWithCopyWarning when the new columns are added.
rslt_df = all_rows.dropna(subset=['timestamp']).copy()
rslt_df['month_number'] = rslt_df['timestamp'].dt.month
# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week replaces it
rslt_df['week_number'] = rslt_df['timestamp'].dt.isocalendar().week
rslt_df['day_number'] = rslt_df['timestamp'].dt.strftime('%j')

rslt_df

Unnamed: 0,sender,receiver,timestamp,message_length,receiver_department,receiver_location,receiver_age,sender_department,sender_location,sender_age,month_number,week_number,day_number
0,79,48,2021-06-02 05:41:34,88,IT,France,34,Sales,France,33,6,22,153
1,79,63,2021-06-02 05:42:15,72,Sales,France,38,Sales,France,33,6,22,153
2,79,58,2021-06-02 05:44:24,86,Sales,Germany,40,Sales,France,33,6,22,153
3,79,58,2021-06-03 01:12:11,37,Sales,Germany,40,Sales,France,33,6,22,154
4,79,70,2021-06-02 05:49:07,26,Operations,France,47,Sales,France,33,6,22,153
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3507,1879,1520,2021-11-29 07:27:52,58,Admin,US,45,Engineering,US,40,11,48,333
3508,1879,1543,2021-11-29 07:37:49,17,Operations,US,48,Engineering,US,40,11,48,333
3509,1879,1543,2021-11-29 07:37:49,56,Operations,US,48,Engineering,US,40,11,48,333
3510,1802,1801,2021-10-06 05:43:06,56,Admin,Germany,26,Admin,Germany,46,10,40,279


In [113]:
data_viz_month = rslt_df.groupby(['sender_department','month_number'], as_index=False)['receiver'].count()
data_viz_week = rslt_df.groupby(['sender_department','week_number'], as_index=False)['receiver'].count()
data_viz_day = rslt_df.groupby(['sender_department','day_number'], as_index=False)['receiver'].count()


# previewing the daily message counts per department
data_viz_day

Unnamed: 0,sender_department,day_number,receiver
0,Admin,159,7
1,Admin,160,1
2,Admin,161,1
3,Admin,162,22
4,Admin,163,3
...,...,...,...
296,Sales,321,24
297,Sales,322,4
298,Sales,325,2
299,Sales,330,2


In [114]:
list_scode = list(set(data_viz_month['sender_department']))
list_district = list(set(data_viz_month['sender_department']))

import plotly.graph_objects as go

#extract color palette, the palette can be changed
pal = list(sns.color_palette(palette='viridis', n_colors=len(list_scode)).as_hex())

fig = go.Figure()
for d,p in zip(list_district, pal):
    fig.add_trace(go.Scatter(x = data_viz_month[data_viz_month['sender_department']==d]['month_number'],
                             y = data_viz_month[data_viz_month['sender_department']==d]['receiver'],
                             name = d,
                             line_color = p, 
                             fill=None))   #tozeroy
    
fig.update_layout(
    title="Number of Messages per Department per Month",
    xaxis_title="Month Number",
    yaxis_title="Number of Messages",
    legend_title="Legend"
    )

fig.show()

### Weekly trends

In [115]:
list_scode = list(set(data_viz_week['sender_department']))
list_district = list(set(data_viz_week['sender_department']))

import plotly.graph_objects as go

#extract color palette, the palette can be changed
pal = list(sns.color_palette(palette='viridis', n_colors=len(list_scode)).as_hex())

fig = go.Figure()
for d,p in zip(list_district, pal):
    fig.add_trace(go.Scatter(x = data_viz_week[data_viz_week['sender_department']==d]['week_number'],
                             y = data_viz_week[data_viz_week['sender_department']==d]['receiver'],
                             name = d,
                             line_color = p, 
                             fill=None))   #tozeroy
    
fig.update_layout(
    title="Number of Messages per Department per Week",
    xaxis_title="Week Number",
    yaxis_title="Number of Messages",
    legend_title="Legend"
    )

fig.show()

From the graphs above, we can see that the Sales department sent the most messages in most weeks and months. To take a closer look, click the department names in the legend on the right to show or hide individual trends. Another useful view is the average number of messages sent per person in each department, given in the per-user ranking above.

# Question 2: Which employee has the most connections?

I will identify the employee with the most connections using 2 metrics:
1. Most sent messages
2. Most received messages
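
As a compact alternative for the two metrics above, pandas' `value_counts` can do the counting directly. This is only a sketch on a made-up message log (the `toy` frame and its IDs are hypothetical):

```python
import pandas as pd

# Hypothetical toy message log; the employee IDs are illustrative only.
toy = pd.DataFrame({
    "sender":   [1, 1, 1, 2, 3, 3],
    "receiver": [2, 2, 3, 1, 1, 2],
})

# Number of messages sent / received per employee
sent_counts = toy["sender"].value_counts()
received_counts = toy["receiver"].value_counts()

# Employee ID with the largest count in each direction
top_sender = sent_counts.idxmax()
top_receiver = received_counts.idxmax()

print(top_sender, sent_counts.max())
print(top_receiver, received_counts.max())
```

Note that `idxmax` returns a single ID, so ties would need a separate check.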

In [116]:
employee = all_rows.receiver.unique()
number_of_letters_received = []
employee_num = []
receivers = list(all_rows['receiver'])
# count how many messages each distinct receiver got
for element in employee:
    count = 0
    for receiver_id in receivers:
        if receiver_id == element:
            count = count + 1
    number_of_letters_received.append(count)
    employee_num.append(element)

# keep only the employee(s) with the highest received count
most_connections_df = pd.DataFrame()
most_connections_df['number_of_letters_received'] = number_of_letters_received
most_connections_df['employee_num'] = employee_num
most_letters = most_connections_df['number_of_letters_received'].max()
most_connections_df[most_connections_df['number_of_letters_received'] == most_letters]

Unnamed: 0,number_of_letters_received,employee_num
30,60,281


## Employee with most received

📥 Employee with the most received messages

In [117]:
employees[employees['id'] == 281]

Unnamed: 0,id,department,location,age
110,281,Sales,France,35


In [118]:
employee = all_rows.sender.unique()
number_of_letters_sent = []
employee_num = []
senders = list(all_rows['sender'])
# count how many messages each distinct sender sent
for element in employee:
    count = 0
    for sender_id in senders:
        if sender_id == element:
            count = count + 1
    number_of_letters_sent.append(count)
    employee_num.append(element)

# keep only the employee(s) with the highest sent count
most_connections_df = pd.DataFrame()
most_connections_df['number_of_letters_sent'] = number_of_letters_sent
most_connections_df['employee_num'] = employee_num
most_letters = most_connections_df['number_of_letters_sent'].max()
most_connections_df[most_connections_df['number_of_letters_sent'] == most_letters]

Unnamed: 0,number_of_letters_sent,employee_num
13,459,605


## Employee with most sent

📤 Employee with the most sent messages

In [119]:
employees[employees['id'] == 605]

Unnamed: 0,id,department,location,age
280,605,Admin,France,31


# Question 3: Identify the most influential departments and employees

I define an 'influential department' using the following measures: the number of employees in the department, and the number of messages the department sends and receives, both in total and per employee. 
To find the most influential employees, I look for those with the highest number of messages sent and received, and the highest number of distinct people they sent messages to or received messages from.
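
As a rough sketch of how these per-employee measures could be gathered into one table, the snippet below uses a small made-up message log (the IDs and the `toy`, `sent`, `received` and `summary` names are illustrative assumptions, not part of the dataset):

```python
import pandas as pd

# Hypothetical toy message log; employee 4 only receives.
toy = pd.DataFrame({
    "sender":   [1, 1, 1, 2, 3, 3, 1],
    "receiver": [2, 2, 3, 1, 1, 2, 4],
})

# Per sender: total messages and number of distinct receivers
sent = toy.groupby("sender").agg(
    messages_sent=("receiver", "count"),
    distinct_receivers=("receiver", "nunique"),
)

# Per receiver: total messages and number of distinct senders
received = toy.groupby("receiver").agg(
    messages_received=("sender", "count"),
    distinct_senders=("sender", "nunique"),
)

# Outer join keeps employees who only send or only receive; gaps become 0
summary = sent.join(received, how="outer").fillna(0).astype(int)
summary.index.name = "employee"

print(summary)
```

Having all four measures side by side makes it easier to see whether an employee ranks highly because of sheer volume or because of the breadth of their contacts.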

## Influential Department by Number of Employees

In [120]:
employees_per_dept = employees.groupby(['department'], as_index = False)['id'].count()
employees_per_dept_sort = employees_per_dept.sort_values(by = ['id'], ascending = False)
employees_per_dept_sort = employees_per_dept_sort.rename(columns={'id': 'count_of_employees'})

employees_per_dept_sort

Unnamed: 0,department,count_of_employees
5,Sales,161
0,Admin,140
4,Operations,134
1,Engineering,100
2,IT,77
3,Marketing,52


## Ranking by total members
1. 💸 Sales
2. 👸🏽 Admin
3. 🏎 Operations
4. 🛠 Engineering
5. 💻 IT 
6. 🔮 Marketing

Visualising the number of employees from each department as a percentage of the total number of employees

In [121]:
import plotly.express as px
fig = px.pie(employees_per_dept_sort, values='count_of_employees', names='department', title='Number of Employees in Each Department as a % of Total')
fig.update_traces(pull=[0.1, 0, 0, 0])
fig.show()

## Ranking by number of received messages

In [122]:
received_dept = all_rows.groupby(['receiver_department'], as_index = False)['receiver'].count()
received_dept_sort = received_dept.sort_values(by = ['receiver'], ascending = False)
received_dept_sort = received_dept_sort.rename(columns={'receiver': 'number_received'})

received_dept_sort

Unnamed: 0,receiver_department,number_received
5,Sales,1229
4,Operations,845
0,Admin,797
1,Engineering,252
2,IT,249
3,Marketing,140


In [123]:
pie_chart = px.pie(received_dept_sort,
             names='receiver_department',
             values='number_received',
             title='Number of Messages Received per Department',
             template='gridon')
pie_chart.update_traces(pull=[0.1, 0, 0, 0])
pie_chart.show()

## Ranking by number of received messages per user

In [124]:
received_dept_sort = received_dept_sort.rename(columns={'receiver_department': 'department'})
df1 = pd.merge(received_dept_sort, employees_per_dept, on = 'department')
df1['received_per_user'] = df1['number_received']/df1['id']
df1

Unnamed: 0,department,number_received,id,received_per_user
0,Sales,1229,161,7.63354
1,Operations,845,134,6.30597
2,Admin,797,140,5.692857
3,Engineering,252,100,2.52
4,IT,249,77,3.233766
5,Marketing,140,52,2.692308


## Ranking by the number of sent messages

In [125]:
sender_dept = all_rows.groupby(['sender_department'], as_index = False)['sender'].count()
sent_dept_sort = sender_dept.sort_values(by = ['sender'], ascending = False)
sent_dept_sort = sent_dept_sort.rename(columns={'sender': 'number_sent'})

sent_dept_sort

Unnamed: 0,sender_department,number_sent
5,Sales,1551
4,Operations,1013
0,Admin,857
2,IT,49
1,Engineering,26
3,Marketing,16


In [126]:
pie_chart = px.pie(sent_dept_sort,
             names='sender_department',
             values='number_sent',
             title='Number of Messages Sent per Department',
             template='gridon')
pie_chart.update_traces(pull=[0.1, 0, 0, 0])
pie_chart.show()

## Ranking by the number of sent messages per user

In [127]:
sent_dept_sort = sent_dept_sort.rename(columns={'sender_department': 'department'})
df2 = pd.merge(sent_dept_sort, employees_per_dept, on = 'department')
df2['sent_per_user'] = df2['number_sent']/df2['id']

df2

Unnamed: 0,department,number_sent,id,sent_per_user
0,Sales,1551,161,9.63354
1,Operations,1013,134,7.559701
2,Admin,857,140,6.121429
3,IT,49,77,0.636364
4,Engineering,26,100,0.26
5,Marketing,16,52,0.307692


## Most influential users in each department

By the number of sent messages:

In [128]:
senders = all_rows.groupby(['sender', 'sender_department'], as_index = False)['timestamp'].count()
senders = senders.rename(columns={'timestamp': 'number_of_messages'})

senders

Unnamed: 0,sender,sender_department,number_of_messages
0,79,Sales,13
1,128,Sales,266
2,144,Sales,221
3,162,Sales,11
4,173,Sales,10
...,...,...,...
80,1800,Admin,4
81,1802,Admin,2
82,1807,Admin,16
83,1879,Engineering,4


Seeing the overall trend in sent messages:

In [129]:
import plotly.express as px
import plotly.graph_objects as go

# one bubble per sender, sized and positioned by the number of messages sent
fig = px.scatter(senders, x="number_of_messages", y="number_of_messages",
                 size="number_of_messages", color="sender_department",
                 hover_name="sender", log_x=True, size_max=60)

fig.show()

So the top senders for each department were:
1. 👸🏽 Admin 
User 605 with 459 sent messages

2. 💸 Sales 
User 128 with 266 sent messages

3. 🏎 Operations 
User 598 with 184 sent messages

4. 💻 IT 
User 221 with 23 sent messages

5. 🔮 Marketing
User 1500 with 12 sent messages

6. 🛠 Engineering
User 516 with 11 sent messages

By the number of received messages:

In [130]:
receivers = all_rows.groupby(['receiver', 'receiver_department'], as_index = False)['timestamp'].count()
receivers = receivers.rename(columns={'timestamp': 'number_of_messages'})

receivers

Unnamed: 0,receiver,receiver_department,number_of_messages
0,3,Operations,11
1,6,Sales,10
2,8,IT,1
3,9,Admin,22
4,12,Operations,12
...,...,...,...
612,1796,IT,2
613,1801,Admin,4
614,1830,Admin,2
615,1839,Admin,8


In [131]:
import plotly.express as px
import plotly.graph_objects as go

# one bubble per receiver, sized and positioned by the number of messages received
fig = px.scatter(receivers, x="number_of_messages", y="number_of_messages",
                 size="number_of_messages", color="receiver_department",
                 hover_name="receiver", log_x=True, size_max=60)

fig.show()

So the top receivers for each department were:
1. 💸 Sales 
User 281 with 60 received messages

2. 🏎 Operations 
User 704 with 54 received messages

3. 👸🏽 Admin 
User 454 with 46 received messages

4. 💻 IT 
User 48 with 15 received messages

5. 🛠 Engineering
User 191 with 12 received messages

6. 🔮 Marketing
User 1280 with 11 received messages

By the number of distinct users that messages were sent to:

In [132]:
senders_people = all_rows.groupby(['sender', 'sender_department'], as_index = False)['receiver'].nunique()
senders_people = senders_people.rename(columns={'receiver': 'number_of_distinct_messages'})

senders_people

Unnamed: 0,sender,sender_department,number_of_distinct_messages
0,79,Sales,11
1,128,Sales,71
2,144,Sales,75
3,162,Sales,5
4,173,Sales,8
...,...,...,...
80,1800,Admin,2
81,1802,Admin,1
82,1807,Admin,4
83,1879,Engineering,2


In [133]:
import plotly.express as px
import plotly.graph_objects as go

fig = px.scatter(senders_people, x="number_of_distinct_messages", y="number_of_distinct_messages",
	         size="number_of_distinct_messages", color="sender_department",
                 hover_name="sender", log_x=True, size_max=60)

fig.show()

So the top senders for each department were:
1. 🏎 Operations 
User 598 with messages sent to 77 distinct receivers

2. 💸 Sales 
User 144 with messages sent to 75 distinct receivers

3. 👸🏽 Admin 
User 605 with messages sent to 68 distinct receivers

4. 💻 IT 
User 221 with messages sent to 12 distinct receivers

5. 🛠 Engineering
User 516 with messages sent to 9 distinct receivers

6. 🔮 Marketing
User 150 with messages sent to 5 distinct receivers

By the number of distinct users that messages were received from:

In [134]:
receivers_people = all_rows.groupby(['receiver', 'receiver_department'], as_index = False)['sender'].nunique()
receivers_people = receivers_people.rename(columns={'sender': 'number_of_distinct_messages'})

receivers_people

Unnamed: 0,receiver,receiver_department,number_of_distinct_messages
0,3,Operations,5
1,6,Sales,3
2,8,IT,1
3,9,Admin,3
4,12,Operations,2
...,...,...,...
612,1796,IT,1
613,1801,Admin,2
614,1830,Admin,1
615,1839,Admin,1


In [135]:
import plotly.express as px
import plotly.graph_objects as go

fig = px.scatter(receivers_people, x="number_of_distinct_messages", y="number_of_distinct_messages",
	         size="number_of_distinct_messages", color="receiver_department",
                 hover_name="receiver", log_x=True, size_max=60)

fig.show()

So the top receivers for each department were:
1. 👸🏽 Admin 
User 194 with 13 received messages from distinct senders

2. 💸 Sales 
User 32 with 11 received messages from distinct senders

3. 💻 IT 
User 105 with 7 received messages from distinct senders

4. 🏎 Operations 
User 598 with 7 received messages from distinct senders

5. 🛠 Engineering
User 482 with 5 received messages from distinct senders

6. 🔮 Marketing
User 701 with 3 received messages from distinct senders

# **Question 4:** Using the network analysis, in which departments would you recommend the HR team focus to boost collaboration?

In [136]:
import plotly
import networkx as nx
from plotly.offline import iplot

A = list(rslt_df["sender"].unique())
B = list(rslt_df["receiver"].unique())
node_list = list(set(A+B))
G = nx.Graph()
for i in node_list:
    G.add_node(i)

for i,j in rslt_df.iterrows():
	G.add_edges_from([(j["sender"],j["receiver"])])

pos = nx.spring_layout(G, k=0.5, iterations=50)
for n, p in pos.items():
    G.nodes[n]['pos'] = p

edge_trace = go.Scatter(
    x=[],
    y=[],
    line=dict(width=0.5,color='#888'),
    hoverinfo='none',
    mode='lines')
for edge in G.edges():
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    edge_trace['x'] += tuple([x0, x1, None])
    edge_trace['y'] += tuple([y0, y1, None])
node_trace = go.Scatter(
    x=[],
    y=[],
    mode='markers',
    #hoverinfo='text',
    marker=dict(
        showscale=True,
        colorscale='RdBu',
        reversescale=True,
        color=[],
        size=15,
        colorbar=dict(
            thickness=10,
            title='Number of connections',
            xanchor='left',
            titleside='right'
        ),
        line=dict(width=0)))
for node in G.nodes():
    x, y = G.nodes[node]['pos']
    node_trace['x'] += tuple([x])
    node_trace['y'] += tuple([y])

for node, adjacencies in enumerate(G.adjacency()):
    node_trace['marker']['color']+=tuple([len(adjacencies[1])])
    
fig = go.Figure(data=[edge_trace, node_trace],
             layout=go.Layout(
                title='Employee connections ',
                titlefont=dict(size=16),
                showlegend=False,
                hovermode='closest',
                margin=dict(b=20,l=5,r=5,t=40),
                annotations=[ dict(
                    text="No. of connections",
                    showarrow=False,
                    xref="paper", yref="paper") ],
                xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                yaxis=dict(showgrid=False, zeroline=False, showticklabels=False)))
iplot(fig)

## In my opinion, the following groups should receive additional aid from the HR team:
1. **Marketing department** 
This department has the fewest employees (52) and the second-lowest number of messages sent per user (0.3). On average, its members are among the least collaborative compared with their peers across other departments. This is why the Marketing department would benefit from HR help to increase collaboration.

2. **Engineering department** 
This department has the lowest number of messages sent per user and is also one of the smaller departments, with 100 people (4th by size). Like the Marketing department, the Engineering department would benefit from HR help. The figures also suggest that, in general, the smaller a department is, the less active it is. This could simply be because employees have fewer colleagues in their department to message, but it could also be due to external factors, such as differences in when the communication system was rolled out.

3. **Marketing leaders (perhaps project managers)**
If we look at the leaders in each department, the Marketing department's top sender messaged the fewest distinct users (5), and its top receiver received messages from the fewest distinct senders (3) among the departmental leaders. This means that despite the small size of the Marketing department, its leaders are still not as active as their colleagues. Therefore, HR could run training specifically for the leaders within the Marketing department to boost collaboration. 

4. **Engineering leaders (perhaps head of engineering and project managers)**
The engineering department's leaders would also benefit from HR intervention, as they are currently in 5th place by the number of messages sent to/received from distinct users.

5. **All**
From the time series, we can see that collaboration has declined rapidly across all departments since August (month 8). This could be because there are currently fewer projects that require communication, or because of collaboration issues. In either case, HR should investigate this decline and act on it.
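
One way to look at collaboration between departments more directly is a department-to-department message table. The sketch below is an assumption about how that could be done with `pd.crosstab`, using made-up rows (the `toy`, `flows` and `internal_share` names are hypothetical; only the column names follow the merged table):

```python
import pandas as pd

# Hypothetical toy data: one row per message, with sender and receiver departments.
toy = pd.DataFrame({
    "sender_department":   ["Sales", "Sales", "Sales", "Admin", "Admin", "IT"],
    "receiver_department": ["Sales", "Admin", "IT", "Sales", "Admin", "Sales"],
})

# Rows: sending department, columns: receiving department, cells: message counts
flows = pd.crosstab(toy["sender_department"], toy["receiver_department"])

# Messages that stay inside the sending department (diagonal of the table)
within = pd.Series({
    dept: flows.loc[dept, dept]
    for dept in flows.index if dept in flows.columns
})

# Share of each department's sent messages that stay internal
internal_share = within / flows.sum(axis=1)

print(flows)
print(internal_share.round(2))
```

A department with a very high internal share, or with near-empty rows and columns, would be a candidate for the kind of HR attention recommended above.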