## Visualize your LinkedIn Network with Python

- LinkedIn is an awesome place to connect with all kinds of people from various backgrounds. It’s the place where networking happens.
- As an aspiring Data Scientist, I use LinkedIn to connect with Data Scientists around the world and learn more about the field: what their day-to-day looks like, what tools they use, and what problems they’re solving.

- I’ve been using LinkedIn for quite some time now and have connected with quite a few people. I’ve always been curious about what my connections “look” like, especially which companies they work at and what positions they hold.

- On LinkedIn, you only see a list of your connections, so it’s hard to visualize the entire network of your connections.

**What’s the best way we can visualize networks? Using graphs!**

Graphs are one of the most important data structures, and they are applied to many real-world problems. Social networks are one of them.
In our case, the network won’t be a complex one.
What we’ll be doing is creating a network that connects you to all the companies your connections work at.

### Download data

- First, we need the data.
- Here’s a step-by-step guide for getting a copy of your data on LinkedIn:
  - Click on your “Me” dropdown on the homepage.
  - Head over to “Settings & Privacy”.
  - Click on “Get a copy of your data”.

## Install Dependencies

In [None]:
!pip install pyjanitor pyvis --quiet

In [None]:
import pandas as pd
import janitor
import datetime

from IPython.display import display, HTML
from pyvis import network as net
import networkx as nx

In [None]:
data = pd.read_csv("Connections.csv", skiprows=2)

In [None]:
data.info()
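
The `skiprows=2` above assumes the export starts with a fixed number of preamble lines before the column names. If your file looks different, one option is to search for the header row instead of hard-coding the number. This is only a minimal sketch: `find_header_row` is a hypothetical helper, the sample text is made up, and it assumes the header line starts with `First Name` (consistent with the `first_name` column dropped later in this notebook).

```python
import io


def find_header_row(lines, first_column="First Name"):
    """Return the 0-based index of the first line starting with first_column."""
    for index, line in enumerate(lines):
        # Strip a possible byte-order mark before comparing.
        if line.lstrip("\ufeff").startswith(first_column):
            return index
    raise ValueError(f"no header row starting with {first_column!r} found")


# Made-up sample that mimics an export with a short preamble before the header.
sample = io.StringIO(
    "Notes:\n"
    "Some explanatory text from the export.\n"
    "\n"
    "First Name,Last Name,Email Address,Company,Position,Connected On\n"
    "Ada,Lovelace,,Analytical Engines,Engineer,01 Jan 2021\n"
)

header_row = find_header_row(sample)
print(header_row)
```

With a real file, the same helper could be called on an open file handle and its result passed to `pd.read_csv("Connections.csv", skiprows=header_row)`.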

## Cleaning Data

In [None]:
df = (
    data.clean_names() # remove spacing and capitalization
    .drop(columns=['first_name', 'last_name', 'email_address']) # drop for privacy
    .dropna(subset=['company', 'position']) # drop missing values in company and position
    .to_datetime('connected_on', format='%d %b %Y')
  )
df.head()
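
If you would rather not depend on pyjanitor, roughly the same cleaning can be written with plain pandas. The sketch below runs on a made-up two-row frame. The renaming step only lowercases names and swaps spaces for underscores, which is a simplification of what `clean_names()` does.

```python
import pandas as pd

# Made-up rows shaped like the LinkedIn export (all values are fake).
raw = pd.DataFrame(
    {
        "First Name": ["Ada", "Alan"],
        "Last Name": ["Lovelace", "Turing"],
        "Email Address": [None, None],
        "Company": ["Analytical Engines", None],
        "Position": ["Engineer", "Researcher"],
        "Connected On": ["01 Jan 2021", "15 Mar 2020"],
    }
)

cleaned = (
    raw.rename(columns=lambda name: name.strip().lower().replace(" ", "_"))
    .drop(columns=["first_name", "last_name", "email_address"])  # drop for privacy
    .dropna(subset=["company", "position"])  # the second row has no company
    .assign(
        connected_on=lambda d: pd.to_datetime(d["connected_on"], format="%d %b %Y")
    )
)
print(cleaned)
```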

## EDA

In [None]:
df['company'].value_counts().head(10).plot(kind="barh").invert_yaxis();

In [None]:
df['position'].value_counts().head(10).plot(kind="barh").invert_yaxis();

In [None]:
df['connected_on'].hist(xrot=35, bins=15);

#### Remove freelance and self-employed entries

In [None]:
pattern = "freelance|self-employed"
df = df[~df['company'].str.contains(pattern, case=False)]
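
To see what this filter keeps and drops, here is a small sketch on made-up company names. The `na=False` argument is an addition that is not in the cell above. It treats missing values as non-matches, which only matters if the column still contains NaN.

```python
import pandas as pd

# Made-up company names, including variants in different letter case.
companies = pd.Series(
    ["Acme Corp", "Freelance", "Self-Employed", "freelance designer", "Globex"]
)

pattern = "freelance|self-employed"
# case=False makes the regex match regardless of capitalization.
mask = companies.str.contains(pattern, case=False, na=False)
kept = companies[~mask].tolist()
print(kept)
```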

#### Aggregate sum of connections for companies

In [None]:
df_company = df['company'].value_counts().reset_index()
df_company.columns = ['company', 'count']
df_company = df_company.sort_values(by="count", ascending=False)
df_company.head(10)

#### Aggregate sum of connections for positions

In [None]:
df_position = df['position'].value_counts().reset_index()
df_position.columns = ['position', 'count']
df_position = df_position.sort_values(by="count", ascending=False)
df_position.head(10)

#### Example of simple network

In [None]:
nt = net.Network(notebook=True)

g = nx.Graph()
g.add_node(0, label = "root") # initialize yourself as central node
g.add_node(1, label = "Company 1", size=10, title="info1")
g.add_node(2, label = "Company 2", size=40, title="info2")
g.add_node(3, label = "Company 3", size=60, title="info3")
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(0, 3)

nt.from_nx(g)
nt.show('nodes.html')
# display(HTML('nodes.html'))

In [None]:
print(f"number of nodes: {g.number_of_nodes()}")
print(f"number of edges: {g.number_of_edges()}")
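
The example graph is a star: one central node connected to every other node, so n companies give n + 1 nodes and n edges. As a quick sanity check, networkx provides a `star_graph` generator that builds the same shape. This sketch uses integer node labels rather than company names.

```python
import networkx as nx

# star_graph(3) creates node 0 as the center plus three leaf nodes (1, 2, 3).
star = nx.star_graph(3)

print(star.number_of_nodes(), star.number_of_edges())
print(star.degree[0])  # the center is connected to every leaf
```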

In [None]:
for _, row in df_company.head(5).iterrows():
    print(row['company'] + "-" + str(row['count']))

In [None]:
print(df_company.shape)
df_company_reduced = df_company.loc[df_company['count']>=5]
print(df_company_reduced.shape)

In [None]:
print(df_position.shape)
df_position_reduced = df_position.loc[df_position['count']>=5]
print(df_position_reduced.shape)

In [None]:
# initialize graph
g = nx.Graph()
g.add_node('root') # initialize yourself as central node

# use iterrows to iterate through the data frame
for _, row in df_company_reduced.iterrows():
    # store company name and count
    company = row['company']
    count = int(row['count']) # cast to a plain int (numpy integers can trip up JSON serialization)

    title = f"<b>{company}</b> – {count}"
    # unique positions held at this company, sorted for a stable hover list
    positions = sorted(set(df.loc[df['company'] == company, 'position']))
    positions = ''.join(f'<li>{x}</li>' for x in positions)

    position_list = f"<ul>{positions}</ul>"
    hover_info = title + position_list

    g.add_node(company, size=count*2, title=hover_info, color='#3449eb')
    g.add_edge('root', company, color='grey')

# generate the graph
# notebook=True matches the simple example above so show() can render inline
nt = net.Network(height='700px', width='700px', bgcolor="black", font_color='white', notebook=True)
nt.from_nx(g)
nt.hrepulsion()
# more customization https://tinyurl.com/yf5lvvdm
nt.show('company_graph.html')
# display(HTML('company_graph.html'))
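
Company and position names go into the hover HTML as-is, so a name containing `&` or `<` could break the tooltip markup. One possible safeguard is to escape each value first. In this sketch, `build_hover_html` is a hypothetical helper (not part of pyvis) and the inputs are made up.

```python
from html import escape


def build_hover_html(company, count, positions):
    """Build the hover markup used for a company node, escaping user text."""
    title = f"<b>{escape(company)}</b> – {count}"
    # Deduplicate and sort so the list is stable between runs.
    items = "".join(f"<li>{escape(p)}</li>" for p in sorted(set(positions)))
    return f"{title}<ul>{items}</ul>"


hover = build_hover_html("R&D Labs", 2, ["Engineer", "Analyst", "Engineer"])
print(hover)
```

The result could then be passed as `title=hover` in `g.add_node(...)`.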

In [None]:
# initialize graph
g = nx.Graph()
g.add_node('root') # initialize yourself as central node

# use iterrows to iterate through the data frame
for _, row in df_position_reduced.iterrows():
    count = int(row['count']) # node size must be numeric, not a string
    position = row['position']

    g.add_node(position, size=count, color='#3449eb', title=str(count))
    g.add_edge('root', position, color='grey')

# generate the graph
# notebook=True matches the simple example above so show() can render inline
nt = net.Network(height='700px', width='700px', bgcolor="black", font_color='white', notebook=True)
nt.from_nx(g)
nt.hrepulsion()
# more customization https://tinyurl.com/yf5lvvdm
nt.show('position_graph.html')
# display(HTML('position_graph.html'))

Thank you