# LinkedIn Network Analysis
Author: [Richard Cornelius Suwandi](https://github.com/richardcsuwandi)

As an active user on [LinkedIn](https://www.linkedin.com/in/richardcsuwandi/) with more than 1,000 connections, I was curious about the statistics behind my network. In this project, I use exploratory analysis and data visualization to gain insights from my own LinkedIn data.

## Data Preparation
First, let's import the necessary libraries for this project:

In [None]:
# Import the libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

Next, we can load the data, which has already been downloaded as a `.csv` file. To download your own data, follow the instructions [here](https://www.linkedin.com/help/linkedin/answer/50191/downloading-your-account-data?lang=en).

Note: For privacy reasons, the data shown in this project may differ slightly from the original data.

In [None]:
# Load the data
df = pd.read_csv("../input/dataset/linkedin_data.csv")
df.head(10)

The DataFrame above displays only the first 10 rows, which are my 10 latest connections on LinkedIn. The `Connected On` column indicates the date on which I connected with that person.

## Date Connected

Let's take a closer look at the `Connected On` column. Before that, we need to convert the column to datetime format.

In [None]:
# Convert the 'Connected On' column to datetime format
df["Connected On"] = pd.to_datetime(df["Connected On"])
df["Connected On"]
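
If the export stores dates as text such as `26 Aug 2020` (an assumption about the file, so check your own export first), passing an explicit `format` keeps the parsing from guessing the day/month order. A minimal sketch on made-up rows, where `sample` is a hypothetical stand-in for `df`:

```python
import pandas as pd

# Hypothetical sample rows; the day-month-year text format is an assumption
sample = pd.DataFrame({"Connected On": ["26 Aug 2020", "25 Aug 2020", "03 Jan 2021"]})

# An explicit format avoids ambiguous day/month guessing
sample["Connected On"] = pd.to_datetime(sample["Connected On"], format="%d %b %Y")
print(sample["Connected On"].dt.strftime("%Y-%m-%d").tolist())
```

If the dates in your file look different, adjust the format string accordingly or fall back to the plain `pd.to_datetime` call used above.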

Now, we can visualize the number of connections on a given date using Plotly's line plot.

In [None]:
# Create a line plot to visualize the number of connections on a given date
fig1 = px.line(df.groupby(by="Connected On").count().reset_index(),
               x="Connected On",
               y="First Name",
               labels={"First Name": "Count"},
               title="My Connections")
fig1.show()

From the line plot above, we can see that there is a peak in the number of connections per day on 26 August 2020. It also seems that August 2020 is the period when I was the most active on LinkedIn.
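
The impression that one month stands out can also be checked numerically by counting connections per calendar month. The sketch below uses made-up dates, and `sample`, `monthly`, and `busiest_month` are hypothetical names; the same grouping should work on `df` once `Connected On` is in datetime format.

```python
import pandas as pd

# Hypothetical connection dates, for illustration only
dates = pd.to_datetime(["2020-08-26", "2020-08-26", "2020-08-27", "2020-09-01"])
sample = pd.DataFrame({"Connected On": dates})

# Count connections per calendar month
monthly = sample.groupby(sample["Connected On"].dt.to_period("M")).size()

# The month with the highest count
busiest_month = monthly.idxmax()
print(monthly)
print(busiest_month)
```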

## Company

> Which companies/organizations do the people in my network mainly come from?

To answer that question, we first need to group and sort the data by company.

In [None]:
# Group and sort the data by company 
df_by_company = df.groupby(by="Company").count().reset_index().sort_values(by="First Name", ascending=False).reset_index(drop=True)
df_by_company
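
As a side note, the grouping and sorting step can also be written with `value_counts`, which returns frequencies in descending order and leaves out missing values by default. This is an alternative sketch on made-up rows, not the approach used in this notebook; `sample` and `company_counts` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows; company names are made up
sample = pd.DataFrame({
    "First Name": ["A", "B", "C", "D"],
    "Company": ["Acme", "Acme", "Globex", None],
})

# value_counts sorts by frequency (descending) and skips missing values by default
company_counts = (
    sample["Company"]
    .value_counts()
    .rename_axis("Company")
    .reset_index(name="Count")
)
print(company_counts)
```

The result has one `Company` column and one `Count` column, so the plots do not need to relabel `First Name` as a count.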

Now that we have our data grouped and sorted by company, we can visualize it using Plotly's bar plot.

In [None]:
# Create a bar plot for the top companies
fig = px.bar(df_by_company[:20],
             x="Company",
             y="First Name",
             labels={"First Name": "Count"},
             title="Top 20 Companies/Organizations in my Connections")
fig

It worked just fine, but perhaps Plotly's [treemap](https://plotly.com/python/treemaps/) will do a better job of visualizing the companies in this case.

In [None]:
# Create a treemap for the top companies
fig = px.treemap(df_by_company[:100], path=["Company", "Position"],
                 values="First Name",
                 labels={"First Name": "Count"})
fig

Using the treemap above, it is easier to compare the proportion of one company/organization to the others. It looks like the largest proportion of my network comes from my university.
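
One caveat: after `groupby(by="Company").count()`, every remaining column, including `Position`, holds a count of non-null values rather than its original text, so the inner level of a `["Company", "Position"]` treemap may show numbers instead of job titles. If a true company-to-position breakdown is wanted, one option is to count rows per pair of columns. The sketch below uses made-up rows, and `sample` and `pairs` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows; names, companies, and positions are made up
sample = pd.DataFrame({
    "First Name": ["A", "B", "C", "D"],
    "Company": ["Acme", "Acme", "Acme", "Globex"],
    "Position": ["Data Scientist", "Data Scientist", "Engineer", "Analyst"],
})

# Count rows per (Company, Position) pair so both columns keep their labels
pairs = sample.groupby(["Company", "Position"]).size().reset_index(name="Count")
print(pairs)
```

A table shaped like `pairs` can then be passed to `px.treemap` with `path=["Company", "Position"]` and `values="Count"`.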

## Position

> What are the most common positions of people in my network?

To answer that question, we can create similar visualizations for the `Position` column.

In [None]:
# Group and sort the data by position 
df_by_position = df.groupby(by="Position").count().reset_index().sort_values(by="First Name", ascending=False).reset_index(drop=True)
df_by_position

In [None]:
# Create a bar plot for the top 20 positions
fig = px.bar(df_by_position[:20],
             x="Position",
             y="First Name",
             labels={"First Name": "Count"},
             title="Top 20 Positions in my Connections")
fig

In [None]:
# Create a treemap for the top 100 positions
fig = px.treemap(df_by_position[:100], path=["Position", "Company"],
                 values="First Name",
                 labels={"First Name": "Count"})
fig

Wow! I didn't expect to see that many data scientists in my network, followed by machine learning engineers and data analysts. It is great to know that the most common positions in my network are my target group for networking.

In [None]:
# Count all positions that contain 'Data Scientist' (missing values count as non-matches)
df["Position"].str.contains("Data Scientist", na=False).sum()
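
Substring matching is case-sensitive by default, so titles such as "Senior data scientist" would be missed by the count above. A minimal sketch of a case-insensitive variant on made-up titles, where `positions` and `mask` are hypothetical names:

```python
import pandas as pd

# Hypothetical positions, including a missing value
positions = pd.Series(["Data Scientist", "Senior data scientist", "Data Analyst", None])

# case=False ignores capitalization; na=False treats missing values as non-matches
mask = positions.str.contains("data scientist", case=False, na=False)

# Number of matches and their share of all rows
print(mask.sum(), mask.mean())
```

Because `mask` is boolean, `mask.mean()` gives the share of rows whose position matches, which can be easier to interpret than the raw count.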