## Download data

On LinkedIn, under ["Settings and Privacy"](https://www.linkedin.com/psettings/member-data), we request a copy of our data, selecting the "Connections" data. A few minutes later we receive an email with a link to download the CSV file with our data.

For this notebook, I modified the downloaded file before uploading it, removing the contacts' last names and email addresses.
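
One way to do this anonymization step with pandas is sketched below. It assumes the export names the personal columns `Last Name` and `Email Address` (check `df.columns` on your own file), and `anonymize` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd

# Column names assumed to hold personal data in the LinkedIn export;
# they may differ between export versions.
PERSONAL_COLUMNS = ("Last Name", "Email Address")


def anonymize(df: pd.DataFrame, columns=PERSONAL_COLUMNS) -> pd.DataFrame:
    """Return a copy of df without the personal-data columns that are present."""
    to_drop = [c for c in columns if c in df.columns]
    return df.drop(columns=to_drop)


# Tiny made-up sample with the same shape as the export
sample = pd.DataFrame({
    "First Name": ["Ana", "Luis"],
    "Last Name": ["García", "Pérez"],
    "Email Address": ["ana@example.com", None],
    "Company": ["Acme", "Globex"],
    "Position": ["Engineer", "Analyst"],
    "Connected On": ["15 Mar 2021", "02 Apr 2021"],
})

anonymized = anonymize(sample)
print(list(anonymized.columns))
# anonymized.to_csv("ConnectionsInigo.csv", index=False)  # write the cleaned file
```

Dropping whole columns keeps the remaining fields (`First Name`, `Company`, `Position`, `Connected On`) that the rest of the notebook uses.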

## Loading the data

We load the data into a DataFrame; `df.head(10)` shows the first records.

In [16]:
# Import the libraries used in the project
import pandas as pd
import plotly.express as px

In [17]:
# Load the connections file into a DataFrame
# (sep is passed by keyword: recent pandas versions reject it as a positional argument)
df = pd.read_csv("ConnectionsInigo.csv", sep=",")
# Show the first records
# df.head(10)

In [18]:
# Describe the DataFrame
#df.describe()

In [19]:
# Convert the "Connected On" column to datetime so it can be handled as a date
df['Connected On'] = pd.to_datetime(df['Connected On'])

In [20]:
# Add a year-month column to group the connections by month
# ("Connected On" is already a datetime column, so no second conversion is needed)
df['mes_año'] = df['Connected On'].dt.to_period('M').astype(str)
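
The two steps above can be checked on a few values. This sketch assumes the export writes dates like "15 Mar 2021" (verify against your own file); passing an explicit `format` avoids relying on pandas' format inference:

```python
import pandas as pd

# Made-up values; the "15 Mar 2021" layout is an assumption about the export
raw = pd.Series(["15 Mar 2021", "02 Apr 2021", "30 Apr 2021"])

# Explicit format; errors="coerce" turns unparseable values into NaT
# instead of raising an exception
dates = pd.to_datetime(raw, format="%d %b %Y", errors="coerce")

# A monthly Period drops the day, so dates in the same month share one label
months = dates.dt.to_period("M").astype(str)
print(months.tolist())  # ['2021-03', '2021-04', '2021-04']
```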

## Visualizations using Plotly Express

We are going to show several graphs to analyze the LinkedIn connections.

### 1. View connections by date

We start with a basic line chart of the number of connections per date.

In [21]:
# Plot the number of connections per date
fig1 = px.line(df.groupby(by="Connected On").count().reset_index(), x="Connected On", y="First Name",
               labels={"First Name": "No. of contacts"},
               template="plotly_dark",
               title="Number of connections per date")
fig1.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(count=2, label="2y", step="year", stepmode="backward"),
            dict(step="all")
        ]),
        bgcolor="#ff0000"
    )
)
fig1.show()

Next we use a histogram, customized with range-selector buttons and a range slider to make such a long period easier to explore. We also group the connections per month instead of per day.
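
Every date chart in this section repeats the same range-selector buttons. A minimal sketch of building that configuration once, with a hypothetical helper `range_selector_config` that returns a plain dict in the shape `update_xaxes` accepts for `rangeselector`:

```python
def range_selector_config(bgcolor="#ff0000"):
    """Build the rangeselector dict shared by the date charts in this notebook."""
    buttons = [
        dict(count=1, label="1m", step="month", stepmode="backward"),
        dict(count=6, label="6m", step="month", stepmode="backward"),
        dict(count=1, label="YTD", step="year", stepmode="todate"),
        dict(count=1, label="1y", step="year", stepmode="backward"),
        dict(count=2, label="2y", step="year", stepmode="backward"),
        dict(step="all"),  # show the whole period
    ]
    return dict(buttons=buttons, bgcolor=bgcolor)


config = range_selector_config()
print([b["label"] for b in config["buttons"] if "label" in b])
```

Each figure would then need a single call, for example `fig2.update_xaxes(rangeslider_visible=True, rangeselector=range_selector_config())`, and a change to the buttons only has to be made in one place.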

In [22]:
fig2 = px.histogram(df, x="Connected On", y="First Name", histfunc="count", title="Histogram",
                    template="plotly_dark")
fig2.update_traces(xbins_size="M1")
fig2.update_layout(bargap=0.1)
fig2.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(count=2, label="2y", step="year", stepmode="backward"),
            dict(step="all")
        ]),
        bgcolor="#ff0000"
    )
)
fig2.show()

Next we show the same histogram with cumulative counts, so each bar is the total number of connections up to that month.
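
The cumulative histogram is the running total of the monthly counts. A sketch of the same computation in pandas on made-up dates, which can help cross-check the chart (Plotly's bin edges may not line up exactly with calendar months):

```python
import pandas as pd

# Made-up connection dates
connected_on = pd.DatetimeIndex(pd.to_datetime([
    "2021-01-05", "2021-01-20", "2021-03-02", "2021-03-15", "2021-03-28",
]))

# One row per connection, binned by calendar month ("MS" = month start).
# Months without connections (February here) get a count of 0.
monthly = pd.Series(1, index=connected_on).resample("MS").sum()

# Running total: what the cumulative histogram shows for each month
cumulative = monthly.cumsum()
print(monthly.tolist())     # [2, 0, 3]
print(cumulative.tolist())  # [2, 2, 5]
```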

In [23]:
fig3 = px.histogram(df, x="Connected On", y="First Name", histfunc="count", title="Cumulative histogram", cumulative=True,
                    template="plotly_dark")
fig3.update_traces(xbins_size="M1")
fig3.update_layout(bargap=0.1)
fig3.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(count=2, label="2y", step="year", stepmode="backward"),
            dict(step="all")
        ]),
        bgcolor="#ff0000"
    )
)
fig3.show()

Finally, we add a bar chart of the connections per month, also with buttons to select the date range.
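
The bar chart counts connections per month with `count()` on the `First Name` column. `count()` only counts non-null values, while `size()` counts rows. A small sketch on made-up data showing the difference:

```python
import pandas as pd

# Made-up rows; one contact has no first name
df_demo = pd.DataFrame({
    "First Name": ["Ana", None, "Luis"],
    "mes_año": ["2021-03", "2021-03", "2021-04"],
})

# count() skips the missing first name
by_count = df_demo.groupby("mes_año")["First Name"].count()

# size() counts every row in each group
by_size = df_demo.groupby("mes_año").size()

print(by_count.tolist())  # [1, 1]
print(by_size.tolist())   # [2, 1]
```

If every connection should be counted regardless of missing fields, `df.groupby("mes_año").size().reset_index(name="Connections")` gives a frame that can be passed to `px.bar` with `y="Connections"`.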

In [24]:
# Plot the number of connections per month
fig4 = px.bar(df.groupby("mes_año").count().reset_index(), x="mes_año", y="First Name", text="First Name",
               labels={"First Name": "No. of contacts"},
               template="plotly_dark",
               title="Number of connections per month")
fig4.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(count=2, label="2y", step="year", stepmode="backward"),
            dict(step="all")
        ]),
        bgcolor="#ff0000"
    )
)
fig4.show()

### 2. View contacts by company

We show the contacts by their current company, using two types of charts.

In [25]:
# First, group the data by company and sort by number of contacts

In [26]:
df_by_company = df.groupby(by="Company").count().reset_index().sort_values(by="First Name", ascending=False).reset_index(drop=True)

The first chart to display is a bar chart of the top 20 companies. 
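
The code for this bar chart is not included in the notebook. A minimal sketch of how its data could be prepared, using a hypothetical helper `top_companies` (not the author's code) and made-up data:

```python
import pandas as pd


def top_companies(df: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Return the n companies with most contacts, as columns Company and Count."""
    # value_counts() sorts in descending order and excludes missing companies
    counts = df["Company"].value_counts().head(n)
    return counts.rename_axis("Company").reset_index(name="Count")


sample = pd.DataFrame(
    {"Company": ["Acme", "Globex", "Acme", None, "Initech", "Acme", "Globex"]}
)
top = top_companies(sample, n=2)
print(top)
```

The result could then be plotted with, for example, `px.bar(top_companies(df), x="Company", y="Count", text="Count", template="plotly_dark", title="Contacts by company")`.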

To get a different view of the same data, we also draw a treemap of the top 100 companies:

In [27]:
fig6 = px.treemap(df_by_company[:100], path=["Company"], values="First Name", hover_data=["Company", "First Name"],
                  color="First Name",
                  template="plotly_dark",
                  labels={"First Name": "No. of contacts"})
fig6.show()

We can also draw the companies with at least three contacts as a network centered on ourselves. This cell requires the `networkx` package (for example, `pip install networkx`).

In [28]:
%matplotlib inline
import matplotlib.pyplot as plt
import networkx as nx

# Initialize the graph
G = nx.Graph()

# Count the contacts per company, sorted in descending order
df_company = df["Company"].value_counts().reset_index()
df_company.columns = ["Company", "Count"]
df_company = df_company.sort_values(by="Count", ascending=False)

# Add yourself as the central node
G.add_node('me', color='red', size=1000, font_color='white')

# Keep only the companies with at least 3 contacts
df_company_reduced = df_company.loc[df_company['Count'] >= 3]

# Iterate through the companies and link each one to the central node
for _, row in df_company_reduced.iterrows():
    # Store company name and number of contacts
    company = row['Company']
    count = row['Count']

    # Hover text: company, count and the positions held there
    # (used by HTML-based viewers such as pyvis; matplotlib ignores it)
    title = f"<b>{company}</b>: {count}"
    positions = set(df.loc[df['Company'] == company, 'Position'])
    positions = ''.join('<li>{}</li>'.format(x) for x in positions)
    hover_info = title + f"<ul>{positions}</ul>"

    # Node size grows with the number of contacts
    G.add_node(company, color='red', size=count * 300, title=hover_info)
    G.add_edge('me', company, color='grey')

# Collect node colors and sizes in the same order as G.nodes()
node_color = [G.nodes[node]['color'] for node in G.nodes()]
node_size = [G.nodes[node]['size'] for node in G.nodes()]

# Draw the network diagram using the node colors and sizes collected above
fig = plt.figure(figsize=(10, 10))
plt.margins(x=0.1, y=0.1)
nx.draw(G, with_labels=True, node_color=node_color, node_size=node_size,
        edge_color='white', font_size=12, font_color="whitesmoke")
fig.set_facecolor("#00000F")
plt.show()

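
The hover text above inserts company names and positions directly into HTML, so a position such as "R&D <Lead>" could be misread as markup by an HTML-based viewer. A sketch of a hypothetical helper `hover_html` that escapes them with the standard library's `html.escape`:

```python
import html


def hover_html(company: str, count: int, positions) -> str:
    """Build the hover text for a company node, escaping HTML special characters."""
    # set() removes duplicates; key=str keeps sorting safe if a position is NaN
    items = "".join(
        f"<li>{html.escape(str(p))}</li>"
        for p in sorted(set(positions), key=str)
    )
    return f"<b>{html.escape(company)}</b>: {count}<ul>{items}</ul>"


print(hover_html("AT&T", 2, ["Engineer", "R&D <Lead>", "Engineer"]))
# <b>AT&amp;T</b>: 2<ul><li>Engineer</li><li>R&amp;D &lt;Lead&gt;</li></ul>
```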

### 3. Job analysis

Next, we analyze the positions of all the contacts.

In [None]:
# Group by position and sort by number of contacts
df_by_position = df.groupby(by="Position").count().reset_index().sort_values(by="First Name", ascending=False).reset_index(drop=True)
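
Grouping by the raw `Position` text treats "Software Engineer" and "software engineer " as different positions. If that matters for your data, a sketch of normalizing the titles before counting (an optional step, not part of the original analysis):

```python
import pandas as pd

positions = pd.Series(["Software Engineer", "software engineer ", "Data Analyst", None])

# Drop missing titles, trim spaces and lower-case so variants are grouped together
normalized = positions.dropna().str.strip().str.lower()
counts = normalized.value_counts()
print(counts.to_dict())  # {'software engineer': 2, 'data analyst': 1}
```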


Then we plot this information, starting with a bar chart of the 20 most common positions:

In [None]:
# Create and show the bar chart
fig7 = px.bar(df_by_position[:20],
              x="Position", y="First Name",
              labels={"First Name": "No. of contacts"},
              template="plotly_dark",
              text='First Name',
              hover_data=["Position", "First Name"], color="First Name", title="Contacts by position")

fig7.update_layout(xaxis_tickangle=-45, 
                  yaxis={'visible': True, 'showticklabels': True, 'showgrid': True},
                  font=dict(size=13),
                 )
fig7.show()

A treemap gives another view of the same data:

In [None]:
fig8 = px.treemap(df_by_position[:100], path=["Position"], values="First Name",
                  hover_data=["Position", "First Name"], color="First Name",
                  template="plotly_dark",
                  labels={"First Name": "No. of contacts"}, title="Contact map by position")
fig8.show()

### 4. Analysis of contact companies

We analyze the companies of the LinkedIn contacts again, this time in a treemap where each company is sized by the number of its contacts with a listed position.

In [None]:
# Group by company and sort by number of positions
df_by_Name = df.groupby(by="Company").count().sort_values(by="Position", ascending=False).reset_index()


In [None]:
# Show a treemap of the companies
fig9 = px.treemap(df_by_Name[:100], path=["Company"], values="Position",
                  hover_data=["Company", "Position"], color="Company",
                  template="plotly_dark",
                  labels={"Position": "No. of positions"},
                  title="Companies by positions")
fig9.show()

### 5. Word cloud

Finally, we build a word cloud from the company names of the contacts, so that the most frequent words appear largest.
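
WordCloud sizes each word by how often it appears in the text. To inspect those frequencies directly, a rough sketch with `collections.Counter` on made-up company names (its simple tokenization only approximates the one WordCloud uses):

```python
import re
from collections import Counter

import pandas as pd

companies = pd.Series(["Acme Inc", "Acme Labs", None, "Globex Inc"])

# dropna() avoids a spurious "nan" word for contacts without a company
text = " ".join(companies.dropna().astype(str))

stopwords = {"inc"}  # illustrative stopword list; extend as needed
words = [w for w in re.findall(r"\w+", text.lower()) if w not in stopwords]
print(Counter(words).most_common(2))  # [('acme', 2), ('labs', 1)]
```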

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Join all company names into one text; dropna() avoids a spurious "nan" word
text = ' '.join(df['Company'].dropna().astype(str))

# Optional: words to exclude from the cloud
# stopwords = ['Inc', 'Global', 'Ltd', 'Self', 'Services', 'hiring', 'Club', 'Training', 'Technologies', 'Pvt', 'Limited', 'Project', 'Career', 'Education', 'Solutions', 'LLC', 'Private', 'College', 'Wholesale', 'University', 'India', 'Group']
wrd = WordCloud(background_color='black', margin=0, collocations=False)

# for sw in stopwords:
#     wrd.stopwords.add(sw)
wordcloud = wrd.generate(text)

plt.figure(figsize=(30,15))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

Made with ❤️ by [Iñigo Jiménez](https://www.linkedin.com/in/inigojimenez) based on the project by [Sergio Pereira](https://www.linkedin.com/in/sergiopereiralema)