<a href="https://colab.research.google.com/github/rsidorchuk93/text/blob/main/Linkedin_my_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example of analysing my own LinkedIn data

You can request a copy of your data in your LinkedIn account settings, then download the zip archive with your data files and upload it to Google Drive for analysis.
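If the archive is uploaded as a zip file, it has to be unpacked before the CSV files can be read. A minimal sketch, assuming a standard zip archive; the helper name `extract_export` and the demo file names are illustrative and not part of the original notebook:

```python
import os
import tempfile
import zipfile


def extract_export(zip_path, dest_dir):
    """Unpack an export archive and return the sorted names found in dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Demo on a tiny stand-in archive built on the fly (not real export data).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "export.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("Connections.csv", "First Name,Last Name\n")
        archive.writestr("messages.csv", "FROM,TO\n")
    demo_files = extract_export(demo_zip, os.path.join(tmp, "unpacked"))

print(demo_files)  # ['Connections.csv', 'messages.csv']
```

In Colab, `zip_path` and `dest_dir` would point to locations under the mounted Drive folder.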

In [3]:
# importing libraries
import pandas as pd
from google.colab import drive
import os

In [6]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
# define folder address
folder = "/content/drive/My Drive/my_linkedin_data/Basic_LinkedInDataExport_04-01-2023/"

In [7]:
# list files in the folder
os.listdir(folder)

['Registration.csv',
 'Profile.csv',
 'Recommendations_Received.csv',
 'Contacts.csv',
 'Connections.csv',
 'messages.csv']

In [17]:
# Get size of each file
# Get a list of files in the folder
files = os.listdir(folder)

# Loop over the files and get their sizes
for file in files:
    file_path = os.path.join(folder, file)
    file_size = os.path.getsize(file_path)
    print(f'{file}: {file_size} bytes')

Registration.csv: 113 bytes
Profile.csv: 852 bytes
Recommendations_Received.csv: 518 bytes
Contacts.csv: 268176 bytes
Connections.csv: 484849 bytes
messages.csv: 8390482 bytes
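The byte counts above are easier to compare in larger units. A small sketch of a formatter; `format_size` is a hypothetical helper (not from the notebook) and uses binary units (1 KiB = 1024 bytes):

```python
def format_size(num_bytes):
    """Return a human-readable size string using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit where the value fits, or at the largest unit.
        if size < 1024 or unit == "GiB":
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


# Sizes taken from the listing above.
print(format_size(113))      # 113 B
print(format_size(484849))   # 473.5 KiB
print(format_size(8390482))  # 8.0 MiB
```

Most of the export is therefore `messages.csv`, at about 8 MiB.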


In [18]:
# first look at connections
# skip the first 3 rows, which hold a data description rather than table rows
Connections = pd.read_csv(os.path.join(folder, 'Connections.csv'), skiprows=3)
Connections

Unnamed: 0,First Name,Last Name,Email Address,Company,Position,Connected On
0,Jey,Riabchuk,,Evisort,Staff DevOps Engineer,01 Apr 2023
1,Trevor,Braun,,Instacart,Senior Data Scientist,01 Apr 2023
2,Jack,Qiao,,"Lowe's Companies, Inc.",Senior Manager Data Science & AI,31 Mar 2023
3,Harsh,Verma,,Microsoft,Applied Scientist II,31 Mar 2023
4,Wei,He,,Outreach,Software Engineer,31 Mar 2023
...,...,...,...,...,...,...
6731,Kozlova,Anna,,Mazars Russia,Recruiter & HR Assistant,14 Jun 2011
6732,Mélanie,"De Baets ""She/her""",,Fasttrack International,Senior Business Consultant,14 Jun 2011
6733,Daria,Brehm,,Concentrix,Sr. Sales Quality Assurance Specialist - Socia...,22 May 2011
6734,Lilia,Bikbova,,esprezo.,"Co-owner, communication coach at esprezo",01 May 2011


**I have ~7K contacts on LinkedIn, added over the last 12 years (2011-2023). LinkedIn exports only each contact's employer and title; profile description, location, and contact info are not included, so detailed analytics is not possible.**
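Even with only company and connection date available, a couple of simple summaries are possible. The sketch below uses a handful of rows copied from the output above as a stand-in for the full `Connections` table. It assumes the `Connected On` values follow the `%d %b %Y` pattern seen in the displayed rows and that English month abbreviations are parsed under the default locale:

```python
import pandas as pd

# Small stand-in for the Connections table (rows copied from the output above).
sample = pd.DataFrame({
    "Company": ["Evisort", "Instacart", "Lowe's Companies, Inc.", "Microsoft",
                "Outreach", "Mazars Russia", "Concentrix"],
    "Connected On": ["01 Apr 2023", "01 Apr 2023", "31 Mar 2023", "31 Mar 2023",
                     "31 Mar 2023", "14 Jun 2011", "22 May 2011"],
})

# Parse the date strings so they can be grouped by year.
sample["Connected On"] = pd.to_datetime(sample["Connected On"], format="%d %b %Y")

# New connections per calendar year, in chronological order.
per_year = sample["Connected On"].dt.year.value_counts().sort_index()

# Number of connections per employer, most common first.
company_counts = sample["Company"].value_counts()

print(per_year)
print(company_counts.head())
```

On the full table the same two lines would show in which years the network grew fastest and which employers are most represented.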

In [35]:
# messages data
Messages = pd.read_csv(folder + 'messages.csv', parse_dates=['DATE'])
Messages.sort_values(by='DATE')

Unnamed: 0,CONVERSATION ID,CONVERSATION TITLE,FROM,SENDER PROFILE URL,TO,RECIPIENT PROFILE URLS,DATE,SUBJECT,CONTENT,FOLDER
21239,2-YjQ4MWRhMTYtODFiYi01MDI1LWE3ZDgtOWZjNjE4MWQ4...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Anna Morozova,https://www.linkedin.com/in/amorozova,2011-10-17 18:32:11+00:00,"Interview in McKinsey on Wednesday, 19.10","Dear Anna, I would be grateful to you if yo...",INBOX
21238,2-YjQ4MWRhMTYtODFiYi01MDI1LWE3ZDgtOWZjNjE4MWQ4...,,Anna Morozova,https://www.linkedin.com/in/amorozova,Roman Sidorchuk,https://www.linkedin.com/in/luoman,2011-10-18 13:30:39+00:00,"RE: Interview in McKinsey on Wednesday, 19.10","Рома, привет! официально порекомендовать не мо...",INBOX
21327,2-ZjllMGFkZmItOWI1MC01ZWEzLWIzM2ItZjI3NTNiZWU1...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,LinkedIn Member,,2012-07-16 14:53:41+00:00,,Ты в Росатом идёшь случайно не в ОИК? У меня з...,INBOX
21326,2-ZjllMGFkZmItOWI1MC01ZWEzLWIzM2ItZjI3NTNiZWU1...,,LinkedIn Member,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,2012-07-17 21:13:27+00:00,,"Привет, Нет не в ОИК, но если есть возможно...",INBOX
21325,2-ZjllMGFkZmItOWI1MC01ZWEzLWIzM2ItZjI3NTNiZWU1...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,LinkedIn Member,,2012-07-18 20:00:15+00:00,,"Я поговорил с моим знакомым - он сказал, что в...",INBOX
...,...,...,...,...,...,...,...,...,...,...
4,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,Roman Sidorchuk,https://www.linkedin.com/in/luoman,2023-03-31 23:16:06+00:00,,Thank you for joining my network and see you a...,INBOX
3,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,2023-03-31 23:54:43+00:00,,"I actually organize DS/ML community in Miami, ...",INBOX
2,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,2023-03-31 23:55:35+00:00,,we also have telegram channel - the link is in...,INBOX
1,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,2023-03-31 23:56:32+00:00,,"I like it overall, quality of life better for ...",INBOX


I have ~21K messages (both received and sent), which works out to about 3 messages per contact on average.
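The average can be checked with a one-line calculation. This sketch assumes the rounded figure of 21,000 messages from the text and 6,735 connections (the displayed Connections index runs from 0 to 6734); `messages_per_contact` is a hypothetical helper name:

```python
def messages_per_contact(n_messages, n_contacts):
    """Average number of messages per contact."""
    return n_messages / n_contacts


approx = messages_per_contact(21_000, 6_735)
print(round(approx, 1))  # 3.1
```

With the loaded tables, the same ratio would be `len(Messages) / len(Connections)`.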

In [36]:
# keep only the messages I sent
Messages_sent = Messages[Messages["FROM"] == "Roman Sidorchuk"]
Messages_sent.sort_values(by='DATE')

Unnamed: 0,CONVERSATION ID,CONVERSATION TITLE,FROM,SENDER PROFILE URL,TO,RECIPIENT PROFILE URLS,DATE,SUBJECT,CONTENT,FOLDER
21239,2-YjQ4MWRhMTYtODFiYi01MDI1LWE3ZDgtOWZjNjE4MWQ4...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Anna Morozova,https://www.linkedin.com/in/amorozova,2011-10-17 18:32:11+00:00,"Interview in McKinsey on Wednesday, 19.10","Dear Anna, I would be grateful to you if yo...",INBOX
21327,2-ZjllMGFkZmItOWI1MC01ZWEzLWIzM2ItZjI3NTNiZWU1...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,LinkedIn Member,,2012-07-16 14:53:41+00:00,,Ты в Росатом идёшь случайно не в ОИК? У меня з...,INBOX
21325,2-ZjllMGFkZmItOWI1MC01ZWEzLWIzM2ItZjI3NTNiZWU1...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,LinkedIn Member,,2012-07-18 20:00:15+00:00,,"Я поговорил с моим знакомым - он сказал, что в...",INBOX
21320,2-ZjllMGFkZmItOWI1MC01ZWEzLWIzM2ItZjI3NTNiZWU1...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,LinkedIn Member,,2012-07-20 21:06:09+00:00,,"Спасибо, Света! То есть в тесте половины пр...",INBOX
21129,2-MGJhNmRhMTQtYzdiNC01ZjI2LWI0YmYtNjk4ZGU5MWQ2...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Dmitry Polukhin,https://www.linkedin.com/in/dmitrypolukhin,2012-07-30 09:45:10+00:00,,"Привет, Дима! Хотел задать тебе несколько воп...",INBOX
...,...,...,...,...,...,...,...,...,...,...
6,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,2023-03-31 20:46:12+00:00,,Nice to meet you Jack,INBOX
3,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,2023-03-31 23:54:43+00:00,,"I actually organize DS/ML community in Miami, ...",INBOX
2,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,2023-03-31 23:55:35+00:00,,we also have telegram channel - the link is in...,INBOX
1,2-MTc2ZDc1MzktMjFiMi00YzFjLTg3NzgtYzFhYWZmNzM0...,,Roman Sidorchuk,https://www.linkedin.com/in/luoman,Jack Qiao,https://www.linkedin.com/in/jack-qiao-95908b149,2023-03-31 23:56:32+00:00,,"I like it overall, quality of life better for ...",INBOX


I sent ~14K messages, roughly two thirds of the total.
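The sent/received split can be computed from a single boolean mask. A minimal sketch on a toy table; the names and the `MY_NAME` placeholder are made up for illustration and stand in for the real `FROM` column:

```python
import pandas as pd

MY_NAME = "Me"  # hypothetical placeholder for the account owner's name

# Toy stand-in for the Messages table.
toy = pd.DataFrame({
    "FROM": ["Me", "Alice", "Me", "Me", "Bob", "Me"],
    "TO":   ["Alice", "Me", "Bob", "Alice", "Me", "Bob"],
})

is_sent = toy["FROM"].eq(MY_NAME)  # True where I am the sender
sent = toy[is_sent]
received = toy[~is_sent]
sent_share = is_sent.mean()        # fraction of all messages that I sent

print(len(sent), len(received), round(sent_share, 2))  # 4 2 0.67
```

Taking the mean of a boolean Series gives the share of `True` values, so no separate division is needed.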

In [44]:
# number of messages sent to each recipient, in ascending order
Messages_sent.groupby('TO').size().sort_values()

TO
 Enzo di Taranto                    1
Lena Chukhno🇺🇦                      1
Leendert (Leonard) Bottelberghs     1
Leanne Bradley                      1
Lauren Sheehan                      1
                                   ..
Elliott Star                       52
Oleg Novikov                       54
Elena Sokolova, PhD                61
Olga Boldarieva                    74
Masha Kahn                         74
Length: 3098, dtype: int64

I sent messages to ~3K contacts (3,098 distinct recipients), with at most 74 messages to a single contact.
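One detail in the output above: at least one recipient name starts with a space, so the same person could be counted under several spellings. A hedged sketch of normalising names before counting, using made-up names rather than the real data:

```python
import pandas as pd

# Toy stand-in for Messages_sent; note the stray spaces around "Alice".
toy_sent = pd.DataFrame({"TO": [" Alice", "Alice", "Bob", "Alice ", "Carol", "Bob"]})

# Counting raw strings treats " Alice", "Alice" and "Alice " as different people.
raw_counts = toy_sent["TO"].value_counts()

# Stripping surrounding whitespace first merges them into one recipient.
clean_counts = toy_sent["TO"].str.strip().value_counts()

print(len(raw_counts))         # 5
print(clean_counts.to_dict())  # {'Alice': 3, 'Bob': 2, 'Carol': 1}
```

`value_counts()` sorts in descending order by default, so `clean_counts.head()` lists the most frequent recipients directly. Whether whitespace actually splits any recipients in the real export is not known from the output shown.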

In [58]:
# messages sent per calendar month, busiest months first
Messages_sent.set_index('DATE').resample('MS').size().sort_values(ascending=False)

DATE
2023-02-01 00:00:00+00:00    1396
2017-09-01 00:00:00+00:00     830
2019-03-01 00:00:00+00:00     650
2017-11-01 00:00:00+00:00     508
2023-03-01 00:00:00+00:00     448
                             ... 
2014-09-01 00:00:00+00:00       0
2014-12-01 00:00:00+00:00       0
2015-03-01 00:00:00+00:00       0
2015-04-01 00:00:00+00:00       0
2013-12-01 00:00:00+00:00       0
Length: 139, dtype: int64

In months with a very active job search I sent a lot of messages; the record was ~1.4K messages (1,396) in Feb 2023.
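The monthly counts come from resampling on the timestamp index: `resample('MS')` creates one bin per calendar month, labelled by the month start, and fills months without messages with 0. A small sketch on made-up UTC timestamps (not the real data) showing how to pick the busiest month and count the empty ones:

```python
import pandas as pd

# Toy timestamps standing in for the DATE column (timezone-aware, like the export).
toy_dates = pd.to_datetime(
    ["2023-01-15", "2023-02-01", "2023-02-10", "2023-02-28", "2023-04-05"],
    utc=True,
)
toy = pd.DataFrame({"DATE": toy_dates})

# One row per calendar month, including months with no messages.
monthly = toy.set_index("DATE").resample("MS").size()

busiest = monthly.idxmax()                   # timestamp of the month with most messages
empty_months = int((monthly == 0).sum())     # months with no messages at all

print(monthly.tolist())                      # [1, 3, 0, 1]
print(busiest.strftime("%b %Y"), monthly.max())
print(empty_months)                          # 1
```

Keeping the series in chronological order (instead of sorting by count) also makes it suitable for a line or bar plot of activity over time.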