<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# LinkedIn - Get posts stats from profile
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/LinkedIn/LinkedIn_Get_posts_stats_from_profile.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/open_in_naas.svg"/></a>

**Tags:** #linkedin #profile #post #stats #naas_drivers #content #automation #csv

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

With this notebook, you can get post stats from any LinkedIn profile.<br>
A dataframe will be returned and saved locally as a CSV file.<br><br>
**Available columns:**
- **ACTIVITY_ID:** Post unique ID.
- **PAGINATION_TOKEN:** Token used to decode published date.
- **PUBLISHED_DATE:** When the post has been published.
- **AUTHOR_NAME:** Name of post author.
- **SUBDESCRIPTION:** Subdescription of post (Time since published).
- **TITLE:** First sentence of post.
- **TEXT:** Content of post.
- **CHARACTER_COUNT:** Number of characters in the post.  
- **TAGS:** List of the hashtags. 
- **TAGS_COUNT:** Number of hashtags.
- **EMOJIS:** List of emojis.
- **EMOJIS_COUNT:** Number of emojis.
- **LINKS:** Links used in post.
- **LINKS_COUNT:** Number of links.
- **PROFILE_MENTION:** People mentioned in post. 
- **COMPANY_MENTION:** Companies mentioned in post.
- **CONTENT:** Type of content.
- **CONTENT_TITLE:** Title of content.
- **CONTENT_URL:** URL of content.
- **CONTENT_ID:** ID of content.
- **IMAGE_URL:** Image URL linked in post.
- **POLL_ID:** Poll unique ID.
- **POLL_QUESTION:** Poll question.
- **POLL_RESULTS:** Poll results.
- **POST_URL:** Post URL.
- **VIEWS:** Number of people who saw the content (only available for your own posts).
- **COMMENTS:** Number of people who wrote something in the comment section.
- **LIKES:** Number of people who pushed the like (or other reaction) button.
- **SHARES:** Number of people who shared the content.
- **ENGAGEMENT_SCORE:** Ratio between interactions (likes and comments) and views (0 if you are not the author of the post).
- **DATE_EXTRACT:** Date of last extraction.
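
To make the `ENGAGEMENT_SCORE` column more concrete, here is a minimal sketch of how such a ratio could be computed. The exact formula used by `naas_drivers` is not stated in this notebook, so the function `engagement_score` and the formula (likes plus comments, divided by views) are assumptions made for illustration only.

```python
import pandas as pd


def engagement_score(views: int, likes: int, comments: int) -> float:
    """Hypothetical helper: interactions divided by views.

    Returns 0.0 when views are unavailable (e.g. you are not the author).
    """
    if not views:
        return 0.0
    return (likes + comments) / views


# Toy data, not real LinkedIn stats
df_example = pd.DataFrame(
    {"VIEWS": [200, 0], "LIKES": [15, 4], "COMMENTS": [5, 1]}
)
df_example["ENGAGEMENT_SCORE"] = [
    engagement_score(v, l, c)
    for v, l, c in zip(df_example["VIEWS"], df_example["LIKES"], df_example["COMMENTS"])
]
print(df_example)
```

The second row shows the fallback described above: without views, the score stays at 0.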

## Input

### Import libraries

In [None]:
from naas_drivers import linkedin
import pandas as pd
from datetime import datetime
import naas

### Setup LinkedIn
<a href='https://www.notion.so/LinkedIn-driver-Get-your-cookies-d20a8e7e508e42af8a5b52e33f3dba75'>How to get your cookies?</a>

In [None]:
# LinkedIn cookies
LI_AT = "ENTER_YOUR_COOKIE_HERE" # EXAMPLE : "AQFAzQN_PLPR4wAAAXc-FCKmgiMit5FLdY1af3-2"
JSESSIONID = "ENTER_YOUR_JSESSIONID_HERE" # EXAMPLE : "ajax:8379907400220387585"

# LinkedIn profile url
PROFILE_URL = "ENTER_YOUR_LINKEDIN_PROFILE_HERE" # EXAMPLE "https://www.linkedin.com/in/myprofile/"

# On the first execution, all posts will be retrieved.
# After that, use the parameter below to set the number of posts to retrieve from the LinkedIn API every time this notebook runs.
NO_POSTS_RETRIEVED = 10

### Setup Outputs
Create a CSV file to store your post stats.<br>
PS: This CSV can be reused in other LinkedIn templates.

In [None]:
# Custom path of your CSV with the profile URL
profile_id = PROFILE_URL.split("https://www.linkedin.com/in/")[-1].split("/")[0]
csv_output = f"LINKEDIN_POSTS_{profile_id}.csv"
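
As an illustration of the split logic above, the sketch below wraps it in a helper and applies it to the example URL from the setup cell. The name `extract_profile_id` is hypothetical and not part of the notebook or of `naas_drivers`.

```python
def extract_profile_id(profile_url: str) -> str:
    # Same split logic as the cell above: drop the URL prefix,
    # then keep everything before the next "/"
    return profile_url.split("https://www.linkedin.com/in/")[-1].split("/")[0]


example_url = "https://www.linkedin.com/in/myprofile/"
example_id = extract_profile_id(example_url)
example_csv = f"LINKEDIN_POSTS_{example_id}.csv"
print(example_csv)  # LINKEDIN_POSTS_myprofile.csv
```

The trailing slash is optional: a URL without it yields the same profile ID.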

### Setup Naas scheduler
Schedule your notebook with the Naas scheduler feature.

In [None]:
# The default setting below makes the notebook run every day at 8:00.
# To change it, see https://crontab.guru/ for details on the required CRON syntax.
naas.scheduler.add(cron="0 8 * * *")

# to de-schedule this notebook, simply run the following command: 
# naas.scheduler.delete()

## Model

### Get your posts from CSV
Load the posts already stored in your CSV. If the file does not exist yet, an empty dataframe is returned.

In [None]:
def read_csv(file_path):
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError:
        # No CSV yet (first execution): return an empty dataframe
        return pd.DataFrame()
    return df

df_posts = read_csv(csv_output)
df_posts

### Update last posts
This step gets the last X posts from the LinkedIn API (X = the value of the variable "NO_POSTS_RETRIEVED") and updates them in your CSV.<br>
PS: On the first execution, all posts will be retrieved.

In [None]:
def update_last_posts(df_posts, key="POST_URL", no_posts=10):
    # Init output
    df_update = pd.DataFrame()
    
    if len(df_posts) > 0:
        # Posts already stored: only get the last X posts (new and already existing)
        df_update = linkedin.connect(LI_AT, JSESSIONID).profile.get_posts_feed(
            PROFILE_URL, limit=no_posts, sleep=False
        )
    else:
        # No posts stored yet (first execution): get the entire post history
        df_update = linkedin.connect(LI_AT, JSESSIONID).profile.get_posts_feed(
            PROFILE_URL, limit=-1
        )
    # Concat (fresh rows first, so they win over stored duplicates) and add extract date
    df = pd.concat([df_update, df_posts]).drop_duplicates(key, keep="first")
    df["DATE_EXTRACT"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Return the full, updated set of posts
    return df.reset_index(drop=True)

df_update = update_last_posts(df_posts,
                              no_posts=NO_POSTS_RETRIEVED)
df_update
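
The merge step can be tried without LinkedIn credentials. The sketch below reproduces the `concat` and `drop_duplicates` logic on two toy dataframes; the URLs and like counts are made up for illustration, and `stored` and `fresh` are hypothetical stand-ins for `df_posts` and the API result.

```python
import pandas as pd

# Posts already saved in the CSV (toy data)
stored = pd.DataFrame({"POST_URL": ["url_a", "url_b"], "LIKES": [10, 20]})

# Posts freshly retrieved: one new post and one updated post (toy data)
fresh = pd.DataFrame({"POST_URL": ["url_c", "url_b"], "LIKES": [1, 25]})

# Fresh rows come first, so keep="first" keeps the updated stats for "url_b"
merged = (
    pd.concat([fresh, stored])
    .drop_duplicates(subset="POST_URL", keep="first")
    .reset_index(drop=True)
)
print(merged)
```

Placing the fresh dataframe first in `pd.concat` is what makes the newest stats replace the stored ones; reversing the order would keep the stale rows instead.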

## Output

### Save dataframe in CSV and send to production

In [None]:
# Save dataframe in CSV
df_update.to_csv(csv_output, index=False)

# Send CSV to production (It could be used with other scripts)
naas.dependency.add(csv_output)
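
As a sketch of how another script might reuse the saved CSV, the example below writes a small sample file to a temporary folder and reads it back with pandas to find the most liked post. The file name, the sample rows, and the choice of `LIKES` as the ranking column are assumptions for illustration; in practice you would read the file referenced by `csv_output`.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the CSV produced by this notebook
sample = pd.DataFrame(
    {
        "POST_URL": ["url_a", "url_b", "url_c"],
        "LIKES": [10, 25, 1],
        "COMMENTS": [2, 7, 0],
    }
)
sample_path = os.path.join(tempfile.mkdtemp(), "LINKEDIN_POSTS_myprofile.csv")
sample.to_csv(sample_path, index=False)

# Downstream script: load the CSV and pick the post with the most likes
df_loaded = pd.read_csv(sample_path)
top_post = df_loaded.sort_values("LIKES", ascending=False).head(1)
print(top_post)
```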