<img width="8%" alt="LinkedIn.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/LinkedIn.png" style="border-radius: 15%">

# LinkedIn - Get profile posts stats

**Tags:** #linkedin #profile #post #stats #naas_drivers #content #automation #pickle

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

**Description:** This notebook fetches your profile's post statistics from LinkedIn and stores them in a pickle file.


<div class="alert alert-info" role="info" style="margin: 10px">
<b>Disclaimer:</b><br>
This code is in no way affiliated with, authorized, maintained, sponsored or endorsed by LinkedIn or any of its affiliates or subsidiaries. It uses an independent and unofficial API. Use at your own risk.

This project violates LinkedIn's User Agreement Section 8.2, and because of this, LinkedIn may (and will) temporarily or permanently ban your account. We are not responsible for your account being banned.
<br>
</div>

## Input

### Import libraries

In [1]:
from naas_drivers import linkedin
import naas
import os
from datetime import date, timedelta, datetime
import pandas as pd
import naas_data_product

✅ utils file '/home/ftp/abi/utils/data.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_chat_plugin.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_lab.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/openai.ipynb' successfully loaded.


### Setup variables
**Inputs**
- `li_at`: Cookie used to authenticate Members and API clients.
- `JSESSIONID`: Cookie used for Cross Site Request Forgery (CSRF) protection and URL signature validation.
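- `linkedin_url`: URL of the LinkedIn profile whose posts are fetched.
- `limit`: Earliest publication date to keep. Fetching stops at the first post published before this date. By default, it is the Monday of the previous week.
- `force_update`: Boolean. If `True`, posts are fetched from LinkedIn even when a saved pickle file already exists.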

**Outputs**
- `output_dir`: Output directory
- `file_name`: Name of the pickle file saved in the output directory.

In [2]:
# Inputs
li_at = naas.secret.get("LINKEDIN_LI_AT") or "YOUR_LINKEDIN_LI_AT"
JSESSIONID = naas.secret.get("LINKEDIN_JSESSIONID") or "YOUR_LINKEDIN_JSESSIONID"
linkedin_url = pload(os.path.join(naas_data_product.OUTPUTS_PATH, "entity"), "linkedin_url") or "YOUR_LINKEDIN_URL"
limit = date.today() - timedelta(days=date.today().weekday() + 7)
force_update = False

# Outputs
output_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "content-engine", date.today().isoformat())
file_name = "linkedin_posts"
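
The `limit` expression above can be checked in isolation. The sketch below wraps the same arithmetic in a helper named `previous_week_monday` (a hypothetical name, not part of the notebook) and applies it to a fixed date instead of `date.today()`, so the result is reproducible.

```python
from datetime import date, timedelta


def previous_week_monday(today: date) -> date:
    """Return the Monday of the week before the one containing `today`."""
    # weekday() is 0 for Monday, so subtracting it lands on this week's Monday;
    # the extra 7 days step back one full week.
    return today - timedelta(days=today.weekday() + 7)


# 2023-12-16 is a Saturday (weekday() == 5), so the limit is 12 days earlier.
example_limit = previous_week_monday(date(2023, 12, 16))
print(example_limit)  # 2023-12-04
```

With this default, a run on any day of the week collects posts from the previous full week plus the current week to date.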

## Model

### Get posts from LinkedIn

In [3]:
def get_posts(
    li_at,
    JSESSIONID,
    linkedin_url,
    limit=None,
    force_update=False,  # Unused here: the caller decides whether to refetch
):
    # Init
    frames = []
    profile = linkedin.connect(li_at, JSESSIONID).profile

    # Get posts one at a time until the date limit is reached
    i = 1
    pagination_token = None
    while True:
        # Requests from LinkedIn API
        tmp_df = profile.get_posts_feed(
            linkedin_url,
            pagination_token=pagination_token,
            limit=1,
            sleep=False
        )
        # Stop when no post is returned
        if len(tmp_df) == 0:
            break
        title = tmp_df.loc[0, "TITLE"]
        pagination_token = tmp_df.loc[0, "PAGINATION_TOKEN"]
        published_date = tmp_df.loc[0, "PUBLISHED_DATE"]
        post_url = tmp_df.loc[0, "POST_URL"]

        # Stop at the first post published before the limit (if a limit is set)
        datetime_obj = datetime.strptime(published_date, "%Y-%m-%d %H:%M:%S%z").date()
        if limit is not None and datetime_obj < limit:
            break

        # Collect post
        print(f"{i} - ✅ '{title}' published on {published_date} ({post_url})")
        frames.append(tmp_df)
        i += 1  # Count

    # Concat all posts (empty DataFrame if nothing was collected)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames).reset_index(drop=True)

# Load posts from pickle file
df_posts = pload(output_dir, file_name)

# Get posts from LinkedIn
if df_posts is None or force_update:
    df_posts = get_posts(
        li_at,
        JSESSIONID,
        linkedin_url,
        limit=limit,
        force_update=force_update,
    )
    
print('✍️ Posts:', len(df_posts))
df_posts.head(1)

1 - ✅ 'Good old days stories: ' published on 2023-12-15 14:42:28+0100 (https://www.linkedin.com/feed/update/urn:li:activity:7141422260812210176)
✍️ Posts: 1


| | ACTIVITY_ID | PAGINATION_TOKEN | PUBLISHED_DATE | AUTHOR_NAME | AUTHOR_URL | SUBDESCRIPTION | TITLE | TEXT | CHARACTER_COUNT | TAGS | ... | POLL_ID | POLL_QUESTION | POLL_RESULTS | POST_URL | VIEWS | COMMENTS | LIKES | SHARES | ENGAGEMENT_SCORE | DATE_EXTRACT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7141422260812210176 | dXJuOmxpOmFjdGl2aXR5OjcxNDE0MjIyNjA4MTIyMTAxNz... | 2023-12-15 14:42:28+0100 | Travis Oliphant | https://www.linkedin.com/in/ACoAAACOHgYBT-RIKF... | 16h • | Good old days stories: | Good old days stories: \n\nAs a child, after t... | 289 | | ... | | | | https://www.linkedin.com/feed/update/urn:li:ac... | 0 | 4 | 36 | 0 | 0 | 2023-12-16 07:41:50 |
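
The stop condition in `get_posts` depends on parsing `PUBLISHED_DATE` with the format `%Y-%m-%d %H:%M:%S%z` and comparing the resulting date with `limit`. Here is a minimal, standalone sketch of that check. The helper name `is_before_limit` and the second sample timestamp are illustrative assumptions; the first timestamp is the one shown in the output above.

```python
from datetime import date, datetime


def is_before_limit(published_date: str, limit: date) -> bool:
    """Return True if the post was published before the limit date."""
    # %z accepts UTC offsets such as +0100; .date() keeps the calendar date
    # in the post's own offset and drops the time and timezone.
    parsed = datetime.strptime(published_date, "%Y-%m-%d %H:%M:%S%z").date()
    return parsed < limit


limit = date(2023, 12, 4)
print(is_before_limit("2023-12-15 14:42:28+0100", limit))  # False: post is kept
print(is_before_limit("2023-12-01 09:00:00+0100", limit))  # True: loop would stop
```

A post published on the limit date itself is kept, because the comparison is strict.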


## Output

### Save data

In [5]:
pdump(output_dir, df_posts, file_name)
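
`pload` and `pdump` are provided by the utils files loaded at import time, and their implementation is not shown in this notebook. To run the caching logic outside that environment, stand-ins could look like the sketch below. The names `load_pickle` and `save_pickle`, the `.pickle` file extension, and the behavior of returning `None` for a missing file are all assumptions inferred from how the notebook calls the helpers, not the actual utils code.

```python
import os
import pickle
import tempfile
from typing import Any, Optional


def load_pickle(output_dir: str, file_name: str) -> Optional[Any]:
    """Load an object from <output_dir>/<file_name>.pickle, or None if absent."""
    path = os.path.join(output_dir, f"{file_name}.pickle")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def save_pickle(output_dir: str, obj: Any, file_name: str) -> str:
    """Save an object to <output_dir>/<file_name>.pickle and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{file_name}.pickle")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


# Round trip in a temporary directory: first load misses, second load hits.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing = load_pickle(tmp_dir, "linkedin_posts")
    save_pickle(tmp_dir, [{"TITLE": "example"}], "linkedin_posts")
    restored = load_pickle(tmp_dir, "linkedin_posts")

print(missing)   # None
print(restored)  # [{'TITLE': 'example'}]
```

Returning `None` on a missing file is what lets the `if df_posts is None or force_update` check above decide when to call LinkedIn.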