<img width="8%" alt="LinkedIn.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/LinkedIn.png" style="border-radius: 15%">

# LinkedIn - Send Fine-tuned dataset from posts to Google Sheets

**Tags:** #linkedin #profile #post #stats #naas_drivers #content #automation #gsheet

**Author:** [Maxime Jublou](https://www.linkedin.com/in/maximejublou/)

**Last update:** 2023-11-14 (Created: 2023-11-14)

**Description:** This notebook fetches your profile's post statistics from LinkedIn, transforms them into a question/answer dataset that can be used to fine-tune an LLM, and sends the dataset to Google Sheets.


<div class="alert alert-info" role="info" style="margin: 10px">
<b>Disclaimer:</b><br>
This code is in no way affiliated with, authorized, maintained, sponsored or endorsed by LinkedIn or any of its affiliates or subsidiaries. It uses an independent and unofficial API. Use at your own risk.

This project violates LinkedIn's User Agreement Section 8.2, and because of this, LinkedIn may (and will) temporarily or permanently ban your account. We are not responsible for your account being banned.
<br>
</div>

### Import libraries

In [7]:
from naas_drivers import linkedin, gsheet
import naas
import os  # needed for os.path.join and os.makedirs below
import pickle
import openai
import json
from tqdm import tqdm
import pandas as pd
import time
from IPython import display
import naas_data_product

### Setup variables
**Mandatory**

[Learn how to get your cookies on LinkedIn](https://www.notion.so/LinkedIn-driver-Get-your-cookies-d20a8e7e508e42af8a5b52e33f3dba75)
- `li_at`: Cookie used to authenticate Members and API clients.
- `JSESSIONID`: Cookie used for Cross Site Request Forgery (CSRF) protection and URL signature validation.
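- `linkedin_url`: URL of the LinkedIn profile whose posts are fetched.
- `openai.api_key`: OpenAI API key used to call the chat completion API.
- `spreadsheet_url`: URL of the Google Sheets spreadsheet that receives the dataset.
- `sheet_name`: Name of the sheet to write to.
- `avatar_owner`: Name of the person the dataset is built for; it is used to name the output directory.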

**Optional**
- `output_dir`: Directory where intermediate results are cached as pickle files.
- `limit`: Maximum number of posts to retrieve.
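The mandatory values fall back to placeholders such as `"YOUR_LINKEDIN_LI_AT"` when no secret is stored, and `naas.secret.get` may return nothing for the OpenAI key. A minimal sketch of a check that can be run before the first API call is shown below. The helper name `find_unset_settings` and the sample values are hypothetical, not part of the notebook.

```python
def find_unset_settings(settings):
    """Return the names of settings that are empty or still hold a placeholder."""
    unset = []
    for name, value in settings.items():
        # Treat None, empty strings and "YOUR_..." placeholders as not configured.
        if not value or str(value).startswith("YOUR_"):
            unset.append(name)
    return unset


# Example with made-up values (not real credentials).
example_settings = {
    "li_at": "YOUR_LINKEDIN_LI_AT",
    "JSESSIONID": "ajax:0000000000000000000",
    "openai_api_key": None,
}
print(find_unset_settings(example_settings))
```

In the notebook, the same function could be called with the real variables, for example `find_unset_settings({"li_at": li_at, "JSESSIONID": JSESSIONID, "openai_api_key": openai.api_key})`.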

In [11]:
# Mandatory
li_at = naas.secret.get("LINKEDIN_LI_AT") or "YOUR_LINKEDIN_LI_AT" #example: AQFAzQN_PLPR4wAAAXc-FCKmgiMit5FLdY1af3-2
JSESSIONID = naas.secret.get("LINKEDIN_JSESSIONID") or "YOUR_LINKEDIN_JSESSIONID" #example: ajax:8379907400220387585
linkedin_url = "https://www.linkedin.com/in/jeremyravenel/"  # EXAMPLE "https://www.linkedin.com/in/XXXXXX/"
openai.api_key = naas.secret.get("OPENAI_API_KEY")
spreadsheet_url = "https://docs.google.com/spreadsheets/d/1wediMdHcq5WDqLMZ7ryNrcPxCmlX8BX4ZEl3JNWT8wg/edit#gid=0"
sheet_name = "Jeremy"
avatar_owner = "Jeremy Ravenel"

# Optional
output_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "ai-characters", avatar_owner.lower().replace(" ", "_"))
os.makedirs(output_dir, exist_ok=True)
limit = 50

## Model

### Pickle functions

#### Pickle dump

In [15]:
def pdump(
    output_dir,
    object_to_dump,
    file_to_dump_to
):
    file_path = os.path.join(output_dir, f'{file_to_dump_to}.pickle')
    with open(file_path, 'wb') as file:
        pickle.dump(object_to_dump, file)

#### Pickle load

In [16]:
def pload(
    output_dir,
    file_to_load_from
):
    file_path = os.path.join(output_dir, f'{file_to_load_from}.pickle')
    try:
        with open(file_path, 'rb') as file:
            return pickle.load(file)
    except Exception:
        # Missing or unreadable cache file: signal "no cached value" to the caller.
        return None
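
The cells that follow all use the same pattern: load a pickle, and only if nothing is cached, compute the value and dump it. A minimal sketch of that pattern folded into one helper is shown below. The name `load_or_compute` is hypothetical, and the demonstration uses a temporary directory instead of the notebook's `output_dir`.

```python
import os
import pickle
import tempfile


def load_or_compute(output_dir, name, compute):
    """Return the pickled object `name` if it exists, else compute and store it."""
    file_path = os.path.join(output_dir, f"{name}.pickle")
    if os.path.exists(file_path):
        with open(file_path, "rb") as file:
            return pickle.load(file)
    result = compute()
    with open(file_path, "wb") as file:
        pickle.dump(result, file)
    return result


# Demonstration in a throwaway directory: the second call reads the cache.
calls = []


def expensive():
    calls.append(1)
    return {"answer": 42}


with tempfile.TemporaryDirectory() as tmp_dir:
    first = load_or_compute(tmp_dir, "demo", expensive)
    second = load_or_compute(tmp_dir, "demo", expensive)

print(first == second, len(calls))
```

One consequence of this caching, in the sketch and in the notebook alike: once a pickle exists, changing `limit` or `linkedin_url` has no effect until the corresponding file in `output_dir` is deleted.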

### Get posts

In [20]:
posts = pload(output_dir, 'posts')

if posts is None:
    posts = linkedin.connect(li_at, JSESSIONID).profile.get_posts_feed(linkedin_url, limit=limit, sleep=False)
    pdump(output_dir, posts, 'posts')

posts

Unnamed: 0,ACTIVITY_ID,PAGINATION_TOKEN,PUBLISHED_DATE,AUTHOR_NAME,AUTHOR_URL,SUBDESCRIPTION,TITLE,TEXT,CHARACTER_COUNT,TAGS,...,POLL_ID,POLL_QUESTION,POLL_RESULTS,POST_URL,VIEWS,COMMENTS,LIKES,SHARES,ENGAGEMENT_SCORE,DATE_EXTRACT
0,7130321324874309633,dXJuOmxpOmFjdGl2aXR5OjcxMzAzMjEzMjQ4NzQzMDk2Mz...,2023-11-14 23:31:18+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,17h •,What if GPTs are a distraction on what OpenAI ...,What if GPTs are a distraction on what OpenAI ...,192,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,9,15,0,0,2023-11-15 16:35:31
1,7129977164090785793,dXJuOmxpOmFjdGl2aXR5OjcxMjk5NzcxNjQwOTA3ODU3OT...,2023-11-14 00:43:44+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,1d •,AI Agents vs Assistants vs Avatars. What's the...,AI Agents vs Assistants vs Avatars. What's the...,1044,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,3,17,0,0,2023-11-15 16:35:31
2,7129599084172054528,dXJuOmxpOmFjdGl2aXR5OjcxMjk1OTkwODQxNzIwNTQ1Mj...,2023-11-12 23:41:23+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,2d •,"Today, I experienced something quite remarkabl...","Today, I experienced something quite remarkabl...",1129,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,3,35,0,0,2023-11-15 16:35:32
3,7129182281679695872,dXJuOmxpOmFjdGl2aXR5OjcxMjkxODIyODE2Nzk2OTU4Nz...,2023-11-11 20:05:09+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,3d •,Let's fight those who want to capture the AI m...,Let's fight those who want to capture the AI m...,177,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,1,14,0,0,2023-11-15 16:35:33
4,7128864186863886337,dXJuOmxpOmFjdGl2aXR5OjcxMjg4NjQxODY4NjM4ODYzMz...,2023-11-10 23:01:10+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,4d •,📩 I have started building my first GPT to clea...,📩 I have started building my first GPT to clea...,1948,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,31,82,3,0,2023-11-15 16:35:34
5,7128434418733473792,dXJuOmxpOmFjdGl2aXR5OjcxMjg0MzQ0MTg3MzM0NzM3OT...,2023-11-09 18:33:25+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,5d •,Don't put all your eggs in the same basket. Th...,Don't put all your eggs in the same basket. Th...,1790,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,25,49,2,0,2023-11-15 16:35:35
6,7128354439509217280,dXJuOmxpOmFjdGl2aXR5OjcxMjgzNTQ0Mzk1MDkyMTcyOD...,2023-11-09 13:15:36+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,6d •,"Since we launched our open source program, mor...","Since we launched our open source program, mor...",595,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,0,9,0,0,2023-11-15 16:35:36
7,7128134385198891008,dXJuOmxpOmFjdGl2aXR5OjcxMjgxMzQzODUxOTg4OTEwMD...,2023-11-08 22:41:11+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,6d •,"🤪 Tired of the ""50 tricks on how to use chatGP...","🤪 Tired of the ""50 tricks on how to use chatGP...",2128,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,9,28,1,0,2023-11-15 16:35:37
8,7127751209163067392,dXJuOmxpOmFjdGl2aXR5OjcxMjc3NTEyMDkxNjMwNjczOT...,2023-11-07 21:18:35+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,1w •,What was the big missing ingredient of yesterd...,What was the big missing ingredient of yesterd...,936,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,14,42,2,0,2023-11-15 16:35:37
9,7127410400408526848,dXJuOmxpOmFjdGl2aXR5OjcxMjc0MTA0MDA0MDg1MjY4ND...,2023-11-06 22:44:20+0100,Jérémy Ravenel,https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKH...,1w •,Prompting is plumbing in AI. It just became ev...,Prompting is plumbing in AI. It just became ev...,2102,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,5,28,0,0,2023-11-15 16:35:38


### Data set preparation

In [21]:
def get_full_name(sentence):
    # Legacy call style: openai.ChatCompletion is only available in openai < 1.0.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": """
                    The user will pass you a sentence with the name of the person. You must output the full name of the person. NOTHING ELSE.
                    
                    ```example_user_message
                    "Hello my name is John Doe 🚀"
                    ```
                    
                    ```example_answer
                    John Doe
                    ```
                    """
            
            },
            {
                "role": "user",
                "content": sentence
            }
        ]
    )
    
    return response['choices'][0]['message']['content']

full_name = pload(output_dir, 'full_name')

if full_name is None:
    full_name = get_full_name(posts.iloc[0].AUTHOR_NAME)
    pdump(output_dir, full_name, 'full_name')

full_name

'Jérémy Ravenel'

### Generate questions that should lead to post generation

In [22]:
def generate_question_from_text(text, prompt, text_model):
    response = openai.ChatCompletion.create(
        model=text_model,
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": prompt
            },
            {
                "role": "user",
                "content": text
            }
        ]
    )
    
    return response['choices'][0]['message']['content']

    
question_prompt = f"""
    You are a highly skilled AI trained in language comprehension and in the creation of prompts that could lead to the text provided by the user.
    I would like you to read the following text and create the prompt that would lead an LLM to generate that specific text.
    The texts provided are posts from LinkedIn.
    Please avoid unnecessary details or tangential points.
    
    ```instructions
    WRITE IN THE LANGUAGE THE TEXT IS IN
    ```
"""

texts = posts["TEXT"].tolist()
questions = pload(output_dir, 'questions')

if questions is None:
    questions = [generate_question_from_text(text, question_prompt, "gpt-4") for text in tqdm(texts)]
    pdump(output_dir, questions, 'questions')

questions

100%|██████████| 50/50 [06:55<00:00,  8.30s/it]


["As an AI enthusiast, share your thoughts on OpenAI's mission to create AGI and how GPTs fit into this vision. Discuss if you think GPTs might be a distraction from what OpenAI is actually planning next and if there seems to be a missing piece in the puzzle.",
 'As an AI expert, explain the differences between AI Agents, AI Assistants, and AI Avatars, providing definitions and examples for each. Also, discuss their roles in the AI landscape and ask for feedback on your explanation.',
 'Write a LinkedIn post about your recent experience using the voice feature of the ChatGPT app for a 45-minute session. Discuss how the AI assistant helped you structure and articulate your thoughts for an AI Avatar project, emphasizing the step-by-step approach and the importance of focused dialogue. Share your insights on how AI can be a valuable tool not just for generating ideas but for refining and articulating them, and how the voice feature enhances accessibility.',
 'As an AI enthusiast, express 
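
The loop above makes one API call per post and took about seven minutes for 50 posts, so a single transient error would interrupt the whole run before anything is pickled. A minimal sketch of a retry wrapper with exponential backoff is shown below. The helper `call_with_retry`, its parameters, and the stand-in `flaky` function are hypothetical and not part of the notebook or of the OpenAI library.

```python
import time


def call_with_retry(func, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on an exception, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception:
            # Give up after the last attempt and let the error propagate.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in function that fails twice, then succeeds.
attempts = []


def flaky(text):
    attempts.append(text)
    if len(attempts) < 3:
        raise RuntimeError("temporary failure")
    return text.upper()


delays = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky, "hello", sleep=delays.append)
print(result, delays)
```

In the notebook, the list comprehension could call `call_with_retry(generate_question_from_text, text, question_prompt, "gpt-4")` for each text instead of calling the function directly.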

### Create DataFrame question & answer

In [23]:
# Pair each generated question with the post it was derived from.
dataset = [
    {"question": question, "answer": text}
    for question, text in zip(questions, texts)
]
    
final_df = pd.DataFrame(dataset)
final_df

Unnamed: 0,question,answer
0,"As an AI enthusiast, share your thoughts on Op...",What if GPTs are a distraction on what OpenAI ...
1,"As an AI expert, explain the differences betwe...",AI Agents vs Assistants vs Avatars. What's the...
2,Write a LinkedIn post about your recent experi...,"Today, I experienced something quite remarkabl..."
3,"As an AI enthusiast, express your concerns abo...",Let's fight those who want to capture the AI m...
4,Write a LinkedIn post about your experience bu...,📩 I have started building my first GPT to clea...
5,As an AI enthusiast who recently experienced a...,Don't put all your eggs in the same basket. Th...
6,As the head of an open source program that rec...,"Since we launched our open source program, mor..."
7,Write a LinkedIn post explaining the C.I.D.I. ...,"🤪 Tired of the ""50 tricks on how to use chatGP..."
8,Create a LinkedIn post discussing the importan...,What was the big missing ingredient of yesterd...
9,Write a LinkedIn post discussing the importanc...,Prompting is plumbing in AI. It just became ev...
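
The notebook stops at a question/answer table and does not show the fine-tuning step itself. A minimal sketch of how such rows could be serialized for fine-tuning is shown below, assuming a chat-style JSONL format with `system`, `user` and `assistant` messages (the layout used for OpenAI chat model fine-tuning). The function `to_chat_jsonl`, the system prompt and the sample record are hypothetical.

```python
import json


def to_chat_jsonl(records, system_prompt):
    """Serialize question/answer records as one JSON chat example per line."""
    lines = []
    for record in records:
        example = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": record["question"]},
                {"role": "assistant", "content": record["answer"]},
            ]
        }
        # ensure_ascii=False keeps emojis and accented characters readable.
        lines.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(lines)


# Made-up sample record standing in for one row of the dataset.
sample_records = [
    {"question": "Write a post about X.", "answer": "Here is my post about X."}
]
jsonl = to_chat_jsonl(sample_records, "You write LinkedIn posts in the author's voice.")
print(jsonl)
```

With the DataFrame built above, the records could be obtained with `final_df.to_dict("records")`.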


## Output

### Send to Google Sheets spreadsheet

In [24]:
gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=final_df, append=False)

{'insertedRow': 50}