<img width="8%" alt="LinkedIn.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/LinkedIn.png" style="border-radius: 15%">

# LinkedIn - Send fine-tuning dataset from posts to Google Sheets

**Tags:** #linkedin #profile #post #stats #naas_drivers #content #automation #gsheet

**Author:** [Maxime Jublou](https://www.linkedin.com/in/maximejublou/)

**Last update:** 2023-11-14 (Created: 2023-11-14)

**Description:** This notebook fetches your profile's posts from LinkedIn, generates a question for each post with an LLM, and sends the resulting question/answer dataset to Google Sheets so it can be used to fine-tune an LLM.


<div class="alert alert-info" role="info" style="margin: 10px">
<b>Disclaimer:</b><br>
This code is in no way affiliated with, authorized, maintained, sponsored or endorsed by LinkedIn or any of its affiliates or subsidiaries. It uses an independent and unofficial API. Use at your own risk.

This project violates LinkedIn's User Agreement Section 8.2, and because of this, LinkedIn may (and will) temporarily or permanently ban your account. We are not responsible for your account being banned.
<br>
</div>

### Import libraries

In [None]:
from naas_drivers import linkedin, gsheet
import naas
import os  # used by os.path.join and os.makedirs in the setup cell
import pickle
import openai
import json
from tqdm import tqdm
import pandas as pd
import time
from IPython import display
import naas_data_product

### Setup variables
**Mandatory**

[Learn how to get your cookies on LinkedIn](https://www.notion.so/LinkedIn-driver-Get-your-cookies-d20a8e7e508e42af8a5b52e33f3dba75)
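- `li_at`: Cookie used to authenticate Members and API clients.
- `JSESSIONID`: Cookie used for Cross Site Request Forgery (CSRF) protection and URL signature validation.
- `linkedin_url`: URL of the LinkedIn profile whose posts are fetched.
- `openai.api_key`: OpenAI API key, read from the `OPENAI_API_KEY` naas secret.
- `spreadsheet_url`: URL of the Google Sheets spreadsheet that receives the dataset.
- `sheet_name`: Name of the sheet to write to.

**Optional**
- `output_dir`: Directory where fetched posts and generated questions are cached (built from `avatar_name`).
- `limit`: Number of posts to retrieve.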

In [None]:
# Avatar meta
avatar_name = "Maxime Jublou"

# Mandatory
li_at = naas.secret.get("LINKEDIN_LI_AT") or "YOUR_LINKEDIN_LI_AT" #example: AQFAzQN_PLPR4wAAAXc-FCKmgiMit5FLdY1af3-2
JSESSIONID = naas.secret.get("LINKEDIN_JSESSIONID") or "YOUR_LINKEDIN_JSESSIONID" #example: ajax:8379907400220387585
linkedin_url = "https://www.linkedin.com/in/maximejublou/"  # EXAMPLE "https://www.linkedin.com/in/XXXXXX/"
openai.api_key = naas.secret.get("OPENAI_API_KEY")
spreadsheet_url = "https://docs.google.com/spreadsheets/d/1wediMdHcq5WDqLMZ7ryNrcPxCmlX8BX4ZEl3JNWT8wg/edit#gid=0"
sheet_name = "Maxime_LK_posts"

# Optional
output_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "ai-characters", avatar_name.lower().replace(" ", "_"))
os.makedirs(output_dir, exist_ok=True)
limit = 30

## Model

### Get posts

In [None]:
posts = pload(output_dir, 'posts')

if posts is None:
    posts = linkedin.connect(li_at, JSESSIONID).profile.get_posts_feed(linkedin_url, limit=limit, sleep=False)
    pdump(output_dir, posts, 'posts')

posts
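
The cell above calls `pload` and `pdump`, which are not defined anywhere in this notebook; they are presumably provided by the `naas_data_product` environment. If they are not available, the following is a minimal sketch, assuming they are pickle-based cache helpers with the signatures `pload(output_dir, name)` and `pdump(output_dir, obj, name)`. The `.pickle` file extension is an assumption, not something the notebook confirms.

```python
import os
import pickle


def pdump(output_dir, obj, name):
    """Pickle `obj` to <output_dir>/<name>.pickle (assumed cache layout)."""
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, f"{name}.pickle"), "wb") as f:
        pickle.dump(obj, f)


def pload(output_dir, name):
    """Return the cached object, or None when no cache file exists yet."""
    path = os.path.join(output_dir, f"{name}.pickle")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```

Returning `None` on a cache miss is what lets the `if posts is None:` check decide whether LinkedIn has to be queried again.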

### Generate questions that should lead to post generation

In [None]:
def generate_question_from_text(text, prompt, text_model):
    # Legacy interface: openai.ChatCompletion is only available in openai<1.0.
    response = openai.ChatCompletion.create(
        model=text_model,
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": prompt
            },
            {
                "role": "user",
                "content": text
            }
        ]
    )
    
    return response['choices'][0]['message']['content']

    
question_prompt = """
    You are a highly skilled AI trained in language comprehension and in writing prompts that could lead to the text provided by the user.
    I would like you to read the following text and write the prompt that would lead an LLM to generate that specific text.
    The texts provided are LinkedIn posts.
    Please avoid unnecessary details or tangential points.
    
    ```instructions
    WRITE IN THE LANGUAGE THE TEXT IS IN
    ```
"""

texts = posts["TEXT"].tolist()
questions = pload(output_dir, 'questions')

if questions is None:
    questions = [generate_question_from_text(text, question_prompt, "gpt-4") for text in tqdm(texts)]
    pdump(output_dir, questions, 'questions')

questions
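
`generate_question_from_text` uses `openai.ChatCompletion`, which was removed in version 1.0 of the `openai` package. The sketch below shows what the same request could look like with the client object introduced in `openai>=1.0`. The function name `generate_question_with_client` is hypothetical, and the client is passed in as an argument so the function itself has no import-time dependency.

```python
def generate_question_with_client(client, text, prompt, text_model):
    """Send the same system/user messages through an openai>=1.0 client."""
    response = client.chat.completions.create(
        model=text_model,
        temperature=0,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
    )
    # In openai>=1.0 the response is an object, not a dict.
    return response.choices[0].message.content


# Usage (requires the openai>=1.0 package and a valid API key):
# from openai import OpenAI
# client = OpenAI(api_key=naas.secret.get("OPENAI_API_KEY"))
# generate_question_with_client(client, texts[0], question_prompt, "gpt-4")
```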

### Create DataFrame question & answer

In [None]:
# Pair each generated question with the post it should produce.
final_df = pd.DataFrame({"question": questions, "answer": texts})
final_df
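
The question/answer pairs are meant to serve as fine-tuning data, and many chat-model fine-tuning pipelines expect one JSON object per line with a `messages` list. As a rough sketch, assuming that target format, the pairs could also be written to a JSONL file. The helper name `to_chat_jsonl` and the output file name are hypothetical and not part of the original notebook.

```python
import json


def to_chat_jsonl(records, path):
    """Write question/answer pairs as one chat-format JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in records:
            example = {
                "messages": [
                    {"role": "user", "content": row["question"]},
                    {"role": "assistant", "content": row["answer"]},
                ]
            }
            # ensure_ascii=False keeps accents and emoji readable in the file.
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
    return path


# Usage with the DataFrame built above (file name is an assumption):
# to_chat_jsonl(final_df.to_dict("records"), os.path.join(output_dir, "dataset.jsonl"))
```

Taking a plain list of dictionaries keeps the helper independent of pandas; `final_df.to_dict("records")` produces that shape.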

## Output

### Send to Google Sheets spreadsheet

In [None]:
gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=final_df, append=False)