### LinkedIn post scraper and RAG chat bot analyzer

**This concept was developed using Python 3.10.16 in an Anaconda environment.**

```
conda create -n osint python=3.10  
conda activate osint
```

**This concept demonstrates which data we share on LinkedIn and how easily it can be analyzed and misused.**  
**Please use it only as a tool for your own social network hygiene.**

In [1]:
!pip install ipykernel
!pip install undetected-chromedriver
!pip install selenium
!pip install bs4
!pip install pandas
!pip install openai
!pip install gradio
!pip install chromadb
!pip install python-dotenv


In [2]:
import time
import random
from urllib.parse import urlparse
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import os
import json

In [3]:
def random_sleep(min_seconds=5, max_seconds=30):
    """
    Sleep for a random amount of time between min and max seconds.
    This helps avoid detection by mimicking human behavior.
    """
    time.sleep(random.uniform(min_seconds, max_seconds))

### Saving Cookies

Saving cookies while web scraping serves several important purposes:

1. **Authentication Persistence**: Maintains login state between sessions, avoiding repeated logins
2. **Reduced Detection Risk**: Makes requests appear more like a regular user by maintaining consistent cookies
3. **Efficiency**: Saves time by reusing authenticated sessions rather than logging in each time
4. **Rate Limiting**: Helps manage rate limits by maintaining consistent session identification
5. **Compliance**: Some websites require specific cookies for tracking user agreements and preferences

The `save_cookies()` function stores these essential browser cookies to a text file for later reuse in subsequent scraping sessions.

In [4]:
def save_cookies(driver, filename="linkedin_cookies.txt"):
    """Save cookies to a file for later use"""
    os.makedirs("cookies", exist_ok=True)

    path = os.path.join("cookies", filename)
    with open(path, "w", encoding="utf-8") as f:
        for cookie in driver.get_cookies():
            f.write(f"{cookie['name']}={cookie['value']}\n")
    # Report once, after all cookies have been written
    print(f"Cookies saved to {path}")

In [5]:
def load_cookies(driver, filename="linkedin_cookies.txt") -> bool:
    """Load cookies from a file and add them to the browser session"""
    try:
        path = os.path.join("cookies", filename)
        if os.path.exists(path):
            # First navigate to LinkedIn domain to set cookies properly
            driver.get("https://www.linkedin.com")
            with open(path, "r", encoding="utf-8") as f:
                cookies = f.readlines()
                for cookie in cookies:
                    name, value = cookie.strip().split('=', 1)
                    cookie_dict = {
                        'name': name,
                        'value': value,
                        'domain': '.linkedin.com'  # Set proper domain for LinkedIn cookies
                    }
                    try:
                        driver.add_cookie(cookie_dict)
                    except Exception as e:
                        print(f"Error adding cookie {name}: {e}")
                        continue
            print(f"Cookies loaded from {path}")
            return True
        else:
            print(f"Cookie file not found at {path}")
            return False
    except Exception as e:
        print(f"Error loading cookies: {e}")
        return False
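
The text format above keeps only the name and value of each cookie, so attributes such as expiry, path, and the secure flag are dropped. As a minimal sketch of an alternative, the full cookie dictionaries could be stored as JSON instead. The helper names `cookies_to_json` and `cookies_from_json` are hypothetical, and the sample cookie is made up for illustration.

```python
import json

def cookies_to_json(cookies: list) -> str:
    """Serialize the list returned by driver.get_cookies() without losing fields."""
    return json.dumps(cookies, indent=2)

def cookies_from_json(payload: str) -> list:
    """Parse the JSON back into dictionaries that can be passed to driver.add_cookie()."""
    return json.loads(payload)

# Made-up sample in the shape Selenium returns; not a real session cookie.
sample = [{
    "name": "li_at",
    "value": "abc=123",  # values may contain '=', which JSON handles without special parsing
    "domain": ".linkedin.com",
    "path": "/",
    "secure": True,
    "expiry": 1767225600,
}]

restored = cookies_from_json(cookies_to_json(sample))
print(restored == sample)
```

With this approach the loading loop would call `driver.add_cookie(cookie)` for each restored dictionary, so the domain would no longer need to be hard-coded.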

In [6]:
# Path to the Chrome executable on my Linux machine; replace it with your own (run `which chromium-browser` in a terminal)
chrome_path = "/usr/bin/chromium-browser"

In [7]:
# Chrome browser settings: 
# headless mode is used to run the browser in the background without a GUI. It doesn't work with lazy loading
def get_driver(headless=False) -> "uc.Chrome | None": 
    options = uc.ChromeOptions()
    if headless:
        options.add_argument("--headless")  # Run in headless mode

    else:
        # Disable GPU acceleration
        options.add_argument("--disable-gpu")

        # Disable sandboxing
        options.add_argument("--no-sandbox")
    
        # Disable shared memory space
        options.add_argument("--disable-dev-shm-usage")

        # It helps bypass anti-bot detection systems that look for automated browser signatures
        options.add_argument("--disable-blink-features=AutomationControlled")
        
        # Open Chrome in a maximized window
        options.add_argument("--start-maximized")
    
        # Attempt to disable password saving, privacy prompts, and similar pop-ups
        options.add_argument("--disable-extensions")
        options.add_argument("--disable-infobars")
        options.add_argument("--disable-popup-blocking")
        options.add_argument("--disable-notifications")
        options.add_argument("--disable-translate")
        options.add_argument("--disable-application-cache")
        options.add_argument("--disable-extensions-file-access-check")
        options.add_argument("--disable-extensions-http-throttling")
        prefs = {
          "credentials_enable_service": False,
          "profile.password_manager_enabled": False,
          "autofill.profile_enabled": False,
          "autofill.enabled": False,
          "profile.default_content_setting_values.notifications": 2,
          "profile.managed_default_content_settings.popups": 2
        }
        options.add_experimental_option("prefs", prefs)
        options.add_argument("--password-store=basic")
        options.add_argument("--disable-features=PasswordManager")

    driver = None
    try:
        driver = uc.Chrome(options=options, version_main=134)  # Specify your Chrome major version
        driver.start_session()
    except Exception as e:
        print(f"Error: {e}")
        
    return driver    
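
`get_driver` hard-codes `version_main=134`, which has to be updated whenever the browser is upgraded. The sketch below shows one way the major version could be read from the browser itself. It assumes the binary prints a string such as `Chromium 134.0.6998.88` when called with `--version`; the helper names `parse_major_version` and `detect_chrome_major` are hypothetical.

```python
import re
import subprocess
from typing import Optional

def parse_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text such as 'Chromium 134.0.6998.88'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None

def detect_chrome_major(binary: str = "/usr/bin/chromium-browser") -> Optional[int]:
    """Run `<binary> --version` and parse the output; return None if that fails."""
    try:
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_major_version(result.stdout)

print(parse_major_version("Chromium 134.0.6998.88 built on Ubuntu"))
```

The result of `detect_chrome_major(chrome_path)` could then be passed as `version_main`, with a manual fallback when it returns `None`.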

## Opening Chrome Browser & Log in to LinkedIn

Do not interact with the Chrome browser. All actions are performed automatically by the scripts below.

`driver` is the new Chrome browser window.

#### Use get_driver(True) for headless mode

In [8]:
driver = get_driver(headless=False)

In [9]:
type(driver)

undetected_chromedriver.Chrome

If the driver is working, keep it open: the `linkedin_login` function below reuses this `driver` instance. Quit it only if you want to start over with a fresh browser.

In [None]:
# driver.quit()

This function is only for logging into a LinkedIn account. To log in, you need an account email and password. Use your own credentials or a test account.

In [13]:
def linkedin_login(username, password) -> "uc.Chrome | None":
    """
    Function to login to LinkedIn using undetected_chromedriver.
    Reuses the global `driver` created above.

    Parameters:
    username (str): Your LinkedIn email/username
    password (str): Your LinkedIn password

    Returns:
    webdriver: Browser instance logged into LinkedIn, or None on failure
    """
    if load_cookies(driver):
        # Navigate to LinkedIn feed page
        driver.get("https://www.linkedin.com")
        random_sleep(2, 4)
        driver.get("https://www.linkedin.com/feed/")
        # Check if we're already logged in
        if "/feed" in driver.current_url:
            print("Already logged in to LinkedIn!")
            return driver

        # If not logged in, clear cookies and fall through to a fresh login
        driver.delete_all_cookies()

    try:
        # Navigate to LinkedIn login page
        driver.get("https://www.linkedin.com/login")
        random_sleep(2, 4)

        # Wait for the page to load and username field to be present
        username_field = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "username"))
        )

        # Enter username with random typing speed to mimic human behavior
        for char in username:
            username_field.send_keys(char)
            random_sleep(0.1, 0.3)

        random_sleep(3, 6)

        # Enter password with random typing speed
        password_field = driver.find_element(By.ID, "password")
        for char in password:
            password_field.send_keys(char)
            random_sleep(0.1, 0.3)

        random_sleep(2, 4)

        # Click login button
        login_button = driver.find_element(By.XPATH, "//button[contains(@class, 'btn__primary--large')]")
        login_button.click()

        # Wait up to 30 seconds for login to complete and verify we're on the feed page
        try:
            WebDriverWait(driver, 30).until(
                lambda d: urlparse(d.current_url).netloc == "www.linkedin.com" and
                ("/feed" in d.current_url or "/checkpoint" in d.current_url)
            )

            # Check if we've been redirected to a security checkpoint
            if "/checkpoint" in driver.current_url:
                print("LinkedIn security checkpoint detected. Manual intervention may be required.")
                # Allow time for manual intervention if needed
                input("Press Enter after completing the security checkpoint...")
            else:
                print("Successfully logged in to LinkedIn!")

            save_cookies(driver)
            return driver

        except TimeoutException:
            print("Login unsuccessful, redirected to an unexpected page, or it took more than 30 seconds.")
            print(f"Current URL: {driver.current_url}")
            driver.quit()
            return None

    except Exception as e:
        print(f"An error occurred during login: {e}")
        driver.quit()
        return None

### .env File

Copy `.env.default` to `.env`, then edit `.env` and set your LinkedIn username and password. After that, set the target name and target username to the account you're interested in.

In [None]:
# Example: copy the .env.default file to .env
# !cp .env.default .env

In [11]:
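from dotenv import load_dotenv

# Read the variables from the .env file into the process environment
load_dotenv()

linkedin_username = os.getenv("LINKEDIN_USER")
linkedin_password = os.getenv("LINKEDIN_PASSWORD")
target_name = os.getenv("LINKEDIN_TARGET_NAME")
target_username = os.getenv("LINKEDIN_TARGET_USERNAME")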

### Finally, Login

Now I will use linkedin_username and linkedin_password defined above to invoke the linkedin_login function. You can run it and watch your Chrome browser to see what the script is doing.

In [14]:
# Use existing driver instance
linkedin = linkedin_login(linkedin_username, linkedin_password)

if linkedin is None:
    print("Failed to log in to LinkedIn.")
else:
    print(f"Logged in to LinkedIn successfully: {linkedin.current_url}")

Cookies loaded from cookies/linkedin_cookies.txt
Already logged in to LinkedIn!
Logged in to LinkedIn successfully: https://www.linkedin.com/feed/


**Notice**  
If you get the error `"An error occurred during login: 'NoneType' object is not iterable"`, it means you skipped the part where the `.env` file is created and the credentials and target are updated.
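
The failure described in this notice can be caught before the browser is touched. The following is a minimal sketch, assuming the four variable names read with `os.getenv` earlier; the helper `missing_env_vars` is a hypothetical name, not part of the original notebook.

```python
import os

REQUIRED_VARS = (
    "LINKEDIN_USER",
    "LINKEDIN_PASSWORD",
    "LINKEDIN_TARGET_NAME",
    "LINKEDIN_TARGET_USERNAME",
)

def missing_env_vars(env=os.environ, required=REQUIRED_VARS) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

missing = missing_env_vars()
if missing:
    print(f"Missing in .env: {', '.join(missing)}")
else:
    print("All required variables are set.")
```

Running this check right after the `.env` values are read gives a readable message instead of the `'NoneType' object is not iterable` error.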

In [15]:
# Check that the target username is correct
target_username

'ludekkvapil'

### Let Chrome open the target user's posts URL.

In [16]:
linkedin.get(f"https://www.linkedin.com/in/{target_username}/recent-activity/all/")

In [17]:
linkedin.current_url

'https://www.linkedin.com/in/ludekkvapil/recent-activity/all/'

## Lazy Loading Posts  
When we open the feed, not all posts are loaded initially. We need to scroll down and wait until the posts are fully loaded. This function does it for us.

In [18]:
def lazy_load_post(max_scrolls=1000) -> bool:
    """
    Improved scrolling with incremental approach and better detection
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    posts_count = len(driver.find_elements(By.CLASS_NAME, "feed-shared-update-v2__control-menu-container"))
    
    for i in range(max_scrolls):
        # Scroll down incrementally (1/4 of viewport height at a time)
        driver.execute_script("window.scrollBy(0, window.innerHeight/4);")
        
        # Wait for new content to load with random timing
        random_sleep(2, 4)  # Shorter initial wait
        
        # After a few incremental scrolls, check if new content loaded
        if i % 4 == 3:  # Check after every 4 small scrolls
            # Count visible posts
            new_posts_count = len(driver.find_elements(By.CLASS_NAME, "feed-shared-update-v2__control-menu-container"))
            new_height = driver.execute_script("return document.body.scrollHeight")
            
            print(f"Scroll progress: {i+1}/{max_scrolls}, Posts: {new_posts_count}, Height: {new_height}")
            
            # If no new posts appeared after multiple scroll attempts
            if new_posts_count == posts_count and new_height == last_height:
                # Try one big scroll as a last attempt
                driver.execute_script("window.scrollBy(0, window.innerHeight*2);")
                random_sleep(5, 10)
                
                # Check again
                newer_posts_count = len(driver.find_elements(By.CLASS_NAME, "feed-shared-update-v2__control-menu-container"))
                if newer_posts_count == new_posts_count:
                    print(f"No new content loaded after {i+1} scrolls. Stopping.")
                    driver.execute_script("window.scrollTo(0, 0);")  # Scroll back to top
                    return True
            
            # Update tracking variables
            posts_count = new_posts_count
            last_height = new_height
    
    print(f"Reached maximum number of scrolls ({max_scrolls}).")
    driver.execute_script("window.scrollTo(0, 0);")  # Scroll back to top
    return False

Now run the lazy load function, and it will perform the scrolling magic in your Chrome browser. It can take several minutes, depending on the number of posts the target has.

In [19]:
lazy_load_post()

Scroll progress: 4/1000, Posts: 7, Height: 4835
Scroll progress: 8/1000, Posts: 10, Height: 5105
Scroll progress: 12/1000, Posts: 13, Height: 5817
Scroll progress: 16/1000, Posts: 15, Height: 6438
Scroll progress: 20/1000, Posts: 17, Height: 6662
Scroll progress: 24/1000, Posts: 20, Height: 7553
Scroll progress: 28/1000, Posts: 23, Height: 11200
Scroll progress: 32/1000, Posts: 25, Height: 11424
Scroll progress: 36/1000, Posts: 28, Height: 11817
Scroll progress: 40/1000, Posts: 31, Height: 12701
Scroll progress: 44/1000, Posts: 32, Height: 12813
Scroll progress: 48/1000, Posts: 35, Height: 13756
Scroll progress: 52/1000, Posts: 36, Height: 14161
Scroll progress: 56/1000, Posts: 39, Height: 15099
Scroll progress: 60/1000, Posts: 40, Height: 15158
Scroll progress: 64/1000, Posts: 43, Height: 18690
Scroll progress: 68/1000, Posts: 44, Height: 19391
Scroll progress: 72/1000, Posts: 47, Height: 20603
Scroll progress: 76/1000, Posts: 48, Height: 21324
Scroll progress: 80/1000, Posts: 49, Hei

True
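
The stop condition used by `lazy_load_post` (keep scrolling until the number of posts stops growing) can be illustrated without a browser. This is a simplified sketch, not the function above: `scroll_until_stable` and `FakeFeed` are hypothetical names, and the fake feed only simulates a page that reveals three more posts per scroll.

```python
from typing import Callable

def scroll_until_stable(scroll: Callable[[], None],
                        count_items: Callable[[], int],
                        max_rounds: int = 1000,
                        patience: int = 2) -> int:
    """Call scroll() until count_items() has not grown for `patience` rounds in a row."""
    last_count = count_items()
    stalled = 0
    for _ in range(max_rounds):
        scroll()
        current = count_items()
        if current > last_count:
            last_count, stalled = current, 0  # progress: reset the stall counter
        else:
            stalled += 1
            if stalled >= patience:
                break
    return last_count

class FakeFeed:
    """Stand-in for a lazily loaded page: each scroll reveals up to 3 more posts."""
    def __init__(self, total: int = 10):
        self.total, self.visible = total, 1

    def scroll(self) -> None:
        self.visible = min(self.total, self.visible + 3)

    def count(self) -> int:
        return self.visible

feed = FakeFeed()
print(scroll_until_stable(feed.scroll, feed.count))
```

Passing the scroll and count operations as callables keeps the loop independent of Selenium, which makes the stopping logic easy to test on its own.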

### Get post 
This function searches for the CSS elements of each post, maps them to post data, and collects them in a dictionary.

In [20]:
def get_posts() -> dict:
    """
    Improved post extraction with multiple selector strategies and better error handling
    """
    posts = {}
    
    # Try multiple selector strategies
    post_containers = driver.find_elements(By.CLASS_NAME, "feed-shared-update-v2__control-menu-container")
    print(f"Found {len(post_containers)} potential post containers")
    
    for idx, element in enumerate(post_containers):
        try:
            # Try first approach - standard post
            try:
                post_element = element.find_element(By.CLASS_NAME, "update-components-text")
                post_element = post_element.find_element(By.CSS_SELECTOR, "span[dir='ltr']").get_attribute("innerHTML")
                metadata = element.find_element(By.CSS_SELECTOR, "span.update-components-actor__sub-description.text-body-xsmall.t-black--light")
                metadata = metadata.find_element(By.CSS_SELECTOR, "span.visually-hidden").get_attribute("innerHTML")
            except NoSuchElementException:
                # Try alternative selectors for different post types
                try:
                    # Look for any text content with fallbacks
                    post_element = element.find_element(By.CSS_SELECTOR, ".feed-shared-update-v2__description-wrapper").get_attribute("innerHTML")
                    metadata_element = element.find_element(By.CSS_SELECTOR, ".feed-shared-actor__meta")
                    metadata = metadata_element.text
                except NoSuchElementException:
                    # Skip this post if we can't find text
                    print(f"Skipping post #{idx} - structure not recognized")
                    # print(f"Post HTML: {element.get_attribute('innerHTML')}")
                    continue
            
            # Clean and process the text
            text = BeautifulSoup(post_element, "html.parser").get_text("\n", strip=True)
            metadata = BeautifulSoup(metadata, "html.parser").get_text("\n", strip=True)
            
            # Skip empty posts
            if not text.strip():
                print(f"Skipping empty post #{idx}")
                continue
                
            unique_key = f"{abs(hash(metadata + text))}"
            posts[unique_key] = {
                "user": target_name,
                "metadata": metadata, 
                "text": text
            }
            
        except Exception as e:
            print(f"Error processing post #{idx}: {e}")
            continue
    
    print(f"Successfully extracted {len(posts)} posts out of {len(post_containers)} containers")
    return posts
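
One caveat about the keys: Python's built-in `hash()` for strings is salted per process by default, so the same post can get a different key in a later session. A minimal sketch of a stable alternative using `hashlib` follows; `stable_post_key` is a hypothetical helper, and truncating the digest to 16 characters is an arbitrary choice for readability.

```python
import hashlib

def stable_post_key(metadata: str, text: str) -> str:
    """Derive a key that stays the same across Python processes."""
    digest = hashlib.sha256(f"{metadata}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]  # shortened hex digest; still ample for a few hundred posts

key = stable_post_key("5 hours ago", "I Created a Weapon")
print(key, len(key))
```

Because the metadata contains relative timestamps such as "5 hours ago", hashing the text alone may be the better choice if keys should also stay the same between scrapes.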

In [None]:
# Check the scraped posts; uncomment the line below to see them.
# get_posts()

### Store it to JSON File  
For future use, I’d like to have the posts stored in a file called `posts.json`. It will be created in the root folder of this project.

In [21]:
# Get posts
posts = get_posts()

# Save posts to a JSON file
with open("posts.json", "w", encoding="utf-8") as f:
    json.dump(posts, f, ensure_ascii=False, indent=4)
   
print("Posts saved to posts.json")

Found 264 potential post containers
Skipping post #41 - structure not recognized
Skipping post #73 - structure not recognized
Skipping post #74 - structure not recognized
Skipping post #239 - structure not recognized
Successfully extracted 260 posts out of 264 containers
Posts saved to posts.json
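
Because `get_text("\n", strip=True)` puts each nested element on its own line, hashtags in the extracted text can end up split into three lines: `hashtag`, `#`, and the tag itself. The sketch below shows one possible cleanup step before saving or indexing; `merge_hashtags` is a hypothetical helper, and the pattern is an assumption based on how the scraped text looks in this notebook.

```python
import re

def merge_hashtags(text: str) -> str:
    """Collapse a 'hashtag', '#', 'Tag' line triple into a single '#Tag' token."""
    return re.sub(r"hashtag\n#\n(\w+)", r"#\1", text)

sample = "Technologies used:\nhashtag\n#\nPython\n,\nhashtag\n#\nSelenium\n."
print(merge_hashtags(sample))
```

Applying this to the `text` field in `get_posts()` would make the stored posts shorter and easier to read for both people and the language model.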


In [22]:
# We don't need Chrome browser anymore. It works only for headless mode
if linkedin:
    linkedin.quit()
elif driver:
    driver.quit()

## Analysis Using GenAI: LinkedIn Posts RAG Chatbot  

In this section, I use ChromaDB to store posts as vectors. Then, I use OpenAI to analyze the retrieved data and display the results in a Gradio chat window.

In [23]:
from openai import OpenAI
import gradio as gr
import chromadb
from chromadb.utils import embedding_functions
from typing import List


In [24]:
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
MODEL = 'gpt-4o-mini'
openai = OpenAI()

### RAG Application  

This object is responsible for embedding, storing, retrieving, and answering.
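
To make the "retrieving" step concrete, here is a toy sketch of top-k retrieval over embedding vectors. It is only an illustration: the three-dimensional vectors and document names are made up, and cosine similarity is used for simplicity, while the distance metric ChromaDB actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list, docs: dict, k: int = 2) -> list:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine_similarity(query_vec, docs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Made-up embeddings: real ones have hundreds of dimensions.
toy_docs = {
    "post_osint": [0.9, 0.1, 0.0],
    "post_drupal": [0.1, 0.9, 0.0],
    "post_travel": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_docs, k=2))
```

In the class below, ChromaDB performs both the embedding and the nearest-neighbor search, so none of this has to be written by hand.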

In [25]:
class RagChat:
    def __init__(self, json_path: str = 'posts.json', collection_name: str = 'linkedin_posts'):
        """
        Initialize the RAG application with a JSON file and ChromaDB collection
        
        :param json_path: Path to the JSON file containing knowledge base
        :param collection_name: Name of the ChromaDB collection
        """
        # Initialize a persistent ChromaDB client in ./chromadb,
        # using ChromaDB's default embedding model.
        os.makedirs("./chromadb", exist_ok=True)
        self.chroma_client = chromadb.PersistentClient(path="./chromadb")
        self.linkedin = self.chroma_client.get_or_create_collection(
            name=collection_name
        )
        
        # Load and embed JSON data
        self.data = self.load_json_data(json_path)
        self.index_data(json_path)
        

    def index_data(self, json_path: str):
        if self.data is not None:
            print("Data loaded successfully.")
        
            for idx, (post_id, post_data) in enumerate(self.data.items()):
                chroma_id = f"topic_{idx}"
            
                # Check if ID already exists in the collection
                existing = self.linkedin.get(ids=[chroma_id])
                if existing and existing.get("ids"):
                   # print(f"Skipping {chroma_id} (already exists in ChromaDB)")
                   continue  # Skip if already indexed
            
                # Create a comprehensive text document from the topic
                document_parts = [
                   f"User: {post_data.get('user', '')}",
                   f"Text: {post_data.get('text', '')}",
                   f"Metadata: {post_data.get('metadata', '')}"
                ]
                document = '\n\n'.join(document_parts)

                metadata = {
                    "source": json_path,
                    "index": idx,
                    "title": post_id
                }

                # Add the new document to ChromaDB
                self.linkedin.add(
                   documents=[document],
                   metadatas=[metadata],
                   ids=[chroma_id]
                   )
                # print(f"Added {chroma_id} to ChromaDB")

            print("Indexing complete.")
        else:
            print("No topics found in the JSON file.")

       
    def load_json_data(self, json_path: str, strings_only: bool = False):
        """
        Load the posts from a JSON file
        :param json_path: Path to the JSON file
        :param strings_only: If True, skip printing the loaded data
                             (the original intent of this flag is unclear)
        :return: Parsed JSON data, or None if the file is missing or invalid
        """
        # Read JSON file
        try:
            with open(json_path, 'r', encoding='utf-8') as f:
                # Parse before branching so `data` is always defined
                data = json.load(f)
            if not strings_only:
                print(f"Data found {data} ")
            return data
        except FileNotFoundError:
            print(f"Error: File {json_path} not found.")
            return None
        except json.JSONDecodeError:
            print(f"Error: Invalid JSON format in {json_path}.")
            return None
        
    
    def get_prompt_category(self, prompt: str) -> str:
        """
        Categorize the query to determine the context to retrieve
        :param prompt: User's query
        :return: Category of the query
        """
        system_prompt = f"""
        You are a personal assistant who categorizes queries.
        You are given a query and you need to categorize it.
        The categories are:
        1. summary: The query is asking for a summary, all posts, or full text analysis
        2. default: Rest of the queries

        The query is: {prompt}
        Based on the query, return the category.
        Do not make up any answers.

        """
        
        category = self.chat_completion(system_prompt, prompt)

        if "summary" in category.lower():
            return "summary"
        else:
            return "default"

    def retrieve_context(self, query: str, top_k: int = 10) -> List[str]:
        """
        Retrieve most relevant context for a given query
        
        :param query: User's query
        :param top_k: Number of top results to retrieve
        :return: List of retrieved context documents
        """
        # Retrieve top K most similar documents
        results = self.linkedin.query(
            query_texts=[query],
            n_results=top_k
        )
        
        # Extract and parse documents
        contexts = []
        if 'documents' in results and len(results['documents']) > 0:
            for doc in results['documents'][0]:
                contexts.append(doc)

        print(f"Context: {contexts} ")
        return contexts
    
    def chat_completion(self, system_prompt: str, prompt: str) -> str:
        """
        Generate a chat completion using OpenAI
        :param  system_prompt: System prompt
        :param  prompt: User's query
        :return: Generated response
        """
        print(f"System prompt: {system_prompt}")
        print(f"User prompt: {prompt}")

        try:
            # Generate response using OpenAI
            completion = openai.chat.completions.create(
                model=MODEL,
                temperature=0,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ]
            )

            # Access the response correctly based on OpenAI API version
            if hasattr(completion, 'choices') and len(completion.choices) > 0:
                if hasattr(completion.choices[0], 'message'):
                    return completion.choices[0].message.content
                else:
                    return completion.choices[0].text
            return "No response generated"
        except Exception as e:
            return f"An error occurred while generating the response: {str(e)}"

    def generate_response(self, user_prompt: str) -> str:
        """
        Generate a response using retrieved context and OpenAI
        
        :param user_prompt: User's query
        :return: Generated response
        """
        try:
            category = self.get_prompt_category(user_prompt)
            print(f"Query category: {category}")

            if category == "summary":
                # Convert JSON data to list of strings for the summary case
                contexts = []
                for post_id, post_data in self.data.items():
                    document_parts = [
                        f"User: {post_data.get('user', '')}",
                        f"Text: {post_data.get('text', '')}",
                        f"Metadata: {post_data.get('metadata', '')}"
                        ]
                    contexts.append('\n\n'.join(document_parts))
         
            else:
                # Retrieve context based on the user's query
                contexts = self.retrieve_context(user_prompt)

            
            if contexts:
                print(f"Retrieved {len(contexts)} contexts")
            else:
                print("No relevant contexts found")

            system_prompt = f""".
            You are a personal assistant who is analyzing LinkedIn posts written by {target_name}
            Do not make up any answers. 
            If user ask about he, or she, is is question about {target_name}.
            Use knowledge base to answer the question.
            If you don't have enough information from knowledge base, say "I don't know".

            Knowledge base: 
            {' '.join(contexts)}
            """
        
            return self.chat_completion(system_prompt, user_prompt)
        
        except Exception as e:
            return f"An error occurred while generating the response: {str(e)}"
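
In the summary branch every post is joined into the system prompt, which can exceed the model's context window for a large profile. The sketch below shows one simple way to cap the prompt size by character count. `fit_contexts` is a hypothetical helper, the default budget of 20,000 characters is an arbitrary assumption, and counting tokens would be more precise than counting characters.

```python
def fit_contexts(contexts: list, max_chars: int = 20000) -> list:
    """Keep whole contexts, in order, until the character budget is used up."""
    kept, used = [], 0
    for context in contexts:
        if used + len(context) > max_chars:
            break  # the next context would exceed the budget
        kept.append(context)
        used += len(context)
    return kept

docs = ["a" * 40, "b" * 40, "c" * 40]
print(len(fit_contexts(docs, max_chars=100)))
```

A call such as `contexts = fit_contexts(contexts)` before building the system prompt would apply the cap in both branches.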

In [26]:
# Loading scraped posts from JSON file to ChromaDb
rag_app = RagChat('posts.json')

Data found {'5800241450330637768': {'user': 'Ludek Kvapil', 'metadata': '5 hours ago • Visible to anyone on or off LinkedIn', 'text': "### I Created a Weapon\nI'm working on my personal RAG chatbot and got the idea that it would be useful to include some LinkedIn posts in the knowledge base. So, I built a scraper that collects all posts from my profile and saves them to a JSON file. In the next step, I use ChromaDB as a vector store for that text, allowing me to search it as vectors. The data is then used by an OpenAI-based chatbot, which can search and analyze the posts.\nThis concept is great for analyzing your posts and improving your social media hygiene, but it can also be misused for malicious purposes like scams, manipulation, or doxing.\nTechnologies used:\nhashtag\n#\nPython\n,\nhashtag\n#\nJupyterLab\n,\nhashtag\n#\nSelenium\n,\nhashtag\n#\nChromaDB\n,\nhashtag\n#\nOpenAI\n, and\nhashtag\n#\nGradio\n.\nUse this\nhashtag\n#\nOSINT\n&\nhashtag\n#\nGenAI\ntool wisely."}, '438281

Test of RAG chat with dataset based on scraped LinkedIn posts

In [27]:
# Test questions
queries = [
    "Show me all posts about OSINT",
    # f-string so that the target's name is substituted into the question
    f"Does {target_name} have any experience with Drupal?",
]
    
for query in queries:
    print(f"\nQuery: {query}")
    response = rag_app.generate_response(query)
    print(f"Response: {response}")


Query: Show me all posts about OSINT
System prompt: 
        You are a personal assistant who categorizes queries.
        You are given a query and you need to categorize it.
        The categories are:
        1. summary: The query is asking for a summary, all posts, or full text analysis
        2. default: Rest of the queries

        The query is: Show me all posts about OSINT
        Based on the query, return the category.
        Do not make up any answers.

        
User prompt: Show me all posts about OSINT
Query category: summary
Retrieved 260 contexts
System prompt: .
            You are a personal assistant who is analyzing LinkedIn posts written by Ludek Kvapil
            Do not make up any answers. 
            If user ask about he, or she, is is question about Ludek Kvapil.
            Use knowledge base to answer the question.
            If you don't have enough information from knowledge base, say "I don't know".

            Knowledge base: 
            User: Lude

## Configure Gradio 
Gradio is an open-source Python package that allows you to quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in just a few seconds using Gradio's built-in sharing features. 

In [28]:
# Helper function that lets Gradio call the RAG chat object
def respond(message, history):
    # Generate response using the RAG object
    bot_message = rag_app.generate_response(message)
    # Return rag bot message
    return bot_message

In [29]:
demo = gr.ChatInterface(
    fn=respond,
    title="RAG Knowledge Base Chat",
    description="Ask questions about the content in your posts.json file",
    examples=["Show me all posts about OSINT"],
    theme="monochrome", # Theme list: monochrome, default, soft
    chatbot=gr.Chatbot(type="messages"),
    type="messages"
)

In [30]:
# Launch the Gradio app; pass share=True for a public link
demo.launch()

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




In [None]:
# demo.close()