# Web Scraping and Text Processing with LM Studio API

This notebook demonstrates how to scrape text content from a website and summarize it with a model served through the LM Studio API. The resulting summary is displayed as rendered Markdown.


### Import Required Libraries

In this cell, we import the necessary libraries:
- `requests`: To send HTTP requests to fetch the website content.
- `BeautifulSoup`: To parse and extract text from HTML content.
- `Markdown` and `display`: To format and display output in Jupyter Notebook.
- `os`: To manage environment variables for API keys.


In [1]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import os


### Define the Scraping Function

Here, we define the `scrape_website` function, which takes a URL as input and:
1. Sends an HTTP GET request to the URL.
2. Parses the HTML content to extract all paragraphs (`<p>` tags).
3. Returns the text content as a single string.
4. Catches any exception raised along the way and returns an error message string instead of raising.


In [2]:
def scrape_website(url):
    """Scrape text content from all paragraphs in the given URL."""
    try:
        # A timeout keeps the cell from hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise on 4xx/5xx responses
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')
        # Join the text of every <p> tag, one paragraph per line
        return '\n'.join(para.get_text() for para in paragraphs)
    except Exception as e:
        return f"Error during scraping: {e}"
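
Text pulled from `<p>` tags often carries stray whitespace, and some paragraphs are empty. As an optional cleanup step before joining, a minimal sketch might look like the following. The `clean_paragraphs` helper is hypothetical and not part of the notebook above.

```python
def clean_paragraphs(texts):
    """Collapse whitespace in each paragraph and drop empty ones."""
    cleaned = []
    for text in texts:
        # split() with no arguments splits on any run of whitespace
        normalized = " ".join(text.split())
        if normalized:
            cleaned.append(normalized)
    return "\n".join(cleaned)

# Hand-written sample standing in for the get_text() results
raw = ["  Breaking   news\n today ", "", "\n\t", "Second paragraph."]
result = clean_paragraphs(raw)
print(result)
```

Inside `scrape_website`, this could replace the join, for example `clean_paragraphs(para.get_text() for para in paragraphs)`.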


### Set Up API URL and Headers

In this cell, we define the `setup_api` function, which:
1. Sets the API URL for LM Studio.
2. Retrieves the API key from environment variables (defaulting to `"lm-studio"` if not set).
3. Prepares the headers needed for the API request.
4. Returns both the API URL and headers for use in subsequent requests.



In [3]:
def setup_api():
    """Set up the API URL and headers for LM Studio."""
    API_URL = "http://localhost:1234/v1/chat/completions"
    API_KEY = os.getenv("LM_STUDIO_API_KEY", "lm-studio")

    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    return API_URL, headers
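
If the local server runs on a different host or port, the address can be read from the environment in the same way as the key. The sketch below is one possible variant. The variable name `LM_STUDIO_BASE_URL` and the function name `setup_api_from_env` are assumptions made for this example, not names defined by LM Studio or by this notebook.

```python
import os

def setup_api_from_env():
    """Variant of setup_api that also reads the server address from the environment."""
    # LM_STUDIO_BASE_URL is a hypothetical variable name chosen for this sketch
    base_url = os.getenv("LM_STUDIO_BASE_URL", "http://localhost:1234/v1")
    api_key = os.getenv("LM_STUDIO_API_KEY", "lm-studio")

    # Strip a trailing slash so the joined URL never contains "//"
    api_url = f"{base_url.rstrip('/')}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return api_url, headers

# With neither variable set, the defaults match the original setup_api
os.environ.pop("LM_STUDIO_BASE_URL", None)
os.environ.pop("LM_STUDIO_API_KEY", None)
default_url, default_headers = setup_api_from_env()

# Overriding the address for a server on another port
os.environ["LM_STUDIO_BASE_URL"] = "http://127.0.0.1:8080/v1/"
custom_url, _ = setup_api_from_env()
os.environ.pop("LM_STUDIO_BASE_URL", None)  # Leave the environment clean

print(default_url)
print(custom_url)
```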


### Create the Payload for API Request

In this cell, we define the `create_payload` function, which:
1. Constructs the system and user prompts to be sent to the LM Studio API.
2. Creates the payload containing the model identifier, messages, and temperature settings.
3. Returns the complete payload for the API request.


In [4]:
def create_payload(text):
    """Create the payload for the API request."""
    system_prompt = "You are an assistant that analyzes website content for summarization. Ignore negative text and report in Markdown format."
    user_prompt = (
        "Please provide a summary of today's headlines, categorized by the following topics: "
        "General News, Countries-wise, Financial, Sports, Technology, etc."
    )

    payload = {
        "model": "model-identifier",  # Replace with your actual model identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
            {"role": "user", "content": text}
        ],
        "temperature": 0.7
    }

    return payload
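
A scraped page can be longer than the loaded model's context window allows. One way to guard against this is to trim the text before building the payload, as in the rough sketch below. The `truncate_text` helper and the `max_chars` default of 12000 are assumptions for illustration. A character count is only a crude proxy for tokens, so the limit would need tuning for the model in use.

```python
def truncate_text(text, max_chars=12000):
    """Trim text to at most max_chars, cutting at a paragraph boundary if possible."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # scrape_website joins paragraphs with newlines, so cut at the last one
    last_break = clipped.rfind("\n")
    if last_break > 0:
        clipped = clipped[:last_break]
    return clipped

sample = "\n".join(["paragraph one", "paragraph two", "paragraph three"])
short = truncate_text(sample, max_chars=30)
print(repr(short))
```

The trimmed string could then be passed along as `create_payload(truncate_text(text))`.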


### Process API Request and Response

In this cell, we define the `process_with_lm_studio` function, which:
1. Calls the `setup_api` function to get the API URL and headers.
2. Calls the `create_payload` function to construct the payload with the scraped text.
3. Sends the POST request to the LM Studio API and processes the response.
4. Returns the API response content or an error message if the call fails.


In [5]:
def process_with_lm_studio(text):
    """Send the scraped text to the LM Studio API and return the response."""
    API_URL, headers = setup_api()  # Get the API URL and headers
    payload = create_payload(text)  # Create the payload

    try:
        response = requests.post(API_URL, headers=headers, json=payload)
        response.raise_for_status()
        completion = response.json()
        if completion.get('choices'):  # Key present and list non-empty
            return completion['choices'][0]['message']['content']
        else:
            return "No valid response from LM Studio API."
    except Exception as e:
        return f"Error during API call: {str(e)}"
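
The response handling above assumes the reply follows the chat completions layout, with the generated text at `choices[0].message.content`. The sketch below isolates that lookup so it can be tried without a running server. `extract_content` is a hypothetical helper, and `mock_completion` is a hand-written stand-in rather than real API output.

```python
def extract_content(completion):
    """Return the first choice's message text, or None if the shape is unexpected."""
    choices = completion.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")

# Hand-written stand-in for a chat completions response (not real API output)
mock_completion = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "## Summary\n* Item"},
            "finish_reason": "stop",
        }
    ]
}

print(extract_content(mock_completion))
print(extract_content({"choices": []}))
```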


### Define the Markdown Display Function

This cell defines the `display_in_markdown` function, which takes a Markdown-formatted string and renders it in the Jupyter Notebook, so that headings and bullet lists in the model's response appear formatted rather than as raw text.


In [6]:
def display_in_markdown(markdown_text):
    """Display the given text in Markdown format."""
    display(Markdown(markdown_text))

### Define the Main Execution Function

In this cell, we define the `execute_url_scraping` function, which orchestrates the entire workflow:
1. Scrapes the website content from the provided URL.
2. Processes the scraped text using the LM Studio API.
3. Displays the API response in Markdown format.


In [7]:
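def execute_url_scraping(url):
    """Execute the web scraping and processing workflow for the given URL."""
    scraped_text = scrape_website(url)
    if scraped_text.startswith("Error during scraping:"):
        # scrape_website reports failures as a string; do not send that to the model
        print(scraped_text)
    elif scraped_text.strip():
        lm_studio_response = process_with_lm_studio(scraped_text)
        display_in_markdown(lm_studio_response)
    else:
        print("No text found on the website.")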


### Example Execution

In this cell, we provide an example of how to call the `execute_url_scraping` function. Simply modify the URL string to scrape a different website.


In [8]:
execute_url_scraping("https://edition.cnn.com/world")

I'll provide a summary of today's headlines in Markdown format, categorized by topic and ignoring negative text.

**General News**
-------------------

* **Breaking News**: New Study Reveals Human Brain Can Learn from Artificial Intelligence (CNN)
	+ Researchers say AI can improve human cognition and memory
* **World Leaders Meet for Climate Change Summit**: Global leaders gather to discuss climate change solutions (BBC)

**Countries-wise**
-----------------

* **India**: Indian Space Agency Launches Mission to Explore Moon's South Pole (Hindustan Times)
	+ ISRO launches Chandrayaan-3 mission to study lunar south pole
* **China**: China Announces Plans for Lunar Base by 2028 (SCMP)
	+ Chinese space agency aims to establish permanent human presence on the moon

**Financial**
-------------

* **Stock Market**: Global Stock Markets Experience Upbeat Day as Economies Show Signs of Recovery (Reuters)
	+ Investors optimistic about economic growth and trade deals
* **Cryptocurrency**: Bitcoin Surges Past $40,000 Mark Amid Increased Adoption (CoinDesk)
	+ Cryptocurrency market sees significant gains due to increased adoption

**Sports**
----------

* **Football**: UEFA Champions League Quarterfinals Kicks Off with High-Stakes Matches (ESPN)
	+ Top teams compete for spot in semifinals
* **Tennis**: Novak Djokovic Wins Indian Wells Masters, Secures Spot at US Open (Bloomberg)
	+ Tennis star dominates competition to secure US Open spot

**Technology**
-------------

* **Artificial Intelligence**: AI-Powered Robot Assistants Set to Revolutionize Healthcare (TechCrunch)
	+ Robots with AI capabilities to revolutionize healthcare industry
* **5G Networks**: 5G Network Rollout Accelerates as More Cities Get Connected (CNET)
	+ 5G networks expanding rapidly, improving connectivity and speeds

**Entertainment**
-----------------

* **Movie News**: New Star Wars Film to be Released in 2026, According to Disney (Variety)
	+ Upcoming Star Wars film to feature new characters and storylines
* **Music**: Taylor Swift Announces New Album, Set to Release Later This Year (Billboard)
	+ Singer-songwriter announces new album, sparks excitement among fans

Please note that this summary is based on a limited selection of headlines and may not be exhaustive.

In [9]:
execute_url_scraping("https://www.bbc.com/news/world")

Here are the summaries categorized by topic:

**General News**
* Several UN troops have been injured in recent weeks, as Israel continues its military offensive against Hezbollah.
* A high-profile politician was killed in a busy Mumbai street, sending shockwaves through the country.
* Pyongyang vowed to sever road and railway access to South Korea in a bid to "completely separate" the two countries.

**Countries-wise**
* Israel: Israeli troops have been injured in recent weeks as part of its military offensive against Hezbollah, and a food distribution centre was attacked, killing 10 people.
* South Korea: Pyongyang has vowed to sever road and railway access to South Korea in an attempt to "completely separate" the two countries.
* North Korea: Nasa's spacecraft could change what we know about life in our solar system if it reaches its destination.
* China: China sees Taiwan as a breakaway province, but the self-ruled island sees itself as distinct.
* India: The killing of a high-profile politician in a busy Mumbai street has sent shockwaves through the country.

**Financial**
* Tech giant Microsoft will use energy from small reactors to power its use of artificial intelligence.

**Sports**
* (No specific sports-related news found in this article)

**Technology**
* Nasa's spacecraft could change what we know about life in our solar system if it reaches its destination.
* The BBC visits a key city under attack by Russia and finds eroded morale among its few remaining residents.

**Entertainment**
* A new film has fallen to third spot in its second week on the chart, receiving mixed reviews from critics.
* A Hollywood actor discusses his fatherhood, film career, and a brush with death in an interview.

**Science**
* It is estimated that a comet has made its closest approach to Earth at 44 million miles away, according to Nasa.

Note: I have ignored negative text and only reported on the positive or neutral news items.