# Web Scraping and Text Processing with LM Studio API

This notebook demonstrates how to scrape the text content of a website and send it to the LM Studio API, a locally hosted OpenAI-compatible chat-completions endpoint, for summarization. The model's summary is then rendered as Markdown in the notebook.


### Import Required Libraries

In this cell, we import the necessary libraries:
- `requests`: To send HTTP requests to fetch the website content.
- `BeautifulSoup`: To parse and extract text from HTML content.
- `Markdown` and `display`: To format and display output in Jupyter Notebook.
- `os`: To read the API key from an environment variable.


In [3]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import os


### Define the Scraping Function

Here, we define the `scrape_website` function, which takes a URL as input and:
1. Sends an HTTP GET request to the URL.
2. Parses the HTML content to extract all paragraphs (`<p>` tags).
3. Returns the text content as a single string.
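4. On a request failure, prints the error and returns an empty string, so the caller does not mistake an error message for page content.


In [4]:
def scrape_website(url):
    """Scrape text content from all paragraphs in the given URL."""
    try:
        # A timeout keeps the cell from hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')
        return '\n'.join(para.get_text() for para in paragraphs)
    except requests.RequestException as e:
        # Return an empty string so the error is never sent to the model as content
        print(f"Error during scraping: {e}")
        return ""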
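
To see what paragraph extraction produces without making a network request, the sketch below runs the same idea on a small inline HTML string. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs with no extra dependencies. The `ParagraphCollector` class and the sample HTML are hypothetical and are not part of the notebook.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text inside each <p> element (stdlib stand-in for find_all('p'))."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.current = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> is also reported here
        if self.in_paragraph:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.paragraphs.append("".join(self.current))
            self.in_paragraph = False


# Hypothetical page content, used only for illustration
sample_html = """
<html><body>
  <h1>Books</h1>
  <p>First <b>bold</b> paragraph.</p>
  <div>Not a paragraph.</div>
  <p>Second paragraph.</p>
</body></html>
"""

collector = ParagraphCollector()
collector.feed(sample_html)
extracted_text = "\n".join(collector.paragraphs)
print(extracted_text)
```

Only text inside `<p>` tags survives, so headings, navigation, and `<div>` content are dropped. The same holds for `scrape_website`, which is worth keeping in mind for pages that hold their main content outside paragraph tags.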


### Set Up API URL and Headers

In this cell, we define the `setup_api` function, which:
1. Sets the API URL for LM Studio.
2. Retrieves the API key from environment variables (defaulting to `"lm-studio"` if not set).
3. Prepares the headers needed for the API request.
4. Returns both the API URL and headers for use in subsequent requests.



In [5]:
def setup_api():
    """Set up the API URL and headers for LM Studio."""
    API_URL = "http://localhost:1234/v1/chat/completions"
    API_KEY = os.getenv("LM_STUDIO_API_KEY", "lm-studio")

    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    return API_URL, headers
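
The fallback behaviour of the API key lookup can be checked in isolation. The sketch below mirrors the header logic of `setup_api`, but reads from a dictionary passed in as an argument instead of the real environment, so it has no side effects. The `build_headers` helper and the key value `my-secret-key` are hypothetical.

```python
import os


def build_headers(env=os.environ):
    """Hypothetical helper mirroring the header logic in setup_api."""
    # Fall back to the placeholder key when the variable is not set
    api_key = env.get("LM_STUDIO_API_KEY", "lm-studio")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# An empty mapping simulates an environment without the variable
default_headers = build_headers(env={})
custom_headers = build_headers(env={"LM_STUDIO_API_KEY": "my-secret-key"})

print(default_headers["Authorization"])  # Bearer lm-studio
print(custom_headers["Authorization"])   # Bearer my-secret-key
```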


### Create the Payload for API Request

In this cell, we define the `create_payload` function, which:
1. Constructs the system and user prompts to be sent to the LM Studio API.
2. Creates the payload containing the model identifier, messages, and temperature settings.
3. Returns the complete payload for the API request.


In [6]:
def create_payload(text):
    """Create the payload for the API request."""
    system_prompt = "You are an assistant that analyzes website content for summarization. Ignore negative text and report in Markdown format."
    user_prompt = (
        "Please provide a summary of the website, a list of books, and short summaries of each book. "
        "Include author details, published year, and genre."
    )

    payload = {
        "model": "model-identifier",  # Replace with your actual model identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
            {"role": "user", "content": text}
        ],
        "temperature": 0.7
    }

    return payload
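
As a possible variation, the sketch below takes the model identifier and temperature as parameters and places the instructions and the scraped text in a single user message, since some chat templates may expect user and assistant turns to alternate. The `build_payload` name, the shortened prompts, and the `my-local-model` identifier are assumptions made for illustration, not part of the notebook.

```python
import json


def build_payload(text, model="model-identifier", temperature=0.7):
    """Hypothetical variant of create_payload with configurable model and temperature."""
    system_prompt = "You are an assistant that summarizes website content in Markdown."
    instructions = "Please provide a summary of the website."
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            # Instructions and scraped text share one user message
            {"role": "user", "content": f"{instructions}\n\n{text}"},
        ],
        "temperature": temperature,
    }


payload = build_payload(
    "Example scraped paragraph.", model="my-local-model", temperature=0.2
)
# This is the JSON body that requests.post(..., json=payload) would send
print(json.dumps(payload, indent=2))
```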


### Process API Request and Response

In this cell, we define the `process_with_lm_studio` function, which:
1. Calls the `setup_api` function to get the API URL and headers.
2. Calls the `create_payload` function to construct the payload with the scraped text.
3. Sends the POST request to the LM Studio API and processes the response.
4. Returns the API response content or an error message if the call fails.


In [7]:
def process_with_lm_studio(text):
    """Send the scraped text to the LM Studio API and return the response."""
    API_URL, headers = setup_api()  # Get the API URL and headers
    payload = create_payload(text)  # Create the payload

    try:
        response = requests.post(API_URL, headers=headers, json=payload)
        response.raise_for_status()
        completion = response.json()
        if 'choices' in completion and len(completion['choices']) > 0:
            return completion['choices'][0]['message']['content']
        else:
            return "No valid response from LM Studio API."
    except Exception as e:
        return f"Error during API call: {str(e)}"
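
The response handling can be exercised without a running server by feeding it a hand-written dictionary. The sketch below assumes the response follows the OpenAI-style chat-completion shape that the function above already relies on (`choices[0].message.content`). The `extract_content` helper and the sample dictionaries are hypothetical.

```python
def extract_content(completion):
    """Hypothetical helper: return the assistant text, or None if it is absent."""
    choices = completion.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message", {}).get("content")


# Hand-written stand-in for response.json(); not real model output
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Example summary"},
            "finish_reason": "stop",
        }
    ]
}

summary = extract_content(sample_response)
empty = extract_content({"choices": []})
missing = extract_content({"error": "example error"})

print(summary)
print(empty)
print(missing)
```

Returning `None` rather than a message string lets the caller distinguish a missing answer from genuine model output. This is a design choice of the sketch, not the behaviour of `process_with_lm_studio`.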


### Define the Markdown Display Function

This cell defines the `display_in_markdown` function, which takes a Markdown-formatted string and renders it in the Jupyter Notebook, so that headings, lists, and bold text in the model's response appear formatted rather than as raw markup.


In [9]:
def display_in_markdown(markdown_text):
    """Display the given text in Markdown format."""
    display(Markdown(markdown_text))

### Define the Main Execution Function

In this cell, we define the `execute_url_scraping` function, which orchestrates the entire workflow:
1. Scrapes the website content from the provided URL.
2. Processes the scraped text using the LM Studio API.
3. Displays the API response in Markdown format.


In [10]:
def execute_url_scraping(url):
    """Execute the web scraping and processing workflow for the given URL."""
    scraped_text = scrape_website(url)
    if scraped_text:
        lm_studio_response = process_with_lm_studio(scraped_text)
        display_in_markdown(lm_studio_response)
    else:
        print("No text found on the website.")
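
The control flow of the workflow can be tried offline by passing the scraping and summarization steps in as arguments. In the sketch below, `run_pipeline` and the stub functions are hypothetical stand-ins for `execute_url_scraping`, `scrape_website`, and `process_with_lm_studio`, and the pipeline returns its result instead of displaying it.

```python
def run_pipeline(url, scrape, summarize):
    """Hypothetical stand-in for execute_url_scraping with injectable steps."""
    text = scrape(url)
    if not text:
        return "No text found on the website."
    return summarize(text)


def fake_scrape(url):
    # Stub: pretend the page contained two paragraphs
    return "Paragraph one.\nParagraph two."


def empty_scrape(url):
    # Stub: pretend the page had no <p> content
    return ""


def fake_summarize(text):
    # Stub: report how many lines were received instead of calling a model
    return f"Received {len(text.splitlines())} paragraphs."


with_text = run_pipeline("https://example.com", fake_scrape, fake_summarize)
without_text = run_pipeline("https://example.com", empty_scrape, fake_summarize)

print(with_text)
print(without_text)
```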


### Example Execution

In this cell, we provide an example of how to call the `execute_url_scraping` function. Simply modify the URL string to scrape a different website.


In [11]:
execute_url_scraping("https://www.thegoodeggs.org/?gad_source=1&gclid=Cj0KCQjwgrO4BhC2ARIsAKQ7zUlPZOIR2JlFCI5Boh_a6IplK5CGFWU3kffAy_Xk88x49YvtFh86uHwaAtGiEALw_wcB")

**Summary of the Website**
The website promotes a series of children's books called "The Good Eggs" which focuses on social-emotional learning and promoting kindness, diversity, and service among young readers. The series is written by Sandy Ciaramitaro and aims to help children develop self-esteem, character, and positive relationships with others.

**List of Books**
### The Good Eggs Book Series

#### 1. **The Good Eggs - Book 1**

* Author: Sandy Ciaramitaro
* Published Year: [No specific year mentioned]
* Genre: Social-Emotional Learning, Children's Fiction

Summary: This book introduces the main characters Seggourney, Reggie, Megg, and Greggory as they learn about goodness, diversity, and service in their community. They help Benedict and Peggy through various challenges and struggles, demonstrating kindness and compassion.

#### 2. **The Good Eggs Travel the World - Book 2**

* Author: Sandy Ciaramitaro
* Published Year: [No specific year mentioned]
* Genre: Social-Emotional Learning, Children's Fiction

Summary: In this second book, The Good Eggs embark on a global adventure to learn about different cultures, customs, and landmarks. They meet other little eggs from around the world and return home with valuable lessons about diversity and service.

#### 3. **The Good Eggs in the Community - Book 3**

* Author: Sandy Ciaramitaro
* Published Year: [No specific year mentioned]
* Genre: Social-Emotional Learning, Children's Fiction

Summary: The third book sees The Good Eggs returning home to their community as United Nations Ambassadors. They provide service in various areas, such as collecting school supplies and caring for animals at a shelter, learning the importance of helping others.

**Author Details**
Sandy Ciaramitaro is the author of "The Good Eggs" series, which aims to promote social-emotional learning among children.

In [12]:
execute_url_scraping("https://www.storyberries.com/")

**Website Summary**
The website appears to be a collection of children's stories, poems, and nursery rhymes. The content is playful, imaginative, and mostly non-scary, making it suitable for young readers. The website features a variety of characters, including animals, bugs, and humans, all working together in fun and creative ways.

**List of Books**

1. **The Dark Fairy's Halloween Adventure**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Fantasy/Halloween
2. **Doodlebugs to the Rescue**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Adventure/Art
3. **The Apple of Discord**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Mythology/Fairy Tale
4. **Wee Ghost's Halloween Friends**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Friendship/Halloween
5. **The Origin of Jack O' Lanterns**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's History/Halloween
6. **The Plastic Ocean**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Environmentalism/Science
7. **Bedtime Worlds**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Fantasy/Imagination
8. **Trouble's Big Heart**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Friendship/Troublesome Behavior
9. **The Sparrow's Revenge**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Revenge/Fantasy
10. **Lucky Hans' Lucky Day**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Humor/Luck
11. **The Golden Nightingale**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Mythology/Fantasy
12. **Four Brothers and the Dragon**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Fantasy/Adventure
13. **The Bear and the Dwarf**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Friendship/Fantasy
14. **Luzuko's Favorite Eggs**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Humor/Cooking
15. **Pepper and Carrot's Potion Competition**
	* Author: Storyberries
	* Published Year: N/A (exact date not provided)
	* Genre: Children's Fantasy/Competition

**Additional Resources**

* **The Story of the Goldilocks and the Three Bears**
	+ A comic book adaptation by Storyberries
	+ Published Year: N/A (exact date not provided)
	+ Genre: Children's Comics/Fairy Tale