# Web Scraping and Text Processing with LM Studio API

This notebook demonstrates how to scrape the text content of a web page and send it to the LM Studio API for summarization. The resulting summary is rendered as Markdown in the notebook.


### Import Required Libraries

In this cell, we import the necessary libraries:
- `requests`: To send HTTP requests to fetch the website content.
- `BeautifulSoup`: To parse and extract text from HTML content.
- `Markdown` and `display`: To format and display output in Jupyter Notebook.
- `os`: To read the API key from an environment variable.


In [1]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import os


### Define the Scraping Function

Here, we define the `scrape_website` function, which takes a URL as input and:
1. Sends an HTTP GET request to the URL.
2. Parses the HTML content to extract all paragraphs (`<p>` tags).
3. Returns the text content as a single string.
4. Handles any exceptions that occur during the process.


In [2]:
def scrape_website(url):
    """Scrape text content from all paragraphs in the given URL."""
    try:
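        response = requests.get(url, timeout=30)  # Avoid hanging on an unresponsive server
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')
        return '\n'.join(para.get_text() for para in paragraphs)
    except Exception as e:
        # Return an empty string so the caller does not mistake the error for page text.
        print(f"Error during scraping: {e}")
        return ""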
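
The string returned by `scrape_website` often contains blank lines and very short fragments such as menu labels. The following is a minimal sketch of an optional clean-up step. `clean_scraped_text` and its `min_words` threshold are hypothetical names introduced here for illustration and are not part of the original workflow.

```python
def clean_scraped_text(text, min_words=4):
    """Drop blank lines and very short fragments from scraped paragraph text.

    Hypothetical helper: min_words is an illustrative threshold, not a tuned value.
    """
    kept = []
    for line in text.split("\n"):
        # Collapse runs of whitespace left over from the HTML layout.
        normalized = " ".join(line.split())
        if len(normalized.split()) >= min_words:
            kept.append(normalized)
    return "\n".join(kept)


sample = (
    "Home\n\n  This study   predicts hydrogel properties.  \n"
    "Login\nA second paragraph with enough words."
)
print(clean_scraped_text(sample))
```

If used, the cleaned string would replace the raw scraped text before it is sent to the model. A word-count threshold is a crude filter and may also drop short but meaningful paragraphs.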


### Set Up API URL and Headers

In this cell, we define the `setup_api` function, which:
1. Sets the API URL for LM Studio.
2. Retrieves the API key from environment variables (defaulting to `"lm-studio"` if not set).
3. Prepares the headers needed for the API request.
4. Returns both the API URL and headers for use in subsequent requests.



In [3]:
def setup_api():
    """Set up the API URL and headers for LM Studio."""
    API_URL = "http://localhost:1234/v1/chat/completions"
    API_KEY = os.getenv("LM_STUDIO_API_KEY", "lm-studio")

    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    return API_URL, headers
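
To illustrate the environment-variable fallback described above, the sketch below shows `os.getenv` returning the default when `LM_STUDIO_API_KEY` is unset and the variable's value once it is set. `build_headers` is a hypothetical stand-in that mirrors only the header-building part of `setup_api`, and `my-secret-key` is a placeholder value.

```python
import os


def build_headers():
    """Mirror the header construction in setup_api (illustrative stand-in)."""
    api_key = os.getenv("LM_STUDIO_API_KEY", "lm-studio")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# With the variable unset, the default key is used.
os.environ.pop("LM_STUDIO_API_KEY", None)
print(build_headers()["Authorization"])

# Setting the variable (here from Python, normally from the shell) overrides the default.
os.environ["LM_STUDIO_API_KEY"] = "my-secret-key"  # placeholder value
print(build_headers()["Authorization"])
```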


### Create the Payload for API Request

In this cell, we define the `create_payload` function, which:
1. Constructs the system and user prompts to be sent to the LM Studio API.
2. Creates the payload containing the model identifier, messages, and temperature settings.
3. Returns the complete payload for the API request.


In [4]:
def create_payload(text):
    """Create the payload for the API request."""
    # Adjacent string literals are concatenated, so each piece ends with an explicit space or newline.
    system_prompt = (
        "You are an AI designed to assist with summarizing academic research. When given a request, "
        "provide concise summaries of the latest research findings categorized by specified topics. "
        "Ensure clarity and relevance while maintaining an academic tone. "
        "Ignore negative text and report in Markdown format."
    )
    user_prompt = (
        "Please provide a comprehensive overview of this academic research paper, detailing the following elements:\n"
        "- Overall Summary: A brief overview of the research paper.\n"
        "- Defined Problem: What specific problem or question does the research address?\n"
        "- Contributions: What are the key contributions or findings of the study?\n"
        "- Methodology: What methods were employed in the research?\n"
        "- Dataset: What dataset was used, and what are its key characteristics?\n"
        "- Results: What are the main results or findings of the study?\n"
        "- Conclusions: What conclusions do the authors draw from their findings?"
    )

    payload = {
        "model": "model-identifier",  # Replace with your actual model identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            # A single user message: some chat templates reject consecutive messages with the same role.
            {"role": "user", "content": f"{user_prompt}\n\n{text}"}
        ],
        "temperature": 0.7
    }

    return payload
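
Scraped pages can be longer than the context window of the model loaded in LM Studio. As a minimal sketch of one way to guard against that, the helper below trims the text before it reaches `create_payload`. `truncate_text` and the `max_chars` default are assumptions made for illustration; a suitable limit depends on the model in use.

```python
def truncate_text(text, max_chars=12000):
    """Trim text to at most max_chars, cutting at a line break when possible.

    Hypothetical helper; max_chars is an illustrative limit, not a recommendation.
    """
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Prefer to end on a complete paragraph rather than mid-sentence.
    last_break = clipped.rfind("\n")
    if last_break > 0:
        clipped = clipped[:last_break]
    return clipped


long_text = "\n".join(f"Paragraph {i} of the article." for i in range(1000))
short_text = truncate_text(long_text, max_chars=200)
print(len(short_text))
```

The truncated string could then be passed to `create_payload` in place of the raw scraped text. Character counts only approximate token counts, so this is a rough safeguard rather than an exact fit.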


### Process API Request and Response

In this cell, we define the `process_with_lm_studio` function, which:
1. Calls the `setup_api` function to get the API URL and headers.
2. Calls the `create_payload` function to construct the payload with the scraped text.
3. Sends the POST request to the LM Studio API and processes the response.
4. Returns the API response content or an error message if the call fails.


In [5]:
def process_with_lm_studio(text):
    """Send the scraped text to the LM Studio API and return the response."""
    API_URL, headers = setup_api()  # Get the API URL and headers
    payload = create_payload(text)  # Create the payload

    try:
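        # Local generation can be slow, so allow a generous timeout instead of waiting indefinitely.
        response = requests.post(API_URL, headers=headers, json=payload, timeout=300)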
        response.raise_for_status()
        completion = response.json()
        if 'choices' in completion and len(completion['choices']) > 0:
            return completion['choices'][0]['message']['content']
        else:
            return "No valid response from LM Studio API."
    except Exception as e:
        return f"Error during API call: {str(e)}"
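
The response shape that `process_with_lm_studio` reads can be checked without a running server. The sample dictionary below is a hand-written stand-in that follows the structure the function accesses (`choices[0]['message']['content']`); it is not captured server output, and `extract_content` is a hypothetical helper name.

```python
def extract_content(completion):
    """Return the first choice's message content, or None if it is missing."""
    choices = completion.get("choices") or []
    if not choices:
        return None
    # .get() avoids a KeyError if the server returns an unexpected structure.
    return choices[0].get("message", {}).get("content")


sample_completion = {
    "choices": [
        {"message": {"role": "assistant", "content": "**Overall Summary** ..."}}
    ]
}
print(extract_content(sample_completion))
print(extract_content({"error": "model not loaded"}))
```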


### Define the Markdown Display Function

This cell defines the `display_in_markdown` function, which takes a Markdown-formatted string as input and renders it in the Jupyter Notebook. Rendering the response as Markdown, rather than printing the raw string, shows headings, bold text, and lists as formatted output.


In [6]:
def display_in_markdown(markdown_text):
    """Display the given text in Markdown format."""
    display(Markdown(markdown_text))

### Define the Main Execution Function

In this cell, we define the `execute_url_scraping` function, which orchestrates the entire workflow:
1. Scrapes the website content from the provided URL.
2. Processes the scraped text using the LM Studio API.
3. Displays the API response in Markdown format.


In [7]:
def execute_url_scraping(url):
    """Execute the web scraping and processing workflow for the given URL."""
    scraped_text = scrape_website(url)
    if scraped_text:
        lm_studio_response = process_with_lm_studio(scraped_text)
        display_in_markdown(lm_studio_response)
    else:
        print("No text found on the website.")
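
The control flow above can be exercised without network access or a running LM Studio server by passing the individual steps in as arguments. This is a sketch under that assumption: `run_pipeline` and the lambda stand-ins are hypothetical and only mimic the shape of `scrape_website`, `process_with_lm_studio`, and `display_in_markdown`.

```python
def run_pipeline(url, scrape, summarize, show):
    """Same control flow as execute_url_scraping, with the steps passed in."""
    scraped_text = scrape(url)
    if scraped_text:
        show(summarize(scraped_text))
        return True
    print("No text found on the website.")
    return False


shown = []
# Stand-ins: no HTTP request and no LM Studio call are made here.
run_pipeline(
    "https://example.com/paper",
    scrape=lambda url: "First paragraph.\nSecond paragraph.",
    summarize=lambda text: f"Summary of {len(text.splitlines())} paragraphs",
    show=shown.append,
)
print(shown)
```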


### Example Execution

The cells below call the `execute_url_scraping` function on two example papers. Each call is followed by the summary the model returned in that run. Modify the URL string to summarize a different page.


In [9]:
execute_url_scraping("https://www.mdpi.com/2310-2861/10/10/660")

**Overall Summary**
The research paper "Leveraging Deep Learning and Generative AI for Predicting Rheological Properties and Material Compositions of 3D Printed Polyacrylamide Hydrogels" explores the use of deep learning (DL) models to predict rheological properties and material compositions of 3D-printed polyacrylamide hydrogels. The study aims to develop a predictive model that can accurately forecast rheological properties and generate synthetic data with similar statistical distributions as real data.

**Defined Problem**
The problem addressed in this research is the lack of reliable models for predicting rheological properties and material compositions of 3D-printed polyacrylamide hydrogels. Current methods are limited, and there is a need for more accurate predictions to improve the quality and performance of these materials.

**Contributions**
The key contributions of this study include:

1. Developing a deep learning model that can predict rheological properties (storage modulus G' and loss modulus G") from seven gel constituent parameters.
2. Training generative DL models (variational autoencoder (VAE) and conditional variational autoencoder (CVAE)) to learn data patterns and generate synthetic data with similar statistical distributions as real data.
3. Validating the predictive model using Student's t-test and an autoencoder (AE) anomaly detector, which found no significant differences between generated and real data.

**Methodology**
The study employed a multilayer perceptron (MLP) to predict rheological properties, which was trained on seven gel constituent parameters using a grid-search algorithm along with 10-fold cross-validation. The generative DL models (VAE and CVAE) were used to learn data patterns and generate synthetic data.

**Dataset**
The dataset consisted of real data from actual hydrogel fabrication, which included seven gel constituent parameters and rheological properties (G' and G").

**Results**
The results showed that the predictive model achieved an R2 value of 0.89, indicating a strong correlation between predicted and actual values. The generative DL models successfully produced synthetic data with similar statistical distributions as real data.

**Conclusions**
The authors conclude that their deep learning model can accurately predict rheological properties and material compositions of 3D-printed polyacrylamide hydrogels, which has significant implications for improving the quality and performance of these materials. The generative DL models also demonstrated the ability to learn data patterns and generate synthetic data with similar statistical distributions as real data.

**Keywords**

* Deep learning
* Generative AI
* 3D printing
* Polyacrylamide
* Rheology

In [10]:
execute_url_scraping("https://www.mdpi.com/2075-4418/14/20/2287")

**Overall Summary**
The research paper "Leveraging Classifier Performance Using Heuristic Optimization for Detecting Cardiovascular Disease from PPG Signals" by Palanisamy and Rajaguru (2024) presents a study on using heuristic optimization to improve the performance of classifiers in detecting cardiovascular disease from photoplethysmography (PPG) signals. The authors aim to develop an optimized approach for improving the accuracy and efficiency of PPG-based cardiovascular disease detection.

**Defined Problem**
The problem addressed by this research is the improvement of classifier performance in detecting cardiovascular disease from PPG signals, which are non-invasive and have shown promise in monitoring cardiovascular health. However, current approaches often rely on manual tuning of hyperparameters, leading to suboptimal results.

**Contributions**
The key contributions of this study are:

1. **Improved classifier performance**: The authors developed an optimized approach using heuristic optimization techniques (e.g., genetic algorithms) to improve the accuracy and efficiency of PPG-based cardiovascular disease detection.
2. **Robust feature selection**: The researchers used a feature selection method to identify the most relevant features from PPG signals, which can lead to improved model performance and reduced dimensionality.
3. **Real-world application**: This study demonstrates the potential of using heuristic optimization for improving classifier performance in real-world applications, such as cardiovascular disease detection.

**Methodology**
The authors employed the following methods:

1. **Data collection**: They collected PPG signals from 100 subjects with varying levels of cardiovascular health.
2. **Feature extraction**: The researchers extracted features from PPG signals using techniques such as time-frequency analysis and machine learning algorithms.
3. **Classifier training**: They trained classifiers using heuristic optimization techniques to improve performance on the PPG signal dataset.
4. **Evaluation**: The authors evaluated the performance of the optimized classifiers using metrics such as accuracy, precision, and recall.

**Dataset**
The dataset used in this study consists of 100 subjects with varying levels of cardiovascular health. Each subject provided 10 minutes of PPG data, resulting in a total of 1,000 samples. The dataset is not publicly available.

**Results**
The results show that the optimized classifiers achieved significant improvements in accuracy (up to 95%) and efficiency compared to baseline approaches. The feature selection method also demonstrated improved performance when combined with heuristic optimization.

**Conclusions**
The authors conclude that their approach can be applied to real-world PPG-based cardiovascular disease detection, offering a promising solution for improving classifier performance and reducing the risk of errors in diagnosis. Future studies can build upon this work by exploring other applications and optimizing the approach further.