playwright-go-server

Background

Search engines usually return only a webpage's URL and a few snippets, but downstream tasks often need the complete page content. The playwright-go-server project was developed to address this. It uses browser automation to fetch the full HTML content of a webpage and can convert it to Markdown, which is more convenient for subsequent processing by large language models.

Features

  • Webpage Content Fetching: Uses a browser pool (based on Playwright) to fetch the full HTML content of a given URL.
  • Markdown Conversion: Converts the fetched HTML content into Markdown format for easier text processing and inference by large models.
  • Efficient and Stable: Lazily initializes a global session pool and reuses browser instances across requests, so a fetch does not have to launch a new browser each time.
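
The lazy, reusable pool described above can be sketched in plain Go. This is a minimal illustration, not the project's actual code: the Session type, the Pool type, GetPool, and the pool size of 2 are all hypothetical names and values, and a real session would wrap a Playwright browser or page rather than an integer ID.

```go
package main

import (
	"fmt"
	"sync"
)

// Session is a hypothetical stand-in for a browser session.
// In a real server it would hold a Playwright browser or page.
type Session struct {
	ID int
}

// Pool hands out a fixed set of sessions through a buffered channel.
type Pool struct {
	sessions chan *Session
}

var (
	globalPool *Pool
	poolOnce   sync.Once
	created    int // how many times the pool was built (for demonstration)
)

// newPool creates size sessions up front and queues them.
func newPool(size int) *Pool {
	p := &Pool{sessions: make(chan *Session, size)}
	for i := 0; i < size; i++ {
		p.sessions <- &Session{ID: i}
	}
	return p
}

// GetPool returns the global pool, building it on the first call only.
// sync.Once makes the initialization safe under concurrent requests.
func GetPool() *Pool {
	poolOnce.Do(func() {
		created++
		globalPool = newPool(2) // assumed size, chosen for the example
	})
	return globalPool
}

// Acquire blocks until a session is free.
func (p *Pool) Acquire() *Session { return <-p.sessions }

// Release returns a session to the pool for reuse.
func (p *Pool) Release(s *Session) { p.sessions <- s }

func main() {
	pool := GetPool()
	s := pool.Acquire()
	fmt.Println("using session", s.ID)
	pool.Release(s)
	fmt.Println("pool built", created, "time(s)")
}
```

Because the pool is built inside sync.Once, nothing is started until the first request arrives, and every later call to GetPool returns the same instance.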

Installation & Dependencies

  1. Clone the repository:

    git clone https://github.com/litongjava/playwright-go-server.git
    cd playwright-go-server
  2. Install Go dependencies:

    go mod tidy
  3. Build the server binary:

    go build

Docker

docker build -t litongjava/playwright-go-server:1.0.0 .
docker run -dit --name playwright-go-server --net=host litongjava/playwright-go-server:1.0.0

Usage

The project provides an HTTP service with an endpoint to fetch webpage content and convert it based on the provided format.

  • Endpoint: /fetch
  • Query Parameters:
    • url: The URL of the webpage to fetch (required)
    • format: The format of the returned content (optional; when set to markdown, returns content in Markdown format; otherwise returns the raw HTML)
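
To make the parameter handling concrete, here is a hedged sketch of how such an endpoint could be wired with net/http. It is not taken from the project: fetchFunc, fetchHandler, stubFetch, and callFetch are hypothetical names, the stub replaces the real browser fetch and Markdown conversion, and the 400 and 502 status codes and the Content-Type values are assumptions, since the README does not document error responses or headers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// fetchFunc is a hypothetical hook for the real work: load the page in a
// browser and, if asMarkdown is true, convert the HTML to Markdown.
type fetchFunc func(target string, asMarkdown bool) (string, error)

// fetchHandler reads the url and format query parameters described above.
func fetchHandler(fetch fetchFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		target := r.URL.Query().Get("url")
		if target == "" {
			// Assumed behavior: reject requests without the required url.
			http.Error(w, "missing required parameter: url", http.StatusBadRequest)
			return
		}
		// Only the exact value "markdown" switches the output format;
		// anything else (or nothing) yields raw HTML.
		asMarkdown := r.URL.Query().Get("format") == "markdown"
		body, err := fetch(target, asMarkdown)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		if asMarkdown {
			w.Header().Set("Content-Type", "text/markdown; charset=utf-8")
		} else {
			w.Header().Set("Content-Type", "text/html; charset=utf-8")
		}
		fmt.Fprint(w, body)
	}
}

// stubFetch is a placeholder so the sketch runs without a browser.
func stubFetch(target string, asMarkdown bool) (string, error) {
	if asMarkdown {
		return "# stub for " + target, nil
	}
	return "<html>stub for " + target + "</html>", nil
}

// callFetch sends one in-memory request to the handler and returns
// the status code and response body.
func callFetch(rawQuery string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/fetch?"+rawQuery, nil)
	rec := httptest.NewRecorder()
	fetchHandler(stubFetch)(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	q := url.Values{}
	q.Set("url", "https://example.com")
	q.Set("format", "markdown")
	code, body := callFetch(q.Encode())
	fmt.Println(code, body)

	code, _ = callFetch("format=markdown") // url is missing
	fmt.Println(code)
}
```

Passing the fetch step in as a function keeps the parameter logic testable without launching a browser.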

Example

Fetching Markdown formatted content:

GET /fetch?url=https://example.com&format=markdown
curl "http://localhost/fetch?url=https://www.kapiolani.hawaii.edu/&format=markdown"
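
The curl example works with an unencoded target because that URL contains no & or ? of its own. When the target page has its own query string, it needs to be percent-encoded, or the server will see part of it as separate parameters. The following sketch builds a request URL safely. buildFetchURL is a hypothetical helper, and the base address http://localhost is copied from the example above; adjust it to wherever the server actually listens.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildFetchURL returns a /fetch request URL with the target page
// percent-encoded as the url parameter. format may be empty.
func buildFetchURL(base, target, format string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/fetch"
	q := url.Values{}
	q.Set("url", target)
	if format != "" {
		q.Set("format", format)
	}
	u.RawQuery = q.Encode() // Encode escapes values and sorts keys
	return u.String(), nil
}

func main() {
	// The target has its own query string, so it must be escaped.
	reqURL, err := buildFetchURL("http://localhost", "https://example.com/search?q=go&page=2", "markdown")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reqURL)
	// prints http://localhost/fetch?format=markdown&url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dgo%26page%3D2
}
```

With curl, the same effect can be achieved by passing the parameters through -G and --data-urlencode instead of writing the query string by hand.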

Running the Server

Start the service using the following command:

go run main.go

Once the server is running, you can make HTTP requests to the endpoint.

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests to improve the project.

License

This project is licensed under the MIT License.
