Webpage Downloader

This is a server endpoint that takes the URL of a webpage as input, fetches the webpage, and downloads it as a file on the local file system.

Features

Retry limit: The server accepts a retry limit as a parameter. It retries a maximum of 10 times or the retry limit, whichever is lower, before either successfully downloading the webpage or marking the page as a failure. This helps to ensure that the server can handle intermittent connectivity issues or temporary server errors.
Cache: If the webpage has already been requested in the last 24 hours, it is served from the local cache. This helps to reduce the load on the server and improve the response time for repeated requests. The server also has a go routine that runs in the background and clears the cache of any items that are older than 24 hours. This helps to ensure that the cache does not grow too large and consumes unnecessary memory.
Worker pool: The server has a pool of workers that do the work of downloading the requested webpage. This allows the server to handle a large number of concurrent requests while still limiting the number of actual requests to download the webpages. This helps to prevent the server from being overloaded and ensures that it remains responsive.

Usage

To start the server, run the following command:

go run app.go

To send a request to the server, use a tool such as curl to send a POST request to the /pagesource endpoint with a JSON body containing the URL of the webpage you want to download.

Here's an example of how you can send a request using curl:

curl --location --request POST 'http://localhost:7771/pagesource' \
--header 'Content-Type: application/json' \
--data-raw '{
   "url": "https://google.com",
   "retryLimit": 3
}'

The server will respond with a JSON object containing the unique ID of the request, the URL of the requested webpage, and the source URL of the downloaded webpage.

Here's an example of the server's response:

{
"id": "c33357d82b6105c59e3089f2b70af7f8",
"url": "https://google.com",
"sourceUrl": "/files/c33357d82b6105c59e3089f2b70af7f8.html"
}

The downloaded webpage will be saved to the local file system as a file with the name [ID].html, where [ID] is the unique ID of the request. The file will be saved in the files directory.

Configuration

The server can be configured using the following environment variables:

PORT: the port number on which the server will listen for requests (default: 7771)

Dependencies

The server has the following dependencies:

Go

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
README.md		README.md
app.go		app.go
go.mod		go.mod

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Webpage Downloader

Features

Usage

Configuration

Dependencies

About

Releases

Packages

Languages

eshanvaid/webpage-downloader

Folders and files

Latest commit

History

Repository files navigation

Webpage Downloader

Features

Usage

Configuration

Dependencies

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages