Commit 965229f

docs: everything of web readme
1 parent 6d2a855 commit 965229f

File tree

11 files changed

+430
-20
lines changed

web-README.md

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
1+
![html2rss logo](https://github.com/html2rss/html2rss/raw/master/support/logo.png)
2+
3+
# html2rss-web
4+
5+
This web application scrapes websites to build and deliver RSS 2.0 feeds.
6+
7+
**Features:**
8+
9+
- Provides stable URLs for feeds generated by automatic sourcing.
10+
- [Create your custom feeds](#how-to-build-your-rss-feeds)!
11+
- Comes with plenty of [included configs](https://github.com/html2rss/html2rss-configs) out of the box.
12+
- Handles request caching.
13+
- Sets caching-related HTTP headers.
14+
15+
The functionality of scraping websites and building the RSS feeds is provided by the Ruby gem [`html2rss`](https://github.com/html2rss/html2rss).
16+
17+
## Get started
18+
19+
This application should be used with Docker. It is designed to require as little maintenance as possible. See [Versioning and Releases](#versioning-and-releases) and [consider automatic updates](#docker-automatically-keep-the-html2rss-web-image-up-to-date).
20+
21+
### With Docker
22+
23+
```sh
docker run -p 3000:3000 gilcreator/html2rss-web
```
26+
27+
Then open <http://127.0.0.1:3000/> in your browser and click the example feed link.
28+
29+
This is the quickest way to get started. However, it's also the option with the least flexibility: it doesn't allow you to use custom feed configs and doesn't update automatically.
30+
31+
If you want more flexibility and automatic updates, read on to get started with `docker compose`.
32+
33+
### With `docker compose`
34+
35+
Create a `docker-compose.yml` file and paste the following into it:
36+
37+
```yaml
services:
  html2rss-web:
    image: gilcreator/html2rss-web
    ports:
      - "3000:3000"
    volumes:
      - type: bind
        source: ./feeds.yml
        target: /app/config/feeds.yml
        read_only: true
    environment:
      RACK_ENV: production
      HEALTH_CHECK_USERNAME: health
      HEALTH_CHECK_PASSWORD: please-set-YOUR-OWN-veeeeeery-l0ng-aNd-h4rd-to-gue55-Passw0rd!
      # AUTO_SOURCE_ENABLED: 'true'
      # AUTO_SOURCE_USERNAME: foobar
      # AUTO_SOURCE_PASSWORD: A-Unique-And-Long-Password-For-Your-Own-Instance
      ## to allow just requests originating from the local host
      # AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000
      ## to allow multiple origins, separate those via comma:
      # AUTO_SOURCE_ALLOWED_ORIGINS: example.com,h2r.host.tld
      BROWSERLESS_IO_WEBSOCKET_URL: ws://browserless:3001
      BROWSERLESS_IO_API_TOKEN: 6R0W53R135510

  watchtower:
    image: containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "~/.docker/config.json:/config.json"
    command: --cleanup --interval 7200

  browserless:
    image: "ghcr.io/browserless/chromium"
    ports:
      - "3001:3001"
    environment:
      PORT: 3001
      CONCURRENT: 10
      TOKEN: 6R0W53R135510
```
78+
79+
Start it up with: `docker compose up`.
80+
81+
If you have not created your `feeds.yml` yet, download [this `feeds.yml` as a blueprint](https://raw.githubusercontent.com/html2rss/html2rss-web/master/config/feeds.yml) into the directory containing the `docker-compose.yml`.
82+
83+
## Docker: Automatically keep the html2rss-web image up-to-date
84+
85+
The [watchtower](https://containrrr.dev/watchtower/) service periodically checks whether newer versions of the images used by your running containers are available. If an update is available, it pulls the new image and restarts the container with the same configuration as the running one. Please read its manual.
86+
87+
The `docker-compose.yml` above contains a service description for watchtower.
88+
89+
## How to use automatic feed generation
90+
91+
> [!NOTE]
92+
> This feature is disabled by default.
93+
94+
To enable the `auto_source` feature, uncomment the environment variables in the `docker-compose.yml` file above and change the values accordingly:
95+
96+
```yaml
environment:
  ## … snip ✁
  AUTO_SOURCE_ENABLED: "true"
  AUTO_SOURCE_USERNAME: foobar
  AUTO_SOURCE_PASSWORD: A-Unique-And-Long-Password-For-Your-Own-Instance
  ## to allow just requests originating from the local host
  AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000
  ## to allow multiple origins, separate those via comma:
  # AUTO_SOURCE_ALLOWED_ORIGINS: example.com,h2r.host.tld
  ## … snap ✃
```
108+
109+
Restart the container and open <http://127.0.0.1:3000/auto_source/>.
110+
When asked, enter your username and password.
111+
112+
Then enter the URL of a website and click on the _Generate_ button.
113+
114+
## How to use the included configs
115+
116+
html2rss-web comes with many feed configs out of the box. [See the file list of all configs.](https://github.com/html2rss/html2rss-configs/tree/master/lib/html2rss/configs)
117+
118+
To use a config from there, build the URL like this:
119+
120+
|                          |                               |
| ------------------------ | ----------------------------- |
| `lib/html2rss/configs/`  | `domainname.tld/whatever.yml` |
| Would become this URL:   |                               |
| `http://localhost:3000/` | `domainname.tld/whatever.rss` |
|                          | `^^^^^^^^^^^^^^^^^^^^^^^^^^^` |
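
The mapping in the table can be sketched as a small shell helper. This is only an illustration: `feed_url` is a hypothetical function name, not something html2rss-web ships, and the base URL defaults to the `http://localhost:3000` used in the table.

```sh
# Hypothetical helper (not part of html2rss-web): turn the path of an
# included config into the URL of its feed.
#   $1: config path, with or without the lib/html2rss/configs/ prefix
#   $2: base URL of your instance (assumed default: http://localhost:3000)
feed_url() {
  base_url="${2:-http://localhost:3000}"
  config_path="${1#lib/html2rss/configs/}" # drop the repository prefix
  # Strip a trailing slash from the base URL and swap .yml for .rss
  printf '%s/%s.rss\n' "${base_url%/}" "${config_path%.yml}"
}

feed_url "lib/html2rss/configs/domainname.tld/whatever.yml"
```

Pass your instance's address as the second argument if it does not run on `localhost:3000`.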
126+
127+
## How to build your RSS feeds
128+
129+
To build your own RSS feed, you need to create a _feed config_.\
130+
That _feed config_ goes into the file `feeds.yml`.\
131+
Check out the [`example` feed config](https://github.com/html2rss/html2rss-web/blob/master/config/feeds.yml#L9).
132+
133+
Please refer to [html2rss' README for a description of _the feed config and its options_](https://github.com/html2rss/html2rss#the-feed-config-and-its-options). html2rss-web is just a small web application that builds on html2rss.
134+
135+
## Versioning and releases
136+
137+
This web application is distributed in a [rolling release](https://en.wikipedia.org/wiki/Rolling_release) fashion from the `master` branch.
138+
139+
Whenever a commit on the `master` branch passes GitHub CI/CD, an updated Docker image is pushed to [Docker Hub: `gilcreator/html2rss-web`](https://hub.docker.com/r/gilcreator/html2rss-web).
140+
The [SBOM](https://en.wikipedia.org/wiki/Software_supply_chain) is embedded in the Docker image.
141+
142+
GitHub's @dependabot is enabled for dependency updates. These updates are merged into the `master` branch automatically once CI passes.
143+
144+
If you use Docker, you should update to the latest image automatically by [setting up _watchtower_ as described](#get-started).
145+
146+
## Use in production
147+
148+
This app is published on Docker Hub and therefore easy to use with Docker.\
149+
The above `docker-compose.yml` is a good starting point.
150+
151+
If you're going to host a public instance, _please, please, please_:
152+
153+
- Put the application behind a reverse proxy.
154+
- Allow outside connections only via HTTPS.
155+
- Have an auto-update strategy (e.g., watchtower).
156+
- Monitor your `/health_check.txt` endpoint.
157+
- [Let the world know and add your instance to the wiki](https://github.com/html2rss/html2rss-web/wiki/Instances) -- thank you!
158+
159+
### Supported ENV variables
160+
161+
| Name                           | Description                        |
| ------------------------------ | ---------------------------------- |
| `BASE_URL`                     | default: '<http://localhost:3000>' |
| `LOG_LEVEL`                    | default: 'warn'                    |
| `HEALTH_CHECK_USERNAME`        | default: auto-generated on start   |
| `HEALTH_CHECK_PASSWORD`        | default: auto-generated on start   |
|                                |                                    |
| `AUTO_SOURCE_ENABLED`          | default: false                     |
| `AUTO_SOURCE_USERNAME`         | no default.                        |
| `AUTO_SOURCE_PASSWORD`         | no default.                        |
| `AUTO_SOURCE_ALLOWED_ORIGINS`  | no default.                        |
|                                |                                    |
| `PORT`                         | default: 3000                      |
| `RACK_ENV`                     | default: 'development'             |
| `RACK_TIMEOUT_SERVICE_TIMEOUT` | default: 15                        |
| `WEB_CONCURRENCY`              | default: 2                         |
| `WEB_MAX_THREADS`              | default: 5                         |
|                                |                                    |
| `SENTRY_DSN`                   | no default.                        |
180+
181+
### Runtime monitoring via `GET /health_check.txt`
182+
183+
It is recommended to set up monitoring of the `/health_check.txt` endpoint. With that, you can find out when one of _your own_ configs breaks. The endpoint uses HTTP Basic authentication.
184+
185+
First, set the username and password via these environment variables: `HEALTH_CHECK_USERNAME` and `HEALTH_CHECK_PASSWORD`. If these are not set, html2rss-web will generate a new random username and password on _each_ start.
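
If you need a value for `HEALTH_CHECK_PASSWORD`, one possible way to produce a long random string is sketched below. The helper name `generate_secret` is made up for this example, and it assumes `/dev/urandom`, `head -c`, `od`, and `tr` are available, which holds on most Linux and macOS systems.

```sh
# Sketch: print 32 random bytes as a 64-character lowercase hex string.
# generate_secret is a hypothetical helper, not part of html2rss-web.
generate_secret() {
  # od prints the bytes as hex pairs; tr removes the spaces and newlines
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

printf 'HEALTH_CHECK_PASSWORD: %s\n' "$(generate_secret)"
```

Copy the printed value into the `environment` section of your `docker-compose.yml`.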
186+
187+
An authenticated `GET /health_check.txt` request will respond with:
188+
189+
- If the feeds are generatable: `success`.
190+
- Otherwise: the names of the broken configs.
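
The two outcomes above could be checked from a script roughly like this. It is a sketch under assumptions: `health_ok` and `check_instance` are hypothetical names, the response body is assumed to be exactly `success` when all feeds can be generated, and any other body (including an authentication error page) is treated as a failure.

```sh
# Interpret the response body of /health_check.txt.
# Returns 0 only if the body is exactly "success".
health_ok() {
  if [ "$1" = "success" ]; then
    return 0
  fi
  printf 'health check did not report success: %s\n' "$1" >&2
  return 1
}

# Fetch the endpoint with HTTP Basic authentication and interpret it.
#   $1: base URL of the instance, e.g. http://127.0.0.1:3000
# Reads HEALTH_CHECK_USERNAME and HEALTH_CHECK_PASSWORD from the environment.
check_instance() {
  body="$(curl --silent --user "$HEALTH_CHECK_USERNAME:$HEALTH_CHECK_PASSWORD" \
    "$1/health_check.txt")" || return 2 # curl itself failed (e.g. no connection)
  health_ok "$body"
}
```

Running `check_instance http://127.0.0.1:3000` from a cron job would be one way to use it; the exit status tells you whether to send a notification.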
191+
192+
To get notified when one of your configs breaks, set up monitoring of this endpoint.
193+
194+
[UptimeRobot's free plan](https://uptimerobot.com/) is sufficient for basic monitoring (every 5 minutes).\
195+
Create a monitor of type _Keyword_ with this information and make it aware of your username and password:
196+
197+
![A screenshot showing the Keyword Monitor: a name, the instance's URL to /health_check.txt, and an interval.](docs/uptimerobot_monitor.jpg)
198+
199+
### Application Performance Monitoring using Sentry
200+
201+
When you specify `SENTRY_DSN` in your environment variables, the application will be set up to use Sentry.
202+
203+
## Setup for development
204+
205+
Check out the git repository and…
206+
207+
### Using Docker
208+
209+
This approach allows you to experiment without installing Ruby on your machine.
210+
All you need to do is install and run Docker.
211+
212+
```sh
# Build image from Dockerfile and name/tag it as html2rss-web:
docker build -t html2rss-web -f Dockerfile .

# Run the image and name it html2rss-web-dev:
docker run \
  --detach \
  --mount type=bind,source="$(pwd)/config",target=/app/config \
  --name html2rss-web-dev \
  html2rss-web

# Open an interactive TTY with the shell `sh`:
docker exec -ti html2rss-web-dev sh

# Stop and clean up the container
docker stop html2rss-web-dev
docker rm html2rss-web-dev

# Remove the image
docker rmi html2rss-web
```
233+
234+
### Using installed Ruby
235+
236+
If you're comfortable with installing Ruby directly on your machine, follow these instructions:
237+
238+
1. Install Ruby `>= 3.2`
2. `gem install bundler foreman`
3. `bundle`
4. `foreman start`
242+
243+
_html2rss-web_ now listens on port **3000** for requests.
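
Before running the steps above, you may want to confirm that your Ruby meets the `>= 3.2` requirement. The following is a minimal sketch; `ruby_version_ok` is a hypothetical helper that compares only the major and minor parts of a version string such as `3.2.1`.

```sh
# Hypothetical helper: succeed if the given version string is >= 3.2.
# Only the major and minor components are compared.
ruby_version_ok() {
  major="${1%%.*}"   # text before the first dot
  rest="${1#*.}"     # text after the first dot
  minor="${rest%%.*}" # text before the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }
}
```

With Ruby installed, `ruby_version_ok "$(ruby -e 'print RUBY_VERSION')" && echo "Ruby is recent enough"` checks the interpreter on your `PATH`.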
244+
245+
## Contribute
246+
247+
Contributions are welcome!
248+
249+
Open a pull request with your changes,\
250+
open an issue, or\
251+
[join discussions on html2rss](https://github.com/orgs/html2rss/discussions).
Lines changed: 11 additions & 16 deletions
@@ -1,25 +1,20 @@
11
---
22
layout: default
3-
title: Deploying to Production
4-
nav_order: 2
3+
title: Deployment
4+
nav_order: 4
55
parent: How-To Guides
66
grand_parent: Web Application
77
---
88

9-
# Deploying to Production
9+
# Deployment
1010

11-
If you're looking to host a personal instance of `html2rss-web`, the [Installation Guide]({{ '/web-application/installation' | relative_url }}) provides a detailed, step-by-step walkthrough.
11+
This app is published on Docker Hub and therefore easy to use with Docker.
12+
The `docker-compose.yml` in the [Installation Guide]({{ '/web-application/installation' | relative_url }}) is a good starting point.
1213

13-
For those who wish to host a public-facing instance, this page serves as a high-level checklist of best practices to ensure a secure and stable deployment.
14+
If you're going to host a public instance, _please, please, please_:
1415

15-
## Production Deployment Checklist
16-
17-
- **Reverse Proxy**: Always put the application behind a reverse proxy. This is crucial for security and for managing incoming traffic.
18-
- **HTTPS**: Enforce HTTPS for all outside connections to encrypt traffic and protect user data.
19-
- **Auto-Updates**: Implement an auto-update strategy (e.g., using [Watchtower](https://containrrr.dev/watchtower/)) to ensure you're always running the latest, most secure version of the application.
20-
- **Monitoring**: Regularly monitor the `/health_check.txt` endpoint to ensure your instance is running correctly.
21-
- **Share Your Instance**: If you've set up a public instance, please consider [adding it to our wiki](https://github.com/html2rss/html2rss-web/wiki/Instances). This helps others discover and use the service. Thank you!
22-
23-
## Versioning and Releases
24-
25-
For information on the web application's versioning and release strategy, please refer to the [main Web Application overview]({{ '/web-application' | relative_url }}).
16+
- Put the application behind a reverse proxy.
17+
- Allow outside connections only via HTTPS.
18+
- Have an auto-update strategy (e.g., watchtower).
19+
- Monitor your `/health_check.txt` endpoint.
20+
- [Let the world know and add your instance to the wiki](https://github.com/html2rss/html2rss-web/wiki/Instances) -- thank you!

web-application/how-to/index.md

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ has_children: true
88

99
# How-To Guides
1010

11-
This section contains detailed, task-based guides for users who want to go beyond the basics.
11+
This section provides guides on how to perform specific tasks with html2rss-web.
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
---
2+
layout: default
3+
title: Setup for development
4+
nav_order: 5
5+
parent: How-To Guides
6+
grand_parent: Web Application
7+
---
8+
9+
# Setup for development
10+
11+
Check out the git repository and…
12+
13+
## Using Docker
14+
15+
This approach allows you to experiment without installing Ruby on your machine.
16+
All you need to do is install and run Docker.
17+
18+
```sh
# Build image from Dockerfile and name/tag it as html2rss-web:
docker build -t html2rss-web -f Dockerfile .

# Run the image and name it html2rss-web-dev:
docker run \
  --detach \
  --mount type=bind,source="$(pwd)/config",target=/app/config \
  --name html2rss-web-dev \
  html2rss-web

# Open an interactive TTY with the shell `sh`:
docker exec -ti html2rss-web-dev sh

# Stop and clean up the container
docker stop html2rss-web-dev
docker rm html2rss-web-dev

# Remove the image
docker rmi html2rss-web
```
39+
40+
## Using installed Ruby
41+
42+
If you're comfortable with installing Ruby directly on your machine, follow these instructions:
43+
44+
1. Install Ruby `>= 3.2`
2. `gem install bundler foreman`
3. `bundle`
4. `foreman start`
48+
49+
_html2rss-web_ now listens on port **3000** for requests.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
---
2+
layout: default
3+
title: Use automatic feed generation
4+
nav_order: 2
5+
parent: How-To Guides
6+
grand_parent: Web Application
7+
---
8+
9+
# How to use automatic feed generation
10+
11+
> This feature is disabled by default.
12+
13+
To enable the `auto_source` feature, uncomment the environment variables in your `docker-compose.yml` file and change the values accordingly:
14+
15+
```yaml
environment:
  ## … snip ✁
  AUTO_SOURCE_ENABLED: "true"
  AUTO_SOURCE_USERNAME: foobar
  AUTO_SOURCE_PASSWORD: A-Unique-And-Long-Password-For-Your-Own-Instance
  ## to allow just requests originating from the local host
  AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000
  ## to allow multiple origins, separate those via comma:
  # AUTO_SOURCE_ALLOWED_ORIGINS: example.com,h2r.host.tld
  ## … snap ✃
```
27+
28+
Restart the container and open <http://127.0.0.1:3000/auto_source/>.
29+
When asked, enter your username and password.
30+
31+
Then enter the URL of a website and click on the _Generate_ button.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
---
2+
layout: default
3+
title: Use in production
4+
nav_order: 4
5+
parent: How-To Guides
6+
grand_parent: Web Application
7+
---
8+
9+
# Use in production
10+
11+
This app is published on Docker Hub and therefore easy to use with Docker.
12+
The above `docker-compose.yml` is a good starting point.
13+
14+
If you're going to host a public instance, _please, please, please_:
15+
16+
- Put the application behind a reverse proxy.
17+
- Allow outside connections only via HTTPS.
18+
- Have an auto-update strategy (e.g., watchtower).
19+
- Monitor your `/health_check.txt` endpoint.
20+
- [Let the world know and add your instance to the wiki](https://github.com/html2rss/html2rss-web/wiki/Instances) -- thank you!
