Commit ae292d0

docs: update and provide correct information (#941)
* docs: update and provide correct information
* improve copy
1 parent 9d384fe commit ae292d0

14 files changed: +227 -122 lines changed

404.html

Lines changed: 35 additions & 3 deletions
@@ -3,7 +3,39 @@
 layout: default
 ---

-<h1>404</h1>
+<h1>404 - Page Not Found</h1>

-<p><strong>Page not found :(</strong></p>
-<p>The requested page could not be found.</p>
+<p><strong>Sorry, the page you're looking for doesn't exist.</strong></p>
+
+<p>Here are some helpful links to get you back on track:</p>
+
+<ul>
+  <li>
+    <a href="{{ '/' | relative_url }}">Home</a> - Start from the beginning
+  </li>
+  <li>
+    <a href="{{ '/web-application/getting-started' | relative_url }}"
+      >Getting Started</a
+    >
+    - Learn how to use html2rss
+  </li>
+  <li>
+    <a href="{{ '/ruby-gem' | relative_url }}">Ruby Gem Documentation</a> -
+    Developer resources
+  </li>
+  <li>
+    <a href="{{ '/feed-directory' | relative_url }}">Feed Directory</a> - Browse
+    available feeds
+  </li>
+  <li>
+    <a href="{{ '/get-involved' | relative_url }}">Get Involved</a> - Join the
+    community
+  </li>
+</ul>
+
+<p>
+  If you think this is an error, please
+  <a href="https://github.com/html2rss/html2rss.github.io/issues"
+    >report it on GitHub</a
+  >.
+</p>

about.md

Lines changed: 1 addition & 1 deletion
@@ -31,4 +31,4 @@ For insights into our ongoing development, project roadmap, and how you can get

 `html2rss` is maintained by a dedicated group of volunteers and contributors from around the world. We are passionate about open source and committed to continuously improving the project.

-Want to join us? Check out our [Contributing Guide]({{ '/contributing' | relative_url }})!
+Want to join us? Check out our [Contributing Guide]({{ '/get-involved/contributing' | relative_url }})!

bin/data-update

Lines changed: 1 addition & 3 deletions
@@ -16,9 +16,7 @@ def extract_default_parameters(parameters)
   return {} unless parameters.is_a?(Hash)

   parameters.each_with_object({}) do |(param_name, param_config), defaults|
-    if param_config.is_a?(Hash) && param_config['default']
-      defaults[param_name] = param_config['default']
-    end
+    defaults[param_name] = param_config['default'] if param_config.is_a?(Hash) && param_config['default']
   end
 end
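
The following is a minimal, self-contained sketch of the helper as it reads after this hunk, reconstructed from the lines shown here. The `sample` hash is hypothetical and only illustrates the shape of a `parameters` block. It suggests that the guard is a truthiness check, so a parameter whose default is `false` or `nil` would be left out of the result, both before and after the refactor.

```ruby
# Sketch of extract_default_parameters as it appears after this change.
def extract_default_parameters(parameters)
  return {} unless parameters.is_a?(Hash)

  parameters.each_with_object({}) do |(param_name, param_config), defaults|
    defaults[param_name] = param_config['default'] if param_config.is_a?(Hash) && param_config['default']
  end
end

# Hypothetical input, shaped like a config's `parameters` block.
sample = {
  'query'   => { 'type' => 'string', 'default' => 'ruby' },
  'region'  => { 'type' => 'string' },   # no default: skipped
  'verbose' => { 'default' => false },   # falsy default: skipped
  'legacy'  => 'not a hash'              # non-Hash config: skipped
}

defaults = extract_default_parameters(sample)
puts defaults.inspect
```

If falsy defaults ever need to be preserved, a `param_config.key?('default')` check would be one way to do it; that is a possible follow-up, not something this commit changes.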

get-involved/index.md

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ Engage with the `html2rss` project. Contribute and connect with the community.
 - [**Project Roadmap**]({{ 'https://github.com/orgs/html2rss/projects/3/views/1' }}): View current work, plans, and priorities.
 - [**Report Bugs & Discuss Features**]({{ '/get-involved/issues-and-features' | relative_url }}): Report bugs or propose features.
 - [**Join Community Discussions**]({{ '/get-involved/discussions' | relative_url }}): Connect with users and contributors.
-- [**Contribute to html2rss**]({{ '/contributing' | relative_url }}): Contribute code, documentation, or feed configurations.
+- [**Contribute to html2rss**]({{ '/get-involved/contributing' | relative_url }}): Contribute code, documentation, or feed configurations.

html2rss-configs/index.md

Lines changed: 46 additions & 32 deletions
@@ -5,21 +5,29 @@ has_children: false
 nav_order: 5
 ---

-# Creating Feed Configurations
+# Creating Custom RSS Feeds

-Welcome to the guide for `html2rss-configs`. This document explains how to create your own configuration files to convert any website into an RSS feed.
+Want to create RSS feeds for websites that don't offer them? This guide shows you how to write simple configuration files that tell the html2rss engine exactly what content to extract.

-You can find a list of all community-contributed configurations in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).
+**Don't worry if you're not technical** - we'll explain everything step by step!
+
+You can see examples of what others have created in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).

 ---

-## Core Concepts
+## How It Works
+
+Think of the html2rss engine as a smart assistant that needs instructions. You give it a simple "recipe" (called a config file) that tells it:
+
+1. **Which website** to look at
+2. **What content** to find (articles, posts, etc.)
+3. **How to organize** that content into an RSS feed

-An `html2rss` config is a YAML file that defines how to extract data from a web page. It consists of two main building blocks: `channel` and `selectors`.
+The recipe is written in YAML - a simple format that's easy to read and write. Both html2rss-web and the html2rss Ruby gem use these same configuration files.

 ### The `channel` Block

-The `channel` block contains metadata about the RSS feed itself, such as its title and the source URL.
+This tells the html2rss engine basic information about your feed - like giving it a name and telling it which website to look at.

 **Example:**

@@ -29,11 +37,11 @@ channel:
   title: My Awesome Blog
 ```

-For a complete list of all available channel options, please see the [Channel Reference]({{ '/ruby-gem/reference/channel/' | relative_url }}).
+This says: "Look at this website and call the feed 'My Awesome Blog'"

 ### The `selectors` Block

-The `selectors` block is the core of the configuration, defining the rules for extracting content. It always contains an `items` selector to identify the list of articles and individual selectors for the data points within each item (e.g., `title`, `link`).
+This is where you tell the html2rss engine exactly what to find on the page. You use CSS selectors (like you might use in web design) to point to specific parts of the webpage.

 **Example:**

@@ -47,17 +55,19 @@ selectors:
     selector: "h2 a"
 ```

-For a comprehensive guide on all available selectors, extractors, and post-processors, please see the [Selectors Reference]({{ '/ruby-gem/reference/selectors/' | relative_url }}).
+This says: "Find each article, get the title from the h2 link, and get the link from the same h2 link"
+
+**Need more details?** Check our [complete guide to selectors]({{ '/ruby-gem/reference/selectors/' | relative_url }}) for all the options.

 ---

-## Tutorial: Your First Config
+## Tutorial: Your First Feed

-This tutorial walks you through creating a basic configuration file from scratch.
+Let's create a simple RSS feed step by step. We'll use a basic blog as our example.

-### Step 1: Identify the Target Content
+### Step 1: Look at the Website

-First, identify the HTML structure of the website you want to create a feed for. For this example, we'll use a simple blog structure:
+First, visit the website you want to create a feed for. Right-click and "View Page Source" to see the HTML structure. Look for patterns like this:

 ```html
 <div class="posts">
@@ -72,9 +82,11 @@ First, identify the HTML structure of the website you want to create a feed for.
 </div>
 ```

-### Step 2: Create the Config File and Define the Channel
+**What we see:** Each article is wrapped in `<article class="post">`, titles are in `<h2><a>` tags, and descriptions are in `<p>` tags.
+
+### Step 2: Create Your Config File

-Create a new YAML file (e.g., `my-blog.yml`) and define the `channel`:
+Create a new text file and save it as `my-blog.yml` (or any name you like). Add this basic information:

 ```yaml
 # my-blog.yml
@@ -84,9 +96,11 @@ channel:
   description: The latest news from my awesome blog.
 ```

-### Step 3: Define the Selectors
+This tells html2rss: "Look at this website and call the feed 'My Awesome Blog'"

-Next, add the `selectors` block to extract the content for each post.
+### Step 3: Tell html2rss What to Find
+
+Now add the selectors that tell html2rss exactly what content to extract:

 ```yaml
 # my-blog.yml
@@ -101,26 +115,17 @@ selectors:
     selector: "p"
 ```

-- `items`: This CSS selector identifies the container for each article.
-- `title`, `link`, `description`: These selectors target the specific data points within each item. For a `link` selector, `html2rss` defaults to extracting the `href` attribute from the matched `<a>` tag.
+**What this means:**
+
+- `items: "article.post"` = "Find each article with class 'post'"
+- `title: "h2 a"` = "Get the title from the h2 link"
+- `link: "h2 a"` = "Get the link from the same h2 link"
+- `description: "p"` = "Get the description from the paragraph"
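
As a rough illustration of the config shape the tutorial builds up, the sketch below loads a config with Ruby's standard YAML library and checks that it has a `channel` with a `url` and a `selectors` block with an `items` selector. This is not how html2rss validates configs. The `config_problems` helper and the placeholder URL are assumptions made for this example, and the selector keys simply mirror the list above.

```ruby
require 'yaml'

# Hypothetical config text mirroring the tutorial; the URL is a placeholder.
CONFIG_YAML = <<~YAML
  channel:
    url: https://blog.example.com
    title: My Awesome Blog
  selectors:
    items:
      selector: "article.post"
    title:
      selector: "h2 a"
    link:
      selector: "h2 a"
    description:
      selector: "p"
YAML

# Returns a list of human-readable problems. An empty list means the basic
# shape (channel.url plus selectors.items.selector) is present.
def config_problems(config)
  return ['config is not a mapping'] unless config.is_a?(Hash)

  problems = []
  channel = config['channel']
  selectors = config['selectors']
  items = selectors.is_a?(Hash) ? selectors['items'] : nil

  problems << 'missing channel.url' unless channel.is_a?(Hash) && channel['url']
  problems << 'missing selectors.items.selector' unless items.is_a?(Hash) && items['selector']
  problems
end

config = YAML.safe_load(CONFIG_YAML)
problems = config_problems(config)
puts problems.empty? ? 'config looks structurally complete' : problems.join("\n")
```

A check like this only catches missing building blocks; whether the CSS selectors actually match the page still has to be tested against the real site.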

 ---

 ## Advanced Techniques

-### Handling Pagination
-
-To aggregate content from multiple pages, use the `pagination` option within the `items` selector.
-
-```yaml
-selectors:
-  items:
-    selector: ".post-listing .post"
-    pagination:
-      selector: ".pagination .next-page"
-      limit: 5 # Optional: sets the maximum number of pages to follow
-```
-
 ### Dynamic Feeds with Parameters

 Use the `parameters` block to create flexible configs. This is useful for feeds based on search terms, categories, or regions.
@@ -135,6 +140,15 @@ parameters:
 channel:
   url: "https://news.example.com/search?q={query}"
   title: "News results for '{query}'"
+
+selectors:
+  items:
+    selector: ".article"
+  title:
+    selector: "h2 a"
+  url:
+    selector: "h2 a"
+    extractor: "href"
 ```

 ---
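
To make the `{query}` placeholders in this hunk concrete, here is a small sketch of how such a template could be filled in. It is an illustration only, not the gem's implementation: `fill_placeholders` is a hypothetical helper, and URL-encoding the value before inserting it into the channel URL is an assumption about what a caller would want.

```ruby
require 'uri'

# Replaces {name} placeholders with values from params.
# Raises KeyError when a placeholder has no matching value.
def fill_placeholders(template, params)
  template.gsub(/\{(\w+)\}/) { params.fetch(Regexp.last_match(1)) }
end

params = { 'query' => 'ruby on rails' }

# Values that end up inside a URL are form-encoded first (assumption).
url_params = params.transform_values { |value| URI.encode_www_form_component(value) }

url = fill_placeholders('https://news.example.com/search?q={query}', url_params)
title = fill_placeholders("News results for '{query}'", params)

puts url
puts title
```

Using `fetch` rather than `[]` makes a missing parameter fail loudly instead of silently producing a URL with an empty search term.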

index.md

Lines changed: 20 additions & 15 deletions
@@ -4,30 +4,35 @@ title: Home
 nav_order: 1
 ---

-# Create RSS Feeds for Any Website
+# Turn Any Website Into an RSS Feed

-`html2rss` creates RSS feeds for any website.
-[**🚀 Get Started with the Web App**]({{ '/web-application/getting-started' | relative_url }})
+Ever wished you could follow your favorite websites like a social media feed? The html2rss project makes it possible by creating RSS feeds for any website - even ones that don't offer them.
+
+[**🚀 Get Started with html2rss-web**]({{ '/web-application/getting-started' | relative_url }})

 ---

-## Key Features
+## What is RSS?

-- **Automatic Feed Generation:** `auto_source` intelligently extracts content, simplifying feed creation.
-- **Precise Content Extraction:** Use CSS selectors for targeted content inclusion.
-- **JavaScript Rendering:** A headless browser renders JavaScript-heavy sites for comprehensive content extraction.
-- **Open Source:** `html2rss` is free to use, modify, and contribute.
+RSS (Really Simple Syndication) lets you follow websites in your favorite feed reader. Instead of checking multiple websites daily, you get all updates in one place - like a personalized news feed.

----
+## The html2rss Project
+
+The html2rss project provides two main ways to create RSS feeds:

-## The html2rss Ecosystem
+- **html2rss-web** - A user-friendly web application (recommended for most users)
+- **html2rss** - A Ruby gem for developers and advanced users
+
+Both use the same powerful engine to extract content from websites and convert it into RSS feeds.
+
+---

-The `html2rss` project offers a complete RSS solution through a collection of integrated tools:
+## Choose Your Path

-- **[html2rss-web]({{ '/web-application' | relative_url }}):** User-friendly web application to create, manage, and share RSS feeds. Recommended starting point.
-- **[html2rss (Ruby Gem)]({{ '/ruby-gem' | relative_url }}):** Core library and command-line interface for developers.
-- **[Feed Directory]({{ '/feed-directory' | relative_url }}):** Public listing of community-driven RSS feed configurations.
+- **[html2rss-web]({{ '/web-application' | relative_url }}):** **Start here!** Easy-to-use web application. No technical knowledge required.
+- **[Feed Directory]({{ '/feed-directory' | relative_url }}):** Browse ready-made feeds for popular websites
+- **[html2rss (Ruby Gem)]({{ '/ruby-gem' | relative_url }}):** For developers who want to create custom configurations

 ---

-Engage with the `html2rss` community or contribute. Visit our [Get Involved]({{ '/get-involved' | relative_url }}) page.
+**Ready to get started?** Check out our [html2rss-web getting started guide]({{ '/web-application/getting-started' | relative_url }}) or [browse existing feeds]({{ '/feed-directory' | relative_url }}) to see what's possible.

ruby-gem/how-to/scraping-json.md

Lines changed: 2 additions & 0 deletions
@@ -89,6 +89,8 @@ headers:
 channel:
   url: "http://domainname.tld/whatever.json"
 selectors:
+  items:
+    selector: "array > object"
   title:
     selector: "foo"
 ```

ruby-gem/installation.md

Lines changed: 5 additions & 7 deletions
@@ -13,7 +13,7 @@ This guide will walk you through the process of installing html2rss on your syst

 ### Prerequisites

-- **Ruby:** html2rss is built with Ruby. Ensure you have Ruby installed (version 3.3 or higher recommended). You can check your Ruby version by running `ruby -v` in your terminal. If you don't have Ruby, visit [ruby-lang.org](https://www.ruby-lang.org/en/documentation/installation/) for installation instructions.
+- **Ruby:** html2rss is built with Ruby. Ensure you have Ruby installed (version 3.2 or higher required). You can check your Ruby version by running `ruby -v` in your terminal. If you don't have Ruby, visit [ruby-lang.org](https://www.ruby-lang.org/en/documentation/installation/) for installation instructions.
 - **Bundler (Recommended):** Bundler is a Ruby gem that manages your application's dependencies. It's highly recommended for a smooth installation. Install it with `gem install bundler`.

 ---
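
For readers who prefer a programmatic check over reading `ruby -v` output, the sketch below compares the running interpreter against the 3.2 minimum stated in the prerequisite above. `ruby_supported?` is a hypothetical helper written for this note, not part of html2rss. It uses `Gem::Version`, which ships with RubyGems, because a plain string comparison would order "3.10" before "3.2".

```ruby
# Minimum version taken from the updated prerequisite in this hunk.
MINIMUM_RUBY = Gem::Version.new('3.2')

# True when version_string is at least the minimum supported version.
def ruby_supported?(version_string, minimum = MINIMUM_RUBY)
  Gem::Version.new(version_string) >= minimum
end

if ruby_supported?(RUBY_VERSION)
  puts "Ruby #{RUBY_VERSION} meets the 3.2+ requirement"
else
  puts "Ruby #{RUBY_VERSION} is too old; 3.2 or higher is required"
end
```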
@@ -43,15 +43,13 @@ Then, run `bundle install` in your project directory.

 ---

-### Method 3: Docker (For Containerized Environments)
+### Method 3: GitHub Codespaces (For Cloud Development)

-For a more isolated and reproducible environment, you can use the official html2rss Docker image.
+For a quick start without local setup, you can develop html2rss directly in your browser using GitHub Codespaces:

-```bash
-docker pull html2rss/html2rss
-```
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?repo=html2rss/html2rss)

-You can then run html2rss commands within a Docker container. Refer to the [Docker Hub page](https://hub.docker.com/r/html2rss/html2rss) for detailed usage.
+The Codespace comes pre-configured with Ruby 3.4, all dependencies, and VS Code extensions ready to go!

 ---


ruby-gem/reference/selectors.md

Lines changed: 13 additions & 7 deletions
@@ -42,7 +42,7 @@ selectors:
 While you can define any named selector, only the following are used in the final RSS feed:

 | RSS 2.0 Tag | `html2rss` Name |
-| ------------- | --------------- |
+| ------------- | --------------- | ------------------------------ |
 | `title` | `title` |
 | `description` | `description` |
 | `link` | `url` |
@@ -51,17 +51,19 @@ While you can define any named selector, only the following are used in the fina
 | `guid` | `guid` |
 | `enclosure` | `enclosure` |
 | `pubDate` | `published_at` |
-| `comments` | `comments` |
+| `comments` | `comments` | ⚠️ _Not currently implemented_ |

 ## Selector Options

 Each selector can be configured with the following options:

-| Name | Description |
-| -------------- | ------------------------------------------------ |
-| `selector` | The CSS selector for the target element. |
-| `extractor` | The extractor to use for this selector. |
-| `post_process` | A list of post-processors to apply to the value. |
+| Name | Description |
+| -------------- | -------------------------------------------------------- |
+| `selector` | The CSS selector for the target element. |
+| `extractor` | The extractor to use for this selector. |
+| `attribute` | The attribute name (required for `attribute` extractor). |
+| `static` | The static value (required for `static` extractor). |
+| `post_process` | A list of post-processors to apply to the value. |

 ### Extractors

@@ -126,6 +128,10 @@ To add an enclosure (e.g., an image, audio, or video file) to an item, use the `

 ```yml
 selectors:
+  items:
+    selector: ".post"
+  title:
+    selector: "h2"
   enclosure:
     selector: "audio"
     extractor: "attribute"

ruby-gem/reference/stylesheets.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ You can add multiple stylesheets to your configuration:

 ```yaml
 stylesheets:
-  - href: "/path/to/style.xls"
+  - href: "/path/to/style.xsl"
     media: "all"
     type: "text/xsl"
   - href: "https://example.com/rss.css"
