Merged
38 changes: 35 additions & 3 deletions 404.html
@@ -3,7 +3,39 @@
layout: default
---

<h1>404</h1>
<h1>404 - Page Not Found</h1>

<p><strong>Page not found :(</strong></p>
<p>The requested page could not be found.</p>
<p><strong>Sorry, the page you're looking for doesn't exist.</strong></p>

<p>Here are some helpful links to get you back on track:</p>

<ul>
<li>
<a href="{{ '/' | relative_url }}">Home</a> - Start from the beginning
</li>
<li>
<a href="{{ '/web-application/getting-started' | relative_url }}"
>Getting Started</a
>
- Learn how to use html2rss
</li>
<li>
<a href="{{ '/ruby-gem' | relative_url }}">Ruby Gem Documentation</a> -
Developer resources
</li>
<li>
<a href="{{ '/feed-directory' | relative_url }}">Feed Directory</a> - Browse
available feeds
</li>
<li>
<a href="{{ '/get-involved' | relative_url }}">Get Involved</a> - Join the
community
</li>
</ul>

<p>
If you think this is an error, please
<a href="https://github.com/html2rss/html2rss.github.io/issues"
>report it on GitHub</a
>.
</p>
2 changes: 1 addition & 1 deletion about.md
@@ -31,4 +31,4 @@ For insights into our ongoing development, project roadmap, and how you can get

`html2rss` is maintained by a dedicated group of volunteers and contributors from around the world. We are passionate about open source and committed to continuously improving the project.

Want to join us? Check out our [Contributing Guide]({{ '/contributing' | relative_url }})!
Want to join us? Check out our [Contributing Guide]({{ '/get-involved/contributing' | relative_url }})!
4 changes: 1 addition & 3 deletions bin/data-update
@@ -16,9 +16,7 @@ def extract_default_parameters(parameters)
return {} unless parameters.is_a?(Hash)

parameters.each_with_object({}) do |(param_name, param_config), defaults|
if param_config.is_a?(Hash) && param_config['default']
defaults[param_name] = param_config['default']
end
defaults[param_name] = param_config['default'] if param_config.is_a?(Hash) && param_config['default']
end
end
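To see what the refactored one-line guard does, the method can be exercised on its own. This is a minimal sketch: the method body is copied from the diff above, while the sample `params` hash is hypothetical and assumes the `parameters` block has already been parsed from YAML (string keys).

```ruby
# Standalone copy of extract_default_parameters from bin/data-update.
def extract_default_parameters(parameters)
  return {} unless parameters.is_a?(Hash)

  parameters.each_with_object({}) do |(param_name, param_config), defaults|
    defaults[param_name] = param_config['default'] if param_config.is_a?(Hash) && param_config['default']
  end
end

# Hypothetical parameters block, shaped like a config's `parameters:` section.
params = {
  'query' => { 'type' => 'string', 'default' => 'ruby' },
  'region' => { 'type' => 'string' }, # no default: skipped
  'limit' => 'not a hash'             # non-Hash config: skipped
}

defaults = extract_default_parameters(params)
puts defaults.inspect
```

Because the guard tests truthiness, a `default: false` is dropped while `0` or an empty string is kept; the refactor preserves that behaviour since the condition itself is unchanged.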

2 changes: 1 addition & 1 deletion get-involved/index.md
@@ -14,4 +14,4 @@ Engage with the `html2rss` project. Contribute and connect with the community.
- [**Project Roadmap**]({{ 'https://github.com/orgs/html2rss/projects/3/views/1' }}): View current work, plans, and priorities.
- [**Report Bugs & Discuss Features**]({{ '/get-involved/issues-and-features' | relative_url }}): Report bugs or propose features.
- [**Join Community Discussions**]({{ '/get-involved/discussions' | relative_url }}): Connect with users and contributors.
- [**Contribute to html2rss**]({{ '/contributing' | relative_url }}): Contribute code, documentation, or feed configurations.
- [**Contribute to html2rss**]({{ '/get-involved/contributing' | relative_url }}): Contribute code, documentation, or feed configurations.
78 changes: 46 additions & 32 deletions html2rss-configs/index.md
@@ -5,21 +5,29 @@ has_children: false
nav_order: 5
---

# Creating Feed Configurations
# Creating Custom RSS Feeds

Welcome to the guide for `html2rss-configs`. This document explains how to create your own configuration files to convert any website into an RSS feed.
Want to create RSS feeds for websites that don't offer them? This guide shows you how to write simple configuration files that tell the html2rss engine exactly what content to extract.

You can find a list of all community-contributed configurations in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).
**Don't worry if you're not technical** - we'll explain everything step by step!

You can see examples of what others have created in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).

---

## Core Concepts
## How It Works

Think of the html2rss engine as a smart assistant that needs instructions. You give it a simple "recipe" (called a config file) that tells it:

1. **Which website** to look at
2. **What content** to find (articles, posts, etc.)
3. **How to organize** that content into an RSS feed

An `html2rss` config is a YAML file that defines how to extract data from a web page. It consists of two main building blocks: `channel` and `selectors`.
The recipe is written in YAML - a simple format that's easy to read and write. Both html2rss-web and the html2rss Ruby gem use these same configuration files.

### The `channel` Block

The `channel` block contains metadata about the RSS feed itself, such as its title and the source URL.
This tells the html2rss engine basic information about your feed - like giving it a name and telling it which website to look at.

**Example:**

@@ -29,11 +37,11 @@ channel:
title: My Awesome Blog
```

For a complete list of all available channel options, please see the [Channel Reference]({{ '/ruby-gem/reference/channel/' | relative_url }}).
This says: "Look at this website and call the feed 'My Awesome Blog'"

### The `selectors` Block

The `selectors` block is the core of the configuration, defining the rules for extracting content. It always contains an `items` selector to identify the list of articles and individual selectors for the data points within each item (e.g., `title`, `link`).
This is where you tell the html2rss engine exactly what to find on the page. You use CSS selectors (the same kind used to style web pages) to point to specific parts of the webpage.

**Example:**

@@ -47,17 +55,19 @@ selectors:
selector: "h2 a"
```

For a comprehensive guide on all available selectors, extractors, and post-processors, please see the [Selectors Reference]({{ '/ruby-gem/reference/selectors/' | relative_url }}).
This says: "Find each article, get the title from the h2 link, and get the link from the same h2 link"

**Need more details?** Check our [complete guide to selectors]({{ '/ruby-gem/reference/selectors/' | relative_url }}) for all the options.

---

## Tutorial: Your First Config
## Tutorial: Your First Feed

This tutorial walks you through creating a basic configuration file from scratch.
Let's create a simple RSS feed step by step. We'll use a basic blog as our example.

### Step 1: Identify the Target Content
### Step 1: Look at the Website

First, identify the HTML structure of the website you want to create a feed for. For this example, we'll use a simple blog structure:
First, visit the website you want to create a feed for. Right-click the page and choose "View Page Source" to see its HTML structure. Look for repeating patterns like this:

```html
<div class="posts">
@@ -72,9 +82,11 @@ First, identify the HTML structure of the website you want to create a feed for.
</div>
```

### Step 2: Create the Config File and Define the Channel
**What we see:** Each article is wrapped in `<article class="post">`, titles are in `<h2><a>` tags, and descriptions are in `<p>` tags.

### Step 2: Create Your Config File

Create a new YAML file (e.g., `my-blog.yml`) and define the `channel`:
Create a new text file and save it as `my-blog.yml` (or any name you like). Add this basic information:

```yaml
# my-blog.yml
@@ -84,9 +96,11 @@ channel:
description: The latest news from my awesome blog.
```

### Step 3: Define the Selectors
This tells html2rss: "Look at this website and call the feed 'My Awesome Blog'"

Next, add the `selectors` block to extract the content for each post.
### Step 3: Tell html2rss What to Find

Now add the selectors that tell html2rss exactly what content to extract:

```yaml
# my-blog.yml
@@ -101,26 +115,17 @@ selectors:
selector: "p"
```

- `items`: This CSS selector identifies the container for each article.
- `title`, `link`, `description`: These selectors target the specific data points within each item. For a `link` selector, `html2rss` defaults to extracting the `href` attribute from the matched `<a>` tag.
**What this means:**

- `items: "article.post"` = "Find each article with class 'post'"
- `title: "h2 a"` = "Get the title from the h2 link"
- `link: "h2 a"` = "Get the link from the same h2 link"
- `description: "p"` = "Get the description from the paragraph"
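The list above flattens the nested `selector:` keys for readability. Reading the tutorial's file back with Ruby's standard `yaml` library shows the actual nesting. This is a minimal sketch; the channel `url` below is a hypothetical placeholder, and the selector names simply mirror the list above.

```ruby
require 'yaml'

# The tutorial's my-blog.yml. The channel url is a hypothetical placeholder.
config_text = <<~CONFIG
  channel:
    url: "https://example.com/blog"
    title: My Awesome Blog
    description: The latest news from my awesome blog.
  selectors:
    items:
      selector: "article.post"
    title:
      selector: "h2 a"
    link:
      selector: "h2 a"
    description:
      selector: "p"
CONFIG

config = YAML.safe_load(config_text)

# Each named selector is a hash; pull out its CSS selector string to get
# the flat "name = selector" view used in the list above.
summary = config.fetch('selectors').transform_values { |options| options.fetch('selector') }

puts "Feed: #{config.dig('channel', 'title')}"
summary.each { |name, css| puts "#{name}: #{css}" }
```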

---

## Advanced Techniques

### Handling Pagination

To aggregate content from multiple pages, use the `pagination` option within the `items` selector.

```yaml
selectors:
items:
selector: ".post-listing .post"
pagination:
selector: ".pagination .next-page"
limit: 5 # Optional: sets the maximum number of pages to follow
```

### Dynamic Feeds with Parameters

Use the `parameters` block to create flexible configs. This is useful for feeds based on search terms, categories, or regions.
@@ -135,6 +140,15 @@ parameters:
channel:
url: "https://news.example.com/search?q={query}"
title: "News results for '{query}'"

selectors:
items:
selector: ".article"
title:
selector: "h2 a"
url:
selector: "h2 a"
extractor: "href"
```
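The `{query}` placeholders in `channel` are filled from the parameter values. The idea can be sketched with `String#gsub`; `fill_placeholders` is a hypothetical helper for illustration only, and html2rss's own substitution (including any URL-encoding it may apply) can behave differently.

```ruby
# Hypothetical helper: replaces each {name} in the template with params["name"].
# Hash#fetch raises KeyError for an unknown placeholder, so a typo in the
# config fails loudly instead of producing a malformed URL.
def fill_placeholders(template, params)
  template.gsub(/\{(\w+)\}/) { params.fetch(Regexp.last_match(1)) }
end

params = { 'query' => 'ruby' }

puts fill_placeholders('https://news.example.com/search?q={query}', params)
# prints https://news.example.com/search?q=ruby
puts fill_placeholders("News results for '{query}'", params)
# prints News results for 'ruby'
```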

---
35 changes: 20 additions & 15 deletions index.md
@@ -4,30 +4,35 @@ title: Home
nav_order: 1
---

# Create RSS Feeds for Any Website
# Turn Any Website Into an RSS Feed

`html2rss` creates RSS feeds for any website.
[**🚀 Get Started with the Web App**]({{ '/web-application/getting-started' | relative_url }})
Ever wished you could follow your favorite websites like a social media feed? The html2rss project makes it possible by creating RSS feeds for any website - even ones that don't offer them.

[**🚀 Get Started with html2rss-web**]({{ '/web-application/getting-started' | relative_url }})

---

## Key Features
## What is RSS?

- **Automatic Feed Generation:** `auto_source` intelligently extracts content, simplifying feed creation.
- **Precise Content Extraction:** Use CSS selectors for targeted content inclusion.
- **JavaScript Rendering:** A headless browser renders JavaScript-heavy sites for comprehensive content extraction.
- **Open Source:** `html2rss` is free to use, modify, and contribute.
RSS (Really Simple Syndication) lets you follow websites in your favorite feed reader. Instead of checking multiple websites daily, you get all updates in one place - like a personalized news feed.

---
## The html2rss Project

The html2rss project provides two main ways to create RSS feeds:

## The html2rss Ecosystem
- **html2rss-web** - A user-friendly web application (recommended for most users)
- **html2rss** - A Ruby gem for developers and advanced users

Both use the same powerful engine to extract content from websites and convert it into RSS feeds.

---

The `html2rss` project offers a complete RSS solution through a collection of integrated tools:
## Choose Your Path

- **[html2rss-web]({{ '/web-application' | relative_url }}):** User-friendly web application to create, manage, and share RSS feeds. Recommended starting point.
- **[html2rss (Ruby Gem)]({{ '/ruby-gem' | relative_url }}):** Core library and command-line interface for developers.
- **[Feed Directory]({{ '/feed-directory' | relative_url }}):** Public listing of community-driven RSS feed configurations.
- **[html2rss-web]({{ '/web-application' | relative_url }}):** **Start here!** Easy-to-use web application. No technical knowledge required.
- **[Feed Directory]({{ '/feed-directory' | relative_url }}):** Browse ready-made feeds for popular websites.
- **[html2rss (Ruby Gem)]({{ '/ruby-gem' | relative_url }}):** For developers who want to create custom configurations.

---

Engage with the `html2rss` community or contribute. Visit our [Get Involved]({{ '/get-involved' | relative_url }}) page.
**Ready to get started?** Check out our [html2rss-web getting started guide]({{ '/web-application/getting-started' | relative_url }}) or [browse existing feeds]({{ '/feed-directory' | relative_url }}) to see what's possible.
2 changes: 2 additions & 0 deletions ruby-gem/how-to/scraping-json.md
@@ -89,6 +89,8 @@ headers:
channel:
url: "http://domainname.tld/whatever.json"
selectors:
items:
selector: "array > object"
title:
selector: "foo"
```
12 changes: 5 additions & 7 deletions ruby-gem/installation.md
@@ -13,7 +13,7 @@ This guide will walk you through the process of installing html2rss on your syst

### Prerequisites

- **Ruby:** html2rss is built with Ruby. Ensure you have Ruby installed (version 3.3 or higher recommended). You can check your Ruby version by running `ruby -v` in your terminal. If you don't have Ruby, visit [ruby-lang.org](https://www.ruby-lang.org/en/documentation/installation/) for installation instructions.
- **Ruby:** html2rss is built with Ruby. Ensure you have Ruby installed (version 3.2 or higher required). You can check your Ruby version by running `ruby -v` in your terminal. If you don't have Ruby, visit [ruby-lang.org](https://www.ruby-lang.org/en/documentation/installation/) for installation instructions.
- **Bundler (Recommended):** Bundler is a Ruby gem that manages your application's dependencies. It's highly recommended for a smooth installation. Install it with `gem install bundler`.
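The same version check can be made from Ruby itself. This is a minimal sketch using RubyGems' `Gem::Version`, with the 3.2 minimum taken from the updated prerequisite; `ruby_supported?` is a hypothetical helper name.

```ruby
require 'rubygems'

# Minimum version from the prerequisite above.
MINIMUM_RUBY = Gem::Version.new('3.2')

# Gem::Version compares numeric segments, so '3.10' counts as newer than
# '3.2' (a plain string comparison would get this wrong).
def ruby_supported?(version_string, minimum = MINIMUM_RUBY)
  Gem::Version.new(version_string) >= minimum
end

status = ruby_supported?(RUBY_VERSION) ? 'OK' : 'too old, please upgrade'
puts "Ruby #{RUBY_VERSION}: #{status}"
```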

---
@@ -43,15 +43,13 @@ Then, run `bundle install` in your project directory.

---

### Method 3: Docker (For Containerized Environments)
### Method 3: GitHub Codespaces (For Cloud Development)

For a more isolated and reproducible environment, you can use the official html2rss Docker image.
For a quick start without local setup, you can develop html2rss directly in your browser using GitHub Codespaces:

```bash
docker pull html2rss/html2rss
```
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?repo=html2rss/html2rss)

You can then run html2rss commands within a Docker container. Refer to the [Docker Hub page](https://hub.docker.com/r/html2rss/html2rss) for detailed usage.
The Codespace comes pre-configured with Ruby 3.4, all dependencies, and VS Code extensions ready to go!

---

20 changes: 13 additions & 7 deletions ruby-gem/reference/selectors.md
@@ -42,7 +42,7 @@ selectors:
While you can define any named selector, only the following are used in the final RSS feed:

| RSS 2.0 Tag | `html2rss` Name | Notes |
| ------------- | --------------- |
| ------------- | --------------- | ------------------------------ |
| `title` | `title` |
| `description` | `description` |
| `link` | `url` |
@@ -51,17 +51,19 @@ While you can define any named selector, only the following are used in the fina
| `guid` | `guid` |
| `enclosure` | `enclosure` |
| `pubDate` | `published_at` |
| `comments` | `comments` |
| `comments` | `comments` | ⚠️ _Not currently implemented_ |

## Selector Options

Each selector can be configured with the following options:

| Name | Description |
| -------------- | ------------------------------------------------ |
| `selector` | The CSS selector for the target element. |
| `extractor` | The extractor to use for this selector. |
| `post_process` | A list of post-processors to apply to the value. |
| Name | Description |
| -------------- | -------------------------------------------------------- |
| `selector` | The CSS selector for the target element. |
| `extractor` | The extractor to use for this selector. |
| `attribute` | The attribute name (required for `attribute` extractor). |
| `static` | The static value (required for `static` extractor). |
| `post_process` | A list of post-processors to apply to the value. |
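The table pairs two extractors with an option they require. A pre-flight check that flags selectors missing that option might look like the sketch below; `missing_extractor_options` is a hypothetical helper, not part of html2rss, and the sample selectors are made up.

```ruby
# Which option each extractor needs, per the Selector Options table.
REQUIRED_OPTION = { 'attribute' => 'attribute', 'static' => 'static' }.freeze

# Returns a human-readable problem for every selector whose extractor
# is missing its required option.
def missing_extractor_options(selectors)
  selectors.each_with_object([]) do |(name, opts), problems|
    next unless opts.is_a?(Hash)

    required = REQUIRED_OPTION[opts['extractor']]
    next if required.nil? || opts.key?(required)

    problems << "#{name}: extractor '#{opts['extractor']}' requires '#{required}'"
  end
end

# Hypothetical selectors block after YAML parsing.
selectors = {
  'items' => { 'selector' => '.post' },
  'enclosure' => { 'selector' => 'audio', 'extractor' => 'attribute' }, # no attribute:
  'title' => { 'extractor' => 'static', 'static' => 'Podcast episode' }
}

puts missing_extractor_options(selectors)
```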

### Extractors

@@ -126,6 +128,10 @@ To add an enclosure (e.g., an image, audio, or video file) to an item, use the `

```yml
selectors:
items:
selector: ".post"
title:
selector: "h2"
enclosure:
selector: "audio"
extractor: "attribute"
2 changes: 1 addition & 1 deletion ruby-gem/reference/stylesheets.md
@@ -17,7 +17,7 @@ You can add multiple stylesheets to your configuration:

```yaml
stylesheets:
- href: "/path/to/style.xls"
- href: "/path/to/style.xsl"
media: "all"
type: "text/xsl"
- href: "https://example.com/rss.css"