Once upon a time every website offered an RSS feed to keep readers updated about new articles/blog posts via the users' feed readers. These times are long gone. The once iconic orange RSS icon has been replaced by "social share" buttons.
Feeds aims to bring back the good old reading times. It creates Atom feeds for websites that don't offer them (anymore). It allows you to read new articles from your favorite websites in your feed reader (e.g. Tiny Tiny RSS) even if this is not officially supported by the website.
Furthermore, it can enhance existing feeds by inlining the actual content into the feed entries so they can be read without leaving the feed reader.
Feeds is based on Scrapy, a framework for extracting data from websites, and
it's easy to add support for new websites. Just take a look at the existing
feeds/spiders and feel free to open a pull request!
Feeds comes with extensive documentation. It is available at https://pyfeeds.readthedocs.io.
Feeds is currently able to create Atom feeds for various sites. The complete list of supported websites is available in the documentation.
Some sites (Falter, Konsument, LWN, Oberösterreichische Nachrichten,
Übermedien) offer articles only behind a paywall. If you have a paid
subscription, you can configure your username and password in the config file
and read paywalled articles from within your feed reader. For the less
fortunate who don't have a subscription, paywalled articles are tagged with
paywalled so they can be filtered, if desired.
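As a rough illustration, credentials might be configured in a per-site section of feeds.cfg like the one below. The section name and option names here are assumptions based on the template configuration; consult feeds.cfg.dist for the exact format.

```ini
# Hypothetical example: credentials for a paywalled site in feeds.cfg.
# Section and option names may differ; check feeds.cfg.dist.
[falter.at]
username = your-username
password = your-password
```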
All feeds contain the articles in full text so you never have to leave your feed reader while reading.
Feeds is meant to be installed on your server and run periodically in a cron job.
The easiest way to install Feeds is via pip in a virtual environment. Feeds
does not provide any releases yet, so one might directly install the current
development version:

$ git clone https://github.com/nblock/feeds.git
$ cd feeds
$ pip install .

After that, the feeds command is available in your virtual environment.
Feeds supports Python 3.4+.
List all available spiders:
$ feeds list
Feeds allows you to crawl one or more spiders without a configuration file, e.g.:
$ feeds crawl tvthek.orf.at
A configuration file is supported too. Simply copy the template configuration and adjust it. Enable the spiders you are interested in and adjust the output path where Feeds stores the scraped Atom feeds:
$ cp feeds.cfg.dist feeds.cfg
$ $EDITOR feeds.cfg
$ feeds --config feeds.cfg crawl
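A minimal configuration might look like the following sketch. The option names are assumptions based on the template; treat the values as placeholders and refer to feeds.cfg.dist for the full set of options.

```ini
# Illustrative feeds.cfg; see feeds.cfg.dist for all available options.
[feeds]
# Directory where the generated Atom feeds are written.
output_path = output
# Spiders to run when invoking "feeds crawl" without arguments.
spiders =
    tvthek.orf.at
```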
Point your feed reader to the generated Atom feeds and start reading. Feeds works best when run periodically in a cron job.
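For example, a crontab entry along these lines would crawl once an hour; the paths to the virtual environment and config file are placeholders:

```
# Run Feeds every hour; adjust paths to your own setup.
0 * * * * /home/user/feeds-venv/bin/feeds --config /home/user/feeds.cfg crawl
```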
Run feeds <subcommand> --help for help and usage details.
Feeds can be configured to use a cache for HTTP responses, which is highly recommended to save bandwidth. It can be enabled in the config file. See feeds.cfg.dist for an example of how to do that.
Entries are cached for 14 days by default (this can be overridden in the config file). Entries are purged from the cache automatically after a crawl. It's also possible to explicitly invalidate the cache:
$ feeds --config feeds.cfg cleanup
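In the config file, enabling the cache could look roughly like the fragment below. The option names are assumptions based on the template configuration, and the cache directory is a placeholder; check feeds.cfg.dist for the authoritative names and defaults.

```ini
# Illustrative cache settings in feeds.cfg; see feeds.cfg.dist.
[feeds]
# Enable the HTTP response cache.
cache_enabled = 1
# Where cached responses are stored (placeholder path).
cache_dir = ~/.cache/feeds
# Expire cached entries after 14 days (the stated default).
cache_expires = 14
```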
- morss creates feeds, similar to Feeds but in "real-time", i.e. on (HTTP) request.
- Full-Text RSS converts feeds to contain the full article and not only a teaser, based on heuristics and rules. Feeds are converted in "real-time", i.e. on a per-request basis.
- f43.me converts feeds to contain the full article and also improves articles by adding links to the comment sections of Hacker News and Reddit. Feeds are converted periodically.
- python-ftr is a library to extract content from pages. A partial reimplementation of Full-Text RSS.
How to contribute
- Search the existing issues in the issue tracker.
- File a new issue in case the issue is undocumented.
- Fork the project to your private repository.
- Create a topic branch and make your desired changes.
- Open a pull request. Make sure the Travis CI checks are passing.
AGPL3, see LICENSE for details.