A collection of scripts to parse and process input files (HTML, RSS, Atom) without dependencies [1].
[1] Well, except for pytelegrambotapi, but feel free to replace it with another module.
Includes tools for extracting “news” articles, detecting duplicates, downloading content, scheduling cron-like repeated events, and a Telegram bot client.
Basically everything you need for quick and dirty format processing (html->RSS, notification->telegram, rss/podcast->download, web scraper, ...) without writing the same code over and over again.
The architecture is modular and pipeline-oriented.
Use whatever suits the task at hand.
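To give a rough idea of what "modular and pipeline-oriented" can look like, here is a minimal sketch using only the standard library. It is not this lib's actual API: `extract_articles`, `drop_duplicates`, `only_with_keyword`, and `run_pipeline` are hypothetical names made up for illustration, and the sample HTML is invented.

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical stage type: consumes an iterable of items, yields items.
Stage = Callable[[Iterable[dict]], Iterator[dict]]

# Assumption: links look like <a href="...">title</a> in the input.
LINK_RE = re.compile(r'<a href="([^"]+)">([^<]+)</a>')


def extract_articles(html: str) -> Iterator[dict]:
    """Source stage: turn raw HTML into a stream of article dicts."""
    for url, title in LINK_RE.findall(html):
        yield {"url": url, "title": title.strip()}


def drop_duplicates(items: Iterable[dict]) -> Iterator[dict]:
    """Filter stage: pass each URL through only once."""
    seen = set()
    for item in items:
        if item["url"] not in seen:
            seen.add(item["url"])
            yield item


def only_with_keyword(keyword: str) -> Stage:
    """Stage factory: keep items whose title contains the keyword."""
    def stage(items: Iterable[dict]) -> Iterator[dict]:
        for item in items:
            if keyword.lower() in item["title"].lower():
                yield item
    return stage


def run_pipeline(source: Iterable[dict], *stages: Stage) -> list:
    """Chain the stages lazily, then collect the result."""
    for stage in stages:
        source = stage(source)
    return list(source)


SAMPLE_HTML = """
<a href="/a1">Python 3 released</a>
<a href="/a2">Weather today</a>
<a href="/a1">Python 3 released</a>
<a href="/a3">More python news</a>
"""

articles = run_pipeline(
    extract_articles(SAMPLE_HTML),
    drop_duplicates,
    only_with_keyword("python"),
)
for article in articles:
    print(article["url"], article["title"])
```

Each stage only knows about the items flowing through it, so stages can be reordered, dropped, or swapped for a different output (for example an RSS writer or a Telegram notification) without touching the others.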
There is short usage documentation on the individual components of this lib, and there are some examples of how to combine them. Lastly, for web scraping, open playground.py to test your regex.
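If you just want to check a scraping regex quickly, something along these lines may help. This is only a sketch of the general idea and does not reproduce what playground.py contains; the sample markup, the pattern, and the `try_pattern` helper are all assumptions made for the example.

```python
import re

# Invented sample markup; paste a snippet of the real page here instead.
SAMPLE = """
<article><h2 class="title">First post</h2>
<time datetime="2021-03-01">1 March</time></article>
<article><h2 class="title">Second post</h2>
<time datetime="2021-03-02">2 March</time></article>
"""

# Named groups make it easy to see what each part of the pattern captured.
ARTICLE_RE = re.compile(
    r'<h2 class="title">(?P<title>.*?)</h2>\s*'
    r'<time datetime="(?P<date>[^"]+)"',
    re.DOTALL,
)


def try_pattern(pattern: re.Pattern, text: str) -> list:
    """Return one dict of named groups per match, in document order."""
    return [match.groupdict() for match in pattern.finditer(text)]


matches = try_pattern(ARTICLE_RE, SAMPLE)
print(f"{len(matches)} match(es)")
for match in matches:
    print(match)
```

Printing the match count first makes it obvious when a pattern silently matches nothing, which is the most common failure when the page markup changes.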