GitHub - gromnitsky/grepfeed: Filters out RSS/Atom feeds, returning articles that match a specified pattern. The output is another valid XML feed.

Filters RSS/Atom feeds, returning only the articles that match a specified pattern. The output is another valid XML feed.

What's included

a cli util;
a standalone http server that shares the same engine w/ the cli util;
a web client that uses the included server as an intermediary and acts as a gui version of the cli util.

Requirements

node >= 20

Setup

$ npm i -g grepfeed
$ grepfeed-server

Open http://127.0.0.1:3000 in a browser.

How it works

lib/feed.js contains all the code that parses & transforms xml feeds. Its core is the Grep class, a Transform stream:

readable_stream.pipe(<our filter>).pipe(writable_stream)
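
As a rough illustration of the pipe pattern above, here is a minimal Transform stream sketch. LineGrep is a hypothetical stand-in that filters plain text lines; it is not the project's Grep class, which parses and transforms xml feeds.

```javascript
const { Transform, Readable } = require('node:stream');

// Hypothetical example class: lets through only the lines matching a pattern.
class LineGrep extends Transform {
  constructor(pattern) {
    super();
    this.pattern = pattern; // a non-global RegExp
    this.tail = '';         // incomplete last line of the previous chunk
  }

  _transform(chunk, _encoding, callback) {
    // Simplification: assumes chunks never split a multi-byte character.
    const lines = (this.tail + chunk.toString()).split('\n');
    this.tail = lines.pop();
    for (const line of lines) {
      if (this.pattern.test(line)) this.push(line + '\n');
    }
    callback();
  }

  _flush(callback) {
    // Emit the final line if the input didn't end with a newline.
    if (this.tail && this.pattern.test(this.tail)) this.push(this.tail + '\n');
    callback();
  }
}

// readable_stream.pipe(<our filter>).pipe(writable_stream)
Readable.from(['apple pie\nbanana ', 'split\ngreen apple\n'])
  .pipe(new LineGrep(/apple/))
  .pipe(process.stdout);
```

The filter keeps no more than one partial line in memory, which is the main appeal of the stream approach for large inputs.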

cli

cli/grepfeed.js extends Grep and overrides several methods so that the output can be written in any format one wants. Three output formats are included: text-only (the default), json, and xml. The last one produces a valid rss 2.0 feed. E.g.

$ curl http://example.com/rss | cli/grepfeed.js apple -d=2016 -x

parses the input feed and selects only the articles written in 2016 or later that match the regexp pattern /apple/. -x means xml output.

Usage: grepfeed.js [opt] [PATTERN] < xml

  -e      print only articles w/ enclosures
  -n NUM  number of articles to print
  -x      xml output
  -j      json output
  -m      print only meta
  -V      program version

Filter by:

  -d      [-]date[,date]
  -c      categories

Or/and search for a regexp PATTERN in each rss article & print the
matching ones. The internal order of the search: title, summary,
description, author.

  -v      invert match
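
The search order and the -v flag described above could be sketched as follows. This is an assumption-laden illustration: articleMatches, SEARCH_ORDER, and the shape of the article object are hypothetical names, not taken from lib/feed.js.

```javascript
// Hypothetical helper: checks the fields in the documented order and stops
// at the first one that matches.
const SEARCH_ORDER = ['title', 'summary', 'description', 'author'];

function articleMatches(article, pattern, invert = false) {
  const found = SEARCH_ORDER.some(
    (field) => typeof article[field] === 'string' && pattern.test(article[field])
  );
  // invert plays the role of -v: keep the articles that do NOT match.
  return invert ? !found : found;
}

const article = { title: 'New apple varieties', author: 'bob' };
console.log(articleMatches(article, /apple/));       // true
console.log(articleMatches(article, /apple/, true)); // false
console.log(articleMatches(article, /orange/));      // false
```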

server

Acts as a proxy: downloads a requested feed & returns the filtered xml. Query params match cli/grepfeed.js command line interface. To start a server, run

$ make
$ server/index.js

(For a different host/port combination, use HOST & PORT env vars.)

The following example yields the same xml as in the cli/grepfeed.js case, only it does so through http:

$ curl '127.0.0.1:3000/api/?_=apple&d=2016&url=http%3A%2F%2Fexample.com%2Frss'

Notice that d corresponds to -d in the cli/grepfeed.js example, -x doesn't make sense here, and _ stands for the 1st command line arg (apple in this case). The server doesn't invoke the cli/grepfeed.js program; they both use minimist to parse options, hence the similarity in behaviour.
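
The query string from the curl example can also be built programmatically, which takes care of percent-encoding the feed url. A minimal sketch, assuming the server listens on 127.0.0.1:3000 as in that example:

```javascript
// URLSearchParams is a Node global; it percent-encodes the values for us.
const params = new URLSearchParams({
  _: 'apple',                    // the 1st command line arg (PATTERN)
  d: '2016',                     // same as -d=2016
  url: 'http://example.com/rss', // the feed to download & filter
});

// Assumed host/port, taken from the curl example.
const endpoint = `http://127.0.0.1:3000/api/?${params}`;
console.log(endpoint);
// http://127.0.0.1:3000/api/?_=apple&d=2016&url=http%3A%2F%2Fexample.com%2Frss
```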

caveats

A URL you'd like to filter must be reachable from the machine server/index.js is running on. This could pose a security risk, or be inconvenient if you want to filter XML from your LAN. In the latter case, run grepfeed-server on your local machine.

Bugs

All html tags in article titles are removed, even if a title is in plain text.
This should've been written in Rust or something similar, as Node is slow and memory hungry for this kind of task.

License

MIT.
