Skip to content

ld066077/paper-rss

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

paper-rss

paper-rss is a Linux-first local RSS gateway for OpenClaw and similar AI agents. It fetches RSS and Atom feeds, stores normalized paper metadata in SQLite, and exposes that data through both a CLI and a local HTTP API.

The goal is simple: do the paper discovery and parsing in deterministic code, then let the model summarize only the returned structured data. This reduces hallucinations when the model is weak or the feed format is messy.

Acknowledgements

paper-rss was designed and implemented collaboratively with OpenAI Codex, including architecture, implementation, testing, and documentation support.

What paper-rss does

paper-rss is designed for journal RSS and paper update workflows:

  • add and manage multiple RSS or Atom feeds
  • fetch and cache feed entries locally
  • store results in SQLite
  • deduplicate entries across refreshes
  • search papers by keyword
  • list the latest papers in JSON
  • expose a local HTTP API for OpenClaw
  • run as a background user service on Linux with systemd --user
  • generate a thin OpenClaw Skill template that tells the model to use paper-rss first

OpenClaw quick start

If you only care about using paper-rss with OpenClaw, start here.

On Linux:

git clone https://github.com/ld066077/paper-rss
cd paper-rss
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e .
paper-rss setup-linux
paper-rss install-openclaw-skill --dir ~/.openclaw/workspace/skills
paper-rss service status
curl http://127.0.0.1:9493/health

Once these two checks pass:

  • paper-rss service status shows active (running)
  • curl http://127.0.0.1:9493/health returns healthy JSON

OpenClaw can start controlling paper-rss through the local HTTP API.

At that point, OpenClaw can:

  • add feeds
  • refresh feeds
  • fetch latest papers
  • fetch papers since a timestamp
  • search papers by keyword

If you want the safest first run, add one feed manually:

paper-rss add-feed http://feeds.aps.org/rss/tocsec/PRL-PlasmaandBeamPhysics.xml --name APS-PRL-Plasma
paper-rss refresh --all

Then let OpenClaw continue from there.

What paper-rss does not do

Current MVP boundaries:

  • no web UI
  • no cloud sync
  • no mandatory Crossref or DOI enrichment
  • no MCP server
  • no full paper-page scraping by default

Why this exists

If an AI agent reads publisher pages or RSS summaries directly, weaker models can:

  • invent authors
  • guess DOI values
  • confuse issue dates
  • mix papers from different journals

paper-rss avoids that by moving feed retrieval, parsing, caching, and filtering into code. The model becomes a summarizer instead of a fact generator.

Installation

Option 1: install from source

For now, the most direct install path is from source:

git clone https://github.com/ld066077/paper-rss
cd paper-rss
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e .

After that, you can run:

paper-rss doctor

Option 2: developer install with tests

If you want the test dependencies too:

git clone https://github.com/ld066077/paper-rss
cd paper-rss
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e .[dev]
pytest

Option 3: packaged Linux helper files

You can generate the helper assets:

paper-rss write-linux-assets --output-dir ./dist

This writes:

  • install.sh
  • paper-rss.service

There is also a repository copy of the installer script at install.sh.

OpenClaw integration

The intended OpenClaw pattern is:

  1. paper-rss runs locally on Linux
  2. OpenClaw queries the local HTTP API
  3. The model answers only from returned JSON

For scheduled notifications, the recommended flow is:

  1. OpenClaw triggers on its own timer
  2. OpenClaw calls POST /feeds/refresh
  3. OpenClaw calls GET /papers/since?since=<last-check-timestamp>&feed_id=<feed_id>&limit=<n>
  4. OpenClaw summarizes only the returned papers
  5. OpenClaw stores the new timestamp on its own side for the next run

Behavior rules for OpenClaw

  • always query paper-rss first for RSS and paper-update questions
  • only use fields returned by paper-rss
  • if a field is missing or null, say it is unavailable from the RSS feed
  • always include the source link when presenting papers
  • do not infer authors, DOI, publication date, or abstract from memory
  • do not scrape publisher pages unless the user explicitly asks

Install the OpenClaw Skill template

Generate a skill directory:

paper-rss write-openclaw-skill --output-dir ./dist

Install it into a target skills directory:

paper-rss install-openclaw-skill --dir ~/.openclaw/workspace/skills

This creates a paper-rss-openclaw skill directory containing:

  • SKILL.md
  • agents/openai.yaml

Repository copies live at:

  • SKILL.md
  • openai.yaml

Detailed command and API reference lives in usage.md.

Will OpenClaw auto-detect the skill?

Maybe, but not always.

It depends on whether your OpenClaw installation automatically scans a local skills directory. If it does, placing the skill into the correct folder may be enough. If it does not, you may need to import the skill manually or restart OpenClaw.

The important distinction is:

  • installing paper-rss adds the software
  • installing the Skill teaches OpenClaw how to use that software correctly

For more detailed integration rules, see openclaw-integration.md.

Verified example

This project has already been validated against the APS feed:

http://feeds.aps.org/rss/tocsec/PRL-PlasmaandBeamPhysics.xml

Verified behavior so far:

  • real feed refresh succeeds
  • first refresh inserted 100 entries
  • second refresh inserted 0 entries
  • latest paper queries returned structured JSON
  • HTML in authors and summaries is cleaned before storage

DOI handling

Current DOI extraction is best-effort from RSS content and link patterns. That means:

  • if a DOI-like value is clearly present, paper-rss returns it
  • if it is not present, paper-rss returns null
  • paper-rss does not guess DOI values

If you want stricter DOI quality later, add an optional enrichment layer using Crossref or a publisher-specific API. The MVP intentionally leaves this off by default to keep refreshes simple and deterministic.

More detail is in doi-enrichment.md.

Development

For local development:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e .[dev]
pytest

Current status

Implemented today:

  • CLI
  • local HTTP API
  • SQLite storage
  • RSS fetch and parsing
  • deduplication
  • Linux systemd --user helpers
  • OpenClaw Skill template generation
  • real APS RSS feed verification

Planned or optional later:

  • Crossref DOI enrichment
  • automated scheduled refresh strategy
  • MCP adapter
  • web UI

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors