
Add feed plugin for parsing RSS and Atom syndication feeds #1

Merged: rjrodger merged 4 commits into main from claude/add-rss-feed-parser-yTV7g on May 6, 2026

@rjrodger rjrodger commented May 6, 2026

This PR replaces the JSONC plugin with a new Feed plugin that parses RSS (0.90, 0.91, 0.92, 1.0, 2.0) and Atom (0.3, 1.0) syndication feeds.

Summary

The Feed plugin is built on top of the @jsonic/xml plugin and normalizes all feed dialects to an Atom-shaped result by default. It supports three output formats:

  • atom (default): Normalized Atom-shaped structure
  • native: Dialect-specific structure (preserves RSS or Atom format)
  • raw: Raw XML element tree from the XML plugin
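
The relationship between the three formats can be pictured as one switch over an options value. The sketch below is illustrative only: `FeedOptionsSketch`, `ConvertersSketch`, and `selectOutput` are hypothetical names, and the assumption that the Atom shape is derived from the native shape is mine, not something the PR states.

```typescript
// Hypothetical sketch: names and shapes are assumptions for illustration,
// not the actual @jsonic/feed internals.
type FeedFormat = 'atom' | 'native' | 'raw'

interface FeedOptionsSketch {
  format?: FeedFormat // treated as 'atom' when omitted
}

interface ConvertersSketch<Raw, Native, Atom> {
  toNative: (raw: Raw) => Native
  toAtom: (native: Native) => Atom
}

function selectOutput<Raw, Native, Atom>(
  raw: Raw,
  convert: ConvertersSketch<Raw, Native, Atom>,
  options: FeedOptionsSketch = {},
): Raw | Native | Atom {
  const format = options.format ?? 'atom'
  // 'raw' returns the XML element tree untouched.
  if (format === 'raw') return raw
  // 'native' stops after dialect-specific parsing.
  const native = convert.toNative(raw)
  // 'atom' (the default) additionally normalizes to the Atom shape.
  return format === 'native' ? native : convert.toAtom(native)
}
```

Keeping the default in one place means callers that pass no options get the Atom-shaped result, as described above.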

Key Changes

  • New TypeScript implementation (src/feed.ts): Complete feed parser with support for multiple RSS and Atom versions, including comprehensive type definitions for all feed formats
  • New Go implementation (go/feed.go): Port of the TypeScript implementation with equivalent functionality and type structures
  • Comprehensive test suite (go/feed_test.go): Tests covering various feed formats and edge cases using the feedparser test suite
  • Updated documentation (README.md): Replaced JSONC documentation with Feed plugin documentation
  • Updated package metadata (package.json, go/go.mod): Changed from JSONC to Feed plugin
  • Updated third-party notices (THIRD_PARTY_NOTICES.md): Updated to reference feedparser test suite instead of JSON test suite

Implementation Details

  • Supports parsing of feed metadata (title, author, links, categories, etc.)
  • Handles feed entries/items with full content, summaries, and source information
  • Normalizes RSS-specific elements (guid, enclosure, source) to Atom equivalents
  • Preserves dialect-specific structures when format: 'native' is used
  • Handles all three text content types (text, html, xhtml)
  • Supports person elements (authors, contributors) with name, URI, and email
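
To make the RSS-to-Atom normalization concrete, here is a minimal sketch of the mappings named in this PR (guid to id, enclosure to a link with a rel, managingEditor to authors, lastBuildDate to updated). All type and function names are hypothetical and far simpler than the typed shapes in src/feed.ts; in particular, storing managingEditor as an author name and using `rel: 'enclosure'` are assumptions.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface RssItemSketch {
  title?: string
  guid?: string
  enclosure?: { url: string; type?: string; length?: string }
}
interface RssChannelSketch {
  title?: string
  managingEditor?: string
  lastBuildDate?: string
  items: RssItemSketch[]
}
interface AtomLinkSketch {
  href: string
  rel?: string
  type?: string
  length?: string
}
interface AtomEntrySketch {
  id?: string
  title?: string
  links: AtomLinkSketch[]
}
interface AtomFeedSketch {
  title?: string
  updated?: string
  authors: { name: string }[]
  entries: AtomEntrySketch[]
}

function enclosureToLink(item: RssItemSketch): AtomLinkSketch[] {
  if (!item.enclosure) return []
  const { url, type, length } = item.enclosure
  // enclosure -> link with rel="enclosure" (assumed rel value)
  return [{ href: url, rel: 'enclosure', type, length }]
}

function rssToAtomSketch(channel: RssChannelSketch): AtomFeedSketch {
  return {
    title: channel.title,
    // lastBuildDate -> updated (copied verbatim; no date reformatting here)
    updated: channel.lastBuildDate,
    // managingEditor -> authors (kept as a plain name in this sketch)
    authors: channel.managingEditor ? [{ name: channel.managingEditor }] : [],
    entries: channel.items.map((item) => ({
      id: item.guid, // guid -> id
      title: item.title,
      links: enclosureToLink(item),
    })),
  }
}
```

The conversion is best-effort in the same sense as the PR describes: fields that are absent in the RSS input simply stay undefined or empty in the result.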

https://claude.ai/code/session_01Bu1d6Wy1spSXSMPpxjmbZ5

claude added 4 commits May 6, 2026 11:01
Replace the JSONC parser with @jsonic/feed: an RSS (0.90, 0.91, 0.92,
1.0, 2.0) and Atom (0.3, 1.0) parser built on top of @jsonic/xml. By
default every dialect is normalised to an Atom-shaped result; a
`format: 'native'` option preserves the source dialect's structure and
`format: 'raw'` returns the underlying XmlElement tree.

- src/feed.ts: typed AtomFeed / Rss2Feed / Rss1Feed shapes, dialect
  detection, native parsers for each format, and best-effort
  RSS-to-Atom conversion mappings (guid -> id, enclosure -> link[rel],
  managingEditor -> authors, lastBuildDate -> updated, etc.).
- Plugin form (`Jsonic.use(Feed)` adds a `.feed(src)` method) and a
  standalone `parseFeed(src, options)` helper.
- test/feed.test.ts: hand-curated samples covering each dialect, the
  three output formats, the plugin form, and xhtml content extraction.
- test/feedparser-wellformed/: focused subset of well-formed feed
  samples vendored from kurtmckee/feedparser (BSD 2-Clause); upstream
  LICENSE preserved alongside, attribution recorded in
  THIRD_PARTY_NOTICES.md.
- test/feedparser.test.ts: dialect detection, no-error parse, and
  targeted value checks against the vendored corpus.
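
As a rough illustration of what dialect detection involves, the sketch below classifies a feed from its root element name, `version` attribute, and declared namespaces. The `RootInfoSketch` shape and `detectDialectSketch` function are hypothetical and do not reflect the XmlElement type from @jsonic/xml; the namespace URIs are the published ones for each format.

```typescript
// Hypothetical sketch of dialect detection from the root element.
interface RootInfoSketch {
  localName: string // e.g. 'rss', 'RDF', 'feed'
  attributes: Record<string, string>
  namespaces: string[] // namespace URIs declared on the root
}
type DialectSketch = { dialect: 'rss' | 'atom'; version: string } | undefined

const RSS_090_NS = 'http://my.netscape.com/rdf/simple/0.9/'
const RSS_10_NS = 'http://purl.org/rss/1.0/'
const ATOM_03_NS = 'http://purl.org/atom/ns#'
const ATOM_10_NS = 'http://www.w3.org/2005/Atom'

function detectDialectSketch(root: RootInfoSketch): DialectSketch {
  switch (root.localName) {
    case 'rss':
      // RSS 0.91, 0.92, and 2.0 carry their version in an attribute.
      return { dialect: 'rss', version: root.attributes['version'] ?? 'unknown' }
    case 'RDF':
      // RSS 0.90 and 1.0 are RDF documents, told apart by namespace.
      if (root.namespaces.includes(RSS_10_NS)) return { dialect: 'rss', version: '1.0' }
      if (root.namespaces.includes(RSS_090_NS)) return { dialect: 'rss', version: '0.90' }
      return undefined
    case 'feed':
      if (root.namespaces.includes(ATOM_10_NS)) return { dialect: 'atom', version: '1.0' }
      if (root.namespaces.includes(ATOM_03_NS)) return { dialect: 'atom', version: '0.3' }
      return undefined
    default:
      return undefined
  }
}
```

Returning `undefined` for unrecognized roots keeps the decision about how to report non-feed XML with the caller.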

Removed jsonc grammar, embed script, Go module, and JSONTestSuite
corpus, all of which were specific to the previous JSONC parser.

Remove parseFeed() and the j.feed() method. With the Feed plugin
installed, calling the jsonic instance directly (j(src)) returns the
converted feed result, matching the standard jsonic plugin pattern.

Implementation: register a `bc` (before-close) action on the `xml`
root rule. After @jsonic/xml's own @xml-bc copies the parsed
XmlElement onto ctx.root().node, our hook replaces it with the
converted feed (Atom by default, native, or raw per options.format).

Tests and README updated to use the j(src) API throughout.

Add a Go implementation of the feed parser at go/feed.go that mirrors
the TypeScript plugin. Both languages now parse RSS 0.90 / 0.91 / 0.92
/ 1.0 / 2.0 and Atom 0.3 / 1.0 into typed structures and default to a
normalised Atom shape; format=native preserves the source dialect's
structure and format=raw returns the underlying XmlElement tree from
@jsonic/xml.

Shared test fixtures live under test/specs/ — each base name has a
.xml input, a .detect.json (expected dialect/version), an .atom.json
(expected default output), and an optional .native.json. Both the TS
and Go test suites enumerate the directory and JSON-compare results
to expectations, so adding a fixture covers both languages
automatically. The two language test suites also run the
feedparser-wellformed corpus for no-error parsing plus a shared set of
targeted value checks.
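
The fixture naming scheme can be sketched as a small grouping step. `groupFixtures` and `FixtureSketch` below are hypothetical names, and the function takes a plain list of file names so that it does not depend on any file-system API; how the real test suites enumerate test/specs/ is not shown in this PR text.

```typescript
// Hypothetical helper: groups fixture file names by base name, following the
// .xml / .detect.json / .atom.json / optional .native.json scheme.
interface FixtureSketch {
  base: string
  xml?: string
  detect?: string
  atom?: string
  native?: string // optional per the description
}

// Longer suffixes are listed before '.xml'; order only matters if suffixes overlap.
const FIXTURE_SUFFIXES = [
  ['.detect.json', 'detect'],
  ['.atom.json', 'atom'],
  ['.native.json', 'native'],
  ['.xml', 'xml'],
] as const

function groupFixtures(fileNames: string[]): FixtureSketch[] {
  const byBase = new Map<string, FixtureSketch>()
  for (const fileName of fileNames) {
    for (const [suffix, key] of FIXTURE_SUFFIXES) {
      if (!fileName.endsWith(suffix)) continue
      const base = fileName.slice(0, -suffix.length)
      const fixture = byBase.get(base) ?? { base }
      fixture[key] = fileName
      byBase.set(base, fixture)
      break
    }
  }
  // Only base names with an .xml input form a usable fixture.
  return Array.from(byBase.values()).filter((f) => f.xml !== undefined)
}
```

Because the grouping is driven purely by file names, dropping a new set of files into the directory is enough for a test loop built on it to pick the fixture up.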

Also fixes a re-entry bug in the bc hook: the xml rule's bc fires
twice when there is trailing whitespace after the root element, so
both implementations now skip the second invocation when r.Node has
already been replaced with a converted feed.

Makefile and CI workflow updated to build and test both languages.

The xml rule's bc fires once per close, including extra times when
`r: xml` recurses to consume trailing whitespace after the root
element. @jsonic/xml's own @xml-bc handles this idempotency by only
acting when r.child.node is set (i.e. an element was just parsed in
the current iteration). Mirror that idiom in our own bc hook so the
guard expresses the actual condition rather than relying on the
incidental fact that the previous iteration already replaced r.node.

Behavior is unchanged; this is a clarity / robustness fix.
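
The guard described in this commit can be modelled in a few lines. This is a sketch under assumptions: `RuleSketch` only borrows the `r.node` / `r.child.node` names from the message above and is not the jsonic rule type, and `convertSketch` stands in for the real feed conversion.

```typescript
// Hypothetical minimal model of a rule and its before-close (bc) hook.
interface RuleSketch {
  node: unknown
  child: { node: unknown }
}

let conversions = 0

// Stand-in for the real conversion; counts calls so the guard can be observed.
function convertSketch(element: unknown): unknown {
  conversions += 1
  return { converted: element }
}

// Act only when a child element was parsed in this iteration, so extra bc
// invocations (for example after trailing whitespace) leave r.node alone.
function feedBeforeClose(r: RuleSketch): void {
  if (r.child.node == null) return
  r.node = convertSketch(r.child.node)
}
```

Checking the child node states the actual precondition, whereas checking whether `r.node` already looks converted would depend on what an earlier invocation happened to leave behind.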
@rjrodger rjrodger merged commit c7a6fd5 into main May 6, 2026
6 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15530f3c83

Comment thread src/feed.ts
Comment on lines +485 to +486
const title = findChild(root, 'title')
if (title) feed.title = parseText(title)

P1: Restrict Atom element lookup to Atom namespace

The Atom parser reads core fields like title by local name only (findChild(root, 'title')), so any extension element with the same local name (for example dc:title) that appears earlier will be parsed as the feed title. This causes incorrect results on mixed-namespace feeds, which are common in Atom/RSS ecosystems, and can silently corrupt normalized output. Please pass the Atom namespace when selecting core Atom elements (and apply the same rule consistently for entry-level fields).
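
A namespace-aware lookup along the lines suggested here might look like the sketch below. `ElementSketch` and `findChildSketch` are hypothetical; the real `findChild` in src/feed.ts may have a different signature and element shape.

```typescript
// Hypothetical element shape and helper, for illustration only.
interface ElementSketch {
  localName: string
  namespace?: string
  text?: string
  children: ElementSketch[]
}

const ATOM_NS = 'http://www.w3.org/2005/Atom'
const DC_NS = 'http://purl.org/dc/elements/1.1/'

// Matches by local name and, when a namespace is given, by namespace as well.
function findChildSketch(
  parent: ElementSketch,
  localName: string,
  namespace?: string,
): ElementSketch | undefined {
  return parent.children.find(
    (c) => c.localName === localName && (namespace === undefined || c.namespace === namespace),
  )
}

// A feed where dc:title appears before the Atom title.
const sampleFeed: ElementSketch = {
  localName: 'feed',
  namespace: ATOM_NS,
  children: [
    { localName: 'title', namespace: DC_NS, text: 'Dublin Core title', children: [] },
    { localName: 'title', namespace: ATOM_NS, text: 'Atom title', children: [] },
  ],
}
```

With this sample, `findChildSketch(sampleFeed, 'title')` returns the dc:title element, while passing `ATOM_NS` as the third argument returns the Atom title. An Atom 0.3 feed uses a different namespace URI, so a real fix would need to pass the namespace of the detected dialect rather than a single constant.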

Comment thread go/feed.go
Comment on lines +653 to +654
if t := findChild(root, "title"); t != nil {
feed.Title = parseText(t)

P1: Enforce namespace when extracting Go Atom core fields

The Go Atom parser also matches core tags purely by local name (findChild(root, "title")), so extension tags such as dc:title can be mistaken for Atom core fields when they appear first. In real feeds with multiple namespaces, this produces wrong metadata without an error. The field extraction for Atom feed/entry elements should be namespace-aware to avoid collisions.
