
feat: add --detect-paywall flag for subscriber-only content classification #1

Merged
snapsynapse merged 1 commit into snapsynapse:main from drewid74:feat/paywall-detection
Apr 16, 2026

Conversation

@drewid74

Summary

Adds opt-in paywall detection so users with paid subscriptions can identify which posts are subscriber-only and build guardrails to avoid accidentally sharing or redistributing content that creators intended for paying subscribers.

What it does

  • New --detect-paywall CLI flag (opt-in, no breaking changes)
  • Queries Substack's public /api/v1/posts/{slug} API endpoint (no additional auth required)
  • Adds is_paid (bool) and audience (str) fields to YAML frontmatter
  • Graceful fallback to null on API errors — never blocks the conversion pipeline
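
A minimal sketch of what the lookup and its fallback might look like, assuming the endpoint returns a JSON object with an `audience` field. The name `fetch_audience`, its signature, and the injectable `opener` parameter are hypothetical and chosen for illustration; this is not the `fetch_paywall_status()` code in substack2md.py.

```python
import json
import urllib.request
from typing import Callable, Optional


def fetch_audience(
    base_url: str,
    slug: str,
    opener: Callable = urllib.request.urlopen,  # hypothetical hook so tests can avoid the network
    timeout: float = 10.0,
) -> Optional[str]:
    """Return the post's raw `audience` string, or None on any failure."""
    url = f"{base_url.rstrip('/')}/api/v1/posts/{slug}"
    try:
        with opener(url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers
        # malformed JSON and bad encodings. All degrade to None so the
        # conversion pipeline is never blocked.
        return None
    audience = payload.get("audience") if isinstance(payload, dict) else None
    # A missing or non-string field is reported as unknown, not as free.
    return audience if isinstance(audience, str) else None
```

A caller would then write `null` for both frontmatter fields whenever this returns `None`.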

Frontmatter output (when enabled)

is_paid: true
audience: only_paid

or for free posts:

is_paid: false
audience: everyone

Why this matters

If you have a paid Substack subscription, CDP fetches the full content of subscriber-only posts. Without metadata indicating paywall status, there's no programmatic way to distinguish paid from free content in downstream workflows. This flag lets users respect creators' rights by tagging content appropriately — enabling automation that keeps subscriber-only content private while freely sharing public posts.
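
As a rough illustration of the kind of downstream guardrail this enables, the sketch below reads `is_paid` from a converted file and allows sharing only when the post is explicitly marked free. It assumes the frontmatter is a leading block delimited by `---` lines; `read_is_paid` and `is_shareable` are hypothetical helper names, not part of substack2md.

```python
from typing import Optional


def read_is_paid(markdown_text: str) -> Optional[bool]:
    """Extract is_paid from a leading '---' frontmatter block; None if unknown."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "is_paid":
            # Anything other than a literal true/false (e.g. null) maps to None.
            return {"true": True, "false": False}.get(value.strip().lower())
    return None


def is_shareable(markdown_text: str) -> bool:
    """Allow sharing only when the post is explicitly marked free."""
    return read_is_paid(markdown_text) is False
```

Treating a missing or `null` value as not shareable keeps the default on the cautious side.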

Aligns with the project's existing ethos in the Disclaimer: "Getting better utility from Substacks you already support is not [stealing]. Sharing without permission is the line, don't cross it."

Changes

  • substack2md.py: Added fetch_paywall_status() function + wired into process_url() and with_frontmatter()
  • README.md: Added Paywall Detection section, updated frontmatter example, CLI reference, and usage examples

…ation

Queries Substack's public API to add is_paid and audience fields to
YAML frontmatter. This lets users with paid subscriptions build
guardrails to avoid accidentally sharing or redistributing content
that creators intended for paying subscribers only.

- New fetch_paywall_status() function hits /api/v1/posts/{slug}
- Opt-in via --detect-paywall CLI flag (no breaking changes)
- Frontmatter gains is_paid (bool) and audience (str) fields
- Graceful fallback to null on API errors
- Updated README with feature docs and usage examples
@snapsynapse snapsynapse self-assigned this Apr 16, 2026
@snapsynapse snapsynapse self-requested a review April 16, 2026 15:37
@snapsynapse snapsynapse merged commit 2a93cc2 into snapsynapse:main Apr 16, 2026
snapsynapse added a commit that referenced this pull request Apr 16, 2026
Follow-up to #1. Two bugs surfaced while writing evals for the
--detect-paywall flag:

1. Founding-tier posts misclassified as free. `audience == "only_paid"`
   missed "founding" (paid founding-member tier), defeating the
   feature's stated purpose of flagging subscriber-only content.

2. Missing `audience` field in a 200 response was silently reported
   as "everyone"/is_paid=False, contradicting the docstring promise
   of "graceful fallback to null on API errors."

Fix: explicit `known_paid` + `known_free` sets. Values outside both
(including a future Substack tier) return is_paid=None with the raw
audience string preserved, so downstream workflows can treat it as
"unknown — handle with care" instead of silently publishing as free.

Also ships a 30-test eval suite covering audience decoding, HTTP
failure modes, request shape, frontmatter behavior, CLI wiring, and
publication-slug edge cases. Run with:

    pip install -r requirements.txt -r tests/requirements-dev.txt
    pytest tests/ -v

See tests/EVALS.md for the full merge-readiness report.

Credit to @drewid74 for the initial implementation in #1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@snapsynapse
Owner

Thanks @drewid74, merged! 🙏

The opt-in design and the graceful-fallback contract in your docstring were exactly the right shape for this feature. Really appreciate the thoughtfulness about creators' rights in the PR description. That framing shaped how I thought about the follow-up.

I wrote an eval suite against the branch and it surfaced two edge cases worth tightening before real-world use:

  1. audience == "only_paid" misses "founding" (Substack's paid founding-member tier), verified empirically across 9 publications.
  2. A 200 response without the audience field defaulted to "everyone"/is_paid=False, which contradicted the null-on-uncertainty promise in your own docstring.

Rather than block your PR on those, I merged and pushed a follow-up in #2 with the fixes and a 30-test eval suite. You're credited in the commit message and in the PR body.
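
The follow-up commit message describes the fix as explicit `known_paid` and `known_free` sets, with anything outside both reported as unknown. A minimal sketch of that idea follows. The set contents are inferred from the values named in this thread ("only_paid", "founding", "everyone"), and `classify_audience` is a hypothetical name, so the actual code in #2 may differ.

```python
from typing import Optional, Tuple

# Assumed membership, based only on the audience values mentioned in this PR.
KNOWN_PAID = frozenset({"only_paid", "founding"})
KNOWN_FREE = frozenset({"everyone"})


def classify_audience(audience: Optional[str]) -> Tuple[Optional[bool], Optional[str]]:
    """Map a raw audience value to (is_paid, audience)."""
    if audience in KNOWN_PAID:
        return True, audience
    if audience in KNOWN_FREE:
        return False, audience
    # Missing field or unrecognized tier: is_paid is None and the raw value
    # is preserved so downstream workflows can handle it with care.
    return None, audience
```

With this shape, a tier Substack adds in the future yields `is_paid=None` rather than silently falling through to free.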

Thanks again for kicking this off. The feature wouldn't exist without your initial work.

I push a lot of code daily. It means a lot to me when people actually use it, and offer improvements. Sincere thanks!

snapsynapse added a commit that referenced this pull request Apr 16, 2026
fix(paywall): audience enum handling + eval suite (follow-up to #1)
snapsynapse added a commit that referenced this pull request Apr 16, 2026
Matrix: Python 3.10 / 3.11 / 3.12 / 3.13 on ubuntu-latest.
Installs both runtime and dev requirements, runs `pytest tests/`.

Would have caught the two bugs in #1 before they hit main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
snapsynapse added a commit that referenced this pull request Apr 16, 2026
- CONTRIBUTING.md documents local test setup, PR conventions (Conventional
  Commits), style rules, and how to run the opt-in live smoke test.
- CHANGELOG.md follows Keep a Changelog format. Captures the v1.2.0
  paywall feature + fix work (PR #1, PR #2) and an Unreleased section
  for the current improvement pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@drewid74
Author

Awesome, happy to help, and you solved an issue I had tried to work around before. Definitely a testament to the community...
