feat: add --detect-paywall flag for subscriber-only content classification #1
Queries Substack's public API to add is_paid and audience fields to
YAML frontmatter. This lets users with paid subscriptions build
guardrails to avoid accidentally sharing or redistributing content
that creators intended for paying subscribers only.
- New fetch_paywall_status() function hits /api/v1/posts/{slug}
- Opt-in via --detect-paywall CLI flag (no breaking changes)
- Frontmatter gains is_paid (bool) and audience (str) fields
- Graceful fallback to null on API errors
- Updated README with feature docs and usage examples
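The request-and-fallback half of `fetch_paywall_status()` might look roughly like this minimal standard-library sketch. It assumes the function receives the publication's base URL and the post slug. The helper names `post_api_url` and `fetch_post_metadata`, the 10-second timeout, and the `Accept` header are illustrative assumptions, not code from `substack2md.py`. Audience decoding is left to a separate step.

```python
import http.client
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import quote


def post_api_url(publication_url: str, slug: str) -> str:
    """Build the post endpoint, e.g. https://example.substack.com/api/v1/posts/my-post."""
    return f"{publication_url.rstrip('/')}/api/v1/posts/{quote(slug, safe='')}"


def fetch_post_metadata(
    publication_url: str, slug: str, timeout: float = 10.0
) -> Optional[dict[str, Any]]:
    """Return the post's JSON object, or None on any request or decoding failure.

    Hypothetical helper: returning None instead of raising is what lets the
    caller write null paywall fields and carry on with the conversion.
    """
    url = post_api_url(publication_url, slug)
    try:
        request = urllib.request.Request(url, headers={"Accept": "application/json"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (non-2xx), timeouts, and socket errors;
        # ValueError covers malformed URLs, bad UTF-8, and invalid JSON.
        return None
    # Anything other than a JSON object is treated as a failed lookup too.
    return data if isinstance(data, dict) else None
```

A caller would hand the returned dict (or `None`) to the audience-decoding step, so a network failure ends up as `is_paid: null` in the frontmatter rather than as an exception that stops the pipeline.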
Follow-up to #1. Two bugs surfaced while writing evals for the --detect-paywall flag:

1. Founding-tier posts were misclassified as free. `audience == "only_paid"` missed "founding" (the paid founding-member tier), defeating the feature's stated purpose of flagging subscriber-only content.
2. A missing `audience` field in a 200 response was silently reported as "everyone" / is_paid=False, contradicting the docstring promise of "graceful fallback to null on API errors."

Fix: explicit `known_paid` + `known_free` sets. Values outside both (including any future Substack tier) return is_paid=None with the raw audience string preserved, so downstream workflows can treat such a post as "unknown, handle with care" instead of silently publishing it as free.

Also ships a 30-test eval suite covering audience decoding, HTTP failure modes, request shape, frontmatter behavior, CLI wiring, and publication-slug edge cases. Run with `pip install -r requirements.txt -r tests/requirements-dev.txt` followed by `pytest tests/ -v`.

See tests/EVALS.md for the full merge-readiness report.

Credit to @drewid74 for the initial implementation in #1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
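A minimal sketch of the `known_paid` / `known_free` decoding, assuming the raw post JSON has already been fetched (or is `None` after a failed request). The function name `decode_audience` is hypothetical, and the sets list only the three audience values named in this description; the real sets in `substack2md.py` may contain more.

```python
from typing import Any, Optional

# Assumed contents: only the values named in the PR description.
known_paid = frozenset({"only_paid", "founding"})
known_free = frozenset({"everyone"})


def decode_audience(post: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Map a post's raw audience to {"is_paid": bool | None, "audience": str | None}."""
    if not post:
        # No metadata at all (request failed or empty body): null for both fields.
        return {"is_paid": None, "audience": None}
    audience = post.get("audience")
    if not isinstance(audience, str):
        # Field missing in a 200 response: unknown, never silently "free".
        return {"is_paid": None, "audience": None}
    if audience in known_paid:
        return {"is_paid": True, "audience": audience}
    if audience in known_free:
        return {"is_paid": False, "audience": audience}
    # Unrecognized tier: keep the raw string so downstream tools can inspect it.
    return {"is_paid": None, "audience": audience}
```

The key design choice is that `is_paid` is only ever `True` or `False` for values on an explicit allow-list; everything else collapses to `None`, which is what fixes both bugs described above.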
Thanks @drewid74, merged! 🙏 The opt-in design and the graceful-fallback contract in your docstring were exactly the right shape for this feature. Really appreciate the thoughtfulness about creators' rights in the PR description. That framing shaped how I thought about the follow-up. I wrote an eval suite against the branch, and it surfaced two edge cases worth tightening before real-world use: founding-tier posts were classified as free, and a 200 response without an `audience` field was reported as free instead of falling back to null.
Rather than block your PR on those, I merged and pushed a follow-up in #2 with the fixes and a 30-test eval suite. You're credited in the commit message and in the PR body. Thanks again for kicking this off. The feature wouldn't exist without your initial work. I push a lot of code daily. It means a lot to me when people actually use it and offer improvements. Sincere thanks!
fix(paywall): audience enum handling + eval suite (follow-up to #1)
Matrix: Python 3.10 / 3.11 / 3.12 / 3.13 on ubuntu-latest. Installs both runtime and dev requirements, runs `pytest tests/`. Would have caught the two bugs in #1 before they hit main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CONTRIBUTING.md documents local test setup, PR conventions (Conventional Commits), style rules, and how to run the opt-in live smoke test.
- CHANGELOG.md follows the Keep a Changelog format and captures the v1.2.0 paywall feature + fix work (PR #1, PR #2), plus an Unreleased section for the current improvement pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Awesome, happy to help! You had solved an issue I'd previously tried to work around. Definitely a testament to the community...
Summary
Adds opt-in paywall detection so users with paid subscriptions can identify which posts are subscriber-only and build guardrails to avoid accidentally sharing or redistributing content that creators intended for paying subscribers.
What it does
- `--detect-paywall` CLI flag (opt-in, no breaking changes)
- Queries the `/api/v1/posts/{slug}` API endpoint (no additional auth required)
- Adds `is_paid` (bool) and `audience` (str) fields to YAML frontmatter
- Falls back gracefully to `null` on API errors; never blocks the conversion pipeline

Frontmatter output (when enabled)
or for free posts:
Why this matters
If you have a paid Substack subscription, CDP fetches the full content of subscriber-only posts. Without metadata indicating paywall status, there's no programmatic way to distinguish paid from free content in downstream workflows. This flag lets users respect creators' rights by tagging content appropriately, enabling automation that keeps subscriber-only content private while freely sharing public posts.
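On the consuming side, a guardrail only needs to trust an explicit `is_paid: false`. The minimal sketch below assumes the converted Markdown files carry flat frontmatter with the `is_paid` and `audience` fields. `read_frontmatter` is a hypothetical, deliberately tiny parser (a real workflow would more likely use a YAML library), and `is_safe_to_share` is an illustrative policy, not part of the project.

```python
import json
from typing import Any


def read_frontmatter(markdown: str) -> dict[str, Any]:
    """Parse flat `key: value` frontmatter (a deliberately tiny YAML subset).

    Returns {} when the file has no complete `---`-delimited block.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, Any] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, raw = line.partition(":")
        if not sep:
            continue  # not a key/value line
        key, value = key.strip(), raw.strip()
        if value in ("", "null", "~"):
            meta[key] = None
        elif value in ("true", "false"):
            meta[key] = value == "true"
        elif value.startswith('"'):
            meta[key] = json.loads(value)  # double-quoted string
        else:
            meta[key] = value  # plain scalar, kept as a string
    return {}  # no closing delimiter: treat as no frontmatter


def is_safe_to_share(meta: dict[str, Any]) -> bool:
    """Share only posts explicitly marked free; true, null, or missing all mean private."""
    return meta.get("is_paid") is False
```

Treating `null` as private matches the intent of the follow-up fix: an unknown tier or a failed lookup is handled with care rather than published as free.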
Aligns with the project's existing ethos in the Disclaimer: "Getting better utility from Substacks you already support is not [stealing]. Sharing without permission is the line, don't cross it."
Changes
- `substack2md.py`: Added `fetch_paywall_status()` function + wired into `process_url()` and `with_frontmatter()`
- `README.md`: Added Paywall Detection section, updated frontmatter example, CLI reference, and usage examples
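The `with_frontmatter()` change can be sketched as follows, assuming frontmatter is held as a flat dict of scalars before rendering. `with_paywall_fields` and `render_frontmatter` are hypothetical names, not functions from `substack2md.py`, and serializing through `json.dumps` is a stand-in for whatever YAML writer the script actually uses (JSON scalars are valid YAML 1.2).

```python
import json
from typing import Any, Optional


def with_paywall_fields(
    fields: dict[str, Any], status: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Return a copy of the frontmatter fields with is_paid/audience appended.

    A missing status (failed lookup) yields null for both fields.
    """
    status = status or {}
    return {**fields, "is_paid": status.get("is_paid"), "audience": status.get("audience")}


def render_frontmatter(fields: dict[str, Any]) -> str:
    """Render flat scalar fields as a `---`-delimited YAML block.

    json.dumps emits true/false/null and double-quoted strings, all valid YAML scalars.
    """
    body = "\n".join(
        f"{key}: {json.dumps(value, ensure_ascii=False)}" for key, value in fields.items()
    )
    return f"---\n{body}\n---\n"
```

When `--detect-paywall` is off, the caller would simply skip `with_paywall_fields`, so existing frontmatter output stays unchanged.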