feat: add --detect-paywall flag for subscriber-only content classification by drewid74 · Pull Request #1 · snapsynapse/substack2md

drewid74 · 2026-04-16T11:06:09Z

Summary

Adds opt-in paywall detection so users with paid subscriptions can identify which posts are subscriber-only and build guardrails to avoid accidentally sharing or redistributing content that creators intended for paying subscribers.

What it does

- New --detect-paywall CLI flag (opt-in, no breaking changes)
- Queries Substack's public /api/v1/posts/{slug} API endpoint (no additional auth required)
- Adds is_paid (bool) and audience (str) fields to YAML frontmatter
- Graceful fallback to null on API errors; never blocks the conversion pipeline
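A minimal sketch of the lookup and its fallback contract, for illustration only. It is not the code from substack2md.py: the function name `fetch_audience_sketch`, the injectable `get_json` parameter, the timeout, and the User-Agent string are all assumptions. Only the `/api/v1/posts/{slug}` path and the `audience` response field come from this PR.

```python
import json
import urllib.request
from typing import Callable, Optional


def _http_get_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL and decode the body as JSON. Raises on any failure."""
    # The User-Agent value is a placeholder chosen for this sketch.
    req = urllib.request.Request(url, headers={"User-Agent": "paywall-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_audience_sketch(
    base_url: str,
    slug: str,
    get_json: Callable[[str], dict] = _http_get_json,
) -> Optional[str]:
    """Return the raw `audience` string for a post, or None on any problem.

    Hypothetical helper: it never raises, so a failed lookup cannot
    block the rest of a conversion pipeline.
    """
    url = f"{base_url.rstrip('/')}/api/v1/posts/{slug}"
    try:
        data = get_json(url)
    except Exception:
        # Network error, HTTP error, or invalid JSON: report "unknown".
        return None
    if not isinstance(data, dict):
        return None
    audience = data.get("audience")
    # A missing or non-string field is also "unknown", not "free".
    return audience if isinstance(audience, str) else None
```

Passing `get_json` as a parameter is a design choice of the sketch: it lets the fallback paths be exercised without network access.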

Frontmatter output (when enabled)

is_paid: true
audience: only_paid

or for free posts:

is_paid: false
audience: everyone
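One way the two fields could be rendered, including the null fallback, is sketched below. The helper name `paywall_frontmatter_lines` is hypothetical, and the sketch assumes audience values are plain identifiers (such as `only_paid`) that need no YAML quoting; the project's own with_frontmatter() may work differently.

```python
from typing import Optional


def _yaml_scalar(value: object) -> str:
    """Render None, bool, or a simple string as a plain YAML scalar."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    # Assumption: the value is a simple identifier that needs no quoting.
    return str(value)


def paywall_frontmatter_lines(
    is_paid: Optional[bool], audience: Optional[str]
) -> list[str]:
    """Build the two paywall lines for a YAML frontmatter block."""
    return [
        f"is_paid: {_yaml_scalar(is_paid)}",
        f"audience: {_yaml_scalar(audience)}",
    ]


if __name__ == "__main__":
    print("\n".join(paywall_frontmatter_lines(True, "only_paid")))
    print("\n".join(paywall_frontmatter_lines(None, None)))
```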

Why this matters

If you have a paid Substack subscription, CDP fetches the full content of subscriber-only posts. Without metadata indicating paywall status, there's no programmatic way to distinguish paid from free content in downstream workflows. This flag lets users respect creators' rights by tagging content appropriately — enabling automation that keeps subscriber-only content private while freely sharing public posts.
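As an illustration of such a downstream guardrail, the sketch below treats a converted file as shareable only when its frontmatter says `is_paid: false`. Everything here is an assumption made for the example: the helper names, the `---` frontmatter delimiters, and the simplified `key: value` parsing (a real workflow would more likely use a YAML parser).

```python
def read_frontmatter(text: str) -> dict:
    """Parse flat `key: value` pairs from a leading `---` frontmatter block.

    Simplified on purpose: no nesting, no lists, values kept as strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # No closing delimiter found: treat the file as having no frontmatter.
    return {}


def is_shareable(text: str) -> bool:
    """Allow sharing only when the post is explicitly marked as free.

    Missing, null, or unrecognized values are treated as not shareable,
    so uncertainty never results in publishing paid content.
    """
    return read_frontmatter(text).get("is_paid") == "false"


if __name__ == "__main__":
    doc = "---\ntitle: Example\nis_paid: true\naudience: only_paid\n---\nBody"
    print(is_shareable(doc))
```

Defaulting to "not shareable" when the field is absent keeps files converted without --detect-paywall on the private side of the line.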

Aligns with the project's existing ethos in the Disclaimer: "Getting better utility from Substacks you already support is not [stealing]. Sharing without permission is the line, don't cross it."

Changes

- substack2md.py: Added fetch_paywall_status() function + wired into process_url() and with_frontmatter()
- README.md: Added Paywall Detection section, updated frontmatter example, CLI reference, and usage examples

…ation

Queries Substack's public API to add is_paid and audience fields to YAML frontmatter. This lets users with paid subscriptions build guardrails to avoid accidentally sharing or redistributing content that creators intended for paying subscribers only.

- New fetch_paywall_status() function hits /api/v1/posts/{slug}
- Opt-in via --detect-paywall CLI flag (no breaking changes)
- Frontmatter gains is_paid (bool) and audience (str) fields
- Graceful fallback to null on API errors
- Updated README with feature docs and usage examples

@drewid74

Follow-up to #1. Two bugs surfaced while writing evals for the --detect-paywall flag:

1. Founding-tier posts misclassified as free. `audience == "only_paid"` missed "founding" (paid founding-member tier), defeating the feature's stated purpose of flagging subscriber-only content.
2. Missing `audience` field in a 200 response was silently reported as "everyone"/is_paid=False, contradicting the docstring promise of "graceful fallback to null on API errors."

Fix: explicit `known_paid` + `known_free` sets. Values outside both (including a future Substack tier) return is_paid=None with the raw audience string preserved, so downstream workflows can treat it as "unknown: handle with care" instead of silently publishing as free.

Also ships a 30-test eval suite covering audience decoding, HTTP failure modes, request shape, frontmatter behavior, CLI wiring, and publication-slug edge cases. Run with:

    pip install -r requirements.txt -r tests/requirements-dev.txt
    pytest tests/ -v

See tests/EVALS.md for the full merge-readiness report.

Credit to @drewid74 for the initial implementation in #1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
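The three-way classification described in this commit message can be sketched as follows. This is an illustration, not the merged code: the function name `classify_audience` is hypothetical, and the set contents are limited to the three audience values named in this thread ("only_paid", "founding", "everyone"); the real sets may contain more.

```python
from typing import Optional, Tuple

# Assumption: only the values mentioned in this PR thread are listed here.
KNOWN_PAID = frozenset({"only_paid", "founding"})
KNOWN_FREE = frozenset({"everyone"})


def classify_audience(audience: object) -> Tuple[Optional[bool], Optional[str]]:
    """Map a raw audience value to an (is_paid, audience) pair.

    - known paid tier    -> (True, audience)
    - known free tier    -> (False, audience)
    - unrecognized tier  -> (None, audience), raw string preserved
    - missing/non-string -> (None, None)
    """
    if not isinstance(audience, str):
        return None, None
    if audience in KNOWN_PAID:
        return True, audience
    if audience in KNOWN_FREE:
        return False, audience
    # Unknown value, e.g. a future tier: never guess "free".
    return None, audience


if __name__ == "__main__":
    for value in ("only_paid", "founding", "everyone", "some_future_tier", None):
        print(repr(value), "->", classify_audience(value))
```

Checking membership in two explicit sets, rather than comparing against a single value, is what makes an unrecognized tier fall through to "unknown" instead of being reported as free.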

snapsynapse · 2026-04-16T15:55:05Z

Thanks @drewid74, merged! 🙏

The opt-in design and the graceful-fallback contract in your docstring were exactly the right shape for this feature. Really appreciate the thoughtfulness about creators' rights in the PR description. That framing shaped how I thought about the follow-up.

I wrote an eval suite against the branch and it surfaced two edge cases worth tightening before real-world use:

- audience == "only_paid" misses "founding" (Substack's paid founding-member tier), verified empirically across 9 publications.
- A 200 response without the audience field defaulted to "everyone"/is_paid=False, which contradicted the null-on-uncertainty promise in your own docstring.

Rather than block your PR on those, I merged and pushed a follow-up in #2 with the fixes and a 30-test eval suite. You're credited in the commit message and in the PR body.

Thanks again for kicking this off. The feature wouldn't exist without your initial work.

I push a lot of code daily. It means a lot to me when people actually use it, and offer improvements. Sincere thanks!

fix(paywall): audience enum handling + eval suite (follow-up to #1)

Matrix: Python 3.10 / 3.11 / 3.12 / 3.13 on ubuntu-latest. Installs both runtime and dev requirements, runs `pytest tests/`. Would have caught the two bugs in #1 before they hit main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- CONTRIBUTING.md documents local test setup, PR conventions (Conventional Commits), style rules, and how to run the opt-in live smoke test.
- CHANGELOG.md follows Keep a Changelog format. Captures the v1.2.0 paywall feature + fix work (PR #1, PR #2) and an Unreleased section for the current improvement pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

drewid74 · 2026-04-16T22:06:20Z

Awesome, happy to help, and you solved an issue I had tried working around before. Definitely a testament to the community...

snapsynapse self-assigned this Apr 16, 2026

snapsynapse self-requested a review April 16, 2026 15:37

snapsynapse merged commit 2a93cc2 into snapsynapse:main Apr 16, 2026

snapsynapse mentioned this pull request Apr 16, 2026

fix(paywall): audience enum handling + eval suite (follow-up to #1) #2

Merged

4 tasks

snapsynapse added a commit that referenced this pull request Apr 16, 2026

Merge pull request #2 from snapsynapse/cc/paywall-fixes

93475e1

fix(paywall): audience enum handling + eval suite (follow-up to #1)

snapsynapse merged 1 commit into snapsynapse:main from drewid74:feat/paywall-detection