Exploring Native Paywall Support in EmDash Core #1467
I just want to say this is really good, @CacheMeOwside. Thanks. I will get back to you with more feedback soon when I've had more time to look at it.
Thanks @CacheMeOwside
User accounts
I think there are a lot of potential issues in using the same mechanism for editor and subscriber accounts, in security, flexibility and integration. I know we do currently have a subscriber role, but I think we need to think hard about whether we use this as the primary method for subscriber accounts. If we don't go with that, how should subscribers be represented? Should it delegate entirely to external systems (via plugins) or should there be something in core? How does this interact with user comments?
Bot detection
As you identified, there is a real conflict between paywalls and web crawlers. How do we handle crawler detection, so that we can serve full content to crawlers without making the paywall easy to bypass? This is something that Cloudflare can do, but we don't want to tie a solution to Cloudflare; it should be a generic system that can take signals from plugins etc.
x402 integration
We have first-class x402 support to allow AI tools to identify themselves. Can we combine this with the paywalls? Should we?
Thanks again for doing this. It's an important contribution.
This is a great writeup. I've been tackling this challenge on some client work and arrived at a plugin solution for handling "memberships". I felt that mixing emdash users and readers into the same bucket wasn't the correct way forward, so I pivoted away from opening a discussion proposing an implementation in emdash core. Payment integrations are also necessary to drive these, and it is hard not to be opinionated about them. Let me know if you'd like to check out my plugin. I'm modeling it on WallKit.
For most publishers, paywalled content is the core business model. Large media organizations depend on subscriptions as the primary revenue model, instead of relying entirely on ad revenue.
That's why most major publishing platforms have some solution for content gating, whether it's built into the platform itself (Ghost, Substack) or provided through a plugin ecosystem (WordPress). The idea is the same in both cases: show a preview to everyone, and deliver the full content only to readers who are paying or otherwise authorized.
What I take from open ecosystems
WordPress shows both sides of a very open, plugin-friendly platform. The openness is a real strength. There is a plugin for almost anything, and that ecosystem is a big part of why WordPress won.
But that same openness has a cost when it reaches critical areas, and paywalls are one example. Poor development practices, and an incomplete understanding of how the system actually serves content, lead to incorrect implementations. A common practice is gating content in the browser, where the full article is still sent in the response and only hidden from view (with the CSS display:none property). The paywalled content is still in the response, so it stays retrievable, and a service like smry.ai can access it.
A few live examples of paywalls on some popular WordPress websites, where tools like smry.ai can easily access the paywalled content. Feel free to give it a try:
This website has a metered paywall, where the first article is free and subsequent ones are paid.
To be fair to some of these sites, they are not all necessarily poor implementations. There is a genuine structural tension at play. Search engines expect the full article to be present so they can index and rank it. The moment a website tries to serve full content only to verified search engine bots while showing a preview to everyone else, it risks being flagged for cloaking, which carries ranking penalties.
So sites are pushed toward putting the full content in the response for everyone (and then hiding it on the client-side using CSS), which is exactly what makes it retrievable by tools like smry.ai. In other words, the leak is often not a bug in the site's paywall but a side effect of staying on the right side of search engine policy.
This official Google article on using schema.org JSON-LD to mark up paywalled content was my starting point for researching the SEO side of this. The key is that this approach (exposing full content to crawlers without exposing it to everyone else) works securely only if we can reliably verify a request genuinely comes from a crawler rather than a spoofed user-agent.
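As a rough illustration of that markup, here is a minimal sketch that builds the JSON-LD for a partly paywalled article. The property names (isAccessibleForFree, hasPart, cssSelector) are the schema.org ones as I understand them; the helper name buildPaywallJsonLd, its input shape, and the .paywall selector are assumptions made up for this example, not anything EmDash provides.

```typescript
// Hypothetical input shape for this sketch (not an EmDash type).
interface PaywalledArticleInput {
  headline: string;
  url: string;
  // CSS selector wrapping the protected region in the rendered page (assumed).
  protectedSelector: string;
}

// Builds the JSON-LD string that would go inside a
// <script type="application/ld+json"> tag on the article page.
function buildPaywallJsonLd(input: PaywalledArticleInput): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    headline: input.headline,
    url: input.url,
    // The article as a whole is not free to read.
    isAccessibleForFree: false,
    // Points crawlers at the part of the page that sits behind the paywall.
    hasPart: {
      "@type": "WebPageElement",
      isAccessibleForFree: false,
      cssSelector: input.protectedSelector,
    },
  };
  // Escape "<" so a headline can never close the surrounding <script> tag.
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```

The markup only describes which region is paywalled. It does not decide who receives that region, so it complements crawler verification rather than replacing it.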
I'd like to invite SEO experts in the community to share insights on the best practices for keeping paywalled articles secure without hurting SEO rankings.
Bugs are a normal part of software and no system can fully avoid them. But there is a difference between a cosmetic bug and a failure in a critical area. Anything that touches security, revenue, access control, or reader privacy deserves a higher bar, because the cost of getting it wrong is money and trust. One way to raise that bar is to be deliberate about what the CMS core owns versus what is handed to third party developers. The critical guarantees belong in core, designed carefully and tested once, instead of being re-implemented (often insecurely) by every plugin.
The proposal - Native paywall support in EmDash Core
Core gates. Plugins and external providers do everything else.
EmDash Core owns one guarantee: protected content is never delivered to a reader who is not authorized to see it. Everything else stays with third party plugins or external services: authentication, authorization, subscriptions, memberships, metering, and payment processing. Core provides the gate and a contract that plugins and providers build on. Core does not become an identity provider or a billing platform.
This keeps the part that is easy to get wrong, and expensive when you do (not leaking content), in one well-tested place, while leaving the parts that vary a lot between publishers (how you log readers in, how you charge them) fully open.
Design considerations at a high level
In order to build a paywall capability, here are some points I think need consideration. This is a starting list, not a spec. Under some points I have noted a candidate approach to keep things concrete, but these are illustrative starting points, not decisions. Delivery modes are covered in their own section below. Please feel free to add anything I may have missed.
1. Where gating happens
When EmDash itself enforces the gate (Mode A in the "Delivery Modes" section), the enforcement should sit where content is fetched, i.e. at the data layer, not in the template that renders it. If a post is protected, every path that can return its body or an excerpt of it must respect that: the rendered page, the content API, search results, and any feeds or other content surfaces a site exposes, including ones that plugins add. Gating only the rendered page leaves the rest open. (In Mode B in the "Delivery Modes" section, this does not apply in the same way, since EmDash hands full content plus markers to an external proxy that does the gating.)
In practice this points to a reader-aware content fetch that returns the preview or the full body based on access, or a content gateway that every read passes through, so the decision is made once for all of those surfaces.
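A minimal sketch of what such a reader-aware fetch could look like, assuming the access decision has already been made elsewhere. All names here (StoredPost, ReaderView, readPostForReader) are hypothetical and not part of EmDash today. I am also assuming that "deny" means the reader gets nothing at all, while "preview" means the free portion only.

```typescript
type Access = "allow" | "deny" | "preview";

// Hypothetical stored shape: the preview is everything before the paywall break.
interface StoredPost {
  slug: string;
  isProtected: boolean;
  previewBody: string;
  fullBody: string;
}

// What any surface (page, content API, search, feed) is allowed to see.
interface ReaderView {
  slug: string;
  body: string;
  locked: boolean;
}

// Single read path: surfaces call this instead of touching fullBody directly,
// so the gating decision is applied once for all of them.
function readPostForReader(post: StoredPost, access: Access): ReaderView | null {
  if (!post.isProtected || access === "allow") {
    return { slug: post.slug, body: post.fullBody, locked: false };
  }
  if (access === "preview") {
    return { slug: post.slug, body: post.previewBody, locked: true };
  }
  // "deny": return nothing, not even the preview (assumption for this sketch).
  return null;
}
```

Because the function never returns fullBody for a protected post unless access is "allow", a feed or search plugin that goes through it cannot leak the protected part by accident.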
2. The authoring primitive
Publishers need a simple way to mark where the free preview ends and protected content begins. A single in-content marker (a "paywall break" block) can be the source of truth that drives previews, feeds, search, and SEO. Since EmDash content is Portable Text, this fits naturally as a paywall break block in the body: everything before it is the preview, everything after is protected. Ghost does the same thing today with a paywall card that marks the cut point.
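To show how a single marker can drive the split, here is a small sketch over a Portable Text body. The block type name paywallBreak and the helper splitAtPaywallBreak are assumptions for illustration; the only Portable Text detail relied on is that the body is an array of blocks with a _type field.

```typescript
// Minimal Portable Text block shape: only the fields this sketch needs.
interface PortableTextBlock {
  _type: string;
  _key?: string;
  [key: string]: unknown;
}

interface SplitBody {
  preview: PortableTextBlock[];
  protectedPart: PortableTextBlock[];
  hasBreak: boolean;
}

// Hypothetical block type name for the paywall break.
const PAYWALL_BREAK = "paywallBreak";

// Everything before the first break is the preview, everything after it is
// protected. The break block itself is dropped from both halves.
function splitAtPaywallBreak(body: PortableTextBlock[]): SplitBody {
  const index = body.findIndex((block) => block._type === PAYWALL_BREAK);
  if (index === -1) {
    // No break found: report it so the caller can decide what to do.
    return { preview: body, protectedPart: [], hasBreak: false };
  }
  return {
    preview: body.slice(0, index),
    protectedPart: body.slice(index + 1),
    hasBreak: true,
  };
}
```

When hasBreak is false on a post that is marked as protected, the caller should not treat the whole body as a free preview. Showing nothing, or warning the editor, is the safer reaction.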
3. Caching
The public preview of posts should be cacheable and shared freely, since it is the same for everyone. Unlocked or reader-specific responses must never enter a shared cache. Concretely, the anonymous preview could go out as public with an s-maxage, while unlocked responses use private, no-store.
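The header choice above can be written down as a tiny helper. The function name, the response kinds, and the 300-second default are placeholders picked for this sketch; the right s-maxage depends on the site.

```typescript
// Hypothetical names for the two kinds of response discussed above.
type ResponseKind = "anonymous-preview" | "unlocked";

// Returns the Cache-Control value for a response.
// sMaxAgeSeconds is a placeholder default, not a recommendation.
function cacheControlFor(kind: ResponseKind, sMaxAgeSeconds: number = 300): string {
  if (kind === "anonymous-preview") {
    // Same bytes for every anonymous reader, so shared caches may store it.
    return `public, s-maxage=${sMaxAgeSeconds}`;
  }
  // Reader-specific or unlocked content must not be stored by any cache.
  return "private, no-store";
}
```

Anything that is not clearly the anonymous preview falls through to private, no-store, so a new response kind added later is uncacheable until someone opts it in.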
There also needs to be a clear way for invalidating cached responses when a post changes, which could hang off the existing content lifecycle hooks. Two cases matter, for different reasons:
4. The entitlement contract (the decision)
This is the decision layer of the gate. Core needs a clear interface it can call to ask "can this reader see this?" and get back allow, deny, or preview. The plugin or external provider makes the decision and supplies the reader identity, since core has no reader accounts of its own. Core enforces the result. When the answer is unclear or the check fails, the safe default is to keep content locked.
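One possible shape for that interface, as a sketch only. The names (EntitlementProvider, EntitlementRequest, decide) and the one-second timeout are assumptions. I am also assuming that "locked" maps to the preview decision, since the preview is public anyway.

```typescript
type Decision = "allow" | "deny" | "preview";

// Hypothetical request shape: the provider supplies the reader identity.
interface EntitlementRequest {
  readerId: string | null;
  postId: string;
}

// What a plugin or external provider would implement.
interface EntitlementProvider {
  check(request: EntitlementRequest): Promise<Decision>;
}

// Assumed safe default: keep the content locked and show only the preview.
const LOCKED: Decision = "preview";

function isDecision(value: unknown): value is Decision {
  return value === "allow" || value === "deny" || value === "preview";
}

// Resolves with the fallback if the work takes longer than ms milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Core side: ask the provider, and fail closed on errors, timeouts,
// or answers that are not one of the three known decisions.
async function decide(
  provider: EntitlementProvider,
  request: EntitlementRequest,
  timeoutMs: number = 1000,
): Promise<Decision> {
  try {
    const result: unknown = await withTimeout<unknown>(provider.check(request), timeoutMs, LOCKED);
    return isDecision(result) ? result : LOCKED;
  } catch {
    return LOCKED;
  }
}
```

The provider never touches content, and core never interprets subscriptions. The only thing crossing the boundary is one of three strings.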
There are two situations to support, and the difference is just when the reader's access becomes known:
5. Delivering protected content without leaking it (the mechanism)
This is the mechanism layer: given a decision, it covers how the protected content actually reaches the page. The rule underneath it is simple: protected content should never be sent hidden in the initial response and then revealed with client code.
When the reader is already known at request time (a session cookie the server understands), Astro Server Islands (server:defer) look like a good fit: the page shell and preview stay cacheable while the protected region is rendered per request on the server behind the access check. EmDash does not use Server Islands today, so this would be new ground, but it is built for this shape of problem.
When the reader only signs in later in the browser (the Piano case), server islands do not fit, because they fetch automatically on page load, before that sign-in exists. There are two options here: reload the page so the server re-renders it now that the reader is known (the simplest approach, and what Ghost does), or fetch just the protected fragment from a dedicated endpoint after sign-in completes (fetch-after-auth). Either way the server runs the same access check.
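For the fetch-after-auth option, the dedicated endpoint could reduce to something like the sketch below. It returns a plain object rather than a framework response so that it stays self-contained; in an Astro endpoint the same fields would be passed to a Response. The function name, the 403 status for locked readers, and the HTML content type are assumptions for this example.

```typescript
type Decision = "allow" | "deny" | "preview";

// Plain description of an HTTP response, to keep the sketch framework-free.
interface FragmentResult {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical handler body for a "protected fragment" endpoint. The decision
// comes from the same access check the full page render would use.
function protectedFragmentResponse(decision: Decision, protectedHtml: string): FragmentResult {
  const headers: Record<string, string> = {
    // Reader-specific: must never be stored by a shared cache.
    "Cache-Control": "private, no-store",
    "Content-Type": "text/html; charset=utf-8",
  };
  if (decision !== "allow") {
    // Locked: send no protected bytes at all, not even hidden ones.
    return { status: 403, headers, body: "" };
  }
  return { status: 200, headers, body: protectedHtml };
}
```

The client only ever receives the fragment after the server has said "allow", which keeps this path consistent with the rule that nothing protected is shipped hidden.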
6. SEO and discoverability
To be discussed.
7. Abuse and rate limits
Even with the gate working, a reader with a valid account could use it to bulk-download the whole archive, so rate-limiting should make that impractical. This can lean on whatever the deployment already has at its CDN, proxy, or host layer, rather than being built from scratch.
8. Internationalization
Content is per locale. This needs a decision: whether "protected" is set per translation or shared across a translation group, so that a forgotten locale is not an accidental leak.
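To make the "shared across a translation group" option concrete, here is a sketch in which one protected translation locks the whole group, plus a helper that lists the locales an editor forgot to mark. This is only one of the two options and is not a decision; TranslationEntry and both function names are made up for the example.

```typescript
// Hypothetical per-locale record within one translation group.
interface TranslationEntry {
  locale: string;
  isProtected: boolean;
}

// Group-level rule for this sketch: if any translation is protected,
// every translation in the group is treated as protected.
function isProtectedInGroup(group: TranslationEntry[]): boolean {
  return group.some((entry) => entry.isProtected);
}

// Locales whose own flag disagrees with the group, e.g. for an editor warning.
function findUnprotectedLocales(group: TranslationEntry[]): string[] {
  if (!isProtectedInGroup(group)) return [];
  return group.filter((entry) => !entry.isProtected).map((entry) => entry.locale);
}
```

With this rule a forgotten locale fails toward being locked rather than toward leaking, at the cost of not being able to offer one translation for free on purpose.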
9. Editor experience
Editors need an easy way to insert the paywall break (a slash command or toolbar button), and a warning when a protected post has no preview defined. There is also an open question about what "preview" should show for a paywalled post: the full content as the authenticated editor sees it, the anonymous non-subscriber view (the free portion plus the wall), or both via a toggle. This is a UX choice rather than a security gate: the editor is already authenticated, so showing them the anonymous view shows less, not more, and is not a bypass.
Delivery modes
Two delivery models are widely used, and I think core should support both and let the publisher pick the one that matches their setup.
Because these are different models, core needs to let the publisher select the delivery mode rather than assume one.
Next Steps
This is a high-level idea, and I'd love the community's input on this proposal. Is anything missing in the considerations above? Are there delivery modes that real publishers depend on that aren't covered? Are there better ways to scope what belongs in core versus what belongs to plugins and providers? Suggestions and ideas from maintainers, the community, plugin authors, SEO experts, and anyone who has run a paywalled site would be very welcome.