Fix clipping for X/Twitter articles (longform) by davidgoldcode · Pull Request #124 · kepano/defuddle

davidgoldcode · 2026-01-22T03:26:31Z

Note

I realized after opening this that an earlier PR had already tackled the same problem: #89

Summary

This PR adds support for extracting content from X/Twitter Articles (longform content). Currently, the Obsidian Web Clipper only captures the banner image from articles because the existing TwitterExtractor doesn't handle the article-specific DOM structure.

Key changes:

- New XArticleExtractor class that handles [data-testid="twitterArticleRichTextView"] containers
- A DOM-based canExtract() check ensures it activates only on article pages, not on regular tweets
- Extracts title, author, paragraphs, embedded tweets, code blocks, headers, and images
- Preserves formatting (bold, links, code) within paragraphs
- Fixes a caching bug in ExtractorRegistry where multiple extractors registered for the same domain weren't properly differentiated
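The DOM-based canExtract() check boils down to a single selector lookup. This is an illustration, not the code in this PR: the QueryableDocument interface and the isXArticlePage() helper are hypothetical, introduced so the example runs outside a browser. Only the selector comes from the description above.

```typescript
// Minimal stand-in for the one DOM method this sketch needs (hypothetical),
// so the example runs outside a browser. Returns null when nothing matches.
interface QueryableDocument {
  querySelector(selector: string): unknown;
}

// Article container selector named in the PR description.
const ARTICLE_SELECTOR = '[data-testid="twitterArticleRichTextView"]';

// Hypothetical helper: claim the page only when the article container exists,
// so regular tweets on the same domain are left to the tweet extractor.
function isXArticlePage(doc: QueryableDocument): boolean {
  return doc.querySelector(ARTICLE_SELECTOR) !== null;
}
```

In a browser, the global document already has this shape, so isXArticlePage(document) can be called directly.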

Extracted content includes:

- Title from [data-testid="twitter-article-title"]
- Author name and handle from schema.org metadata
- Paragraphs from the Draft.js editor structure (.longform-unstyled, .public-DraftStyleDefault-block)
- Embedded tweets as blockquotes
- Code blocks with their language preserved
- Images upgraded to high-quality URLs (&name=large)
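The &name=large upgrade amounts to rewriting one query parameter. The sketch below assumes X media URLs live on pbs.twimg.com and carry their size in a name parameter. The upgradeImageUrl() name and the hostname check are illustrative assumptions, not taken from the PR.

```typescript
// Parse without throwing; relative or malformed URLs yield null.
function parseUrl(src: string): URL | null {
  try {
    return new URL(src);
  } catch {
    return null;
  }
}

// Sketch, assuming sizes are encoded like
// https://pbs.twimg.com/media/abc?format=jpg&name=small (assumption).
function upgradeImageUrl(src: string): string {
  const url = parseUrl(src);
  // Assumption: only rewrite images served from X's media CDN.
  if (url === null || url.hostname !== "pbs.twimg.com") return src;
  // set() replaces an existing name=... value instead of adding a duplicate.
  url.searchParams.set("name", "large");
  return url.toString();
}
```

Using URLSearchParams.set() rather than appending "&name=large" as a string means an existing size such as name=small is replaced, not duplicated.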

Related Issues

Testing

- Added a test fixture for the /article/ URL pattern
- All existing tests pass
- Verified that extraction works on realistic HTML matching the structure of live X Articles
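The /article/ URL pattern can be expressed as a regular expression. This sketch assumes article URLs take the form https://x.com/<user>/article/<id> (also accepting twitter.com). The X_ARTICLE_PATTERNS constant and the matchesXArticleUrl() helper are hypothetical, not the patterns registered in Defuddle. The final version of this PR dropped /status/ URL support, so a status URL does not match here.

```typescript
// Hypothetical pattern list; assumes article URLs look like
// https://x.com/<user>/article/<id> (twitter.com accepted as well).
const X_ARTICLE_PATTERNS: RegExp[] = [
  /^https?:\/\/(www\.)?(x|twitter)\.com\/[^/]+\/article\/\d+/,
];

// True if any pattern matches. No /g flag, so test() is stateless.
function matchesXArticleUrl(url: string): boolean {
  return X_ARTICLE_PATTERNS.some((pattern) => pattern.test(url));
}
```

URL matching only narrows the candidates. The DOM-based canExtract() check still decides whether the page really contains an article.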

Test Plan

- Clip an X Article with Obsidian Web Clipper after updating the Defuddle dependency
- Verify that the title, author, and all paragraphs are captured
- Verify that embedded tweets appear as blockquotes
- Verify that code blocks preserve the language specifier
- Verify that images are included with high-quality URLs

…ractors

domains like x.com have multiple content types (tweets vs articles) that need different extractors. the cache now checks canExtract() before returning a cached extractor, and falls back to a fresh search if that extractor can't handle the content.
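The cache behavior described in this commit can be sketched as follows. This is not Defuddle's ExtractorRegistry code: Page, Extractor, Registration, and ExtractorCache are hypothetical stand-ins, and Page.hasArticleContainer replaces a real DOM query. What the sketch illustrates is the lookup order. A cached extractor for a hostname is reused only if its canExtract() accepts the current page. Otherwise the registrations are searched again in priority order.

```typescript
// Hypothetical page description; hasArticleContainer stands in for a DOM query.
interface Page {
  url: string;
  hasArticleContainer: boolean;
}

interface Extractor {
  name: string;
  canExtract(): boolean;
}

type ExtractorFactory = (page: Page) => Extractor;

interface Registration {
  patterns: RegExp[];
  factory: ExtractorFactory;
}

class ExtractorCache {
  // Registration order doubles as priority order.
  private registrations: Registration[] = [];
  // hostname -> factory that last produced an extractor accepting the page
  private byHost = new Map<string, ExtractorFactory>();

  register(registration: Registration): void {
    this.registrations.push(registration);
  }

  find(page: Page): Extractor | null {
    const host = new URL(page.url).hostname;

    // Reuse the cached factory only if its extractor accepts this page.
    const cached = this.byHost.get(host);
    if (cached) {
      const extractor = cached(page);
      if (extractor.canExtract()) return extractor;
    }

    // Otherwise fall back to a full search in priority order.
    for (const registration of this.registrations) {
      if (!registration.patterns.some((p) => p.test(page.url))) continue;
      const extractor = registration.factory(page);
      if (extractor.canExtract()) {
        this.byHost.set(host, registration.factory);
        return extractor;
      }
    }
    return null;
  }
}
```

Because this sketch keys the cache by hostname, it remembers whichever extractor matched last. A cached lower-priority extractor whose canExtract() still passes will therefore be returned ahead of a higher-priority one.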


deadcoder0904 · 2026-01-22T07:27:58Z

Would love to see this merged. Been reading a lot of X Articles recently but Obsidian only saves links for now.

odysseus0 · 2026-01-25T19:50:25Z

Second. This would be huge!

Railly · 2026-02-01T19:20:40Z

This is highly needed @kepano 🙏

Whyjsee · 2026-02-02T06:01:56Z

looking forward to seeing this PR merged @kepano

likidu · 2026-02-14T00:20:48Z

+1 @kepano to have clipper support the X articles. A must have.

davidgoldcode added 14 commits January 21, 2026 21:01

add x article extractor for longform twitter/x content

f57cfa9

extracts title, author, and content from x articles using data-testid selectors. upgrades image urls to high quality. registered before twitter extractor to take priority. fixes obsidianmd/obsidian-clipper#666

add embedded tweet, code block, header, and image handling to x-article extractor

668ef63

- convert embedded tweets to clean blockquotes with author/text only
- extract code blocks with proper language class
- flatten nested header content
- unwrap images from direct anchor wrappers
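The embedded-tweet conversion can be sketched as a pure function over data already pulled out of the DOM. The EmbeddedTweet shape, the tweetToBlockquote() name, and the exact output markup are assumptions for illustration. The commit only states that embedded tweets become blockquotes containing the author and text.

```typescript
// Hypothetical shape of the data read from an embedded tweet's DOM.
interface EmbeddedTweet {
  authorName: string;
  handle: string; // without the leading "@" (assumption)
  text: string;
}

// Escape text before interpolating it into HTML ("&" must go first).
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Keep only author and text; metrics, buttons, and avatars are never emitted.
function tweetToBlockquote(tweet: EmbeddedTweet): string {
  const author = `${escapeHtml(tweet.authorName)} (@${escapeHtml(tweet.handle)})`;
  const paragraphs = tweet.text
    .split(/\n{2,}/) // blank lines separate paragraphs
    .filter((p) => p.trim() !== "")
    .map((p) => `<p>${escapeHtml(p.trim())}</p>`)
    .join("");
  return `<blockquote><p><strong>${author}</strong></p>${paragraphs}</blockquote>`;
}
```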

fix image extraction to unwrap from nested anchor structures

0ba3e04

preserve bold, links, and code formatting in draft paragraphs

8137559

add image caption support with figcaption

27c19bb

Revert "add image caption support with figcaption"

039c101

This reverts commit 27c19bb.

add data-lang attribute to code blocks for markdown language specifier

ee4bcb7

add fallback to set data-lang on raw code blocks with language class

85be1a5

remove ineffective code block language fixes

60bb5a0

Revert "remove ineffective code block language fixes"

9c7a732

This reverts commit 60bb5a0.

Reapply "remove ineffective code block language fixes"

a2d98b4

This reverts commit 9c7a732.

add /status/ URL pattern to XArticleExtractor

b88cef5

X articles can be accessed via both /article/ and /status/ URLs. the DOM-based canExtract() check ensures we only extract actual article content, not regular tweets.

update x article test fixture to match real html structure

9b2fe48

add realistic dom nesting with DraftEditor, embedded tweets, code blocks, headers, and images matching actual x.com articles

davidgoldcode mentioned this pull request Jan 22, 2026

BUG: When you click to save an article on Twitter, only images can be saved, and text cannot be saved. obsidianmd/obsidian-clipper#666

Closed

davidgoldcode added 2 commits January 21, 2026 22:38

restore package.json

4718ad0

remove /status/ URL support, use lorem ipsum test fixtures

186fe39

- remove /status/ patterns from extractor registry and x-article extractor
- replace real tweet data in test fixtures with lorem ipsum mock data
- delete x.com-status-article test fixture

davidgoldcode changed the title from "add XArticleExtractor for X/Twitter longform articles" to "Fix clipping for X/Twitter articles (longform)" on Jan 22, 2026

kepano merged commit fede6d8 into kepano:main Feb 14, 2026

This was referenced Feb 14, 2026

feat: add support for extracting articles from X (formerly Twitter) #89

Closed

Fix title for X Articles #122

Closed

RKNST38 mentioned this pull request Feb 18, 2026

X Article Extractor does not capture article banner/hero images #129

Closed


kepano merged 16 commits into kepano:main from davidgoldcode:feat/x-article-extractor

7 participants
