Fix clipping for X/Twitter articles (longform) #124

Merged
kepano merged 16 commits into kepano:main from davidgoldcode:feat/x-article-extractor on Feb 14, 2026

Conversation

@davidgoldcode
Contributor

@davidgoldcode davidgoldcode commented Jan 22, 2026

Note

I realized after opening this PR that a similar one had already been created: #89

Summary

This PR adds support for extracting content from X/Twitter Articles (longform content). Currently, the Obsidian Web Clipper only captures the banner image from articles because the existing TwitterExtractor doesn't handle the article-specific DOM structure.

Key changes:

  • New XArticleExtractor class that handles [data-testid="twitterArticleRichTextView"] containers
  • DOM-based canExtract() ensures it only activates for article pages, not regular tweets
  • Extracts title, author, paragraphs, embedded tweets, code blocks, headers, and images
  • Preserves formatting (bold, links, code) within paragraphs
  • Fixes caching bug in ExtractorRegistry where same-domain multi-extractors weren't properly differentiated
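The DOM-based `canExtract()` check described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the class name `XArticleExtractorSketch`, the `QueryRoot` interface, and the `fakeDoc` helper are hypothetical, and only the article container selector comes from this PR description.

```typescript
// Minimal structural type so the sketch runs without a browser DOM.
// A real `Document` also satisfies this shape.
interface QueryRoot {
  querySelector(selector: string): object | null;
}

// Selector taken from the PR description.
const ARTICLE_CONTAINER = '[data-testid="twitterArticleRichTextView"]';

// Hypothetical class shape; the real extractor's base class and
// constructor arguments are not shown in this thread.
class XArticleExtractorSketch {
  private readonly doc: QueryRoot;

  constructor(doc: QueryRoot) {
    this.doc = doc;
  }

  // Activate only when the article container is present in the DOM,
  // regardless of what the URL looks like.
  canExtract(): boolean {
    return this.doc.querySelector(ARTICLE_CONTAINER) !== null;
  }
}

// Hypothetical stub: a "document" that only knows which selectors match.
const fakeDoc = (present: string[]): QueryRoot => ({
  querySelector: (selector: string) => (present.includes(selector) ? {} : null),
});

const articlePage = fakeDoc([ARTICLE_CONTAINER]);
const tweetPage = fakeDoc(['.placeholder-tweet']); // placeholder selector, no article container

const onArticle = new XArticleExtractorSketch(articlePage).canExtract();
const onTweet = new XArticleExtractorSketch(tweetPage).canExtract();
```

Checking the DOM rather than the URL keeps the extractor from claiming regular tweets that happen to share a URL pattern with articles.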

Extracted content includes:

  • Title from [data-testid="twitter-article-title"]
  • Author name and handle from schema.org metadata
  • Paragraphs from Draft.js editor structure (.longform-unstyled, .public-DraftStyleDefault-block)
  • Embedded tweets as blockquotes
  • Code blocks with language preservation
  • Images with quality upgrade (&name=large)
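As an illustration of the image quality upgrade, one way to force `name=large` on an image URL is shown below. The function name `upgradeImageUrl` and the example URLs are assumptions made for this sketch; the merged extractor may build the URL differently.

```typescript
// Hypothetical helper: set the `name` query parameter to `large`,
// replacing an existing value or appending the parameter if missing.
function upgradeImageUrl(src: string): string {
  let url: URL;
  try {
    url = new URL(src);
  } catch {
    // Relative or malformed URLs are returned unchanged.
    return src;
  }
  url.searchParams.set('name', 'large');
  return url.toString();
}

// Example inputs are made up for illustration.
const upgraded = upgradeImageUrl('https://pbs.twimg.com/media/abc123?format=jpg&name=small');
const appended = upgradeImageUrl('https://pbs.twimg.com/media/abc123');
const untouched = upgradeImageUrl('not a url');
```

Using `URLSearchParams.set` instead of string concatenation avoids ending up with two `name` parameters when the source URL already has one.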

Related Issues

Testing

  • Added test fixture for /article/ URL pattern
  • All existing tests pass
  • Verified extraction works with realistic HTML structure matching live X Articles

Test Plan

  • Clip an X Article using Obsidian Web Clipper after updating Defuddle dependency
  • Verify title, author, and all paragraphs are captured
  • Verify embedded tweets appear as blockquotes
  • Verify code blocks preserve language specifier
  • Verify images are included with high-quality URLs

extracts title, author, and content from x articles using
data-testid selectors. upgrades image urls to high quality.
registered before twitter extractor to take priority.

fixes obsidianmd/obsidian-clipper#666
…le extractor

- convert embedded tweets to clean blockquotes with author/text only
- extract code blocks with proper language class
- flatten nested header content
- unwrap images from direct anchor wrappers

X articles can be accessed via both /article/ and /status/ URLs.
the DOM-based canExtract() check ensures we only extract actual
article content, not regular tweets.
…ractors

domains like x.com have multiple content types (tweets vs articles) that need
different extractors. the cache now checks canExtract() before returning and
falls back to searching if the cached extractor can't handle the content.
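The cache behaviour described in this commit can be sketched like this. All names here (`RegistrySketch`, `ExtractorEntry`, `PageLike`) are hypothetical and the real `ExtractorRegistry` API is not shown in this thread. The sketch also assumes each entry's `canExtract()` rejects content meant for the other extractor.

```typescript
// Hypothetical stand-in for the page DOM: only records whether the
// article container was found.
interface PageLike {
  hasArticleContainer: boolean;
}

// Hypothetical registry entry.
interface ExtractorEntry {
  name: string;
  domain: string;
  canExtract(page: PageLike): boolean;
}

class RegistrySketch {
  private entries: ExtractorEntry[] = [];
  private cache = new Map<string, ExtractorEntry>();

  register(entry: ExtractorEntry): void {
    this.entries.push(entry);
  }

  find(domain: string, page: PageLike): ExtractorEntry | undefined {
    // Only trust the cached extractor if it can handle this content.
    const cached = this.cache.get(domain);
    if (cached && cached.canExtract(page)) {
      return cached;
    }
    // Otherwise fall back to searching in registration order.
    const match = this.entries.find(
      (entry) => entry.domain === domain && entry.canExtract(page),
    );
    if (match) {
      this.cache.set(domain, match);
    }
    return match;
  }
}

const registry = new RegistrySketch();
// The article extractor is registered first so it takes priority.
registry.register({ name: 'x-article', domain: 'x.com', canExtract: (p) => p.hasArticleContainer });
registry.register({ name: 'twitter', domain: 'x.com', canExtract: (p) => !p.hasArticleContainer });

const first = registry.find('x.com', { hasArticleContainer: false })?.name;
const second = registry.find('x.com', { hasArticleContainer: true })?.name;
const third = registry.find('x.com', { hasArticleContainer: false })?.name;
```

Without the `canExtract()` check on the cached entry, the second lookup would return the tweet extractor cached by the first one, which matches the bug this commit describes.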
add realistic dom nesting with DraftEditor, embedded tweets,
code blocks, headers, and images matching actual x.com articles

- remove /status/ patterns from extractor registry and x-article extractor
- replace real tweet data in test fixtures with lorem ipsum mock data
- delete x.com-status-article test fixture
@davidgoldcode davidgoldcode changed the title from "add XArticleExtractor for X/Twitter longform articles" to "Fix clipping for X/Twitter articles (longform)" on Jan 22, 2026

@deadcoder0904

Would love to see this merged. Been reading a lot of X Articles recently but Obsidian only saves links for now.

@odysseus0

Second. This would be huge!

@Railly

Railly commented Feb 1, 2026

This is highly needed @kepano 🙏

@Whyjsee

Whyjsee commented Feb 2, 2026

looking forward to seeing this PR merged @kepano

@likidu

likidu commented Feb 14, 2026

+1 @kepano to have clipper support the X articles. A must have.

7 participants