Fix clipping for X/Twitter articles (longform) #124
Merged
kepano merged 16 commits into kepano:main on Feb 14, 2026
extracts title, author, and content from x articles using data-testid selectors. upgrades image urls to high quality. registered before twitter extractor to take priority. fixes obsidianmd/obsidian-clipper#666
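The image URL upgrade mentioned in this commit could look roughly like the sketch below. It assumes the media URL carries its size in a `name` query parameter, as the `&name=large` suffix in the PR summary suggests; the function name `upgradeImageUrl` is hypothetical and not taken from the PR.

```typescript
// Hypothetical helper: rewrite the size parameter of an image URL to
// request the large variant. URLs without a query string are returned
// unchanged, since we cannot tell whether they accept a size parameter.
function upgradeImageUrl(src: string): string {
  const queryStart = src.indexOf("?");
  if (queryStart === -1) {
    return src;
  }

  const base = src.slice(0, queryStart);
  const params = src
    .slice(queryStart + 1)
    .split("&")
    .filter((p) => p.length > 0);

  let replaced = false;
  const next = params.map((p) => {
    if (p.startsWith("name=")) {
      replaced = true;
      return "name=large";
    }
    return p;
  });

  // If no size parameter was present, append one.
  if (!replaced) {
    next.push("name=large");
  }
  return `${base}?${next.join("&")}`;
}
```

String handling is used here instead of the `URL` class only to keep the sketch free of environment-specific typings.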
…le extractor
- convert embedded tweets to clean blockquotes with author/text only
- extract code blocks with proper language class
- flatten nested header content
- unwrap images from direct anchor wrappers
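As an illustration of the first item, an embedded tweet reduced to author and text might be rendered along these lines. The `EmbeddedTweet` shape, the function names, and the exact output markup are assumptions made for this sketch; the PR's actual output format is not shown in this conversation.

```typescript
// Hypothetical minimal shape of an embedded tweet after extraction.
interface EmbeddedTweet {
  author: string;
  text: string;
}

// Escape characters that would otherwise be parsed as HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render only the author and text as a blockquote, dropping avatars,
// counters, and other embed chrome.
function tweetToBlockquote(tweet: EmbeddedTweet): string {
  const text = escapeHtml(tweet.text);
  const author = escapeHtml(tweet.author);
  return `<blockquote><p>${text}</p><cite>${author}</cite></blockquote>`;
}
```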
This reverts commit 27c19bb.
This reverts commit 60bb5a0.
This reverts commit 9c7a732.
X articles can be accessed via both /article/ and /status/ URLs. the DOM-based canExtract() check ensures we only extract actual article content, not regular tweets.
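A DOM-based check of this kind can be sketched as follows. The selector is the one named in the PR summary; the `QueryRoot` interface and the standalone function `canExtractArticle` are assumptions introduced so the sketch runs without a browser, and are not the PR's actual class layout.

```typescript
// Minimal structural type: anything with a querySelector method, such
// as a browser Document, satisfies it.
interface QueryRoot {
  querySelector(selector: string): unknown;
}

// Selector for the article body container, as listed in the PR summary.
const ARTICLE_CONTAINER = '[data-testid="twitterArticleRichTextView"]';

// Decide from the DOM, not from the URL, whether the page is an article.
function canExtractArticle(root: QueryRoot): boolean {
  return root.querySelector(ARTICLE_CONTAINER) != null;
}
```

Checking the DOM rather than the URL means a regular tweet served under a similar path is left to the tweet extractor.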
…ractors
domains like x.com have multiple content types (tweets vs articles) that need different extractors. the cache now checks canExtract() before returning and falls back to searching if the cached extractor can't handle the content.
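The cache behaviour described in this commit can be sketched as below. `CachingRegistry`, `PageInfo`, and the extractor names are hypothetical stand-ins, and all extractors are assumed to be registered for the same domain; the real `ExtractorRegistry` is not reproduced here. The sketch also assumes each `canExtract()` is specific enough to reject pages meant for the other extractor.

```typescript
// Hypothetical page summary; real extractors would inspect the DOM.
interface PageInfo {
  hasArticleBody: boolean;
}

interface Extractor {
  name: string;
  canExtract(page: PageInfo): boolean;
}

class CachingRegistry {
  private readonly extractors: Extractor[];
  private readonly cache = new Map<string, Extractor>();

  constructor(extractors: Extractor[]) {
    this.extractors = extractors;
  }

  find(domain: string, page: PageInfo): Extractor | undefined {
    // Reuse the cached extractor only if it can handle this page.
    const cached = this.cache.get(domain);
    if (cached !== undefined && cached.canExtract(page)) {
      return cached;
    }
    // Otherwise fall back to searching in registration order.
    const match = this.extractors.find((e) => e.canExtract(page));
    if (match !== undefined) {
      this.cache.set(domain, match);
    }
    return match;
  }
}

// The article extractor is registered first so it takes priority.
const registry = new CachingRegistry([
  { name: "x-article", canExtract: (page) => page.hasArticleBody },
  { name: "twitter", canExtract: (page) => !page.hasArticleBody },
]);
```

Without the `canExtract()` check on the cached entry, the first extractor chosen for a domain would be returned for every later page on that domain, whatever its content type.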
add realistic dom nesting with DraftEditor, embedded tweets, code blocks, headers, and images matching actual x.com articles
- remove /status/ patterns from extractor registry and x-article extractor
- replace real tweet data in test fixtures with lorem ipsum mock data
- delete x.com-status-article test fixture
Would love to see this merged. Been reading a lot of X Articles recently but Obsidian only saves links for now.
Second. This would be huge!
This is highly needed @kepano 🙏
looking forward to seeing this PR merged @kepano
+1 @kepano to have clipper support the X articles. A must have.
This was referenced Feb 14, 2026
Note
I realized after the fact that another PR for this had already been created: #89
Summary
This PR adds support for extracting content from X/Twitter Articles (longform content). Currently, the Obsidian Web Clipper only captures the banner image from articles because the existing TwitterExtractor doesn't handle the article-specific DOM structure.
Key changes:
- XArticleExtractor class that handles [data-testid="twitterArticleRichTextView"] containers
- canExtract() ensures it only activates for article pages, not regular tweets
- ExtractorRegistry fix for the case where same-domain multi-extractors weren't properly differentiated
Extracted content includes:
- Title ([data-testid="twitter-article-title"])
- Body text (.longform-unstyled, .public-DraftStyleDefault-block)
- High-quality images (&name=large)
Related Issues
- … /article/
Testing
- … /article/ URL pattern
Test Plan