Skip to content

Universal content reader — fetch, normalize, and digest content from 7+ platforms (WeChat, Telegram, X, YouTube, Bilibili, Xiaohongshu, RSS)

License

Notifications You must be signed in to change notification settings

outer-snow/x-reader

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

x-reader

Python 3.10+ License: MIT

Universal content reader — fetch, transcribe, and digest content from any platform.

Give it a URL (article, video, podcast, tweet), get back structured content. Works as CLI, Python library, MCP server, or Claude Code skills.

What It Does

Any URL → Platform Detection → Fetch Content → Unified Output
              ↓                      ↓
         auto-detect           text: Jina Reader
         7+ platforms          video: yt-dlp subtitles
                               audio: Whisper transcription
                               API: Bilibili / RSS / Telegram

The Python layer handles text fetching and YouTube subtitle extraction. The Claude Code skills (optional) add full Whisper transcription for video/podcast and AI-powered content analysis.

Three Layers

x-reader is composable. Use the layers you need:

Layer What Format Install
Python CLI/Library Basic content fetching + unified schema See Install Required
Claude Code Skills Video transcription + AI analysis Copy skills/ to ~/.claude/skills/ Optional
MCP Server Expose reading as MCP tools python mcp_server.py Optional

Layer 1: Python CLI

# Fetch any URL
x-reader https://mp.weixin.qq.com/s/abc123

# Fetch a tweet
x-reader https://x.com/elonmusk/status/123456

# Fetch multiple URLs
x-reader https://url1.com https://url2.com

# Login to a platform (one-time, for browser fallback)
x-reader login xhs

# View inbox
x-reader list

Layer 2: Claude Code Skills

Requires cloning the repo (not included in pip install).

For video/podcast transcription and content analysis:

skills/
├── video/       # YouTube/Bilibili/podcast → full transcript via Whisper
└── analyzer/    # Any content → structured analysis report

Install:

cp -r skills/video ~/.claude/skills/video
cp -r skills/analyzer ~/.claude/skills/analyzer

Then in Claude Code, just send a YouTube/Bilibili/podcast link — the video skill auto-triggers and produces a full transcript + summary.

Layer 3: MCP Server

Requires cloning the repo (mcp_server.py is not included in pip install).

git clone https://github.com/runesleo/x-reader.git
cd x-reader
pip install -e ".[mcp]"
python mcp_server.py

Tools exposed:

  • read_url(url) — fetch any URL
  • read_batch(urls) — fetch multiple URLs concurrently
  • list_inbox() — view previously fetched content
  • detect_platform(url) — identify platform from URL

Claude Code config (~/.claude/claude_desktop_config.json):

{
    "mcpServers": {
        "x-reader": {
            "command": "python",
            "args": ["/path/to/x-reader/mcp_server.py"]
        }
    }
}

Supported Platforms

Platform Text Fetch Video/Audio Transcript
YouTube ✅ Jina ✅ yt-dlp subtitles → Groq Whisper fallback
Bilibili (B站) ✅ API ✅ via Claude Code skill
X / Twitter ✅ Jina → Playwright
WeChat (微信公众号) ✅ Jina → Playwright
Xiaohongshu (小红书) ✅ Jina → Playwright*
Telegram ✅ Telethon
RSS ✅ feedparser
小宇宙 (Xiaoyuzhou) ✅ via Claude Code skill
Apple Podcasts ✅ via Claude Code skill
Any web page ✅ Jina fallback

*XHS requires a one-time login: x-reader login xhs (saves session for Playwright fallback)

YouTube Whisper transcription requires GROQ_API_KEY — get a free key from Groq

Install

# From GitHub (recommended)
pip install git+https://github.com/runesleo/x-reader.git

# With Telegram support
pip install "x-reader[telegram] @ git+https://github.com/runesleo/x-reader.git"

# With browser fallback (Playwright — for XHS/WeChat anti-scraping)
pip install "x-reader[browser] @ git+https://github.com/runesleo/x-reader.git"
playwright install chromium

# With all optional dependencies
pip install "x-reader[all] @ git+https://github.com/runesleo/x-reader.git"
playwright install chromium

Or clone and install locally:

git clone https://github.com/runesleo/x-reader.git
cd x-reader
pip install -e ".[all]"
playwright install chromium

Dependencies for video/audio (optional)

# macOS
brew install yt-dlp ffmpeg

# Linux
pip install yt-dlp
apt install ffmpeg

For Whisper transcription, get a free API key from Groq and set:

export GROQ_API_KEY=your_key_here

Use as Library

import asyncio
from x_reader.reader import UniversalReader

async def main():
    reader = UniversalReader()
    content = await reader.read("https://mp.weixin.qq.com/s/abc123")
    print(content.title)
    print(content.content[:200])

asyncio.run(main())

Configuration

Copy .env.example to .env:

cp .env.example .env
Variable Required Description
TG_API_ID Telegram only From https://my.telegram.org
TG_API_HASH Telegram only From https://my.telegram.org
GROQ_API_KEY Whisper only From https://console.groq.com/keys (free)
INBOX_FILE No Path to inbox JSON (default: ./unified_inbox.json)
OUTPUT_DIR No Directory for Markdown output (default: disabled)
OBSIDIAN_VAULT No Path to Obsidian vault (writes to 01-收集箱/x-reader-inbox.md)

Architecture

x-reader/
├── x_reader/              # Python package
│   ├── cli.py             # CLI entry point
│   ├── reader.py          # URL dispatcher (UniversalReader)
│   ├── schema.py          # Unified data model (UnifiedContent + Inbox)
│   ├── login.py           # Browser login manager (saves sessions)
│   ├── fetchers/
│   │   ├── jina.py        # Jina Reader (universal fallback)
│   │   ├── browser.py     # Playwright headless (anti-scraping fallback)
│   │   ├── bilibili.py    # Bilibili API
│   │   ├── youtube.py     # yt-dlp subtitle extraction
│   │   ├── rss.py         # feedparser
│   │   ├── telegram.py    # Telethon
│   │   ├── twitter.py     # Jina-based
│   │   ├── wechat.py      # Jina → Playwright fallback
│   │   └── xhs.py         # Jina → Playwright + session fallback
│   └── utils/
│       └── storage.py     # JSON + Markdown dual output
├── skills/                # Claude Code skills
│   ├── video/             # Video/podcast → transcript + summary
│   └── analyzer/          # Content → structured analysis
├── mcp_server.py          # MCP server entry point
└── pyproject.toml

How the Layers Work Together

User sends URL
    │
    ├─ Text content (article, tweet, WeChat)
    │   └─ Python fetcher → UnifiedContent → inbox
    │
    ├─ Video (YouTube, Bilibili, X video)
    │   ├─ Python fetcher → metadata (title, description)
    │   └─ Video skill → full transcript via subtitles/Whisper
    │
    ├─ Podcast (小宇宙, Apple Podcasts)
    │   └─ Video skill → full transcript via Whisper
    │
    └─ Analysis requested
        └─ Analyzer skill → structured report + action items

Author

Built by @runes_leo — more AI tools at leolabs.me

License

MIT

About

Universal content reader — fetch, normalize, and digest content from 7+ platforms (WeChat, Telegram, X, YouTube, Bilibili, Xiaohongshu, RSS)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%