feat(douyin): add search command for keyword video search #1759
Open
Daily-AC wants to merge 3 commits into
DOM extraction from www.douyin.com/search/<q>?type=video. Requires logged-in profile. plays/comments/shares exposed as 0 (card markup only surfaces likes); see Follow-ups for full-counter path. Schema aligned with tiktok search. Refs Daily-AC/omnireach#12
Motivation
The douyin adapter ships 13 subcommands today (activities, hashtag, publish, stats, ...) but no keyword video search. hashtag search --keyword returns topic/challenge rows, not videos, so anything that wants "find me douyin videos for a query" has nowhere to go. This came up while wiring omnireach: its multi-source search needed a tiktok-mirror command for the China-region equivalent (see issue Daily-AC/omnireach#12).

This PR adds opencli douyin search <query> against www.douyin.com.

Output shape (tiktok-aligned)

{"rank": 1, "desc": "...", "author": "...", "url": "...", "plays": 0, "likes": 0, "comments": 0, "shares": 0}

Same eight columns as tiktok search, same order. Downstream tools that already normalize tiktok search rows can ingest douyin rows without per-adapter mapping.

Prerequisites

A logged-in Douyin profile is required; without one the command fails with AuthRequiredError. Log in once via Chrome and the cookies persist.

Implementation strategy: DOM extraction (post-rewrite)
The initial draft used Strategy.INTERCEPT to capture the SPA's signed /aweme/v1/web/general/search/single/ XHR. Live testing on a logged-in profile revealed this never works:

- wait xhr "general/search/single" times out at 20s even though [data-e2e="scroll-list"] li has 16 rendered result cards in the DOM.
- A bare request to the endpoint (device_platform=webapp, aid=6383, pc_client_type=1, version_code, etc.) returns status_code: 0, data: [], search_nil_info.search_nil_type: "verify_check". The endpoint validates an a_bogus signature that lives in the SPA bundle, and bare URLs miss it.
- The result list <ul data-e2e="scroll-list"> is rendered server-side during initial navigation. The data is in the HTML at navigation time; no client-side fetch is required.

Final approach (clis/douyin/search.js):

1. Navigate to https://www.douyin.com/search/<urlencoded>?type=video.
2. page.evaluate a MutationObserver-backed waiter that resolves to one of {state: 'rendered', cards} / {state: 'login_wall'} / {state: 'timeout'} within a 15s budget.
3. For each <li> inside [data-e2e="scroll-list"], harvest the row's <a href*="/video/"> and the ordered list of leaf-element textContents. The classnames are obfuscated and churn between Douyin builds, so we pin only data-e2e selectors and identify fields by shape:
   - ^\d{1,2}:\d{2}(:\d{2})?$ → duration (skipped)
   - ^\d+(\.\d+)?[万亿]?$ → likes (with parseDouyinCount handling 万/亿)
   - @ followed by a name → author
   - remaining text (longest leaf as fallback) → desc
   - plays/comments/shares stay 0 per the caveat above.

This survives classname obfuscation, doesn't need request signing, and matches the pattern xiaohongshu/rednote use for the same problem class.
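The shape-based field identification described above can be sketched roughly as follows. This is an illustrative guess, not the code in clis/douyin/search.js: the names parseDouyinCount and projectCard come from the test list in this PR, but their bodies, the card input shape ({href, leafTexts}), and the exact author and desc rules shown here are assumptions.

```javascript
// Illustrative sketch only: function bodies and the {href, leafTexts} input
// shape are assumptions, not the PR's actual implementation.
const DURATION_RE = /^\d{1,2}:\d{2}(:\d{2})?$/; // e.g. "03:15" or "1:02:45"
const COUNT_RE = /^\d+(\.\d+)?[万亿]?$/;        // e.g. "1671", "1.7万", "2亿"
const UNIT = { "万": 1e4, "亿": 1e8 };

// Convert a Douyin counter string to an integer; anything unparseable becomes 0.
function parseDouyinCount(text) {
  if (typeof text !== "string") return 0;
  const t = text.trim();
  if (!COUNT_RE.test(t)) return 0;
  const unit = UNIT[t[t.length - 1]]; // undefined when there is no 万/亿 suffix
  const num = parseFloat(t);          // parseFloat stops at the suffix character
  return Math.round(unit ? num * unit : num);
}

// Map one harvested card to the eight-column row, identifying fields by shape.
function projectCard(card, rank) {
  const { href = "", leafTexts } = card || {};
  const texts = Array.isArray(leafTexts) ? leafTexts : [];
  let likes = 0;
  let likesSeen = false;
  let author = "";
  const rest = [];
  for (let i = 0; i < texts.length; i++) {
    const t = String(texts[i] ?? "").trim();
    if (!t || DURATION_RE.test(t)) continue; // skip blanks and durations
    if (COUNT_RE.test(t)) {
      // Assumption: the first count-shaped leaf is the like counter.
      if (!likesSeen) {
        likes = parseDouyinCount(t);
        likesSeen = true;
      }
      continue;
    }
    if (t === "@") {
      // A bare "@" leaf: assume the next leaf holds the author name.
      author = String(texts[i + 1] ?? "").trim();
      i++;
      continue;
    }
    if (t.startsWith("@") && !author) {
      // Fused "@name" leaf: strip the prefix.
      author = t.slice(1).trim();
      continue;
    }
    rest.push(t);
  }
  // Assumption: the longest remaining leaf is the description.
  const desc = rest.reduce((a, b) => (b.length > a.length ? b : a), "");
  return { rank, desc, author, url: href, plays: 0, likes, comments: 0, shares: 0 };
}
```

Keeping the classification in a pure function over leaf texts means it can be unit-tested without a browser, which is presumably why the test list exercises projectCard directly.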
Live test evidence: VERIFIED on logged-in profile
5 queries × --limit 5, all return 5 well-formed rows. Top-row samples (full JSON arrays validated locally; trimming to 1-2 rows per query for review):

Query 1: opencli douyin search "AI 编程" --limit 5 -f json

[
  {"rank": 1, "desc": "有了AI编程程序员还有出路吗? #科技科普 #计算机 #AI #编程 #程序员", "author": "英雄编程", "url": "https://www.douyin.com/video/7573894888656293163", "plays": 0, "likes": 17000, "comments": 0, "shares": 0},
  {"rank": 4, "desc": "致全地球人,AI编程速成指南,代码改变命运! #AI在抖音 #AI编程 #编程入门#编程学习 #计算机专业", "author": "数字游牧人Samuel", "url": "https://www.douyin.com/video/7462667529274592547", "plays": 0, "likes": 35000, "comments": 0, "shares": 0}
]

Query 2: opencli douyin search "claude code" --limit 5 -f json

[
  {"rank": 1, "desc": "全网最全!60分钟全面掌握Claude Code~ 【附完整文档】\n#AI #秋芝2046 #ClaudeCode #AI教程 #前沿科技趋势发布月", "author": "秋芝2046", "url": "https://www.douyin.com/video/7636497165430394162", "plays": 0, "likes": 40000, "comments": 0, "shares": 0},
  {"rank": 3, "desc": "Claude Code 零基础终极教程! ... #ai新星计划 #claudecode #ai教程 #claude #智能体", "author": "木子不写代码", "url": "https://www.douyin.com/video/7636872072064470298", "plays": 0, "likes": 408000, "comments": 0, "shares": 0}
]

Query 3: opencli douyin search "周杰伦演唱会现场" --limit 5 -f json

[
  {"rank": 1, "desc": "#周杰伦甜甜的南宁 嘉年华2 南宁站 day2 I Do 4k超清#演唱会现场 #神级现场 #神级live现场 #live现场", "author": "SecretXSPEC", "url": "https://www.douyin.com/video/7640855326389263801", "plays": 0, "likes": 1671, "comments": 0, "shares": 0},
  {"rank": 4, "desc": "周杰伦2010年超时代世界巡回演唱会", "author": "清心影视", "url": "https://www.douyin.com/video/7032189958270045470", "plays": 0, "likes": 11000, "comments": 0, "shares": 0}
]

Query 4: opencli douyin search "#美食探店" --limit 5 -f json (hashtag form; works the same as plaintext)

[
  {"rank": 1, "desc": "特厨探店|不惊艳,但是吃着超级舒服的家常菜—老太婆家常菜 #美食探店 #隋坡 #美食 #黄山", "author": "特厨隋坡(重新出发版)", "url": "https://www.douyin.com/video/7643888049474063631", "plays": 0, "likes": 96000, "comments": 0, "shares": 0},
  {"rank": 2, "desc": "一锅三吃太夯了!被万象城这家贵州烙锅惊艳到了#美食探店#长沙美食#贵州烙锅湖南首店#长沙正宗贵州烙锅#黔寨寨贵州烙锅", "author": "Seven不七亏", "url": "https://www.douyin.com/video/7643712014329827950", "plays": 0, "likes": 6091, "comments": 0, "shares": 0}
]

Query 5: opencli douyin search "rust" --limit 5 -f json

[
  {"rank": 1, "desc": "【Rust腐蚀】有史以来最棒的开荒之旅! #steam游戏 #多人联机 #生存游戏", "author": "Bone骨头碎片", "url": "https://www.douyin.com/video/7472103372993105161", "plays": 0, "likes": 57000, "comments": 0, "shares": 0},
  {"rank": 3, "desc": "Rust大讲堂 #1 开局的思路详解,新手必看,老手一起交流 #Rust #腐蚀 #代号#前哨 #新手教程", "author": "Jullseye", "url": "https://www.douyin.com/video/7481895809458474240", "plays": 0, "likes": 4552, "comments": 0, "shares": 0}
]

Coverage observed: English ("rust", "claude code"), CJK ("AI 编程", "周杰伦演唱会现场"), hashtag form ("#美食探店"). No verify_check rejections, no holdouts, no anti-bot drift in this run. Like counts span 8 to 408k, confirming the 万/亿 parser handles both small-creator and viral rows.

Tests
clis/douyin/search.test.js covers:

- arg validation
- parseSearchLimit bounds
- parseDouyinCount over 万/亿/plain/empty/non-string
- normalizeDouyinVideoUrl over scheme-relative/absolute/empty
- projectCard over classname-agnostic leaf-text shapes (duration skip, like-count parsing, author-after-@ extraction, longest-text fallback for desc, fused @author prefix stripping, safe defaults for missing leafTexts)
- full func() flow including login-wall mapping, timeout-to-AuthRequired mapping, {session, data} envelope unwrap, malformed-payload CommandExecutionError, --limit cap, and Chinese URL encoding

- npm test (442 files / 4664 tests / 1 skipped): all pass.
- npm run typecheck: clean.
- opencli validate: PASS (14 douyin commands, 0 errors, 0 warnings).

Files

- clis/douyin/search.js (new): DOM-extraction adapter
- clis/douyin/search.test.js (new): 30 unit tests
- cli-manifest.json: regenerated entry, column list [rank, desc, author, url, plays, likes, comments, shares]
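For reviewers who want a feel for the URL handling that the tests exercise (Chinese URL encoding and normalizeDouyinVideoUrl over scheme-relative/absolute/empty inputs), here is a minimal sketch. The name normalizeDouyinVideoUrl is taken from the test list, but its body here is an assumption, and buildSearchUrl is a hypothetical helper name introduced only for illustration.

```javascript
// Illustrative sketch only: not the PR's actual code.

// Hypothetical helper: build the search page URL for a query.
// encodeURIComponent percent-encodes CJK text, spaces, and "#", so a hashtag
// query is not mistaken for a URL fragment.
function buildSearchUrl(query) {
  return "https://www.douyin.com/search/" + encodeURIComponent(query) + "?type=video";
}

// Assumed behavior: scheme-relative hrefs gain "https:", absolute hrefs pass
// through unchanged, and empty or non-string input yields "".
function normalizeDouyinVideoUrl(href) {
  if (typeof href !== "string") return "";
  const h = href.trim();
  if (h === "") return "";
  if (h.startsWith("//")) return "https:" + h;
  return h;
}
```

Encoding the whole query with encodeURIComponent rather than encodeURI matters for the hashtag form, since encodeURI leaves "#" untouched.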