add XiGua Videos.js, XiGua Video (西瓜视频) web version #259
I've reviewed this.
function doDebug() {
    scrapeProfile('https://www.ixigua.com/home/6519449097/', '');
}
Aren't there testCases already? Why add this function?
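This is for debugging: the output still isn't ideal, and I also need to confirm whether the registration function actually works (it used to be written separately here). I considered removing it but kept it, otherwise others would have to hunt for a suitable page themselves.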
If it's only for testing, then comment it out.
function detectWeb(doc, url) {
    //doDebug();
    if (doc.querySelectorAll('.videoTitle').length > 0) {
This only checks whether an element exists, so `querySelector` is enough: it returns `null` when nothing matches, which is naturally a falsy value.
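Makes sense. It looks like it returns null when nothing is found.
While writing this I recall a case where querySelectorAll found nothing (result []) but the check still passed, and I never figured out why.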
When querySelectorAll finds nothing it returns an empty NodeList, and an empty NodeList is not a falsy value; see MDN for details.
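The difference is easy to check in isolation. The sketch below is a minimal illustration, not code from the translator: `hasMatch` is a hypothetical helper, and `fakeDoc` is a stand-in object that mimics only the two DOM methods discussed, with a plain empty array playing the role of an empty NodeList (both are objects, hence truthy).

```javascript
// Hypothetical helper: true when at least one element matches the selector.
// querySelector returns null when nothing matches, so a null check suffices.
function hasMatch(doc, selector) {
    return doc.querySelector(selector) !== null;
}

// Stand-in for a document with no matching elements (assumption for the demo;
// in a translator, `doc` would be the real page document).
const fakeDoc = {
    querySelector: () => null,
    querySelectorAll: () => [] // plays the role of an empty NodeList
};

const single = fakeDoc.querySelector('.videoTitle');
const all = fakeDoc.querySelectorAll('.videoTitle');

console.log(Boolean(single));   // false: null is falsy
console.log(Boolean(all));      // true: an empty list object is still truthy
console.log(all.length > 0);    // false: the length is what has to be tested
console.log(hasMatch(fakeDoc, '.videoTitle')); // false
```

This would explain a check such as `if (doc.querySelectorAll(...))` passing even when nothing matched.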
if (doc.querySelectorAll('.videoTitle').length > 0) {
    return 'videoRecording';
}
else if (doc.querySelectorAll('.teleplayPage__Description__header>h1').length > 0) { // FIXME: incomplete support
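I'd actually recommend putting spaces around combinators that cross levels inside a selector, e.g. `a > div`, and leaving them out for same-level combinators, e.g. `div.content+p`; it reads a bit better. That whitespace won't be confused with the descendant combinator in something like `a p`, so it's safe to use.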
Right. The spacing was just whatever I happened to type.
    return 'videoRecording';
}
else if (doc.querySelectorAll('.teleplayPage__Description__header>h1').length > 0) { // FIXME: incomplete support
    return 'film'; // or tvBroadcast
Use film only for actual films.
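This branch is for videos that aren't self-published. I didn't specifically distinguish the types; it was written in a rush.
The site does have films too: https://www.ixigua.com/6532729716636385806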
/**
 * @param {Date} date
 */
function beautifyDate(date) {
There are ready-made `ZU.strToISO()` and `ZU.strToDate()`; I'm not sure whether they would also cover what you need.
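strToDate doesn't feel like a good fit. I'd rather not have the UTC date from dateToISO; after saving, the 8-hour offset looks odd. The page source has an ISO datetime in meta name="op:video:release_date".
That said, I didn't consider cross-timezone compatibility in this code.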
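One way to avoid the eight-hour shift is to build the date string from the local-time getters instead of converting to UTC. The following is only a sketch: `toLocalISODate` is a hypothetical name, not a Zotero utility and not the PR's `beautifyDate`, and it inherits the cross-timezone caveat because it reports the date in whatever timezone the code runs in.

```javascript
// Hypothetical helper: format a Date as YYYY-MM-DD using local-time getters,
// so the calendar day matches what the user sees on their own clock.
function toLocalISODate(date) {
    const pad = n => String(n).padStart(2, '0');
    return `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
}

// 01:30 local time on 5 January 2021 (the Date constructor takes local components).
const earlyMorning = new Date(2021, 0, 5, 1, 30);

console.log(toLocalISODate(earlyMorning)); // "2021-01-05"
// By contrast, earlyMorning.toISOString() is expressed in UTC, so east of
// Greenwich it can fall on the previous calendar day.
```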
let pname = text(p, '.co-creator-list__item-name');
let proleLabel = text(p, '.co-creator-list__item-role');
let prole;
switch (proleLabel) {
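For cases like this, where
- there is an explicit mapping, and
- the mapping is simple,
I'd suggest expressing the relationship as the keys and values of an object, which is more concise. As for the default, you can assign a default value first and then patch it with a statement like `xxx = yyy ? zzz : www`.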
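A minimal sketch of that suggestion follows. The role labels, the creator types chosen for them, and the names `ROLE_MAP`, `DEFAULT_ROLE` and `mapRole` are all assumptions for illustration; the labels actually handled by the `switch` in the PR are not shown in this thread.

```javascript
// Hypothetical label-to-creatorType mapping; the real labels on the page and
// the types chosen in the translator may differ.
const ROLE_MAP = {
    '导演': 'director',
    '编剧': 'scriptwriter',
    '出演': 'castMember'
};
const DEFAULT_ROLE = 'contributor';

// Look the label up in the map and fall back to the default when it is absent.
function mapRole(proleLabel) {
    return Object.prototype.hasOwnProperty.call(ROLE_MAP, proleLabel)
        ? ROLE_MAP[proleLabel]
        : DEFAULT_ROLE;
}

console.log(mapRole('导演')); // "director"
console.log(mapRole('剪辑')); // "contributor" (not in the map, so the default applies)
```

Adding a newly discovered label then means adding one entry to the object, although label-specific extra handling is easier to express in a `switch`.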
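A map would certainly work, but the current intent is also concise and clear, isn't it? It also makes extra handling easy to add. I only wrote the few cases I found.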
let userinfo = '';
const originDomain = new URL(doc.location.href).origin;
if (text(doc, '.co-creator-list')) { // multiple co-creators
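You could use a comma-separated selector and treat both the multi-creator and single-creator cases as an array to iterate over; then there's no need to write two highly repetitive branches.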
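As a rough sketch of that idea: the two selectors below are taken from other snippets in this PR, but whether they can be merged like this on the live page is untested, and `NAME_SELECTOR`, `collectCreatorNames` and the stub page object are hypothetical.

```javascript
// Assumed selectors: the first comes from the co-creator branch, the second
// from the single-author code. Combining them is an untested assumption.
const NAME_SELECTOR = '.co-creator-list__item-name, .author__userName .user__name';

// Collect every matching name, whichever layout the page uses.
function collectCreatorNames(doc) {
    return Array.from(doc.querySelectorAll(NAME_SELECTOR))
        .map(el => el.textContent.trim())
        .filter(Boolean);
}

// Stub standing in for a single-author page (for the demo only).
const stubSingleAuthorPage = {
    querySelectorAll: () => [{ textContent: '  Some Author ' }]
};

console.log(collectCreatorNames(stubSingleAuthorPage)); // ["Some Author"]
```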
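Because the information shown, and even the structure, may differ, I handled them separately. If they were merged, adapting to a future page change could become harder.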
}

async function scrape(doc, url = doc.location.href) {
    let itemType = "videoRecording";
If `itemType` is hard-coded here, then `detectWeb()` has done its work for nothing.
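This was rushed, and support for non-self-published videos isn't complete. I figured I'd reserve the spot, since this may well be a passing enthusiasm, duplicated work, and for my own use only.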
newItem.studio = text(doc, '.author__userName .user__name');
newItem.runningTime = formatRunningTime(attr(doc, 'meta[name="op:video:duration"]', 'content'));
let viewStat = text(doc, '.videoDesc__videoStatics');
if (viewStat.includes('条弹幕 ·')) {
A regex match (or replace) could be used here to avoid the repetition.
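One possible shape for this, as a sketch under stated assumptions: the exact format of the statistics string is not shown in this thread, so the sample strings below are invented, and `stripDanmakuCount` is a hypothetical helper. The idea is to remove the optional danmaku segment first so that the remaining parsing only has to handle one format.

```javascript
// Hypothetical helper: drop an optional "<n>条弹幕 · " segment from the stats
// string. Strings without that segment are returned unchanged.
function stripDanmakuCount(viewStat) {
    return viewStat.replace(/[\d.]+万?条弹幕\s*·\s*/, '');
}

// Invented sample strings; the real page text may be formatted differently.
const withDanmaku = '100次观看 · 35条弹幕 · 2021-03-04';
const withoutDanmaku = '100次观看 · 2021-03-04';

console.log(stripDanmakuCount(withDanmaku));    // "100次观看 · 2021-03-04"
console.log(stripDanmakuCount(withoutDanmaku)); // "100次观看 · 2021-03-04"
```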
newItem.url = url;

let title = text(doc, '.videoTitle>h1').trim() || text(doc, '.teleplayPage__Description__header>h1').trim();
if (!title) return false;
Don't return a value from `scrape` to interrupt execution; a missing title should raise an error.
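A minimal sketch of the requested behavior, with a hypothetical helper name (`requireTitle`) and an invented error message; how the PR finally implemented this is not shown here.

```javascript
// Hypothetical helper: return the trimmed title, or throw when it is missing,
// instead of silently returning false from scrape().
function requireTitle(title) {
    const trimmed = (title || '').trim();
    if (!trimmed) {
        throw new Error('XiGua Videos: no title found on page');
    }
    return trimmed;
}

console.log(requireTitle('  Some video title ')); // "Some video title"

try {
    requireTitle('');
}
catch (e) {
    console.log(e.message); // "XiGua Videos: no title found on page"
}
```

Throwing makes the failure visible to whoever calls `scrape`, whereas a bare `return false` ends the function without reporting that anything went wrong.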
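@jiaojiaodubai, I've run into some problems writing a script for a new plugin and would like to ask for your advice. Could you leave an email address?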
b349520 to f406995
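Also: if XiGua Video has something like Bilibili's columns or image posts, those may need to be supported as well.
XiGua Video's future is uncertain; creator accounts that were steered into migrating to Douyin have stopped updating on the XiGua Video website.
getSearchResults was adapted from Baidu Scholar.js.
The ttwid registration follows https://zhuanlan.zhihu.com/p/342436610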