
add XiGua Videos.js 西瓜视频网页版 (Xigua Video web version) #259

Open
wants to merge 1 commit into base: master

Conversation

@yfdyh000 (Contributor) commented Dec 15, 2023

getSearchResults is modeled on Baidu Scholar.js.
The ttwid registration follows https://zhuanlan.zhihu.com/p/342436610

@jiaojiaodubai (Collaborator) left a comment

I've reviewed this.


function doDebug() {
scrapeProfile('https://www.ixigua.com/home/6519449097/', '');
}
Collaborator:

There are already testCases, so why add this function?

Contributor Author:

It's for debugging: the output still isn't ideal, and I also need to confirm that the registration function actually works (it used to live here on its own). I considered removing it but kept it, since otherwise others would have to hunt for a suitable page.

Collaborator:

If it's only for testing, then comment it out.


function detectWeb(doc, url) {
//doDebug();
if (doc.querySelectorAll('.videoTitle').length > 0) {
Collaborator:

This only checks whether the element exists, so querySelector is enough. It returns undefined when nothing is found, which is naturally a falsy value.

Contributor Author:

Makes sense. It looks like it returns null when nothing is found, though.
While writing this, I recall a case where querySelectorAll found nothing (result []) yet the check still passed, and I never figured out why.

Collaborator:

When querySelectorAll finds nothing it returns an empty NodeList, and an empty NodeList is not a falsy value; see MDN for details.
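The difference discussed in this thread can be sketched without a browser. The `emptyDoc` object and the three helper functions below are hypothetical stand-ins, not code from the PR; the stub only mimics the DOM behavior that matters here (querySelector yields null on no match, querySelectorAll yields an empty list), and a plain array stands in for the NodeList, since both are objects and therefore truthy.

```javascript
// Hypothetical stand-in for a document in which nothing matches the selector.
// It mimics only the DOM contract discussed above: querySelector returns null,
// querySelectorAll returns an empty list (a plain array stands in for NodeList).
const emptyDoc = {
    querySelector: () => null,
    querySelectorAll: () => []
};

// Existence check via querySelector: null is falsy, so it converts to false.
function hasMatch(doc, selector) {
    return Boolean(doc.querySelector(selector));
}

// Buggy variant: the list object itself is always truthy, even when empty.
function hasMatchBuggy(doc, selector) {
    return Boolean(doc.querySelectorAll(selector));
}

// Correct variant if querySelectorAll is kept: compare the length.
function hasMatchByLength(doc, selector) {
    return doc.querySelectorAll(selector).length > 0;
}
```

This would explain a check that "passes" although nothing was found: testing the list itself rather than its length always succeeds.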

if (doc.querySelectorAll('.videoTitle').length > 0) {
return 'videoRecording';
}
else if (doc.querySelectorAll('.teleplayPage__Description__header>h1').length > 0) { // FIXME: incomplete support
Collaborator:

I'd actually suggest putting spaces around combinators that connect different levels in a selector, e.g. a > div, and omitting them for same-level relationships, e.g. div.content+p; it reads a bit better. Those spaces won't be confused with the descendant combinator (the space in a p), so feel free to use them.

Contributor Author:

Right. The spacing was just whatever I happened to type.

return 'videoRecording';
}
else if (doc.querySelectorAll('.teleplayPage__Description__header>h1').length > 0) { // FIXME: incomplete support
return 'film'; // or tvBroadcast
Collaborator:

Use film only for actual films.

Contributor Author:

This branch is for videos that are not self-published; I didn't distinguish them specifically, and this part was written in a rush.
The site does have films too: https://www.ixigua.com/6532729716636385806

/**
* @param {Date} date
*/
function beautifyDate(date) {
Collaborator:

There are the ready-made ZU.strToISO() and ZU.strToDate(); not sure whether they would also meet your needs.

Contributor Author:

strToDate doesn't feel like a fit. I'd rather not have the UTC date that dateToISO produces; being 8 hours off after saving looks odd. The page source has an ISO date-time in meta name="op:video:release_date".
That said, I didn't consider cross-timezone compatibility in this code.
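One way to avoid both the 8-hour shift and a dependency on the machine's time zone is to format the timestamp in an explicitly chosen offset. The sketch below is an assumption for illustration only: `formatWithOffset` is a hypothetical helper, not the PR's `beautifyDate`, and the output layout `YYYY-MM-DD HH:mm:ss` is chosen arbitrarily.

```javascript
// Hypothetical helper (not the PR's beautifyDate): format a Date as
// "YYYY-MM-DD HH:mm:ss" in a fixed UTC offset given in minutes.
function formatWithOffset(date, offsetMinutes) {
    // Shift the timestamp, then read the shifted value through the UTC getters,
    // so the result does not depend on the machine's own time zone.
    const shifted = new Date(date.getTime() + offsetMinutes * 60 * 1000);
    const pad = n => String(n).padStart(2, '0');
    const day = [
        shifted.getUTCFullYear(),
        pad(shifted.getUTCMonth() + 1),
        pad(shifted.getUTCDate())
    ].join('-');
    const time = [
        pad(shifted.getUTCHours()),
        pad(shifted.getUTCMinutes()),
        pad(shifted.getUTCSeconds())
    ].join(':');
    return `${day} ${time}`;
}
```

For example, passing 480 (UTC+8) renders the time as a viewer in China would see it, regardless of where the translator runs.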

let pname = text(p, '.co-creator-list__item-name');
let proleLabel = text(p, '.co-creator-list__item-role');
let prole;
switch (proleLabel) {
Collaborator:

For cases like this, where

  1. there is a clear mapping, and
  2. the mapping is simple,

I'd suggest expressing the relationship through an object's keys and values, which is more concise. As for the default, you can assign a default value first and then patch it with a statement like xxx = yyy ? zzz : www.

Contributor Author:

A map would certainly work, but the current code's intent is also clear and concise, and it makes adding extra handling easy. I only covered the few cases I found.
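The object-lookup approach suggested in this thread might look roughly like the sketch below. The labels and creator types in `ROLE_MAP`, the `mapRole` name, and the `'contributor'` default are all hypothetical; the actual labels handled by the PR's switch are not shown in this conversation.

```javascript
// Hypothetical labels and creator types, for illustration only; the real
// page labels handled by the PR's switch are not shown in this thread.
const ROLE_MAP = {
    '导演': 'director',
    '编剧': 'scriptwriter',
    '主演': 'castMember'
};

function mapRole(label) {
    // Own-property check so that keys inherited from Object.prototype
    // (e.g. "toString") never count as a match; otherwise use the default.
    return Object.prototype.hasOwnProperty.call(ROLE_MAP, label)
        ? ROLE_MAP[label]
        : 'contributor';
}
```

The trade-off raised in the reply still holds: a switch leaves room for per-case extra handling, while the map keeps a purely one-to-one relationship compact.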


let userinfo = '';
const originDomain = new URL(doc.location.href).origin;
if (text(doc, '.co-creator-list')) { // multiple co-creators
Collaborator:

You could use a comma-separated selector and treat both the multi-creator and single-creator cases as arrays, iterating over them either way; then you wouldn't need two highly repetitive branches.

Contributor Author:

The displayed information, and even the structure, may differ, so I handled them separately. If they were merged, adapting to future page changes could get harder.

}

async function scrape(doc, url = doc.location.href) {
let itemType = "videoRecording";
Collaborator:

If itemType is hard-coded here, then detectWeb() did its work for nothing.

Contributor Author:

This was rushed; content that is not self-published isn't fully supported yet... I meant it as a placeholder, since this may turn out to be a passing enthusiasm, duplicated effort, and only for my own use.

newItem.studio = text(doc, '.author__userName .user__name');
newItem.runningTime = formatRunningTime(attr(doc, 'meta[name="op:video:duration"]', 'content'));
let viewStat = text(doc, '.videoDesc__videoStatics');
if (viewStat.includes('条弹幕 ·')) {
Collaborator:

You could use a regex match (or replace) to avoid the repetition.
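A regex replace along these lines could handle both the with-danmaku and without-danmaku cases in one statement. This is only a sketch: the exact layout of the `.videoDesc__videoStatics` text is not shown in this thread, so the sample strings in the comments and the `stripDanmakuCount` name are assumptions.

```javascript
// Hypothetical helper: remove the "<count>条弹幕 · " segment if present.
// Assumed (not confirmed) sample input: "1.2万次观看 · 35条弹幕 · 2023-12-15".
function stripDanmakuCount(viewStat) {
    // \S* takes the count directly before "条弹幕"; the trailing "·" and any
    // whitespace after it are removed as well. Without a match, the string
    // is returned unchanged, so no separate includes() branch is needed.
    return viewStat.replace(/\S*条弹幕 ·\s*/, '');
}
```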

newItem.url = url;

let title = text(doc, '.videoTitle>h1').trim() || text(doc, '.teleplayPage__Description__header>h1').trim();
if (!title) return false;
Collaborator:

Don't return a value from scrape to abort execution; if there is no title, it should throw an error.
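The suggestion to throw rather than return could be sketched as follows. `requireTitle` is a hypothetical helper, not part of the PR, and the error message wording is an assumption.

```javascript
// Hypothetical helper: validate the scraped title before building the item.
function requireTitle(title, url) {
    const trimmed = (title || '').trim();
    if (!trimmed) {
        // Throwing surfaces the failure to the caller instead of silently
        // ending scrape() with a return value nobody inspects.
        throw new Error(`No title found for ${url}`);
    }
    return trimmed;
}
```

Inside scrape, the existing `if (!title) return false;` line would then be replaced by a call such as `let title = requireTitle(rawTitle, url);`, where `rawTitle` stands for the text already extracted from the page.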

@YaoLiMuMu commented Jun 5, 2024

@jiaojiaodubai, I've run into some problems writing a script for a new plugin and would like to ask you about them. Could you share an email address?

3 participants