docs: ex crawler approach
orzyyyy committed Jun 13, 2019
1 parent c635a56 commit f47d424
Showing 2 changed files with 17 additions and 1 deletion.
12 changes: 11 additions & 1 deletion src/assets/mapping.json
@@ -71,12 +71,22 @@
},
{
"id": "57ca721de1c14fb189d8bc5f6e14448c",
"title": "ex 改 cookie 步骤备忘",
"title": "改 cookie 步骤",
"url": "./assets/mapping/57ca721de1c14fb189d8bc5f6e14448c.json",
"createTime": 1560220086082,
"modifyTime": 1560220318066,
"type": "生活",
"subType": "ExHentai",
"category": "markdown"
},
{
"id": "56f14e6c24b6ece2e63bdf56accdef16",
"title": "爬虫思路",
"url": "./assets/mapping/56f14e6c24b6ece2e63bdf56accdef16.json",
"createTime": 1560392150068,
"modifyTime": 1560392654377,
"type": "生活",
"subType": "ExHentai",
"category": "markdown"
}
]
6 changes: 6 additions & 0 deletions src/assets/markdown/56f14e6c24b6ece2e63bdf56accdef16.md
@@ -0,0 +1,6 @@
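1. Use `puppeteer` to get around the site's complex asynchronous requests
2. ex itself only requires a valid cookie for access; no other information is needed. The cookie does not appear to contain anything sensitive, so there is no need to log in to the e site beforehand to obtain one; a mocked cookie is enough
3. So the cookie is written directly in the config file
4. The minimum interval between opening pages is about 3 seconds, but when there are many pages it has to be raised to 5 seconds
5. The service crawls 40 pages per run, i.e. 1000 comics (25 per page). Because the volume is large, images should only be loaded when they are viewed, so lazy loading is needed here
6. When downloading from a detail page, all image URLs are collected first and then downloaded in one pass; if a single image fails, the whole batch is redone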
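
Notes 4 and 6 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `pickDelayMs`, `downloadAllOrRetry`, the `Downloader` type, the 20-page threshold for "many pages", and the retry cap of 3 are all hypothetical names and values chosen for the example. The download function is injected so the sketch does not depend on `puppeteer` or any HTTP client.

```typescript
// Hypothetical sketch: none of these names or thresholds come from the project.

/**
 * Note 4: roughly 3 s between page opens, raised to 5 s for large batches.
 * The threshold that counts as "large" is an assumption.
 */
function pickDelayMs(pageCount: number, largeBatchThreshold = 20): number {
  return pageCount > largeBatchThreshold ? 5000 : 3000;
}

/** Resolve after the given number of milliseconds. */
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Injected download function (assumed shape): URL in, image bytes out. */
type Downloader = (url: string) => Promise<Uint8Array>;

/**
 * Note 6: all URLs are known up front and fetched in one pass.
 * If any single image fails, the partial results are discarded and the
 * whole batch starts over, up to maxAttempts (an assumed cap).
 */
async function downloadAllOrRetry(
  urls: string[],
  download: Downloader,
  maxAttempts = 3,
  delayMs = 0,
): Promise<Uint8Array[]> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const images: Uint8Array[] = [];
      for (const url of urls) {
        images.push(await download(url));
        if (delayMs > 0) await sleep(delayMs); // throttle between requests
      }
      return images; // every image succeeded
    } catch (err) {
      lastError = err; // one failure invalidates the whole batch
    }
  }
  throw new Error(
    `batch failed after ${maxAttempts} attempts: ${String(lastError)}`,
  );
}
```

A caller would pass something like `pickDelayMs(40)` as `delayMs` for a full 40-page run. Restarting the whole batch keeps the logic simple at the cost of re-downloading images that had already succeeded; retrying only the failed image would be a different design from the one described in note 6.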
