Another classification method #44

Closed
hsyysy opened this issue Sep 10, 2018 · 0 comments

hsyysy commented Sep 10, 2018

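I have an idea, offered for reference only.
The current classification mechanism lists, for each book title, the different sites that provide it.
In practice, though, every book has one or a few official publishing sites, and the table of contents and the first two paragraphs of each chapter can be crawled from them. So:
1. Through the collective effort of users, build a Wiki-style directory that maps each book title to the URL of its official publication page.
2. Crawl the official table of contents to build the book's chapter list, then extract each complete chapter by comparing the chapter's first two paragraphs against the content on other sites.
That way there would be no need to display multiple sources.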
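
To make step 2 more concrete, here is a minimal sketch of how the comparison might work. It is only an illustration under assumptions: the names `normalize`, `similarity`, and `match_chapters`, the input shapes, and the `0.8` threshold are all hypothetical and not part of this project. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever text comparison would actually be chosen.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Drop all whitespace so layout differences between sites do not matter."""
    return "".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def match_chapters(official, mirror, threshold=0.8):
    """Map each official chapter title to the best-matching mirror chapter.

    official: list of (title, preview) pairs, where preview is the first
              two paragraphs taken from the official site (hypothetical shape).
    mirror:   list of full chapter texts taken from another site.
    Titles with no candidate at or above `threshold` map to None.
    """
    result = {}
    for title, preview in official:
        preview_len = len(normalize(preview))
        best_text, best_score = None, 0.0
        for text in mirror:
            # Compare the preview only against the opening of the mirror
            # chapter, cut to the same length as the preview.
            head = normalize(text)[:preview_len]
            score = similarity(preview, head)
            if score > best_score:
                best_text, best_score = text, score
        result[title] = best_text if best_score >= threshold else None
    return result
```

Calling `match_chapters(official, mirror)` would return a dictionary from official chapter titles to full chapter texts, with `None` for chapters that no other site appears to carry. The threshold is the knob that trades missed chapters against wrong matches, so it would need tuning on real data.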

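Possible problems:
1. Crawling the official table of contents and the first two paragraphs may raise copyright infringement concerns.
2. The text comparison may have low accuracy.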

hsyysy closed this as completed Sep 10, 2018