GitHub - pplmx/Spider: pick data with multiple threads and save it to Excel

Pick data with multiple threads and save it to Excel

caoyu.info

.idea
src
.gitignore
86c3b07f-cb04-47e1-853d-0663e87df8ce.xls
README.md
Spider.iml
db67c068-174b-4a09-956c-d4ceee69b3a4.xls
ecf40b82-845d-4f66-837b-0b556aaca816.xls
pom.xml
renovate.json


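Route requests through an IP proxy (not actually used: no stable, working proxy could be found on http://www.xicidaili.com/)
Fetch the requested page as a String with HttpClient
Parse it with jsoup
(Parsing requires viewing the page source yourself and analyzing the DOM structure)
(Use jsoup's CSS-selector-like functions to get an element, a set of elements, or text and attribute values)
Set each book's values on a Book entity and add it to a List
Read the total page count from the bottom of the page
Create threads in a loop (one thread per page)
Share the List with each thread through its constructor
When the run finishes, the List should hold the books from all pages
Implement the Comparator interface on the score and num properties to sort the List
Iterate over the sorted List and set each element's id property in order
Call POI, iterate over the List, and write each element to the excel file as one row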
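
The threading, sorting, and numbering steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. The class and method names (`SpiderSketch`, `PageWorker`, `parsePage`, `collectAndRank`) and the `title` field are hypothetical. `parsePage` is a stand-in for the HttpClient and jsoup steps and returns synthetic placeholder books. The sort direction (highest score first, ties broken by highest num) and the use of a synchronized list are assumptions, since the README only says the list is shared through the constructor and sorted by score and num.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SpiderSketch {

    /** Hypothetical Book entity; the README names only id, score and num. */
    static class Book {
        int id;                 // assigned after sorting
        final String title;     // assumed field, for readability only
        final double score;
        final int num;

        Book(String title, double score, int num) {
            this.title = title;
            this.score = score;
            this.num = num;
        }
    }

    /** One worker per page; the shared list arrives through the constructor. */
    static class PageWorker implements Runnable {
        private final int page;
        private final List<Book> books;

        PageWorker(int page, List<Book> books) {
            this.page = page;
            this.books = books;
        }

        @Override
        public void run() {
            books.addAll(parsePage(page));
        }
    }

    /** Placeholder for the fetch-and-parse step: returns synthetic books. */
    static List<Book> parsePage(int page) {
        List<Book> result = new ArrayList<>();
        result.add(new Book("book-" + page + "-a", 8.0 + page * 0.1, 100 * page));
        result.add(new Book("book-" + page + "-b", 9.0, 10 * page));
        return result;
    }

    static List<Book> collectAndRank(int totalPages) {
        // Assumption: a synchronized wrapper, because several threads add concurrently.
        List<Book> shared = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int page = 1; page <= totalPages; page++) {
            Thread t = new Thread(new PageWorker(page, shared));
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // wait until every page has been collected
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        // Assumed order: score descending, then num descending.
        List<Book> ranked = new ArrayList<>(shared);
        ranked.sort(Comparator.comparingDouble((Book b) -> b.score).reversed()
                .thenComparing(Comparator.comparingInt((Book b) -> b.num).reversed()));

        // Number the books in their sorted order, starting at 1.
        int id = 1;
        for (Book b : ranked) {
            b.id = id++;
        }
        return ranked;
    }

    public static void main(String[] args) {
        for (Book b : collectAndRank(3)) {
            System.out.println(b.id + " " + b.title + " " + b.score + " " + b.num);
        }
    }
}
```

Joining every thread before sorting matters here: the ids are only meaningful once the list holds the books from all pages. A plain `ArrayList` is not safe for concurrent `addAll` calls, which is why the sketch wraps it.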


Contributors 3

Languages

Java 100.0%