
WPCrawler

A web crawler for a single WordPress site.

It uses the following open source libraries:

Apache HttpComponents 4.3

HTML Parser 2.0

MySQL Connector/J 5.1.27

Text is stored with UTF-8 encoding so that Chinese tags are recorded correctly.

A local XAMPP environment is required.

The program assumes that a database named "crawler" exists on the local MySQL server at XAMPP's default address, localhost:3306.
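As a rough illustration of the database assumption above, the sketch below builds a JDBC URL for the "crawler" database on localhost:3306 and requests UTF-8 through Connector/J's `useUnicode` and `characterEncoding` properties. The class name `CrawlerDb`, its method names, and the `root` user with an empty password are assumptions made for this example and are not taken from the project's source.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of WPCrawler itself.
public class CrawlerDb {
    // Values stated in this README: XAMPP's default MySQL address and the "crawler" database.
    static final String HOST = "localhost";
    static final int PORT = 3306;
    static final String DATABASE = "crawler";

    // Builds the JDBC URL, asking the driver to use UTF-8 so Chinese tags survive the round trip.
    static String jdbcUrl() {
        return "jdbc:mysql://" + HOST + ":" + PORT + "/" + DATABASE
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Opens a connection; requires MySQL Connector/J on the classpath.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(), user, password);
    }

    public static void main(String[] args) {
        // Assumed credentials: user "root" with an empty password; adjust for your setup.
        try (Connection conn = open("root", "")) {
            System.out.println("Connected to " + conn.getMetaData().getURL());
        } catch (SQLException e) {
            System.err.println("Could not connect: " + e.getMessage());
        }
    }
}
```

If the connection fails, check that MySQL is running in the XAMPP control panel and that the "crawler" database has been created.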

Counting the tags used by each article will be added in the next update.

A detailed explanation of how the crawler works is on my blog:

http://johnhany.net/2013/11/web-crawler-using-java-and-mysql/

My blog is new and may still be unstable. If you have any problems accessing it, please let me know, and I will try to fix them. :)