
Commit

Remove the MV地址 and 全球时尚 examples; improve the JD.com (京东) image-crawling example; add a Tuicool (推酷) article-crawling example; add a .gitignore file
satrong committed Jul 8, 2016
1 parent 8d5d5e6 commit 24eef8c
Showing 5 changed files with 52 additions and 4 deletions.
31 changes: 31 additions & 0 deletions .gitignore
@@ -0,0 +1,31 @@
*.log
npm-debug.log*

# Runtime data
pids
*.pid
*.seed

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage

# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# node-waf configuration
.lock-wscript

# Compiled binary addons (http://nodejs.org/api/addons.html)
build/Release

# Dependency directory
node_modules

# Optional npm cache directory
.npm

# Optional REPL history
.node_repl_history
1 change: 0 additions & 1 deletion config/MV地址

This file was deleted.

1 change: 0 additions & 1 deletion config/全球时尚

This file was deleted.

17 changes: 17 additions & 0 deletions config/推酷文章
@@ -0,0 +1,17 @@
{
    "url" : "http://www.tuicool.com/ah",
    "type" : "text",
    "from" : "1",
    "to" : "2",
    "charset" : "utf8",
    "saveDir" : "e:/tuicool",
    "selector" : [{
            "$" : "$(\"div.single_fake\").find(\"a.article-list-title\")",
            "attr" : "href"
        }, {
            "$" : "$(\"div.article_body\")"
        }
    ],
    "isPagination" : 1,
    "mode" : "web"
}
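
Read together with the lib/crawler.js hunk in this commit, a selector entry that carries an "attr" key appears to yield link URLs to follow, while an entry without one appears to select the content to save. The sketch below illustrates that reading only. `splitSelectors` and `pageRange` are hypothetical names that do not exist in the project, and treating "from"/"to" as an inclusive page range is an assumption based on "isPagination" being set.

```javascript
// Selector list and paging fields copied from config/推酷文章.
const config = {
  from: "1",
  to: "2",
  selector: [
    { $: '$("div.single_fake").find("a.article-list-title")', attr: "href" },
    { $: '$("div.article_body")' },
  ],
};

// Hypothetical helper: entries with an "attr" key are treated as link
// extractors, the rest as content extractors (an assumption, see above).
function splitSelectors(selectors) {
  return {
    linkSelectors: selectors.filter((s) => typeof s.attr === "string"),
    contentSelectors: selectors.filter((s) => typeof s.attr !== "string"),
  };
}

// Hypothetical helper: "from" and "to" are stored as strings, so convert
// them to integers before building the (assumed inclusive) page range.
function pageRange(from, to) {
  const first = Number.parseInt(from, 10);
  const last = Number.parseInt(to, 10);
  const pages = [];
  for (let page = first; page <= last; page += 1) {
    pages.push(page);
  }
  return pages;
}

const plan = splitSelectors(config.selector);
console.log(plan.linkSelectors.length, plan.contentSelectors.length);
console.log(pageRange(config.from, config.to));
```

Keeping the jQuery expressions as strings matches the config file, but it also means the crawler has to `eval` them, so a config like this should only come from a trusted source.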
6 changes: 4 additions & 2 deletions lib/crawler.js
@@ -71,8 +71,10 @@ Crawler.prototype.crawl = function () {
     var $$ = eval(item.$);
     $$.each(function () {
         var nextUrl = $(this).attr(item.attr);
-        if (!/^http:\/\//i.test(nextUrl)) {
-            nextUrl = rootsite + nextUrl;
+        if(/^\/{2}[^\/]+/.test(nextUrl)){
+            nextUrl = "http:" + nextUrl;
+        } else if (!/^http:\/\//i.test(nextUrl)) {
+            nextUrl = rootsite + nextUrl.replace(/^\/+/,'/');
         }
         urlLevels[0].push(nextUrl);
     });
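
Review note: the new branch handles protocol-relative links (`//host/path`), but the remaining check `/^http:\/\//i` still treats an `https://` link as relative and would prepend `rootsite` to it. One possible alternative, sketched here under the assumption that the crawler runs on a Node.js version with the WHATWG `URL` class available as a global, is to let `URL` resolve every href against the page it was found on. `resolveHref` is a hypothetical helper, not something in lib/crawler.js, and the example.com addresses are placeholders.

```javascript
// Hypothetical helper, not part of lib/crawler.js.
// Resolves an href found on the page at `pageUrl` into an absolute URL.
// Returns null when the href or the base cannot be parsed.
function resolveHref(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch (err) {
    return null;
  }
}

const page = "http://www.example.com/list";

// Protocol-relative: inherits the scheme of the page.
console.log(resolveHref("//img.example.com/a.png", page));
// Root-relative: resolved against the page's origin.
console.log(resolveHref("/articles/1", page));
// Already absolute, including https: returned unchanged.
console.log(resolveHref("https://other.example.com/x", page));
```

Unlike the hard-coded `"http:"` prefix in the hunk, this keeps the scheme of the page being crawled, so protocol-relative links found on an `https` page stay on `https`.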
