jadbin/gspider

gspider

A web spider framework based on gevent and requests.

Spider Example

Below is an example spider class that crawls the top headlines from Baidu News:

from gspider import Spider, HttpRequest, run_spider, Selector


class BaiduNewsSpider(Spider):
    def start_requests(self):
        yield HttpRequest("http://news.baidu.com/")

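    def parse(self, response):
        # Build a selector over the response body and collect the text
        # of every link inside the "hotnews" block.
        selector = Selector(response.text)
        hot = selector.css("div.hotnews a").text
        self.log("Hot News:")
        # Log the headlines with 1-based numbering.
        for i, title in enumerate(hot, start=1):
            self.log("%s: %s", i, title)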


if __name__ == '__main__':
    run_spider(BaiduNewsSpider)

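The spider class defines the following methods:

  • start_requests: returns the spider's initial requests.
  • parse: processes the page returned for a request. Here it uses Selector with CSS selector syntax to extract the data we need.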
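
To make concrete what the CSS query "div.hotnews a" in parse picks out, here is a minimal standard-library sketch that collects the text of every link nested inside a div with class hotnews. It does not use gspider and is not how Selector is implemented; HotNewsLinkParser, extract_hot_news, and SAMPLE_HTML are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class HotNewsLinkParser(HTMLParser):
    """Collects the text of every <a> nested inside <div class="hotnews">.

    Hypothetical helper for illustration; not part of gspider.
    """

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while inside the hotnews div (counts nested divs)
        self.in_link = False  # True while inside a matching <a>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Start counting at the hotnews div, then track any nested divs.
            if self.div_depth > 0 or "hotnews" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth > 0:
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Accumulate text, including text inside child tags such as <b>.
        if self.in_link:
            self.titles[-1] += data


def extract_hot_news(html):
    parser = HotNewsLinkParser()
    parser.feed(html)
    return [title.strip() for title in parser.titles]


# Made-up sample markup; the real Baidu News page will differ.
SAMPLE_HTML = """
<div class="hotnews"><ul>
  <li><a href="/a">First headline</a></li>
  <li><a href="/b">Second <b>headline</b></a></li>
</ul></div>
<div class="other"><a href="/c">Not hot</a></div>
"""

for i, title in enumerate(extract_hot_news(SAMPLE_HTML), start=1):
    print("%s: %s" % (i, title))
```

Only the two links inside the hotnews block are collected; the link in the other div is ignored, which mirrors the descendant relationship expressed by the space in the CSS selector.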

Documentation

http://gspider.readthedocs.io/
