Mojit000/gxrcwSpider

gxrcwSpider: a crawler for 广西人才网 (Guangxi Talent Network)

Key Concepts

  1. Scraping data across multiple pages with Scrapy
  2. Saving the scraped data to MongoDB
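
As a rough illustration of the second point, a Scrapy item pipeline that writes each item to MongoDB might look like the sketch below. It follows the general shape of the pipeline hooks described in the Scrapy documentation (`open_spider`, `close_spider`, `process_item`) and uses PyMongo's `insert_one`. The class name `MongoPipeline`, the connection URI, and the database and collection names are assumptions made for this example, not values taken from this repository.

```python
class MongoPipeline:
    """Sketch of a Scrapy item pipeline that stores every item in MongoDB."""

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 mongo_db="gxrcw", collection_name="jobs"):
        # All three defaults are hypothetical placeholders.
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        # Imported here so the class can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Store a plain-dict copy so the item itself is passed on unmodified.
        self.collection.insert_one(dict(item))
        return item
```

To use a pipeline like this, it would need to be listed under `ITEM_PIPELINES` in the project's `settings.py`; the module path to use there depends on the project layout.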

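References

  1. Paginated Scrapy crawl of full-time faculty profiles and college news from the School of Public Administration, Sichuan University
  2. Passing parameters between Requests at different levels in a Scrapy crawl
  3. Scrapy official documentation: Requests and Responses
  4. Scrapy official documentation: Item pipeline
  5. PyMongo 3.4.0 documentation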

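Bug Fix Notes

# Incorrect code: a single item is created once and reused by every iteration
def parse_jobs(self, response):
    jobs = response.css('div.rlOne')
    jobsItem = GxrcwItem()
    for job in jobs:
        # scrape part of the data

# Correct code: a new item is created for each job
def parse_jobs(self, response):
    jobs = response.css('div.rlOne')
    for job in jobs:
        jobsItem = GxrcwItem()
        # scrape part of the data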

Bug Analysis
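
The original leaves this section empty. One plausible reading of the fix above is that an item created outside the loop is a single mutable object. Every iteration writes into it, so anything still holding a reference to it (for example, a follow-up request that carries the item along) ends up seeing only the values from the last iteration. The sketch below reproduces that effect in plain Python. `GxrcwItem` here is a dict-based stand-in for the project's item class, and `parse_shared` and `parse_fresh` are hypothetical helper names used only for this demonstration.

```python
class GxrcwItem(dict):
    """Stand-in for the project's Scrapy item (assumption for this sketch)."""


def parse_shared(rows):
    """Mimics the incorrect version: one item shared across iterations."""
    item = GxrcwItem()          # created once, outside the loop
    pending = []
    for row in rows:
        item["title"] = row     # overwrites the previous value
        pending.append(item)    # the same object is appended each time
    return pending


def parse_fresh(rows):
    """Mimics the corrected version: a new item per iteration."""
    pending = []
    for row in rows:
        item = GxrcwItem()      # created inside the loop
        item["title"] = row
        pending.append(item)
    return pending


rows = ["engineer", "analyst", "designer"]
print([i["title"] for i in parse_shared(rows)])  # ['designer', 'designer', 'designer']
print([i["title"] for i in parse_fresh(rows)])   # ['engineer', 'analyst', 'designer']
```

Creating the item inside the loop gives each job its own object, so values written later cannot leak into items produced earlier.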
